diff --git a/docs/stable/store/quickstart.md b/docs/stable/store/quickstart.md
index 25c0f62..a749e09 100644
--- a/docs/stable/store/quickstart.md
+++ b/docs/stable/store/quickstart.md
@@ -109,7 +109,14 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
 ## Usage with vLLM
 
-To use ServerlessLLM as a load format for vLLM, you need to apply our patch `serverless_llm/store/vllm_patch/sllm_load.patch` to the installed vLLM library. Therefore, please make sure you have read and followed the steps in the `vLLM Patch` section under our [installation guide](../getting_started/installation.md).
+:::tip
+To use ServerlessLLM as the load format for vLLM, you need to apply our patch `serverless_llm/store/vllm_patch/sllm_load.patch` to the installed vLLM library. Please make sure you have applied the `vLLM Patch` as instructed in the [installation guide](../getting_started/installation.md):
+```bash
+VLLM_PATH=$(python -c "import vllm; import os; print(os.path.dirname(os.path.abspath(vllm.__file__)))")
+patch -p2 -d $VLLM_PATH < serverless_llm/store/vllm_patch/sllm_load.patch
+```
+:::
+
 
 Our API aims to be compatible with vLLM's `sharded_state` load format. However, because vLLM modifies the model architecture, the model format used with vLLM is **not** the same as the one used with Transformers. The `ServerlessLLM format` mentioned in the following sections therefore refers to the format integrated with vLLM, which is different from the `ServerlessLLM format` used in the previous sections.
 
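
Since the API targets compatibility with vLLM's `sharded_state` loader, loading a checkpoint saved in the vLLM-integrated ServerlessLLM format should mirror the `sharded_state` flow. The sketch below is illustrative only: the `load_format="serverless_llm"` value and the local checkpoint path are assumptions about what the patch registers, not something this diff confirms; the quickstart sections that follow give the exact commands.

```python
# A minimal sketch, assuming the sllm_load patch registers a
# "serverless_llm" load format in vLLM (name assumed for illustration)
# and that a checkpoint has already been saved in the vLLM-integrated
# ServerlessLLM format at the path below (hypothetical).
from vllm import LLM, SamplingParams

llm = LLM(
    model="./models/facebook/opt-1.3b",  # hypothetical local checkpoint path
    load_format="serverless_llm",        # assumed format name added by the patch
)

# Once loaded, generation works the same as with any other vLLM load format.
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```

Note that because the two on-disk layouts differ, a checkpoint saved in the Transformers-oriented ServerlessLLM format from the earlier sections would presumably need to be re-saved in the vLLM-integrated format before it can be loaded this way.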