From effe4413af662809c909f626a126e85f2524111d Mon Sep 17 00:00:00 2001
From: Chivier Humber
Date: Mon, 21 Oct 2024 16:17:56 +0000
Subject: [PATCH] Document Sync by Tina

---
 docs/stable/getting_started/installation.md | 44 +++++++++++++++------
 docs/stable/store/quickstart.md | 19 ++++-----
 2 files changed, 43 insertions(+), 20 deletions(-)

diff --git a/docs/stable/getting_started/installation.md b/docs/stable/getting_started/installation.md
index 5e24b01..7da4f55 100644
--- a/docs/stable/getting_started/installation.md
+++ b/docs/stable/getting_started/installation.md
@@ -9,35 +9,57 @@ sidebar_position: 0
 - Python: 3.10
 - GPU: compute capability 7.0 or higher
 
-## Install with pip
-TODO
+## Installing with pip
+```bash
+# On the head node
+conda create -n sllm python=3.10 -y
+conda activate sllm
+pip install serverless-llm
+pip install serverless-llm-store
+
+# On a worker node
+conda create -n sllm-worker python=3.10 -y
+conda activate sllm-worker
+pip install serverless-llm[worker]
+pip install serverless-llm-store
+```
+
+:::note
+If you plan to use vLLM with ServerlessLLM, you need to apply our patch to the vLLM repository. Refer to the [vLLM Patch](#vllm-patch) section for more details.
+:::
+
 
-## Install from source
-Install the package from source by running the following commands:
+## Installing from source
+To install the package from source, follow these steps:
 ```bash
 git clone https://github.com/ServerlessLLM/ServerlessLLM
 cd ServerlessLLM
 ```
 
 ```
-# head node
+# On the head node
 conda create -n sllm python=3.10 -y
 conda activate sllm
 pip install -e .
-pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ serverless_llm_store==0.0.1.dev5
+cd sllm_store && rm -rf build
+# Installing `sllm_store` from source can be slow. We recommend using pip install.
+pip install .
 
-# worker node
+# On a worker node
 conda create -n sllm-worker python=3.10 -y
 conda activate sllm-worker
 pip install -e ".[worker]"
-pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ serverless_llm_store==0.0.1.dev5
+cd sllm_store && rm -rf build
+# Installing `sllm_store` from source can be slow. We recommend using pip install.
+pip install .
 ```
 
 # vLLM Patch
-To use vLLM with ServerlessLLM, we need to apply our patch `serverless_llm/store/vllm_patch/sllm_load.patch` to the vLLM repository. Currently, the patch is only tested with vLLM version `0.5.0`.
+To use vLLM with ServerlessLLM, you need to apply our patch located at `sllm_store/vllm_patch/sllm_load.patch` to the vLLM repository.
+The patch has been tested with vLLM version `0.5.0.post1`.
 
-You may do that by running our script:
+You can apply the patch by running the following script:
 ```bash
 conda activate sllm-worker
-./serverless_llm/store/vllm_patch/patch.sh
+./sllm_store/vllm_patch/patch.sh
 ```
\ No newline at end of file
diff --git a/docs/stable/store/quickstart.md b/docs/stable/store/quickstart.md
index 1c7d896..42fa077 100644
--- a/docs/stable/store/quickstart.md
+++ b/docs/stable/store/quickstart.md
@@ -26,7 +26,7 @@ conda activate sllm-store
 
 ### Install with pip
 ```bash
-pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ serverless_llm_store==0.0.1.dev5
+pip install serverless-llm-store
 ```
 
 ### Install from source
@@ -34,12 +34,13 @@ pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/
 
 ``` bash
 git clone git@github.com:ServerlessLLM/ServerlessLLM.git
-cd ServerlessLLM/serverless_llm/store
+cd ServerlessLLM/sllm_store
 ```
 
 2. Install the package from source
 
 ```bash
+rm -rf build
 pip install .
 ```
 
@@ -55,7 +56,7 @@ ln -s /mnt/nvme/models ./models
 
 1. Convert a model to ServerlessLLM format and save it to a local path:
 ```python
-from serverless_llm_store.transformers import save_model
+from sllm_store.transformers import save_model
 
 # Load a model from HuggingFace model hub.
 import torch
@@ -84,7 +85,7 @@ docker run -it --rm -v $PWD/models:/app/models checkpoint_store_server
 ```python
 import time
 import torch
-from serverless_llm_store.transformers import load_model
+from sllm_store.transformers import load_model
 
 # warm up the GPU
 num_gpus = torch.cuda.device_count()
@@ -110,19 +111,19 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ## Usage with vLLM
 
 :::tip
-To use ServerlessLLM as the load format for vLLM, you need to apply our patch `serverless_llm/store/vllm_patch/sllm_load.patch` to the installed vLLM library. Therefore, please ensure you have applied our `vLLM Patch` as instructed in [installation guide](../getting_started/installation.md).
+To use ServerlessLLM as the load format for vLLM, you need to apply our patch `sllm_store/vllm_patch/sllm_load.patch` to the installed vLLM library. Therefore, please ensure you have applied our `vLLM Patch` as instructed in the [installation guide](../getting_started/installation.md).
 
 You may check the patch status by running the following command:
 ``` bash
-./serverless_llm/store/vllm_patch/check_patch.sh
+./sllm_store/vllm_patch/check_patch.sh
 ```
 
 If the patch is not applied, you can apply it by running the following command:
 ```bash
-./serverless_llm/store/vllm_patch/patch.sh
+./sllm_store/vllm_patch/patch.sh
 ```
 
 To remove the applied patch, you can run the following command:
 ```bash
-./serverless_llm/store/vllm_patch/remove_patch.sh
+./sllm_store/vllm_patch/remove_patch.sh
 ```
 :::
@@ -219,7 +220,7 @@ downloader = VllmModelDownloader()
 downloader.download_vllm_model("facebook/opt-1.3b", "float16", 1)
 ```
 
-After downloading the model, you can launch the checkpoint store server and load the model in vLLM through `serverless_llm` load format.
+After downloading the model, you can launch the checkpoint store server and load the model in vLLM through the `sllm` load format.
 
 2. Launch the checkpoint store server in a separate process:
 ```bash