Commit

Document Sync by Tina
Chivier committed Oct 21, 2024
1 parent 75e30eb commit effe441
Showing 2 changed files with 43 additions and 20 deletions.
docs/stable/getting_started/installation.md — 33 additions, 11 deletions
@@ -9,35 +9,57 @@ sidebar_position: 0
- Python: 3.10
- GPU: compute capability 7.0 or higher

## Installing with pip
```bash
# On the head node
conda create -n sllm python=3.10 -y
conda activate sllm
pip install serverless-llm
pip install serverless-llm-store

# On a worker node
conda create -n sllm-worker python=3.10 -y
conda activate sllm-worker
pip install serverless-llm[worker]
pip install serverless-llm-store
```
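
To sanity-check a pip installation on either node, you can try importing the store library. This is a minimal sketch; it assumes the `serverless-llm-store` package exposes the `sllm_store` module used in the store examples later in these docs:

```python
# Verify that the store library installed by pip is importable.
from sllm_store.transformers import load_model, save_model

print("sllm_store imports:", save_model.__name__, load_model.__name__)
```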

:::note
If you plan to use vLLM with ServerlessLLM, you need to apply our patch to the vLLM repository. Refer to the [vLLM Patch](#vllm-patch) section for more details.
:::


## Installing from source
To install the package from source, follow these steps:
```bash
git clone https://github.com/ServerlessLLM/ServerlessLLM
cd ServerlessLLM
```

```bash
# On the head node
conda create -n sllm python=3.10 -y
conda activate sllm
pip install -e .
cd sllm_store && rm -rf build
# Installing `sllm_store` from source can be slow. We recommend using pip install.
pip install .
# On a worker node
conda create -n sllm-worker python=3.10 -y
conda activate sllm-worker
pip install -e ".[worker]"
cd sllm_store && rm -rf build
# Installing `sllm_store` from source can be slow. We recommend using pip install.
pip install .
```

## vLLM Patch
To use vLLM with ServerlessLLM, you need to apply our patch located at `sllm_store/vllm_patch/sllm_load.patch` to the vLLM repository.
The patch has been tested with vLLM version `0.5.0.post1`.
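
Before applying the patch, it is worth confirming that the installed vLLM matches the tested version. A minimal check (nothing here is ServerlessLLM-specific):

```python
# Print the installed vLLM version; the patch is tested against 0.5.0.post1.
import vllm

print(vllm.__version__)
```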

You can apply the patch by running the following script:
```bash
conda activate sllm-worker
./sllm_store/vllm_patch/patch.sh
```
docs/stable/store/quickstart.md — 10 additions, 9 deletions
@@ -26,20 +26,21 @@ conda activate sllm-store

### Install with pip
```bash
pip install serverless-llm-store
```

### Install from source
1. Clone the repository and enter the `sllm_store` directory

```bash
git clone git@github.com:ServerlessLLM/ServerlessLLM.git
cd ServerlessLLM/sllm_store
```

2. Install the package from source

```bash
rm -rf build
pip install .
```
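
After building from source, you can confirm that Python picks up the locally built package by printing where it was imported from. A small sketch, with no version-specific assumptions:

```python
# Show which sllm_store installation is on the import path.
import sllm_store

print(sllm_store.__file__)
```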

@@ -55,7 +56,7 @@ ln -s /mnt/nvme/models ./models

1. Convert a model to ServerlessLLM format and save it to a local path:
```python
from sllm_store.transformers import save_model

# Load a model from HuggingFace model hub.
import torch
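from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b", torch_dtype=torch.float16
)

# The call below is a sketch: the model name, output path, and the exact
# save_model signature are assumptions based on this guide; adjust as needed.
save_model(model, "./models/facebook/opt-1.3b")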
```
@@ -84,7 +85,7 @@ docker run -it --rm -v $PWD/models:/app/models checkpoint_store_server
```python
import time
import torch
from sllm_store.transformers import load_model

# warm up the GPU
num_gpus = torch.cuda.device_count()
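for i in range(num_gpus):
    torch.ones(1).to(f"cuda:{i}")
    torch.cuda.synchronize(i)

# The rest of this example is a sketch: the load_model keyword arguments are
# assumptions based on this guide and may differ in your installed version.
start = time.time()
model = load_model(
    "facebook/opt-1.3b",
    device_map="auto",
    torch_dtype=torch.float16,
    storage_path="./models/",
)
print(f"Model loading time: {time.time() - start:.2f} seconds")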
```
@@ -110,19 +111,19 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
## Usage with vLLM

:::tip
To use ServerlessLLM as the load format for vLLM, you need to apply our patch `sllm_store/vllm_patch/sllm_load.patch` to the installed vLLM library. Please ensure you have applied the `vLLM Patch` as instructed in the [installation guide](../getting_started/installation.md).

You may check the patch status by running the following command:
```bash
./sllm_store/vllm_patch/check_patch.sh
```
If the patch is not applied, you can apply it by running the following command:
```bash
./sllm_store/vllm_patch/patch.sh
```
To remove the applied patch, you can run the following command:
```bash
./sllm_store/vllm_patch/remove_patch.sh
```
:::

@@ -219,7 +220,7 @@
```python
downloader = VllmModelDownloader()
downloader.download_vllm_model("facebook/opt-1.3b", "float16", 1)
```

After downloading the model, you can launch the checkpoint store server and load the model in vLLM through the `sllm` load format.

2. Launch the checkpoint store server in a separate process:
