Commit

Document Sync by Tina
Chivier committed Jul 30, 2024
1 parent 52c2382 commit 0f6e3cd
Showing 3 changed files with 189 additions and 6 deletions.
181 changes: 181 additions & 0 deletions docs/stable/cli/sllm_cli_doc.md
@@ -0,0 +1,181 @@
## ServerlessLLM CLI Documentation

### Overview
`sllm-cli` is a command-line interface (CLI) tool designed for managing and interacting with ServerlessLLM models. This document provides an overview of the available commands and their usage.

### Getting Started

Before using the `sllm-cli` commands, you need to start the ServerlessLLM cluster. Follow the guides below to set up your cluster:

- [Installation Guide](../getting_started/installation.md)
- [Docker Quickstart Guide](../getting_started/docker_quickstart.md)
- [Quickstart Guide](../getting_started/quickstart.md)

After setting up the ServerlessLLM cluster, you can use the commands listed below to manage and interact with your models.

### Example Workflow

1. **Deploy a Model**
> Deploy a model by its name. The name must be a full Hugging Face pretrained model name, e.g. "facebook/opt-1.3b" rather than "opt-1.3b".
```bash
sllm-cli deploy --model facebook/opt-1.3b
```

2. **Generate Output**
```bash
echo '{
  "model": "facebook/opt-1.3b",
  "messages": [
    {
      "role": "user",
      "content": "Please introduce yourself."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 50
}' > input.json
sllm-cli generate input.json
```

3. **Delete a Model**
```bash
sllm-cli delete facebook/opt-1.3b
```
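
The three steps above can be chained into a single script. The sketch below uses only the commands documented on this page; it assumes the ServerlessLLM cluster is already running and reachable from the client (for example via the `LLM_SERVER_URL` setting shown in the Docker Quickstart Guide).

```bash
#!/usr/bin/env bash
# Minimal end-to-end sketch: deploy, generate, then clean up.
# Assumes the ServerlessLLM cluster is already running (see the guides above).
set -e

MODEL="facebook/opt-1.3b"

# 1. Deploy the model with the default configuration.
sllm-cli deploy --model "$MODEL"

# 2. Write a request file and generate a completion.
cat > input.json <<EOF
{
  "model": "$MODEL",
  "messages": [
    {"role": "user", "content": "Please introduce yourself."}
  ],
  "temperature": 0.7,
  "max_tokens": 50
}
EOF
sllm-cli generate input.json

# 3. Delete the model when it is no longer needed.
sllm-cli delete "$MODEL"
```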

### sllm-cli deploy
Deploy a model using a configuration file or model name.

##### Usage
```bash
sllm-cli deploy [OPTIONS]
```

##### Options
- `--model <model_name>`
- Model name to deploy with the default configuration. The model name must be a Hugging Face pretrained model name. You can find the list of available models [here](https://huggingface.co/models).

- `--config <config_path>`
- Path to the JSON configuration file.

##### Example
```bash
sllm-cli deploy --model facebook/opt-1.3b
sllm-cli deploy --config /path/to/config.json
```

##### Example Configuration File (`config.json`)
```json
{
  "model": "facebook/opt-1.3b",
  "backend": "transformers",
  "num_gpus": 1,
  "auto_scaling_config": {
    "metric": "concurrency",
    "target": 1,
    "min_instances": 0,
    "max_instances": 10
  },
  "backend_config": {
    "pretrained_model_name_or_path": "facebook/opt-1.3b",
    "device_map": "auto",
    "torch_dtype": "float16"
  }
}
```
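
As a sketch of how a configuration file can be used in practice, the snippet below writes a config for `facebook/opt-2.7b` following the same schema as the example above and deploys it with `--config`. The scaling values and the file name are illustrative, not recommendations.

```bash
# Sketch: write a configuration for another model and deploy it via --config.
# The schema follows the example above; values and file name are illustrative.
cat > opt-2.7b-config.json <<EOF
{
  "model": "facebook/opt-2.7b",
  "backend": "transformers",
  "num_gpus": 1,
  "auto_scaling_config": {
    "metric": "concurrency",
    "target": 1,
    "min_instances": 0,
    "max_instances": 4
  },
  "backend_config": {
    "pretrained_model_name_or_path": "facebook/opt-2.7b",
    "device_map": "auto",
    "torch_dtype": "float16"
  }
}
EOF

sllm-cli deploy --config opt-2.7b-config.json
```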

### sllm-cli delete
Delete deployed models by name.

##### Usage
```bash
sllm-cli delete [MODELS]
```

##### Arguments
- `MODELS`
- Space-separated list of model names to delete.

##### Example
```bash
sllm-cli delete facebook/opt-1.3b facebook/opt-2.7b meta/llama2
```

### sllm-cli generate
Generate outputs using the deployed model.

##### Usage
```bash
sllm-cli generate [OPTIONS] <input_path>
```

##### Options
- `-t`, `--threads <num_threads>`
- Number of parallel generation processes. Default is 1.

##### Arguments
- `input_path`
- Path to the JSON input file.

##### Example
```bash
sllm-cli generate --threads 4 /path/to/request.json
```

##### Example Request File (`request.json`)
```json
{
  "model": "facebook/opt-1.3b",
  "messages": [
    {
      "role": "user",
      "content": "Please introduce yourself."
    }
  ],
  "temperature": 0.3,
  "max_tokens": 50
}
```
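
For a small batch of prompts, one option is simply to write one request file per prompt and invoke `sllm-cli generate` on each, as sketched below. The prompts and file names are illustrative.

```bash
# Sketch: run several single-prompt requests one after another.
# Prompts and file names are illustrative.
prompts=(
  "Please introduce yourself."
  "Summarize the benefits of serverless inference in one sentence."
)

for i in "${!prompts[@]}"; do
  cat > "request_${i}.json" <<EOF
{
  "model": "facebook/opt-1.3b",
  "messages": [
    {"role": "user", "content": "${prompts[$i]}"}
  ],
  "temperature": 0.3,
  "max_tokens": 50
}
EOF
  sllm-cli generate "request_${i}.json"
done
```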

### sllm-cli replay
Replay requests using a workload file and a dataset file.

##### Usage
```bash
sllm-cli replay [OPTIONS]
```

##### Options
- `--workload <workload_path>`
- Path to the JSON workload file.

- `--dataset <dataset_path>`
- Path to the JSON dataset file.

- `--output <output_path>`
- Path to the output JSON file for latency results. Default is `latency_results.json`.

##### Example
```bash
sllm-cli replay --workload /path/to/workload.json --dataset /path/to/dataset.json --output /path/to/output.json
```
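
The sketch below replays a workload and then inspects the latency results. The workload and dataset paths are placeholders for files you provide; the structure of the results file is not documented here, so it is only pretty-printed.

```bash
# Sketch: replay a workload and inspect the latency results.
# The workload/dataset paths are placeholders for files you provide.
sllm-cli replay \
  --workload /path/to/workload.json \
  --dataset /path/to/dataset.json \
  --output latency_results.json

# The results-file structure is not documented here; just pretty-print it.
python -m json.tool latency_results.json
```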

### sllm-cli update
Update a deployed model using a configuration file or model name.

##### Usage
```bash
sllm-cli update [OPTIONS]
```

##### Options
- `--model <model_name>`
- Model name to update with default configuration.

- `--config <config_path>`
- Path to the JSON configuration file.

##### Example
```bash
sllm-cli update --model facebook/opt-1.3b
sllm-cli update --config /path/to/config.json
```
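
A sketch of updating the scaling limits of an already-deployed model via `--config` follows. It assumes the update configuration uses the same schema as the deploy example earlier on this page; that schema reuse is an assumption, and the values and file name are illustrative.

```bash
# Sketch: raise the maximum number of instances for a deployed model.
# Assumes the update config uses the same schema as the deploy example above;
# values and file name are illustrative.
cat > update-config.json <<EOF
{
  "model": "facebook/opt-1.3b",
  "backend": "transformers",
  "num_gpus": 1,
  "auto_scaling_config": {
    "metric": "concurrency",
    "target": 1,
    "min_instances": 0,
    "max_instances": 20
  },
  "backend_config": {
    "pretrained_model_name_or_path": "facebook/opt-1.3b",
    "device_map": "auto",
    "torch_dtype": "float16"
  }
}
EOF

sllm-cli update --config update-config.json
```
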
6 changes: 3 additions & 3 deletions docs/stable/getting_started/docker_quickstart.md
@@ -101,7 +101,7 @@ export LLM_SERVER_URL=http://localhost:8343/
Deploy a model to the ServerlessLLM server using the `sllm-cli`:

```bash
-sllm-cli deploy --model facebook/opt-2.7b
+sllm-cli deploy --model facebook/opt-1.3b
```
> Note: This command will spend some time downloading the model from the Hugging Face Model Hub.
> You can use any model from the [Hugging Face Model Hub](https://huggingface.co/models) by specifying the model name in the `--model` argument.
@@ -120,7 +120,7 @@ Now, you can query the model by any OpenAI API client. For example, you can use
curl http://localhost:8343/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "facebook/opt-2.7b",
"model": "facebook/opt-1.3b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is your name?"}
@@ -131,7 +131,7 @@ curl http://localhost:8343/v1/chat/completions \
Expected output:

```plaintext
{"id":"chatcmpl-8b4773e9-a98b-41db-8163-018ed3dc65e2","object":"chat.completion","created":1720183759,"model":"facebook/opt-2.7b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}%
{"id":"chatcmpl-8b4773e9-a98b-41db-8163-018ed3dc65e2","object":"chat.completion","created":1720183759,"model":"facebook/opt-1.3b","choices":[{"index":0,"message":{"role":"assistant","content":"system: You are a helpful assistant.\nuser: What is your name?\nsystem: I am a helpful assistant.\n"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":26,"total_tokens":42}}%
```

### Deleting a Model
8 changes: 5 additions & 3 deletions docs/stable/getting_started/quickstart.md
@@ -25,14 +25,14 @@ ray start --address=localhost:6379 --num-cpus=4 --num-gpus=2 \

Now, let’s start ServerlessLLM.

-First, start ServerlessLLM Serve (i.e., `sllm-serve`)
+First, in another new terminal, start ServerlessLLM Serve (i.e., `sllm-serve`):

```bash
conda activate sllm
sllm-serve start
```

-Next start ServerlessLLM Store server. This server will use `./models` as the storage path by default.
+Next, in another new terminal, start the ServerlessLLM Store server. By default, it uses `./models` as the storage path.

```bash
conda activate sllm
@@ -41,7 +41,9 @@ sllm-store-server

Everything is set!

-Next, let's deploy a model to the ServerlessLLM server. You can deploy a model by running the following command:
+You now have four terminals open: one each for the local Ray cluster head node and worker node, one running ServerlessLLM Serve, and one running the ServerlessLLM Store server.
+
+Next, open another new terminal and deploy a model to the ServerlessLLM server by running the following command:

```bash
conda activate sllm