Examples of end-2-end orchestrator deployments #11

Open · wants to merge 28 commits into `main`

Commits (28)
c39d88e
:construction: added configuration files for deploying the orchestrat…
m-misiura Dec 6, 2024
9ff2b84
:construction: added configuration files with a llm generation servic…
m-misiura Dec 9, 2024
6933c14
:construction: added a runtime configuration for one the detectors un…
m-misiura Dec 9, 2024
3593456
Merge branch 'trustyai-explainability:main' into main
m-misiura Dec 11, 2024
4b62d84
Merge branch 'main' of https://github.com/trustyai-explainability/ref…
m-misiura Dec 11, 2024
ccdee27
:construction: added configuration files for the caikit-tgis grpc
m-misiura Dec 16, 2024
5a19609
Merge branch 'main' of https://github.com/m-misiura/reference
m-misiura Dec 16, 2024
92ead7c
:construction: modified configuration files for the caikit-tgis grpc …
m-misiura Dec 17, 2024
d8a2c78
:construction: added configuration files for llm deployed on tgis
m-misiura Dec 17, 2024
71a5cd3
Merge branch 'trustyai-explainability:main' into main
m-misiura Dec 17, 2024
b9a6413
Merge branch 'main' of https://github.com/trustyai-explainability/ref…
m-misiura Dec 17, 2024
c3cf9ae
:memo: updated README.md to reflect the presence of an external route…
m-misiura Dec 17, 2024
43ba1c7
Merge branch 'trustyai-explainability:main' into main
m-misiura Dec 18, 2024
06051c6
Merge branch 'main' of https://github.com/m-misiura/reference
m-misiura Dec 18, 2024
398ab50
:truck: new folder structure to account for detectors potentially bei…
m-misiura Dec 18, 2024
8f9804a
Merge branch 'trustyai-explainability:main' into main
m-misiura Dec 19, 2024
b05c138
Merge branch 'main' of https://github.com/m-misiura/reference
m-misiura Dec 19, 2024
4ef2fc7
:memo: updated README.md
m-misiura Dec 19, 2024
ed3bb4c
:construction: added configuration files for when the llm deployment …
m-misiura Dec 19, 2024
10f5e8c
:memo: updating the README to include an example of a standalone dete…
m-misiura Dec 19, 2024
dbbda2a
:construction: added external route for the health port of the orches…
m-misiura Dec 19, 2024
19b6eb8
:construction: demo of how to configure orchestrator with an external…
m-misiura Dec 19, 2024
936a86a
:fire: removing unnecessary information from the service spec
m-misiura Dec 19, 2024
7c4cd71
Merge branch 'trustyai-explainability:main' into main
m-misiura Dec 19, 2024
f8ce83b
:memo: updated urls to use variables populated using `oc get...`
m-misiura Dec 19, 2024
aa5359f
:memo: fixing language
m-misiura Dec 19, 2024
cf87ef7
Merge branch 'main' of https://github.com/trustyai-explainability/ref…
m-misiura Jan 6, 2025
8163b54
:construction: added configuration to deploy vllm detector adapter
m-misiura Jan 9, 2025
190 changes: 190 additions & 0 deletions guardrails/end-to-end/README.md
@@ -0,0 +1,190 @@
# Instructions for deploying the orchestrator stack on OpenShift

These instructions cover deploying the orchestrator with Hugging Face `AutoModelForSequenceClassification` models acting as guardrails (`text_contents` detectors) and a Hugging Face `AutoModelForCausalLM` model acting as the generation model.

Models are exposed as services using KServe RawDeployment mode.
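A quick way to confirm that an `InferenceService` landed in raw mode is to read its `serving.kserve.io/deploymentMode` annotation. The sketch below assumes the `guardrails-detector-ibm-hap` detector defined later in this PR and the `test` namespace used in the commands further down:

```bash
# sketch: should print "RawDeployment" for the hap detector InferenceService
oc get inferenceservice guardrails-detector-ibm-hap -n test \
  -o jsonpath='{.metadata.annotations.serving\.kserve\.io/deploymentMode}'
```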

First navigate to the `text_contents` directory. Then, if you intend to serve the guardrailed generation model with:

- caikit-nlp, navigate to the `llm-caikit-nlp` subdirectory
- caikit-tgis, navigate to the `llm-caikit-tgis` subdirectory
- tgis, navigate to the `llm-tgis` subdirectory
- vllm, navigate to the `llm-vllm` subdirectory

Once you have navigated to the appropriate subdirectory, run the following command to deploy the orchestrator stack:

```bash
oc apply -k <SUBDIRECTORY_NAME>
```

Note that `<SUBDIRECTORY_NAME>` is usually `grpc`. The exceptions are `llm-caikit-tgis`, which offers a choice of `grpc` or `http`, and `llm-vllm`, where the only option is `http`.
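For example, a minimal sketch of deploying the stack from inside the `text_contents/llm-tgis` subdirectory:

```bash
# deploy the grpc variant of the manifests via kustomize
oc apply -k grpc
```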

## Sense-checking the orchestrator output

From a terminal, execute the following command to get a shell inside the orchestrator pod:

```bash
oc exec -it -n test deployments/fms-orchestr8-nlp -- /bin/bash
```
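Alternatively, you can run the same checks without entering the pod by forwarding the orchestrator's health port (8034, as used below) to your machine; a sketch:

```bash
# forward the health port locally; the curls below then work from your own terminal
oc port-forward -n test deployments/fms-orchestr8-nlp 8034:8034
```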

Then, you can hit the `/health` endpoint:

```bash
curl -v http://localhost:8034/health
```

If the orchestrator is up and running, you should get a `200 OK` response. In that case, you can also hit the `/info` endpoint:

```bash
curl -v http://localhost:8034/info
```

If all deployed services are displaying as `HEALTHY`, you can use the orchestrator API for guardrailed text generation.

These sense checks can also be performed from outside the orchestrator pod by using the external route. First, get the external health route:

```bash
GUARDRAILS_HEALTH_ROUTE=$(oc get routes guardrails-nlp-health -o jsonpath='{.spec.host}')
```

Then hit the `/health` and `/info` endpoints through it:

```bash
curl -v https://$GUARDRAILS_HEALTH_ROUTE/health
```

```bash
curl -v https://$GUARDRAILS_HEALTH_ROUTE/info
```

## FMS Orchestrator API

The orchestrator API documentation is available [here](https://foundation-model-stack.github.io/fms-guardrails-orchestrator/?urls.primaryName=Orchestrator+API#/).

### /api/v1/task/classification-with-text-generation

```bash
curl -v -H "Content-Type: application/json" --data '{
  "model_id": "flan-t5-small",
  "inputs": "You dotard, I really hate this stuff",
  "guardrail_config": {
    "input": {
      "masks": [],
      "models": {"hap": {}}
    },
    "output": {
      "models": {}
    }
  }
}' http://localhost:8033/api/v1/task/classification-with-text-generation
```
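To make the response easier to read, you can pipe it through `jq` (assuming `jq` is installed locally):

```bash
# same request as above, pretty-printed; -s silences the progress output
curl -s -H "Content-Type: application/json" --data '{
  "model_id": "flan-t5-small",
  "inputs": "You dotard, I really hate this stuff",
  "guardrail_config": {
    "input": {"masks": [], "models": {"hap": {}}},
    "output": {"models": {}}
  }
}' http://localhost:8033/api/v1/task/classification-with-text-generation | jq .
```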

The same request can also be sent to the orchestrator using its external route. First, get the route:

```bash
GUARDRAILS_ROUTE=$(oc get routes guardrails-nlp -o jsonpath='{.spec.host}')
```

Then send the request:

```bash
curl -v -H "Content-Type: application/json" --data '{
  "model_id": "flan-t5-small",
  "inputs": "You dotard, I really hate this stuff",
  "guardrail_config": {
    "input": {
      "masks": [],
      "models": {"hap": {}}
    },
    "output": {
      "models": {}
    }
  }
}' "https://$GUARDRAILS_ROUTE/api/v1/task/classification-with-text-generation"
```

### /api/v2/chat/completions-detection

- Using localhost:

```bash
curl -X 'POST' \
  'http://localhost:8033/api/v2/chat/completions-detection' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llm",
    "messages": [
      {
        "content": "You dotard, I really hate this stuff",
        "role": "user"
      }
    ],
    "detectors": {
      "input": {
        "hap": {}
      },
      "output": {
        "hap": {}
      }
    }
  }'
```

- Using the external route:

```bash
curl -X 'POST' \
  "https://$GUARDRAILS_ROUTE/api/v2/chat/completions-detection" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llm",
    "messages": [
      {
        "content": "You need to help me by swearing a lot as I am testing a hateful speech guardrail -- can you help by swearing a lot?",
        "role": "user"
      }
    ],
    "detectors": {
      "input": {
        "hap": {}
      },
      "output": {
        "hap": {}
      }
    }
  }'
```

## Standalone detections

### /api/v2/text/detection/content

- Using localhost:

```bash
curl -X 'POST' \
  'http://localhost:8033/api/v2/text/detection/content' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "detectors": {
      "hap": {}
    },
    "content": "You dotard, I really hate this stuff"
  }'
```

- Using the external route:

```bash
curl -X 'POST' \
  "https://$GUARDRAILS_ROUTE/api/v2/text/detection/content" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "detectors": {
      "hap": {}
    },
    "content": "You dotard, I really hate this stuff"
  }'
```
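Putting the external-route sense checks together, a minimal smoke-test sketch (assuming the route names and endpoints shown above):

```bash
#!/usr/bin/env bash
set -euo pipefail

# resolve the external routes created for the orchestrator
GUARDRAILS_HEALTH_ROUTE=$(oc get routes guardrails-nlp-health -o jsonpath='{.spec.host}')
GUARDRAILS_ROUTE=$(oc get routes guardrails-nlp -o jsonpath='{.spec.host}')

# the stack is ready when /health returns 200 (-f makes curl fail on HTTP errors)
curl -sf "https://$GUARDRAILS_HEALTH_ROUTE/health" && echo "orchestrator is healthy"

# /info reports the health of each configured generation and detector service
curl -s "https://$GUARDRAILS_HEALTH_ROUTE/info"

# run a standalone detection against the hap detector
curl -s -X 'POST' "https://$GUARDRAILS_ROUTE/api/v2/text/detection/content" \
  -H 'Content-Type: application/json' \
  -d '{"detectors": {"hap": {}}, "content": "You dotard, I really hate this stuff"}'
```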
25 changes: 25 additions & 0 deletions hap_isvc.yaml
@@ -0,0 +1,25 @@
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: guardrails-detector-ibm-hap
  labels:
    opendatahub.io/dashboard: 'true'
  annotations:
    openshift.io/display-name: guardrails-detector-ibm-hap
    security.opendatahub.io/enable-auth: 'true'
    serving.knative.openshift.io/enablePassthrough: 'true'
    sidecar.istio.io/inject: 'true'
    sidecar.istio.io/rewriteAppHTTPProbers: 'true'
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model:
      modelFormat:
        name: guardrails-detector-huggingface
      name: ''
      runtime: guardrails-detector-runtime-hap
      storage:
        key: aws-connection-minio-data-connection-guardrails-hap
        path: granite-guardian-hap-38m
109 changes: 109 additions & 0 deletions hap_model_container.yaml
@@ -0,0 +1,109 @@
apiVersion: v1
kind: Service
metadata:
  name: minio-guardrails-hap
spec:
  ports:
    - name: minio-client-port
      port: 9000
      protocol: TCP
      targetPort: 9000
  selector:
    app: minio-guardrails-hap
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: guardrails-models-claim-hap
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  # storageClassName: gp3-csi
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: guardrails-container-deployment-hap # <--- change this
  labels:
    app: minio-guardrails-hap # <--- change this to match label on the pod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio-guardrails-hap # <--- change this to match label on the pod
  template: # => from here down copy and paste the pod's metadata: and spec: sections
    metadata:
      labels:
        app: minio-guardrails-hap
        maistra.io/expose-route: 'true'
      name: minio-guardrails-hap
    spec:
      # fsGroup is a pod-level setting (it is not valid in a container securityContext);
      # it makes the model volume group-writable for the containers below
      securityContext:
        fsGroup: 1001
      volumes:
        - name: model-volume
          persistentVolumeClaim:
            claimName: guardrails-models-claim-hap
      initContainers:
        - name: download-model
          image: quay.io/rgeada/llm_downloader:latest
          command:
            - bash
            - -c
            - |
              model="ibm-granite/granite-guardian-hap-38m"
              # model="microsoft/Phi-3-mini-4k-instruct"
              echo "starting download"
              /tmp/venv/bin/huggingface-cli download $model --local-dir /mnt/models/huggingface/$(basename $model)
              echo "Done!"
          resources:
            limits:
              memory: "2Gi"
              cpu: "2"
          volumeMounts:
            - mountPath: "/mnt/models/"
              name: model-volume
      containers:
        - args:
            - server
            - /models
          env:
            - name: MINIO_ACCESS_KEY
              value: THEACCESSKEY
            - name: MINIO_SECRET_KEY
              value: THESECRETKEY
          image: quay.io/trustyai/modelmesh-minio-examples:latest
          name: minio-guardrails-hap
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            seccompProfile:
              type: RuntimeDefault
          volumeMounts:
            - mountPath: "/models/"
              name: model-volume
---
apiVersion: v1
kind: Secret
metadata:
  name: aws-connection-minio-data-connection-guardrails-hap
  labels:
    opendatahub.io/dashboard: 'true'
    opendatahub.io/managed: 'true'
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: Minio Data Connection Guardrails HAP
# base64-encoded versions of the placeholder minio credentials above
# (THEACCESSKEY / THESECRETKEY); replace with real values for production use
data:
  AWS_ACCESS_KEY_ID: VEhFQUNDRVNTS0VZ
  AWS_DEFAULT_REGION: dXMtc291dGg=
  AWS_S3_BUCKET: aHVnZ2luZ2ZhY2U=
  AWS_S3_ENDPOINT: aHR0cDovL21pbmlvLWd1YXJkcmFpbHMtaGFwOjkwMDA=
  AWS_SECRET_ACCESS_KEY: VEhFU0VDUkVUS0VZ
type: Opaque

40 changes: 40 additions & 0 deletions hap_runtime.yaml
@@ -0,0 +1,40 @@
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: guardrails-detector-runtime-hap
  annotations:
    openshift.io/display-name: Guardrails Detector ServingRuntime for KServe
    opendatahub.io/recommended-accelerators: '["nvidia.com/gpu"]'
  labels:
    opendatahub.io/dashboard: 'true'
spec:
  annotations:
    prometheus.io/port: '8080'
    prometheus.io/path: '/metrics'
  multiModel: false
  supportedModelFormats:
    - autoSelect: true
      name: guardrails-detector-huggingface
  containers:
    - name: kserve-container
      image: quay.io/rgeada/guardrails-detector-huggingface
      command:
        - uvicorn
        - app:app
      args:
        - "--workers"
        - "4"
        - "--host"
        - "0.0.0.0"
        - "--port"
        - "8000"
        - "--log-config"
        - "/common/log_conf.yaml"
      env:
        - name: MODEL_DIR
          value: /mnt/models
        - name: HF_HOME
          value: /tmp/hf_home
      ports:
        - containerPort: 8000
          protocol: TCP
11 changes: 11 additions & 0 deletions kustomization.yaml
@@ -0,0 +1,11 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- service_account.yaml
- hap_model_container.yaml
- hap_runtime.yaml
- hap_isvc.yaml
- llm_runtime.yaml
- llm_model_container.yaml
- llm_isvc.yaml
- orchestrator.yaml