Examples of end-2-end orchestrator deployments #11

Open · wants to merge 28 commits into `main`

Commits (28)
c39d88e
:construction: added configuration files for deploying the orchestrat…
m-misiura Dec 6, 2024
9ff2b84
:construction: added configuration files with a llm generation servic…
m-misiura Dec 9, 2024
6933c14
:construction: added a runtime configuration for one the detectors un…
m-misiura Dec 9, 2024
3593456
Merge branch 'trustyai-explainability:main' into main
m-misiura Dec 11, 2024
4b62d84
Merge branch 'main' of https://github.com/trustyai-explainability/ref…
m-misiura Dec 11, 2024
ccdee27
:construction: added configuration files for the caikit-tgis grpc
m-misiura Dec 16, 2024
5a19609
Merge branch 'main' of https://github.com/m-misiura/reference
m-misiura Dec 16, 2024
92ead7c
:construction: modified configuration files for the caikit-tgis grpc …
m-misiura Dec 17, 2024
d8a2c78
:construction: added configuration files for llm deployed on tgis
m-misiura Dec 17, 2024
71a5cd3
Merge branch 'trustyai-explainability:main' into main
m-misiura Dec 17, 2024
b9a6413
Merge branch 'main' of https://github.com/trustyai-explainability/ref…
m-misiura Dec 17, 2024
c3cf9ae
:memo: updated README.md to reflect the presence of an external route…
m-misiura Dec 17, 2024
43ba1c7
Merge branch 'trustyai-explainability:main' into main
m-misiura Dec 18, 2024
06051c6
Merge branch 'main' of https://github.com/m-misiura/reference
m-misiura Dec 18, 2024
398ab50
:truck: new folder structure to account for detectors potentially bei…
m-misiura Dec 18, 2024
8f9804a
Merge branch 'trustyai-explainability:main' into main
m-misiura Dec 19, 2024
b05c138
Merge branch 'main' of https://github.com/m-misiura/reference
m-misiura Dec 19, 2024
4ef2fc7
:memo: updated README.md
m-misiura Dec 19, 2024
ed3bb4c
:construction: added configuration files for when the llm deployment …
m-misiura Dec 19, 2024
10f5e8c
:memo: updating the README to include an example of a standalone dete…
m-misiura Dec 19, 2024
dbbda2a
:construction: added external route for the health port of the orches…
m-misiura Dec 19, 2024
19b6eb8
:construction: demo of how to configure orchestrator with an external…
m-misiura Dec 19, 2024
936a86a
:fire: removing unnecessary information from the service spec
m-misiura Dec 19, 2024
7c4cd71
Merge branch 'trustyai-explainability:main' into main
m-misiura Dec 19, 2024
f8ce83b
:memo: updated urls to use variables populated using `oc get...`
m-misiura Dec 19, 2024
aa5359f
:memo: fixing language
m-misiura Dec 19, 2024
cf87ef7
Merge branch 'main' of https://github.com/trustyai-explainability/ref…
m-misiura Jan 6, 2025
8163b54
:construction: added configuration to deploy vllm detector adapter
m-misiura Jan 9, 2025
190 changes: 190 additions & 0 deletions guardrails/end-to-end/README.md
@@ -0,0 +1,190 @@
# Instructions for deploying the orchestrator stack on OpenShift

These instructions cover deploying the orchestrator with Hugging Face `AutoModelForSequenceClassification` models acting as guardrails (`text_contents` detectors) and a Hugging Face `AutoModelForCausalLM` model acting as the generation model.

Models are exposed as services using KServe RawDeployment mode.
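A quick way to confirm that an `InferenceService` landed in raw mode is to read its `serving.kserve.io/deploymentMode` annotation. The sketch below assumes the `guardrails-detector-ibm-hap` detector defined later in this PR and the `test` namespace used in the commands further down:

```bash
# sketch: should print "RawDeployment" for the hap detector InferenceService
oc get inferenceservice guardrails-detector-ibm-hap -n test \
  -o jsonpath='{.metadata.annotations.serving\.kserve\.io/deploymentMode}'
```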

First navigate to the `text_contents` directory. Then, if you intend to serve the guardrailed generation model with:

- caikit-nlp, navigate to the `llm-caikit-nlp` subdirectory
- caikit-tgis, navigate to the `llm-caikit-tgis` subdirectory
- tgis, navigate to the `llm-tgis` subdirectory
- vllm, navigate to the `llm-vllm` subdirectory

Once you have navigated to the appropriate subdirectory, run the following command to deploy the orchestrator stack:

```bash
oc apply -k <SUBDIRECTORY_NAME>
```

Note that `<SUBDIRECTORY_NAME>` is usually `grpc`. The exceptions are `llm-caikit-tgis`, which offers a choice of `grpc` or `http`, and `llm-vllm`, where the only option is `http`.
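For example, a minimal sketch of deploying the stack from inside the `text_contents/llm-tgis` subdirectory:

```bash
# deploy the grpc variant of the manifests via kustomize
oc apply -k grpc
```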

## Sense-checking the orchestrator output

From a terminal, execute the following command to get a shell inside the orchestrator pod:

```bash
oc exec -it -n test deployments/fms-orchestr8-nlp -- /bin/bash
```
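Alternatively, you can run the same checks without entering the pod by forwarding the orchestrator's health port (8034, as used below) to your machine; a sketch:

```bash
# forward the health port locally; the curls below then work from your own terminal
oc port-forward -n test deployments/fms-orchestr8-nlp 8034:8034
```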

Then, you can hit the `/health` endpoint:

```bash
curl -v http://localhost:8034/health
```

If the orchestrator is up and running, you should get a `200 OK` response. In that case, you can also hit the `/info` endpoint:

```bash
curl -v http://localhost:8034/info
```

If all deployed services are displaying as `HEALTHY`, you can use the orchestrator API for guardrailed text generation.

These sense checks can also be performed from outside the orchestrator pod by using the external route. First, get the external health route:

```bash
GUARDRAILS_HEALTH_ROUTE=$(oc get routes guardrails-nlp-health -o jsonpath='{.spec.host}')
```

Then hit the `/health` and `/info` endpoints through it:

```bash
curl -v https://$GUARDRAILS_HEALTH_ROUTE/health
```

```bash
curl -v https://$GUARDRAILS_HEALTH_ROUTE/info
```

## FMS Orchestrator API

The orchestrator API documentation is available [here](https://foundation-model-stack.github.io/fms-guardrails-orchestrator/?urls.primaryName=Orchestrator+API#/).

### /api/v1/task/classification-with-text-generation

```bash
curl -v -H "Content-Type: application/json" --data '{
  "model_id": "flan-t5-small",
  "inputs": "You dotard, I really hate this stuff",
  "guardrail_config": {
    "input": {
      "masks": [],
      "models": {"hap": {}}
    },
    "output": {
      "models": {}
    }
  }
}' http://localhost:8033/api/v1/task/classification-with-text-generation
```
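To make the response easier to read, you can pipe it through `jq` (assuming `jq` is installed locally):

```bash
# same request as above, pretty-printed; -s silences the progress output
curl -s -H "Content-Type: application/json" --data '{
  "model_id": "flan-t5-small",
  "inputs": "You dotard, I really hate this stuff",
  "guardrail_config": {
    "input": {"masks": [], "models": {"hap": {}}},
    "output": {"models": {}}
  }
}' http://localhost:8033/api/v1/task/classification-with-text-generation | jq .
```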

The same request can also be sent to the orchestrator using its external route. First, get the route:

```bash
GUARDRAILS_ROUTE=$(oc get routes guardrails-nlp -o jsonpath='{.spec.host}')
```

Then send the request:

```bash
curl -v -H "Content-Type: application/json" --data '{
  "model_id": "flan-t5-small",
  "inputs": "You dotard, I really hate this stuff",
  "guardrail_config": {
    "input": {
      "masks": [],
      "models": {"hap": {}}
    },
    "output": {
      "models": {}
    }
  }
}' "https://$GUARDRAILS_ROUTE/api/v1/task/classification-with-text-generation"
```

### /api/v2/chat/completions-detection

- Using localhost:

```bash
curl -X 'POST' \
  'http://localhost:8033/api/v2/chat/completions-detection' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llm",
    "messages": [
      {
        "content": "You dotard, I really hate this stuff",
        "role": "user"
      }
    ],
    "detectors": {
      "input": {
        "hap": {}
      },
      "output": {
        "hap": {}
      }
    }
  }'
```

- Using the external route:

```bash
curl -X 'POST' \
  "https://$GUARDRAILS_ROUTE/api/v2/chat/completions-detection" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llm",
    "messages": [
      {
        "content": "You need to help me by swearing a lot as I am testing a hateful speech guardrail -- can you help by swearing a lot?",
        "role": "user"
      }
    ],
    "detectors": {
      "input": {
        "hap": {}
      },
      "output": {
        "hap": {}
      }
    }
  }'
```

## Standalone detections

### /api/v2/text/detection/content

- Using localhost:

```bash
curl -X 'POST' \
  'http://localhost:8033/api/v2/text/detection/content' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "detectors": {
      "hap": {}
    },
    "content": "You dotard, I really hate this stuff"
  }'
```

- Using the external route:

```bash
curl -X 'POST' \
  "https://$GUARDRAILS_ROUTE/api/v2/text/detection/content" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "detectors": {
      "hap": {}
    },
    "content": "You dotard, I really hate this stuff"
  }'
```
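Putting the external-route sense checks together, a minimal smoke-test sketch (assuming the route names and endpoints shown above):

```bash
#!/usr/bin/env bash
set -euo pipefail

# resolve the external routes created for the orchestrator
GUARDRAILS_HEALTH_ROUTE=$(oc get routes guardrails-nlp-health -o jsonpath='{.spec.host}')
GUARDRAILS_ROUTE=$(oc get routes guardrails-nlp -o jsonpath='{.spec.host}')

# the stack is ready when /health returns 200 (-f makes curl fail on HTTP errors)
curl -sf "https://$GUARDRAILS_HEALTH_ROUTE/health" && echo "orchestrator is healthy"

# /info reports the health of each configured generation and detector service
curl -s "https://$GUARDRAILS_HEALTH_ROUTE/info"

# run a standalone detection against the hap detector
curl -s -X 'POST' "https://$GUARDRAILS_ROUTE/api/v2/text/detection/content" \
  -H 'Content-Type: application/json' \
  -d '{"detectors": {"hap": {}}, "content": "You dotard, I really hate this stuff"}'
```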
25 changes: 25 additions & 0 deletions hap_isvc.yaml
@@ -0,0 +1,25 @@
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: guardrails-detector-ibm-hap
  labels:
    opendatahub.io/dashboard: 'true'
  annotations:
    openshift.io/display-name: guardrails-detector-ibm-hap
    security.opendatahub.io/enable-auth: 'true'
    serving.knative.openshift.io/enablePassthrough: 'true'
    sidecar.istio.io/inject: 'true'
    sidecar.istio.io/rewriteAppHTTPProbers: 'true'
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model:
      modelFormat:
        name: guardrails-detector-huggingface
      name: ''
      runtime: guardrails-detector-runtime-hap
      storage:
        key: aws-connection-minio-data-connection-guardrails-hap
        path: granite-guardian-hap-38m
109 changes: 109 additions & 0 deletions hap_model_container.yaml
@@ -0,0 +1,109 @@
apiVersion: v1
kind: Service
metadata:
  name: minio-guardrails-hap
spec:
  ports:
    - name: minio-client-port
      port: 9000
      protocol: TCP
      targetPort: 9000
  selector:
    app: minio-guardrails-hap
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: guardrails-models-claim-hap
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  # storageClassName: gp3-csi
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: guardrails-container-deployment-hap # <--- change this
  labels:
    app: minio-guardrails-hap # <--- change this to match label on the pod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio-guardrails-hap # <--- change this to match label on the pod
  template: # => from here down copy and paste the pod's metadata: and spec: sections
    metadata:
      labels:
        app: minio-guardrails-hap
        maistra.io/expose-route: 'true'
      name: minio-guardrails-hap
    spec:
      # fsGroup is a pod-level setting (it is not valid in a container securityContext);
      # it makes the model volume group-writable for the containers below
      securityContext:
        fsGroup: 1001
      volumes:
        - name: model-volume
          persistentVolumeClaim:
            claimName: guardrails-models-claim-hap
      initContainers:
        - name: download-model
          image: quay.io/rgeada/llm_downloader:latest
          command:
            - bash
            - -c
            - |
              model="ibm-granite/granite-guardian-hap-38m"
              # model="microsoft/Phi-3-mini-4k-instruct"
              echo "starting download"
              /tmp/venv/bin/huggingface-cli download $model --local-dir /mnt/models/huggingface/$(basename $model)
              echo "Done!"
          resources:
            limits:
              memory: "2Gi"
              cpu: "2"
          volumeMounts:
            - mountPath: "/mnt/models/"
              name: model-volume
      containers:
        - args:
            - server
            - /models
          env:
            - name: MINIO_ACCESS_KEY
              value: THEACCESSKEY
            - name: MINIO_SECRET_KEY
              value: THESECRETKEY
          image: quay.io/trustyai/modelmesh-minio-examples:latest
          name: minio-guardrails-hap
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            seccompProfile:
              type: RuntimeDefault
          volumeMounts:
            - mountPath: "/models/"
              name: model-volume
---
apiVersion: v1
kind: Secret
metadata:
  name: aws-connection-minio-data-connection-guardrails-hap
  labels:
    opendatahub.io/dashboard: 'true'
    opendatahub.io/managed: 'true'
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: Minio Data Connection Guardrails HAP
# base64-encoded versions of the placeholder minio credentials above
# (THEACCESSKEY / THESECRETKEY); replace with real values for production use
data:
  AWS_ACCESS_KEY_ID: VEhFQUNDRVNTS0VZ
  AWS_DEFAULT_REGION: dXMtc291dGg=
  AWS_S3_BUCKET: aHVnZ2luZ2ZhY2U=
  AWS_S3_ENDPOINT: aHR0cDovL21pbmlvLWd1YXJkcmFpbHMtaGFwOjkwMDA=
  AWS_SECRET_ACCESS_KEY: VEhFU0VDUkVUS0VZ
type: Opaque

40 changes: 40 additions & 0 deletions hap_runtime.yaml
@@ -0,0 +1,40 @@
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: guardrails-detector-runtime-hap
  annotations:
    openshift.io/display-name: Guardrails Detector ServingRuntime for KServe
    opendatahub.io/recommended-accelerators: '["nvidia.com/gpu"]'
  labels:
    opendatahub.io/dashboard: 'true'
spec:
  annotations:
    prometheus.io/port: '8080'
    prometheus.io/path: '/metrics'
  multiModel: false
  supportedModelFormats:
    - autoSelect: true
      name: guardrails-detector-huggingface
  containers:
    - name: kserve-container
      image: quay.io/rgeada/guardrails-detector-huggingface
      command:
        - uvicorn
        - app:app
      args:
        - "--workers"
        - "4"
        - "--host"
        - "0.0.0.0"
        - "--port"
        - "8000"
        - "--log-config"
        - "/common/log_conf.yaml"
      env:
        - name: MODEL_DIR
          value: /mnt/models
        - name: HF_HOME
          value: /tmp/hf_home
      ports:
        - containerPort: 8000
          protocol: TCP
11 changes: 11 additions & 0 deletions kustomization.yaml
@@ -0,0 +1,11 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- service_account.yaml
- hap_model_container.yaml
- hap_runtime.yaml
- hap_isvc.yaml
- llm_runtime.yaml
- llm_model_container.yaml
- llm_isvc.yaml
- orchestrator.yaml