You will need access to an S3-compatible object storage, for example MinIO. To configure access to the object storage, use the storage-config secret.
Alternatively, models can be stored on a Kubernetes Persistent Volume. Persistent Volume Claims can either be pre-configured in the storage-config secret, or the allowAnyPVC configuration flag can be enabled, so that any PVC can be mounted dynamically at the time a predictor or inference service is deployed.
Here we show an example using an ONNX model for MNIST.
Before the model can be copied into MinIO, a bucket needs to be created. This can be done either with the MinIO Client (mc) or by port-forwarding the minio service and logging in to the web interface.
# Install minio client
$ brew install minio/stable/mc
$ mc --help
NAME:
mc - MinIO Client for cloud storage and filesystems.
....
# test setup - mc is pre-configured with https://play.min.io, aliased as "play".
# list all buckets in play
$ mc ls play
[2021-06-10 21:04:25 EDT] 0B 2063b651-92a3-4a20-a4a5-03a96e7c5a89/
[2021-06-11 02:40:33 EDT] 0B 5ddfe44282319c500c3a4f9b/
[2021-06-11 05:15:45 EDT] 0B 6dkmmiqcdho1zoloomsj3620cocs6iij/
[2021-06-11 02:39:54 EDT] 0B 9jo5omejcyyr62iizn02ex982eapipjr/
[2021-06-11 02:33:53 EDT] 0B a-test-zip/
[2021-06-11 09:14:28 EDT] 0B aio-ato/
[2021-06-11 09:14:29 EDT] 0B aio-ato-art/
...
# add cloud storage service
$ mc alias set <ALIAS> <YOUR-S3-ENDPOINT> [YOUR-ACCESS-KEY] [YOUR-SECRET-KEY]
# for example if you installed with --quickstart
$ mc alias set myminio http://localhost:9000 EXAMPLE_ACCESS_KEY example/secret/EXAMPLEKEY
Added `myminio` successfully.
# create bucket
$ mc mb myminio/models/onnx
Bucket created successfully myminio/models/onnx.
$ mc tree myminio
myminio
└─ models
   └─ onnx
# copy object -- must copy into an existing bucket
$ mc cp ~/Downloads/mnist-8.onnx myminio/models/onnx
...model.lr.zip: 26.45 KiB / 26.45 KiB ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 2.74 MiB/s 0s
$ mc ls myminio/models/onnx
[2021-06-11 11:55:48 EDT] 26KiB mnist-8.onnx
Ensure there is a key defined in the common storage-config secret corresponding to the S3-based storage instance holding your model. The value of this secret key should be JSON like the following; default_bucket is optional.
Users can specify use of a custom certificate via the storage config certificate parameter. The custom certificate should be in the form of an embedded Certificate Authority (CA) bundle in PEM format.
Using MinIO the JSON contents look like:
{
  "type": "s3",
  "access_key_id": "minioadmin",
  "secret_access_key": "minioadmin/K7JTCMP/EXAMPLEKEY",
  "endpoint_url": "http://127.0.0.1:9000",
  "default_bucket": "",
  "region": "us-east"
}
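A malformed entry usually only shows up when a model fails to load, so it can help to check the JSON before adding it to the secret. The following is a minimal sketch using only the Python standard library. The function name `check_s3_entry` and the set of keys treated as required are assumptions inferred from the example entry, not a schema published by ModelMesh.

```python
import json
from urllib.parse import urlsplit

# Assumption: keys treated as required, inferred from the example entry.
REQUIRED_KEYS = ("type", "access_key_id", "secret_access_key", "endpoint_url")


def check_s3_entry(raw):
    """Return a list of human-readable problems found in one storage-config entry."""
    try:
        entry = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(entry, dict):
        return ["entry must be a JSON object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]

    url = entry.get("endpoint_url")
    if isinstance(url, str):
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            problems.append(f"endpoint_url is not an http(s) URL: {url!r}")
        else:
            try:
                parts.port  # raises ValueError for a non-numeric port such as "9000:9000"
            except ValueError:
                problems.append(f"endpoint_url has an invalid port: {url!r}")
    return problems
```

An empty list means the entry passed these basic checks. It does not prove that the credentials or the endpoint are reachable.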
Example secret key contents for GCS and Azure Blob Storage are:
gcsKey: |
  {
    "type": "gcs",
    "private_key": "-----BEGIN PRIVATE KEY-----\nAABBCC1122\n-----END PRIVATE KEY-----\n",
    "client_email": "storage-auth@secret-12345.gserviceaccount.com",
    "token_uri": "https://oauth2.googleapis.com/token"
  }
azureKey: |
  {
    "type": "azure",
    "account_name": "az-account",
    "container": "az-container",
    "connection_string": "DefaultEndpointsProtocol=https;AccountName=test;AccountKey=Yabc983f11822334455;EndpointSuffix=core.windows.net"
  }
Remember that after updating the storage config secret, there may be a delay of up to 2 minutes until the change is picked up. Take this into account when creating or updating InferenceServices that use storage keys which have just been added or updated; otherwise their models may fail to load.
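One way to account for this delay in automation is to poll for readiness with a deadline instead of failing on the first attempt. The sketch below is a generic polling helper. The name `wait_until` is hypothetical, `check` stands for any callable you supply (for example one that asks the cluster whether the InferenceService reports ready), and the 120-second default simply mirrors the delay mentioned above.

```python
import time


def wait_until(check, timeout_s=120.0, interval_s=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns a truthy value or timeout_s elapses.

    Returns True if check() succeeded, False if the deadline passed first.
    `clock` and `sleep` are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The check always runs at least once, and once more after the final sleep, so a change that lands right at the deadline is still noticed.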
Models can be stored on Kubernetes Persistent Volumes.
There are two ways to enable PVC support in ModelMesh:
- The Persistent Volume Claims can be added in the storage-config secret. This way all PVCs will be mounted to all serving runtime pods.
- The allowAnyPVC configuration flag can be set to true. This way the ModelMesh controller will dynamically mount the PVC to a runtime pod at the time a predictor or inference service requiring it is being deployed.
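The two options can be summarized as a simple rule. The sketch below is only a mental model of that rule, not the ModelMesh controller's implementation. The function name `pvc_is_mountable` is hypothetical, and `storage_config` is assumed to be a mapping from secret keys to JSON strings, as in the storage-config secret.

```python
import json


def pvc_is_mountable(pvc_name, storage_config, allow_any_pvc=False):
    """Return True if a model on `pvc_name` could be served under the two options.

    Either allowAnyPVC is enabled, or the PVC is listed in the storage-config
    secret as an entry of type "pvc" with a matching name.
    """
    if allow_any_pvc:
        return True
    for raw in storage_config.values():
        entry = json.loads(raw)
        if entry.get("type") == "pvc" and entry.get("name") == pvc_name:
            return True
    return False
```

The rule does not capture the operational difference: pre-configured PVCs are mounted to all serving runtime pods, while dynamically allowed PVCs are mounted only when a predictor or inference service needs them.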
Follow the example instructions below to create a PVC, store a model on it, and configure ModelMesh to mount the PVC to the runtime serving pods so that the model can be loaded for inferencing.
Persistent Volume Claims are namespace-scoped, so we have to create the claim in the same namespace as the ModelMesh serving deployment. We are using the namespace modelmesh-serving here.
kubectl config set-context --current --namespace=modelmesh-serving
Now we create the Persistent Volume Claim my-models-pvc. Along with it, we deploy a pvc-access pod in order to copy our model to the Persistent Volume later.
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "my-models-pvc"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: "pvc-access"
spec:
  containers:
    - name: main
      image: ubuntu
      command: ["/bin/sh", "-ec", "sleep 10000"]
      volumeMounts:
        - name: "my-pvc"
          mountPath: "/mnt/models"
  volumes:
    - name: "my-pvc"
      persistentVolumeClaim:
        claimName: "my-models-pvc"
EOF
For this example we are using the MNIST SVM scikit-learn model from the kserve/modelmesh-minio-examples repo.
# create a temp directory and download the scikit-learn MNIST SVM model
mkdir -p temp/sklearn && cd temp/sklearn && \
curl -sOL https://github.com/kserve/modelmesh-minio-examples/raw/main/sklearn/mnist-svm.joblib && \
cd -
# verify the sklearn model exists
ls -al temp/sklearn/
# total 680
# drwxr-xr-x 3 owner group 96 Mar 16 01:18 .
# drwxr-xr-x 9 owner group 288 Mar 16 01:18 ..
# -rw-r--r-- 1 owner group 344817 Mar 16 01:18 mnist-svm.joblib
Copy the sklearn model onto the PVC via the pvc-access pod that we deployed alongside my-models-pvc.
# create a sub-folder 'sklearn' on the persistent volume
kubectl exec -it pvc-access -- mkdir -p /mnt/models/sklearn
# copy the sklearn/mnist-svm.joblib file we downloaded earlier onto the PV which is mounted to the pvc-access pod
kubectl cp temp/sklearn/mnist-svm.joblib pvc-access:/mnt/models/sklearn/mnist-svm.joblib
# verify the model exists on the PV
kubectl exec -it pvc-access -- ls -alr /mnt/models/sklearn/
# total 352
# -rw-r--r-- 1 501 staff 344817 Mar 16 08:55 mnist-svm.joblib
# drwxr-xr-x 3 nobody 4294967294 4096 Mar 16 08:55 ..
# drwxr-xr-x 2 nobody 4294967294 4096 Mar 16 08:55 .
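Matching file sizes are a good first sign, but a checksum gives stronger assurance that the copy is intact. The sketch below computes the SHA-256 digest of the local file with the Python standard library. The helper name `sha256_of` is hypothetical, and comparing against `sha256sum` run inside the pod assumes that tool is available in the pod's image.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, print `sha256_of("temp/sklearn/mnist-svm.joblib")` and compare it with the output of `kubectl exec pvc-access -- sha256sum /mnt/models/sklearn/mnist-svm.joblib`.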
The storage-config secret is part of the ModelMesh Quickstart deployment. If you deployed ModelMesh without it, you can create it using the YAML spec outlined below.
To configure ModelMesh to mount the PVC to the runtime serving pods, we need to add an entry of type pvc to the secret's stringData. The chosen key pvc1 is of no consequence. Note that the localMinIO and the pvc2 entries are only for illustration.
apiVersion: v1
kind: Secret
metadata:
  name: storage-config
stringData:
  # localMinIO: |
  #   {
  #     "type": "s3",
  #     "access_key_id": "AKIAIOSFODNN7EXAMPLE",
  #     "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  #     "endpoint_url": "http://minio:9000",
  #     "default_bucket": "modelmesh-example-models",
  #     "region": "us-south"
  #   }
  pvc1: |
    {
      "type": "pvc",
      "name": "my-models-pvc"
    }
  # pvc2: |
  #   {
  #     "type": "pvc",
  #     "name": "some-other-pvc"
  #   }
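If the secret is generated by a script rather than written by hand, the nested JSON values are easy to get wrong. As a minimal sketch, the Python snippet below builds the same Secret as a dictionary and prints it as JSON, which kubectl accepts in place of YAML. The function name `storage_config_secret` is hypothetical.

```python
import json


def storage_config_secret(entries, name="storage-config"):
    """Build a Secret manifest whose stringData values are JSON-encoded entries."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "stringData": {
            key: json.dumps(value, indent=2) for key, value in entries.items()
        },
    }


manifest = storage_config_secret({"pvc1": {"type": "pvc", "name": "my-models-pvc"}})
print(json.dumps(manifest, indent=2))
```

The printed manifest can be saved to a file or piped to `kubectl apply -f -`.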
After updating or creating the storage-config secret, the modelmesh-serving deployment will get updated and the serving runtime pods will get restarted to mount the Persistent Volumes. Depending on the number of replicas and deployed predictors, this update may take a few minutes.
As an alternative to preconfiguring all allowed PVCs in the storage-config secret, you can set the allowAnyPVC configuration flag to true. With allowAnyPVC enabled, users can deploy Predictors or InferenceServices with models stored on any PVC in the model serving namespace.
Let's update (or create) the model-serving-config ConfigMap.
Note, if you already have a model-serving-config ConfigMap, you might want to retain the existing config overrides. You can check your current configuration flags by running:
kubectl get cm "model-serving-config" -o jsonpath="{.data['config\.yaml']}"
The minimal model-serving-config for our example requires the settings allowAnyPVC and restProxy to be enabled:
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-serving-config
data:
  config.yaml: |
    # check which other config overrides should be here:
    # kubectl get cm "model-serving-config" -o jsonpath="{.data['config\.yaml']}"
    allowAnyPVC: true
    restProxy:
      enabled: true
EOF
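Retaining existing overrides amounts to a recursive merge of the current settings with the ones this example needs. The sketch below shows that merge on plain dictionaries. The function name `merge_overrides` and the keys `someExistingOverride` and `someNestedSetting` are hypothetical placeholders, and reading the YAML out of the ConfigMap would need a YAML parser, which is not shown here.

```python
def merge_overrides(existing, required):
    """Return a copy of `existing` with `required` settings merged in recursively.

    Nested mappings are merged key by key; any other value in `required`
    replaces the existing one. Neither input is modified.
    """
    merged = dict(existing)
    for key, value in required.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


current = {"someExistingOverride": 1, "restProxy": {"someNestedSetting": "kept"}}
needed = {"allowAnyPVC": True, "restProxy": {"enabled": True}}
print(merge_overrides(current, needed))
```

The merged result keeps both placeholder settings and adds allowAnyPVC and restProxy.enabled.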
After applying the new configuration, the modelmesh-serving deployment might get updated and the serving runtime pods may get restarted.
In order to use the model from the PVC, we need to set the storage parameters of the predictor as type: pvc and name: my-models-pvc like this:
kubectl apply -f - <<EOF
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-pvc-example
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storage:
        parameters:
          type: pvc
          name: my-models-pvc
        path: sklearn/mnist-svm.joblib
EOF
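When several models live on the same PVC, only the name, model format, and path change between InferenceServices. As a sketch under that assumption, the helper below builds the same manifest as a Python dictionary that can be dumped to JSON for kubectl. The function name `pvc_inference_service` is hypothetical.

```python
import json


def pvc_inference_service(name, model_format, pvc_name, path):
    """Build an InferenceService manifest for a model stored on a PVC."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {
            "name": name,
            "annotations": {"serving.kserve.io/deploymentMode": "ModelMesh"},
        },
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": model_format},
                    "storage": {
                        # type and name select the PVC; path is relative to its root
                        "parameters": {"type": "pvc", "name": pvc_name},
                        "path": path,
                    },
                }
            }
        },
    }


isvc = pvc_inference_service(
    "sklearn-pvc-example", "sklearn", "my-models-pvc", "sklearn/mnist-svm.joblib"
)
print(json.dumps(isvc, indent=2))
```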
After a few seconds, the new InferenceService sklearn-pvc-example should be ready:
kubectl get isvc
# NAME                  URL                                               READY   PREV   LATEST   AGE
# sklearn-pvc-example   grpc://modelmesh-serving.modelmesh-serving:8033   True                    23s
We need to set up a port-forward to facilitate REST requests.
kubectl port-forward --address 0.0.0.0 service/modelmesh-serving 8008 &
# [1] running kubectl port-forward in the background
# Forwarding from 0.0.0.0:8008 -> 8008
With curl we can perform an inference request to the SKLearn MNIST model. Make sure the MODEL_NAME variable is set to the name of your InferenceService.
MODEL_NAME="sklearn-pvc-example"
curl -X POST -k "http://localhost:8008/v2/models/${MODEL_NAME}/infer" -d '{"inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "data": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0]}]}'
The response should look like the following:
{
  "model_name": "sklearn-pvc-example__isvc-3d2daa3370",
  "outputs": [
    {"name": "predict", "datatype": "INT64", "shape": [1, 1], "data": [8]}
  ]
}
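The same request can be sent from a script. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, the base URL assumes the port-forward on localhost:8008 is running, and the explicit JSON content type is an assumption that the REST proxy accepts it.

```python
import json
import urllib.request


def build_infer_request(model_name, data, base_url="http://localhost:8008"):
    """Build a v2 inference POST request for a single row of FP32 features."""
    payload = {
        "inputs": [
            {
                "name": "predict",
                "shape": [1, len(data)],  # one row, as many columns as values
                "datatype": "FP32",
                "data": list(data),
            }
        ]
    }
    return urllib.request.Request(
        f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_prediction(response_body):
    """Extract the first value of the first output from a v2 inference response."""
    outputs = json.loads(response_body)["outputs"]
    return outputs[0]["data"][0]


def infer(model_name, data):
    """Send the request and return the prediction; needs a reachable endpoint."""
    request = build_infer_request(model_name, data)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return first_prediction(resp.read())
```

Deriving the shape from the length of the data keeps the two in sync, which avoids a common cause of rejected requests. Applied to the response shown above, `first_prediction` returns 8.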
You can find more detailed information about running inference requests here.
To delete the resources created in this example, run the following commands:
kubectl delete isvc "sklearn-pvc-example"
kubectl delete pod "pvc-access"
kubectl delete pvc "my-models-pvc"