This operator is the primary operator for Open Data Hub. It is responsible for enabling data science applications such as Jupyter Notebooks, ModelMesh Serving, and Data Science Pipelines. The operator uses the DataScienceCluster CRD to deploy and configure these applications.
- Usage
- Developer Guide
- Pre-requisites
- Download manifests
- Structure of COMPONENT_MANIFESTS
- Workflow
- Local Storage
- Adding New Components
- Customizing Manifests Source
- Build Image
- Deployment
- Test with customized manifests
- Update API docs
- Enabled logging
- Example DSCInitialization
- Example DataScienceCluster
- Run functional Tests
- Run e2e Tests
- API Overview
- Component Integration
- Troubleshooting
- Upgrade testing
If the single model serving configuration or the KServe component is used, make sure to install the following operators before creating the DSCI and DSC instances.
Additionally, installing the Authorino operator and the Service Mesh operator enhances the user experience by providing single sign-on.
- The latest version of the operator can be installed from the community-operators catalog on OperatorHub. Please note that the latest releases are made in the Fast channel.
- It can also be built and installed from source manually; see the Developer Guide for further instructions.
-
Subscribe to the operator by creating the following subscription:
cat <<EOF | oc create -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opendatahub-operator
  namespace: openshift-operators
spec:
  channel: fast
  name: opendatahub-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF
-
Create the DSCInitialization CR manually. Alternatively, let the operator create a default DSCI CR by removing the environment variable DISABLE_DSC_CONFIG from the CSV or changing its value to "false", and then restarting the operator pod.
-
Create a DataScienceCluster CR to enable components.
- Go version go1.22
- operator-sdk version can be updated to v1.31.1
The get_all_manifests.sh script facilitates the process of fetching manifests from remote git repositories. It is configured to work with a predefined map of components and their corresponding manifest locations.
Each component is associated with its manifest location in the COMPONENT_MANIFESTS map. The key is the component's name, and the value is its location, formatted as <repo-org>:<repo-name>:<branch-name>:<source-folder>:<target-folder>
- The script clones the remote repository <repo-org>/<repo-name> from the specified <branch-name>.
- It then copies the content from the relative path <source-folder> to the local opt/manifests/<target-folder> folder.
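The location format can be illustrated with a small sketch. The helper name `parse_manifest_location` is hypothetical and is not part of get_all_manifests.sh; it only shows how the five colon-separated fields map onto the clone and copy steps described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of get_all_manifests.sh): split one
# COMPONENT_MANIFESTS value into its five colon-separated fields.
parse_manifest_location() {
    local location="$1"
    local repo_org repo_name branch source_folder target_folder
    IFS=':' read -r repo_org repo_name branch source_folder target_folder <<< "$location"
    # Reject values that do not supply all five fields.
    if [ -z "$target_folder" ]; then
        echo "invalid location: $location" >&2
        return 1
    fi
    echo "clone: ${repo_org}/${repo_name} (branch ${branch})"
    echo "copy: ${source_folder} -> opt/manifests/${target_folder}"
}

parse_manifest_location "maistra:odh-dashboard:test-manifests:manifests:odh-dashboard"
```

The sketch only prints what would happen; it does not clone or copy anything.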
The script uses a local, empty folder named opt/manifests to host all required manifests, sourced directly from each component’s source repository.
To include a new component in the list of manifest repositories, extend the COMPONENT_MANIFESTS map with a new entry, as shown below:
declare -A COMPONENT_MANIFESTS=(
    # existing components ...
    ["new-component"]="<repo-org>:<repo-name>:<branch-name>:<source-folder>:<target-folder>"
)
You can change the source of the manifests by invoking the get_all_manifests.sh script with specific flags, as illustrated below:
./get_all_manifests.sh --odh-dashboard="maistra:odh-dashboard:test-manifests:manifests:odh-dashboard"
If the flag name matches a component key defined in COMPONENT_MANIFESTS, the flag value overwrites that component's location; otherwise the command fails.
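The override rule can be sketched as follows. This is a minimal illustration, not the script's actual implementation: the helper `apply_override` and the initial map entry (`example-org:...`) are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Illustrative map with one placeholder entry; the real script defines its own.
declare -A COMPONENT_MANIFESTS=(
    ["odh-dashboard"]="example-org:odh-dashboard:main:manifests:odh-dashboard"
)

# Hypothetical helper: apply one --<component>=<location> flag to the map.
apply_override() {
    local flag="$1"
    # Only accept the --key=value form.
    if [[ "$flag" != --*=* ]]; then
        echo "unsupported argument: $flag" >&2
        return 1
    fi
    local key="${flag#--}"    # drop the leading dashes
    key="${key%%=*}"          # keep the text before the first '='
    local value="${flag#*=}"  # keep the text after the first '='
    # Unknown component names are an error, as described above.
    if [[ -z "$key" || -z "${COMPONENT_MANIFESTS[$key]+set}" ]]; then
        echo "unknown component: $key" >&2
        return 1
    fi
    COMPONENT_MANIFESTS["$key"]="$value"
}

apply_override '--odh-dashboard=maistra:odh-dashboard:test-manifests:manifests:odh-dashboard'
echo "${COMPONENT_MANIFESTS[odh-dashboard]}"
```

Associative arrays require bash 4 or newer.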
make get-manifests
This command first cleans up your local opt/manifests folder.
If you have local manifest changes that you want to reuse later, back them up before running this command.
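One possible way to take such a backup is sketched below. The helper `backup_manifests` and the `.backup.<timestamp>` naming are assumptions made for illustration; the repository does not provide them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a manifests folder to a timestamped sibling
# folder before it gets cleaned. Prints the backup path on success.
backup_manifests() {
    local src="${1:-opt/manifests}"
    if [ ! -d "$src" ]; then
        echo "nothing to back up: $src does not exist" >&2
        return 1
    fi
    local dest
    dest="${src%/}.backup.$(date +%Y%m%d%H%M%S)"
    # -a preserves permissions and timestamps and copies recursively.
    cp -a "$src" "$dest" && echo "$dest"
}

# Typical use (commented out so that the sketch has no side effects):
# backup_manifests opt/manifests && make get-manifests
```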
make image-build
By default, the image is built without any local changes (a clean build). This is what the production build system does.
To build an image with the local opt/manifests folder, set the USE_LOCAL make variable to true,
e.g. make image-build USE_LOCAL=true
-
A custom operator image can be built from your local repository:
make image IMG=quay.io/<username>/opendatahub-operator:<custom-tag>
If no IMG argument is supplied to make image, the default image quay.io/opendatahub/opendatahub-operator:dev-0.0.1 is used.
-
Once the image is created, the operator can be deployed either directly or through OLM. For each deployment method, a kubeconfig should be exported:
export KUBECONFIG=<path to kubeconfig>
Deploying operator locally
-
Define operator namespace
export OPERATOR_NAMESPACE=<namespace-to-install-operator>
-
Deploy the created image in your cluster using the following command:
make deploy IMG=quay.io/<username>/opendatahub-operator:<custom-tag> OPERATOR_NAMESPACE=<namespace-to-install-operator>
-
To remove resources created during installation use:
make undeploy
Deploying operator using OLM
-
To create a new bundle in the defined operator namespace, run the following commands:
export OPERATOR_NAMESPACE=<namespace-to-install-operator>
make bundle
Note: Skip the above step if you want to run the existing operator bundle.
-
Build Bundle Image:
make bundle-build bundle-push BUNDLE_IMG=quay.io/<username>/opendatahub-operator-bundle:<VERSION>
-
Run the Bundle on a cluster:
operator-sdk run bundle quay.io/<username>/opendatahub-operator-bundle:<VERSION> --namespace $OPERATOR_NAMESPACE --decompression-image quay.io/project-codeflare/busybox:1.36
There are two ways to test your changes with modified manifests:
- Each component in the DataScienceCluster CR has a devFlags.manifests field, which can be used to pull down the manifests from the remote git repos of the respective components. This method overwrites the manifests and creates customized resources for the respective components.
- [Under implementation] Build the operator image with local manifests.
Whenever a new API or a new CRD field is added, please make sure to run the command:
make api-docs
This ensures that the API docs are updated accordingly.
Global logger configuration can be changed with the environment variable ZAP_LOG_LEVEL or the command line switch --log-mode <mode>, for example from the CSV.
The command line switch has higher priority.
Valid values for <mode>: "" (default) || prod || production || devel || development.
Verbosity level is INFO. To fine-tune the zap backend, the standard operator-sdk zap switches can be used.
The log level can be changed at runtime through the DSCI devFlags by setting .spec.devFlags.logLevel. It accepts the same values as the --zap-log-level command line switch. See example:
apiVersion: dscinitialization.opendatahub.io/v1
kind: DSCInitialization
metadata:
  name: default-dsci
spec:
  devFlags:
    logLevel: debug
  ...
logmode | stacktrace level | verbosity | Output | Comments |
---|---|---|---|---|
devel | WARN | INFO | Console | lowest level, using epoch time |
development | WARN | INFO | Console | same as devel |
"" | ERROR | INFO | JSON | default option |
prod | ERROR | INFO | JSON | highest level, using human readable timestamp |
production | ERROR | INFO | JSON | same as prod |
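As a quick reference, the table can be expressed as a lookup. The helper `describe_log_mode` is hypothetical and only restates the table; it does not query or configure the operator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the stacktrace level, verbosity and output
# format that the table lists for a given log mode.
describe_log_mode() {
    case "$1" in
        devel|development)
            echo "stacktrace=WARN verbosity=INFO output=Console" ;;
        ""|prod|production)
            echo "stacktrace=ERROR verbosity=INFO output=JSON" ;;
        *)
            echo "invalid log mode: $1" >&2
            return 1 ;;
    esac
}

describe_log_mode devel
describe_log_mode ""
```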
Below is the default DSCI CR config
kind: DSCInitialization
apiVersion: dscinitialization.opendatahub.io/v1
metadata:
  name: default-dsci
spec:
  applicationsNamespace: opendatahub
  monitoring:
    managementState: Managed
    namespace: opendatahub
  serviceMesh:
    controlPlane:
      metricsCollection: Istio
      name: data-science-smcp
      namespace: istio-system
    managementState: Managed
  trustedCABundle:
    customCABundle: ''
    managementState: Managed
Adapt this example to your needs before applying it.
When the operator is installed successfully in the cluster, a user can create a DataScienceCluster CR to enable ODH components. At any given time, ODH supports only one instance of the CR, which can be updated to enable a custom list of components.
- Enable all components
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Managed
    dashboard:
      managementState: Managed
    datasciencepipelines:
      managementState: Managed
    kserve:
      managementState: Managed
      nim:
        managementState: Managed
      serving:
        ingressGateway:
          certificate:
            type: OpenshiftDefaultIngress
        managementState: Managed
        name: knative-serving
    kueue:
      managementState: Managed
    modelmeshserving:
      managementState: Managed
    modelregistry:
      managementState: Managed
      registriesNamespace: "rhoai-model-registries"
    ray:
      managementState: Managed
    trainingoperator:
      managementState: Managed
    trustyai:
      managementState: Managed
    workbenches:
      managementState: Managed
- Enable only Dashboard and Workbenches
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    dashboard:
      managementState: Managed
    workbenches:
      managementState: Managed
Note: Default value for managementState in component is false.
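A CR that enables a custom list of components can also be generated rather than written by hand. The sketch below assumes a hypothetical helper `render_dsc`; it does not validate component names and only covers components that need nothing beyond managementState.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a DataScienceCluster manifest that sets each
# listed component to Managed. Component names are not validated.
render_dsc() {
    printf '%s\n' \
        'apiVersion: datasciencecluster.opendatahub.io/v1' \
        'kind: DataScienceCluster' \
        'metadata:' \
        '  name: default-dsc' \
        'spec:' \
        '  components:'
    local component
    for component in "$@"; do
        printf '    %s:\n      managementState: Managed\n' "$component"
    done
}

render_dsc dashboard workbenches
# To apply the result on a cluster:
# render_dsc dashboard workbenches | oc apply -f -
```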
The functional tests are written using ginkgo and gomega. To run the tests, the user needs to set up envtest, which provides a mocked Kubernetes cluster. A detailed explanation of how to configure envtest is provided here.
To run the tests for an individual controller, change directory into the controller's folder and run
ginkgo -v
This provides detailed logs of the test spec.
Note: When running tests for each controller, make sure to add the BinaryAssetsDirectory attribute to the envtest.Environment in the suite_test.go file. The value should point to the path where the envtest binaries are installed.
To run the tests for all the controllers, use the make command
make unit-test
Note: The make command should be executed at the project root level.
A user can run the e2e tests in the same namespace as the operator. To deploy opendatahub-operator, refer to this section. The following environment variables must be set when running locally:
export KUBECONFIG=/path/to/kubeconfig
When testing the RHOAI operator in dev mode, ensure that no ODH CSV exists. Once the above variables are set, run the following:
make e2e-test
Additional flags can be passed to the e2e tests by setting the E2E_TEST_FLAGS variable. The following table lists all the available flags:
Flag | Description | Default value |
---|---|---|
--skip-deletion | Skips the dsc-deletion test, which deletes DataScienceCluster resources. Set this flag to true to skip DataScienceCluster deletion. | false |
--test-operator-controller | Controls whether the tests related to the operator pod are executed. This is useful for running e2e tests against an operator running out of the cluster, e.g. for debugging purposes. | true |
--test-webhook | Controls whether the tests related to the operator webhooks are executed. This is useful for running e2e tests against an operator running out of the cluster, e.g. for debugging purposes. | true |
--test-component | A repeatable flag that controls which components are tested. By default, all component-specific tests are executed. | true |
Example command to run the full test suite, skipping the DataScienceCluster deletion test:
make e2e-test OPERATOR_NAMESPACE=<namespace> E2E_TEST_FLAGS="--skip-deletion=true"
Example commands to run the test suite for the dashboard component only, with the operator running out of the cluster:
make run-nowebhook
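The flags from the table above can be combined into a single E2E_TEST_FLAGS value. The sketch below uses a hypothetical helper, `e2e_flags_for_components`, and assumes that --test-component takes a component name as its value (for example `--test-component=dashboard`); check the test suite's flag definitions before relying on that form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an E2E_TEST_FLAGS value that disables the
# operator-controller and webhook tests and limits the run to the named
# components. The --test-component=<name> form is an assumption.
e2e_flags_for_components() {
    local flags="--test-operator-controller=false --test-webhook=false"
    local component
    for component in "$@"; do
        flags="$flags --test-component=$component"
    done
    echo "$flags"
}

e2e_flags_for_components dashboard
# Possible use once the operator is running out of the cluster:
# make e2e-test OPERATOR_NAMESPACE=<namespace> E2E_TEST_FLAGS="$(e2e_flags_for_components dashboard)"
```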
Unit tests for Prometheus alerts are included in the repository. You can run them using the following command:
make test-alerts
To check for alerts that don't have unit tests, run the below command:
make check-prometheus-alert-unit-tests
To add a new unit test file, name it the same as the rules file in the prometheus ConfigMap, with the .rules suffix replaced by .unit-tests.yaml.
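The naming rule can be checked with a short sketch. The helper `unit_test_file_for` and the file name `example-alerting.rules` are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the unit test file name from a rules file name
# by replacing the .rules suffix with .unit-tests.yaml.
unit_test_file_for() {
    local rules_file="$1"
    case "$rules_file" in
        *.rules)
            echo "${rules_file%.rules}.unit-tests.yaml" ;;
        *)
            echo "expected a name ending in .rules: $rules_file" >&2
            return 1 ;;
    esac
}

unit_test_file_for "example-alerting.rules"
# prints: example-alerting.unit-tests.yaml
```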
Please refer to api documentation
Please refer to components docs
Please refer to troubleshooting documentation
Please refer to upgrade testing documentation