WIP: fix(backend): add retries to getOrInsertContext in MLMD client #74
base: master
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Commit Checker results:
A set of new images have been built to help with testing out this PR.

An OCP cluster where you are logged in as cluster admin is required.

The Data Science Pipelines team recommends testing this using the Data Science Pipelines Operator. Check here for more information on using the DSPO.

To use and deploy a DSP stack with these images (assuming the DSPO is deployed), first save the following YAML to a file named dspa.pr-74.yaml:

    apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
    kind: DataSciencePipelinesApplication
    metadata:
      name: pr-74
    spec:
      dspVersion: v2
      apiServer:
        image: "quay.io/opendatahub/ds-pipelines-api-server:pr-74"
        argoDriverImage: "quay.io/opendatahub/ds-pipelines-driver:pr-74"
        argoLauncherImage: "quay.io/opendatahub/ds-pipelines-launcher:pr-74"
      persistenceAgent:
        image: "quay.io/opendatahub/ds-pipelines-persistenceagent:pr-74"
      scheduledWorkflow:
        image: "quay.io/opendatahub/ds-pipelines-scheduledworkflow:pr-74"
      mlmd:
        deploy: true  # Optional component
        grpc:
          image: "quay.io/opendatahub/mlmd-grpc-server:latest"
        envoy:
          image: "registry.redhat.io/openshift-service-mesh/proxyv2-rhel8:2.3.9-2"
      mlpipelineUI:
        deploy: true  # Optional component
        image: "quay.io/opendatahub/ds-pipelines-frontend:pr-74"
      objectStorage:
        minio:
          deploy: true
          image: 'quay.io/opendatahub/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance'

Then run the following:

    cd $(mktemp -d)
    git clone git@github.com:opendatahub-io/data-science-pipelines.git
    cd data-science-pipelines/
    git fetch origin pull/74/head
    git checkout -b pullrequest 7fb8762303a8ed015d3aa77bda5440e136cba83c
    oc apply -f dspa.pr-74.yaml

More instructions here on how to deploy and test a Data Science Pipelines Application.
Change to PR detected. A new PR build was completed.
This function is known to be flaky and racy right after a server is initially created and pipeline runs are just starting, so we retry up to 3 times. The actual cause of the race isn't fully understood, but it's probably caused by deadlocks in MLMD SQL code. General MySQL advice is to retry on deadlock errors.

Signed-off-by: Greg Sheremeta <gshereme@redhat.com>
fc9fdad to 796ecbc
Description of your changes:
Add retries to getOrInsertContext in MLMD client.
This function is known to be flaky and racy right after a server is initially created and pipeline runs are just starting, so we retry up to 3 times. The actual cause of the race isn't fully understood, but it's probably caused by deadlocks in MLMD SQL code. General MySQL advice is to retry on deadlock errors.
Checklist: