[WIP] Refactor legacy informer/handler framework #2372

trozet · 2024-12-03T02:59:06Z

The legacy event driven informer/handler framework has flaws with performance. The regular "informer" object is not multi-threaded and handles events without a queue. The federated "queued informer" sends events to queues where worker threads pull items off and process them. In either informer version, the processing of objects is paused when a new handler is added to the informer. When a new handler is added, all existing objects in the cluster are processed as ADDs. This is extremely slow with UDN and adding new handlers on the fly as UDNs are created. Additionally, the previous model for queued informer did not actually stop the shared informer from enqueuing objects directly. It only paused the workers, which indirectly would eventually stop the shared informer from enqueuing after the channel buffer (size 10) was full.

To address this, the following has been changed:

Classic informer type object is removed, all legacy informers now use a queued informer. These queued informers only use 1 worker thread.
The queued informer now only enqueues events on initial add. The queue size has been adjusted to 10k to account for buffering initial add objects. Can tweak this later.
Locking has been modified. The informer lock is now used to control event flow into the queues. In order to enqueue events from the informer cache, a read lock must be obtained. When a handler is added the informer lock is temporarily taken, to stop the informer cache so that initial objects can be enqueued. Additionally a new handlerMutex is introduced to control adding/removing handlers on an queued informer object.
Since now all objects are enqueued when a handler is added and all handlers will iterate over these already processed objects, a map has been added to the handler. This processedInitialAdd map allows the handler to determine if it has already processed the object or not. In the case of existing handlers, they will ignore adds for objects they already procesed. For newly added handlers, they will ignore updates and deletes received before they have first added the object.

Signed-off-by: Tim Rozet trozet@redhat.com
(cherry picked from commit 27998123011b1eb6cb056e9a307a62c8eea20026)

📑 Description

Fixes #

Additional Information for reviewers

✅ Checks

My code requires changes to the documentation
if so, I have updated the documentation as required
My code requires tests
if so, I have added and/or updated the tests as required
All the tests have passed in the CI

How to verify it

The legacy event driven informer/handler framework has flaws with performance. The regular "informer" object is not multi-threaded and handles events without a queue. The federated "queued informer" sends events to queues where worker threads pull items off and process them. In either informer version, the processing of objects is paused when a new handler is added to the informer. When a new handler is added, all existing objects in the cluster are processed as ADDs. This is extremely slow with UDN and adding new handlers on the fly as UDNs are created. Additionally, the previous model for queued informer did not actually stop the shared informer from enqueuing objects directly. It only paused the workers, which indirectly would eventually stop the shared informer from enqueuing after the channel buffer (size 10) was full. To address this, the following has been changed: 1. Classic informer type object is removed, all legacy informers now use a queued informer. These queued informers only use 1 worker thread. 2. The queued informer now only enqueues events on initial add. The queue size has been adjusted to 10k to account for buffering initial add objects. Can tweak this later. 3. Locking has been modified. The informer lock is now used to control event flow into the queues. In order to enqueue events from the informer cache, a read lock must be obtained. When a handler is added the informer lock is temporarily taken, to stop the informer cache so that initial objects can be enqueued. Additionally a new handlerMutex is introduced to control adding/removing handlers on an queued informer object. 4. Since now all objects are enqueued when a handler is added and all handlers will iterate over these already processed objects, a map has been added to the handler. This processedInitialAdd map allows the handler to determine if it has already processed the object or not. In the case of existing handlers, they will ignore adds for objects they already procesed. For newly added handlers, they will ignore updates and deletes received before they have first added the object. Signed-off-by: Tim Rozet <trozet@redhat.com> (cherry picked from commit 27998123011b1eb6cb056e9a307a62c8eea20026)

trozet · 2024-12-03T02:59:56Z

/test qe-perfscale-aws-ovn-small-udn-density-l3

openshift-ci · 2024-12-03T03:00:24Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: trozet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [trozet]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jtaleric · 2024-12-03T20:40:20Z

/test qe-perfscale-aws-ovn-small-udn-density-l3

openshift-ci · 2024-12-04T03:18:17Z

@trozet: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-metal-ipi-ovn-dualstack-techpreview	`c1ba61c`	link	false	`/test e2e-metal-ipi-ovn-dualstack-techpreview`
ci/prow/e2e-openstack-ovn	`c1ba61c`	link	false	`/test e2e-openstack-ovn`
ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway-techpreview	`c1ba61c`	link	false	`/test e2e-metal-ipi-ovn-dualstack-local-gateway-techpreview`
ci/prow/security	`c1ba61c`	link	false	`/test security`
ci/prow/e2e-azure-ovn	`c1ba61c`	link	false	`/test e2e-azure-ovn`
ci/prow/e2e-azure-ovn-techpreview	`c1ba61c`	link	false	`/test e2e-azure-ovn-techpreview`
ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview	`c1ba61c`	link	false	`/test e2e-aws-ovn-hypershift-conformance-techpreview`
ci/prow/e2e-aws-ovn-shared-to-local-gateway-mode-migration	`c1ba61c`	link	true	`/test e2e-aws-ovn-shared-to-local-gateway-mode-migration`
ci/prow/e2e-metal-ipi-ovn-dualstack	`c1ba61c`	link	true	`/test e2e-metal-ipi-ovn-dualstack`
ci/prow/e2e-aws-ovn-serial	`c1ba61c`	link	true	`/test e2e-aws-ovn-serial`
ci/prow/e2e-vsphere-ovn-techpreview	`c1ba61c`	link	false	`/test e2e-vsphere-ovn-techpreview`
ci/prow/e2e-gcp-ovn	`c1ba61c`	link	true	`/test e2e-gcp-ovn`
ci/prow/e2e-aws-ovn-upgrade-local-gateway	`c1ba61c`	link	true	`/test e2e-aws-ovn-upgrade-local-gateway`
ci/prow/e2e-aws-ovn-single-node-techpreview	`c1ba61c`	link	false	`/test e2e-aws-ovn-single-node-techpreview`
ci/prow/okd-scos-e2e-aws-ovn	`c1ba61c`	link	false	`/test okd-scos-e2e-aws-ovn`
ci/prow/e2e-metal-ipi-ovn-ipv6	`c1ba61c`	link	true	`/test e2e-metal-ipi-ovn-ipv6`
ci/prow/e2e-azure-ovn-upgrade	`c1ba61c`	link	true	`/test e2e-azure-ovn-upgrade`
ci/prow/e2e-ovn-hybrid-step-registry	`c1ba61c`	link	false	`/test e2e-ovn-hybrid-step-registry`
ci/prow/4.18-upgrade-from-stable-4.17-e2e-aws-ovn-upgrade	`c1ba61c`	link	true	`/test 4.18-upgrade-from-stable-4.17-e2e-aws-ovn-upgrade`
ci/prow/e2e-vsphere-ovn	`c1ba61c`	link	false	`/test e2e-vsphere-ovn`
ci/prow/e2e-gcp-ovn-techpreview	`c1ba61c`	link	true	`/test e2e-gcp-ovn-techpreview`
ci/prow/e2e-aws-ovn	`c1ba61c`	link	true	`/test e2e-aws-ovn`
ci/prow/e2e-metal-ipi-ovn-techpreview	`c1ba61c`	link	false	`/test e2e-metal-ipi-ovn-techpreview`
ci/prow/e2e-aws-ovn-local-to-shared-gateway-mode-migration	`c1ba61c`	link	true	`/test e2e-aws-ovn-local-to-shared-gateway-mode-migration`
ci/prow/e2e-aws-ovn-hypershift	`c1ba61c`	link	true	`/test e2e-aws-ovn-hypershift`
ci/prow/e2e-aws-ovn-local-gateway	`c1ba61c`	link	true	`/test e2e-aws-ovn-local-gateway`
ci/prow/e2e-aws-ovn-kubevirt	`c1ba61c`	link	false	`/test e2e-aws-ovn-kubevirt`
ci/prow/e2e-aws-ovn-techpreview	`c1ba61c`	link	false	`/test e2e-aws-ovn-techpreview`
ci/prow/e2e-aws-ovn-upgrade	`c1ba61c`	link	true	`/test e2e-aws-ovn-upgrade`
ci/prow/e2e-metal-ipi-ovn-ipv6-techpreview	`c1ba61c`	link	false	`/test e2e-metal-ipi-ovn-ipv6-techpreview`
ci/prow/4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-rt-upgrade	`c1ba61c`	link	true	`/test 4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-rt-upgrade`
ci/prow/e2e-aws-ovn-windows	`c1ba61c`	link	true	`/test e2e-aws-ovn-windows`
ci/prow/qe-perfscale-aws-ovn-small-udn-density-l3	`c1ba61c`	link	false	`/test qe-perfscale-aws-ovn-small-udn-density-l3`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

trozet changed the title ~~Refactor legacy informer/handler framework~~ [WIP] Refactor legacy informer/handler framework Dec 3, 2024

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 3, 2024

trozet mentioned this pull request Dec 3, 2024

[WIP] Refactor legacy informer/handler framework ovn-kubernetes/ovn-kubernetes#4859

Open

openshift-ci bot requested review from JacobTanenbaum and jcaamano December 3, 2024 03:00

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Refactor legacy informer/handler framework #2372

[WIP] Refactor legacy informer/handler framework #2372

trozet commented Dec 3, 2024

trozet commented Dec 3, 2024

openshift-ci bot commented Dec 3, 2024

jtaleric commented Dec 3, 2024

openshift-ci bot commented Dec 4, 2024

[WIP] Refactor legacy informer/handler framework #2372

Are you sure you want to change the base?

[WIP] Refactor legacy informer/handler framework #2372

Conversation

trozet commented Dec 3, 2024

📑 Description

Additional Information for reviewers

✅ Checks

How to verify it

trozet commented Dec 3, 2024

openshift-ci bot commented Dec 3, 2024

jtaleric commented Dec 3, 2024

openshift-ci bot commented Dec 4, 2024