Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Refactor legacy informer/handler framework #2372

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

trozet
Copy link
Contributor

@trozet trozet commented Dec 3, 2024

The legacy event driven informer/handler framework has flaws with performance. The regular "informer" object is not multi-threaded and handles events without a queue. The federated "queued informer" sends events to queues where worker threads pull items off and process them. In either informer version, the processing of objects is paused when a new handler is added to the informer. When a new handler is added, all existing objects in the cluster are processed as ADDs. This is extremely slow with UDN and adding new handlers on the fly as UDNs are created. Additionally, the previous model for queued informer did not actually stop the shared informer from enqueuing objects directly. It only paused the workers, which indirectly would eventually stop the shared informer from enqueuing after the channel buffer (size 10) was full.

To address this, the following has been changed:

  1. Classic informer type object is removed, all legacy informers now use a queued informer. These queued informers only use 1 worker thread.
  2. The queued informer now only enqueues events on initial add. The queue size has been adjusted to 10k to account for buffering initial add objects. Can tweak this later.
  3. Locking has been modified. The informer lock is now used to control event flow into the queues. In order to enqueue events from the informer cache, a read lock must be obtained. When a handler is added the informer lock is temporarily taken, to stop the informer cache so that initial objects can be enqueued. Additionally a new handlerMutex is introduced to control adding/removing handlers on an queued informer object.
  4. Since now all objects are enqueued when a handler is added and all handlers will iterate over these already processed objects, a map has been added to the handler. This processedInitialAdd map allows the handler to determine if it has already processed the object or not. In the case of existing handlers, they will ignore adds for objects they already procesed. For newly added handlers, they will ignore updates and deletes received before they have first added the object.

Signed-off-by: Tim Rozet trozet@redhat.com
(cherry picked from commit 27998123011b1eb6cb056e9a307a62c8eea20026)

📑 Description

Fixes #

Additional Information for reviewers

✅ Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

The legacy event driven informer/handler framework has flaws with
performance. The regular "informer" object is not multi-threaded and
handles events without a queue. The federated "queued informer" sends
events to queues where worker threads pull items off and process them.
In either informer version, the processing of objects is paused when a
new handler is added to the informer. When a new handler is added, all
existing objects in the cluster are processed as ADDs. This is extremely
slow with UDN and adding new handlers on the fly as UDNs are created.
Additionally, the previous model for queued informer did not actually
stop the shared informer from enqueuing objects directly. It only paused
the workers, which indirectly would eventually stop the shared informer
from enqueuing after the channel buffer (size 10) was full.

To address this, the following has been changed:
 1. Classic informer type object is removed, all legacy informers now
    use a queued informer. These queued informers only use 1 worker
    thread.
 2. The queued informer now only enqueues events on initial add. The
    queue size has been adjusted to 10k to account for buffering initial
    add objects. Can tweak this later.
 3. Locking has been modified. The informer lock is now used to control
    event flow into the queues. In order to enqueue events from the
    informer cache, a read lock must be obtained. When a handler is
    added the informer lock is temporarily taken, to stop the informer
    cache so that initial objects can be enqueued. Additionally a new
    handlerMutex is introduced to control adding/removing handlers on
    an queued informer object.
 4. Since now all objects are enqueued when a handler is added and all
    handlers will iterate over these already processed objects, a map
    has been added to the handler. This processedInitialAdd map allows
    the handler to determine if it has already processed the object or
    not. In the case of existing handlers, they will ignore adds for
    objects they already procesed. For newly added handlers, they will
    ignore updates and deletes received before they have first added
    the object.

Signed-off-by: Tim Rozet <trozet@redhat.com>
(cherry picked from commit 27998123011b1eb6cb056e9a307a62c8eea20026)
@trozet trozet changed the title Refactor legacy informer/handler framework [WIP] Refactor legacy informer/handler framework Dec 3, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 3, 2024
@trozet
Copy link
Contributor Author

trozet commented Dec 3, 2024

/test qe-perfscale-aws-ovn-small-udn-density-l3

Copy link
Contributor

openshift-ci bot commented Dec 3, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: trozet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 3, 2024
@jtaleric
Copy link

jtaleric commented Dec 3, 2024

/test qe-perfscale-aws-ovn-small-udn-density-l3

Copy link
Contributor

openshift-ci bot commented Dec 4, 2024

@trozet: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-metal-ipi-ovn-dualstack-techpreview c1ba61c link false /test e2e-metal-ipi-ovn-dualstack-techpreview
ci/prow/e2e-openstack-ovn c1ba61c link false /test e2e-openstack-ovn
ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway-techpreview c1ba61c link false /test e2e-metal-ipi-ovn-dualstack-local-gateway-techpreview
ci/prow/security c1ba61c link false /test security
ci/prow/e2e-azure-ovn c1ba61c link false /test e2e-azure-ovn
ci/prow/e2e-azure-ovn-techpreview c1ba61c link false /test e2e-azure-ovn-techpreview
ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview c1ba61c link false /test e2e-aws-ovn-hypershift-conformance-techpreview
ci/prow/e2e-aws-ovn-shared-to-local-gateway-mode-migration c1ba61c link true /test e2e-aws-ovn-shared-to-local-gateway-mode-migration
ci/prow/e2e-metal-ipi-ovn-dualstack c1ba61c link true /test e2e-metal-ipi-ovn-dualstack
ci/prow/e2e-aws-ovn-serial c1ba61c link true /test e2e-aws-ovn-serial
ci/prow/e2e-vsphere-ovn-techpreview c1ba61c link false /test e2e-vsphere-ovn-techpreview
ci/prow/e2e-gcp-ovn c1ba61c link true /test e2e-gcp-ovn
ci/prow/e2e-aws-ovn-upgrade-local-gateway c1ba61c link true /test e2e-aws-ovn-upgrade-local-gateway
ci/prow/e2e-aws-ovn-single-node-techpreview c1ba61c link false /test e2e-aws-ovn-single-node-techpreview
ci/prow/okd-scos-e2e-aws-ovn c1ba61c link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e-metal-ipi-ovn-ipv6 c1ba61c link true /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-azure-ovn-upgrade c1ba61c link true /test e2e-azure-ovn-upgrade
ci/prow/e2e-ovn-hybrid-step-registry c1ba61c link false /test e2e-ovn-hybrid-step-registry
ci/prow/4.18-upgrade-from-stable-4.17-e2e-aws-ovn-upgrade c1ba61c link true /test 4.18-upgrade-from-stable-4.17-e2e-aws-ovn-upgrade
ci/prow/e2e-vsphere-ovn c1ba61c link false /test e2e-vsphere-ovn
ci/prow/e2e-gcp-ovn-techpreview c1ba61c link true /test e2e-gcp-ovn-techpreview
ci/prow/e2e-aws-ovn c1ba61c link true /test e2e-aws-ovn
ci/prow/e2e-metal-ipi-ovn-techpreview c1ba61c link false /test e2e-metal-ipi-ovn-techpreview
ci/prow/e2e-aws-ovn-local-to-shared-gateway-mode-migration c1ba61c link true /test e2e-aws-ovn-local-to-shared-gateway-mode-migration
ci/prow/e2e-aws-ovn-hypershift c1ba61c link true /test e2e-aws-ovn-hypershift
ci/prow/e2e-aws-ovn-local-gateway c1ba61c link true /test e2e-aws-ovn-local-gateway
ci/prow/e2e-aws-ovn-kubevirt c1ba61c link false /test e2e-aws-ovn-kubevirt
ci/prow/e2e-aws-ovn-techpreview c1ba61c link false /test e2e-aws-ovn-techpreview
ci/prow/e2e-aws-ovn-upgrade c1ba61c link true /test e2e-aws-ovn-upgrade
ci/prow/e2e-metal-ipi-ovn-ipv6-techpreview c1ba61c link false /test e2e-metal-ipi-ovn-ipv6-techpreview
ci/prow/4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-rt-upgrade c1ba61c link true /test 4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-rt-upgrade
ci/prow/e2e-aws-ovn-windows c1ba61c link true /test e2e-aws-ovn-windows
ci/prow/qe-perfscale-aws-ovn-small-udn-density-l3 c1ba61c link false /test qe-perfscale-aws-ovn-small-udn-density-l3

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants