[spike] Investigate DNSRecord events #1108

Open
Boomatang opened this issue Jan 13, 2025 · 1 comment

Comments

@Boomatang
Contributor

Boomatang commented Jan 13, 2025

The DNSRecord status is updated frequently because of the "queuedAt" field. Let's investigate whether there is a way to remove this field without risking excessive communication with the DNS provider API.

This was discovered during the investigation into #1085: after the load test had completed, a number of events were triggered by a DNSRecord. There was nothing in the logs to suggest what actions the operator was performing.

Attached are the pod logs from two different load test runs; the pods were restarted with new resource limits between runs.

kuadrant-operator-controller-manager-8464cd4785-82lfs-manager.log
kuadrant-operator-controller-manager-db6784dfc-ghv9d-manager.log

@mikenairn
Member

With the current implementation of the DNSRecord status update, I'd say this is expected.

To allow multiple records, and therefore multiple clusters, to share a hostname, a mechanism that enforces the consistency of the records in the provider was added to the DNSRecord reconcile. In short, this comes down to periodically reconciling the DNSRecord every x amount of time (15 minutes, I believe): during initial creation this happens more frequently, slowly backing off until the max polling interval (15 mins) is reached. During this time, especially if multiple records are adding values to the same record set, multiple writes to the DNSRecord status are expected.

Following is an example diff of the same record, which had existed for a few hours:

31c31
<         "resourceVersion": "90148",
---
>         "resourceVersion": "121484",
219c219
<         "queuedAt": "2025-01-13T22:34:58Z",
---
>         "queuedAt": "2025-01-14T08:41:03Z",

Diff of a record that had existed for a shorter time (notice that validFor is changing):

31c31
<         "resourceVersion": "35504",
---
>         "resourceVersion": "39965",
219c219
<         "queuedAt": "2025-01-13T17:12:40Z",
---
>         "queuedAt": "2025-01-13T17:23:20Z",
338c338
<         "validFor": "10m40s",
---
>         "validFor": "15m0s",

There is also an issue with the endpoint list not being ordered, so it can sometimes be updated unnecessarily. But because we update these other status values on every reconcile, an update is going to happen regardless.
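
One way to stop endpoint ordering from causing spurious updates is to normalize both lists before comparing them. The sketch below assumes a hypothetical, simplified `Endpoint` struct (the real status type has more fields) and hypothetical `normalize`/`endpointsChanged` helpers; it sorts targets within each endpoint and then the endpoints themselves, so two lists with the same content in a different order compare as equal.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// Endpoint is a hypothetical, simplified stand-in for an endpoint entry in
// the DNSRecord status; it is not the operator's real type.
type Endpoint struct {
	DNSName    string
	RecordType string
	Targets    []string
}

// normalize returns a sorted deep copy so that ordering differences
// (of endpoints or of targets) do not register as a change.
func normalize(eps []Endpoint) []Endpoint {
	out := make([]Endpoint, len(eps))
	for i, ep := range eps {
		targets := append([]string(nil), ep.Targets...)
		sort.Strings(targets)
		out[i] = Endpoint{DNSName: ep.DNSName, RecordType: ep.RecordType, Targets: targets}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].DNSName != out[j].DNSName {
			return out[i].DNSName < out[j].DNSName
		}
		if out[i].RecordType != out[j].RecordType {
			return out[i].RecordType < out[j].RecordType
		}
		return strings.Join(out[i].Targets, ",") < strings.Join(out[j].Targets, ",")
	})
	return out
}

// endpointsChanged reports whether two endpoint lists differ in content,
// ignoring order.
func endpointsChanged(a, b []Endpoint) bool {
	return !reflect.DeepEqual(normalize(a), normalize(b))
}

func main() {
	old := []Endpoint{
		{DNSName: "api.example.com", RecordType: "A", Targets: []string{"192.0.2.1", "192.0.2.2"}},
		{DNSName: "api.example.com", RecordType: "TXT", Targets: []string{"owner=a"}},
	}
	// Same content as old, listed in a different order.
	cur := []Endpoint{
		{DNSName: "api.example.com", RecordType: "TXT", Targets: []string{"owner=a"}},
		{DNSName: "api.example.com", RecordType: "A", Targets: []string{"192.0.2.2", "192.0.2.1"}},
	}
	fmt.Println(endpointsChanged(old, cur)) // false
}
```

As the comment points out, this alone would not reduce writes while fields such as queuedAt still change on every reconcile.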

It might be possible to patch those fields instead of updating them. I'm not totally sure that makes any difference; I would have thought it still needs to update the resourceVersion field in order to avoid conflicts.

The resourceVersion field forever increasing in value is a bit concerning.

You might find more info on it here, although it's probably way out of date now, tbh.

@philbrookes philbrookes moved this to Todo in Kuadrant Jan 16, 2025
@philbrookes philbrookes changed the title from "Investigate DNSRecord events" to "[spike] Investigate DNSRecord events" Jan 16, 2025
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

2 participants