[spike] Investigate DNSRecord events #1108
Comments
With the current implementation of the DNSRecord status update I'd say this is expected. In order to allow multiple records, and therefore multiple clusters, to share a hostname, a mechanism was added to the DNSRecord reconcile that enforces the consistency of the records in the provider. In short, this comes down to periodically reconciling the DNSRecord every x amount of time (15 minutes, I believe), reconciling more frequently during the initial creation, and slowly backing off until we reach the max polling interval (15 minutes). During this period, especially if multiple records are adding values to the same record set, it is expected that multiple writes happen to the DNSRecord status. Following is an example of the diff of the same record after it has existed for a few hours:
Diff of a record that has only existed for a shorter time (notice that validFor is changing):
There is an issue with the endpoint list not being ordered, which can sometimes cause unnecessary updates, but because we are updating these other values in the status on every reconcile, it is going to cause an update regardless. It might be possible to patch those fields instead of updating them; I'm not totally sure that makes any difference, as I would have thought the resource still needed to be updated either way. You might find more info on it here, although it's probably well out of date now.
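For illustration only, here is a minimal controller-runtime sketch of the back-off described above, where the record is requeued with a growing delay capped at the max polling interval. The interval values and the `nextRequeue` helper are assumptions made for this sketch, not the actual kuadrant code.

```go
package controllers

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

const (
	initialInterval = 15 * time.Second // assumed: frequent reconciles right after creation
	maxInterval     = 15 * time.Minute // the max polling interval mentioned above
)

// nextRequeue doubles the previous delay until it reaches the max polling
// interval; this is the "slowly backing off" behaviour described in the comment.
func nextRequeue(previous time.Duration) ctrl.Result {
	next := initialInterval
	if previous > 0 {
		next = previous * 2
	}
	if next > maxInterval {
		next = maxInterval
	}
	return ctrl.Result{RequeueAfter: next}
}
```

Because every reconcile in this window also rewrites status fields such as queuedAt and validFor, each pass shows up as an update event on the DNSRecord even when the endpoints themselves have not changed.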
The DNSRecord status is updated frequently due to the "queuedAt" field; let's investigate whether there is a way to remove this field without risking excessive communication with the DNS Provider API.
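One direction for this investigation is to write the status with a patch limited to the fields that actually changed, or to skip the write entirely when only queuedAt would move. Below is a hedged sketch only: the `dnsv1alpha1.DNSRecord` import path, the `QueuedAt` status field, and the `touchStatus` helper are assumptions, not the real kuadrant types. As noted in the comment above, any accepted status write still bumps the resourceVersion and produces an event, so skipping the write is the part that actually reduces events.

```go
package controllers

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	dnsv1alpha1 "example.com/dns-operator/api/v1alpha1" // hypothetical import path
)

// DNSRecordReconciler is a minimal reconciler shape; in a real controller it
// embeds client.Client via the generated scaffolding.
type DNSRecordReconciler struct {
	client.Client
}

// touchStatus writes the DNSRecord status only when the endpoints actually
// changed, instead of rewriting queuedAt on every reconcile.
func (r *DNSRecordReconciler) touchStatus(ctx context.Context, record *dnsv1alpha1.DNSRecord, endpointsChanged bool) error {
	if !endpointsChanged {
		// Skipping the write when only queuedAt would change avoids the
		// resourceVersion bump (and the resulting update event) entirely.
		return nil
	}
	base := record.DeepCopy()
	record.Status.QueuedAt = metav1.Now()
	// Status().Patch with MergeFrom sends only the modified status fields;
	// any accepted write still generates a new resourceVersion.
	return r.Status().Patch(ctx, record, client.MergeFrom(base))
}
```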
This was discovered during the investigation into #1085. After the load test had completed, it was noticed that a number of events were triggered by a DNSRecord, and there was nothing in the logs to suggest what actions the operator was performing.
Attached are the pod logs from two different load test runs; the pods were restarted with new resource limits between runs.
kuadrant-operator-controller-manager-8464cd4785-82lfs-manager.log
kuadrant-operator-controller-manager-db6784dfc-ghv9d-manager.log