-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deletion race condition when fetching secret from the Kubernetes secrets provider cache #6340
Comments
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
There is another potential problem here, where time since last access is evaluated twice: Once here where we refresh the value if the TTL hasn't expired, but the problem is if the value changed and agent should update the component model with the new value, we don't update the last access. elastic-agent/internal/pkg/composable/providers/kubernetessecrets/kubernetes_secrets.go Lines 174 to 182 in 54932dc
The TTL is then evaluated again when merging the maps which can lead to deletion even though the value updated: elastic-agent/internal/pkg/composable/providers/kubernetessecrets/kubernetes_secrets.go Lines 128 to 130 in 54932dc
This would only matter if the TTLDelete and the secret rotation time were very close but we have an internal example of that. |
The design of this provider doesn't sit well with me. What we're trying to do is very simple: Keep an up-to-date local cache of a small number of Secrets. Intuitively, it'd be best to have an informer for these. However, informers (and watches in general) don't have great support for selecting resources based on field names. It's possible to create an informer which tracks a particular singular Secret by using a field selector, but if we want multiple Secrets, we need to have an informer for each of them, which puts some undue strain on the Kubernetes API Server. If this was anything other than Secrets, I'd strongly consider using a single informer and just filtering within the application, but cluster operators get understandably touchy about applications wanting to access a lot of Secrets, so this doesn't sound like the best of ideas either. Maybe we should at least use a battle-tested off-the-shelf component for this instead of implementing an expiration cache ourselves and opening ourselves up to concurrency bugs. https://pkg.go.dev/k8s.io/client-go/tools/cache#ExpirationCache looks like it would fit. @pkoutsovasilis maybe you'll have a better idea than me here. 🙏 |
yes AFAIK multiple FieldSelectors (aka selecting by name) is not a thing in k8s. Chained selectors is the closest thing to "multi-values" fieldselectors but they are chained with logical "and" and we need logical "or" here, thus unusable.
As is now the agent RBAC requires just a "get" for just the secrets this provider has to "watch"; where with a single informer we would need "list", "watch" (and "get") for every secret out there. So just echoing the same message as the quote, not the best option
Utilising a battle-tested off-the-shelf component seems like the best way forward 🙂 |
+1 from me. TIL https://pkg.go.dev/k8s.io/client-go/tools/cache#ExpirationCache exists, thanks. |
The Kubernetes secrets provider caches secrets and updates them on a configurable schedule to avoid placing unnecessary load on the Kubernetes API.
The cache is updated in a goroutine:
elastic-agent/internal/pkg/composable/providers/kubernetessecrets/kubernetes_secrets.go
Lines 88 to 90 in 54932dc
The cache is copied, the latest value for each non-expired secret is fetched, and then the copy is merged into the active map.
elastic-agent/internal/pkg/composable/providers/kubernetessecrets/kubernetes_secrets.go
Lines 164 to 199 in 54932dc
The logic to get a value from the cache follows below.
elastic-agent/internal/pkg/composable/providers/kubernetessecrets/kubernetes_secrets.go
Lines 204 to 225 in 54932dc
The race condition is that after 1 and 2 have completed, the
updateCache()
method that can delete secrets from the cache could have run and deleted the secret right before 3 below where the value is returned directly from the cache with no check for whether the secret still exists.elastic-agent/internal/pkg/composable/providers/kubernetessecrets/kubernetes_secrets.go
Lines 218 to 222 in 54932dc
The text was updated successfully, but these errors were encountered: