8.13.0+ Fleet managed agents upgraded to the currently running version report failed upgrades until a successful upgrade occurs #6186

Closed
cmacknz opened this issue Dec 2, 2024 · 1 comment · Fixed by #6273
Labels
bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

cmacknz commented Dec 2, 2024

8.13.0+ Fleet managed agents upgraded to the currently running version report failed upgrades indefinitely until a successful upgrade occurs. If the agent is running the latest version at the time of the duplicate upgrade attempt, it will stay in the upgrade failed state until the next version is released as there is no newer version to upgrade to.

Prior to v8.13.0 we logged a warning when instructed to upgrade to the currently running version: https://github.com/elastic/elastic-agent/blob/v8.12.2/internal/pkg/agent/application/upgrade/upgrade.go#L182-L184

	if strings.HasPrefix(release.Commit(), newHash) {
		u.log.Warn("Upgrade action skipped: upgrade did not occur because its the same version")
		return nil, nil
	}

In v8.13.0 the agent upgrade process was refactored to account for the situation where the agent package version changed but the agent binary version and commit hash were unchanged. As part of that refactor, an attempt to upgrade to the same version was changed to return an error. This error ultimately ends up in the upgrade details reported to Fleet. Because there is no alternative upgrade path when the currently running version is also the latest, the upgrade details persist the failed state indefinitely.

	same, newVersion := isSameVersion(u.log, currentVersion, metadata, version)
	if same {
		return nil, fmt.Errorf("agent version is already %s", currentVersion)
	}

There are two improvements to make here:

  1. We should make upgrades to the currently running version a no-op that is implicitly successful, so the upgrade process becomes idempotent in this case. Logging a warning, as the original implementation did, was the correct approach.
  2. We should move this check to before the point where the agent package is downloaded, where possible. For official stack releases and independent agent releases, the version in the downloaded agent artifact is always unique between versions. The exception is snapshots, which can have the same version (e.g. 8.16.0-SNAPSHOT) but be a different build, in which case the snapshot ID will differ once it is obtained.
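
A minimal sketch of how the two improvements could fit together. It is not the change made in the linked fix. Every name here (`agentVersion`, `BuildID`, `canSkipBeforeDownload`, `upgrade`, and the `download` callback) is hypothetical and does not correspond to the real upgrader API. It also assumes a snapshot can be recognized by a `-SNAPSHOT` suffix on the version string.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"strings"
)

// agentVersion is a hypothetical stand-in for the version metadata of an
// agent build. BuildID represents whatever distinguishes two builds that
// share a version string (for example a snapshot ID or commit hash).
type agentVersion struct {
	Version string
	BuildID string
}

// isSnapshot assumes snapshot builds carry a "-SNAPSHOT" suffix.
func isSnapshot(version string) bool {
	return strings.HasSuffix(version, "-SNAPSHOT")
}

// canSkipBeforeDownload reports whether the upgrade can be skipped without
// downloading anything. That is only safe for non-snapshot versions, where
// the version string alone identifies the build.
func canSkipBeforeDownload(current agentVersion, target string) bool {
	return current.Version == target && !isSnapshot(target)
}

// upgrade returns (false, nil) when the target is already running, making a
// repeated upgrade request an idempotent no-op instead of a failure.
// download is a hypothetical callback that fetches the package and returns
// the metadata found inside it.
func upgrade(current agentVersion, target string, download func(string) (agentVersion, error)) (bool, error) {
	if canSkipBeforeDownload(current, target) {
		log.Printf("upgrade skipped: already running version %s", target)
		return false, nil
	}

	pkg, err := download(target)
	if err != nil {
		return false, fmt.Errorf("downloading %s: %w", target, err)
	}

	// For snapshots the build identity is only known after the download.
	if pkg == current {
		log.Printf("upgrade skipped: downloaded package matches running build %s", pkg.BuildID)
		return false, nil
	}

	// The real upgrade steps (unpack, swap, restart) would happen here.
	return true, nil
}

func main() {
	current := agentVersion{Version: "8.16.0", BuildID: "abc123"}
	// The download callback must not be reached for a same-version release.
	noDownload := func(string) (agentVersion, error) {
		return agentVersion{}, errors.New("unexpected download")
	}
	upgraded, err := upgrade(current, "8.16.0", noDownload)
	fmt.Println(upgraded, err)
}
```

Returning `nil` rather than an error for the same-version case means no failure is recorded in the upgrade details, so there is no failed state left to persist.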
@elasticmachine

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
