Better feedback when agent isn't running after ungraceful exit #344

Open
tegefaulkes opened this issue Dec 10, 2024 · 1 comment
Labels
development Standard development

Comments

@tegefaulkes
Contributor

tegefaulkes commented Dec 10, 2024

Specification

Currently, when you use the CLI to make calls to an agent, it checks the existing status file in the node path and uses that for the connection information. While the agent is running, this file contains useful information about the running agent. When the agent stops gracefully, the file reports that it is currently stopped.

However, when the agent crashes and fails to shut down gracefully, the status file is left exactly as it was at the moment of the crash. It is never updated. So we have a situation where the status file reports a running agent but we can't connect to it.

ERROR:polykey.PolykeyClient.WebSocketClient:ErrorWebSocketConnectionLocal: WebSocket Connection local error - WebSocket could not open due to internal error
ERROR:polykey.PolykeyClient.WebSocketClient.WebSocketConnection 0:ErrorWebSocketConnectionLocal: WebSocket Connection local error - WebSocket could not open due to internal error
ErrorPolykeyCLIUnexpectedError: An unexpected error occured - Thrown 'ErrorWebSocketConnectionLocal'
  cause: ErrorWebSocketConnectionLocal: WebSocket could not open due to internal error

Until now this has been expected behaviour. But this feedback looks worse than the actual problem, which is simply that the node is not running. We need better feedback for this scenario.

So we need the following changes.

  1. If a WebSocket client fails to connect, we need to return a nicer error without all this error logging from the logger.
  2. If we take the connection info from the status file but fail to connect with these details, we need the nicer connection failure message AND we need to report that the status file was incorrect and attempt to correct it.
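
The two changes above could be sketched roughly as follows. This is only an illustration, not Polykey's actual API: `ConnectionInfo`, `AgentUnreachableError`, `describeFailure` and `connectWithFeedback` are hypothetical names, and the real connection call is passed in as a callback so that nothing about the real client is assumed.

```typescript
// Hypothetical names throughout; none of these are Polykey's real types.
type ConnectionSource = 'status-file' | 'explicit';

interface ConnectionInfo {
  host: string;
  port: number;
  // Where the connection details came from.
  source: ConnectionSource;
}

// Builds one user-facing message instead of a stack of ERROR log lines.
function describeFailure(info: ConnectionInfo): string {
  const base = `Could not connect to the agent at ${info.host}:${info.port}.`;
  if (info.source === 'status-file') {
    return (
      `${base} The status file reports a running agent, but the connection ` +
      `failed. The agent may have exited ungracefully and left a stale status file.`
    );
  }
  return `${base} Check that the agent is running and that the host and port are correct.`;
}

class AgentUnreachableError extends Error {
  constructor(
    public readonly info: ConnectionInfo,
    // The low-level error is kept for debugging, not printed by default.
    public readonly original: unknown,
  ) {
    super(describeFailure(info));
    this.name = 'AgentUnreachableError';
    // Keeps `instanceof` working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Wraps any connect function and converts its failure into the nicer error.
async function connectWithFeedback<T>(
  info: ConnectionInfo,
  connect: (info: ConnectionInfo) => Promise<T>,
): Promise<T> {
  try {
    return await connect(info);
  } catch (e) {
    throw new AgentUnreachableError(info, e);
  }
}
```

A caller at the top level of the CLI could then catch `AgentUnreachableError` and print only its `message`, leaving the original error available for verbose output.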

Additional context

Related: #198 (comment)
Related: #198

Tasks

  1. Clean up the error reporting if we fail to connect with a WebSocket. We shouldn't get a bunch of ERROR level logs; we should catch the connection failure and report it directly with a more nicely formatted error.
  2. We need a more specific error reported if we fail to connect with details taken from a status file with the --node-path option.
  3. We need to clean up the status file if we determine it to be stale and orphaned.
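
As a rough sketch of the third task, the stale-status check could look something like the following. Everything here is an assumption made for illustration: the `Status` shape, the `'LIVE'` and `'DEAD'` values, the `StatusStore` interface and the `probe` callback are hypothetical and are not taken from the real status file format.

```typescript
// Hypothetical status shape; the real status file may differ.
type Status =
  | { status: 'LIVE'; host: string; port: number }
  | { status: 'DEAD' };

// Hypothetical abstraction over reading and writing the status file.
interface StatusStore {
  read(): Promise<Status>;
  write(next: Status): Promise<void>;
}

type Outcome =
  | { kind: 'connected' }
  | { kind: 'not-running' }
  | { kind: 'stale-corrected'; host: string; port: number };

// `probe` stands in for a real connection attempt and resolves to
// true when the agent answers at the given host and port.
async function checkAgent(
  store: StatusStore,
  probe: (host: string, port: number) => Promise<boolean>,
): Promise<Outcome> {
  const current = await store.read();
  if (current.status !== 'LIVE') {
    return { kind: 'not-running' };
  }
  if (await probe(current.host, current.port)) {
    return { kind: 'connected' };
  }
  // The file says the agent is live but nothing answered:
  // treat the file as stale and orphaned, and correct it.
  await store.write({ status: 'DEAD' });
  return { kind: 'stale-corrected', host: current.host, port: current.port };
}
```

The `stale-corrected` outcome gives the CLI enough detail to report both the failed connection and the fact that the status file was wrong. A real implementation would presumably also need to guard against an agent starting up between the failed probe and the write.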

tegefaulkes added the development (Standard development) label Dec 10, 2024

linear bot commented Dec 10, 2024

aryanjassal changed the title from "Better feedback when agent isn't running after ungraceful exit." to "Better feedback when agent isn't running after ungraceful exit" Dec 10, 2024