Better feedback when agent isn't running after ungraceful exit #344

tegefaulkes · 2024-12-10T02:18:31Z

Specification

Currently, when you use the CLI to make calls to an agent, it checks the existing status file in the node path and uses it for the connection information. While the agent is running, this file contains useful runtime information. When the agent stops gracefully, the file reports that it is currently stopped.

However, when the agent crashes and fails to shut down gracefully, the status file is left as it was at the moment of the crash and is never updated. So we have a situation where the status file reports a running agent but we can't connect to it.

ERROR:polykey.PolykeyClient.WebSocketClient:ErrorWebSocketConnectionLocal: WebSocket Connection local error - WebSocket could not open due to internal error
ERROR:polykey.PolykeyClient.WebSocketClient.WebSocketConnection 0:ErrorWebSocketConnectionLocal: WebSocket Connection local error - WebSocket could not open due to internal error
ErrorPolykeyCLIUnexpectedError: An unexpected error occured - Thrown 'ErrorWebSocketConnectionLocal'
  cause: ErrorWebSocketConnectionLocal: WebSocket could not open due to internal error

As of now, this is expected behaviour. But this feedback looks worse than the actual problem, which is simply that the node is not running. We need better feedback for this scenario.
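As a rough illustration of how this scenario could be told apart from a genuinely running agent, here is a minimal sketch. It assumes the status file records a `status` string and the agent's `pid`; the `StatusInfo` shape, the status names, and both function names are hypothetical and may not match the real Polykey schema.

```typescript
// Hypothetical shape of the status file; the real Polykey schema may differ.
type StatusInfo = {
  status: 'STARTING' | 'LIVE' | 'STOPPING' | 'DEAD';
  data: { pid?: number };
};

// Returns true if a process with this PID currently exists.
function isProcessAlive(pid: number): boolean {
  try {
    // Signal 0 sends nothing; it only checks that the process exists.
    process.kill(pid, 0);
    return true;
  } catch (e) {
    // EPERM: the process exists but is owned by another user.
    return (e as { code?: string }).code === 'EPERM';
  }
}

// A status is stale when it claims a non-stopped agent whose process is gone.
function isStatusStale(
  info: StatusInfo,
  isAlive: (pid: number) => boolean = isProcessAlive,
): boolean {
  if (info.status === 'DEAD') return false;
  if (info.data.pid == null) return false; // not enough information to decide
  return !isAlive(info.data.pid);
}
```

A PID can be reused by an unrelated process, so this check is only a heuristic. A failed connection attempt remains the stronger signal.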

So we need the following changes.

If a WebSocket client fails to connect, we need a nicer error to be returned without all of this error logging from the logger.
If we take the connection info from the status file but fail to connect with these details, we need the nicer connection failure message, a report that the status file was incorrect, and an attempt to correct the status file.
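The two changes above could be sketched roughly as follows. This is only an illustration under assumptions: `ConnectionTarget`, `describeConnectionFailure`, `ErrorAgentUnreachable`, and `connectWithFeedback` are hypothetical names, not part of the existing Polykey API, and the `connect` callback stands in for whatever actually opens the WebSocket.

```typescript
// Where the connection details came from. Both values are hypothetical.
type ConnectionSource = 'status-file' | 'explicit';

type ConnectionTarget = {
  host: string;
  port: number;
  source: ConnectionSource;
};

// Builds the single user-facing message for a failed connection.
function describeConnectionFailure(target: ConnectionTarget): string {
  const base = `Could not connect to the agent at ${target.host}:${target.port}.`;
  if (target.source === 'status-file') {
    return (
      base +
      ' The status file reports a running agent, so it is probably stale' +
      ' after an ungraceful exit.'
    );
  }
  return base + ' Check that the agent is running and the address is correct.';
}

// Hypothetical error type carrying the friendly message and the original error.
class ErrorAgentUnreachable extends Error {
  public readonly originalError: unknown;
  constructor(target: ConnectionTarget, originalError: unknown) {
    super(describeConnectionFailure(target));
    // Keep instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = 'ErrorAgentUnreachable';
    this.originalError = originalError;
  }
}

// Wraps any connect function so low-level failures surface as one clean error.
async function connectWithFeedback<T>(
  connect: () => Promise<T>,
  target: ConnectionTarget,
): Promise<T> {
  try {
    return await connect();
  } catch (e) {
    throw new ErrorAgentUnreachable(target, e);
  }
}
```

This sketch only covers the error that reaches the user. Silencing the ERROR level log lines emitted during the failed attempt would have to be handled separately, wherever the logger is configured.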

Additional context

Related: #198 (comment)
Related: #198

Tasks

Clean up the error reporting if we fail to connect with a WebSocket. We shouldn't get a bunch of ERROR level logs; we should catch the connection failure and report it directly with a nicer formatted error.
We need a more specific error reported if we fail to connect with details taken from a status file with the --node-path option.
We need to clean up the status file if we determine it to be stale and orphaned.
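For the last task, correcting a stale status file might look something like the sketch below. It assumes the status file is JSON with a top-level `status` field and that `DEAD` is the stopped state; both are assumptions here, and `markStatusDead` is a hypothetical helper rather than an existing function.

```typescript
import * as fs from 'fs';

// Hypothetical on-disk format: JSON with a top-level `status` field.
// Returns true if the file was rewritten, false if it was already DEAD.
function markStatusDead(statusPath: string): boolean {
  const raw = fs.readFileSync(statusPath, 'utf-8');
  const info = JSON.parse(raw) as { status?: string };
  if (info.status === 'DEAD') return false;
  const tmpPath = `${statusPath}.tmp`;
  // Write to a temporary file, then rename, so readers never see a partial file.
  fs.writeFileSync(tmpPath, JSON.stringify({ status: 'DEAD', data: {} }));
  fs.renameSync(tmpPath, statusPath);
  return true;
}
```

If the agent holds a lock on the status file while running, a real implementation would also need to respect that lock so that the CLI never overwrites the status of an agent that is in fact alive.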

linear · 2024-12-10T02:18:34Z

ENG-486 Better feedback when agent isn't running after ungraceful exit.

tegefaulkes added the development Standard development label Dec 10, 2024

aryanjassal changed the title ~~Better feedback when agent isn't running after ungraceful exit.~~ Better feedback when agent isn't running after ungraceful exit Dec 10, 2024
