Failed `vault clone` or `vault pull` on N1 causes N2 to crash
#324
Comments
This, along with #198, is definitely due to some sort of resource leak coming out of node-to-node connections/streams.
The only way we were able to proceed was to delete the entire state of the Polykey client node and start a new node, which means a new NodeId too.
This issue is really focused on the inter-node behaviour, which is quite critical. However, the state reset indicates that there's some corruption of the state... not sure where, or what would cause the `TypeError: Invalid state: WritableStream is closed`.
Without access to the corrupted Polykey state or another reliable method to replicate this issue, it is really challenging to pinpoint the cause. This will need an in-depth investigation.
Try testing it with the other team members' PK. Don't just do self pull/clone. There are resource leaks in the nodes domain atm anyway.
Also, you can always run different versions of PK: you can use the nixpkgs pin to run different versions, or clone them separately.
I think this was addressed when we fixed the leaking errors while addressing MatrixAI/js-quic#128. @aryanjassal you'll need to try and recreate the problem here and see if it still happens. If not, then we can mark this as done. An easy way to trigger a timeout when cloning/pulling is to try and clone/pull a vault with a few megabytes in it. I'll be assigning this to you @aryanjassal.
Describe the bug
When N1 tries to clone/pull the vault, it sometimes fails with an `ErrorRPCTimeout`, due to an unknown bug, state corruption or something similar.
After a little bit of time, the agent on N2 reports: `TypeError: Invalid state: WritableStream is closed`.
This then causes the entire agent to shut down. I suspect this has common factors with #115, #185, #198.
To Reproduce
`["0.10.0","1.14.0","1","1"]`, but it doesn't appear that the version is the problem.
`["0.13.0","1.15.1","1","1"]`
Expected behavior
Regardless of what is happening, I believe the network streams are not being properly garbage collected or handled. It doesn't matter if the client is broken. The agent that is serving the vault SHOULD NOT FAIL.
I'm pretty sure this is similar to #198.
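As a rough illustration of the expected behaviour, here is a minimal sketch using the standard web `WritableStream` API. It is not Polykey code: `safeWrite` and `demo` are hypothetical names, and the actual code path that throws on N2 has not been identified. The sketch only shows that a write to an already-closed stream rejects with a `TypeError`, and that a serving side can treat that rejection as "the peer went away" rather than letting it propagate and take the process down.

```typescript
// Hypothetical helper, not part of Polykey's API.
// Attempts a write and reports failure instead of rethrowing when the
// stream has already been closed or errored with a TypeError.
async function safeWrite<T>(
  writer: WritableStreamDefaultWriter<T>,
  chunk: T,
): Promise<boolean> {
  try {
    await writer.write(chunk);
    return true;
  } catch (e) {
    if (e instanceof TypeError) {
      // The stream is no longer writable (e.g. it was closed after a
      // timeout). Signal the failure to the caller and keep running.
      return false;
    }
    // Anything else is unexpected and is left for the caller to handle.
    throw e;
  }
}

// Hypothetical demo: one write on an open stream, one after closing it.
async function demo(): Promise<[boolean, boolean]> {
  const stream = new WritableStream<string>();
  const writer = stream.getWriter();
  const beforeClose = await safeWrite(writer, 'chunk-1');
  await writer.close();
  const afterClose = await safeWrite(writer, 'chunk-2');
  return [beforeClose, afterClose];
}
```

With a helper like this, the handler serving the vault could stop streaming and clean up its own resources when the write fails, while the agent itself stays up.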
The point is something is causing `ErrorRPCTimeout`, and it seemed to only be fixed through a full state reset. This implies there's some amount of state corruption occurring too.
Screenshots
Platform (please complete the following information)
Additional context
`polykey agent stop` command not terminating properly #185
Notify maintainers
@tegefaulkes @aryanjassal