"Don't know how to handle this now" in zmq.rs send_round_robin #151

Closed

apowers313 opened this issue Dec 24, 2021 · 15 comments · Fixed by #152
@apowers313

In developing Jupyter for Deno, I'm hitting the error "Don't know how to handle this now" in send_round_robin. It's not clear to me whether this is a mistake I'm making or an unimplemented code path of zmq.rs. Can you help me?

Jupyter console output attached for context -- it looks like maybe it's disconnecting / reconnecting just before the error occurs, which might be part of the problem based on the comments in the zmq.rs code.

@Alexei-Kornienko
Collaborator

Seems like this is the case where sending a message to the next peer in the queue fails for some reason. That's definitely not your mistake. Could you build zmq from your fork and add dbg!(e) to that code branch? It would be useful to know the actual error you are hitting. Maybe it would be enough to just add a continue there to grab the next available peer.
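
For readers following along, here is a minimal, self-contained sketch of the kind of instrumentation being suggested; `send_to_peer` is a hypothetical stand-in, not the actual zmq.rs code path:

```rust
use std::io;

// A minimal sketch, not zmq.rs internals: wrap the failing send in a match
// and dump the concrete error with dbg!() before deciding how to handle it.
fn send_to_peer() -> io::Result<()> {
    // Simulate the failure being discussed.
    Err(io::Error::from(io::ErrorKind::BrokenPipe))
}

fn main() {
    match send_to_peer() {
        Ok(()) => println!("sent"),
        Err(e) => {
            dbg!(&e); // prints file:line plus the Debug representation of the error
            // ...then decide whether to drop this peer, `continue` to the next
            // one, or bubble the error up to the caller
        }
    }
}
```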

@bartlomieju

I added the debug log there, here's the error:

error in round robin Io(
    Os {
        code: 32,
        kind: BrokenPipe,
        message: "Broken pipe",
    },
)

@Alexei-Kornienko
Collaborator

Released version 0.3.2 with a fix. Feel free to reopen if you have any issues.

@bartlomieju

Thanks for the quick fix @Alexei-Kornienko. I tested it and the panic is no longer raised; however, the error is somehow never surfaced. I added a bunch of logging and don't see it returned from socket.send() in our code, it just causes the notebook to freeze. I will debug some more, but it's unclear at the moment what's going wrong.

@Alexei-Kornienko
Collaborator

Alexei-Kornienko commented Dec 26, 2021

  1. The error should be raised and you should catch it in socket.send(). Are you sure that you check the result you are getting? (See the sketch right after this list.)
  2. The freeze might be caused by the missing reconnect. If you are using a zmq socket to connect to some endpoint, it should automatically reconnect in case of any network errors. This is currently not implemented (reconnect #143).
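
A minimal sketch of point 1, with a hypothetical `send` function standing in for the real async socket.send(); the point is only that the returned Result has to be inspected, otherwise a BrokenPipe can vanish silently:

```rust
use std::io;

// Hypothetical stand-in for the real send call; it returns the kind of
// io::Error this issue is about.
fn send(_frame: &[u8]) -> io::Result<()> {
    Err(io::Error::from(io::ErrorKind::BrokenPipe))
}

fn main() {
    // Inspect the Result instead of discarding it; a swallowed BrokenPipe
    // here would look exactly like the freeze described below.
    if let Err(e) = send(b"kernel_info_reply") {
        eprintln!("send failed: {e}");
    }
}
```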

Could you please describe your use case in more detail? What sockets are you using and how? Maybe I'll be able to help you with this.

@bartlomieju

Could you please describe your use case in more detail? What sockets are you using and how? Maybe I'll be able to help you with this.

Thanks for taking the time!

  1. The error should be raised and you should catch it in socket.send(). Are you sure that you check the result you are getting?

Right, so we actually surface all errors; these two are simple wrappers around zmq.rs:

And they are called here (you might need to click "Show diff" to jump to the relevant line):

All these errors are later print!ed, but I don't get any message in the terminal.

  2. The freeze might be caused by the missing reconnect. If you are using a zmq socket to connect to some endpoint, it should automatically reconnect in case of any network errors. This is currently not implemented (reconnect #143).

That's a good pointer and seems very much related to the work we're doing.

Also, huge thanks for zmq.rs - I know you don't recommend using it in production, but once we get to a satisfactory level of features, we'd love to ship deno jupyter. Let me know if there's anything we could work on to help with the library.

@Alexei-Kornienko
Collaborator

One of the biggest blockers (IMHO) is reconnect. If it's implemented close to the ZMQ spec, I would say the existing socket implementations would be good enough to use. Many options (like buffer size, etc.) should be configurable, but in my opinion that's a second priority.
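
Until automatic reconnect lands, a caller-side workaround looks roughly like the sketch below; `dial` and `send_once` are hypothetical stand-ins, not zmq.rs APIs:

```rust
use std::io;
use std::thread::sleep;
use std::time::Duration;

// Hypothetical stand-ins: `dial` re-establishes the connection, `send_once`
// fails the way the issue describes.
fn dial(endpoint: &str) -> io::Result<()> {
    println!("connecting to {endpoint}");
    Ok(())
}

fn send_once(_frame: &[u8]) -> io::Result<()> {
    Err(io::Error::from(io::ErrorKind::BrokenPipe))
}

// On a network error, re-dial the endpoint and retry with a simple backoff,
// which is roughly what spec-style auto-reconnect would do for the caller.
fn send_with_reconnect(endpoint: &str, frame: &[u8], max_attempts: u32) -> io::Result<()> {
    let mut attempt = 0;
    loop {
        match send_once(frame) {
            Ok(()) => return Ok(()),
            Err(e) if attempt < max_attempts => {
                attempt += 1;
                eprintln!("send failed ({e}), reconnecting (attempt {attempt})");
                sleep(Duration::from_millis(100 * u64::from(attempt))); // linear backoff
                dial(endpoint)?;
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    let _ = send_with_reconnect("tcp://127.0.0.1:5555", b"ping", 3);
}
```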

@Alexei-Kornienko
Collaborator

It would be really good to get some feedback from real users about the APIs and their usability (multipart message construction, etc.). Maybe something needs changing before I can mark them as stable.

@bartlomieju

Sounds good! Just to be clear, we're fine staying on the bleeding edge for some time, and it seems reconnection logic might be the only blocker for us at the moment. Sudden API changes in unstable libs are expected and we're fine with that. I will keep debugging the problems I'm having with the error not being raised.

@Alexei-Kornienko
Collaborator

After some thinking, it seems that my commit is not correct and might cause a deadlock...
I try to call self.peer_disconnected(&next_peer_id) while holding a mutable reference to the same peer. That's definitely not going to work.
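
A minimal sketch of that pattern, using a Mutex<HashMap> as a stand-in for however the peers are actually stored; the point is that calling back into a method that needs the peer collection while a guard/borrow on it is still alive can never make progress:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-in types; not the zmq.rs data structures.
struct Fanout {
    peers: Mutex<HashMap<u32, String>>,
}

impl Fanout {
    fn peer_disconnected(&self, id: &u32) {
        // Tries to take the same lock again...
        self.peers.lock().unwrap().remove(id);
    }

    fn send_round_robin(&self) {
        let guard = self.peers.lock().unwrap(); // lock held for this whole scope
        if let Some((id, _peer)) = guard.iter().next() {
            // ...while it is still held here, so this call can never acquire it.
            self.peer_disconnected(id);
        }
    }
}

fn main() {
    let f = Fanout {
        peers: Mutex::new(HashMap::from([(1, "peer".to_string())])),
    };
    f.send_round_robin(); // self-deadlock (std's Mutex may also panic here)
}
```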

@bartlomieju

@Alexei-Kornienko I was going to strace my program on that hang, but it's great news that you figured this out already. Just FYI: when I replaced the panic with a pass (a println!, to be precise) I was not experiencing any problems.

Alexei-Kornienko added a commit that referenced this issue Dec 27, 2021
@Alexei-Kornienko
Collaborator

@bartlomieju Could you please test this branch (fix/151) and see if it works for you?

@bartlomieju

@bartlomieju Could you please test this branch (fix/151) and see if it works for you?

Sure thing, on it.

@bartlomieju

@Alexei-Kornienko indeed this fixes the problem, I'm no longer getting hangs and the error is showing up in the terminal:
[shell] error handling kernel_info_request: Codec Error: Broken pipe (os error 32)

@Alexei-Kornienko
Collaborator

Cool, so this problem is solved. I'll bump a new version with the fix.
