Reconnect triggered by zero bytes received fails #58

Open
GoogleCodeExporter opened this issue Apr 7, 2015 · 4 comments

@GoogleCodeExporter

This error is in the simple message library but it presents itself when used 
with the Fanuc controller.  This seems to occur because the Fanuc controller 
keeps socket connections open.

What steps will reproduce the problem?
1. Start the ROS-I software on the Fanuc controller: ldr_ros_i.tp
2. Start the ROS-I software on the PC: roslaunch fanuc_common robot_state_visualize_<robot>.launch robot_ip:=<ip address>
3. Abort or pause the programs executed by ldr_ros_i.tp
4. Restart ldr_ros_i.tp

What is the expected output? What do you see instead?

The ROS logs show reconnection errors (EISCONN, errno 106: "Transport endpoint 
is already connected").  The reconnect is triggered by a socket read that 
returns zero bytes, not by an error code from the read.  The socket connection 
is never reacquired.  Expected: the socket should reconnect when ldr_ros_i.tp 
is executed again.
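
A minimal sketch of the client-side handling described above, assuming a blocking POSIX TCP socket. It is not the simple message library's actual code, and the helper names `connectFresh` and `receiveOrReconnect` are hypothetical. The idea: a read that returns zero bytes means the peer closed the connection, so the old descriptor is closed and a new socket is created before connecting again. Calling connect() on the old descriptor is what produces EISCONN.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>

// Hypothetical helper: open a fresh TCP socket and connect it to addr.
// Returns the connected descriptor, or -1 on failure (errno is preserved).
int connectFresh(const sockaddr_in& addr)
{
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  if (connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr)) != 0)
  {
    int saved = errno;  // close() may overwrite errno
    close(fd);
    errno = saved;
    return -1;
  }
  return fd;
}

// Hypothetical helper: read once from a blocking socket. If the peer closed
// the connection (recv() returns 0) or the read failed (recv() returns < 0),
// close the stale descriptor and reconnect on a new one.
// Returns the descriptor to use from now on, or -1 if reconnecting failed.
// Simplification: any read error is treated as a lost connection.
int receiveOrReconnect(int fd, const sockaddr_in& addr,
                       char* buf, std::size_t len, ssize_t& received)
{
  received = recv(fd, buf, len, 0);
  if (received > 0) return fd;  // normal data, keep the socket
  close(fd);                    // never call connect() on the old descriptor
  return connectFresh(addr);
}
```

The key design choice is that the zero-byte case and the error case share one path: both discard the descriptor, so a later connect() always starts from an unconnected socket.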



Original issue reported on code.google.com by shaun.ed...@gmail.com on 19 Apr 2013 at 3:29

@GoogleCodeExporter

We might need to introduce a heartbeat / watchdog feature in the Fanuc nodes. 
Minimum time-out value for any TCP TAG on the Fanuc seems to be 1 minute, which 
is obviously too long.

Making the KAREL side explicitly disconnect TAGs after failing to receive 
heartbeats (for some time) would probably mitigate the issue described above.

Original comment by colaed11 on 19 Apr 2013 at 1:15

@GoogleCodeExporter

That could be the problem.  We should probably make the KAREL socket code 
disconnect on a program abort/pause event.  This can easily be done using the 
CONDITION monitors in KAREL (what do you think?).  Fixing the client side to 
react correctly to the "already connected" error should still be done.  I think 
these two fixes together should make the communication robust enough.  TCP/IP 
already has a heartbeat built in, so I am hesitant to implement one at another 
level.

Original comment by shaun.ed...@gmail.com on 22 Apr 2013 at 5:04

@GoogleCodeExporter

Disconnecting on ABORT is an option. It will only cover the case you describe 
here though. Adding a heartbeat would make disconnects (and subsequent 
reconnects) due to loss of carrier or ROS hang / crash much faster. Unless the 
TAG is disconnected it won't accept any other connections (even though the 
remote socket is already gone).

I think both mechanisms should be implemented. The heartbeat could simply be a 
PING (type 1) every N seconds.
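
A rough sketch of the watchdog half of such a heartbeat, written in C++ for illustration only (the controller side would be KAREL, and nothing here is taken from the existing nodes). The class name `HeartbeatWatchdog` and its methods are hypothetical. The receiver records when the last PING arrived and treats the connection as dead once the silence exceeds a timeout, at which point it would disconnect explicitly.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical watchdog: tracks when the last heartbeat arrived and reports
// when the peer has been silent for longer than the allowed timeout.
class HeartbeatWatchdog
{
public:
  HeartbeatWatchdog(Clock::duration timeout, Clock::time_point now)
    : timeout_(timeout), last_seen_(now) {}

  // Call whenever a heartbeat (e.g. a PING message) is received.
  void feed(Clock::time_point now) { last_seen_ = now; }

  // True once no heartbeat has arrived within the timeout.
  bool expired(Clock::time_point now) const
  {
    return now - last_seen_ > timeout_;
  }

private:
  Clock::duration timeout_;
  Clock::time_point last_seen_;
};
```

Passing the current time in as an argument, rather than reading the clock inside the class, keeps the logic easy to test. A timeout of a few multiples of N tolerates an occasional late PING.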

Original comment by colaed11 on 28 Apr 2013 at 5:10

@GoogleCodeExporter

I haven't noticed problems reconnecting after killing the ROS node.  It seems 
to reconnect OK under those circumstances (am I wrong?).  I'm really hesitant to 
recommend implementing a heartbeat since TCP/IP is a connected socket and must 
have something like a heartbeat at some level.
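
For reference, the closest thing TCP offers to a built-in heartbeat is keepalive, which is optional and has to be enabled per socket. On Linux it is off by default, and once enabled the default idle time before the first probe is two hours. A minimal sketch of enabling it on the PC side follows, assuming Linux (the `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` options are not portable to every platform). The helper name `enableKeepalive` is hypothetical, and whether the controller's TCP stack supports an equivalent setting is not established in this thread.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper: turn on TCP keepalive probes for a socket (Linux).
//   idle_s     - seconds of silence before the first probe is sent
//   interval_s - seconds between unanswered probes
//   count      - unanswered probes before the connection is dropped
// Returns true only if every option was set successfully.
bool enableKeepalive(int fd, int idle_s, int interval_s, int count)
{
  int on = 1;
  return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0
      && setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) == 0
      && setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) == 0
      && setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) == 0;
}
```

With values such as `enableKeepalive(fd, 5, 2, 3)`, a dead peer would be detected after roughly 5 + 2 * 3 seconds of silence, and the next read or write on the socket would fail with an error.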

Original comment by shaun.ed...@gmail.com on 29 Apr 2013 at 1:38
