Reconnect triggered by zero bytes received fails #58

GoogleCodeExporter · 2015-04-07T18:02:09Z

This error is in the simple message library, but it presents itself when used
with the Fanuc controller. It seems to occur because the Fanuc controller
keeps socket connections open.

What steps will reproduce the problem?
1. Start the ROS-I software on the Fanuc controller: ldr_ros_i.tp
2. Start the ROS-I software on the PC: roslaunch fanuc_common
   robot_state_visualize_<robot>.launch robot_ip:=<ip address>
3. Abort or pause the programs executed by ldr_ros_i.tp
4. Restart ldr_ros_i.tp

What is the expected output? What do you see instead?

The ROS logs show reconnection errors (EISCONN, errno 106: "Transport endpoint
is already connected"). The reconnect is triggered by receiving zero bytes, not
by an error code returned by the socket read. The socket connection is never
reacquired. The socket should reconnect when ldr_ros_i.tp is executed again.
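On a TCP socket, a read that returns zero bytes is the normal end-of-stream signal after the peer closes its side. It is not an error code, so the connection has to be treated as finished and that socket cannot be reused. A minimal Python sketch of a reader that makes this explicit (the simple_message library itself is C++; `recv_exact` and `PeerClosedError` are hypothetical names used only for illustration):

```python
import socket


class PeerClosedError(ConnectionError):
    """Raised when recv() returns zero bytes, i.e. the peer closed its end."""


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, treating a zero-byte read as a closed connection.

    On a stream socket, recv() returning b"" is not an error return: it is the
    end-of-stream signal sent after the peer shuts down or closes its side.
    """
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # Zero bytes: this connection is over. The caller should close this
            # socket and open a new one rather than reuse it.
            raise PeerClosedError(
                f"peer closed after {size - remaining} of {size} bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The caller then decides how to recover (close and reconnect on a fresh socket) instead of mixing that decision into the read loop.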



Original issue reported on code.google.com by shaun.ed...@gmail.com on 19 Apr 2013 at 3:29


GoogleCodeExporter · 2015-04-07T18:02:09Z

We might need to introduce a heartbeat / watchdog feature in the Fanuc nodes. 
Minimum time-out value for any TCP TAG on the Fanuc seems to be 1 minute, which 
is obviously too long.

Making the KAREL side explicitly disconnect TAGs after failing to receive 
heartbeats (for some time) would probably mitigate the issue described above.
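The KAREL-side watchdog could follow this pattern: remember when the last heartbeat arrived and drop the connection once it is older than a timeout, so the TAG becomes free to accept a new client. The sketch is written in Python only to show the logic; `HeartbeatWatchdog`, `beat()`, `expired()` and the timeout value are hypothetical, not existing KAREL or ROS-I code.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Reports when no heartbeat has been seen for longer than `timeout_s`.

    Hypothetical sketch: one instance is created when a client connects, so a
    client that never sends anything also times out.
    """

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        # Arm the watchdog at connection time.
        self._last_beat = clock()

    def beat(self) -> None:
        """Call for every heartbeat (or any valid message) received."""
        self._last_beat = self._clock()

    def expired(self) -> bool:
        """True once the last heartbeat is older than the timeout."""
        return self._clock() - self._last_beat > self.timeout_s
```

The socket task would call `beat()` for every message received and close the connection when `expired()` returns true. The clock is injectable so the logic can be tested without sleeping.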

Original comment by colaed11 on 19 Apr 2013 at 1:15

GoogleCodeExporter · 2015-04-07T18:02:09Z

That could be the problem. We should probably make the KAREL socket code
disconnect on a program abort/pause event. This can easily be done using the
CONDITION monitors in KAREL (what do you think?). Fixing the client side to
react correctly to the "already connected" error should still be done. I think
these two fixes together should make the communication robust enough. TCP/IP
already has a heartbeat built in, so I am hesitant to implement one at another
level.
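EISCONN means `connect()` was called on a socket the kernel still considers connected, which is what happens when a client retries on the same socket after the peer went away. A minimal sketch of reacting to it, assuming the client owns a plain TCP socket: close the old socket and connect on a new one. This is not the actual simple_message client code; `reconnect` and `connect_or_replace` are hypothetical helpers.

```python
import errno
import socket


def reconnect(old_sock: socket.socket, host: str, port: int,
              timeout_s: float = 5.0) -> socket.socket:
    """Discard a dead connection and connect again on a brand-new socket."""
    try:
        old_sock.close()
    except OSError:
        pass  # the old socket is being discarded either way
    return socket.create_connection((host, port), timeout=timeout_s)


def connect_or_replace(sock: socket.socket, host: str,
                       port: int) -> socket.socket:
    """Try connect(); on EISCONN, switch to a fresh socket instead of retrying."""
    try:
        sock.connect((host, port))
        return sock
    except OSError as exc:
        if exc.errno != errno.EISCONN:
            raise
        # The kernel still holds the old connection; retrying connect() on
        # this socket will keep failing, so replace it.
        return reconnect(sock, host, port)
```

The design choice is to treat a socket as single-use: once its connection has ended (zero-byte read or EISCONN on retry), the only safe recovery is a new socket.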

Original comment by shaun.ed...@gmail.com on 22 Apr 2013 at 5:04

GoogleCodeExporter · 2015-04-07T18:02:09Z

Disconnecting on ABORT is an option. It will only cover the case you describe
here, though. Adding a heartbeat would make disconnects (and subsequent
reconnects) due to loss of carrier or a ROS hang / crash much faster. Unless the
TAG is disconnected it won't accept any other connections (even though the
remote socket is already gone).

I think both mechanisms should be implemented. The heartbeat could simply be a 
PING (type 1) every N seconds.
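A heartbeat of this kind only needs a fixed, payload-less packet sent on a timer. The sketch below assumes the usual simple_message framing (a 32-bit length prefix counting the bytes that follow, then 32-bit `msg_type`, `comm_type` and `reply_code` fields) and assumes PING carries no data. Only PING = 1 comes from this thread; the comm-type and reply-code values, the byte order, and the names `build_ping` / `send_heartbeats` are assumptions to check against the library headers.

```python
import struct
import threading

# PING = 1 is stated in this thread; the other two values are assumptions
# to verify against simple_message's headers.
MSG_TYPE_PING = 1
COMM_TYPE_SERVICE_REQUEST = 2  # assumption
REPLY_CODE_INVALID = 0         # assumption: requests carry no reply code


def build_ping(byteorder: str = "!") -> bytes:
    """Frame a payload-less PING request: length prefix + 3-field header.

    `byteorder` is a struct prefix ("!" = network/big-endian, "<" = little-endian);
    it must match how the peer's simple_message build packs integers.
    """
    header = struct.pack(byteorder + "iii", MSG_TYPE_PING,
                         COMM_TYPE_SERVICE_REQUEST, REPLY_CODE_INVALID)
    # The length prefix counts only the bytes after it (header + data).
    return struct.pack(byteorder + "i", len(header)) + header


def send_heartbeats(sock, period_s: float, stop: threading.Event,
                    byteorder: str = "!") -> None:
    """Send a PING every `period_s` seconds until `stop` is set."""
    packet = build_ping(byteorder)
    # Event.wait() returns False on timeout and True once the event is set.
    while not stop.wait(period_s):
        sock.sendall(packet)
```

Run `send_heartbeats` in its own thread; the receiving side resets its watchdog on each PING.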

Original comment by colaed11 on 28 Apr 2013 at 5:10

GoogleCodeExporter · 2015-04-07T18:02:10Z

I haven't noticed problems reconnecting after killing the ROS node. It seems
to reconnect OK under those circumstances (am I wrong?). I'm really hesitant to
recommend implementing a heartbeat since TCP/IP is a connected socket and must
have something like a heartbeat at some level.
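The TCP mechanism referred to here is keepalive. It is optional, disabled by default on each socket, and on Linux the default idle time before the first probe is 7200 seconds, so it does not help unless it is switched on and its timers are shortened. A minimal Python sketch for the PC side, assuming a platform that exposes the per-socket timer options (Linux does); `enable_keepalive` and the timer values are illustrative:

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 5,
                     interval_s: int = 2, probes: int = 3) -> None:
    """Turn on TCP keepalive and, where supported, shorten its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timer options are platform-specific (available on Linux);
    # skip any option the running platform does not expose.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

With these values a silent peer is declared dead after roughly idle + interval × probes, about 11 seconds. Keepalive probes are answered by the remote kernel, so this detects a lost link or a powered-off host, but not a ROS node that is hung while its process is still alive.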

Original comment by shaun.ed...@gmail.com on 29 Apr 2013 at 1:38

GoogleCodeExporter added Priority-Medium Type-Defect auto-migrated Package-Simple_Message Stack-fanuc Stack-industrial_core Release-Hydro labels Apr 7, 2015
