[zeromq-dev] Publishers stops sending data after sometime. Subscriber is required to fix publisher

Ravi Joshi ravi.joshi53 at yahoo.com
Mon Jan 29 01:59:01 CET 2018


Thank you very much. I really appreciate your kind support.
-Ravi 

    On Monday, 29 January 2018 9:56 AM, Justin Karneges <justin at karneges.com> wrote:
 

libzmq handles automatically reconnecting, so you shouldn't call connect again without disconnecting first.

But yes, a disconnect and then connect is a way to force a reconnect, if that's what you're asking. This is what you'd need to do if you were sending/receiving your own heartbeats and needed a way to reconnect after some silence.

If TCP keepalives are working though, then that's best since libzmq will take care of the reconnecting for you.
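
The disconnect-then-connect approach described above could be organized roughly as follows. This is only a sketch: `SilenceWatchdog` is a hypothetical helper, not part of libzmq or cppzmq, and it assumes the publisher sends something (data or an application-level heartbeat) at least once per silence window, otherwise a merely quiet publisher would trigger needless reconnects. The two callbacks stand in for `zmq_socket.disconnect(...)` and `zmq_socket.connect(...)` on the same endpoint.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical helper (not part of libzmq/cppzmq): tracks how long the socket
// has been silent and forces a reconnect once the limit is reached.
class SilenceWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    SilenceWatchdog(std::chrono::milliseconds max_silence,
                    std::function<void()> disconnect,
                    std::function<void()> connect,
                    Clock::time_point now = Clock::now())
        : max_silence_(max_silence),
          disconnect_(std::move(disconnect)),
          connect_(std::move(connect)),
          last_activity_(now) {}

    // Call whenever recv() delivered a message (data or heartbeat).
    void on_message(Clock::time_point now = Clock::now()) {
        last_activity_ = now;
    }

    // Call whenever recv() timed out. Returns true if a reconnect was forced.
    bool on_timeout(Clock::time_point now = Clock::now()) {
        if (now - last_activity_ < max_silence_) {
            return false;  // still within the allowed silence window
        }
        disconnect_();         // e.g. zmq_socket.disconnect(socket_address)
        connect_();            // e.g. zmq_socket.connect(socket_address)
        last_activity_ = now;  // start a fresh silence window
        return true;
    }

private:
    std::chrono::milliseconds max_silence_;
    std::function<void()> disconnect_;
    std::function<void()> connect_;
    Clock::time_point last_activity_;
};
```

In a receive loop with ZMQ_RCVTIMEO set, one would call `on_message()` after each successful recv and `on_timeout()` after each timed-out recv. Always disconnecting before connecting keeps the socket from accumulating duplicate connections to the same endpoint.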

On Sun, Jan 28, 2018, at 4:46 PM, Ravi Joshi via zeromq-dev wrote:

Hi,

After setting the following keepalive options on the ZMQ SUB socket, the problem seems to be fixed: I haven't seen any issue for a few hours.

ZMQ_TCP_KEEPALIVE = 1
ZMQ_TCP_KEEPALIVE_IDLE = 30
ZMQ_TCP_KEEPALIVE_INTVL = 5
ZMQ_TCP_KEEPALIVE_CNT = 6

Thank you very much. Meanwhile, an alternative idea came to mind. Please see the following code snippet:
--------------------------------------------------------------------------------------
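while (ros::ok()) {
    zmq::message_t msg;
    int rc = 0;
    try {
        // recv() returns false when ZMQ_RCVTIMEO expires without a message
        rc = zmq_socket.recv(&msg);
    }
    catch (zmq::error_t& e) {
        // any other receive error is ignored here; rc stays 0
    }
    if (rc) {
        // do work here
    }
    else {
        zmq_socket.connect(socket_address.c_str()); // re-connect
    }
}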
--------------------------------------------------------------------------------------

The above code snippet is untested; it just occurred to me. Intuitively it looks okay to me, but I am not sure whether it is acceptable in ZMQ.

What do you say?

-
Thanks
Ravi 


On Sunday, 28 January 2018 4:37 PM, Justin Karneges <justin at karneges.com> wrote:


All that seems fine, but to get TCP keepalives to use a shorter timeout you'll want to set those additional options, yes.

For example, in my apps I use:

ZMQ_TCP_KEEPALIVE = 1
ZMQ_TCP_KEEPALIVE_IDLE = 30
ZMQ_TCP_KEEPALIVE_INTVL = 5
ZMQ_TCP_KEEPALIVE_CNT = 6

What this means is if there is 30 seconds of no I/O, then the peer will be pinged every 5 seconds, up to 6 times, before closing the connection. Thus, the connection should recover after about a minute.

If you don't set these additional options, then the OS defaults are used, which can sometimes be hours (!) long.
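
The "about a minute" figure is simply the idle time plus the probe interval multiplied by the probe count. A small sketch of that arithmetic follows; `KeepaliveTiming` and `detection_time` are hypothetical names, not part of libzmq, and the second check assumes the usual Linux kernel defaults (7200 s idle, 75 s interval, 9 probes), which may differ on a given system.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper type (not part of libzmq): mirrors the three timing options.
struct KeepaliveTiming {
    std::chrono::seconds idle;      // ZMQ_TCP_KEEPALIVE_IDLE
    std::chrono::seconds interval;  // ZMQ_TCP_KEEPALIVE_INTVL
    int probes;                     // ZMQ_TCP_KEEPALIVE_CNT
};

// Approximate time from the last I/O until the OS gives up on a dead peer:
// wait `idle`, then send `probes` probes spaced `interval` apart.
constexpr std::chrono::seconds detection_time(const KeepaliveTiming& t) {
    return t.idle + t.interval * t.probes;
}

// Values from this thread: 30 + 5 * 6 = 60 seconds.
static_assert(detection_time({std::chrono::seconds{30}, std::chrono::seconds{5}, 6}) ==
                  std::chrono::seconds{60},
              "thread values give one minute");

// Assumed Linux defaults: 7200 + 75 * 9 = 7875 seconds, a bit over two hours.
static_assert(detection_time({std::chrono::seconds{7200}, std::chrono::seconds{75}, 9}) ==
                  std::chrono::seconds{7875},
              "assumed Linux defaults give over two hours");
```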

Justin

On Sat, Jan 27, 2018, at 9:02 PM, Ravi Joshi via zeromq-dev wrote:

Hi,

I am a little confused. Let me first explain the scenario again.

There are 3 publishers written in C# running on Windows 10, and 3 subscribers written in C++ running on Ubuntu 14.04 LTS. The mapping from publisher to subscriber is one-to-one.

Now let me mention ZMQ socket configurations.
Publisher configuration-
    SetOption(ZSocketOption.CONFLATE, 1);

Subscriber configuration-
    socket.setsockopt(ZMQ_SUBSCRIBE, "", 0); // allow all messages
    socket.setsockopt(ZMQ_RCVTIMEO, &timeout, sizeof(timeout)); // int timeout = 1000
    socket.setsockopt(ZMQ_LINGER, &linger, sizeof(linger)); // int linger = 0
    socket.setsockopt(ZMQ_CONFLATE, &conflate, sizeof(conflate)); // int conflate = 1
    socket.setsockopt(ZMQ_TCP_KEEPALIVE, &tcp_keepalive, sizeof(tcp_keepalive)); // int tcp_keepalive = 1

Do the above configurations look fine? Or would you like me to change something and try again?

I am confused because I cannot find out how to set a heartbeat for Publisher-Subscriber sockets. Any suggestions, please?

Regarding ZMQ_TCP_KEEPALIVE_*, I found the following three options: ZMQ_TCP_KEEPALIVE_IDLE, ZMQ_TCP_KEEPALIVE_CNT, and ZMQ_TCP_KEEPALIVE_INTVL. The values to use for these options are not clear from the documentation. Any suggestions, please?


Thank you very much.

-
Ravi

On Sunday, 28 January 2018 3:10 AM, Justin Karneges <justin at karneges.com> wrote:


You'd still have to wait for the TCP keepalive to time out the connection before it will recover. On Ubuntu this might be a very long time, so be sure to set all the ZMQ_TCP_KEEPALIVE_* options to ensure a shorter timeout.
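
Setting all four options on a SUB socket might look like the sketch below. It uses the same cppzmq `setsockopt` style as the subscriber configuration quoted in this thread and the values suggested elsewhere in it (30 / 5 / 6). The function name `enable_keepalive` and the endpoint are placeholders, not taken from the original code, and the options are applied before `connect` so that they take effect for that connection.

```cpp
#include <zmq.hpp>

// Placeholder helper: applies the keepalive settings discussed in this thread.
void enable_keepalive(zmq::socket_t& socket) {
    int keepalive = 1;  // turn TCP keepalives on
    int idle = 30;      // seconds of no I/O before the first probe
    int interval = 5;   // seconds between probes
    int count = 6;      // unanswered probes before the connection is closed

    socket.setsockopt(ZMQ_TCP_KEEPALIVE, &keepalive, sizeof(keepalive));
    socket.setsockopt(ZMQ_TCP_KEEPALIVE_IDLE, &idle, sizeof(idle));
    socket.setsockopt(ZMQ_TCP_KEEPALIVE_INTVL, &interval, sizeof(interval));
    socket.setsockopt(ZMQ_TCP_KEEPALIVE_CNT, &count, sizeof(count));
}

int main() {
    zmq::context_t context(1);
    zmq::socket_t socket(context, ZMQ_SUB);

    socket.setsockopt(ZMQ_SUBSCRIBE, "", 0);  // allow all messages
    enable_keepalive(socket);                 // set options before connecting

    socket.connect("tcp://127.0.0.1:5556");   // placeholder endpoint
    return 0;
}
```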

On Sat, Jan 27, 2018, at 2:27 AM, Ravi Joshi via zeromq-dev wrote:

Hi Justin,

I will check it using netstat.

Meanwhile, ZMQ_TCP_KEEPALIVE does not seem to be working. After some time, Windows, where the publishers are running, still shows a 0 MBPS transmission rate. After I restart the subscribers in ROS on Ubuntu, the publishers start working again. Please note that I am not restarting the publishers at all during this process.

Below is the code snippet added to all subscribers:

int tcp_keepalive = 1;

zmq_socket.setsockopt(ZMQ_TCP_KEEPALIVE, &tcp_keepalive, sizeof(tcp_keepalive));


 -
Thanks
Ravi


On Saturday, 27 January 2018 5:36 PM, Justin Karneges <justin at karneges.com> wrote:


One thing you might do is run netstat on both sides to see if the connections are still listed. In a dead connection scenario, netstat should no longer list the connection on the PUB side, but should still list it on the SUB side.

Note that it can take time for the PUB connection to give up. On Linux, the default is something like 20 minutes after it dies, so give the PUB side some extra time after messages stop transmitting. If transmission hasn't worked for over 20 minutes and netstat is still showing the connection on the PUB side, then the problem may be something else.

On Sat, Jan 27, 2018, at 12:13 AM, Ravi Joshi via zeromq-dev wrote:

Hi Justin,

Thank you very much. How do I make sure that I am getting dead connections?

For the time being, I am enabling ZMQ_TCP_KEEPALIVE on all 3 SUB sockets.

I will tell you the status of it after some time.

Thanks
-
Ravi

On Saturday, January 27, 2018, 3:27 PM, Justin Karneges <justin at karneges.com> wrote:

Hi,

One issue with socket types that don't usually write data (such as SUB) is that a dead connection might go unnoticed forever. You can work around this by enabling TCP keepalives on the SUB socket. I don't know if you're getting dead connections here, but I just thought I'd mention it.

Justin

On Fri, Jan 26, 2018, at 9:33 PM, Ravi Joshi via zeromq-dev wrote:
> Hi,
> 
> I am using the Publisher-Subscriber pattern with 3 publishers that
> publish 3 different types of data. All 3 publishers are written in a
> single C# file, while each subscriber is written in a separate C++
> file inside ROS. From ZeroMQ's point of view, the subscribers are
> identical, since the context, socket initialization, and message
> reception are handled the same way in each of them. Hence, to keep
> this mail short, I am posting the code snippet of only 1 subscriber
> below.
> 
> The C# publisher snippet is available on Pastebin
> (https://pastebin.com/S65LmwuV).
> The C++ subscriber snippet is available on Pastebin
> (https://pastebin.com/xb3V0n0u).
> 
> The publisher works well initially and successfully transmits data
> at a rate of 700 MBPS, but stops transmitting any data after 5-6
> hours.
> 
> To get the publishers working again, I need to restart the
> subscribers. This is strange to me, since it is unexpected behavior
> as far as the Publisher-Subscriber pattern is concerned.
> 
> Why such weird behavior? Is there any workaround, please?
> 
> -
> Thanks
> Ravi
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> https://lists.zeromq.org/mailman/listinfo/zeromq-dev
