[zeromq-dev] Publisher stops sending data after some time; restarting the subscriber is required to fix the publisher

Justin Karneges justin at karneges.com
Mon Jan 29 01:56:23 CET 2018


libzmq handles reconnecting automatically, so you shouldn't call connect
again without disconnecting first.
But yes, a disconnect and then connect is a way to force a reconnect, if
that's what you're asking. This is what you'd need to do if you were
sending/receiving your own heartbeats and needed a way to reconnect
after some silence.
If TCP keepalives are working though, then that's best since libzmq will
take care of the reconnecting for you.
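A minimal C++ sketch of the disconnect-then-connect approach described above. It assumes the publisher sends data (or application-level heartbeats) often enough that a long silence really means a dead link. `SilenceWatchdog` and `force_reconnect` are hypothetical helpers, not part of libzmq or cppzmq; the template only assumes a socket type with `connect()`/`disconnect()` member functions taking the endpoint string, which cppzmq's `zmq::socket_t` provides.

```cpp
#include <chrono>
#include <string>

// Hypothetical helper (not part of libzmq or cppzmq): tracks how long the
// SUB side has gone without receiving anything.
class SilenceWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    SilenceWatchdog(std::chrono::milliseconds max_silence,
                    Clock::time_point now = Clock::now())
        : max_silence_(max_silence), last_activity_(now) {}

    // Call after every successful receive (data or an application heartbeat),
    // and after a forced reconnect so the silence timer restarts.
    void on_message(Clock::time_point now = Clock::now()) {
        last_activity_ = now;
    }

    // True once nothing has arrived for longer than max_silence.
    bool expired(Clock::time_point now = Clock::now()) const {
        return now - last_activity_ > max_silence_;
    }

private:
    std::chrono::milliseconds max_silence_;
    Clock::time_point last_activity_;
};

// Forces a reconnect by disconnecting first and then connecting again.
// Socket is assumed to provide connect()/disconnect() taking the endpoint
// string, as cppzmq's zmq::socket_t does.
template <class Socket>
void force_reconnect(Socket& socket, const std::string& endpoint) {
    socket.disconnect(endpoint);
    socket.connect(endpoint);
}
```

In a receive loop like the one in the quoted snippet, where `recv` returns once `ZMQ_RCVTIMEO` expires, one would call `on_message()` after each successful receive, check `expired()` on each timeout, and after `force_reconnect(socket, address)` call `on_message()` again so the timer restarts. If TCP keepalives are tuned and working, this is usually unnecessary, because libzmq reconnects on its own.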
On Sun, Jan 28, 2018, at 4:46 PM, Ravi Joshi via zeromq-dev wrote:
> Hi,
> 
> After setting the following keepalive options on the ZMQ SUB sockets, the
> problem seems fixed. I haven't seen any issue for a few hours.
> 
> ZMQ_TCP_KEEPALIVE = 1
> ZMQ_TCP_KEEPALIVE_IDLE = 30
> ZMQ_TCP_KEEPALIVE_INTVL = 5
> ZMQ_TCP_KEEPALIVE_CNT = 6
> 
> Thank you very much. Meanwhile, an alternative idea came to mind.
> Please see the following code snippet:
> ----------------------------------------------------------------------
> while (ros::ok()) {
>     zmq::message_t msg;
>     int rc = 0;
>     try {
>         rc = zmq_socket.recv(&msg);
>     }
>     catch (zmq::error_t& e) {
>     }
>     if (rc) {
>         // do work here
>     }
>     else {
>         zmq_socket.connect(socket_address.c_str()); // re-connect
>     }
> }
> ----------------------------------------------------------------------
> 
> The above code snippet is untested; the idea just came to mind.
> Intuitively, it looks okay to me, but I am not sure whether it is
> acceptable in ZMQ.
> 
> What do you say?
> 
> -
> Thanks
> Ravi 
> 
> 
> On Sunday, 28 January 2018 4:37 PM, Justin Karneges
> <justin at karneges.com> wrote:
> 
> All that seems fine, but to get TCP keepalives to use a shorter
> timeout, you'll want to set those additional options, yes.
> 
> For example, in my apps I use:
> 
> ZMQ_TCP_KEEPALIVE = 1
> ZMQ_TCP_KEEPALIVE_IDLE = 30
> ZMQ_TCP_KEEPALIVE_INTVL = 5
> ZMQ_TCP_KEEPALIVE_CNT = 6
> 
> What this means is that if there are 30 seconds of no I/O, the peer
> will be pinged every 5 seconds, up to 6 times, before the connection
> is closed. Thus, the connection should recover after about a minute.
> 
> If you don't set these additional options, the OS defaults are
> used, which can sometimes be hours (!) long.
> 
> Justin
> 
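The timing in the message above can be checked with a small worked calculation. This is a sketch assuming Linux keepalive semantics (the SUB sockets in this thread run on Ubuntu): a dead peer is detected roughly `idle + interval * count` seconds after the last I/O. The Linux values used for comparison (`net.ipv4.tcp_keepalive_time` = 7200, `tcp_keepalive_intvl` = 75, `tcp_keepalive_probes` = 9) are the usual kernel defaults and may differ on a given system; `keepalive_detection_seconds` is a hypothetical helper.

```cpp
// Hypothetical helper: approximate time (in seconds) from the last I/O on a
// silent connection until TCP keepalive gives up on a dead peer, assuming
// Linux semantics: wait idle_s, then send up to probe_count probes spaced
// interval_s apart.
constexpr int keepalive_detection_seconds(int idle_s, int interval_s,
                                          int probe_count) {
    return idle_s + interval_s * probe_count;
}

// The values suggested in the message above:
// ZMQ_TCP_KEEPALIVE_IDLE = 30, ZMQ_TCP_KEEPALIVE_INTVL = 5,
// ZMQ_TCP_KEEPALIVE_CNT = 6.
constexpr int kTunedSeconds = keepalive_detection_seconds(30, 5, 6);

// Typical Linux kernel defaults, used when the ZMQ_TCP_KEEPALIVE_* options
// are left unset: tcp_keepalive_time = 7200, tcp_keepalive_intvl = 75,
// tcp_keepalive_probes = 9.
constexpr int kLinuxDefaultSeconds = keepalive_detection_seconds(7200, 75, 9);
```

With the tuned values the bound is 60 seconds, matching "about a minute"; with the defaults it is 7875 seconds, a little over two hours, which is why setting only `ZMQ_TCP_KEEPALIVE = 1` can appear not to work.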
> On Sat, Jan 27, 2018, at 9:02 PM, Ravi Joshi via zeromq-dev wrote:
>> Hi,
>> 
>> I am a little confused. Let me first explain the scenario again.
>> 
>> There are 3 publishers written in C# running on Windows 10. On the
>> other side, there are 3 subscribers written in C++ running on Ubuntu
>> 14.04 LTS. The mapping from publisher to subscriber is one to one.
>> 
>> Here are the ZMQ socket configurations.
>> Publisher configuration:
>>     SetOption(ZSocketOption.CONFLATE, 1);
>> 
>> Subscriber configuration:
>>     socket.setsockopt(ZMQ_SUBSCRIBE, "", 0); // allow all messages
>>     socket.setsockopt(ZMQ_RCVTIMEO, &timeout, sizeof(timeout)); // int timeout = 1000
>>     socket.setsockopt(ZMQ_LINGER, &linger, sizeof(linger)); // int linger = 0
>>     socket.setsockopt(ZMQ_CONFLATE, &conflate, sizeof(conflate)); // int conflate = 1
>>     socket.setsockopt(ZMQ_TCP_KEEPALIVE, &tcp_keepalive, sizeof(tcp_keepalive)); // int tcp_keepalive = 1
>> 
>> Do the above configurations look fine, or should I change something
>> and try again?
>> 
>> I am confused because I cannot find how to set a HEARTBEAT in
>> Publisher-Subscriber. Any suggestions, please?
>> 
>> Regarding ZMQ_TCP_KEEPALIVE_*, I found the following three options:
>> ZMQ_TCP_KEEPALIVE_IDLE, ZMQ_TCP_KEEPALIVE_CNT, and
>> ZMQ_TCP_KEEPALIVE_INTVL. The values to use for these options are not
>> clear from the documentation. Any suggestions, please?
>> 
>> Thank you very much.
>> 
>> -
>> Ravi
>> 
>> On Sunday, 28 January 2018 3:10 AM, Justin Karneges
>> <justin at karneges.com> wrote:
>> 
>> You'd still have to wait for the TCP keepalive to time out the
>> connection before it recovers. On Ubuntu this might take a very
>> long time, so be sure to set all the ZMQ_TCP_KEEPALIVE_* options to
>> ensure a shorter timeout.
>> 
>> On Sat, Jan 27, 2018, at 2:27 AM, Ravi Joshi via zeromq-dev wrote:
>>> Hi Justin,
>>> 
>>> I will check it using netstat.
>>> 
>>> Meanwhile, ZMQ_TCP_KEEPALIVE does not seem to be working. After
>>> some time, Windows, where the publishers are running, still shows a
>>> 0 MBPS transmission rate. After I restart the subscribers in ROS on
>>> Ubuntu, the publishers start working again. Please note that during
>>> this process I do not restart the publishers at all.
>>> 
>>> Below is the code snippet added to all subscribers:
>>> 
>>> int tcp_keepalive = 1;
>>> 
>>> zmq_socket.setsockopt(ZMQ_TCP_KEEPALIVE, &tcp_keepalive,
>>>                       sizeof(tcp_keepalive));
>>> 
>>>  -
>>> Thanks
>>> Ravi
>>> 
>>> 
>>> On Saturday, 27 January 2018 5:36 PM, Justin Karneges
>>> <justin at karneges.com> wrote:
>>> 
>>> One thing you might do is run netstat on both sides to see whether
>>> the connections are still listed. In a dead-connection scenario,
>>> netstat should no longer list the connection on the PUB side, but
>>> should still list it on the SUB side.
>>> 
>>> Note that it can take time for the PUB connection to give up. On
>>> Linux, the default is something like 20 minutes after it dies, so
>>> give the PUB side some extra time after messages stop transmitting.
>>> If transmission hasn't worked for over 20 minutes and netstat is
>>> still showing the connection on the PUB side, then the problem may
>>> be something else.
>>> 
>>> On Sat, Jan 27, 2018, at 12:13 AM, Ravi Joshi via zeromq-dev wrote:
>>>> Hi Justin,
>>>> 
>>>> Thank you very much. How can I make sure whether I am getting dead
>>>> connections?
>>>> 
>>>> For the time being, I am enabling ZMQ_TCP_KEEPALIVE on all 3 SUB
>>>> sockets.
>>>> 
>>>> I will let you know the status after some time.
>>>> 
>>>> Thanks
>>>> -
>>>> Ravi
>>>> 
>>>> On Saturday, January 27, 2018, 3:27 PM, Justin Karneges
>>>> <justin at karneges.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> One issue with socket types that don't usually write data (such as
>>>>> SUB) is that a dead connection might go unnoticed forever. You can
>>>>> work around this by enabling TCP keepalives on the SUB socket. I
>>>>> don't know whether you're getting dead connections here, but I
>>>>> thought I'd mention it.
>>>>> 
>>>>> Justin
>>>>> 
>>>>> On Fri, Jan 26, 2018, at 9:33 PM, Ravi Joshi via zeromq-dev wrote:
>>>>> > Hi,
>>>>> > 
>>>>> > I am using the Publisher-Subscriber pattern with 3 publishers that
>>>>> > publish 3 different types of data. All 3 publishers are written in
>>>>> > a single C# file, whereas each subscriber is written in a separate
>>>>> > C++ file inside ROS. From ZeroMQ's point of view, there is no
>>>>> > difference between the subscribers, since context creation, socket
>>>>> > initialization, and message receiving are done the same way for all
>>>>> > of them. Hence, to keep the mail short, I am posting the code
>>>>> > snippet of only 1 subscriber below.
>>>>> > 
>>>>> > The publisher code in C# snippet is available in Pastebin 
>>>>> > (https://pastebin.com/S65LmwuV).
>>>>> > The subscriber code in C++ snippet is available in Pastebin 
>>>>> > (https://pastebin.com/xb3V0n0u).
>>>>> > 
>>>>> > The publisher works well for some time and successfully transmits
>>>>> > data at a rate of 700MBPS, but it stops transmitting any data after
>>>>> > 5-6 hours.
>>>>> > 
>>>>> > To get the publisher working again, I need to restart the
>>>>> > subscribers. This is strange to me, since it is unexpected behavior
>>>>> > as far as the Publisher-Subscriber pattern is concerned.
>>>>> > 
>>>>> > Why this weird behavior? Is there any workaround?
>>>>> > 
>>>>> > -
>>>>> > Thanks
>>>>> > Ravi
>>>>> > _______________________________________________
>>>>> > zeromq-dev mailing list
>>>>> > zeromq-dev at lists.zeromq.org
>>>>> > https://lists.zeromq.org/mailman/listinfo/zeromq-dev

