[zeromq-dev] Zyre Wi-Fi Rejoin Issue

Pieter Hintjens ph at imatix.com
Tue Jun 10 15:52:04 CEST 2014


Hi Steve,

I have found the cause of the Wi-Fi rejoin issue (#200) and fixed it, I
think. The problem was old and new clients connecting with the same
identity, where the router socket incorrectly delivered messages from
the old client rather than the new one. It may be an issue in libzmq,
but I think it's rather a combination of the TCP stack retrying and
delivering old messages, plus the router socket doing something weird
with the new client connection. I'm not quite sure where the HELLO
messages disappear to...

Anyhow, the fix is to use ZMQ_ROUTER_HANDOVER in zyre_node, and there
is no need to remove peers or do other hacks. It works as we'd expect.

Pull request is on zyre master.

-Pieter

On Sat, Jun 7, 2014 at 9:26 PM, Pieter Hintjens <ph at imatix.com> wrote:
> OK, I did a simple test to try to reproduce this at the dealer-router
> level and it doesn't happen. So it's not a libzmq issue. I'll dig
> deeper, it has to be something in the way Zyre is managing its
> sockets...
>
> On Fri, Jun 6, 2014 at 11:25 PM, Steven Rasmussen
> <Steve.Rasmussen at rassimtech.com> wrote:
>> A little more information:
>>
>> One of the first things I tried, when the Wi-Fi connection was
>> re-established, was delaying sending the START message until after the old
>> messages had been received. I couldn't figure out a good time to delay, but
>> if I delayed it long enough, the HELLO would get through and kick off the
>> handshake. This made it seem to me that messages were being buffered
>> somewhere.
>>
>> If I just started periodically sending HELLO messages, after receiving
>> beacons, without removing the peer, the HELLO messages would not ever get
>> through.
>>
>> -Steve
>>
>> -----Original Message-----
>> From: zeromq-dev-bounces at lists.zeromq.org
>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter Hintjens
>> Sent: Friday, June 6, 2014 1:18 PM
>> To: ZeroMQ development list
>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>
>> OK, I've pushed a patch that fixes it, using your workaround more or less.
>>
>> I want to test this at the libzmq level, it's weird that old messages are
>> getting through and the new ones aren't.
>>
>> -Pieter
>>
>> On Fri, Jun 6, 2014 at 6:36 PM, Pieter Hintjens <ph at imatix.com> wrote:
>>> OK, I've reproduced the problem quite easily. Something strange with
>>> messages being delivered even though the socket they're sent on is
>>> torn down entirely. I'm investigating...
>>>
>>> On Fri, Jun 6, 2014 at 5:57 PM, Pieter Hintjens <ph at imatix.com> wrote:
>>>> OK, I'll simulate this in the code. The peers should automatically
>>>> resend HELLO if they lost contact.
>>>>
>>>> No thanks needed, we enjoy making this software and use it in
>>>> everything we make. :-)
>>>>
>>>> On Fri, Jun 6, 2014 at 4:12 PM, Steve Rasmussen
>>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>>>> In principle if the connection is re-established there should be no
>>>>>> new HELLO message sent.
>>>>>
>>>>> This problem occurs after the Wi-Fi connection has been down long
>>>>> enough for the peers to remove each other. When the connection comes
>>>>> back up, as I understand it, the HELLO message is necessary to kick off
>>>>> handshaking.
>>>>>
>>>>>> Can you find a way to reproduce the problem easily?
>>>>> The easiest method that I've found is using a modified version of
>>>>> the zpinger tool on two laptops. The modified zpinger tool is set up
>>>>> to send a whisper, after a time delay, anytime it receives a whisper
>>>>> from a peer. I either turn the Wi-Fi adapter off/on or move the
>>>>> laptop out of range to perform the test.
>>>>>
>>>>> It seems like this may have something to do with the sockets
>>>>> maintaining the TCP/IP connection during the break and then being in
>>>>> a bad state when the Wi-Fi connection comes back up. Is this
>>>>> possible? If so is there some way to reset the TCP/IP connection?
>>>>>
>>>>>> Thanks for taking the time to analyse the problem.
>>>>>
>>>>> I need this capability for the system I'm developing. Thank you and
>>>>> your colleagues for ZeroMQ, CZMQ, Zyre, ...
>>>>>
>>>>> Regards,
>>>>>
>>>>> Steve
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: zeromq-dev-bounces at lists.zeromq.org
>>>>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter
>>>>> Hintjens
>>>>> Sent: Thursday, June 5, 2014 5:22 PM
>>>>> To: ZeroMQ development list
>>>>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>>>>
>>>>> On Thu, Jun 5, 2014 at 5:32 PM, Steve Rasmussen
>>>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>>>
>>>>>> The problem seems to be with the TCP/IP connection, not the beacon.
>>>>>> After a network break, the beacon reestablishes the connection, but no
>>>>>> data is getting through the TCP/IP connection.
>>>>>> It looks as if there are messages that are being buffered before
>>>>>> the break and then delivered after. This prevents the "HELLO" message
>>>>>> from getting through. I've tried various things, but the closest I've
>>>>>> come, so far, is to keep removing the peer until it is reported as
>>>>>> being ready. I'm doing this in the "zyre_node_require_peer"
>>>>>> function. If a peer exists I check to see if it is ready,
>>>>>> "zyre_peer_ready", and if not, I remove the peer,
>>>>>> "zyre_node_remove_peer". This seems to fix the problem that I'm having,
>>>>>> but it seems a little kludgy.
>>>>>
>>>>> Thanks for taking the time to analyse the problem.
>>>>>
>>>>> In principle if the connection is re-established there should be no
>>>>> new HELLO message sent. Can you find a way to reproduce the problem
>>>>> easily?
>>>>>
>>>>> Feel free to make a pull request with your change anyhow. I'm
>>>>> reworking a lot of this code atm so will try to include your change
>>>>> if I can reproduce the error.
>>>>>
>>>>> -Pieter
>>>>> _______________________________________________
>>>>> zeromq-dev mailing list
>>>>> zeromq-dev at lists.zeromq.org
>>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev


