[zeromq-dev] Zyre Wi-Fi Rejoin Issue

Steve Rasmussen Steve.Rasmussen at RasSimTech.com
Tue Jun 10 16:02:47 CEST 2014


Hey Pieter,

That is great news! I was just getting back into this problem. I'll try out
your fixes and let you know that they work :)

Thanks again!

Regards,

Steve

-----Original Message-----
From: zeromq-dev-bounces at lists.zeromq.org
[mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter Hintjens
Sent: Tuesday, June 10, 2014 9:52 AM
To: ZeroMQ development list
Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue

Hi Steve,

I have found the cause of the Wi-Fi rejoin issue (#200) and fixed it, I
think. The problem was old and new clients connecting with the same identity,
where the router socket incorrectly delivered messages from the old client
rather than the new one. It may be an issue in libzmq, but I think it's more
likely a combination of the TCP stack retrying and delivering old messages,
plus the router socket doing something weird with the new client connection.
I'm not quite sure where the HELLO messages disappear to...

Anyhow, the fix is to use ZMQ_ROUTER_HANDOVER in zyre_node, and there is no
need to remove peers or do other hacks. It works as we'd expect.
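
To illustrate what the handover option changes, here is a toy model of a
router's identity table. It is only a sketch and not libzmq's implementation:
the toy_* names, the table size, and the integer "connection" handles are all
made up for this example. In real code the option is an int passed to
zmq_setsockopt () on the ROUTER socket. Per the libzmq documentation, with the
option off a ROUTER rejects a client that connects with an identity already
in use, and with it on the new connection takes over that identity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PEERS 8     /* hypothetical table size */
#define TOY_ID_LEN    32    /* hypothetical maximum identity length */

/* One routing-table entry: an identity owned by one connection. */
typedef struct {
    char identity[TOY_ID_LEN + 1];
    int  connection;
    bool in_use;
} toy_peer_t;

/* Toy stand-in for a ROUTER socket's identity table. */
typedef struct {
    toy_peer_t peers[TOY_MAX_PEERS];
    bool handover;          /* models the ZMQ_ROUTER_HANDOVER setting */
} toy_router_t;

void toy_router_init(toy_router_t *self, bool handover)
{
    assert(self != NULL);
    memset(self, 0, sizeof *self);
    self->handover = handover;
}

/* Return the connection that currently owns identity, or -1 if none. */
int toy_router_owner(const toy_router_t *self, const char *identity)
{
    assert(self != NULL && identity != NULL);
    for (size_t i = 0; i < TOY_MAX_PEERS; i++) {
        const toy_peer_t *peer = &self->peers[i];
        if (peer->in_use && strcmp(peer->identity, identity) == 0)
            return peer->connection;
    }
    return -1;
}

/* A new connection announces identity. Returns true if that connection
   owns the identity afterwards, false if it was rejected. */
bool toy_router_accept(toy_router_t *self, const char *identity, int connection)
{
    assert(self != NULL && identity != NULL);
    if (strlen(identity) > TOY_ID_LEN)
        return false;

    toy_peer_t *free_slot = NULL;
    for (size_t i = 0; i < TOY_MAX_PEERS; i++) {
        toy_peer_t *peer = &self->peers[i];
        if (!peer->in_use) {
            if (free_slot == NULL)
                free_slot = peer;   /* remember the first empty slot */
            continue;
        }
        if (strcmp(peer->identity, identity) == 0) {
            if (!self->handover)
                return false;       /* duplicate identity: old owner stays */
            peer->connection = connection;  /* new connection takes over */
            return true;
        }
    }
    if (free_slot == NULL)
        return false;               /* table full */
    strcpy(free_slot->identity, identity);
    free_slot->connection = connection;
    free_slot->in_use = true;
    return true;
}
```

With handover off, a second toy_router_accept () for the same identity fails
and the stale connection keeps the identity, which is the situation a peer
rejoining after a Wi-Fi drop runs into. With handover on, the second call
succeeds and toy_router_owner () reports the new connection.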

Pull request is on zyre master.

-Pieter

On Sat, Jun 7, 2014 at 9:26 PM, Pieter Hintjens <ph at imatix.com> wrote:
> OK, I did a simple test to try to reproduce this at the dealer-router 
> level and it doesn't happen. So it's not a libzmq issue. I'll dig 
> deeper, it has to be something in the way Zyre is managing its 
> sockets...
>
> On Fri, Jun 6, 2014 at 11:25 PM, Steven Rasmussen 
> <Steve.Rasmussen at rassimtech.com> wrote:
>> A little more information:
>>
>> One of the first things I tried, when the Wi-Fi connection was 
>> re-established, was delaying sending the START message until after 
>> the old messages had been received. I couldn't figure out a good 
>> length for the delay, but if I delayed it long enough, the HELLO would 
>> get through and kick off the handshake. This made it seem to me that 
>> messages were being buffered somewhere.
>>
>> If I just started periodically sending HELLO messages, after 
>> receiving beacons, without removing the peer, the HELLO messages 
>> would not ever get through.
>>
>> -Steve
>>
>> -----Original Message-----
>> From: zeromq-dev-bounces at lists.zeromq.org
>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter 
>> Hintjens
>> Sent: Friday, June 6, 2014 1:18 PM
>> To: ZeroMQ development list
>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>
>> OK, I've pushed a patch that fixes it, using your workaround more or
>> less.
>>
>> I want to test this at the libzmq level, it's weird that old messages 
>> are getting through and the new ones aren't.
>>
>> -Pieter
>>
>> On Fri, Jun 6, 2014 at 6:36 PM, Pieter Hintjens <ph at imatix.com> wrote:
>>> OK, I've reproduced the problem quite easily. Something strange with 
>>> messages being delivered even though the socket they're sent on is 
>>> torn down entirely. I'm investigating...
>>>
>>> On Fri, Jun 6, 2014 at 5:57 PM, Pieter Hintjens <ph at imatix.com> wrote:
>>>> OK, I'll simulate this in the code. The peers should automatically 
>>>> resend HELLO if they lost contact.
>>>>
>>>> No thanks needed, we enjoy making this software and use it in 
>>>> everything we make. :-)
>>>>
>>>> On Fri, Jun 6, 2014 at 4:12 PM, Steve Rasmussen 
>>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>>>> In principle if the connection is re-established there should be 
>>>>>> no new HELLO message sent.
>>>>>
>>>>> This problem occurs after the Wi-Fi connection has been down long 
>>>>> enough for the peers to remove each other. When the connection 
>>>>> comes back up, as I understand it, the HELLO message is necessary 
>>>>> to kick off handshaking.
>>>>>
>>>>>> Can you find a way to reproduce the problem easily?
>>>>> The easiest method that I've found is using a modified version of 
>>>>> the zpinger tool on two laptops. The modified zpinger tool is set 
>>>>> up to send a whisper, after a time delay, anytime it receives a 
>>>>> whisper from a peer. I either turn the Wi-Fi adapter off/on or 
>>>>> move the laptop out of range to perform the test.
>>>>>
>>>>> It seems like this may have something to do with the sockets 
>>>>> maintaining the TCP/IP connection during the break and then being 
>>>>> in a bad state when the Wi-Fi connection comes back up. Is this 
>>>>> possible? If so, is there some way to reset the TCP/IP connection?
>>>>>
>>>>>> Thanks for taking the time to analyse the problem.
>>>>>
>>>>> I need this capability for the system I'm developing. Thank you 
>>>>> and your colleagues for ZeroMQ, CZMQ, Zyre, ...
>>>>>
>>>>> Regards,
>>>>>
>>>>> Steve
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: zeromq-dev-bounces at lists.zeromq.org
>>>>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter 
>>>>> Hintjens
>>>>> Sent: Thursday, June 5, 2014 5:22 PM
>>>>> To: ZeroMQ development list
>>>>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>>>>
>>>>> On Thu, Jun 5, 2014 at 5:32 PM, Steve Rasmussen 
>>>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>>>
>>>>>> The problem seems to be with the TCP/IP connection, not the beacon.
>>>>>> After a network break, the beacon reestablishes the connection, but
>>>>>> no data is getting through the TCP/IP connection.
>>>>>> It looks as if there are messages that are being buffered before
>>>>>> the break and then delivered after. This prevents the "HELLO"
>>>>>> message from getting through. I've tried various things, but the
>>>>>> closest I've come, so far, is to keep removing the peer until it is
>>>>>> reported as being ready. I'm doing this in the
>>>>>> "zyre_node_require_peer" function. If a peer exists, I check whether
>>>>>> it is ready ("zyre_peer_ready") and, if not, I remove the peer
>>>>>> ("zyre_node_remove_peer"). This seems to fix the problem that I'm
>>>>>> having, but it seems a little kludgy.
>>>>>
>>>>> Thanks for taking the time to analyse the problem.
>>>>>
>>>>> In principle if the connection is re-established there should be 
>>>>> no new HELLO message sent. Can you find a way to reproduce the 
>>>>> problem easily?
>>>>>
>>>>> Feel free to make a pull request with your change anyhow. I'm 
>>>>> reworking a lot of this code atm so will try to include your 
>>>>> change if I can reproduce the error.
>>>>>
>>>>> -Pieter
>>>>> _______________________________________________
>>>>> zeromq-dev mailing list
>>>>> zeromq-dev at lists.zeromq.org
>>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev



