[zeromq-dev] Zyre Wi-Fi Rejoin Issue

Steven Rasmussen Steve.Rasmussen at RasSimTech.com
Fri Jun 6 23:25:59 CEST 2014


A little more information:

One of the first things I tried, when the Wi-Fi connection was
re-established, was delaying sending the START message until after the old
messages had been received. I couldn't figure out a good length for the
delay, but if I delayed it long enough, the HELLO would get through and kick
off the handshake. This made it seem to me that messages were being buffered
somewhere.

If I just started periodically sending HELLO messages, after receiving
beacons, without removing the peer, the HELLO messages would not ever get
through.
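
For reference, the peer-removal workaround quoted further down (drop a peer
that is known but not yet ready, so that the next beacon recreates it and a
fresh HELLO goes out) can be modelled roughly as follows. This is a
stand-alone sketch, not Zyre source code: node_t, peer_t, find_peer,
remove_peer and require_peer are hypothetical stand-ins for the Zyre
internals named in the thread (zyre_node_require_peer, zyre_peer_ready,
zyre_node_remove_peer), and "sending HELLO" is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_PEERS 8

/* Hypothetical, simplified peer record (not the real zyre_peer_t). */
typedef struct {
    bool in_use;    /* slot currently holds a peer */
    bool ready;     /* set once the peer's HELLO has been received */
    char id[32];    /* peer identity, as learned from its beacon */
} peer_t;

/* Hypothetical, simplified node (not the real zyre_node_t). */
typedef struct {
    peer_t peers[MAX_PEERS];
    int hellos_sent;    /* stands in for actually sending HELLO */
} node_t;

static peer_t *find_peer(node_t *node, const char *id)
{
    for (int i = 0; i < MAX_PEERS; i++)
        if (node->peers[i].in_use && strcmp(node->peers[i].id, id) == 0)
            return &node->peers[i];
    return NULL;
}

/* In this model, removing a peer just clears its slot; in real code this
   is where the old connection would be torn down. */
static void remove_peer(peer_t *peer)
{
    memset(peer, 0, sizeof *peer);
}

/* Called for every beacon received from peer `id`. */
peer_t *require_peer(node_t *node, const char *id)
{
    assert(node != NULL && id != NULL);
    peer_t *peer = find_peer(node, id);

    /* Workaround: a peer that exists but never became ready is assumed
       to be stuck on a stale connection, so drop it and start over. */
    if (peer != NULL && !peer->ready) {
        remove_peer(peer);
        peer = NULL;
    }
    if (peer == NULL) {
        for (int i = 0; i < MAX_PEERS && peer == NULL; i++)
            if (!node->peers[i].in_use)
                peer = &node->peers[i];
        if (peer == NULL)
            return NULL;    /* peer table is full */
        peer->in_use = true;
        peer->ready = false;
        snprintf(peer->id, sizeof peer->id, "%s", id);
        node->hellos_sent++;    /* a new peer always gets a HELLO */
    }
    return peer;
}
```

In this model every beacon from a not-yet-ready peer triggers another HELLO,
and the retries stop once the peer is marked ready. The cost is that a peer
whose handshake is merely slow also gets torn down and recreated.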

-Steve

-----Original Message-----
From: zeromq-dev-bounces at lists.zeromq.org
[mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter Hintjens
Sent: Friday, June 6, 2014 1:18 PM
To: ZeroMQ development list
Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue

OK, I've pushed a patch that fixes it, using your workaround more or less.

I want to test this at the libzmq level, it's weird that old messages are
getting through and the new ones aren't.

-Pieter

On Fri, Jun 6, 2014 at 6:36 PM, Pieter Hintjens <ph at imatix.com> wrote:
> OK, I've reproduced the problem quite easily. Something strange with 
> messages being delivered even though the socket they're sent on is 
> torn down entirely. I'm investigating...
>
> On Fri, Jun 6, 2014 at 5:57 PM, Pieter Hintjens <ph at imatix.com> wrote:
>> OK, I'll simulate this in the code. The peers should automatically 
>> resend HELLO if they lost contact.
>>
>> No thanks needed, we enjoy making this software and use it in 
>> everything we make. :-)
>>
>> On Fri, Jun 6, 2014 at 4:12 PM, Steve Rasmussen 
>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>> In principle if the connection is re-established there should be no
>>>> new HELLO message sent.
>>>
>>> This problem occurs after the Wi-Fi connection has been down long
>>> enough for the peers to remove each other. When the connection comes
>>> back up, as I understand it, the HELLO message is necessary to kick off
>>> handshaking.
>>>
>>>> Can you find a way to reproduce the problem easily?
>>> The easiest method that I've found is using a modified version of 
>>> the zpinger tool on two laptops. The modified zpinger tool is set up 
>>> to send a whisper, after a time delay, anytime it receives a whisper 
>>> from a peer. I either turn the Wi-Fi adapter off/on or move the 
>>> laptop out of range to perform the test.
>>>
>>> It seems like this may have something to do with the sockets 
>>> maintaining the TCP/IP connection during the break and then being in 
>>> a bad state when the Wi-Fi connection comes back up. Is this 
>>> possible? If so is there some way to reset the TCP/IP connection?
>>>
>>>> Thanks for taking the time to analyse the problem.
>>>
>>> I need this capability for the system I'm developing. Thank you and 
>>> your colleagues for ZeroMQ, CZMQ, Zyre, ...
>>>
>>> Regards,
>>>
>>> Steve
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: zeromq-dev-bounces at lists.zeromq.org
>>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter 
>>> Hintjens
>>> Sent: Thursday, June 5, 2014 5:22 PM
>>> To: ZeroMQ development list
>>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>>
>>> On Thu, Jun 5, 2014 at 5:32 PM, Steve Rasmussen 
>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>
>>>> The problem seems to be with the TCP/IP connection, not the beacon.
>>>> After a network break, the beacon reestablishes the connection, but no
>>>> data is getting through the TCP/IP connection.
>>>> It looks as if there are messages that are being buffered before the
>>>> break and then delivered after. This prevents the "HELLO" message from
>>>> getting through. I've tried various things, but the closest I've come,
>>>> so far, is to keep removing the peer until it is reported as being
>>>> ready. I'm doing this in the "zyre_node_require_peer" function. If a
>>>> peer exists I check to see if it is ready, "zyre_peer_ready", and if
>>>> not, I remove the peer, "zyre_node_remove_peer". This seems to fix the
>>>> problem that I'm having, but it seems a little kludgy.
>>>
>>> Thanks for taking the time to analyse the problem.
>>>
>>> In principle if the connection is re-established there should be no
>>> new HELLO message sent. Can you find a way to reproduce the problem
>>> easily?
>>>
>>> Feel free to make a pull request with your change anyhow. I'm 
>>> reworking a lot of this code atm so will try to include your change 
>>> if I can reproduce the error.
>>>
>>> -Pieter
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev