[zeromq-dev] Zyre Wi-Fi Rejoin Issue

Pieter Hintjens ph at imatix.com
Sat Jun 7 21:26:55 CEST 2014


OK, I did a simple test to try to reproduce this at the dealer-router
level and it doesn't happen. So it's not a libzmq issue. I'll dig
deeper, it has to be something in the way Zyre is managing its
sockets...

On Fri, Jun 6, 2014 at 11:25 PM, Steven Rasmussen
<Steve.Rasmussen at rassimtech.com> wrote:
> A little more information:
>
> One of the first things I tried, when the Wi-Fi connection was
> re-established, was delaying sending the START message until after the old
> messages had been received. I couldn't figure out a good length for the
> delay, but if I delayed it long enough, the HELLO would get through and
> kick off the handshake. This made it seem to me that messages were being
> buffered somewhere.
>
> If I just started sending HELLO messages periodically after receiving
> beacons, without removing the peer, the HELLO messages would never get
> through.
>
> -Steve
>
> -----Original Message-----
> From: zeromq-dev-bounces at lists.zeromq.org
> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter Hintjens
> Sent: Friday, June 6, 2014 1:18 PM
> To: ZeroMQ development list
> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>
> OK, I've pushed a patch that fixes it, using your workaround more or less.
>
> I want to test this at the libzmq level, it's weird that old messages are
> getting through and the new ones aren't.
>
> -Pieter
>
> On Fri, Jun 6, 2014 at 6:36 PM, Pieter Hintjens <ph at imatix.com> wrote:
>> OK, I've reproduced the problem quite easily. Something strange with
>> messages being delivered even though the socket they're sent on is
>> torn down entirely. I'm investigating...
>>
>> On Fri, Jun 6, 2014 at 5:57 PM, Pieter Hintjens <ph at imatix.com> wrote:
>>> OK, I'll simulate this in the code. The peers should automatically
>>> resend HELLO if they lost contact.
>>>
>>> No thanks needed, we enjoy making this software and use it in
>>> everything we make. :-)
>>>
>>> On Fri, Jun 6, 2014 at 4:12 PM, Steve Rasmussen
>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>>> In principle if the connection is re-established there should be no
>>>>> new HELLO message sent.
>>>>
>>>> This problem occurs after the Wi-Fi connection has been down long
>>>> enough for the peers to remove each other. When the connection comes
>>>> back up, as I understand it, the HELLO message is necessary to kick off
>>>> handshaking.
>>>>
>>>>> Can you find a way to reproduce the problem easily?
>>>> The easiest method that I've found is using a modified version of
>>>> the zpinger tool on two laptops. The modified zpinger tool is set up
>>>> to send a whisper, after a time delay, anytime it receives a whisper
>>>> from a peer. I either turn the Wi-Fi adapter off/on or move the
>>>> laptop out of range to perform the test.
>>>>
>>>> It seems like this may have something to do with the sockets
>>>> maintaining the TCP/IP connection during the break and then being in
>>>> a bad state when the Wi-Fi connection comes back up. Is this
>>>> possible? If so is there some way to reset the TCP/IP connection?
>>>>
>>>>> Thanks for taking the time to analyse the problem.
>>>>
>>>> I need this capability for the system I'm developing. Thank you and
>>>> your colleagues for ZeroMQ, CZMQ, Zyre, ...
>>>>
>>>> Regards,
>>>>
>>>> Steve
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: zeromq-dev-bounces at lists.zeromq.org
>>>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter
>>>> Hintjens
>>>> Sent: Thursday, June 5, 2014 5:22 PM
>>>> To: ZeroMQ development list
>>>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>>>
>>>> On Thu, Jun 5, 2014 at 5:32 PM, Steve Rasmussen
>>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>>
>>>>> The problem seems to be with the TCP/IP connection, not the beacon.
>>>>> After a network break, the beacon reestablishes the connection, but
>>>>> no data is getting through the TCP/IP connection.
>>>>> It looks as if there are messages that are being buffered before
>>>>> the break and then delivered after. This prevents the "HELLO"
>>>>> message from getting through. I've tried various things, but the
>>>>> closest I've come, so far, is to keep removing the peer until it is
>>>>> reported as being ready. I'm doing this in the
>>>>> "zyre_node_require_peer" function. If a peer exists, I check whether
>>>>> it is ready with "zyre_peer_ready" and, if not, I remove the peer
>>>>> with "zyre_node_remove_peer". This seems to fix the problem that I'm
>>>>> having, but it seems a little kludgy.
>>>>
>>>> Thanks for taking the time to analyse the problem.
>>>>
>>>> In principle if the connection is re-established there should be no
>>>> new HELLO message sent. Can you find a way to reproduce the problem
>>>> easily?
>>>>
>>>> Feel free to make a pull request with your change anyhow. I'm
>>>> reworking a lot of this code atm so will try to include your change
>>>> if I can reproduce the error.
>>>>
>>>> -Pieter
>>>> _______________________________________________
>>>> zeromq-dev mailing list
>>>> zeromq-dev at lists.zeromq.org
>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
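
The workaround described earlier in the thread (when a peer already exists but has never become ready, remove it so that a fresh peer is created and HELLO is sent again) can be sketched roughly as follows. This is only a standalone model of the idea, not Zyre's code: the types peer_t and node_t, the functions node_find_peer, node_remove_peer and node_require_peer, and the generation counter are all hypothetical stand-ins, and the point where a real node would open a new socket and send HELLO is marked with a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_PEERS 8

/* Hypothetical stand-in for a Zyre peer; NOT the real zyre_peer_t. */
typedef struct {
    char uuid[33];
    bool in_use;
    bool ready;      /* true once the HELLO handshake has completed */
    int generation;  /* how many times this peer has been recreated */
} peer_t;

/* Hypothetical stand-in for a Zyre node holding a small peer table. */
typedef struct {
    peer_t peers[MAX_PEERS];
} node_t;

/* Look up a peer by UUID; returns NULL if it is not known. */
static peer_t *node_find_peer(node_t *node, const char *uuid)
{
    for (int i = 0; i < MAX_PEERS; i++) {
        peer_t *peer = &node->peers[i];
        if (peer->in_use && strcmp(peer->uuid, uuid) == 0)
            return peer;
    }
    return NULL;
}

/* Forget a peer entirely; a real node would also close its socket. */
static void node_remove_peer(node_t *node, peer_t *peer)
{
    (void) node;
    memset(peer, 0, sizeof *peer);
}

/* Called whenever a beacon from `uuid` is seen. A peer that exists but
   is not ready is treated as stale: it is removed and recreated, which
   is the step that would trigger a new HELLO in a real node. */
static peer_t *node_require_peer(node_t *node, const char *uuid)
{
    assert(node && uuid);
    peer_t *peer = node_find_peer(node, uuid);
    int generation = 0;

    if (peer && !peer->ready) {
        generation = peer->generation + 1;
        node_remove_peer(node, peer);
        peer = NULL;
    }
    if (!peer) {
        for (int i = 0; i < MAX_PEERS && !peer; i++)
            if (!node->peers[i].in_use)
                peer = &node->peers[i];
        if (!peer)
            return NULL;  /* peer table is full */
        snprintf(peer->uuid, sizeof peer->uuid, "%s", uuid);
        peer->in_use = true;
        peer->ready = false;
        peer->generation = generation;
        /* A real node would connect a fresh socket and send HELLO here. */
    }
    return peer;
}
```

In this model, calling node_require_peer repeatedly for the same UUID keeps recreating the peer until something sets its ready flag, after which the existing peer is returned unchanged. That matches the "keep removing the peer until it is reported as being ready" behaviour described above, including the fact that it retries on every beacon rather than after a bounded delay.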