[zeromq-dev] Zyre Wi-Fi Rejoin Issue

Steve Rasmussen Steve.Rasmussen at RasSimTech.com
Wed Jun 11 17:59:53 CEST 2014


Sounds good, I'll order it and look forward to reading it. 
-Steve

-----Original Message-----
From: zeromq-dev-bounces at lists.zeromq.org
[mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter Hintjens
Sent: Wednesday, June 11, 2014 11:50 AM
To: ZeroMQ development list
Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue

Hi Steve,

I'm glad it works. :-) All this testing has been very good for Zyre.

If you want to read my next book, it's "Culture & Empire: Digital
Revolution", in paper and electrons from Amazon.com.

-Pieter


On Wed, Jun 11, 2014 at 5:45 PM, Steve Rasmussen
<Steve.Rasmussen at rassimtech.com> wrote:
> Hey Pieter,
>
> This "works as we'd expect", which is to say, great!
>
> We implemented this type of discovery last year using multicast and
> TCP/IP and it worked OK. In December 2013, we were introduced to
> ZeroMQ and we based our architecture around it. I bought a copy of 
> your book and it was a great help in getting started with ZeroMQ, 
> understanding a better way of constructing programs, and visualizing
> your vision for building a community.
> You need to write more books :)
>
> Thanks again for all of your help!
>
> Best Regards,
>
> Steve
>
> -----Original Message-----
> From: zeromq-dev-bounces at lists.zeromq.org
> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter 
> Hintjens
> Sent: Wednesday, June 11, 2014 3:51 AM
> To: ZeroMQ development list
> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>
> I could make it work on older libzmq versions but that'd require 
> changing the protocol to move away from explicit identities.
>
> That might be more robust in any case... I'll think about that.
>
> On Wed, Jun 11, 2014 at 1:52 AM, Steven Rasmussen 
> <Steve.Rasmussen at rassimtech.com> wrote:
>> Yea, I figured that out, thanks.
>>
>> -----Original Message-----
>> From: zeromq-dev-bounces at lists.zeromq.org
>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter 
>> Hintjens
>> Sent: Tuesday, June 10, 2014 5:45 PM
>> To: ZeroMQ development list
>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>
>> You can't define this; you will have to use the libzmq master version 
>> to get this functionality.
>>
>> On Tue, Jun 10, 2014 at 9:24 PM, Steve Rasmussen 
>> <Steve.Rasmussen at rassimtech.com> wrote:
>>> Hey Pieter,
>>>
>>> I haven't quite got this working. After I define the symbol 
>>> ZMQ_ROUTER_HANDOVER, I start getting the following assert:
>>> lt-zpinger: zsock_option.c:82: zsock_set_router_handover: Assertion
>>> `rc == 0 || zmq_errno () == (156384712 + 53)' failed.
>>> Aborted (core dumped)
>>>
>>> Any ideas on what I'm doing wrong?
>>>
>>> Thanks,
>>>
>>> -Steve
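
A note on reading the assertion above: the large constant is the base value that libzmq's zmq.h uses for its own error codes (ZMQ_HAUSNUMERO), and adding 53 gives ETERM. The assertion therefore accepts either success or ETERM and aborts on any other error, which is probably what an older libzmq returns for an option number it does not recognise. A minimal Python sketch of the arithmetic, assuming the constants from zmq.h (the helper name describe_zmq_errno is made up for illustration):

```python
# Base value libzmq adds to its own error numbers (ZMQ_HAUSNUMERO in zmq.h).
ZMQ_HAUSNUMERO = 156384712

# Native libzmq error codes, keyed by their offset from ZMQ_HAUSNUMERO.
NATIVE_ERRORS = {51: "EFSM", 52: "ENOCOMPATPROTO", 53: "ETERM", 54: "EMTHREAD"}


def describe_zmq_errno(value):
    """Hypothetical helper: name a libzmq-native errno, or return None."""
    return NATIVE_ERRORS.get(value - ZMQ_HAUSNUMERO)


# The right-hand side of the failed assertion decodes to ETERM.
print(describe_zmq_errno(156384712 + 53))
```

An ordinary system errno such as EINVAL is not in this range, so the helper returns None for it.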
>>>
>>> -----Original Message-----
>>> From: zeromq-dev-bounces at lists.zeromq.org
>>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Steve 
>>> Rasmussen
>>> Sent: Tuesday, June 10, 2014 10:03 AM
>>> To: 'ZeroMQ development list'
>>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>>
>>> Hey Pieter,
>>>
>>> That is great news! I was just getting back into this problem. I'll 
>>> try out your fixes and let you know that they work :)
>>>
>>> Thanks again!
>>>
>>> Regards,
>>>
>>> Steve
>>>
>>> -----Original Message-----
>>> From: zeromq-dev-bounces at lists.zeromq.org
>>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter 
>>> Hintjens
>>> Sent: Tuesday, June 10, 2014 9:52 AM
>>> To: ZeroMQ development list
>>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>>
>>> Hi Steve,
>>>
>>> I have found the cause of the Wi-Fi rejoin issue (#200) and fixed it, 
>>> I think. The problem was old/new clients connecting with the same 
>>> identity, where the router socket incorrectly delivered messages 
>>> from the old client rather than the new one. It may be an issue in 
>>> libzmq but I think rather it's a combination of the TCP stack 
>>> retrying, and delivering, old messages, plus the router socket doing 
>>> something weird with the new client connection.
>>> I'm not quite sure where the HELLO messages disappear to...
>>>
>>> Anyhow, the fix is to use ZMQ_ROUTER_HANDOVER in zyre_node, and 
>>> there is no need to remove peers or do other hacks. It works as
>>> we'd expect.
>>>
>>> Pull request is on zyre master.
>>>
>>> -Pieter
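
The ZMQ_ROUTER_HANDOVER behaviour behind this fix can be illustrated with a toy model. As documented for libzmq, a ROUTER socket by default rejects a new connection that announces an identity already in use, while with ZMQ_ROUTER_HANDOVER set to 1 it hands the identity over to the new connection and drops the old one. The Python sketch below is only an in-memory illustration of that rule, not libzmq or Zyre code; the class ToyRouter and the function rejoin are invented names.

```python
class ToyRouter:
    """Toy model of a ROUTER socket's identity table (not libzmq code)."""

    def __init__(self, handover=False):
        self.handover = handover
        self.connections = {}  # identity -> label of the connection that owns it

    def connect(self, identity, connection):
        """Return True if the connection is accepted under this identity."""
        if identity in self.connections and not self.handover:
            return False  # identity already in use: the newcomer is rejected
        self.connections[identity] = connection  # newcomer takes the identity
        return True

    def route(self, identity):
        """Return the connection that messages for this identity would use."""
        return self.connections.get(identity)


def rejoin(handover):
    """Simulate a peer reconnecting with the same identity after a break."""
    router = ToyRouter(handover=handover)
    router.connect("peer-uuid", "old TCP connection")
    router.connect("peer-uuid", "new TCP connection")
    return router.route("peer-uuid")


print(rejoin(handover=False))
print(rejoin(handover=True))
```

In this model, with handover off the router stays bound to the pre-break connection; with it on, the rejoining peer's new connection takes over, so no peer removal is needed.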
>>>
>>> On Sat, Jun 7, 2014 at 9:26 PM, Pieter Hintjens <ph at imatix.com> wrote:
>>>> OK, I did a simple test to try to reproduce this at the 
>>>> dealer-router level and it doesn't happen. So it's not a libzmq 
>>>> issue. I'll dig deeper, it has to be something in the way Zyre is 
>>>> managing its sockets...
>>>>
>>>> On Fri, Jun 6, 2014 at 11:25 PM, Steven Rasmussen 
>>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>>> A little more information:
>>>>>
>>>>> One of the first things I tried, when the Wi-Fi connection was 
>>>>> re-established, was delaying sending the START message until
>>>>> after the old messages had been received. I couldn't figure out a
>>>>> good time to delay, but if I delayed it long enough, the HELLO
>>>>> would get through and kick off the handshake. This made it seem to 
>>>>> me that messages were being buffered somewhere.
>>>>>
>>>>> If I just started periodically sending HELLO messages, after 
>>>>> receiving beacons, without removing the peer, the HELLO messages 
>>>>> would not ever get through.
>>>>>
>>>>> -Steve
>>>>>
>>>>> -----Original Message-----
>>>>> From: zeromq-dev-bounces at lists.zeromq.org
>>>>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter 
>>>>> Hintjens
>>>>> Sent: Friday, June 6, 2014 1:18 PM
>>>>> To: ZeroMQ development list
>>>>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>>>>
>>>>> OK, I've pushed a patch that fixes it, using your workaround more
>>>>> or less.
>>>>>
>>>>> I want to test this at the libzmq level, it's weird that old 
>>>>> messages are getting through and the new ones aren't.
>>>>>
>>>>> -Pieter
>>>>>
>>>>> On Fri, Jun 6, 2014 at 6:36 PM, Pieter Hintjens <ph at imatix.com> wrote:
>>>>>> OK, I've reproduced the problem quite easily. Something strange 
>>>>>> with messages being delivered even though the socket they're sent 
>>>>>> on is torn down entirely. I'm investigating...
>>>>>>
>>>>>> On Fri, Jun 6, 2014 at 5:57 PM, Pieter Hintjens <ph at imatix.com> wrote:
>>>>>>> OK, I'll simulate this in the code. The peers should 
>>>>>>> automatically resend HELLO if they lost contact.
>>>>>>>
>>>>>>> No thanks needed, we enjoy making this software and use it in 
>>>>>>> everything we make. :-)
>>>>>>>
>>>>>>> On Fri, Jun 6, 2014 at 4:12 PM, Steve Rasmussen 
>>>>>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>>>>>>> In principle if the connection is re-established there should
>>>>>>>>> be no new HELLO message sent.
>>>>>>>>
>>>>>>>> This problem occurs after the Wi-Fi connection has been down 
>>>>>>>> long enough for the peers to remove each other. When the 
>>>>>>>> connection comes back up, as I understand it, the HELLO message
>>>>>>>> is necessary to kick off handshaking.
>>>>>>>>
>>>>>>>>> Can you find a way to reproduce the problem easily?
>>>>>>>> The easiest method that I've found is using a modified version 
>>>>>>>> of the zpinger tool on two laptops. The modified zpinger tool 
>>>>>>>> is set up to send a whisper, after a time delay, anytime it 
>>>>>>>> receives a whisper from a peer. I either turn the Wi-Fi adapter 
>>>>>>>> off/on or move the laptop out of range to perform the test.
>>>>>>>>
>>>>>>>> It seems like this may have something to do with the sockets 
>>>>>>>> maintaining the TCP/IP connection during the break and then 
>>>>>>>> being in a bad state when the Wi-Fi connection comes back up. 
>>>>>>>> Is this possible? If so, is there some way to reset the TCP/IP
>>>>>>>> connection?
>>>>>>>>
>>>>>>>>> Thanks for taking the time to analyse the problem.
>>>>>>>>
>>>>>>>> I need this capability for the system I'm developing. Thank you 
>>>>>>>> and your colleagues for ZeroMQ, CZMQ, Zyre, ...
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: zeromq-dev-bounces at lists.zeromq.org
>>>>>>>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of 
>>>>>>>> Pieter Hintjens
>>>>>>>> Sent: Thursday, June 5, 2014 5:22 PM
>>>>>>>> To: ZeroMQ development list
>>>>>>>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>>>>>>>
>>>>>>>> On Thu, Jun 5, 2014 at 5:32 PM, Steve Rasmussen 
>>>>>>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>>>>>>
>>>>>>>>> The problem seems to be with the TCP/IP connection, not the beacon.
>>>>>>>>> After a network break, the beacon reestablishes the connection, but
>>>>>>>>> no data is getting through the TCP/IP connection.
>>>>>>>>> It looks as if there are messages that are being buffered before the
>>>>>>>>> break and then delivered after. This prevents the "HELLO" message
>>>>>>>>> from getting through. I've tried various things, but the closest
>>>>>>>>> I've come, so far, is to keep removing the peer until it is reported
>>>>>>>>> as being ready. I'm doing this in the "zyre_node_require_peer"
>>>>>>>>> function. If a peer exists I check to see if it is ready,
>>>>>>>>> "zyre_peer_ready", and if not, I remove the peer,
>>>>>>>>> "zyre_node_remove_peer". This seems to fix the problem that I'm
>>>>>>>>> having, but it seems a little kludgy.
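
The workaround described above can be sketched roughly as follows. This is a toy Python model, not the Zyre C source: the Peer class, the peers dictionary, and the function body are assumptions made for illustration; only the names zyre_node_require_peer, zyre_peer_ready, and zyre_node_remove_peer come from the message.

```python
class Peer:
    """Hypothetical stand-in for a Zyre peer record."""

    def __init__(self, uuid, endpoint):
        self.uuid = uuid
        self.endpoint = endpoint
        self.ready = False  # assumed to become True once the handshake completes


def require_peer(peers, uuid, endpoint):
    """Toy analogue of zyre_node_require_peer with the workaround applied."""
    peer = peers.get(uuid)
    if peer is not None and not peer.ready:
        # Analogue of zyre_peer_ready failing: drop the stale record
        # (zyre_node_remove_peer) so a fresh one is created below.
        del peers[uuid]
        peer = None
    if peer is None:
        peer = Peer(uuid, endpoint)
        peers[uuid] = peer
    return peer
```

Each beacon from a known but not-yet-ready peer thus discards and recreates its record, and this repeats until the peer finally reports ready.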
>>>>>>>>
>>>>>>>> Thanks for taking the time to analyse the problem.
>>>>>>>>
>>>>>>>> In principle if the connection is re-established there should 
>>>>>>>> be no new HELLO message sent. Can you find a way to reproduce
>>>>>>>> the problem easily?
>>>>>>>>
>>>>>>>> Feel free to make a pull request with your change anyhow. I'm 
>>>>>>>> reworking a lot of this code atm so will try to include your 
>>>>>>>> change if I can reproduce the error.
>>>>>>>>
>>>>>>>> -Pieter
>>>>>>>> _______________________________________________
>>>>>>>> zeromq-dev mailing list
>>>>>>>> zeromq-dev at lists.zeromq.org
>>>>>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev