[zeromq-dev] Zyre Wi-Fi Rejoin Issue
Steve Rasmussen
Steve.Rasmussen at RasSimTech.com
Wed Jun 11 17:45:30 CEST 2014
Hey Pieter,
This "works as we'd expect", which is to say, great!
We implemented this type of discovery last year using multicast and TCP/IP
and it worked OK. In December 2013, we were introduced to ZeroMQ and we
based our architecture around it. I bought a copy of your book and it was a
great help in getting started with ZeroMQ, understanding a better way of
constructing programs, and visualizing your vision for building a community.
You need to write more books :)
Thanks again for all of your help!
Best Regards,
Steve
-----Original Message-----
From: zeromq-dev-bounces at lists.zeromq.org
[mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter Hintjens
Sent: Wednesday, June 11, 2014 3:51 AM
To: ZeroMQ development list
Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
I could make it work on older libzmq versions but that'd require changing
the protocol to move away from explicit identities.
That might be more robust in any case... I'll think about that.
On Wed, Jun 11, 2014 at 1:52 AM, Steven Rasmussen
<Steve.Rasmussen at rassimtech.com> wrote:
> Yea, I figured that out, thanks.
>
> -----Original Message-----
> From: zeromq-dev-bounces at lists.zeromq.org
> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter
> Hintjens
> Sent: Tuesday, June 10, 2014 5:45 PM
> To: ZeroMQ development list
> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>
> You can't define this; you will have to use the libzmq master version
> to get this functionality.
>
> On Tue, Jun 10, 2014 at 9:24 PM, Steve Rasmussen
> <Steve.Rasmussen at rassimtech.com> wrote:
>> Hey Pieter,
>>
>> I haven't quite got this working. After I define the symbol
>> ZMQ_ROUTER_HANDOVER, I start getting the following assert:
>> lt-zpinger: zsock_option.c:82: zsock_set_router_handover: Assertion
>> `rc == 0 || zmq_errno () == (156384712 + 53)' failed.
>> Aborted (core dumped)
>>
>> Any ideas on what I'm doing wrong?
>>
>> Thanks,
>>
>> -Steve
>>
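For readers decoding the assertion above: 156384712 is ZMQ_HAUSNUMERO, the base that zmq.h adds to libzmq's own error codes, and offset 53 is ETERM. The check therefore reads "rc == 0 || zmq_errno () == ETERM", so it fails whenever the option call returns an error other than "context terminating". A minimal sketch of that decoding follows. The sketch_ names are hypothetical and not part of libzmq or CZMQ, and the numeric values are copied by hand rather than included from zmq.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copied by hand from zmq.h (ZMQ_HAUSNUMERO); an assumption of this sketch. */
#define SKETCH_HAUSNUMERO 156384712
#define SKETCH_ETERM (SKETCH_HAUSNUMERO + 53)

/* Name a libzmq-native error code, or return NULL for anything else. */
const char *sketch_errname (int err)
{
    switch (err - SKETCH_HAUSNUMERO) {
        case 51: return "EFSM";
        case 52: return "ENOCOMPATPROTO";
        case 53: return "ETERM";
        case 54: return "EMTHREAD";
        default: return NULL;
    }
}

/* The condition the failed assertion tested, with errno passed in. */
int sketch_assert_holds (int rc, int err)
{
    return rc == 0 || err == SKETCH_ETERM;
}

/* Small self-check showing how the helpers are meant to be called. */
void sketch_selfcheck (void)
{
    assert (strcmp (sketch_errname (156384712 + 53), "ETERM") == 0);
    assert (sketch_assert_holds (0, 0));             /* call succeeded */
    assert (sketch_assert_holds (-1, SKETCH_ETERM)); /* context terminating */
    assert (!sketch_assert_holds (-1, 22));          /* e.g. 22, EINVAL on Linux */
}
```

Calling sketch_selfcheck () exercises both helpers. The last case is the one that matches the abort reported above: the call failed, and not with ETERM.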
>> -----Original Message-----
>> From: zeromq-dev-bounces at lists.zeromq.org
>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Steve
>> Rasmussen
>> Sent: Tuesday, June 10, 2014 10:03 AM
>> To: 'ZeroMQ development list'
>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>
>> Hey Pieter,
>>
>> That is great news! I was just getting back into this problem. I'll
>> try out your fixes and let you know that they work :)
>>
>> Thanks again!
>>
>> Regards,
>>
>> Steve
>>
>> -----Original Message-----
>> From: zeromq-dev-bounces at lists.zeromq.org
>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter
>> Hintjens
>> Sent: Tuesday, June 10, 2014 9:52 AM
>> To: ZeroMQ development list
>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>
>> Hi Steve,
>>
>> I have found the cause of the Wi-Fi rejoin issue (#200) and fixed it,
>> I think. The problem was old/new clients connecting with the same
>> identity, where the router socket incorrectly delivered messages from
>> the old client rather than the new one. It may be an issue in libzmq
>> but I think rather it's a combination of the TCP stack retrying, and
>> delivering, old messages, plus the router socket doing something
>> weird with the new client connection.
>> I'm not quite sure where the HELLO messages disappear to...
>>
>> Anyhow, the fix is to use ZMQ_ROUTER_HANDOVER in zyre_node, and there
>> is no need to remove peers or do other hacks. It works as we'd expect.
>>
>> Pull request is on zyre master.
>>
>> -Pieter
>>
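The behaviour this option selects can be pictured with a toy identity table. This is a sketch of the semantics as documented for ZMQ_ROUTER_HANDOVER (with the option off, a ROUTER rejects a second client that presents an identity already in use; with it on, the new client takes over and the old one is disconnected). It is not libzmq's implementation, and all toy_ names are hypothetical. In a real program the option is set on the ROUTER socket with zmq_setsockopt, on a libzmq build that provides it.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_PEERS 8
#define TOY_ID_LEN 32

typedef struct {
    char identity [TOY_ID_LEN];
    int conn;               /* hypothetical connection handle, >= 0 */
    int in_use;
} toy_route_t;

typedef struct {
    toy_route_t routes [TOY_MAX_PEERS];
    int handover;           /* models the ZMQ_ROUTER_HANDOVER setting */
} toy_router_t;

void toy_router_init (toy_router_t *self, int handover)
{
    memset (self, 0, sizeof *self);
    self->handover = handover;
}

/* Register a client. Returns the connection that owns the identity
   after the call, or -1 if the table is full. */
int toy_router_connect (toy_router_t *self, const char *identity, int conn)
{
    toy_route_t *free_slot = NULL;
    for (int i = 0; i < TOY_MAX_PEERS; i++) {
        toy_route_t *route = &self->routes [i];
        if (route->in_use && strcmp (route->identity, identity) == 0) {
            if (self->handover)
                route->conn = conn;  /* new client takes over the identity */
            return route->conn;      /* otherwise the newcomer is rejected */
        }
        if (!route->in_use && !free_slot)
            free_slot = route;
    }
    if (!free_slot)
        return -1;
    /* The slot was zeroed by init, so the copy stays NUL-terminated. */
    strncpy (free_slot->identity, identity, TOY_ID_LEN - 1);
    free_slot->conn = conn;
    free_slot->in_use = 1;
    return conn;
}
```

With handover set to 0, a second toy_router_connect () for the same identity leaves the first connection as owner, which mirrors the rejoining peer being ignored. With handover set to 1, the second connection becomes the owner.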
>> On Sat, Jun 7, 2014 at 9:26 PM, Pieter Hintjens <ph at imatix.com> wrote:
>>> OK, I did a simple test to try to reproduce this at the
>>> dealer-router level and it doesn't happen. So it's not a libzmq
>>> issue. I'll dig deeper, it has to be something in the way Zyre is
>>> managing its sockets...
>>>
>>> On Fri, Jun 6, 2014 at 11:25 PM, Steven Rasmussen
>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>> A little more information:
>>>>
>>>> One of the first things I tried, when the Wi-Fi connection was
>>>> re-established, was delaying sending the START message, until
>>>> after the old messages had been received. I couldn't figure out a
>>>> good time to delay, but if I delayed it long enough, the HELLO
>>>> would get through and kick off the handshake. This made it seem to
>>>> me that messages were being buffered somewhere.
>>>>
>>>> If I just started periodically sending HELLO messages, after
>>>> receiving beacons, without removing the peer, the HELLO messages
>>>> would not ever get through.
>>>>
>>>> -Steve
>>>>
>>>> -----Original Message-----
>>>> From: zeromq-dev-bounces at lists.zeromq.org
>>>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter
>>>> Hintjens
>>>> Sent: Friday, June 6, 2014 1:18 PM
>>>> To: ZeroMQ development list
>>>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>>>
>>>> OK, I've pushed a patch that fixes it, using your workaround more
>>>> or less.
>>>>
>>>> I want to test this at the libzmq level, it's weird that old
>>>> messages are getting through and the new ones aren't.
>>>>
>>>> -Pieter
>>>>
>>>> On Fri, Jun 6, 2014 at 6:36 PM, Pieter Hintjens <ph at imatix.com> wrote:
>>>>> OK, I've reproduced the problem quite easily. Something strange
>>>>> with messages being delivered even though the socket they're sent
>>>>> on is torn down entirely. I'm investigating...
>>>>>
>>>>> On Fri, Jun 6, 2014 at 5:57 PM, Pieter Hintjens <ph at imatix.com> wrote:
>>>>>> OK, I'll simulate this in the code. The peers should
>>>>>> automatically resend HELLO if they lost contact.
>>>>>>
>>>>>> No thanks needed, we enjoy making this software and use it in
>>>>>> everything we make. :-)
>>>>>>
>>>>>> On Fri, Jun 6, 2014 at 4:12 PM, Steve Rasmussen
>>>>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>>>>>> In principle if the connection is re-established there should
>>>>>>>> be no new HELLO message sent.
>>>>>>>
>>>>>>> This problem occurs after the Wi-Fi connection has been down
>>>>>>> long enough for the peers to remove each other. When the
>>>>>>> connection comes back up, as I understand it, the HELLO message
>>>>>>> is necessary to kick off handshaking.
>>>>>>>
>>>>>>>> Can you find a way to reproduce the problem easily?
>>>>>>> The easiest method that I've found is using a modified version
>>>>>>> of the zpinger tool on two laptops. The modified zpinger tool is
>>>>>>> set up to send a whisper, after a time delay, anytime it
>>>>>>> receives a whisper from a peer. I either turn the Wi-Fi adapter
>>>>>>> off/on or move the laptop out of range to perform the test.
>>>>>>>
>>>>>>> It seems like this may have something to do with the sockets
>>>>>>> maintaining the TCP/IP connection during the break and then
>>>>>>> being in a bad state when the Wi-Fi connection comes back up. Is
>>>>>>> this possible? If so is there some way to reset the TCP/IP
>>>>>>> connection?
>>>>>>>
>>>>>>>> Thanks for taking the time to analyse the problem.
>>>>>>>
>>>>>>> I need this capability for the system I'm developing. Thank you
>>>>>>> and your colleagues for ZeroMQ, CZMQ, Zyre, ...
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: zeromq-dev-bounces at lists.zeromq.org
>>>>>>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter
>>>>>>> Hintjens
>>>>>>> Sent: Thursday, June 5, 2014 5:22 PM
>>>>>>> To: ZeroMQ development list
>>>>>>> Subject: Re: [zeromq-dev] Zyre Wi-Fi Rejoin Issue
>>>>>>>
>>>>>>> On Thu, Jun 5, 2014 at 5:32 PM, Steve Rasmussen
>>>>>>> <Steve.Rasmussen at rassimtech.com> wrote:
>>>>>>>
>>>>>>>> The problem seems to be with the TCP/IP connection, not the beacon.
>>>>>>>> After a network break, the beacon reestablishes the connection, but
>>>>>>>> no data is getting through the TCP/IP connection.
>>>>>>>> It looks as if there are messages that are being buffered before
>>>>>>>> the break and then delivered after. This prevents the "HELLO"
>>>>>>>> message from getting through. I've tried various things, but the
>>>>>>>> closest I've come, so far, is to keep removing the peer until it is
>>>>>>>> reported as being ready. I'm doing this in the
>>>>>>>> "zyre_node_require_peer" function. If a peer exists I check to see
>>>>>>>> if it is ready, "zyre_peer_ready", and if not, I remove the peer,
>>>>>>>> "zyre_node_remove_peer". This seems to fix the problem that I'm
>>>>>>>> having, but it seems a little kludgy.
>>>>>>>
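The workaround described above can be sketched as follows. This is a toy model of the logic in the message, not Zyre's code. The toy_ type and function are hypothetical stand-ins for what zyre_node_require_peer, zyre_peer_ready and zyre_node_remove_peer do, and make no claim about the real signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool exists;    /* a peer record is present */
    bool ready;     /* the HELLO handshake has completed */
    int created;    /* how many times the record was (re)created */
} toy_peer_t;

/* Called whenever a beacon from the peer is seen. A record that exists
   but never became ready is treated as stale: it is dropped and a fresh
   one is created, which in the real system would send a new HELLO. */
toy_peer_t *toy_require_peer (toy_peer_t *peer)
{
    if (peer->exists && !peer->ready)
        peer->exists = false;       /* stands in for removing the peer */
    if (!peer->exists) {
        peer->exists = true;        /* stands in for creating the peer */
        peer->ready = false;
        peer->created++;
    }
    return peer;
}
```

Each beacon that arrives while the peer is not ready recreates the record, and once ready is true further calls leave it alone. That repeated teardown is what makes the approach feel like a kludge.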
>>>>>>> Thanks for taking the time to analyse the problem.
>>>>>>>
>>>>>>> In principle if the connection is re-established there should be
>>>>>>> no new HELLO message sent. Can you find a way to reproduce the
>>>>>>> problem easily?
>>>>>>>
>>>>>>> Feel free to make a pull request with your change anyhow. I'm
>>>>>>> reworking a lot of this code atm so will try to include your
>>>>>>> change if I can reproduce the error.
>>>>>>>
>>>>>>> -Pieter
>>>>>>> _______________________________________________
>>>>>>> zeromq-dev mailing list
>>>>>>> zeromq-dev at lists.zeromq.org
>>>>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev