[zeromq-dev] ZRE/Zbeacon improvement proposal

Arnaud Loonstra arnaud at sphaero.org
Sat Oct 26 23:54:01 CEST 2013


On 10/26/2013 12:09 PM, Pieter Hintjens wrote:
> On Sat, Oct 26, 2013 at 10:53 AM, Arnaud Loonstra <arnaud at sphaero.org> wrote:
>
>> 1 - Only one node can run on a host (or one per interface)
>> 1 - The issue is caused by using a broadcast socket. It's possible to
>> bind multiple programs to the same address and port but packets are then
>> load balanced between the sockets.
>
> Is this right? My understanding (and test results) were that when
> multiple listeners bound to the same socket, they all received the
> incoming messages. The Zyre load tester uses dozens of nodes on the
> same host:port. They're definitely all getting each others' packets.
>

I did some tests again and I think you are right. I must have mistakenly 
used this new socket option:
https://lwn.net/Articles/542629/ which is actually an option for load 
balancing. Sorry 'bout that.
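
For reference, the sharing behaviour Pieter describes can be sketched
like this. It is a minimal Python sketch, assuming Linux semantics and
assuming the option in the linked article is SO_REUSEPORT; the helper
name open_beacon_listener is illustrative and not part of zbeacon.

```python
import socket

def open_beacon_listener(port):
    """Open a UDP socket that may share `port` with other listeners.

    Hypothetical helper for illustration; not part of zbeacon or Zyre.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # SO_REUSEADDR lets several UDP sockets bind the same address and port
    # (Linux behaviour assumed). SO_REUSEPORT is deliberately not set here.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Only needed if this socket also sends broadcast datagrams.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.bind(("", port))
    return sock
```

Calling it twice with the same port from two processes (or twice in one
process) should succeed on Linux, which is the setup the Zyre load
tester relies on according to the quoted reply.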

>> 3 - Nodes are trying to connect to dead peers (peers that are gone)
>
> This is normal in some senses. ZRE was designed for WiFi and under
> load, clients can come and go randomly. To get a resilient network you
> need to be optimistic about connecting, and slower to discard peers as
> "dead".
>
> Multicast would be fine; the Android issue is afaik already solved in
> Zyre through the tactic of double handshaking on the TCP connection.
> I.e. if A can see B, and B cannot see A, then A will connect over TCP
> to B and then B will reconnect back to A over TCP (not having seen it
> before).
>
> We can make multicast a configurable option on beacons. I'd not change
> the existing default since it'll break applications.
>

I can understand that.

> I'm not so sure about using multicast to cross segments; it might be
> better to do this explicitly with forwarding, i.e. when a node has two
> network interfaces, it bridges traffic between them.
>

Well, multicast is designed for it. So if multicast is an option next to 
broadcast you would get it for free. :) Just set the TTL to 1 by default 
if you want to be safe. Of course having a zeromq router to handle it 
could be an option.
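
Limiting the TTL is a one-line socket option. A minimal Python sketch,
assuming IPv4; the helper name is illustrative, and any group address
and port used with it would be application choices, not something
zbeacon defines:

```python
import socket

def open_multicast_sender(ttl=1):
    """Create a UDP socket whose multicast datagrams carry the given TTL.

    With ttl=1 the datagrams are not forwarded beyond the local segment.
    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock
```

Raising the TTL is then an explicit decision by whoever wants beacons
to cross a router, rather than something that happens by accident.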

>> 3 - What I observed was nodes not knowing when a peer node was gone
>> trying to connect to the peer node. This is obviously to no success. It
>> also results in unnecessary traffic and handling. I would propose to
>> embody a node state identifier inside the beacon. This way when a node
>> exits it can send its exit state before terminating and other nodes have
>> the possibility to pick this up. This would also be a welcome feature to
>> inform nodes of each other state. For example when a node is overloaded
>> it could inform about this using its state. A broker can leave him alone
>> for a while. I think this could be useful in general.
>
> UDP broadcasts / multicasts are not reliable and are the first things
> to be lost when the network is stressed (which is when you get client
> disconnections).
>
> It would not be wise to try to use these for state propagation.
>
> Please consider zbeacon within the context of a full protocol such as
> ZRE, which builds a TCP cluster on top of the zbeacon discovery. It
> would be a shame to start mixing abstractions.
>

Agreed, state propagation is of course not essential. But wouldn't you
agree that broadcasting on exit would be better than determining a
node's state by trying to connect to it? Wouldn't sending an exit
message through the TCP handshake take too long with lots of nodes?
Broadcasting on exit would help prevent that kind of stress. I've seen
this a lot with OSPF, where detection of dead peers takes too long and
so traffic is dropped.
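
To make the proposal concrete, here is a minimal Python sketch of a
beacon carrying a state byte. The first four fields follow the ZRE
beacon layout as I understand it ("ZRE", a version byte, a 16-byte
UUID, a 2-byte port). The trailing state byte and the STATE_* values
are purely hypothetical and not part of ZRE.

```python
import struct
import uuid

# "ZRE" header, version, UUID, port in network byte order, then a
# hypothetical 1-byte state field (NOT part of the ZRE beacon).
BEACON_FORMAT = "!3sB16sHB"
STATE_ALIVE = 0    # hypothetical value
STATE_EXITING = 1  # hypothetical value

def pack_beacon(node_id, port, state=STATE_ALIVE, version=1):
    """Serialise a beacon for the node identified by a uuid.UUID."""
    return struct.pack(BEACON_FORMAT, b"ZRE", version,
                       node_id.bytes, port, state)

def unpack_beacon(data):
    """Parse a beacon and return (node_id, port, state)."""
    header, _version, raw_id, port, state = struct.unpack(BEACON_FORMAT, data)
    if header != b"ZRE":
        raise ValueError("not a ZRE beacon")
    return uuid.UUID(bytes=raw_id), port, state
```

A node would send pack_beacon(node_id, port, STATE_EXITING) once just
before terminating. Since UDP is unreliable, peers could only treat
this as a hint and would still need their normal timeout.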

>> - For passing multiple network segments multicast is often not possible.
>> Crossing internet using multicast is nowhere available, to my knowledge.
>
> Indeed. I do not like using multicast for its ability to leak across
> networks; that's so hard to get right and can lead to such a mess.
> Much better IMO to see 1-segment UDP as one mechanism for discovery,
> tied together with internetwork discovery over TCP (which would work
> across the Internet).
>
> One step at a time. Could you think about not adding state to beacons
> and instead looking at how ZRE does its interconnect. This will also
> help you understand how to do other forms of discovery (e.g. an
> application could simply tell a node, via the API, "here is a node at
> hostname:port").
>

Will do :), I'm replying to your other mail in here as well:

 > Do you have a simple example case we could work through? Preferably
 > something real, not theoretical.

Our use case is an orchestration system. Currently we are seeing a
lot of creative coding applications
(PureData/MaxMSP/Blender/Isadora/VVVV/etc). Most inter-application
communication is done using OSC protocols (UDP). This works great in a
lot of cases. However, since it is often hardcoded into the
applications, it is not very flexible.

We are currently looking into a newer approach in which we foresee a
protocol that can exchange all the meta data of these systems (nodes) so
they are easier to orchestrate. We had a prototype that does almost the
same as ZRE. One thing we really liked in the prototype was that when
nodes changed state (i.e. exit, which they do often) all other nodes
would know instantly. So a video stream being sent to a node was
stopped immediately. This was done using multicast.

For this protocol we are looking into keeping everything as simple
and efficient as possible. So we prefer sticking to how things already
work: multicast to send data to multiple nodes, for example. That is
because we know that when you need speed you'll want it done in
hardware or at as low a level as possible. ZRE fit our use case
perfectly for the discovery and meta data exchange. Applications still
use OSC and other protocols to send the real data. That's a legacy we
just have for now.

I'm now building a simple system in which nodes would boot, discover 
each other, exchange capabilities, and be controlled dynamically.

Rg,

Arnaud
-- 
w: http://www.sphaero.org
t: http://twitter.com/sphaero
g: http://github.com/sphaero
i: freenode: sphaero_z25


