[zeromq-dev] ZRE/Zbeacon improvement proposal

Arnaud Loonstra arnaud at sphaero.org
Sat Oct 26 10:53:00 CEST 2013


Hi all,

For the last few weeks I have been investigating the Zyre/ZRE codebase.
I have some suggestions for improvements that I would like to discuss.
I've been working mostly with Zbeacon, so my proposals currently cover
Zbeacon only. Here are the issues I've found:

1 - Only one node can run on a host (or one per interface)
2 - Network segments can't be covered
3 - Nodes are trying to connect to dead peers (peers that are gone)

I've been working on solutions to these issues and would like to propose
the changes below. As I don't program in C often, and find it hard to
experiment in it, I ported Zbeacon to Python and implemented the
solutions there. So far I have only tested them on Linux:

1 - The issue is caused by using a broadcast socket. It's possible to
bind multiple programs to the same address and port, but packets are then
load balanced between the sockets, which results in missed beacons. The
solution is to use multicast instead. I've talked with Pieter about it
and he said broadcast was chosen because of multicast issues with
Android. I'm not sure what the current state of multicast on Android is,
so I left broadcast available as an option, selected by the address.
2 - Broadcast is limited to one network segment unless you do some
broadcast forwarding on a router device. Multicast is again a solution
(if you have the right network equipment/configuration). Where the
network does not really support multicast, it is usually handled as
broadcast traffic anyway.
3 - I observed nodes that did not know a peer was gone and kept trying
to connect to it. This obviously fails, and it also causes unnecessary
traffic and handling. I propose embedding a node state identifier in the
beacon. That way a node can send an exit state before terminating, and
other nodes can pick this up. It would also be a welcome way to inform
nodes of each other's state. For example, an overloaded node could
announce this through its state, so that a broker can leave it alone
for a while. I think this could be useful in general.
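
A minimal sketch of how the transport in points 1 and 2 could be selected
by the address. It is IPv4-only, and the names is_multicast and
open_beacon_socket are illustrative assumptions, not taken from pyzyre:

```python
import ipaddress
import socket


def is_multicast(address):
    """Return True when the address is a multicast group address."""
    return ipaddress.ip_address(address).is_multicast


def open_beacon_socket(address, port):
    """Open an IPv4 UDP socket for receiving beacons.

    A multicast address joins that group; any other address is
    treated as a broadcast address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow several nodes on one host to bind the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    if is_multicast(address):
        # ip_mreq: group address followed by the local interface (any).
        mreq = socket.inet_aton(address) + socket.inet_aton("0.0.0.0")
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    else:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock
```

For example, open_beacon_socket("225.25.25.25", 5670) would join a
multicast group, while open_beacon_socket("255.255.255.255", 5670) would
fall back to broadcast. Both the group address and the port are example
values only.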

To accommodate the extra state identifier, it could simply be appended
to the body. The current beacon format is shown first, followed by the
proposed one:
+---+---+---+------+  +------+------+
| Z | R | E | %x01 |  | UUID | port |
+---+---+---+------+  +------+------+
        Header               Body

+---+---+---+------+  +------+------+-------+
| Z | R | E | %x01 |  | UUID | port | state |
+---+---+---+------+  +------+------+-------+
        Header               Body

These extra statements would apply:
* The state identifier is 1 octet with the following values: 0=INIT,
1=ERROR, 2=EXIT, 3=IDLE, 4=RECEIVING, 5=PROCESSING
* A node shall send a beacon with a state value of EXIT (2) just before
it shuts down
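
A sketch of packing and parsing the proposed beacon, assuming a 16-octet
UUID and a 2-octet port in network byte order ahead of the new state
octet. The function names and the peer table (a plain dict) are
illustrative assumptions, not the pyzyre implementation:

```python
import struct
import uuid
from enum import IntEnum


class State(IntEnum):
    INIT = 0
    ERROR = 1
    EXIT = 2
    IDLE = 3
    RECEIVING = 4
    PROCESSING = 5


# Network byte order, no padding: "ZRE", version, UUID, port, state.
BEACON = struct.Struct("!3sB16sHB")


def pack_beacon(node_id, port, state):
    """Build a beacon from a uuid.UUID, a port number and a State."""
    return BEACON.pack(b"ZRE", 1, node_id.bytes, port, int(state))


def unpack_beacon(data):
    """Return (uuid, port, state), or None if this is not a valid beacon."""
    if len(data) != BEACON.size:
        return None
    magic, version, raw_id, port, state = BEACON.unpack(data)
    if magic != b"ZRE" or version != 1:
        return None
    try:
        return uuid.UUID(bytes=raw_id), port, State(state)
    except ValueError:
        return None  # unknown state value


def handle_beacon(peers, data):
    """Update peers (a dict of uuid -> (port, state)) from one beacon."""
    beacon = unpack_beacon(data)
    if beacon is None:
        return
    node_id, port, state = beacon
    if state is State.EXIT:
        peers.pop(node_id, None)  # stop trying to connect to this peer
    else:
        peers[node_id] = (port, state)
```

A node that is shutting down would send pack_beacon(node_id, port,
State.EXIT) as its last beacon. Note that this sketch rejects beacons
without the state octet, so it does not interoperate with nodes that
send the original format.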

The Python code is available at http://github.com/sphaero/pyzyre

Would these proposals be of use for ZRE?

More ideas?

- Multicast often cannot cross multiple network segments, and to my
knowledge multicast across the internet is not available anywhere.
To overcome this I think the only solution would be to use DNS.
What I'm thinking of is the following sequence for node discovery:
   1 - multicast/broadcast as it is now
   2 - DNS SRV records
   3 - static/manual config (by passing a config parameter)
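
The fallback order above could be sketched as follows. The function
discover_peers and the idea of passing each discovery method as a
callable are assumptions made for illustration. The actual beacon, DNS
SRV and config lookups are left out:

```python
def discover_peers(strategies):
    """Try each (name, lookup) pair in order.

    Each lookup is a callable returning a list of endpoints. The first
    non-empty result wins and is returned together with its name.
    """
    for name, lookup in strategies:
        try:
            peers = lookup()
        except OSError:
            peers = []  # treat a network failure as "nothing found"
        if peers:
            return name, list(peers)
    return None, []
```

A caller would pass the three methods in order, for example
[("beacon", ...), ("dns-srv", ...), ("static", ...)], so that static
configuration is only used when the other two find nothing.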

- When a node receives a beacon from its own IP address but with a
UUID different from its own, it should open ipc/inproc sockets and
use those to communicate with the other node. This is more efficient
when multiple nodes run on one machine.
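
A sketch of that decision, assuming each node knows its own IP address
and UUID. The function name and the ipc path layout are hypothetical,
and the peer would have to bind the same path. Since inproc only works
between sockets inside one process, the sketch picks ipc for nodes that
are separate processes on the same host:

```python
import uuid


def choose_endpoint(own_ip, own_id, peer_ip, peer_id, peer_port):
    """Pick the endpoint to connect to for a received beacon.

    Returns None for our own beacon, an ipc endpoint for another node
    on the same host, and a tcp endpoint otherwise.
    """
    if peer_id == own_id:
        return None  # our own beacon, nothing to connect to
    if peer_ip == own_ip:
        # Hypothetical path convention shared by all local nodes.
        return "ipc:///tmp/zre-{}.ipc".format(peer_id.hex)
    return "tcp://{}:{}".format(peer_ip, peer_port)
```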

Rg,

Arnaud Loonstra
-- 
w: http://www.sphaero.org
t: http://twitter.com/sphaero
g: http://github.com/sphaero
i: freenode: sphaero_z25


