[zeromq-dev] PGM: multiple listener behavior?

Stuart Levy salevy at illinois.edu
Wed Aug 8 19:09:57 CEST 2012


On 8/8/12 9:59 AM, Steven McCoy wrote:
> On 7 August 2012 13:33, Stuart Levy <salevy at illinois.edu> wrote:
>
>     Thanks - could you explain a bit more?   In the
>     PUB+SUB-on-same-port-on-same-host situation Pierre described -
>     which applies to me too - *all* NAKs will be lost, not just some
>     small fraction of them, if the SUB socket gets opened before PUB,
>     IIUC.   Does that imply that recovery can never happen, regardless
>     of rate limiting, etc.?   I don't want the loss of a single
>     multicast packet to mean that the whole system will need to be
>     restarted...
>
>
> With multiple senders on the same host on the same PGM session only 
> the first will receive NAKs due to unicast routing.  It's an OS and 
> protocol design feature.
>
> I believe with 0MQ only one PGM publisher should ever be active on the 
> network: a bus topology is not supported by the 0MQ socket.  This is 
> probably the more pertinent issue that voids a lot of the previous 
> discussion.
Aha - thanks *very* much for that clarification.  A bus is exactly what 
I was hoping to achieve, so if it's not intended to work with 0MQ, 
that's good to know.
>
>
>     If so, then I don't see how I can use zeromq multicast to create a
>     barrier primitive - I'd be better off just doing all the
>     low-level networking myself.  That's a pity, if true.
>
>
> Need a bit more clarification on what you are looking for?  Atomic 
> multicast?  I think there was a brief mention about this previously.
>
What I want is a distributed barrier primitive.  I want to synchronize
graphical displays driven by a bunch of machines, so a ~30 Hz update
rate across maybe N=20 hosts.  All hosts are on a common LAN, so IP
multicast works fine.

I want the barrier to introduce minimal extra delay, so I would rather
avoid the straightforward scheme (sketched just below):
    a) all N nodes send Ready to a central coordinator host, which waits
to hear from all of them
    b) the coordinator then sends Go to all N nodes
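
For concreteness, here's a minimal coordinator-side sketch of that
scheme in C (assuming libzmq >= 3.2; the endpoints, N=20, and the
message contents are placeholders, and node-side code, error handling,
and PUB/SUB slow-joiner startup are all omitted).  It makes the cost
visible: each barrier takes two network hops, Ready in and Go out.

#include <zmq.h>

#define N 20  /* number of display hosts; placeholder */

int main(void)
{
    void *ctx  = zmq_ctx_new();
    void *pull = zmq_socket(ctx, ZMQ_PULL);  /* collects Ready messages */
    void *pub  = zmq_socket(ctx, ZMQ_PUB);   /* broadcasts Go */
    zmq_bind(pull, "tcp://*:5555");          /* placeholder endpoints */
    zmq_bind(pub,  "tcp://*:5556");

    for (;;) {
        char buf[64];
        for (int i = 0; i < N; i++)          /* hop 1: gather N Readys */
            zmq_recv(pull, buf, sizeof buf, 0);
        zmq_send(pub, "GO", 2, 0);           /* hop 2: release everyone */
    }
}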

What I could do with raw multicast (and had hoped to do with 0MQ
PUB/SUB) is the following (see the sketch after the next paragraph):
    a) all N nodes bind to a single common multicast address/port (yep,
a bus!)
    b) at barrier time, each node sends a message "node <i> is ready at
time <T>" to the mcast group
    c) all nodes listen for Ready messages on the bus
    d) when each node has heard from all others, with <T> matching its
own clock, the barrier is done

There's additional processing for newly attached nodes (an unexpected
<i>), for aberrant clocks (ignore an old <T>; resynchronize our own
clock if we hear a future <T>), and for timeouts if some expected nodes
drop out or aren't heard from.
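
Since a bus evidently isn't supported over 0MQ's PGM sockets, here's a
minimal sketch of steps (a)-(d) over a raw UDP multicast socket in C.
The group 239.192.0.1, port 7500, node count, and the "ready <i> <T>"
wire format are all placeholder assumptions, and the newly-attached-node,
clock-resync, and timeout handling just described is omitted.
(IP_MULTICAST_LOOP is on by default, so each node hears its own Ready
too.)

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define GROUP "239.192.0.1"  /* placeholder multicast group */
#define PORT  7500           /* placeholder port */
#define N     20             /* expected number of nodes */

/* step (a): join the common multicast group; one socket per process */
static int open_bus(struct sockaddr_in *grp)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    memset(grp, 0, sizeof *grp);
    grp->sin_family = AF_INET;
    grp->sin_addr.s_addr = inet_addr(GROUP);
    grp->sin_port = htons(PORT);

    struct sockaddr_in local = *grp;          /* listen on the same port */
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(fd, (struct sockaddr *)&local, sizeof local);

    struct ip_mreq mreq;
    mreq.imr_multiaddr.s_addr = inet_addr(GROUP);
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof mreq);
    return fd;
}

/* steps (b)-(d): announce Ready for tick T, then wait to hear all N */
static void barrier(int fd, struct sockaddr_in *grp, int me, long T)
{
    char msg[64], seen[N] = {0};
    int heard = 0;

    snprintf(msg, sizeof msg, "ready %d %ld", me, T);
    sendto(fd, msg, strlen(msg), 0, (struct sockaddr *)grp, sizeof *grp);

    while (heard < N) {                  /* loopback delivers ours too */
        int i; long t;
        ssize_t n = recv(fd, msg, sizeof msg - 1, 0);
        if (n <= 0) continue;
        msg[n] = '\0';
        if (sscanf(msg, "ready %d %ld", &i, &t) != 2) continue;
        if (t != T || i < 0 || i >= N) continue;  /* old/future <T>: would
                                                     need the resync logic
                                                     described above */
        if (!seen[i]) { seen[i] = 1; heard++; }
    }
}

int main(void)
{
    struct sockaddr_in grp;
    int fd = open_bus(&grp);
    for (long tick = 0; ; tick++)    /* one barrier per display frame */
        barrier(fd, &grp, /* node id, placeholder */ 0, tick);
}

The appeal is that the barrier costs one multicast send per node plus
purely local listening, so the added latency is a single LAN hop bounded
by the slowest node, instead of the coordinator's two-hop round trip.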

There's one process per host node, so the app could have exactly one
Unix/Windows UDP multicast socket, leaving no ambiguity in packet
delivery.

I'm not sure about atomic multicast.  The messages needed would be short
enough to fit easily in single unfragmented UDP packets; is that what
you mean?

cheers

     Stuart