[zeromq-dev] Does ZMQ "Over Send" Using OpenPGM

Martin Sustrik sustrik at 250bpm.com
Wed Oct 20 11:22:33 CEST 2010

Hi Steven,

Ok. Understood. That would account for messages not being delivered.

However, what Bob is seeing is that duplicate messages are delivered when
he subscribes to multiple multicast groups.

Any idea how that could happen?


On 10/20/2010 11:07 AM, Steven McCoy wrote:
> On 20 October 2010 16:22, Martin Sustrik <sustrik at 250bpm.com> wrote:
>     On 10/20/2010 04:41 AM, Steven McCoy wrote:
>         The exact value is set by IP_MAX_MEMBERSHIPS, which can be found
>         on Linux in /usr/include/bits/in.h.  This limit applies
>         explicitly to groups on one socket, so in OpenPGM you will get
>         an error when trying to assign more than 20 groups to one
>         transport.
>     The value limits the number of multicast groups per OS *socket*, right?
> Correct.
> As has been already discussed on many different OS mailing lists the
> limit is artificial and has been raised on OpenBSD and others already.
>     0MQ creates a special instance of pgm_socket for each connect. I
>     would expect that each pgm_socket would in turn create a separate
>     raw socket, am I right? Thus max number of multicast groups per
>     socket is always 1.
>     Or maybe OpenPGM uses a single OS socket for all the pgm_sockets?
> There are three sockets per pgm_socket_t, one subscribing and two
> publishing.  With the OpenPGM API you can add groups and SSM sources
> after creating the socket.  ZeroMQ only permits this at ZeroMQ socket creation.
> Fundamentally there are many real and artificial limits for multicast.
> Unfortunately, the vendors have repeatedly proven that they do not
> wish to disclose these limits per piece of hardware.  If you exceed the
> hardware limit you end up with filtering moved to the software stack and
> performance will be affected, for operating systems as well as switching
> and routing devices.  For some hardware limits you might experience extra
> group subscriptions simply failing or silently not receiving anything.
> The mysterious rule of thumb has been 20 multicast groups per node,
> although this is aimed more at the subscriber, as the publishing side
> doesn't care.  Similarly, you have to be aware of group hashing
> limitations.  If you want to exceed this you have to really thoroughly
> test all the hardware involved, especially when you are talking about
> Cisco intermediaries and different revisions of Intel E1000 NICs.  Intel
> is constantly pushing out new firmware for their NICs and together with
> demands for iSCSI acceleration from virtual hosting you see Intel Server
> NICs are faster and can store more state information than before.
> Vendor support on testing and diagnosing problems can also be
> problematic, depending on where you are physically located and on
> whether the vendor has proven multicast MAN experience, such as with
> TIBCO and Reuters clients.  For example, I have had significant problems
> with Cisco in Sweden tracing a routing fault between two buildings on
> either side of Stockholm; Cisco refused to admit fault and my client had
> to bypass the faulty routers with a new line.
> To reiterate the known multicast features:
>     * /Avoid 224.0.0.x/--Traffic to addresses of the form 224.0.0./x/ is
>       often flooded to all switch ports. This address range is reserved
>       for link-local uses. Many routing protocols assume that all
>       traffic within this range will be received by all routers on the
>       network. Hence (at least all Cisco) switches flood traffic within
>       this range. The flooding behavior overrides the normal selective
>       forwarding behavior of a multicast-aware switch (e.g. IGMP
>       snooping, CGMP, etc.).
>     * /Watch for 32:1 overlap/--32 non-contiguous IP multicast addresses
>       are mapped onto each Ethernet multicast address. A receiver that
>       joins a single IP multicast group implicitly joins 31 others due
>       to this overlap. Of course, filtering in the operating system
>       discards undesired multicast traffic from applications, but NIC
>       bandwidth and CPU resources are nonetheless consumed discarding
>       it. The overlap occurs in the 5 high-order bits, so it's best to
>       use the 23 low-order bits to make distinct multicast streams
>       unique. For example, IP multicast addresses in the range
>       239.0.0.0 to 239.127.255.255 all map to unique Ethernet multicast
>       addresses. However, IP multicast address 239./128/.0.0 maps to the
>       same Ethernet multicast address as 239./0/.0.0, 239./128/.0.1 maps
>       to the same Ethernet multicast address as 239./0/.0.1, etc.
>     * /Avoid x.0.0.y and x.128.0.y/--Combining the above two
>       considerations, it's best to avoid using IP multicast addresses of
>       the form /x/.0.0./y/ and /x/.128.0./y/ since they all map onto the
>       range of Ethernet multicast addresses that are flooded to all
>       switch ports.
> With more details from Cisco here, "Guidelines for Enterprise IP
> Multicast Address Allocation"
> http://www.cisco.com/en/US/tech/tk828/technologies_white_paper09186a00802d4643.shtml
> --
> Steve-o
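
For reference, the 32:1 overlap and the "avoid x.0.0.y and x.128.0.y" guideline quoted above can be checked with a few lines of Python. This is only a rough sketch: it assumes the standard IPv4-to-Ethernet mapping (fixed prefix 01:00:5e plus the low 23 bits of the group address), and the function names multicast_mac and shares_link_local_mac are made up for illustration; they are not part of OpenPGM or 0MQ.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Return the Ethernet multicast MAC that an IPv4 group maps onto.

    Assumes the standard mapping: prefix 01:00:5e followed by the low
    23 bits of the IPv4 group address.
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF          # the 5 bits above these are dropped
    mac = (0x01005E << 24) | low23        # 48-bit Ethernet address
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def shares_link_local_mac(group: str) -> bool:
    """True if the group maps onto the same MAC range as 224.0.0.x.

    That is the case for every address of the form x.0.0.y or x.128.0.y,
    which the guideline above recommends avoiding.
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    return (int(addr) & 0x7FFFFF) < 256
```

With this, multicast_mac("239.0.0.1") and multicast_mac("239.128.0.1") return the same MAC as multicast_mac("224.0.0.1"), and shares_link_local_mac flags both 239.0.0.1 and 239.128.0.1 while letting something like 239.1.0.1 through.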
