[zeromq-dev] Does ZMQ "Over Send" Using OpenPGM
Steven McCoy
steven.mccoy at miru.hk
Wed Oct 20 11:07:50 CEST 2010
On 20 October 2010 16:22, Martin Sustrik <sustrik at 250bpm.com> wrote:
> On 10/20/2010 04:41 AM, Steven McCoy wrote:
>
> The exact value is set by IP_MAX_MEMBERSHIPS, which can be found on Linux
>> in /usr/include/bits/in.h. Although this is explicitly a per-socket limit
>> on groups, in OpenPGM you will therefore get an error trying to assign
>> more than 20 groups to one transport.
>>
>
> The value limits number of multicast groups per OS *socket*, right?
>
Correct.
As has been discussed on many different OS mailing lists, the limit is
artificial and has already been raised on OpenBSD and elsewhere.
> 0MQ creates a special instance of pgm_socket for each connect. I would
> expect that each pgm_socket would in turn create a separate raw socket, am I
> right? Thus max number of multicast groups per socket is always 1.
>
> Or maybe OpenPGM uses a single OS socket for all the pgm_sockets?
There are three sockets per pgm_socket_t, one subscribing and two
publishing. With the OpenPGM API you can add groups and SSM sources after
creating the socket; ZeroMQ, however, only permits this at socket creation.
Fundamentally there are many real and artificial limits for multicast;
unfortunately, the vendors have repeatedly proven that they do not wish to
disclose these limits per piece of hardware. If you exceed a hardware limit,
filtering moves into the software stack and performance suffers, on
operating systems as well as on switching and routing devices. With some
hardware limits, extra group subscriptions may simply fail, or may silently
receive nothing.
The mysterious rule of thumb has been 20 multicast groups per node, although
this is aimed more at the subscriber, as the publishing side doesn't care;
you also have to be aware of group-hashing limitations. If you want to
exceed this you have to test all the hardware involved really thoroughly,
especially when Cisco intermediaries and different revisions of Intel E1000
NICs are in the path. Intel is constantly pushing out new firmware for its
NICs, and together with the demands for iSCSI acceleration from virtual
hosting, Intel server NICs are now faster and can store more state than
before.
Vendor support for testing and diagnosing problems can also be problematic,
depending on whether the vendor has proven multicast MAN experience where
you are physically located, as with TIBCO and Reuters clients. For example,
I had significant problems with Cisco in Sweden tracing a routing fault
between two buildings on either side of Stockholm; Cisco refused to admit
fault and my client had to bypass the faulty routers with a new line.
To reiterate the known multicast considerations:
- *Avoid 224.0.0.x*--Traffic to addresses of the form 224.0.0.*x* is often
flooded to all switch ports. This address range is reserved for link-local
uses. Many routing protocols assume that all traffic within this range will
be received by all routers on the network, hence (at least all Cisco)
switches flood traffic within this range. The flooding behavior overrides
the normal selective forwarding behavior of a multicast-aware switch (e.g.
IGMP snooping, CGMP, etc.).
- *Watch for 32:1 overlap*--32 non-contiguous IP multicast addresses are
mapped onto each Ethernet multicast address. A receiver that joins a single
IP multicast group implicitly joins 31 others due to this overlap. Of
course, filtering in the operating system discards undesired multicast
traffic from applications, but NIC bandwidth and CPU resources are
nonetheless consumed discarding it. The overlap occurs in the 5 high-order
bits, so it's best to use the 23 low-order bits to make distinct multicast
streams unique. For example, IP multicast addresses in the range 239.0.0.0
to 239.127.255.255 all map to unique Ethernet multicast addresses. However,
IP multicast address 239.*128*.0.0 maps to the same Ethernet multicast
address as 239.*0*.0.0, 239.*128*.0.1 maps to the same Ethernet multicast
address as 239.*0*.0.1, etc.
- *Avoid x.0.0.y and x.128.0.y*--Combining the above two considerations,
it's best to avoid using IP multicast addresses of the form *x*.0.0.*y*
and *x*.128.0.*y*, since they all map onto the range of Ethernet
multicast addresses that are flooded to all switch ports.
More details are in Cisco's "Guidelines for Enterprise IP Multicast
Address Allocation":
http://www.cisco.com/en/US/tech/tk828/technologies_white_paper09186a00802d4643.shtml
--
Steve-o