[zeromq-dev] ZMQ, PGM and Message Loss

Steven McCoy steven.mccoy at miru.hk
Fri Feb 8 17:57:45 CET 2013


On 8 February 2013 11:27, Charles Remes <lists at chuckremes.com> wrote:

> You can configure the retry mechanisms in epgm itself, but those can get
> overrun too. If you need more guarantees on delivery, take a look at the
> guide at zguide.zeromq.org and read up on the "clone pattern."
>


Reliability in 0mq multicast is provided by an in-memory window, similar to
TCP but without congestion control, and it can be overrun very quickly in
high-speed environments. The underlying protocol is PGM, detailed in RFC 3208.

A full message-queuing paradigm might entail a small-window PGM broadcast
protocol with out-of-band recovery over a TCP socket; a rough publisher-side
sketch follows the links below.  Example systems using this paradigm, with
easy-to-find documentation, include the BATS exchange; search for "BATS
multicast" for further literature:

http://cdn.batstrading.com/resources/membership/BATS_MC_PITCH_Specification.pdf
http://cdn.batstrading.com/resources/participant_resources/BATS_Europe_MC_PITCH_Specification.pdf
http://cdn.batstrading.com/resources/membership/BATS_Latency_Feed_Specification.pdf
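To make the layout concrete, a very rough publisher-side sketch using the
libzmq C API; the endpoints, the sequence-number framing and the history
store are all my own assumptions, not anything 0mq or BATS prescribes:

#include <zmq.h>
#include <stdint.h>
#include <stdio.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();

    /* Broadcast path: PGM with a deliberately small in-memory window,
       so bursts can overrun it and receivers may see sequence gaps.   */
    void *feed = zmq_socket (ctx, ZMQ_PUB);
    zmq_connect (feed, "epgm://eth0;239.192.1.1:5555");

    /* Recovery path: receivers that detect a gap ask for a replay
       here over plain TCP.                                            */
    void *rerequest = zmq_socket (ctx, ZMQ_ROUTER);
    zmq_bind (rerequest, "tcp://*:5556");

    for (uint64_t seq = 0; seq < 1000; seq++) {
        /* 1. Prefix every payload with a sequence number and publish. */
        char buf [256];
        int len = snprintf (buf, sizeof buf, "%llu|payload",
                            (unsigned long long) seq);
        zmq_send (feed, buf, len, 0);

        /* 2. Retain the message in an application-level history store
              (omitted) so it can be replayed later.                   */
        /* 3. Poll 'rerequest' and answer "resend N..M" requests from
              the history store (omitted).                             */
    }

    zmq_close (rerequest);
    zmq_close (feed);
    zmq_ctx_term (ctx);
    return 0;
}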


For some rough figures, you should generally limit PGM socket usage to under
100,000 packets per second, which can be a bit difficult to correlate with
0mq because multiple messages may be packed into a single packet.  Bandwidth
should be limited to under 100 MB per second, as CPU time will be consumed by
checksum calculation.  Both limits can be alleviated by using multiple
sockets, basically one socket per core.  For real-world usage these figures
might appear wildly optimistic, especially if one of the components of the
system is running on Windows.
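A sketch of the socket-per-core idea, again with the libzmq 3.x C API; the
shard count, the per-shard multicast groups and the trivial hash are
placeholder assumptions (receivers would need to join every group):

#include <zmq.h>
#include <stdio.h>

#define NSHARDS 4   /* assumption: roughly one PGM socket per core */

int main (void)
{
    void *ctx = zmq_ctx_new ();
    /* Give the context one I/O thread per socket so the PGM checksum
       and send work can actually spread across cores.                */
    zmq_ctx_set (ctx, ZMQ_IO_THREADS, NSHARDS);

    void *pub [NSHARDS];
    for (int i = 0; i < NSHARDS; i++) {
        char endpoint [64];
        snprintf (endpoint, sizeof endpoint,
                  "epgm://eth0;239.192.1.%d:5555", i + 1);
        pub [i] = zmq_socket (ctx, ZMQ_PUB);
        zmq_connect (pub [i], endpoint);
    }

    /* Shard by whatever key the application partitions on; a trivial
       string hash stands in for the real scheme here.                */
    const char *key = "instrument-42";
    unsigned h = 0;
    for (const char *p = key; *p; p++)
        h = h * 31u + (unsigned char) *p;
    zmq_send (pub [h % NSHARDS], "payload", 7, 0);

    for (int i = 0; i < NSHARDS; i++)
        zmq_close (pub [i]);
    zmq_ctx_term (ctx);
    return 0;
}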

A low-end setting might be 1,000 packets per second and 10 Mb of bandwidth:
Windows is not stable at high datagram rates and can, by design, throttle
sockets running at 10,000 datagrams per second when multimedia is in use; the
default PGM protocol parameters are tuned for 10 Mb MAN networks.

Beware that a fundamental design feature of 0mq is the absence of transport
feedback, e.g. from congestion control; attempting to send as fast as
possible, above any configured data rate, will cause 0mq to build a growing
backlog of messages, with the ensuing consequences.
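One way to keep that backlog bounded, if dropping is acceptable, is an
explicit send high-water mark.  A minimal sketch assuming the libzmq 3.x
option name (2.x used a single ZMQ_HWM), with a placeholder figure:

#include <zmq.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *pub = zmq_socket (ctx, ZMQ_PUB);

    /* Bound the in-memory send queue: once roughly this many messages
       are waiting behind the rate-limited PGM sender, further messages
       are dropped by the PUB socket instead of queueing without limit. */
    int hwm = 100000;   /* placeholder figure */
    zmq_setsockopt (pub, ZMQ_SNDHWM, &hwm, sizeof hwm);

    zmq_connect (pub, "epgm://eth0;239.192.1.1:5555");
    /* ... send at application speed; the backlog can no longer grow
       past the high-water mark ...                                    */
    zmq_close (pub);
    zmq_ctx_term (ctx);
    return 0;
}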

-- 
Steve-o