[zeromq-dev] missing messages on 40GbE network

Ben Kloosterman bklooste at gmail.com
Wed Jul 1 14:40:31 CEST 2015


More likely the NIC buffer or driver than ZeroMQ.

ben

On Wed, Jul 1, 2015 at 9:45 PM, Marko Vendelin <markov at sysbio.ioc.ee> wrote:

> Dear ØMQ developers:
>
> Synopsis: I am observing a strange interaction between writing a
> data stream to hard disks and the loss of ZeroMQ messages. It seems
> that in my use case, when messages are larger than 2 MB, some of
> them are randomly dropped.
>
> Full story:
>
> I need to write images acquired by fast scientific cameras to files
> at rates approaching 25 Gb/s. The images are acquired on one server
> and transferred to a hard disk array over a 40 Gb/s network. Since
> Linux-based solutions using iSCSI were not working very well (they
> may need more tuning) while plain network applications could use the
> full bandwidth, I decided on a RAID-0-inspired approach: make a
> filesystem on each of the 32 hard disks separately, run one small
> slave program per filesystem, and let the slaves ask the dataset
> server for a dataset in a loop. As the messaging system, I use ZeroMQ
> with a REQ/REP connection. In general, everything seems to work
> perfectly: I am able to stream and record data at about 36 Gb/s.
> However, at some point (within 5-10 min), messages sometimes get
> lost. Intriguingly, this occurs only if I write files and the
> messages are 2 MB or larger. Much smaller messages do not seem to
> trigger this effect. If I just stream the data and either discard it
> or only run calculations on it, all messages go through. All
> messages also go through if I use a 1 Gb network.
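
For a sense of scale, the figures quoted above can be turned into a per-disk load. This is only a back-of-envelope sketch: it assumes decimal gigabits, an even split of the stream across the 32 slaves, and 2 MiB (2097152-byte) messages. The function names are made up for illustration and do not come from the linked code.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per second that each slave must sustain when a stream of
// link_gbit_per_s (decimal gigabits per second) is split evenly
// across n_slaves. The even split is an assumption.
double bytes_per_sec_per_slave(double link_gbit_per_s, int n_slaves) {
    assert(n_slaves > 0);
    const double total_bytes_per_s = link_gbit_per_s * 1e9 / 8.0;
    return total_bytes_per_s / n_slaves;
}

// Messages per second that each slave receives for a given message size.
double messages_per_sec_per_slave(double link_gbit_per_s, int n_slaves,
                                  std::uint64_t message_bytes) {
    assert(message_bytes > 0);
    return bytes_per_sec_per_slave(link_gbit_per_s, n_slaves) /
           static_cast<double>(message_bytes);
}
```

Under these assumptions, bytes_per_sec_per_slave(36.0, 32) comes to roughly 140 MB/s per disk, and messages_per_sec_per_slave(36.0, 32, 2097152) to about 67 messages per second per slave.
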
>
> While in production code I stream data into HDF5 and use zmqpp with
> polling to receive messages, I have reduced the problematic code to
> the simplest case using zmq.hpp, regular files, and plain send/recv
> calls. The code is available at
>
> http://www.ioc.ee/~markov/zmq/problem-missing-messages/
>
> At the same time, there don't seem to be any excessive drops in
> ethernet cards, as reported by ifconfig in Linux (slaves run on
> Gentoo, server on Ubuntu):
>
>
> ens1f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>         inet 192.168.38.1  netmask 255.255.255.252  broadcast 192.168.38.3
>         inet6 fe80::225:90ff:fe9c:62c3  prefixlen 64  scopeid 0x20<link>
>         ether 00:25:90:9c:62:c3  txqueuelen 1000  (Ethernet)
>         RX packets 8568340799  bytes 76612663159251 (69.6 TiB)
>         RX errors 7  dropped 0  overruns 0  frame 7
>         TX packets 1558294820  bytes 93932603947 (87.4 GiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>
> eth3      Link encap:Ethernet  HWaddr 00:25:90:9c:63:1a
>           inet addr:192.168.38.2  Bcast:192.168.38.3  Mask:255.255.255.252
>           inet6 addr: fe80::225:90ff:fe9c:631a/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
>           RX packets:1558294810 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:8570261350 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:102083292705 (102.0 GB)  TX bytes:76629844394725 (76.6 TB)
>
>
> So, it should not be a simple dropped frames problem.
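
Zero "dropped" and "overruns" in ifconfig is a useful first check, but drops inside the NIC ring buffer may only show up in driver-specific statistics (for example via ethtool -S, where counter names vary by driver). On Linux the same kernel counters are also exposed in /proc/net/dev, so they can be sampled before and after a recording run. The following is a minimal sketch of parsing one interface line of that file; IfaceCounters, parse_net_dev_line and rx_loss_between are hypothetical names, not part of any library.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Subset of the per-interface counters found in /proc/net/dev.
struct IfaceCounters {
    std::string name;
    std::uint64_t rx_errs = 0, rx_drop = 0, rx_fifo = 0, rx_frame = 0;
    std::uint64_t tx_errs = 0, tx_drop = 0, tx_fifo = 0;
};

// Parse one line such as "  eth3: <8 RX fields> <8 TX fields>".
// Returns false for the two header lines (they contain no ':')
// and for lines with fewer than 16 numeric fields.
bool parse_net_dev_line(const std::string& line, IfaceCounters& out) {
    const std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) return false;
    const std::string::size_type first = line.find_first_not_of(" \t");
    if (first == colon) return false;  // empty interface name
    out.name = line.substr(first, colon - first);

    // RX: bytes packets errs drop fifo frame compressed multicast
    // TX: bytes packets errs drop fifo colls carrier compressed
    std::istringstream fields(line.substr(colon + 1));
    std::array<std::uint64_t, 16> v{};
    for (auto& x : v) {
        if (!(fields >> x)) return false;
    }
    out.rx_errs = v[2];  out.rx_drop = v[3];
    out.rx_fifo = v[4];  out.rx_frame = v[5];
    out.tx_errs = v[10]; out.tx_drop = v[11];
    out.tx_fifo = v[12];
    return true;
}

// Receive-side drops plus FIFO overruns accumulated between two
// snapshots of the same interface. Assumes the counters were not
// reset or wrapped in between.
std::uint64_t rx_loss_between(const IfaceCounters& before,
                              const IfaceCounters& after) {
    assert(before.name == after.name);
    return (after.rx_drop - before.rx_drop) +
           (after.rx_fifo - before.rx_fifo);
}
```

Taking one snapshot before starting the slaves and another right after the first lost message would show whether the kernel-visible counters moved during that window; if they stay at zero, the driver-level statistics are the next place to look.
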
>
> Since the problem occurs only with larger messages, is there any
> size-limited buffer in ZeroMQ that may cause dropping of the messages?
> Or any other possible solution?
>
> Thank you for your help,
>
> Marko