[zeromq-dev] missing messages on 40GbE network

Marko Vendelin markov at sysbio.ioc.ee
Wed Jul 1 15:19:15 CEST 2015

Hi Ben,

Any idea how I can check it? Should an error message come through ZMQ
or somehow from the kernel?
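
As a starting point on the kernel side, here is a minimal sketch, assuming
Linux and its /proc/net/dev counter file. The function names and the sample
values are made up for illustration. It compares two snapshots and lists the
error, drop and FIFO counters that grew in between:

```python
import time

# Column order in /proc/net/dev after the "iface:" prefix (8 receive, 8 transmit).
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed")


def parse_proc_net_dev(text):
    """Return {iface: {"rx": {...}, "tx": {...}}} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # the two header lines contain no colon
        values = rest.split()
        if len(values) < 16 or not all(v.isdigit() for v in values[:16]):
            continue  # not a counter line in the expected layout
        numbers = [int(v) for v in values[:16]]
        counters[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, numbers[:8])),
            "tx": dict(zip(TX_FIELDS, numbers[8:])),
        }
    return counters


def growing_counters(before, after, watched=("errs", "drop", "fifo")):
    """List (iface, direction, field, increase) for watched counters that grew."""
    grown = []
    for iface, stats in after.items():
        for direction in ("rx", "tx"):
            for field in watched:
                old = before.get(iface, {}).get(direction, {}).get(field, 0)
                delta = stats[direction][field] - old
                if delta > 0:
                    grown.append((iface, direction, field, delta))
    return grown


def watch_live(path="/proc/net/dev", interval=10.0):
    """Take two live snapshots `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_proc_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_net_dev(f.read())
    return growing_counters(first, second)


# Hypothetical sample data in the /proc/net/dev layout (values are made up).
SAMPLE_BEFORE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth3: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0
"""
SAMPLE_AFTER = SAMPLE_BEFORE.replace("1000 10 0 0 0", "9000 90 0 3 5")
```

Calling watch_live() on both machines while a transfer is running would show
whether any of these counters move at the moment messages go missing.
Driver-specific counters from `ethtool -S <interface>` may reveal drops that
do not show up here.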


On Wed, Jul 1, 2015 at 3:40 PM, Ben Kloosterman <bklooste at gmail.com> wrote:
> More likely the NIC buffer or driver than ZeroMQ.
> ben
> On Wed, Jul 1, 2015 at 9:45 PM, Marko Vendelin <markov at sysbio.ioc.ee> wrote:
>> Dear ØMQ developers:
>> Synopsis: I am observing a strange interaction between storing a data
>> stream on hard disks and a loss of ZeroMQ messages. It seems that in my
>> use case, when messages are larger than 2MB, some of them are randomly
>> dropped.
>> Full story:
>> I need to pump images acquired by fast scientific cameras into files
>> at rates approaching 25Gb/s. For that, images are acquired on one
>> server and transferred to the hard disk array over a 40Gb/s network.
>> Since Linux-based solutions using iSCSI were not working very well
>> (maybe they need more optimization) and plain network applications
>> could use the full bandwidth, I decided to use a RAID-0 inspired
>> approach: make a filesystem on each of the 32 hard disks separately,
>> run small slave programs, one per filesystem, and let the slaves ask
>> the dataset server for a dataset in a loop. As the messaging system, I
>> use ZeroMQ with a REQ/REP connection. In general, everything seems to
>> work perfectly: I am able to stream and record data at about 36Gb/s.
>> However, at some point (within 5-10 min), messages sometimes get lost.
>> Intriguingly, this occurs only if I write files and the messages are
>> 2MB or larger. Much smaller messages do not seem to trigger this
>> effect. If I just stream the data and either dump it or only run
>> calculations on it, all messages go through. All messages also go
>> through if I use a 1Gb network.
>> While the production code streams data into HDF5 and uses zmqpp with
>> polling to receive messages, I have reduced the problematic code to
>> the simplest case using zmq.hpp, regular files, and plain send/recv
>> calls. The code is available at
>> http://www.ioc.ee/~markov/zmq/problem-missing-messages/
>> At the same time, there don't seem to be any excessive drops on the
>> Ethernet cards, as reported by ifconfig on Linux (the slaves run on
>> Gentoo, the server on Ubuntu):
>> ens1f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>         inet  netmask  broadcast
>>         inet6 fe80::225:90ff:fe9c:62c3  prefixlen 64  scopeid 0x20<link>
>>         ether 00:25:90:9c:62:c3  txqueuelen 1000  (Ethernet)
>>         RX packets 8568340799  bytes 76612663159251 (69.6 TiB)
>>         RX errors 7  dropped 0  overruns 0  frame 7
>>         TX packets 1558294820  bytes 93932603947 (87.4 GiB)
>>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>> eth3      Link encap:Ethernet  HWaddr 00:25:90:9c:63:1a
>>           inet addr:  Bcast:  Mask:
>>           inet6 addr: fe80::225:90ff:fe9c:631a/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
>>           RX packets:1558294810 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:8570261350 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:102083292705 (102.0 GB)  TX bytes:76629844394725 (76.6 TB)
>> So, it should not be a simple dropped frames problem.
>> Since the problem occurs only with larger messages, is there any
>> size-limited buffer in ZeroMQ that may cause messages to be dropped?
>> Or any other possible solution?
>> Thank you for your help,
>> Marko
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev