[zeromq-dev] missing messages on 40GbE network
Marko Vendelin
markov at sysbio.ioc.ee
Wed Jul 1 15:19:15 CEST 2015
Hi Ben,
Any idea how I can check it? Should an error message come through ZMQ,
or somehow from the kernel?
Marko
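Ben's NIC/driver hypothesis can be checked from the kernel side rather than through ZeroMQ. Linux keeps per-interface receive and transmit counters in /proc/net/dev, which ifconfig also summarizes. Comparing a snapshot taken before a run with one taken after it shows whether errors or drops grew while messages went missing. The following is a minimal Python sketch that assumes the standard /proc/net/dev layout. The helper names are hypothetical. Driver-specific counters from `ethtool -S <iface>` may reveal drops that these generic counters do not show.

```python
# Minimal sketch: snapshot Linux interface counters from /proc/net/dev
# and compute how selected counters changed between two snapshots.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls",
             "carrier", "compressed"]
NAMES = ["rx_" + f for f in RX_FIELDS] + ["tx_" + f for f in TX_FIELDS]


def parse_proc_net_dev(text):
    """Return {iface: {"rx_bytes": n, "rx_drop": n, ...}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        # Large counters can follow the colon without a space, so split on ":".
        iface, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        stats[iface.strip()] = dict(zip(NAMES, values))
    return stats


def counter_deltas(before, after, iface,
                   keys=("rx_errs", "rx_drop", "rx_fifo", "rx_frame")):
    """Growth of the selected counters for one interface between snapshots."""
    return {k: after[iface][k] - before[iface][k] for k in keys}


def snapshot(path="/proc/net/dev"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())
```

For example, call `snapshot()` before starting the server and slaves, call it again once a message has gone missing, and pass both results to `counter_deltas(before, after, "ens1f1")`. Non-zero growth in `rx_drop` or `rx_fifo` would point at the NIC or driver. Zero growth would suggest looking elsewhere.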
On Wed, Jul 1, 2015 at 3:40 PM, Ben Kloosterman <bklooste at gmail.com> wrote:
> More likely the NIC buffer or driver than ZeroMQ.
>
> ben
>
> On Wed, Jul 1, 2015 at 9:45 PM, Marko Vendelin <markov at sysbio.ioc.ee> wrote:
>>
>> Dear ØMQ developers:
>>
>> Synopsis: I am observing a strange interaction between storing a
>> data stream on hard disks and a loss of ZeroMQ messages. It seems that
>> in my use case, when messages are larger than 2MB, some of them are
>> randomly dropped.
>>
>> Full story:
>>
>> I need to pump images acquired by fast scientific cameras into
>> files at rates approaching 25Gb/s. For that, images are acquired
>> on one server and transferred to the hard disk array over a 40Gb/s
>> network. Since Linux-based solutions using iSCSI were not working very
>> well (maybe they need more optimization) and plain network applications
>> could use the full bandwidth, I decided to use a RAID-0-inspired
>> approach: make a filesystem on each of the 32 hard disks separately,
>> run small slave programs, one per filesystem, and let the slaves ask
>> the dataset server for a dataset in a loop. As the messaging system, I
>> use ZeroMQ with a REQ/REP connection. In general, everything seems to
>> work perfectly: I am able to stream and record data at about 36Gb/s.
>> However, at some point (within 5-10 min), messages sometimes get lost.
>> Intriguingly, this occurs only if I write files and the messages are
>> 2MB or larger. Much smaller messages do not seem to trigger this
>> effect. If I just stream the data and either discard it or only run
>> calculations on it, all messages go through. All messages also go
>> through if I use a 1Gb network.
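For scale, the rates in the original message imply a per-disk write load and a message rate that can be worked out directly. The following is a minimal sketch that assumes decimal units (1 Gb/s = 10^9 bit/s, 2MB = 2,000,000 bytes) and an even split of the stream across the 32 disks. It ignores protocol and filesystem overhead. The function name is hypothetical.

```python
def stream_budget(link_gbps, n_disks, msg_bytes):
    """Per-disk write rate (MB/s) and message rate (msg/s) for a stream rate.

    Assumes decimal units, an even split across disks and no protocol overhead.
    """
    bytes_per_s = link_gbps * 1e9 / 8          # bits/s -> bytes/s
    per_disk_mb_s = bytes_per_s / n_disks / 1e6
    msgs_per_s = bytes_per_s / msg_bytes
    return per_disk_mb_s, msgs_per_s


for rate in (25, 36):
    disk, msgs = stream_budget(rate, n_disks=32, msg_bytes=2_000_000)
    print(f"{rate} Gb/s: {disk:.1f} MB/s per disk, {msgs:.0f} messages/s")
```

Under these assumptions, 25Gb/s corresponds to about 98 MB/s per disk and about 1,560 messages per second. At 36Gb/s, the figures rise to about 141 MB/s per disk and 2,250 messages per second. Each slave therefore has to keep its disk writing at a substantial fraction of a single hard disk's sequential throughput.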
>>
>> While in production code I stream data into HDF5 and use zmqpp and
>> polling to receive messages, I have reduced the problematic code to
>> the simplest case using zmq.hpp, regular files, and plain send/recv
>> calls. The code is available at
>>
>> http://www.ioc.ee/~markov/zmq/problem-missing-messages/
>>
>> At the same time, there don't seem to be any excessive drops on the
>> Ethernet cards, as reported by ifconfig in Linux (slaves run on
>> Gentoo, server on Ubuntu):
>>
>>
>> ens1f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
>> inet 192.168.38.1 netmask 255.255.255.252 broadcast 192.168.38.3
>> inet6 fe80::225:90ff:fe9c:62c3 prefixlen 64 scopeid 0x20<link>
>> ether 00:25:90:9c:62:c3 txqueuelen 1000 (Ethernet)
>> RX packets 8568340799 bytes 76612663159251 (69.6 TiB)
>> RX errors 7 dropped 0 overruns 0 frame 7
>> TX packets 1558294820 bytes 93932603947 (87.4 GiB)
>> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>>
>> eth3 Link encap:Ethernet HWaddr 00:25:90:9c:63:1a
>> inet addr:192.168.38.2 Bcast:192.168.38.3 Mask:255.255.255.252
>> inet6 addr: fe80::225:90ff:fe9c:631a/64 Scope:Link
>> UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
>> RX packets:1558294810 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:8570261350 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:102083292705 (102.0 GB) TX bytes:76629844394725 (76.6 TB)
>>
>>
>> So it should not be a simple dropped-frames problem.
>>
>> Since the problem occurs only with larger messages, is there any
>> size-limited buffer in ZeroMQ that may cause messages to be dropped?
>> Or is there any other possible solution?
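A common way to confirm that messages are really lost, and to find out which ones, is to prepend a sequence number to every message on the server and then check for gaps in what the slaves stored. The following is a minimal sketch that assumes a hypothetical 8-byte little-endian header. This framing is not part of the original test code. In the REQ/REP setup described here, each slave sees only a subset of the sequence numbers, so gaps must be checked over the union of what all slaves recorded.

```python
import struct

# Hypothetical header: one unsigned 64-bit little-endian sequence number.
HEADER = struct.Struct("<Q")


def frame(seq, payload):
    """Prepend the sequence number to a payload before sending."""
    return HEADER.pack(seq) + payload


def unframe(message):
    """Split a received message into (sequence number, payload)."""
    (seq,) = HEADER.unpack_from(message)
    return seq, message[HEADER.size:]


def find_gaps(seen, first, last):
    """Sequence numbers in [first, last] that no slave recorded, in order."""
    return sorted(set(range(first, last + 1)) - set(seen))
```

Each slave could log the sequence numbers it wrote to disk. Merging those logs and calling `find_gaps` over the full range would then show exactly which messages went missing and whether the losses cluster at particular times, for example around disk flushes.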
>>
>> Thank you for your help,
>>
>> Marko
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>