[zeromq-dev] missing messages on 40GbE network
Ben Kloosterman
bklooste at gmail.com
Wed Jul 1 14:40:31 CEST 2015
More likely the NIC buffer or driver than ZeroMQ.
ben
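
One way to test the NIC/driver suspicion is to watch the kernel's per-interface counters while the transfer is running, rather than reading a single cumulative snapshot. The following is a minimal sketch, assuming a Linux host that exposes `/proc/net/dev`; the helper names (`parse_proc_net_dev`, `counter_deltas`, `sample_deltas`) are hypothetical and were not part of the original thread.

```python
import time
from pathlib import Path

# Column order follows the two header lines of Linux's /proc/net/dev.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls",
             "carrier", "compressed"]
KEYS = [f"rx_{f}" for f in RX_FIELDS] + [f"tx_{f}" for f in TX_FIELDS]


def parse_proc_net_dev(text):
    """Parse /proc/net/dev content into {interface: {counter: value}}."""
    stats = {}
    for line in text.splitlines()[2:]:      # skip the two header lines
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        numbers = [int(v) for v in values.split()]
        stats[name.strip()] = dict(zip(KEYS, numbers))
    return stats


def counter_deltas(before, after, iface,
                   keys=("rx_errs", "rx_drop", "rx_fifo")):
    """Return how much each selected counter grew between two snapshots."""
    return {k: after[iface][k] - before[iface][k] for k in keys}


def sample_deltas(iface, interval_s):
    """Take two snapshots interval_s seconds apart and return the deltas."""
    path = Path("/proc/net/dev")            # assumption: Linux host
    before = parse_proc_net_dev(path.read_text())
    time.sleep(interval_s)
    after = parse_proc_net_dev(path.read_text())
    return counter_deltas(before, after, iface)
```

For example, calling `sample_deltas("eth3", 10)` on the receiving host during a run that loses messages would show whether the error, drop, or FIFO counters grow in that window. Driver-specific statistics, such as those printed by `ethtool -S`, may contain additional counters that do not appear in this file.
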
On Wed, Jul 1, 2015 at 9:45 PM, Marko Vendelin <markov at sysbio.ioc.ee> wrote:
> Dear ØMQ developers:
>
> Synopsis: I am observing a strange interaction between storing a
> data stream on hard disks and the loss of ZeroMQ messages. It seems
> that in my use case, when messages are larger than 2MB, some of them
> are randomly dropped.
>
> Full story:
>
> I need to write images acquired by fast scientific cameras to files
> at rates approaching 25Gb/s. For that, images are acquired on one
> server and transferred to a hard disk array over a 40Gb/s network.
> Since Linux-based solutions using iSCSI were not working very well
> (they may need more optimization) while plain network applications
> could use the full bandwidth, I decided on a RAID-0 inspired
> approach: make a filesystem on each of the 32 hard disks separately,
> run one small slave program per filesystem, and let each slave ask
> the dataset server for a dataset in a loop. As the messaging system,
> I use ZeroMQ with a REQ/REP connection. In general, everything seems
> to work perfectly: I am able to stream and record data at about
> 36Gb/s. However, at some point (within 5-10 min), messages
> occasionally get lost. Intriguingly, this occurs only if I write
> files and the messages are 2MB or larger. Much smaller messages do
> not seem to trigger this effect. If I just stream the data and
> either dump it or only run calculations on it, all messages go
> through. All messages also go through if I use a 1Gb network.
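
To put the figures above in perspective, here is a rough back-of-the-envelope sketch of the load each slave has to sustain. It assumes the 36Gb/s stream is spread evenly over the 32 slaves and that "2MB" means 2 MiB; both are assumptions for illustration, not statements from the post.

```python
# Figures quoted in the post.
LINK_RATE_BITS_PER_S = 36e9      # observed streaming rate
N_SLAVES = 32                    # one slave per hard disk
# Assumption: "2MB" is read as 2 MiB.
MESSAGE_BYTES = 2 * 1024**2


def per_slave_load(rate_bits_per_s, n_slaves, message_bytes):
    """Return (bytes/s, messages/s) each slave must sustain.

    Assumes the stream is divided evenly among the slaves.
    """
    total_bytes_per_s = rate_bits_per_s / 8
    bytes_per_slave = total_bytes_per_s / n_slaves
    return bytes_per_slave, bytes_per_slave / message_bytes


if __name__ == "__main__":
    rate, msgs = per_slave_load(LINK_RATE_BITS_PER_S, N_SLAVES, MESSAGE_BYTES)
    print(f"{rate / 1e6:.1f} MB/s and {msgs:.1f} messages/s per slave")
```

Under these assumptions each disk has to absorb roughly 140 MB/s continuously, so a slave that stalls on a write while its next reply is in flight is one place to look.
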
>
> While in production code I stream data into HDF5 and use zmqpp with
> polling to receive messages, I have reduced the problematic code to
> the simplest case using zmq.hpp, regular files, and plain send/recv
> calls. Code is available at
>
> http://www.ioc.ee/~markov/zmq/problem-missing-messages/
>
> At the same time, there don't seem to be any excessive drops on the
> Ethernet cards, as reported by ifconfig in Linux (slaves run on
> Gentoo, server on Ubuntu):
>
>
> ens1f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
> inet 192.168.38.1 netmask 255.255.255.252 broadcast 192.168.38.3
> inet6 fe80::225:90ff:fe9c:62c3 prefixlen 64 scopeid 0x20<link>
> ether 00:25:90:9c:62:c3 txqueuelen 1000 (Ethernet)
> RX packets 8568340799 bytes 76612663159251 (69.6 TiB)
> RX errors 7 dropped 0 overruns 0 frame 7
> TX packets 1558294820 bytes 93932603947 (87.4 GiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
> eth3 Link encap:Ethernet HWaddr 00:25:90:9c:63:1a
> inet addr:192.168.38.2 Bcast:192.168.38.3 Mask:255.255.255.252
> inet6 addr: fe80::225:90ff:fe9c:631a/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
> RX packets:1558294810 errors:0 dropped:0 overruns:0 frame:0
> TX packets:8570261350 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:102083292705 (102.0 GB) TX bytes:76629844394725 (76.6 TB)
>
>
> So, it should not be a simple dropped frames problem.
>
> Since the problem occurs only with larger messages, is there any
> size-limited buffer in ZeroMQ that may cause dropping of the messages?
> Or any other possible solution?
>
> Thank you for your help,
>
> Marko
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>