[zeromq-dev] missing messages on 40GbE network
Pieter Hintjens
ph at imatix.com
Fri Jul 3 21:18:46 CEST 2015
It's strange that doing disk I/O would affect this. Perhaps you have
CPU contention that shows up more with ZeroMQ than with nanomsg.
On Fri, Jul 3, 2015 at 1:19 PM, Marko Vendelin <markov at sysbio.ioc.ee> wrote:
> Thank you, I have done some testing and the results are below. Before
> the results, a few words on the configuration: we have two 40GbE cards
> linked directly, without any switch. When I am NOT writing to files, I
> get sustained 36 Gb/s transfers with ZeroMQ for as long as I have
> tried. The few dropped frames probably occurred during boot. I have
> rebooted both machines, and now, after all tests, ifconfig reports no
> errors:
>
> <receiver> ens1f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
> inet 192.168.38.1 netmask 255.255.255.252 broadcast 192.168.38.3
> inet6 fe80::225:90ff:fe9c:62c3 prefixlen 64 scopeid 0x20<link>
> ether 00:25:90:9c:62:c3 txqueuelen 1000 (Ethernet)
> RX packets 1263870484 bytes 11244160379023 (10.2 TiB)
> RX errors 0 dropped 0 overruns 0 frame 0
> TX packets 220745627 bytes 14803910877 (13.7 GiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>
> <sender> eth3 Link encap:Ethernet HWaddr 00:25:90:9c:63:1a
> inet addr:192.168.38.2 Bcast:192.168.38.3 Mask:255.255.255.252
> inet6 addr: fe80::225:90ff:fe9c:631a/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
> RX packets:258755797 errors:0 dropped:0 overruns:0 frame:0
> TX packets:1403606650 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:17238102838 (17.2 GB) TX bytes:12498689995721 (12.4 TB)
>
> Tests have been performed after setting limits as follows (reboot
> after limits were set):
>
> <receiver> ulimit -a
> -t: cpu time (seconds) unlimited
> -f: file size (blocks) unlimited
> -d: data seg size (kbytes) unlimited
> -s: stack size (kbytes) 8192
> -c: core file size (blocks) unlimited
> -m: resident set size (kbytes) unlimited
> -u: processes 257167
> -n: file descriptors 1024000
> -l: locked-in-memory size (kbytes) 64
> -v: address space (kbytes) unlimited
> -x: file locks unlimited
> -i: pending signals 257167
> -q: bytes in POSIX msg queues 819200
> -e: max nice 0
> -r: max rt priority 0
> -N 15: unlimited
>
> <sender>
> -t: cpu time (seconds) unlimited
> -f: file size (blocks) unlimited
> -d: data seg size (kbytes) unlimited
> -s: stack size (kbytes) 8192
> -c: core file size (blocks) unlimited
> -m: resident set size (kbytes) unlimited
> -u: processes 257240
> -n: file descriptors 1024000
> -l: locked-in-memory size (kbytes) 62914560
> -v: address space (kbytes) unlimited
> -x: file locks unlimited
> -i: pending signals 257240
> -q: bytes in POSIX msg queues 819200
> -e: max nice 0
> -r: max rt priority 99
> -N 15: unlimited
>
> As you can see, I set the file descriptor limit (nofile) to a very
> large value. However, the same results were obtained with a limit of
> 10240.
>
> Tests:
>
> * As mentioned above, without storing datasets to files, a transfer
> rate of 36-37 Gb/s is sustained. No messages are lost and all arrive
> within the specified 10 s timeout (using polling). As a first guess,
> this seems to rule out network card problems.
>
> * When writing datasets to files, at some point the receiver socket
> stops receiving new messages. I can close the socket, create a new
> one, and get new messages after requesting them from the sender
> (REQ-REP pattern); a sketch of this workaround follows the test
> summary below. While this keeps the rate at about 29-32 Gb/s in the
> beginning, after 5-15 minutes the transfer rate slowly starts to drop
> and reaches sub-1 Gb/s within 20-30 minutes. The same occurs whether
> I use zero-copy or not.
>
> * I have rewritten the simple programs to use nanomsg (see the
> nanomsg sketch below). Using nanomsg, I obtain sustained rates of
> 33.9 Gb/s while writing to files and using its zero-copy mechanism.
> No missing frames have been identified and the load is distributed
> among the slaves fairly evenly (it would be skewed if messages were
> going missing).
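>
> For reference, the ZeroMQ receive loop with the re-open workaround is
> roughly the following. This is a simplified sketch, not the actual
> code; the endpoint and the 10 s timeout are illustrative:
>
> #include <zmq.hpp>
> #include <string>
>
> int main() {
>     zmq::context_t ctx(1);
>     const std::string endpoint = "tcp://192.168.38.2:5555"; // illustrative
>
>     for (;;) {
>         // fresh REQ socket; it is replaced whenever a receive times out
>         zmq::socket_t sock(ctx, ZMQ_REQ);
>         int linger = 0;
>         sock.setsockopt(ZMQ_LINGER, &linger, sizeof(linger));
>         sock.connect(endpoint);
>
>         for (;;) {
>             zmq::message_t request(0);
>             sock.send(request);                    // ask for the next dataset
>
>             zmq::pollitem_t items[] = { { (void *) sock, 0, ZMQ_POLLIN, 0 } };
>             zmq::poll(items, 1, 10000);            // 10 s timeout
>             if (!(items[0].revents & ZMQ_POLLIN))
>                 break;                             // socket looks stuck: re-open it
>
>             zmq::message_t dataset;
>             sock.recv(&dataset);
>             // ... write dataset.data(), dataset.size() to this slave's file ...
>         }
>     }
>     return 0;
> }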
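>
> The nanomsg variant of the slave does the zero-copy receive roughly
> like this (a sketch with error handling omitted; the endpoint is
> illustrative):
>
> #include <nanomsg/nn.h>
> #include <nanomsg/reqrep.h>
>
> int main() {
>     int sock = nn_socket(AF_SP, NN_REQ);
>     nn_connect(sock, "tcp://192.168.38.2:5555");
>
>     for (;;) {
>         nn_send(sock, "", 0, 0);                 // ask for the next dataset
>
>         void *buf = NULL;
>         int n = nn_recv(sock, &buf, NN_MSG, 0);  // NN_MSG: zero-copy, library-allocated
>         if (n < 0)
>             break;
>         // ... write buf (n bytes) to this slave's file ...
>         nn_freemsg(buf);                         // release the zero-copy buffer
>     }
>     nn_close(sock);
>     return 0;
> }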
>
> On the basis of these tests, it seems to me that either there is a
> hardware bug that gets triggered by ZeroMQ or there is some
> restriction in ZeroMQ that my use pattern hits. If it is a hardware
> bug, nanomsg manages to avoid it somehow. If it is a ZeroMQ
> restriction, what could it be?
>
> Best wishes,
>
> Marko
>
> On Fri, Jul 3, 2015 at 7:15 AM, Ben Kloosterman <bklooste at gmail.com> wrote:
>> Try changing /proc/sys/net/core/rmem_default and /proc/sys/net/core/rmem_max.
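>>
>> If you want to bump the buffers from the application side as well,
>> ZMQ_RCVBUF / ZMQ_SNDBUF are the per-socket knobs. A rough sketch; the
>> values are only examples, and the kernel still caps them at
>> rmem_max/wmem_max unless you raise those too:
>>
>> #include <zmq.hpp>
>>
>> int main() {
>>     zmq::context_t ctx(1);
>>     zmq::socket_t sock(ctx, ZMQ_REQ);
>>
>>     int bufsize = 16 * 1024 * 1024;               // 16 MB, example value
>>     sock.setsockopt(ZMQ_RCVBUF, &bufsize, sizeof(bufsize));
>>     sock.setsockopt(ZMQ_SNDBUF, &bufsize, sizeof(bufsize));
>>     // kernel side (as root), for example:
>>     //   sysctl -w net.core.rmem_max=33554432
>>     //   sysctl -w net.core.wmem_max=33554432
>>
>>     sock.connect("tcp://192.168.38.2:5555");      // set options before connect
>>     // ...
>>     return 0;
>> }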
>>
>> Also try a test run with tcpdump and check for drops.
>>
>> The frame errors are CRC errors; there are not many, but I bet they
>> are the big packets you lost. This could be the cable, a switch, etc.
>>
>>
>> Here is some older material on tuning for 10GigE:
>> http://datatag.web.cern.ch/datatag/howto/tcp.html
>>
>> Regards,
>>
>> Ben
>>
>> On Wed, Jul 1, 2015 at 9:45 PM, Marko Vendelin <markov at sysbio.ioc.ee> wrote:
>>>
>>> Dear ØMQ developers:
>>>
>>> Synopsis: I am observing a strange interaction between storing a
>>> data stream on hard disks and the loss of ZeroMQ messages. It seems
>>> that in my use case, when messages are larger than 2 MB, some of
>>> them are randomly dropped.
>>>
>>> Full story:
>>>
>>> I need to pump images acquired by fast scientific cameras into files
>>> at rates approaching 25 Gb/s. For that, the images are acquired on
>>> one server and transferred to a hard-disk array over a 40 Gb/s
>>> network. Since Linux-based solutions using iSCSI were not working
>>> very well (maybe they need more tuning) and plain network
>>> applications could use the full bandwidth, I decided on a
>>> RAID-0-inspired approach: make a filesystem on each of the 32 hard
>>> disks separately, run small slave programs, one per filesystem, and
>>> let each slave ask the dataset server for a dataset in a loop. As
>>> the messaging system, I use ZeroMQ with a REQ/REP connection. In
>>> general, everything seems to work perfectly: I am able to stream and
>>> record data at about 36 Gb/s. However, at some point (within 5-10
>>> min), messages sometimes get lost. Intriguingly, this occurs only if
>>> I write files and the messages are 2 MB or larger. Much smaller
>>> messages do not seem to trigger this effect. If I just stream the
>>> data and either dump it or only compute on it, all messages go
>>> through. All messages also go through if I use a 1 Gb network.
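>>>
>>> Each slave is essentially the following loop (a simplified sketch;
>>> the endpoint and file naming are illustrative, not the real code):
>>>
>>> #include <zmq.hpp>
>>> #include <fstream>
>>> #include <sstream>
>>>
>>> int main() {
>>>     zmq::context_t ctx(1);
>>>     zmq::socket_t sock(ctx, ZMQ_REQ);
>>>     sock.connect("tcp://192.168.38.2:5555");   // dataset server
>>>
>>>     for (int i = 0; ; ++i) {
>>>         zmq::message_t request(0);
>>>         sock.send(request);                    // "give me the next dataset"
>>>
>>>         zmq::message_t dataset;
>>>         sock.recv(&dataset);
>>>
>>>         std::ostringstream name;               // one filesystem per slave
>>>         name << "/data/disk07/dataset-" << i << ".raw";
>>>         std::ofstream out(name.str().c_str(), std::ios::binary);
>>>         out.write(static_cast<const char *>(dataset.data()), dataset.size());
>>>     }
>>>     return 0;
>>> }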
>>>
>>> While in the production code I stream data into HDF5 and use zmqpp
>>> with polling to receive messages, I have reduced the problematic
>>> code to the simplest case using zmq.hpp, regular files, and plain
>>> send/recv calls. The code is available at
>>>
>>> http://www.ioc.ee/~markov/zmq/problem-missing-messages/
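>>>
>>> In essence, the server side of that reduced test does the following
>>> (a sketch only; the real code sends camera frames rather than a
>>> dummy buffer, and the port is illustrative):
>>>
>>> #include <zmq.hpp>
>>> #include <cstring>
>>>
>>> int main() {
>>>     zmq::context_t ctx(1);
>>>     zmq::socket_t sock(ctx, ZMQ_REP);
>>>     sock.bind("tcp://*:5555");
>>>
>>>     const size_t size = 4 * 1024 * 1024;       // ~4 MB, above the 2 MB threshold
>>>     for (;;) {
>>>         zmq::message_t request;
>>>         sock.recv(&request);                   // wait for a slave to ask
>>>
>>>         zmq::message_t dataset(size);
>>>         std::memset(dataset.data(), 0, size);  // dummy payload
>>>         sock.send(dataset);
>>>     }
>>>     return 0;
>>> }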
>>>
>>> At the same time, there do not seem to be any excessive drops on the
>>> Ethernet cards, as reported by ifconfig on Linux (the slaves run on
>>> Gentoo, the server on Ubuntu):
>>>
>>>
>>> ens1f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
>>> inet 192.168.38.1 netmask 255.255.255.252 broadcast 192.168.38.3
>>> inet6 fe80::225:90ff:fe9c:62c3 prefixlen 64 scopeid 0x20<link>
>>> ether 00:25:90:9c:62:c3 txqueuelen 1000 (Ethernet)
>>> RX packets 8568340799 bytes 76612663159251 (69.6 TiB)
>>> RX errors 7 dropped 0 overruns 0 frame 7
>>> TX packets 1558294820 bytes 93932603947 (87.4 GiB)
>>> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
>>>
>>> eth3 Link encap:Ethernet HWaddr 00:25:90:9c:63:1a
>>> inet addr:192.168.38.2 Bcast:192.168.38.3 Mask:255.255.255.252
>>> inet6 addr: fe80::225:90ff:fe9c:631a/64 Scope:Link
>>> UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
>>> RX packets:1558294810 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:8570261350 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:1000
>>> RX bytes:102083292705 (102.0 GB) TX bytes:76629844394725 (76.6
>>> TB)
>>>
>>>
>>> So, it should not be a simple dropped frames problem.
>>>
>>> Since the problem occurs only with larger messages, is there any
>>> size-limited buffer in ZeroMQ that could cause messages to be
>>> dropped? Or is there some other possible explanation?
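>>>
>>> (The only per-socket size limits I see documented are the high-water
>>> marks, which count queued messages rather than bytes. If they are
>>> relevant, raising them would presumably look like this, as an
>>> untested sketch:)
>>>
>>> #include <zmq.hpp>
>>>
>>> int main() {
>>>     zmq::context_t ctx(1);
>>>     zmq::socket_t sock(ctx, ZMQ_REQ);
>>>
>>>     int hwm = 10000;                           // high-water mark, in messages
>>>     sock.setsockopt(ZMQ_SNDHWM, &hwm, sizeof(hwm));
>>>     sock.setsockopt(ZMQ_RCVHWM, &hwm, sizeof(hwm));
>>>
>>>     sock.connect("tcp://192.168.38.2:5555");   // illustrative endpoint
>>>     // ...
>>>     return 0;
>>> }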
>>>
>>> Thank you for your help,
>>>
>>> Marko