[zeromq-dev] missing messages on 40GbE network

Pieter Hintjens ph at imatix.com
Tue Jul 7 11:46:45 CEST 2015


For what it's worth, PAIR sockets are a poor choice over TCP as they
do not reconnect automatically. You can replace them with DEALER.

On Tue, Jul 7, 2015 at 9:16 AM, Marko Vendelin <markov at sysbio.ioc.ee> wrote:
> Hi Peter,
>
> thank you for the pointers. It seems now that there is some problem
> with the disk I/O, as first suspected. Namely, when the system starts to
> 'crawl', I can fire up new clients that don't write anything and these
> clients are doing absolutely fine (recording high rates). New clients
> with disk i/o crawl immediately.
>
> I'll look into it and will try to isolate the issue further.
>
> REP-REQ: No, I was using multiple requests in PAIR sockets, as you
> advised earlier.
>
> NORM: When things work, TCP is fine. As far as I know, a lot is
> processed on the cards internally, and I can reach rates that are
> as high as needed.
>
> I'll let the list know if the problem is in disk I/O and what caused
> it.
>
> Best wishes,
>
> Marko
>
>
> On Mon, Jul 6, 2015 at 11:30 PM, Peter Krey <krey at ripple.com> wrote:
>> You may want to try switching to a UDP-based protocol like NORM with zmq.
>> This will let you achieve higher throughput, as there will be no TCP packet
>> handshakes.
>>
>> You can also try installing multiple NICs in your computer and bonding
>> them together into one device for higher throughput if you think the
>> cards' device buffers are being overrun.
>>
>> On Mon, Jul 6, 2015 at 1:25 PM, Peter Krey <krey at ripple.com> wrote:
>>>
>>> You are not using REQ-REP properly; a REQ-REP socket will not accept two
>>> REQ messages in a row. It needs a REP before it will proceed; otherwise it
>>> will block.
>>>
>>> I highly advise using the PAIR type for all sockets in your application
>>> and no REQ-REP sockets at all, especially given the throughput required in
>>> your application.
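A toy C++ model (not libzmq) of the strict alternation Peter describes: a REQ socket must receive the reply before it may send the next request. As far as I know, real libzmq does not block on an out-of-order send; zmq_send() on a REQ socket in the wrong state returns -1 with errno set to EFSM. ReqStateModel is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <string>

// Toy model (not libzmq) of a REQ socket's strict send/recv alternation.
// Where libzmq would fail with EFSM, this model simply returns false.
class ReqStateModel {
public:
    // Try to send a request; refused while a reply is still outstanding.
    bool send(const std::string& request) {
        (void)request;                      // payload is irrelevant to the state machine
        if (awaiting_reply_) return false;  // libzmq: zmq_send() == -1, errno == EFSM
        awaiting_reply_ = true;
        return true;
    }

    // Deliver a reply; only valid while a request is outstanding.
    bool recv(const std::string& reply) {
        (void)reply;
        if (!awaiting_reply_) return false;
        awaiting_reply_ = false;
        return true;
    }

    bool awaiting_reply() const { return awaiting_reply_; }

private:
    bool awaiting_reply_ = false;  // true between a send() and its matching recv()
};
```

In this model a second send() before the reply is refused, and sending works again once recv() has delivered the reply. That is why a lock-step REQ writer can have at most one dataset in flight at a time.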
>>>
>>> On Sun, Jul 5, 2015 at 9:58 AM, Marko Vendelin <markov at sysbio.ioc.ee>
>>> wrote:
>>>>
>>>> I did reprogram it using PAIR sockets, one per client. They were still
>>>> using a request-reply pattern, and when a request was not replied to, the
>>>> client repeated the request. Unfortunately, similar behaviour was observed:
>>>> the initial fast rate dropped and never recovered.
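A sketch of the retry bookkeeping described here (re-sending a request that was not answered), assuming each request carries a numeric id and the caller passes in the current time in milliseconds, for instance taken from std::chrono::steady_clock. RetryTracker and its methods are hypothetical names; the class only tracks deadlines, and the actual sending over the PAIR socket is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical helper: remembers which requests are unanswered and reports
// those whose reply did not arrive within timeout_ms, so they can be re-sent.
class RetryTracker {
public:
    explicit RetryTracker(std::int64_t timeout_ms) : timeout_ms_(timeout_ms) {
        assert(timeout_ms > 0);
    }

    // Record that request `id` was sent at time `now` (milliseconds).
    void sent(std::uint64_t id, std::int64_t now) { deadline_[id] = now + timeout_ms_; }

    // Record that the reply for `id` arrived; returns false for unknown ids.
    bool answered(std::uint64_t id) { return deadline_.erase(id) == 1; }

    // Return ids whose deadline has passed and re-arm them as if re-sent now.
    std::vector<std::uint64_t> due_for_retry(std::int64_t now) {
        std::vector<std::uint64_t> due;
        for (auto& entry : deadline_) {
            if (entry.second <= now) {
                due.push_back(entry.first);
                entry.second = now + timeout_ms_;
            }
        }
        return due;
    }

    std::size_t outstanding() const { return deadline_.size(); }

private:
    std::int64_t timeout_ms_;
    std::map<std::uint64_t, std::int64_t> deadline_;  // request id -> retry deadline
};
```

A re-sent request can lead to a dataset being delivered twice if the original reply was merely late rather than lost, so the receiving side may need to de-duplicate by id.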
>>>>
>>>> I'm wondering whether it is possible to get error codes out of ZeroMQ
>>>> to see where the problem is.
>>>>
>>>> Best wishes
>>>>
>>>> Marko
>>>>
>>>> On Jul 4, 2015 12:04 AM, "Marko Vendelin" <marko.vendelin at gmail.com>
>>>> wrote:
>>>>>
>>>>> Is there any way I could check for automatic drop of messages by
>>>>> zeromq? I could recompile the library with some debug settings if
>>>>> needed, but this information would be very valuable.
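Independently of ZeroMQ's own diagnostics (connection-level events are available through zmq_socket_monitor()), a common application-level check for lost messages is to stamp each message with a consecutive sequence number and look for gaps at the receiver. ZeroMQ delivers messages in order over a single connection, so this sketch assumes one counter per connection, i.e. per disk writer. GapDetector is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Application-level loss detection (not a ZeroMQ feature): the sender numbers
// messages 0, 1, 2, ... per connection, and the receiver reports skipped numbers.
class GapDetector {
public:
    // Feed the sequence number of each received message, in arrival order.
    // Returns the sequence numbers skipped since the previous message.
    std::vector<std::uint64_t> on_message(std::uint64_t seq) {
        std::vector<std::uint64_t> missing;
        if (have_last_ && seq > last_ + 1) {
            for (std::uint64_t s = last_ + 1; s < seq; ++s) missing.push_back(s);
        }
        if (!have_last_ || seq > last_) {
            last_ = seq;
            have_last_ = true;
        }
        lost_ += missing.size();
        return missing;
    }

    std::uint64_t total_lost() const { return lost_; }

private:
    bool have_last_ = false;  // false until the first message arrives
    std::uint64_t last_ = 0;  // highest sequence number seen so far
    std::uint64_t lost_ = 0;  // running count of skipped sequence numbers
};
```

Logging the returned numbers together with a timestamp would show exactly when the first message went missing, which could then be correlated with disk activity.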
>>>>>
>>>>> If that were the case, I would expect to see the same with nanomsg,
>>>>> and also at the beginning of the test with ZeroMQ. Our disk I/O should
>>>>> be faster than the network. Since the dropoff happens at ~10 minutes
>>>>> when using ZeroMQ, RAM would not be able to cache the data either (by
>>>>> that time I have already transferred ~2TB to machines with 64GB RAM).
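The figures in this paragraph can be cross-checked with a little arithmetic, assuming decimal units (1 GB = 10^9 bytes). At 30-36 Gb/s, ten minutes (600 s) moves 2250-2700 GB, consistent with the ~2 TB quoted, while 64 GB of RAM would fill in roughly 14-17 s at those rates. The helper names are hypothetical.

```cpp
#include <cassert>

// Data moved at a given line rate over a given time.
// Rate in gigabits per second, result in decimal gigabytes.
constexpr double gigabytes_transferred(double rate_gbit_s, double seconds) {
    return rate_gbit_s * seconds / 8.0;
}

// Time for a line rate to fill a buffer of the given size (e.g. RAM).
// Size in decimal gigabytes, rate in gigabits per second, result in seconds.
constexpr double seconds_to_fill(double ram_gbyte, double rate_gbit_s) {
    return ram_gbyte * 8.0 / rate_gbit_s;
}
```

For example, gigabytes_transferred(30.0, 600.0) is 2250.0 and seconds_to_fill(64.0, 30.0) is about 17.07, so page-cache buffering alone would be exhausted long before the ten-minute mark.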
>>>>>
>>>>> Using REQ/REP allows me to spread the load among all disks
>>>>> automatically. Since there is one disk writer per HDD, and each writer
>>>>> writes a dataset to disk right after receiving it, the load per disk
>>>>> is proportional to its speed. The rates I am getting at the beginning
>>>>> with ZMQ (first ~10 min, ~30-36Gb/s) are above our requirements and
>>>>> would fit the application perfectly, if only I could sustain them for
>>>>> as long as the disk space allows.
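A toy discrete-event model of the scheme described here, in which each writer requests the next dataset only after it has finished writing the previous one. Per-dataset write times are made-up inputs, network time is ignored, and simulate_pull is a hypothetical name. The model shows why pulling work spreads the load in proportion to disk speed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns how many datasets each writer receives before `horizon`, given each
// writer's time to write one dataset. Writers ask again as soon as they finish.
std::vector<int> simulate_pull(const std::vector<double>& write_time, double horizon) {
    assert(!write_time.empty());
    for (double w : write_time) assert(w > 0.0);  // otherwise the loop never ends

    using Event = std::pair<double, std::size_t>;  // (time writer is free, writer index)
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> ready;
    for (std::size_t i = 0; i < write_time.size(); ++i) ready.push({0.0, i});

    std::vector<int> datasets(write_time.size(), 0);
    while (true) {
        const Event next = ready.top();  // earliest pending request
        if (next.first >= horizon) break;
        ready.pop();
        ++datasets[next.second];  // server hands this writer one dataset
        ready.push({next.first + write_time[next.second], next.second});
    }
    return datasets;
}
```

With write times {1.0, 2.0} and a horizon of 10.0, the model returns {10, 5}: the disk that writes twice as fast receives twice as many datasets, without any explicit scheduling on the server side.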
>>>>>
>>>>> Re PAIR: I was thinking about giving PAIR a try. I would need to
>>>>> reprogram a bit, but it's possible.
>>>>>
>>>>> Best wishes,
>>>>>
>>>>> Marko
>>>>>
>>>>>
>>>>> On Fri, Jul 3, 2015 at 10:52 PM, Peter Krey <peterjkrey at gmail.com>
>>>>> wrote:
>>>>> > You may be sending messages faster than you can receive them and
>>>>> > write them to disk, overflowing the ZeroMQ send buffer and causing
>>>>> > ZeroMQ to automatically discard some messages. This is expected
>>>>> > behavior.
>>>>> >
>>>>> > Also, do not use the REQ-REP socket type; use PAIR. This will not
>>>>> > require your app to recv and reply before sending the next image;
>>>>> > your app can send asynchronously.
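Whether ZeroMQ discards messages when a queue reaches its high-water mark depends on the socket type. The zmq_socket(3) man page lists an "Action in mute state" per type: PUB drops, while REQ, DEALER, and PAIR block the sender (or fail with EAGAIN when ZMQ_DONTWAIT is used). The toy queue below models the two policies; HwmQueue, MutePolicy, and SendResult are hypothetical names, not libzmq types.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Toy model of one connection's send queue with a high-water mark (HWM).
enum class MutePolicy { Drop, Block };        // e.g. PUB vs REQ/DEALER/PAIR
enum class SendResult { Queued, Dropped, WouldBlock };

class HwmQueue {
public:
    HwmQueue(std::size_t hwm, MutePolicy policy) : hwm_(hwm), policy_(policy) {
        assert(hwm > 0);
    }

    SendResult send(std::string msg) {
        if (queue_.size() < hwm_) {
            queue_.push_back(std::move(msg));
            return SendResult::Queued;
        }
        if (policy_ == MutePolicy::Drop) {
            ++dropped_;  // message is silently discarded
            return SendResult::Dropped;
        }
        return SendResult::WouldBlock;  // caller must wait and retry; nothing is lost
    }

    // Simulate the peer consuming one message; false if the queue was empty.
    bool drain_one() {
        if (queue_.empty()) return false;
        queue_.pop_front();
        return true;
    }

    std::size_t dropped() const { return dropped_; }

private:
    std::size_t hwm_;
    MutePolicy policy_;
    std::deque<std::string> queue_;
    std::size_t dropped_ = 0;
};
```

Under the Drop policy the message sent into a full queue is lost; under the Block policy nothing is lost, and the sender simply has to wait until the peer drains the queue.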
>>>>> >
>>>>> > On Wednesday, July 1, 2015, Marko Vendelin <markov at sysbio.ioc.ee>
>>>>> > wrote:
>>>>> >>
>>>>> >> Dear ØMQ developers:
>>>>> >>
>>>>> >> Synopsis: I am observing a strange interaction between storing a
>>>>> >> data stream on hard disks and the loss of ZeroMQ messages. It seems
>>>>> >> that in my use case, when messages are larger than 2MB, some of them
>>>>> >> are randomly dropped.
>>>>> >>
>>>>> >> Full story:
>>>>> >>
>>>>> >> I need to pump images acquired by fast scientific cameras into
>>>>> >> files at rates approaching 25Gb/s. For that, images are acquired
>>>>> >> on one server and transferred to the hard disk array over a 40Gb/s
>>>>> >> network. Since Linux-based solutions using iSCSI were not working
>>>>> >> very well (maybe they need more optimization) and plain network
>>>>> >> applications could use the full bandwidth, I decided to use a
>>>>> >> RAID-0-inspired approach: make a filesystem on each of the 32 hard
>>>>> >> disks separately, run small slave programs, one per filesystem, and
>>>>> >> let the slaves ask the dataset server for a dataset in a loop. As a
>>>>> >> messaging system, I use ZeroMQ with a REQ/REP connection. In general,
>>>>> >> everything seems to work perfectly: I am able to stream and record
>>>>> >> data at about 36Gb/s. However, at some point (within 5-10 min),
>>>>> >> messages sometimes get lost. Intriguingly, this occurs only if I
>>>>> >> write files and the messages are 2MB or larger. Much smaller messages
>>>>> >> do not seem to trigger this effect. If I just stream the data and
>>>>> >> either dump it or only use it for calculations, all messages go
>>>>> >> through. All messages also go through if I use a 1Gb network.
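For the RAID-0-like layout, dividing the aggregate rate by the number of disks gives the sustained write rate each disk must handle, assuming an even split and decimal units: about 140.6 MB/s per disk at 36 Gb/s over 32 disks, and about 97.7 MB/s at the 25 Gb/s target. Those numbers are worth comparing against each disk's measured sequential write speed. per_disk_mbyte_s is a hypothetical helper.

```cpp
#include <cassert>

// Per-disk write rate when an aggregate network rate is spread evenly over
// `disks` independent disks. Input in Gb/s, output in MB/s (1 MB = 1e6 bytes).
constexpr double per_disk_mbyte_s(double aggregate_gbit_s, int disks) {
    return aggregate_gbit_s * 1e9 / 8.0 / disks / 1e6;
}
```

Because the writers pull work, a disk that cannot keep up does not stall the others directly, but it does reduce the aggregate rate by its share.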
>>>>> >>
>>>>> >> While in production code I stream data into HDF5 and use zmqpp and
>>>>> >> polling to receive messages, I have reduced the problematic code to
>>>>> >> the simplest case using zmq.hpp, regular files, and plain send/recv
>>>>> >> calls. Code is available at
>>>>> >>
>>>>> >> http://www.ioc.ee/~markov/zmq/problem-missing-messages/
>>>>> >>
>>>>> >> At the same time, there don't seem to be any excessive drops in
>>>>> >> ethernet cards, as reported by ifconfig in Linux (slaves run on
>>>>> >> Gentoo, server on Ubuntu):
>>>>> >>
>>>>> >>
>>>>> >> ens1f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>>>> >>         inet 192.168.38.1  netmask 255.255.255.252  broadcast 192.168.38.3
>>>>> >>         inet6 fe80::225:90ff:fe9c:62c3  prefixlen 64  scopeid 0x20<link>
>>>>> >>         ether 00:25:90:9c:62:c3  txqueuelen 1000  (Ethernet)
>>>>> >>         RX packets 8568340799  bytes 76612663159251 (69.6 TiB)
>>>>> >>         RX errors 7  dropped 0  overruns 0  frame 7
>>>>> >>         TX packets 1558294820  bytes 93932603947 (87.4 GiB)
>>>>> >>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>>>> >>
>>>>> >> eth3      Link encap:Ethernet  HWaddr 00:25:90:9c:63:1a
>>>>> >>           inet addr:192.168.38.2  Bcast:192.168.38.3  Mask:255.255.255.252
>>>>> >>           inet6 addr: fe80::225:90ff:fe9c:631a/64 Scope:Link
>>>>> >>           UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
>>>>> >>           RX packets:1558294810 errors:0 dropped:0 overruns:0 frame:0
>>>>> >>           TX packets:8570261350 errors:0 dropped:0 overruns:0 carrier:0
>>>>> >>           collisions:0 txqueuelen:1000
>>>>> >>           RX bytes:102083292705 (102.0 GB)  TX bytes:76629844394725 (76.6 TB)
>>>>> >>
>>>>> >>
>>>>> >> So, it should not be a simple dropped frames problem.
>>>>> >>
>>>>> >> Since the problem occurs only with larger messages, is there any
>>>>> >> size-limited buffer in ZeroMQ that may cause messages to be dropped?
>>>>> >> Or any other possible solution?
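On the buffer question: as far as I know, libzmq's high-water marks (ZMQ_SNDHWM and ZMQ_RCVHWM, default 1000) count messages rather than bytes, and inbound messages have no size limit by default (ZMQ_MAXMSGSIZE is -1). The sketch below computes the worst case if every queue filled to its HWM; for example, 32 connections x 1000 messages x 2 MiB comes to 62.5 GiB. With strict REQ/REP lock-step, however, each connection has at most one request in flight, so these queues would not come close to filling. queued_bytes_upper_bound is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on bytes held in ZeroMQ queues if every one of `connections`
// queues filled to an HWM of `hwm_messages` messages of `message_bytes` each.
// HWMs count messages, so larger messages make the same HWM more expensive.
constexpr std::uint64_t queued_bytes_upper_bound(std::uint64_t hwm_messages,
                                                 std::uint64_t message_bytes,
                                                 std::uint64_t connections) {
    return hwm_messages * message_bytes * connections;
}
```

With pipelined requests over PAIR or DEALER, lowering the HWM (or limiting requests in flight per writer) would cap this memory explicitly.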
>>>>> >>
>>>>> >> Thank you for your help,
>>>>> >>
>>>>> >> Marko
>>>>> >> _______________________________________________
>>>>> >> zeromq-dev mailing list
>>>>> >> zeromq-dev at lists.zeromq.org
>>>>> >> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>>> >
>>>>> >
>>>>> >
>>>>
>>>>
>>>>
>>>
>>
>>
>>


