[zeromq-dev] missing messages on 40GbE network

Peter Krey krey at ripple.com
Mon Jul 6 22:25:20 CEST 2015


You are not using REQ-REP properly. A REQ socket will not accept two
requests in a row: it needs the REP before it will proceed. Until then a
second send fails (EFSM), and a recv blocks for as long as no reply arrives.
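
To illustrate the alternation, here is a toy model in plain Python. It is
only a sketch of the send/recv lockstep and not the libzmq API: the names
ToyReqSocket and ReqStateError are made up for this example, and the
exception merely stands in for the EFSM error a real REQ socket reports.

```python
class ReqStateError(Exception):
    """Hypothetical stand-in for libzmq's EFSM error in this toy model."""


class ToyReqSocket:
    """Toy model of the REQ send/recv alternation (not the libzmq API)."""

    def __init__(self):
        # True while a request is outstanding and no reply has been consumed.
        self.awaiting_reply = False

    def send(self, request):
        # A second send before the reply is refused.
        if self.awaiting_reply:
            raise ReqStateError("send refused: previous request has no reply yet")
        self.awaiting_reply = True
        return request

    def recv(self, reply):
        # 'reply' models the message the REP side sent back.
        if not self.awaiting_reply:
            raise ReqStateError("recv refused: no request outstanding")
        self.awaiting_reply = False
        return reply


sock = ToyReqSocket()
sock.send(b"dataset please")
try:
    sock.send(b"dataset please (retry)")  # refused: still waiting for a reply
except ReqStateError as exc:
    print("second send:", exc)
sock.recv(b"dataset 1")       # the reply unlocks the socket
sock.send(b"dataset please")  # accepted again
```

In this model a client whose reply never arrives can neither resend nor move
on, which is the situation a retry loop on a REQ socket runs into.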

I strongly advise using the PAIR type for all sockets in your application,
and no REQ-REP sockets at all, especially given the throughput your
application requires.
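
To put that throughput into per-writer terms, here is a rough calculation
from the figures in the original post (25 and 36 Gb/s, 32 disks, 2MB
messages). It is only a sketch: it assumes decimal units (2MB = 2,000,000
bytes), a perfectly even spread of load over the writers, and no protocol
overhead. The function name is made up for this example.

```python
def per_writer_budget(link_gbit_s, writers, message_bytes):
    """Return (bytes/s per writer, messages/s per writer, seconds per message)."""
    total_bytes_s = link_gbit_s * 1e9 / 8   # Gb/s -> bytes/s (decimal units)
    bytes_s = total_bytes_s / writers       # even split across writers (assumption)
    msgs_s = bytes_s / message_bytes
    return bytes_s, msgs_s, 1.0 / msgs_s


WRITERS = 32                # one writer per harddisk, as in the original post
MESSAGE_BYTES = 2_000_000   # assumption: "2MB" taken as 2e6 bytes

for rate in (25, 36):
    bytes_s, msgs_s, budget = per_writer_budget(rate, WRITERS, MESSAGE_BYTES)
    print(f"{rate} Gb/s: {bytes_s / 1e6:.1f} MB/s per disk, "
          f"{msgs_s:.1f} msg/s per writer, {budget * 1e3:.1f} ms per message")
```

Under these assumptions each disk has to sustain roughly 98 MB/s at 25 Gb/s
and roughly 141 MB/s at 36 Gb/s, which leaves each writer about 20 ms and
14 ms respectively to receive and write one message.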

On Sun, Jul 5, 2015 at 9:58 AM, Marko Vendelin <markov at sysbio.ioc.ee> wrote:

> I did reprogram it using PAIR sockets, one per client. The clients still
> followed a request-reply pattern, and when a request was not replied to, the
> client repeated it. Unfortunately, similar behaviour was observed: the
> initially fast rate dropped and never recovered.
>
> I'm wondering whether it is possible to get error codes out of zeromq to see
> where the problem is.
>
> Best wishes
>
> Marko
> On Jul 4, 2015 12:04 AM, "Marko Vendelin" <marko.vendelin at gmail.com>
> wrote:
>
>> Is there any way I could check for automatic drop of messages by
>> zeromq? I could recompile the library with some debug settings if
>> needed, but this information would be very valuable.
>>
>> If that were the cause, I would expect the same with nanomsg as well, and
>> already at the beginning of the test with ZeroMQ. Our disk i/o should be
>> faster than the network. Since the dropoff happens at ~10 minutes when using
>> zeromq, RAM would not be able to cache the data either (by that time I
>> have already transferred ~2TB on machines with 64GB RAM).
>>
>> Use of REQ/REP allows me to spread the load among all disks
>> automatically. Since there is one disk writer per HDD, and each writes a
>> dataset to disk right after receiving it, the load per disk is proportional
>> to its speed. The rates I am getting in the beginning with ZMQ (first ~10
>> min, ~30-36Gb/s) are above our requirements and would fit the
>> application perfectly, if only I could sustain them for as long as the disk
>> space allows.
>>
>> Re PAIR: I was thinking about giving PAIR a try. I would need to
>> reprogram a bit, but it's possible.
>>
>> Best wishes,
>>
>> Marko
>>
>>
>> On Fri, Jul 3, 2015 at 10:52 PM, Peter Krey <peterjkrey at gmail.com> wrote:
>> > You may be sending messages faster than you can receive them and write
>> > them to disk, overflowing the zeromq send buffer and causing zeromq to
>> > automatically discard some messages. This is expected behavior.
>> >
>> > Also, do not use the request-reply socket type; use PAIR. Then your app
>> > is not required to recv and reply before sending the next image; it can
>> > send asynchronously.
>> >
>> > On Wednesday, July 1, 2015, Marko Vendelin <markov at sysbio.ioc.ee> wrote:
>> >>
>> >> Dear ØMQ developers:
>> >>
>> >> Synopsis: I am observing a strange interaction between storing
>> >> datastream on harddisks and a loss of ZeroMQ messages. It seems that
>> >> in my use case, when messages are larger than 2MB, some of them are
>> >> randomly dropped.
>> >>
>> >> Full story:
>> >>
>> >> I need to pump images acquired by fast scientific cameras into
>> >> files at rates approaching 25Gb/s. For that, images are acquired
>> >> on one server and transferred to the harddisk array over a 40Gb/s
>> >> network. Since Linux-based solutions using iSCSI were not working very
>> >> well (maybe need to optimize more) and plain network applications
>> >> could use the full bandwidth, I decided to use a RAID-0 inspired
>> >> approach: make a filesystem on each of the 32 harddisks separately, run
>> >> one small slave program per filesystem and let the slaves ask the
>> >> dataset server for a dataset in a loop. As a messaging system, I use
>> >> ZeroMQ with a REQ/REP connection. In general, everything seems to work
>> >> perfectly: I am able to stream and record data at about 36Gb/s. However,
>> >> at some point (within 5-10 min), messages occasionally get lost.
>> >> Intriguingly, this occurs only if I write files and messages are 2MB
>> >> or larger. Much smaller messages do not seem to trigger this effect.
>> >> If I just stream data and either dump it or just calculate on the
>> >> basis of it, all messages go through. All messages go through if I use
>> >> 1Gb network.
>> >>
>> >> While the production code streams data into HDF5 and uses zmqpp and
>> >> polling to receive messages, I have reduced the problematic code to
>> >> the simplest case using zmq.hpp, regular files, and plain send/recv
>> >> calls. The code is available at
>> >>
>> >> http://www.ioc.ee/~markov/zmq/problem-missing-messages/
>> >>
>> >> At the same time, there don't seem to be any excessive drops in
>> >> ethernet cards, as reported by ifconfig in Linux (slaves run on
>> >> Gentoo, server on Ubuntu):
>> >>
>> >>
>> >> ens1f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>> >>         inet 192.168.38.1  netmask 255.255.255.252  broadcast 192.168.38.3
>> >>         inet6 fe80::225:90ff:fe9c:62c3  prefixlen 64  scopeid 0x20<link>
>> >>         ether 00:25:90:9c:62:c3  txqueuelen 1000  (Ethernet)
>> >>         RX packets 8568340799  bytes 76612663159251 (69.6 TiB)
>> >>         RX errors 7  dropped 0  overruns 0  frame 7
>> >>         TX packets 1558294820  bytes 93932603947 (87.4 GiB)
>> >>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>> >>
>> >> eth3      Link encap:Ethernet  HWaddr 00:25:90:9c:63:1a
>> >>           inet addr:192.168.38.2  Bcast:192.168.38.3  Mask:255.255.255.252
>> >>           inet6 addr: fe80::225:90ff:fe9c:631a/64 Scope:Link
>> >>           UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
>> >>           RX packets:1558294810 errors:0 dropped:0 overruns:0 frame:0
>> >>           TX packets:8570261350 errors:0 dropped:0 overruns:0 carrier:0
>> >>           collisions:0 txqueuelen:1000
>> >>           RX bytes:102083292705 (102.0 GB)  TX bytes:76629844394725 (76.6 TB)
>> >>
>> >>
>> >> So, it should not be a simple dropped frames problem.
>> >>
>> >> Since the problem occurs only with larger messages, is there any
>> >> size-limited buffer in ZeroMQ that may cause dropping of the messages?
>> >> Or any other possible solution?
>> >>
>> >> Thank you for your help,
>> >>
>> >> Marko
>> >> _______________________________________________
>> >> zeromq-dev mailing list
>> >> zeromq-dev at lists.zeromq.org
>> >> http://lists.zeromq.org/mailman/listinfo/zeromq-dev