[zeromq-dev] missing messages on 40GbE network

Peter Krey peterjkrey at gmail.com
Tue Jul 7 18:51:17 CEST 2015


What are your High Water Mark (HWM) settings?
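
If you have not set ZMQ_SNDHWM/ZMQ_RCVHWM explicitly, the default (1000
queued messages per peer in current libzmq) hides a lot of buffering once
messages are ~2 MB each. As a point of reference, a minimal sketch of
setting them with zmq.hpp (untested; the value and endpoint are just
placeholders):

#include <zmq.hpp>

int main()
{
    zmq::context_t ctx(1);
    zmq::socket_t sock(ctx, ZMQ_PAIR);

    // Cap the number of messages queued per peer in each direction.
    // With 2 MB images, 100 messages is already ~200 MB of buffering.
    int hwm = 100;                            // illustrative value
    sock.setsockopt(ZMQ_SNDHWM, &hwm, sizeof(hwm));
    sock.setsockopt(ZMQ_RCVHWM, &hwm, sizeof(hwm));

    sock.connect("tcp://192.168.38.1:5555");  // placeholder endpoint
    return 0;
}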

On Tue, Jul 7, 2015 at 9:35 AM, A. Mark <gougolith at gmail.com> wrote:

> Hello,
>
> Are you doing extensive error checking with ZMQ? If you are flooding
> the network, some of your ZMQ clients may be timing out on either end,
> and the sockets may simply be closed before they have a chance to
> send/recv anything.
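>
> In other words, check the return value of every send/recv instead of
> assuming it succeeded. Roughly along these lines with zmq.hpp (an
> untested sketch; recv_checked is just a name I made up):
>
> #include <zmq.hpp>
> #include <cstdio>
>
> // Hypothetical helper: receive one message and report any error
> // instead of silently losing track of it.
> bool recv_checked(zmq::socket_t &sock, zmq::message_t &msg)
> {
>     try {
>         if (!sock.recv(&msg)) {        // false == EAGAIN (non-blocking)
>             std::fprintf(stderr, "recv: would block (EAGAIN)\n");
>             return false;
>         }
>         return true;
>     } catch (const zmq::error_t &e) {  // ETERM, EINTR, EFSM, ...
>         std::fprintf(stderr, "recv failed: %s\n", e.what());
>         return false;
>     }
> }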
>
> Mark
>
> On Tue, Jul 7, 2015 at 8:36 AM, Thomas Rodgers <rodgert at twrodgers.com>
> wrote:
>
>> Is the filesystem ext4? We have seen issues with high rates of smallish
>> writes to ext4 (it seems related to failing to acquire a lock in
>> http://lxr.free-electrons.com/source/fs/ext4/extents.c?v=2.6.32#L3228).
>>
>> Using XFS seems to improve the situation for us.
>>
>> On Tue, Jul 7, 2015 at 2:16 AM, Marko Vendelin <markov at sysbio.ioc.ee>
>> wrote:
>>
>>> Hi Peter,
>>>
>>> thank you for the pointers. It now seems that there is some problem
>>> with disk I/O, as first suspected. Namely, when the system starts to
>>> 'crawl', I can fire up new clients that don't write anything, and
>>> those clients do absolutely fine (receiving at high rates). New
>>> clients that do disk I/O crawl immediately.
>>>
>>> I'll look into it and try to isolate the issue further.
>>>
>>> REQ-REP: no, I was using multiple requests over PAIR sockets, as you
>>> advised earlier.
>>>
>>> NORM: when things work, TCP is fine. As far as I know, a lot is
>>> handled on the cards internally, and I can reach rates as high as
>>> needed.
>>>
>>> I'll let the list know whether the problem is in disk I/O and what
>>> the cause of it was.
>>>
>>> Best wishes,
>>>
>>> Marko
>>>
>>>
>>> On Mon, Jul 6, 2015 at 11:30 PM, Peter Krey <krey at ripple.com> wrote:
>>> > You may want to try switching to a UDP-based protocol like NORM on
>>> > zmq. This will let you achieve higher throughput, as there will be
>>> > no TCP packet handshakes.
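>>> >
>>> > As a rough sketch: the best-documented multicast transport in libzmq
>>> > is pgm/epgm, shown below; a norm:// endpoint (when libzmq is built
>>> > with NORM support) should look similar, though I have not verified
>>> > the exact syntax. The interface, group, port and rate here are made
>>> > up:
>>> >
>>> > #include <zmq.hpp>
>>> >
>>> > int main()
>>> > {
>>> >     zmq::context_t ctx(1);
>>> >     zmq::socket_t pub(ctx, ZMQ_PUB);   // multicast transports use PUB/SUB
>>> >
>>> >     // Multicast rate limit in kbit/s; the default (100) is far too
>>> >     // low for a 40GbE link.
>>> >     int rate = 30000000;
>>> >     pub.setsockopt(ZMQ_RATE, &rate, sizeof(rate));
>>> >
>>> >     pub.connect("epgm://eth3;239.192.1.1:5556");
>>> >     return 0;
>>> > }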
>>> >
>>> > You can also try installing multiple NICs in the computer and
>>> > bonding them together into one device for higher throughput, if you
>>> > think the cards' device buffers are being overrun.
>>> >
>>> > On Mon, Jul 6, 2015 at 1:25 PM, Peter Krey <krey at ripple.com> wrote:
>>> >>
>>> >> You are not using REQ-REP properly; a REQ socket will not accept
>>> >> two requests in a row. It needs the reply before it will proceed;
>>> >> otherwise it will block.
>>> >>
>>> >> I highly advise using PAIR-type sockets everywhere in your
>>> >> application and no REQ-REP sockets at all, especially given the
>>> >> throughput your application requires.
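>>> >>
>>> >> On the writer side that would look roughly like this (a sketch
>>> >> only; the endpoint is a placeholder, and since PAIR is strictly
>>> >> one-to-one the server needs one such socket per writer):
>>> >>
>>> >> #include <zmq.hpp>
>>> >> #include <cstdio>
>>> >>
>>> >> int main()
>>> >> {
>>> >>     zmq::context_t ctx(1);
>>> >>     zmq::socket_t sock(ctx, ZMQ_PAIR);
>>> >>     sock.connect("tcp://192.168.38.1:5555");  // placeholder endpoint
>>> >>
>>> >>     zmq::message_t img;
>>> >>     while (sock.recv(&img)) {      // no request/reply lockstep
>>> >>         // hand img.data()/img.size() to your disk writer here
>>> >>         std::printf("got %zu bytes\n", img.size());
>>> >>     }
>>> >>     return 0;
>>> >> }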
>>> >>
>>> >> On Sun, Jul 5, 2015 at 9:58 AM, Marko Vendelin <markov at sysbio.ioc.ee>
>>> >> wrote:
>>> >>>
>>> >>> I did reprogram using PAIR sockets, one per client. They were still
>>> >>> using a request-reply pattern, and when a request was not replied
>>> >>> to, the client repeated the request. Unfortunately, similar
>>> >>> behaviour was observed: the initial fast rate dropped and never
>>> >>> recovered.
>>> >>>
>>> >>> I'm wondering whether it is possible to get error codes out of
>>> >>> ZeroMQ to see where the problem is?
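>>> >>>
>>> >>> For instance, would something like zmq_socket_monitor() show me
>>> >>> disconnects or accept failures? I am thinking of something along
>>> >>> these lines (untested sketch; the inproc endpoint name and the
>>> >>> helper are made up):
>>> >>>
>>> >>> #include <zmq.hpp>
>>> >>> #include <cstdint>
>>> >>> #include <cstdio>
>>> >>>
>>> >>> // Attach a monitor to an existing data socket and print the first
>>> >>> // event that arrives on it.
>>> >>> void dump_first_event(zmq::context_t &ctx, zmq::socket_t &data_socket)
>>> >>> {
>>> >>>     zmq_socket_monitor(static_cast<void*>(data_socket),
>>> >>>                        "inproc://monitor.data", ZMQ_EVENT_ALL);
>>> >>>
>>> >>>     zmq::socket_t mon(ctx, ZMQ_PAIR);
>>> >>>     mon.connect("inproc://monitor.data");
>>> >>>
>>> >>>     zmq::message_t ev, addr;
>>> >>>     mon.recv(&ev);    // frame 1: 2-byte event id + 4-byte value
>>> >>>     mon.recv(&addr);  // frame 2: endpoint the event refers to
>>> >>>     uint16_t event = *static_cast<uint16_t*>(ev.data());
>>> >>>     std::fprintf(stderr, "zmq event 0x%x on %.*s\n", event,
>>> >>>                  (int)addr.size(), (char*)addr.data());
>>> >>> }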
>>> >>>
>>> >>> Best wishes
>>> >>>
>>> >>> Marko
>>> >>>
>>> >>> On Jul 4, 2015 12:04 AM, "Marko Vendelin" <marko.vendelin at gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> Is there any way I could check whether ZeroMQ is dropping messages
>>> >>>> automatically? I could recompile the library with some debug
>>> >>>> settings if needed, but this information would be very valuable.
>>> >>>>
>>> >>>> In that case I would expect to see the same with nanomsg as well,
>>> >>>> and at the beginning of the ZeroMQ test. Our disk I/O should be
>>> >>>> faster than the network. Since the drop-off happens at ~10 minutes
>>> >>>> when using ZeroMQ, RAM would not be able to cache the data either
>>> >>>> (by that time I have already transferred ~2 TB on machines with
>>> >>>> 64 GB of RAM).
>>> >>>>
>>> >>>> Using REQ/REP lets me spread the load among all disks
>>> >>>> automatically. Since the disk writers run one per HDD and write
>>> >>>> each dataset to disk after receiving it, the load per disk is
>>> >>>> proportional to its speed. The rates I get at the beginning with
>>> >>>> ZMQ (first ~10 min, ~30-36 Gb/s) are above our requirements and
>>> >>>> would fit the application perfectly, if only I could sustain them
>>> >>>> for as long as the disk space allows.
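>>> >>>>
>>> >>>> For reference, each writer is essentially the loop below
>>> >>>> (simplified sketch, not the real code; the endpoint is a
>>> >>>> placeholder). A fast disk finishes its write sooner and comes back
>>> >>>> for the next dataset sooner, which is where the automatic
>>> >>>> balancing comes from:
>>> >>>>
>>> >>>> #include <zmq.hpp>
>>> >>>>
>>> >>>> int main()
>>> >>>> {
>>> >>>>     zmq::context_t ctx(1);
>>> >>>>     zmq::socket_t req(ctx, ZMQ_REQ);
>>> >>>>     req.connect("tcp://192.168.38.1:5555");  // dataset server
>>> >>>>
>>> >>>>     for (;;) {
>>> >>>>         zmq::message_t ask(0);
>>> >>>>         req.send(ask);            // "give me the next dataset"
>>> >>>>         zmq::message_t dataset;
>>> >>>>         req.recv(&dataset);
>>> >>>>         // write dataset.data()/dataset.size() to this disk
>>> >>>>     }
>>> >>>> }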
>>> >>>>
>>> >>>> Re PAIR: I was thinking about giving PAIR a try. I would need to
>>> >>>> reprogram a bit, but it's possible.
>>> >>>>
>>> >>>> Best wishes,
>>> >>>>
>>> >>>> Marko
>>> >>>>
>>> >>>>
>>> >>>> On Fri, Jul 3, 2015 at 10:52 PM, Peter Krey <peterjkrey at gmail.com>
>>> >>>> wrote:
>>> >>>> > You may be sending messages faster than you can receive them and
>>> >>>> > write them to disk, overflowing the ZeroMQ send buffer and
>>> >>>> > causing ZeroMQ to automatically discard some messages. This is
>>> >>>> > expected behavior.
>>> >>>> >
>>> >>>> > Also, do not use the request-reply socket types; use PAIR. That
>>> >>>> > will not require your app to recv a reply before sending the next
>>> >>>> > image; your app can send asynchronously.
>>> >>>> >
>>> >>>> > On Wednesday, July 1, 2015, Marko Vendelin <markov at sysbio.ioc.ee>
>>> >>>> > wrote:
>>> >>>> >>
>>> >>>> >> Dear ØMQ developers:
>>> >>>> >>
>>> >>>> >> Synopsis: I am observing a strange interaction between storing a
>>> >>>> >> data stream on hard disks and the loss of ZeroMQ messages. It
>>> >>>> >> seems that in my use case, when messages are larger than 2 MB,
>>> >>>> >> some of them are randomly dropped.
>>> >>>> >>
>>> >>>> >> Full story:
>>> >>>> >>
>>> >>>> >> I need to pump images acquired by fast scientific cameras into
>>> >>>> >> files at rates approaching 25 Gb/s. For that, images are acquired
>>> >>>> >> on one server and transferred to a hard-disk array over a 40 Gb/s
>>> >>>> >> network. Since Linux-based solutions using iSCSI were not working
>>> >>>> >> very well (maybe they need more optimization) while plain network
>>> >>>> >> applications could use the full bandwidth, I decided on a
>>> >>>> >> RAID-0-inspired approach: make a filesystem on each of the 32
>>> >>>> >> hard disks separately, run small slave programs, one per
>>> >>>> >> filesystem, and let the slaves ask the dataset server for a
>>> >>>> >> dataset in a loop. As the messaging system, I use ZeroMQ with a
>>> >>>> >> REQ/REP connection. In general, it all seems to work perfectly:
>>> >>>> >> I am able to stream and record data at about 36 Gb/s. However, at
>>> >>>> >> some point (within 5-10 min), messages sometimes get lost.
>>> >>>> >> Intriguingly, this occurs only if I write files and the messages
>>> >>>> >> are 2 MB or larger. Much smaller messages do not seem to trigger
>>> >>>> >> the effect. If I just stream the data and either dump it or only
>>> >>>> >> do calculations on it, all messages go through. All messages also
>>> >>>> >> go through if I use a 1 Gb network.
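>>> >>>> >>
>>> >>>> >> To make the setup concrete, the dataset server side is
>>> >>>> >> conceptually just the loop below (a simplified sketch, not the
>>> >>>> >> real code, which is linked further down; the port and payload
>>> >>>> >> size are placeholders):
>>> >>>> >>
>>> >>>> >> #include <zmq.hpp>
>>> >>>> >> #include <cstring>
>>> >>>> >> #include <vector>
>>> >>>> >>
>>> >>>> >> int main()
>>> >>>> >> {
>>> >>>> >>     zmq::context_t ctx(1);
>>> >>>> >>     zmq::socket_t rep(ctx, ZMQ_REP);
>>> >>>> >>     rep.bind("tcp://*:5555");
>>> >>>> >>
>>> >>>> >>     // Stand-in for a camera frame.
>>> >>>> >>     std::vector<char> image(2 * 1024 * 1024);
>>> >>>> >>
>>> >>>> >>     for (;;) {
>>> >>>> >>         zmq::message_t ask;
>>> >>>> >>         rep.recv(&ask);        // whichever disk writer asks first
>>> >>>> >>         zmq::message_t reply(image.size());
>>> >>>> >>         std::memcpy(reply.data(), image.data(), image.size());
>>> >>>> >>         rep.send(reply);       // ship the frame to that writer
>>> >>>> >>     }
>>> >>>> >> }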
>>> >>>> >>
>>> >>>> >> While in the production code I stream the data into HDF5 and use
>>> >>>> >> zmqpp and polling to receive messages, I have reduced the
>>> >>>> >> problematic code to the simplest case using zmq.hpp, regular
>>> >>>> >> files, and plain send/recv calls. The code is available at
>>> >>>> >>
>>> >>>> >> http://www.ioc.ee/~markov/zmq/problem-missing-messages/
>>> >>>> >>
>>> >>>> >> At the same time, there do not seem to be any excessive drops on
>>> >>>> >> the Ethernet cards, as reported by ifconfig in Linux (the slaves
>>> >>>> >> run on Gentoo, the server on Ubuntu):
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> ens1f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
>>> >>>> >>         inet 192.168.38.1  netmask 255.255.255.252  broadcast 192.168.38.3
>>> >>>> >>         inet6 fe80::225:90ff:fe9c:62c3  prefixlen 64  scopeid 0x20<link>
>>> >>>> >>         ether 00:25:90:9c:62:c3  txqueuelen 1000  (Ethernet)
>>> >>>> >>         RX packets 8568340799  bytes 76612663159251 (69.6 TiB)
>>> >>>> >>         RX errors 7  dropped 0  overruns 0  frame 7
>>> >>>> >>         TX packets 1558294820  bytes 93932603947 (87.4 GiB)
>>> >>>> >>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>> >>>> >>
>>> >>>> >> eth3      Link encap:Ethernet  HWaddr 00:25:90:9c:63:1a
>>> >>>> >>           inet addr:192.168.38.2  Bcast:192.168.38.3  Mask:255.255.255.252
>>> >>>> >>           inet6 addr: fe80::225:90ff:fe9c:631a/64 Scope:Link
>>> >>>> >>           UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
>>> >>>> >>           RX packets:1558294810 errors:0 dropped:0 overruns:0 frame:0
>>> >>>> >>           TX packets:8570261350 errors:0 dropped:0 overruns:0 carrier:0
>>> >>>> >>           collisions:0 txqueuelen:1000
>>> >>>> >>           RX bytes:102083292705 (102.0 GB)  TX bytes:76629844394725 (76.6 TB)
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> So, it should not be a simple dropped frames problem.
>>> >>>> >>
>>> >>>> >> Since the problem occurs only with larger messages, is there any
>>> >>>> >> size-limited buffer in ZeroMQ that might cause messages to be
>>> >>>> >> dropped? Or is there any other possible solution?
>>> >>>> >>
>>> >>>> >> Thank you for your help,
>>> >>>> >>
>>> >>>> >> Marko