[zeromq-dev] missing messages on 40GbE network
Thomas Rodgers
rodgert at twrodgers.com
Tue Jul 7 17:36:04 CEST 2015
Is the filesystem ext4? We have seen issues with high rates of smallish
writes to ext4 (it seems related to failing to acquire a lock in
http://lxr.free-electrons.com/source/fs/ext4/extents.c?v=2.6.32#L3228).
Using XFS seems to improve the situation for us.
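
For anyone who wants to reproduce the comparison, something like the
standalone small-write benchmark sketched below (illustrative only; the
file path, chunk size, and count are placeholders), run once against a
file on the ext4 mount and once on an XFS mount, should show whether the
filesystem itself is the bottleneck:

    // write_bench.cpp -- rough small-write throughput probe (illustrative).
    // Build: g++ -O2 -std=c++11 write_bench.cpp -o write_bench
    #include <chrono>
    #include <cstdio>
    #include <vector>
    #include <unistd.h>   // fsync

    int main(int argc, char* argv[]) {
        const char* path = argc > 1 ? argv[1] : "bench.dat"; // placeholder target
        const size_t chunk = 256 * 1024;   // "smallish" write size
        const size_t count = 40000;        // ~10 GB total
        std::vector<char> buf(chunk, 'x');

        FILE* f = std::fopen(path, "wb");
        if (!f) { std::perror("fopen"); return 1; }

        auto t0 = std::chrono::steady_clock::now();
        for (size_t i = 0; i < count; ++i) {
            if (std::fwrite(buf.data(), 1, chunk, f) != chunk) {
                std::perror("fwrite");
                return 1;
            }
        }
        std::fflush(f);
        fsync(fileno(f));                  // include the flush-to-disk cost
        std::fclose(f);
        auto t1 = std::chrono::steady_clock::now();

        const double secs  = std::chrono::duration<double>(t1 - t0).count();
        const double gbits = 8.0 * chunk * count / 1e9;
        std::printf("%zu writes of %zu bytes in %.1f s -> %.1f Gb/s\n",
                    count, chunk, secs, gbits / secs);
        return 0;
    }
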
On Tue, Jul 7, 2015 at 2:16 AM, Marko Vendelin <markov at sysbio.ioc.ee> wrote:
> Hi Peter,
>
> thank you for the pointers. It seems now that there is some problem
> with the disk I/O, as suspected at first. Namely, when the system starts
> to 'crawl', I can fire up new clients that don't write anything and
> these clients do absolutely fine (recording at high rates). New clients
> with disk I/O crawl immediately.
>
> I'll look into it and try to isolate the issue further.
>
> REQ-REP: No, I was using multiple requests over PAIR sockets, as you
> advised earlier.
>
> NORM: When things work, TCP is fine. As far as I know, a lot is
> offloaded to the cards internally, and I can reach rates as high as we
> need.
>
> I'll let the list know if the problem is in disk I/O and what caused
> it.
>
> Best wishes,
>
> Marko
>
>
> On Mon, Jul 6, 2015 at 11:30 PM, Peter Krey <krey at ripple.com> wrote:
> > You may want to try switching to a UDP-based protocol like NORM on
> > ZeroMQ. This will let you achieve higher throughput, as there will be
> > no TCP packet handshakes.
> >
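> > For what it's worth, switching transports in ZeroMQ is mostly a change
> > of endpoint string; a minimal sketch, assuming libzmq was built with
> > NORM support (the norm:// endpoint format below is from memory and
> > should be checked against your libzmq version):
> >
> >     // transport_sketch.cpp -- endpoint strings are placeholders.
> >     // Build: g++ -std=c++11 transport_sketch.cpp -lzmq
> >     #include <zmq.h>
> >     #include <cstdio>
> >
> >     int main() {
> >         void* ctx  = zmq_ctx_new();
> >         void* sock = zmq_socket(ctx, ZMQ_PAIR);
> >
> >         // Today, over TCP:
> >         //   zmq_connect(sock, "tcp://192.168.38.1:5555");
> >
> >         // Over NORM (UDP-based), only if libzmq was configured with
> >         // NORM support; verify the endpoint syntax for your build:
> >         if (zmq_connect(sock, "norm://192.168.38.1:5555") != 0)
> >             std::fprintf(stderr, "connect failed: %s\n",
> >                          zmq_strerror(zmq_errno()));
> >
> >         zmq_close(sock);
> >         zmq_ctx_term(ctx);
> >         return 0;
> >     }
> >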
> > You can also try installing multiple NIC cards in your computer and
> > bonding them together into one device for higher throughput, if you
> > think the cards' device buffers are being overrun.
> >
> > On Mon, Jul 6, 2015 at 1:25 PM, Peter Krey <krey at ripple.com> wrote:
> >>
> >> You are not using REQ-REP properly; a REQ-REP socket will not accept
> >> two REQ messages in a row. It needs a REP before it will proceed;
> >> otherwise it will block.
> >>
> >> I highly advise using the PAIR type for all sockets in your
> >> application and no REQ-REP sockets at all, especially given the
> >> throughput your application requires.
> >>
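> >> A minimal sketch of the PAIR wiring (addresses, ports, and payloads
> >> below are placeholders, not your protocol):
> >>
> >>     // pair_sketch.cpp -- one PAIR endpoint per client; placeholders only.
> >>     // Build: g++ -std=c++11 pair_sketch.cpp -lzmq
> >>     #include <zmq.h>
> >>     #include <cstdio>
> >>     #include <cstring>
> >>
> >>     int main(int argc, char* argv[]) {
> >>         void* ctx  = zmq_ctx_new();
> >>         void* sock = zmq_socket(ctx, ZMQ_PAIR);
> >>
> >>         if (argc > 1 && std::strcmp(argv[1], "server") == 0) {
> >>             zmq_bind(sock, "tcp://*:5555");
> >>             char buf[64];
> >>             int n = zmq_recv(sock, buf, sizeof(buf), 0);
> >>             if (n >= 0) std::printf("received %d bytes\n", n);
> >>         } else {
> >>             zmq_connect(sock, "tcp://192.168.38.1:5555");
> >>             // No lockstep: PAIR can send again without waiting for a reply.
> >>             zmq_send(sock, "dataset-0001", 12, 0);
> >>             zmq_send(sock, "dataset-0002", 12, 0);
> >>         }
> >>
> >>         zmq_close(sock);
> >>         zmq_ctx_term(ctx);
> >>         return 0;
> >>     }
> >>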
> >> On Sun, Jul 5, 2015 at 9:58 AM, Marko Vendelin <markov at sysbio.ioc.ee>
> >> wrote:
> >>>
> >>> I did reprogram it using PAIR sockets, one per client. They were
> >>> still using a request-reply pattern, and when a request was not
> >>> replied to, the client repeated the request. Unfortunately, similar
> >>> behaviour was observed: the initial fast rate dropped and never
> >>> recovered.
> >>>
> >>> I'm wondering: is it possible to get error codes out of ZeroMQ to
> >>> see where the problem is?
> >>>
> >>> Best wishes
> >>>
> >>> Marko
> >>>
> >>> On Jul 4, 2015 12:04 AM, "Marko Vendelin" <marko.vendelin at gmail.com>
> >>> wrote:
> >>>>
> >>>> Is there any way I could check whether ZeroMQ is automatically
> >>>> dropping messages? I could recompile the library with some debug
> >>>> settings if needed, but this information would be very valuable.
> >>>>
> >>>> In this case I would expect to see the same with nanomsg as well,
> >>>> and at the beginning of the test with ZeroMQ. Our disk I/O should be
> >>>> faster than the network. Since the drop-off happens at ~10 minutes
> >>>> when using ZeroMQ, RAM would not be able to cache the data either
> >>>> (by that time I have already transferred ~2 TB on machines with
> >>>> 64 GB of RAM).
> >>>>
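> >>>> (Back of the envelope, using the figures above: ~2 TB in ~10 minutes
> >>>> is roughly 2e12 bytes * 8 bits / 600 s, about 27 Gb/s sustained, far
> >>>> more than 64 GB of page cache could absorb.)
> >>>>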
> >>>> Using REQ/REP lets me spread the load across all disks
> >>>> automatically: there is one disk writer per HDD, and each writer
> >>>> writes every dataset it receives to disk, so the load per disk is
> >>>> proportional to its speed. The rates I get at the beginning with ZMQ
> >>>> (first ~10 min, ~30-36 Gb/s) are above our requirements and would
> >>>> fit the application perfectly, if only I could sustain them for as
> >>>> long as the disk space allows.
> >>>>
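> >>>> A minimal sketch of one such per-disk writer loop (the endpoint,
> >>>> request text, and file naming are placeholders, not the actual
> >>>> protocol):
> >>>>
> >>>>     // slave_writer.cpp -- one writer per disk; placeholders only.
> >>>>     // Build: g++ -std=c++11 slave_writer.cpp -lzmq
> >>>>     #include <zmq.h>
> >>>>     #include <cstdio>
> >>>>     #include <string>
> >>>>
> >>>>     int main(int argc, char* argv[]) {
> >>>>         const std::string outdir = argc > 1 ? argv[1] : "/data/disk00";
> >>>>
> >>>>         void* ctx = zmq_ctx_new();
> >>>>         void* req = zmq_socket(ctx, ZMQ_REQ);
> >>>>         zmq_connect(req, "tcp://192.168.38.1:5555");
> >>>>
> >>>>         for (long i = 0; ; ++i) {
> >>>>             zmq_send(req, "next", 4, 0);   // ask the server for a dataset
> >>>>
> >>>>             zmq_msg_t msg;
> >>>>             zmq_msg_init(&msg);
> >>>>             if (zmq_msg_recv(&msg, req, 0) == -1) {
> >>>>                 std::fprintf(stderr, "recv: %s\n",
> >>>>                              zmq_strerror(zmq_errno()));
> >>>>                 break;
> >>>>             }
> >>>>
> >>>>             // Write to this slave's own disk; a faster disk simply asks
> >>>>             // for the next dataset sooner, which spreads the load.
> >>>>             std::string path = outdir + "/dataset-" +
> >>>>                                std::to_string(i) + ".raw";
> >>>>             if (FILE* f = std::fopen(path.c_str(), "wb")) {
> >>>>                 std::fwrite(zmq_msg_data(&msg), 1, zmq_msg_size(&msg), f);
> >>>>                 std::fclose(f);
> >>>>             }
> >>>>             zmq_msg_close(&msg);
> >>>>         }
> >>>>
> >>>>         zmq_close(req);
> >>>>         zmq_ctx_term(ctx);
> >>>>         return 0;
> >>>>     }
> >>>>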
> >>>> Re PAIR: I was thinking about giving PAIR a try. It would need a bit
> >>>> of reprogramming, but it's possible.
> >>>>
> >>>> Best wishes,
> >>>>
> >>>> Marko
> >>>>
> >>>>
> >>>> On Fri, Jul 3, 2015 at 10:52 PM, Peter Krey <peterjkrey at gmail.com>
> >>>> wrote:
> >>>> > You may be sending messages faster than you can receive them and
> >>>> > write them to disk, overflowing ZeroMQ's message send buffer and
> >>>> > causing ZeroMQ to automatically discard some messages. This is
> >>>> > expected behavior.
> >>>> >
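> >>>> > The relevant knobs here are the per-socket high-water-mark options
> >>>> > (whether a socket blocks or drops at the HWM depends on the socket
> >>>> > type). A minimal sketch; the values are illustrative, not tuned:
> >>>> >
> >>>> >     // hwm_sketch.cpp -- raise queue limits before connect/bind.
> >>>> >     // Build: g++ -std=c++11 hwm_sketch.cpp -lzmq
> >>>> >     #include <zmq.h>
> >>>> >
> >>>> >     int main() {
> >>>> >         void* ctx  = zmq_ctx_new();
> >>>> >         void* sock = zmq_socket(ctx, ZMQ_PAIR);
> >>>> >
> >>>> >         int sndhwm = 10000;   // max queued outgoing messages
> >>>> >         int rcvhwm = 10000;   // max queued incoming messages
> >>>> >         zmq_setsockopt(sock, ZMQ_SNDHWM, &sndhwm, sizeof(sndhwm));
> >>>> >         zmq_setsockopt(sock, ZMQ_RCVHWM, &rcvhwm, sizeof(rcvhwm));
> >>>> >
> >>>> >         // Options must be set before bind/connect to take effect.
> >>>> >         zmq_connect(sock, "tcp://192.168.38.1:5555");
> >>>> >
> >>>> >         zmq_close(sock);
> >>>> >         zmq_ctx_term(ctx);
> >>>> >         return 0;
> >>>> >     }
> >>>> >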
> >>>> > Also, do not use the request-reply socket types; use PAIR. That
> >>>> > way your app does not have to recv a reply before sending the next
> >>>> > image; it can send asynchronously.
> >>>> >
> >>>> > On Wednesday, July 1, 2015, Marko Vendelin <markov at sysbio.ioc.ee>
> >>>> > wrote:
> >>>> >>
> >>>> >> Dear ØMQ developers:
> >>>> >>
> >>>> >> Synopsis: I am observing a strange interaction between storing a
> >>>> >> data stream on hard disks and the loss of ZeroMQ messages. It
> >>>> >> seems that in my use case, when messages are larger than 2 MB,
> >>>> >> some of them are randomly dropped.
> >>>> >>
> >>>> >> Full story:
> >>>> >>
> >>>> >> I need to pump images acquired by fast scientific cameras into
> >>>> >> files at rates approaching 25 Gb/s. For that, images are acquired
> >>>> >> on one server and transferred to a hard-disk array over a 40 Gb/s
> >>>> >> network. Since Linux-based solutions using iSCSI were not working
> >>>> >> very well (maybe they need more tuning) while plain network
> >>>> >> applications could use the full bandwidth, I decided on a
> >>>> >> RAID-0-inspired approach: make a filesystem on each of the 32
> >>>> >> hard disks separately, run small slave programs, one per
> >>>> >> filesystem, and let the slaves ask the dataset server for a
> >>>> >> dataset in a loop. As the messaging system, I use ZeroMQ with a
> >>>> >> REQ/REP connection. In general, everything seems to work
> >>>> >> perfectly: I am able to stream and record data at about 36 Gb/s.
> >>>> >> However, at some point (within 5-10 min), messages sometimes get
> >>>> >> lost. Intriguingly, this occurs only if I write files and the
> >>>> >> messages are 2 MB or larger. Much smaller messages do not seem to
> >>>> >> trigger this effect. If I just stream the data and either discard
> >>>> >> it or only run calculations on it, all messages go through. All
> >>>> >> messages also go through if I use a 1 Gb network.
> >>>> >>
> >>>> >> While in production code I stream the data into HDF5 and use
> >>>> >> zmqpp and polling to receive messages, I have reduced the
> >>>> >> problematic code to the simplest case using zmq.hpp, regular
> >>>> >> files, and plain send/recv calls. The code is available at
> >>>> >>
> >>>> >> http://www.ioc.ee/~markov/zmq/problem-missing-messages/
> >>>> >>
> >>>> >> At the same time, there do not seem to be any excessive drops on
> >>>> >> the Ethernet cards, as reported by ifconfig on Linux (the slaves
> >>>> >> run Gentoo, the server Ubuntu):
> >>>> >>
> >>>> >>
> >>>> >> ens1f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
> >>>> >>         inet 192.168.38.1  netmask 255.255.255.252  broadcast 192.168.38.3
> >>>> >>         inet6 fe80::225:90ff:fe9c:62c3  prefixlen 64  scopeid 0x20<link>
> >>>> >>         ether 00:25:90:9c:62:c3  txqueuelen 1000  (Ethernet)
> >>>> >>         RX packets 8568340799  bytes 76612663159251 (69.6 TiB)
> >>>> >>         RX errors 7  dropped 0  overruns 0  frame 7
> >>>> >>         TX packets 1558294820  bytes 93932603947 (87.4 GiB)
> >>>> >>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
> >>>> >>
> >>>> >> eth3      Link encap:Ethernet  HWaddr 00:25:90:9c:63:1a
> >>>> >>           inet addr:192.168.38.2  Bcast:192.168.38.3  Mask:255.255.255.252
> >>>> >>           inet6 addr: fe80::225:90ff:fe9c:631a/64 Scope:Link
> >>>> >>           UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
> >>>> >>           RX packets:1558294810 errors:0 dropped:0 overruns:0 frame:0
> >>>> >>           TX packets:8570261350 errors:0 dropped:0 overruns:0 carrier:0
> >>>> >>           collisions:0 txqueuelen:1000
> >>>> >>           RX bytes:102083292705 (102.0 GB)  TX bytes:76629844394725 (76.6 TB)
> >>>> >>
> >>>> >>
> >>>> >> So, it should not be a simple dropped frames problem.
> >>>> >>
> >>>> >> Since the problem occurs only with larger messages, is there any
> >>>> >> size-limited buffer in ZeroMQ that may cause dropping of the
> >>>> >> messages?
> >>>> >> Or any other possible solution?
> >>>> >>
> >>>> >> Thank you for your help,
> >>>> >>
> >>>> >> Marko