[zeromq-dev] Inefficient TCP connection for my PUB-SUB zmq communication

Francesco francesco.montorsi at gmail.com
Sat Mar 27 10:22:32 CET 2021

Hi Jim,
You're right, and I certainly plan to change the MTU to 9000. However,
even now, with the MTU at 1500, I see that most packets are very far from the
MTU.
Attached is a screenshot of the capture:

[image: tcp_capture.png]

Looking at the timestamps, I see that the packets of size 583B and 376B
are spaced roughly 100us apart, and the packets of size 376B and 366B are
spaced 400us apart.
In this case I'd be more than willing to pay some extra latency to merge
all 3 of these packets together.
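
As a rough sanity check that these three packets would fit in a single
frame, here is a small sketch. It rests on two assumptions that are not
verified against the capture: that the sizes above are on-the-wire frame
lengths as reported by wireshark, and that every frame carries 66 bytes of
Ethernet + IP + TCP headers (the size of the pure ACK packets in the
trace). The helper name merged_frame_size is made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Assumption: each captured frame carries 66 bytes of headers
// (14 Ethernet + 20 IPv4 + 32 TCP with timestamps), i.e. the size
// of a pure ACK frame in the capture.
constexpr std::size_t kHeaderBytes = 66;

// An MTU of 1500 excludes the 14-byte Ethernet header, so the largest
// frame on the wire is 1514 bytes.
constexpr std::size_t kMaxFrameBytes = 1500 + 14;

// Hypothetical helper: frame length obtained if the payloads of the given
// frames were sent as one TCP segment (headers are counted only once).
std::size_t merged_frame_size(const std::vector<std::size_t>& frames) {
    std::size_t payload = 0;
    for (std::size_t frame : frames)
        payload += frame - kHeaderBytes;
    return payload + kHeaderBytes;
}

// True if the merged frame would still fit on a 1500-byte-MTU link.
bool fits_in_one_frame(const std::vector<std::size_t>& frames) {
    return merged_frame_size(frames) <= kMaxFrameBytes;
}
```

Under those assumptions, merged_frame_size({583, 376, 366}) gives 1193
bytes, so the three payloads would fit in one frame even with the current
MTU of 1500.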

After some more digging I found this code in ZMQ:

    //  Disable Nagle's algorithm. We are doing data batching on 0MQ level,
    //  so using Nagle wouldn't improve throughput in anyway, but it would
    //  hurt latency.
    int nodelay = 1;
    const int rc =
      setsockopt (s_, IPPROTO_TCP, TCP_NODELAY,
                  reinterpret_cast<char *> (&nodelay), sizeof (int));
    assert_success_or_recoverable (s_, rc);
    if (rc != 0)
        return rc;

Now my next question is: where is this "data batching on 0MQ level"
happening? Can I tune it somehow? Can I restore Nagle's algorithm?
I also saw from here
that it's possible to set TCP_CORK as an option on the socket to try
to optimize throughput ... is there any way to do that through ZMQ?
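
For reference, this is roughly what re-enabling Nagle or setting TCP_CORK
looks like on a plain Linux TCP socket. It is only a sketch on a file
descriptor that the application owns; it does not show a ZMQ API, and
whether ZMQ gives access to its underlying TCP socket is exactly the open
question above. The helper names set_tcp_option and get_tcp_option are
made up for illustration.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Set a boolean TCP-level option (e.g. TCP_NODELAY, TCP_CORK) on a socket.
// Returns 0 on success, -1 on error (errno is set by setsockopt).
int set_tcp_option(int fd, int option, bool enabled) {
    const int value = enabled ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, option, &value, sizeof(value));
}

// Read a TCP-level option back. Returns its value, or -1 on error.
int get_tcp_option(int fd, int option) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, option, &value, &len) != 0)
        return -1;
    return value;
}
```

With these, set_tcp_option(fd, TCP_NODELAY, false) would restore Nagle's
algorithm on fd, and set_tcp_option(fd, TCP_CORK, true) would ask the
Linux kernel to hold back partial segments until the cork is removed (or
a short kernel timeout expires). TCP_CORK is Linux-specific.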



On Sat, Mar 27, 2021 at 05:01, Jim Melton <jim at melton.space> wrote:

> Small TCP packets will never achieve maximum throughput. This is
> independent of ZMQ. Each TCP packet requires a synchronous round-trip.
> For a 20 Gbps network, you need a larger MTU to achieve close to
> theoretical bandwidth, and each packet needs to be close to MTU. Jumbo MTU
> is typically 9000 bytes. The TCP ACK packets will kill your throughput,
> though.
> --
> Jim Melton
> (303) 829-0447
> http://blogs.melton.space/pharisee/
> jim at melton.space
> On Mar 26, 2021, at 4:17 PM, Francesco <francesco.montorsi at gmail.com>
> wrote:
> Hi all,
> I'm using ZMQ in a product that moves a lot of data using TCP as transport
> and PUB-SUB as communication pattern. "A lot" here means around 1Gbps. The
> software is actually a mono-directional chain of small components each
> linked to the previous with a SUB socket (to receive data) and a PUB socket
> (to send data to next stage).
> I'm debugging an issue with one of these components receiving 1.1Gbps from
> its SUB socket and sending out 1.1Gbps on its PUB socket (no wonder the two
> numbers match since the component does no aggregation whatsoever).
> The "problem" is that we are currently using 16 ZMQ background threads to
> move a total of 2.2Gbps for that software component (note the physical
> links can carry up to 20Gbps so we're far from saturation of the link).
> IIRC the "golden rule" for sizing number of ZMQ background threads is 1Gbps
> = 1 thread.
> As you can see we're very far from this golden rule, and that's what I'm
> trying to debug.
> The ZMQ background threads have a CPU usage ranging from 98% to 80%.
> Using "strace" I see that most of the time for these threads is spent in
> the "sendto" syscall.
> So I started digging on the quality of the TX side of the TCP connection,
> recording a short trace of the traffic outgoing from the software component.
> Analyzing the traffic with wireshark it turns out that the TCP packets for
> the PUB connection are pretty small:
> * 50% of them are 66B long; these are the TCP ACK packets (incoming)
> * 21% of them are in the range 160B-320B
> * 18% in the range 320B-640B
> * 5% in range 640B-1280B
> * just 3% reach the MTU equal to 1500B
> * [a fraction of <1% even exceeds the link MTU of 1500B; I'm not sure
> how that is possible]
> My belief is that having fewer packets, all close to the MTU of the
> link, should greatly improve performance. Would you agree with
> that?
> Is there any configuration I can apply on the PUB socket to force the
> Linux TCP stack to generate fewer but larger TCP segments on the wire?
> Thanks for any hint,
> Francesco
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> https://lists.zeromq.org/mailman/listinfo/zeromq-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tcp_capture.png
Type: image/png
Size: 56670 bytes
Desc: not available
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20210327/a57f293d/attachment.png>
