[zeromq-dev] Message batching in zmq

Francesco francesco.montorsi at gmail.com
Mon Aug 12 15:37:01 CEST 2019

Hi Doron,

On Mon, 12 Aug 2019 at 12:13, Doron Somech
<somdoron at gmail.com> wrote:
> It is not waiting to batch up.
> The background IO thread dequeues messages from the internal queue of messages waiting to be sent.
> ZeroMQ dequeues messages until that queue is empty or the buffer is full, so it is not waiting for anything.

Right, ok. I didn't mean to say that it was literally waiting in a
sleep(), but my naive reasoning was that the ZMQ background IO
thread should always have its queue full of messages to send over TCP,
so that message batching up to 8kB should be happening all the time...
but then my question (why don't I get a flat curve up to 8kB message
sizes?) still applies :)
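
To make the batching rule above concrete, here is a minimal sketch in C of
"dequeue until the queue is empty or the buffer is full". It is not libzmq
code: pending_msg_t and fill_batch() are hypothetical names, the queue is a
plain array, and ZMTP framing overhead is ignored. As a further
simplification, a message larger than the whole buffer is simply left in
the queue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BATCH_SIZE = 8192 }; /* 8 kB batch buffer, as discussed in this thread */

/* Hypothetical stand-in for one entry of the outbound message queue. */
typedef struct {
    const unsigned char *data;
    size_t size;
} pending_msg_t;

/* Copy queued messages into `batch` until the queue is empty or the next
 * message would no longer fit. The loop never waits for more messages.
 * Stores the number of messages consumed in *consumed and returns the
 * number of bytes written to `batch`. */
static size_t fill_batch(const pending_msg_t *queue, size_t queued,
                         unsigned char *batch, size_t *consumed)
{
    size_t used = 0;
    size_t i = 0;

    assert(queue != NULL || queued == 0);
    while (i < queued && queue[i].size <= BATCH_SIZE - used) {
        memcpy(batch + used, queue[i].data, queue[i].size);
        used += queue[i].size;
        i++;
    }
    *consumed = i;
    return used;
}
```

Under these assumptions, a queue holding at least 128 messages of 64 bytes
each fills the 8192-byte buffer exactly, while a queue holding only 3 such
messages produces a 192-byte batch immediately.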

I did some further investigation and I found that, in the 10Gbps
environment setup I benchmarked
(http://zeromq.org/results:10gbe-tests-v432), performance is
bounded by the remote_thr side when sending 64B frames. Here is what
"perf top" reports for the 2 worker threads of the remote_thr app:

main remote_thr thread:

  23,33%  libzmq.so.5.2.3   [.] zmq::ypipe_t<zmq::msg_t, 256>::flush
  22,86%  libc-2.17.so      [.] malloc
  20,00%  libc-2.17.so      [.] _int_malloc
  11,51%  libzmq.so.5.2.3   [.] zmq::pipe_t::write
   4,35%  libzmq.so.5.2.3   [.] zmq::ypipe_t<zmq::msg_t, 256>::write
   2,38%  libzmq.so.5.2.3   [.] zmq::socket_base_t::send
   1,81%  libzmq.so.5.2.3   [.] zmq::lb_t::sendpipe
   1,36%  libzmq.so.5.2.3   [.] zmq::msg_t::init_size
   1,33%  libzmq.so.5.2.3   [.] zmq::pipe_t::flush

zmq bg IO remote_thr thread:

  38,35%  libc-2.17.so        [.] _int_free
  13,61%  libzmq.so.5.2.3     [.] zmq::pipe_t::read
   9,24%  libc-2.17.so        [.] __memcpy_ssse3_back
   8,99%  libzmq.so.5.2.3     [.] zmq::msg_t::size
   3,22%  libzmq.so.5.2.3     [.] zmq::encoder_base_t<zmq::v2_encoder_t>::encode
   2,34%  [kernel]            [k] sysret_check
   2,20%  libzmq.so.5.2.3     [.] zmq::ypipe_t<zmq::msg_t, 256>::check_read
   2,15%  libzmq.so.5.2.3     [.] zmq::ypipe_t<zmq::msg_t, 256>::read
   1,32%  libc-2.17.so        [.] free

So my feeling is that even if the message batching is happening, right
now it's actually the zmq_msg_init_size() call (with the malloc/free
pair behind it) that is limiting performance.
This is the same problem I experienced in a more complex context and
that I described in this email thread:

> If we supported zerocopy we could make the buffer larger than 8kB, and when the buffer is full we would use the zerocopy flag.

Right. However, before we can benefit from the new kernel zerocopy
flag, I think we should somehow allow libzmq users to use some kind
of memory pooling; otherwise my feeling is that the performance
benefit would be negligible... what do you think?
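
To illustrate what "some kind of memory pooling" could look like on the
application side, here is a minimal sketch in C of a fixed-size block pool.
It is not an existing libzmq feature: msg_pool_t, pool_init(),
pool_acquire() and pool_release() are hypothetical names, and the block
count and size are arbitrary. The release function has the same shape as
libzmq's zmq_free_fn (void fn(void *data, void *hint)), so it could be
passed to zmq_msg_init_data(). Note that this sketch is not thread-safe,
while libzmq may invoke the free function from its IO thread, so real use
would need a lock or a lock-free free list.

```c
#include <assert.h>
#include <stddef.h>

enum { POOL_BLOCKS = 1024, BLOCK_SIZE = 64 }; /* arbitrary sizes for the sketch */

/* A block is either on the free list (next is valid) or handed out
 * to the caller as a BLOCK_SIZE-byte payload buffer. */
typedef union pool_block {
    union pool_block *next;
    unsigned char payload[BLOCK_SIZE];
} pool_block_t;

typedef struct {
    pool_block_t blocks[POOL_BLOCKS];
    pool_block_t *free_list;
} msg_pool_t;

/* Thread every block onto the free list. */
static void pool_init(msg_pool_t *pool)
{
    pool->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        pool->blocks[i].next = pool->free_list;
        pool->free_list = &pool->blocks[i];
    }
}

/* Pop one block without calling malloc; returns NULL when exhausted. */
static void *pool_acquire(msg_pool_t *pool)
{
    pool_block_t *b = pool->free_list;
    if (b != NULL)
        pool->free_list = b->next;
    return b;
}

/* Push a block back without calling free. `hint` carries the pool,
 * matching the (data, hint) argument order of zmq_free_fn. */
static void pool_release(void *data, void *hint)
{
    msg_pool_t *pool = hint;
    pool_block_t *b = data;

    assert(pool != NULL && b != NULL);
    b->next = pool->free_list;
    pool->free_list = b;
}
```

A sender would then take a buffer with pool_acquire(&pool), fill it, and
call zmq_msg_init_data(&msg, buf, BLOCK_SIZE, pool_release, &pool) instead
of zmq_msg_init_size(). As far as I know, zmq_msg_init_data() still
allocates a small bookkeeping structure internally, so this alone would
probably not remove every malloc from the send path.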

