[zeromq-dev] Using PGM, data is received abnormally when messages are large (hundreds of KB to several MB)

ahelr wehrl totoru5891 at gmail.com
Fri Jan 10 11:53:46 CET 2020


Dear all,
I am having trouble using multicast with the PUB/SUB pattern.
I first posted this issue on the libzmq GitHub issue tracker, but as there
has not been any response so far, I am asking for help here.

Issue description

When using PGM (PUB/SUB pattern) with large messages (from hundreds of KB
up to several MB), the following problems occur:

   1. At some point, messages are no longer received by "zmq_msg_recv",
   even though Wireshark shows that the packets of the messages are still
   arriving.
   2. Each message consists of 2 parts. The first part has a fixed size of
   2 bytes; the second part has a variable size.
   Sometimes part of a message is lost and only one part of it is
   received.
   In the normal case, the receiver side of my program prints output like this:
   49: first message part of size 2 received, second message part of size 196604 received
   50: first message part of size 2 received, second message part of size 199747 received
   51: first message part of size 2 received, second message part of size 110503 received
   In some abnormal cases, however, it prints:
   49: first message part of size 2 received, second message part of size 196604 received
   50: first message part of size 2 received, second message part of size 2 received
   51: first message part of size 110503 received, second message part of size 2 received
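
ZeroMQ documents that multipart messages are delivered atomically (a peer receives either all parts of a message or none), so outputs 50 and 51 suggest the part boundaries themselves were misaligned, not merely that a part went missing. A receiver can flag such messages explicitly. The sketch below is hypothetical and not taken from the attached test program: it assumes the part sizes of one message have already been collected (for example by calling zmq_msg_recv until zmq_msg_more returns 0) and checks them against the 2-part layout described above, using an assumed minimum payload size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check, not part of the attached test program.
// Expected layout per message (from the description above):
//   part 0: fixed 2-byte header
//   part 1: variable-size payload
// `minPayloadBytes` is an assumed lower bound on the payload size, used to
// catch cases such as "2 / 2" where the payload part is clearly missing.
inline bool isExpectedFraming(const std::vector<std::size_t>& partSizes,
                              std::size_t minPayloadBytes) {
    assert(minPayloadBytes > 0);  // 0 would disable the payload check
    if (partSizes.size() != 2) {
        return false;  // wrong number of parts
    }
    return partSizes[0] == 2 && partSizes[1] >= minPayloadBytes;
}
```

Applied to the output above with minPayloadBytes = 1000, message 49 ({2, 196604}) passes, while messages 50 ({2, 2}) and 51 ({110503, 2}) are rejected.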

Environment

4 PCs with Intel CPUs, each with different specs (CPU, RAM, GPU).
Each PC is connected to a 1 Gbps switch with a Category 6 cable.

   - libzmq version (commit hash if unreleased): 4.25, cppzmq
   - OS: Linux Ubuntu 18.04

Minimal test code / Steps to reproduce the issue

I have attached the test project:
https://github.com/zeromq/libzmq/files/4008149/TestGroupMessaging.zip
Following the instructions below should reproduce the problems.
The test program runs in one of two modes, sender or receiver.
In both modes, a time argument (in nanoseconds) controls the
sending/receiving rate, i.e. the interval between consecutive sends or
receives.
Sender mode is started with the following arguments:
"TestBasicPublishGroupMessaging(program name) y(indicating sender mode),
100(total sending count), 1000000(sending rate in nanoseconds)"
Receiver mode is started with the following arguments:
"TestBasicPublishGroupMessaging(program name) n(indicating receiver mode),
1000(receiving rate in nanoseconds)"

I have set the relevant options, such as ZMQ_RCVHWM/ZMQ_SNDHWM, ZMQ_RATE
and ZMQ_RCVBUF/ZMQ_SNDBUF, to their maximum values.

What's the actual result? (include assertion message & call stack if applicable)

For 100 KB ~ 300 KB messages, with a receiver interval of 1,000 ns and a
sender interval of 1,000,000 ns, the first problem happens very frequently.
For 100 KB ~ 300 KB messages, with a receiver interval of 1,000 ns and a
sender interval of 10,000,000 ns, the first problem happens only rarely.
For 1 MB ~ 3 MB messages, with a receiver interval of 1,000 ns and a sender
interval of either 1,000,000 or 10,000,000 ns, the first problem happens
very frequently.
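
A rough calculation of the offered load may help interpret these results. The sketch below (helper names are hypothetical) converts a message size and a sender interval into a payload bit rate. It assumes 1 KB = 1,000 bytes and 1 MB = 1,000,000 bytes, and ignores PGM/IP/Ethernet header overhead and repair traffic, so the real wire rate is somewhat higher. ZMQ_RATE is expressed in kilobits per second, so the kbit/s variant can be compared directly with the configured value.

```cpp
#include <cassert>

// Hypothetical helper: payload bit rate, in Mbit/s, of a sender that
// publishes one message of `messageBytes` bytes every `intervalNs` ns.
// Assumption: header overhead and PGM repair traffic are ignored.
inline double offeredLoadMbps(double messageBytes, double intervalNs) {
    assert(intervalNs > 0.0);
    const double messagesPerSecond = 1e9 / intervalNs;
    return messageBytes * 8.0 * messagesPerSecond / 1e6;
}

// Same load in kbit/s, the unit ZMQ_RATE is specified in.
inline double offeredLoadKbps(double messageBytes, double intervalNs) {
    return offeredLoadMbps(messageBytes, intervalNs) * 1000.0;
}

// True if the payload load alone exceeds a link of `linkMbps` Mbit/s.
inline bool exceedsLink(double messageBytes, double intervalNs, double linkMbps) {
    return offeredLoadMbps(messageBytes, intervalNs) > linkMbps;
}
```

Under these assumptions, 100 KB ~ 300 KB every 1,000,000 ns is 800 ~ 2,400 Mbit/s, 1 MB ~ 3 MB every 1,000,000 ns is 8,000 ~ 24,000 Mbit/s, and 1 MB ~ 3 MB every 10,000,000 ns is 800 ~ 2,400 Mbit/s, all close to or above the 1 Gbps link. By contrast, 100 KB ~ 300 KB every 10,000,000 ns is only 80 ~ 240 Mbit/s. This lines up with the "frequently" versus "rarely" pattern above, though on its own it does not explain why packets visible in Wireshark are not delivered.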

I debugged the ZeroMQ code and found that "pgm_recvmsgv", called in the
"receive" method of "pgm_socket_t", does not return all packets of a
message, even though Wireshark shows that all of the packets were received.

The second problem occurs irregularly and rarely, so it is difficult to
reproduce.
However, when running in debug mode with the message sizes above, I found
that data is occasionally received in bursts:
I added a breakpoint on the send method
("networkDistribution.getPublishService().tryPublish()" in
"TestBasicPublishGroupMessaging.cpp") on the sender side, repeatedly resumed
execution from that breakpoint, and watched what happened on the receiver
side step by step.
Sometimes, after the sender sends a message, the "receive" method of
"pgm_socket_t" on the receiver side does not get all of the 1428-byte
packets that make up the message (only some of them arrive). The parts of
the previous message that were not yet received are retrieved later by the
"receive" method of "pgm_socket_t", together with the following messages.
The second problem is frequently seen in this case.
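
To put the 1428-byte packet size into perspective, the sketch below (a hypothetical helper, not from the test program) computes how many packets one message occupies. It assumes each packet carries 1428 bytes of message data and ignores any framing bytes libzmq adds around each part.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: packets needed for a message of `messageBytes`
// bytes when each packet carries at most `payloadPerPacket` bytes.
// Uses integer ceiling division; framing overhead is ignored.
inline std::size_t packetsPerMessage(std::size_t messageBytes,
                                     std::size_t payloadPerPacket = 1428) {
    assert(payloadPerPacket > 0);
    return (messageBytes + payloadPerPacket - 1) / payloadPerPacket;
}
```

Under these assumptions, the 196604-byte payload in output line 49 spans 138 packets and a 3,000,000-byte message spans 2,101 packets. Since zmq_msg_recv only returns complete message parts, a delay in any single one of those packets holds back the whole part.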

How to build the project

   1. Set the ZMQ_BASE_DIR and ZMQ_USED_VER variables in CMakeLists.txt
   according to your environment.
   2. Change to the build/Debug or build/Release directory.
   3. Execute "cmake -DCMAKE_BUILD_TYPE=Debug ../.." (or Release), then
   build, e.g. with "make".
   4. The binaries are produced in bin/Debug or bin/Release.