[zeromq-dev] using PGM, when the data size has more than some extent(from hundreds of kb and mb), data is received abnormally
ahelr wehrl
totoru5891 at gmail.com
Fri Jan 10 11:53:46 CET 2020
Dear all,
I am having difficulty using multicast with the pub/sub pattern.
I first posted this issue on the libzmq GitHub tracker.
But as there has been no response so far, I am asking for help here.
Issue description
When using PGM (pub/sub pattern) with messages larger than a certain size
(from hundreds of KB into the MB range), the following problems occur.
1. From some point on, "zmq_msg_recv" stops returning messages, even though
Wireshark shows that the packets of those messages keep arriving.
2. Each message is composed of 2 parts. The first part has a fixed size of
2 bytes. The last part has a variable size.
Sometimes one part of a message is lost and only the other part is
received.
In the normal case, the receiver side of my program prints output like this:
49: first message part of size 2 received, second message part of size
196604 received
50: first message part of size 2 received, second message part of size
199747 received
51: first message part of size 2 received, second message part of size
110503 received
But in some abnormal cases:
49: first message part of size 2 received, second message part of size
196604 received
50: first message part of size 2 received, second message part of size 2
received
51: first message part of size 110503 received, second message part of
size 2 received
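The shifted sizes above would be consistent with a receive loop that pairs
frames purely by position after one frame has gone missing. That is only a
guess, since the receiver code lives in the attached archive. A minimal
sketch of grouping frames by their "more" flag instead, without any libzmq
dependency: Frame is a hypothetical stand-in for a received frame together
with the value zmq_msg_more() would report for it, and group_frames and
is_well_formed are illustrative names, not part of the test project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one received frame: its size in bytes and the
// value zmq_msg_more() would report for it.
struct Frame {
    std::size_t size;
    bool more;
};

// Sizes of the parts that make up one logical message.
using Message = std::vector<std::size_t>;

// Group a flat frame sequence into messages, closing a message at each
// frame whose "more" flag is false. A trailing incomplete message is dropped.
std::vector<Message> group_frames(const std::vector<Frame>& frames) {
    std::vector<Message> messages;
    Message current;
    for (const Frame& f : frames) {
        current.push_back(f.size);
        if (!f.more) {
            messages.push_back(current);
            current.clear();
        }
    }
    return messages;
}

// The format described in this post: exactly two parts, the first 2 bytes.
bool is_well_formed(const Message& m) {
    return m.size() == 2 && m[0] == 2;
}
```

If the "more" flags of the surviving frames arrive intact, a lost second
part shows up as one malformed three-part group (2, 2, 110503) that can be
discarded, and the next message is paired correctly again.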
Environment
4 PCs with Intel CPUs, each with different specs (CPU, RAM, GPU).
Each PC is connected to a switching hub at 1 Gbps with a Category 6 cable.
- libzmq version (commit hash if unreleased): 4.25, cppzmq
- OS: Linux Ubuntu 18.04
Minimal test code / Steps to reproduce the issue
I have attached the test project:
https://github.com/zeromq/libzmq/files/4008149/TestGroupMessaging.zip
You can reproduce the problems by following the instructions below.
The test program runs in one of two modes, sender or receiver.
In both modes, a time argument (in nanoseconds) controls the
sending/receiving rate.
In sender mode, the program takes the following arguments.
"TestBasicPublishGroupMessaging(program name) y(indicating sender mode),
100(total sending count), 1000000(sending rate in nanoseconds)"
In receiver mode, the program takes the following arguments.
"TestBasicPublishGroupMessaging(program name) n(indicating receiver mode),
1000(receiving rate in nanoseconds)"
I have set the relevant options, such as ZMQ_RCVHWM/ZMQ_SNDHWM, ZMQ_RATE,
and ZMQ_RCVBUF/ZMQ_SNDBUF, to their maximum values.
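For reference, a sketch of how such options are set through the libzmq C
API. The function name make_pgm_publisher, the example endpoint, and every
numeric value are illustrative assumptions, not the settings used in the
test project. The units are the ones libzmq documents: ZMQ_RATE is in
kilobits per second, ZMQ_RECOVERY_IVL in milliseconds, and ZMQ_SNDHWM in
messages.

```cpp
#include <zmq.h>

// Returns a PUB socket configured for PGM, or nullptr on failure.
// `endpoint` would look like "epgm://eth0;239.192.1.1:5555"
// (hypothetical interface and multicast group).
void* make_pgm_publisher(void* ctx, const char* endpoint) {
    void* pub = zmq_socket(ctx, ZMQ_PUB);
    if (!pub) return nullptr;

    int rate_kbps = 500000;   // illustrative: half of a 1 Gbps link
    int recovery_ms = 2000;   // illustrative recovery window
    int sndhwm = 1000;        // counts messages, not bytes

    // Options only affect connections made after they are set.
    if (zmq_setsockopt(pub, ZMQ_RATE, &rate_kbps, sizeof rate_kbps) != 0 ||
        zmq_setsockopt(pub, ZMQ_RECOVERY_IVL, &recovery_ms, sizeof recovery_ms) != 0 ||
        zmq_setsockopt(pub, ZMQ_SNDHWM, &sndhwm, sizeof sndhwm) != 0 ||
        zmq_connect(pub, endpoint) != 0) {
        zmq_close(pub);
        return nullptr;
    }
    return pub;
}
```

A larger ZMQ_RATE is not automatically better: the sender has to buffer
roughly rate times recovery interval for retransmission, which with the
illustrative values above is on the order of 125 MB.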
What's the actual result? (include assertion message & call stack if
applicable)
For 100 KB to 300 KB messages, when the receiver polls every 1,000
nanoseconds and the sender sends every 1,000,000 nanoseconds, the first
problem happens very frequently.
For 100 KB to 300 KB messages, when the receiver polls every 1,000
nanoseconds and the sender sends every 10,000,000 nanoseconds, the first
problem happens very rarely.
For 1 MB to 3 MB messages, when the receiver polls every 1,000 nanoseconds
and the sender sends every 1,000,000 or 10,000,000 nanoseconds, the first
problem happens very frequently.
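As a rough sanity check on these combinations, the offered load can be
compared with the 1 Gbps link. The sketch below assumes that "kb" means
kilobytes, that the sender really emits one message per interval, and that
headers can be ignored. offered_mbps is an illustrative helper, not part of
the test project.

```cpp
#include <cassert>
#include <cstdint>

// Offered load in megabits per second when one message of `message_bytes`
// is sent every `interval_ns` nanoseconds (payload only, no headers).
double offered_mbps(std::uint64_t message_bytes, std::uint64_t interval_ns) {
    assert(interval_ns > 0);
    const double bits = static_cast<double>(message_bytes) * 8.0;
    const double seconds = static_cast<double>(interval_ns) / 1e9;
    return bits / seconds / 1e6;
}
```

For a 200,000-byte message this gives about 1600 Mbit/s at a 1,000,000 ns
interval and about 160 Mbit/s at 10,000,000 ns. For a 2,000,000-byte message
it is about 1600 Mbit/s even at 10,000,000 ns. Under these assumptions, only
the combination where the problem is rare stays below the link speed.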
I debugged the ZeroMQ code and saw that "pgm_recvmsgv", called in the
"receive" method of "pgm_socket_t", does not return all packets of a
message, even though Wireshark shows that all packets arrived normally.
The second problem happens rarely and irregularly, so it is difficult to
reproduce.
But when running in debug mode with the above message sizes, I found that
data occasionally arrives in bursts:
I set a breakpoint on the send call
("networkDistribution.getPublishService().tryPublish()" in
"TestBasicPublishGroupMessaging.cpp") on the sender side, resumed from the
breakpoint repeatedly, and watched step by step what happened on the
receiver side.
Sometimes, after the sender sends a message, the "receive" method of
"pgm_socket_t" on the receiver side gets only some of the 1428-byte
packets that make up the message. The missing parts of that message are
retrieved later, together with the following messages, in the "receive"
method of "pgm_socket_t". The second problem is frequently seen in this
case.
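To give a sense of how many packets must all arrive for one message, here is
a small sketch that assumes each packet carries 1428 bytes of message data
and ignores framing overhead. packets_for is an illustrative helper.

```cpp
#include <cassert>
#include <cstddef>

// Number of packets needed for `message_bytes` of data when each packet
// carries `payload_per_packet` bytes (ceiling division).
std::size_t packets_for(std::size_t message_bytes,
                        std::size_t payload_per_packet = 1428) {
    assert(payload_per_packet > 0);
    return (message_bytes + payload_per_packet - 1) / payload_per_packet;
}
```

For the second parts in the sample output, this gives 138 packets for
196604 bytes, 140 for 199747 bytes, and 78 for 110503 bytes.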
How to build the project
1. Set the ZMQ_BASE_DIR and ZMQ_USED_VER variables in CMakeLists.txt
according to your environment.
2. Change to the build/Debug or build/Release directory.
3. Run "cmake -DCMAKE_BUILD_TYPE=Debug ../.." or
"cmake -DCMAKE_BUILD_TYPE=Release ../..".
4. The binaries are produced in bin/Debug or bin/Release.