[zeromq-dev] Multicast messages corrupted
Alexander Zhitlenok
AZhitlenok at geneva-trading.com
Fri Aug 2 00:58:54 CEST 2013
Hi,
We are still seeing corrupted multicast messages, and our investigation suggests that something is likely wrong in the zmq layer.
I've written a simple C epgm client to rule out any impact from the client apps and managed code. This client does nothing but receive frames and log each frame's size and data to a log file.
We ran this client concurrently with our custom application and mdump.exe, all listening on the same address.
The results suggest that something strange is happening, most likely in the zmq layer.
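For anyone who wants to reproduce this, the logging half of such a listener could look roughly like the sketch below. It is not the actual client: format_frame is a hypothetical helper, and reading the message ID as the first 4 bytes in little-endian order is an assumption about our wire format. The receive loop (for example zmq_msg_recv on a SUB socket, then zmq_msg_data and zmq_msg_size) is left out so that the sketch builds without libzmq.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one received frame as
 *   frame=<n> size=<bytes> id=<first int> data=<hex bytes>
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the output buffer is too small.
 * Assumption: the message ID is the first 4 bytes, little-endian. */
int format_frame(char *out, size_t out_len, unsigned frame_no,
                 const unsigned char *data, size_t size)
{
    int n;
    size_t pos, i;

    assert(out != NULL && (data != NULL || size == 0));

    if (size >= 4) {
        uint32_t id = (uint32_t)data[0]
                    | ((uint32_t)data[1] << 8)
                    | ((uint32_t)data[2] << 16)
                    | ((uint32_t)data[3] << 24);
        n = snprintf(out, out_len, "frame=%u size=%zu id=%u data=",
                     frame_no, size, (unsigned)id);
    } else {
        /* Frame too short to carry an ID; still log its size and bytes. */
        n = snprintf(out, out_len, "frame=%u size=%zu id=? data=",
                     frame_no, size);
    }
    if (n < 0 || (size_t)n >= out_len)
        return -1;

    pos = (size_t)n;
    for (i = 0; i < size; i++) {
        if (pos + 3 > out_len)      /* two hex digits plus the NUL */
            return -1;
        snprintf(out + pos, out_len - pos, "%02x", data[i]);
        pos += 2;
    }
    return (int)pos;
}
```

In a listener, each frame's data pointer and size would be passed to format_frame and the resulting line written to the log, so that frame counts, sizes and IDs can be compared against the mdump output.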
The setup is as follows. A number of publishers send messages to the same epgm address. Some of the client apps run on the same box, but each one in its own process.
At 13:03:04 today we ran into the problem.
Below is the sequence of events in mdump and our simple frame logger.
(Format: timestamp - publisher ID - (message type).)

     MDUMP                           OUR C++ FRAMES LISTENER
 1.  13:03:04.160000  NM_04 (HB)     RECEIVED
 2.  13:03:04.183000  KT_03 (HB)     RECEIVED
 3.  13:03:04.184000  KT_01 (HB)     DID NOT RECEIVE
 4.  2 service msgs
 5.  13:03:04.247000  JR_03 (HB)     RECEIVED
 6.  4 service msgs
 7.  13:03:04.309000  KT_02 (HB)     RECEIVED
 8.  2 service msgs
 9.  13:03:04.376000  KT_01 (AR)     DID NOT RECEIVE
10.  25 service msgs
11.  13:03:04.964000  JR_04 (HB)     RECEIVED
12.  13 service msgs
13.  13:03:06.680000  KT_04 (HB)     RECEIVED
14.  2 service msgs
15.  13:03:06.805000  KT_01 (HB)     BEGAN RECEIVING CORRUPTED DATA, AND COMPLETELY STOPS AFTER A FEW FRAMES
From that moment (13:03:06.805000) on, mdump continues to show good data chunks, but all the frames our listener receives are corrupted.
All the suspicious messages are from the same client, but zmq stopped processing messages from all the clients. Previous messages from that client were processed correctly.
The corresponding segments of the two dumps are attached.
Thank you, Alex
-----Original Message-----
From: zeromq-dev-bounces at lists.zeromq.org [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter Hintjens
Sent: Saturday, July 27, 2013 11:51 AM
To: ZeroMQ development list
Subject: Re: [zeromq-dev] Multicast messages corrupted
Hi Alexander,
What I'd do is write a simple C listener so you can trace what's happening and learn whether the issue is in libzmq, clrzmq, or your app.
-Pieter
On Thu, Jul 25, 2013 at 6:18 PM, Alexander Zhitlenok <AZhitlenok at geneva-trading.com> wrote:
> Hi Pieter, thank you for the reply.
>
> 1. I don't think we have a data corruption problem on the sender side. Even if a sender did send corrupted data for some reason, that should not stop the listener from receiving data. Yet that is exactly what happens! After receiving 3-5 corrupted messages (arriving at the same moment), the ReceiveFrame() method (clrzmq.dll) never again returns a non-empty message. So it looks like something is broken on the listener side.
>
> 2. Since our app is a C# app and we call libzmq through clrzmq, it is almost impossible for our layer to corrupt the C++ data buffer. Our layer passes a managed byte array to the clrzmq layer. As I understand the clrzmq code, it copies our managed data into a reusable unmanaged buffer and passes that buffer to zmq_send() synchronously. If zmq_send() works asynchronously internally (as you explained to me), the clrzmq C++ data buffer could be corrupted. But if clrzmq had such an obvious bug, wouldn't a lot of users have run into it? And how can I fix it without changing the clrzmq code?
>
> Just to reiterate, it does not seem possible that sending corrupted data from a single sender would cause the listener side to stop receiving messages from all senders.
>
> Sincerely,
> Alex
>
> -----Original Message-----
> From: zeromq-dev-bounces at lists.zeromq.org
> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter Hintjens
> Sent: Thursday, July 25, 2013 8:21 AM
> To: ZeroMQ development list
> Subject: Re: [zeromq-dev] Multicast messages corrupted
>
> Alex, check you're not reusing or freeing the message data buffer too soon. zmq_send() is asynchronous and happens in the background. If you reuse the message data buffer or free it, what will be sent (some short time after the send call itself) will be garbage.
>
> On Tue, Jul 23, 2013 at 7:42 PM, Alexander Zhitlenok <AZhitlenok at geneva-trading.com> wrote:
>> This is what we really have. In that part of our app, we have
>> multiple clients sending short messages to one subscriber. As soon as
>> a message is received we do nothing but log the number of frames
>> and the first integer from the frame, which is our message ID. (After
>> that we put the message in a queue and process it in another thread.)
>> Since our messages are short, they are all single-frame messages.
>> However, at some point the subscriber receives a 3-4-frame message with
>> a garbage ID (first int). These unexpected messages come in a
>> batch, 3-5 at the same time. After getting these 3-5 unexpected
>> messages (our processing is just catching an exception and doing some
>> logging), no more messages arrive.
>>
>> When I say "we receive a message", I mean we do nothing but call
>> ReceiveFrame on a ZmqSocket object (clrzmq.dll).
>>
>> I readily admit that we may be doing something wrong, but we do
>> nothing at the stage where messages are first received.
>>
>>
>>
>> Thank you,
>>
>> Alex
>>
>> (we use win7\epgm\clrzmq)
>>
>>
>>
>>
>>
>> From: zeromq-dev-bounces at lists.zeromq.org
>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Steven McCoy
>> Sent: Tuesday, July 23, 2013 8:19 AM
>> To: ZeroMQ development list
>> Subject: Re: [zeromq-dev] Multicast messages corrupted
>>
>>
>>
>> On 22 July 2013 15:06, Alexander Zhitlenok
>> <AZhitlenok at geneva-trading.com> wrote:
>>
>> All works fine, sometimes for hours; however, at some unpredictable
>> moment we start receiving corrupted messages. After 4-5 corrupted
>> messages, our custom C# layer stops receiving messages. I'm not sure
>> yet (still testing) whether the zmq C++ layer still receives messages.
>>
>>
>>
>>
>>
>> Ideally there should be no corrupted messages from the wire, as each
>> packet is checksum-verified.
>>
>>
>>
>> This leaves corruption in software and hardware. You really need to
>> capture in parallel with other clients to narrow down the scope of
>> corruption. The implication in your message is that multiple Windows
>> machines are receiving and thus it is likely to be somewhere in the software stack.
>>
>>
>>
>> Preferably, capture the wire traffic as well, so that you can try
>> replaying it.
>>
>>
>>
>> --
>>
>> Steve-o
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: MDUMP.txt
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20130801/c06fdf12/attachment.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: FramesDump.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 14762 bytes
Desc: FramesDump.docx
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20130801/c06fdf12/attachment.docx>