[zeromq-dev] Multicast messages corrupted
Pieter Hintjens
ph at imatix.com
Sat Jul 27 18:50:38 CEST 2013
Hi Alexander,
What I'd do is write a simple C listener so you can trace what's
happening and learn whether the issue is in libzmq, clrzmq, or your
app.
-Pieter
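
A minimal sketch of the per-frame tracing such a C listener could do, under stated assumptions: the message ID is the first 4 bytes of the frame (as described further down the thread) and is in the listener's native byte order, which the thread does not confirm. The names frame_message_id, frame_hex and trace_frame are hypothetical helpers, not part of libzmq or clrzmq.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the leading 32-bit message ID out of a frame.
 * Assumption: the ID is stored in the listener's native byte order.
 * Returns 0 on success, -1 if the frame is too short to hold an ID. */
int frame_message_id(const unsigned char *frame, size_t len, int32_t *id)
{
    assert(frame != NULL && id != NULL);
    if (len < sizeof *id)
        return -1;
    memcpy(id, frame, sizeof *id);   /* memcpy avoids unaligned reads */
    return 0;
}

/* Write up to max_bytes of the frame as lowercase hex into out,
 * always NUL-terminated. Returns the number of bytes dumped. */
size_t frame_hex(const unsigned char *frame, size_t len, size_t max_bytes,
                 char *out, size_t out_size)
{
    size_t n = len < max_bytes ? len : max_bytes;
    size_t i;

    if (out_size == 0)
        return 0;
    out[0] = '\0';
    /* Each byte needs two hex digits plus room for the terminator. */
    for (i = 0; i < n && i * 2 + 3 <= out_size; i++)
        snprintf(out + i * 2, 3, "%02x", (unsigned) frame[i]);
    return i;
}

/* Log one frame: its index, size, "more frames follow" flag, message ID
 * and the first 16 bytes in hex. Returns 0 if an ID could be read,
 * -1 if the frame was too short. */
int trace_frame(FILE *log, int frame_no, const unsigned char *frame,
                size_t len, int more)
{
    char hex[2 * 16 + 1];
    int32_t id = 0;
    int rc = frame_message_id(frame, len, &id);

    frame_hex(frame, len, 16, hex, sizeof hex);
    if (rc == 0)
        fprintf(log, "frame %d: %zu bytes, more=%d, id=%ld, head=%s\n",
                frame_no, len, more, (long) id, hex);
    else
        fprintf(log, "frame %d: %zu bytes, more=%d, id=<too short>, head=%s\n",
                frame_no, len, more, hex);
    return rc;
}
```

In a listener built directly on libzmq, each frame returned by zmq_msg_recv() could be passed along as trace_frame(stderr, n, zmq_msg_data(&msg), zmq_msg_size(&msg), zmq_msg_more(&msg)). If the multi-frame garbage shows up in this log too, clrzmq and the C# app are unlikely to be the cause.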
On Thu, Jul 25, 2013 at 6:18 PM, Alexander Zhitlenok
<AZhitlenok at geneva-trading.com> wrote:
> Hi Pieter, thank you for the reply.
>
> 1. I don't think we have a data corruption problem on the sender side. Even if a sender did send corrupted data for some reason, that should not stop the listener from receiving data. Yet that is exactly what happens! After receiving 3-5 corrupted messages (arriving at the same moment), the ReceiveFrame() method (clrzmq.dll) never again returns a non-empty message. So it looks like something is broken on the listener side.
>
> 2. Since our app is a C# app and we call libzmq through clrzmq, it's almost impossible for our layer to corrupt the C++ data buffer. Our layer passes a managed byte array to the clrzmq layer. As far as I can tell from the clrzmq code, it copies our managed data into a reusable unmanaged buffer and passes that unmanaged buffer to zmq_send() synchronously. If zmq_send() works asynchronously internally (as you explained), the clrzmq C++ data buffer could be corrupted. But if clrzmq had such an obvious bug, surely a lot of users would have run into it? And how can I fix it without changing the clrzmq code?
>
> Just to reiterate, it does not seem possible that sending corrupted data from a single sender would cause the listener side to stop receiving messages from all senders.
>
> Sincerely,
> Alex
>
> -----Original Message-----
> From: zeromq-dev-bounces at lists.zeromq.org [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Pieter Hintjens
> Sent: Thursday, July 25, 2013 8:21 AM
> To: ZeroMQ development list
> Subject: Re: [zeromq-dev] Multicast messages corrupted
>
> Alex, check you're not reusing or freeing the message data buffer too soon. zmq_send() is asynchronous and happens in the background. If you reuse the message data buffer or free it, what will be sent (some short time after the send call itself) will be garbage.
>
> On Tue, Jul 23, 2013 at 7:42 PM, Alexander Zhitlenok <AZhitlenok at geneva-trading.com> wrote:
>> This is what we actually have. In that part of our app, multiple
>> clients send short messages to one subscriber. As soon as a message
>> is received, we do nothing but log the number of frames and the
>> first integer from the frame, which is our message ID. (After that we
>> put the message in a queue and process it in another thread.) Since
>> our messages are short, they are all single-frame.
>> However, at some point the subscriber receives a 3-4-frame message with
>> a garbage ID (first int). These unexpected messages arrive in a
>> batch, 3-5 at the same time. After these 3-5 unexpected
>> messages (our processing just catches an exception and does some logging), no more messages arrive.
>>
>> When I say "we receive a message", I mean we do nothing but call
>> ReceiveFrame on a ZmqSocket object (clrzmq.dll).
>>
>>
>>
>> I readily admit that we may be doing something wrong, but we do
>> nothing at the stage where messages are first received.
>>
>>
>>
>> Thank you,
>>
>> Alex
>>
>> (we use win7\epgm\clrzmq)
>>
>>
>>
>>
>>
>> From: zeromq-dev-bounces at lists.zeromq.org
>> [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Steven McCoy
>> Sent: Tuesday, July 23, 2013 8:19 AM
>> To: ZeroMQ development list
>> Subject: Re: [zeromq-dev] Multicast messages corrupted
>>
>>
>>
>> On 22 July 2013 15:06, Alexander Zhitlenok
>> <AZhitlenok at geneva-trading.com>
>> wrote:
>>
>> Everything works fine, sometimes for hours, but at some unpredictable
>> moment we start receiving corrupted messages. After 4-5 corrupted
>> messages, our custom C# layer stops receiving messages. I'm not sure
>> yet (still testing) whether the ZeroMQ C++ layer still receives messages.
>>
>>
>>
>>
>>
>> Ideally, messages should not arrive corrupted from the wire, as each
>> packet is checksum-verified.
>>
>>
>>
>> This leaves corruption in software and hardware. You really need to
>> capture in parallel with other clients to narrow down the scope of
>> corruption. The implication in your message is that multiple Windows
>> machines are receiving and thus it is likely to be somewhere in the software stack.
>>
>>
>>
>> A capture of the wire traffic would be preferable, so that you can
>> try replaying it.
>>
>>
>>
>> --
>>
>> Steve-o
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>