[zeromq-dev] Zeromq messages getting dropped

Yu Dongmin miniway at gmail.com
Sun Jan 27 18:16:44 CET 2013


Hi,

I've found a culprit that caused the data loss.

When ZMQ sends a large message, the stream_engine sends the data through multiple out_event calls.

The ZMQ linger option only guarantees that messages are delivered to the peer pipe. Thanks to the speculative write, out_event is called at least once, but a large message requires multiple calls.

The stream_engine can therefore be terminated before enough out_event calls have completed.


So a longer linger value will not resolve this issue. A workaround seems to be adding some sleep before close.
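To make the sequence above concrete, here is a toy model in Python. It is not libzmq's actual stream_engine code. It assumes each out_event call can write at most a fixed number of bytes (the 15K figure is arbitrary, picked only so the arithmetic mirrors the 30K-of-45K loss reported in the Jan 25 message quoted below), and that the engine gets only the single speculative out_event before it is terminated, unless something like a sleep before close gives it more rounds.

```python
# Toy model only: this is NOT libzmq's stream_engine, just an illustration of
# the race described above. The per-call write size is a hypothetical number.

def bytes_delivered(msg_size: int, max_write: int, out_events: int) -> int:
    """Bytes of one message written out before the engine is torn down.

    msg_size   -- size of the queued message in bytes
    max_write  -- assumed maximum bytes a single out_event call can write
    out_events -- number of out_event calls the engine gets before termination
    """
    sent = 0
    for _ in range(out_events):
        if sent == msg_size:
            break  # whole message already written
        sent += min(max_write, msg_size - sent)  # one partial write per call
    return sent


if __name__ == "__main__":
    MSG = 45_000        # message size from the report in this thread
    MAX_WRITE = 15_000  # hypothetical per-call write limit

    # Linger is already satisfied once the message is in the pipe, so in this
    # model the engine only gets the single speculative out_event.
    print(bytes_delivered(MSG, MAX_WRITE, out_events=1))  # 15000
    # A sleep before close gives the I/O thread time for extra out_event calls.
    print(bytes_delivered(MSG, MAX_WRITE, out_events=3))  # 45000
```

In this model a longer linger changes nothing, because linger is not one of the inputs; only the number of out_event rounds the engine gets before termination matters, which is why sleeping before close helps while raising linger does not.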


I'm going to submit a pull request to resolve the issue.

Thanks
Min

On Jan 27, 2013, at 12:42 AM, Yu Dongmin <miniway at gmail.com> wrote:

> Hi,
> 
> My guess was that there might be an issue in libzmq (the ZeroMQ C library) when large messages are sent at a high rate.
> 
> Thanks
> Min
> 
> On Jan 26, 2013, at 4:01 PM, Ritesh Adval <riteshadval at gaikai.com> wrote:
> 
>> Hi Min,
>> 
>> Thanks for the update. Just to confirm:
>> are you saying that this issue is in the ZeroMQ C library or in the jzmq wrapper?
>> 
>> Just an update: when I replaced the DEALER socket that connects to the broker's ROUTER socket with a REQ socket, and the DEALER socket that connects to the broker's DEALER socket with a REP socket, I no longer see message loss in the same test. (The REQ socket does "send" and then "recv"; the REP socket does the opposite, "recv" and then "send".)
>> 
>> -Ritesh
>> 
>> 
>> On Jan 25, 2013, at 8:42 PM, Min <miniway at gmail.com> wrote:
>> 
>>> I was able to reproduce the issue with jzmq, even on ZeroMQ 3.2.2.
>>> 
>>> What I found is that sometimes the last ~30K bytes of a 45K message were not delivered to the in-router on a raw close.
>>> I didn't build equivalent C code, but since jzmq is a thin wrapper around the native C library, the C library could have the same problem.
>>> 
>>> But I haven't found a clear solution yet.
>>> 
>>> Thanks
>>> Min
>>> 
>>> 
>>> On Thu, Jan 24, 2013 at 6:39 AM, Ritesh Adval <riteshadval at gaikai.com> wrote:
>>> Hello,
>>> 
>>> I have created a bug for this issue, with instructions and a Java test case. It's at https://zeromq.jira.com/browse/LIBZMQ-497
>>> 
>>> Thanks
>>> Ritesh
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Jan 22, 2013 at 6:30 PM, Ritesh Adval <riteshadval at gaikai.com> wrote:
>>> Thanks Min,
>>> 
>>> I will create a bug with instructions and a unit test. I was also experimenting with the Java-only version of ZeroMQ (https://github.com/zeromq/jeromq). When running the same test, it does not drop messages but has some other issue.
>>> 
>>> -Ritesh
>>> 
>>> 
>>> 
>>> On Mon, Jan 21, 2013 at 11:53 PM, Min <miniway at gmail.com> wrote:
>>> Ritesh,
>>> 
>>> If you can reproduce the problem, Java code should be fine.
>>> 
>>> The community could look into it.
>>> 
>>> Thanks
>>> Min
>>> 
>>> On Thu, Jan 17, 2013, Ritesh Adval wrote:
>>> 
>>> Hi Charles,
>>> 
>>> I have a test program in Java. I am not a C programmer, so it will probably take me some time to reproduce this in C. Can someone first take a look at my Java program to check that I am not doing anything stupid? Should I create a bug and attach the Java Maven project?
>>> It's very easy to run: all you need is ZeroMQ 2.2.0 installed, plus jzmq built and installed (https://github.com/zeromq/jzmq).
>>> I can add instructions to the bug report. Once it's confirmed that the program looks right, I can try to create a C version of the test, but that will take me some time.
>>> 
>>> Let me know.
>>> 
>>> Thanks
>>> Ritesh
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Jan 16, 2013 at 10:55 PM, Charles Remes <lists at chuckremes.com> wrote:
>>> On Jan 16, 2013, at 4:08 PM, Ritesh Adval <riteshadval at gaikai.com> wrote:
>>> 
>>> > Hi Charles,
>>> >
>>> > Yes, I close the socket in my thread after sending 100 messages. I expect LINGER to make sure the messages are sent to the other end, and I expected context termination to block until any pending messages are sent, but that's not happening: context termination returns quickly.
>>> >
>>> > I just tried again in my unit test, explicitly setting LINGER to Integer.MAX_VALUE on all my sockets, and the test still failed with messages getting dropped.
>>> >
>>> > The interesting thing is that only the 100th message (the last one) from some of my concurrent threads is getting dropped.
>>> 
>>> Time to show someone the code. That's the easiest way to figure it out. If you can reproduce this in C, that will get a lot more attention.
>>> 
>>> Here's how to open an issue:
>>> 
>>> http://www.zeromq.org/docs:issue-tracking
>>> 
>>> cr
>>> 
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> 


