[zeromq-dev] Zeromq messages getting dropped
Yu Dongmin
miniway at gmail.com
Sun Jan 27 18:16:44 CET 2013
Hi,
I've found the culprit that caused the data loss.
When ZMQ sends a large message, the stream_engine writes the data through multiple out_event calls.
The ZMQ linger option only guarantees that messages are delivered to the peer pipe. Because of the speculative write, out_event is called at least once, but a large message requires multiple calls.
The stream_engine can be terminated before enough out_event calls have finished, in which case the rest of the message is never written.
So a longer linger value will not resolve this issue. A workaround seems to be to add some sleeps before close.
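To make the sequence above concrete, here is a toy model in Java. It is only a sketch and not libzmq code: the class name PartialWriteModel, the helper lostAfter, and the 8 KB written per event are all assumptions made up for illustration. It shows that when the engine is terminated after only a few write events, the tail of a large message is never written, whatever the linger value is.

```java
/** Toy model: a sender that flushes one message across several write events. */
class PartialWriteModel {
    /** Hypothetical number of bytes written per event (an assumption for this sketch). */
    static final int BYTES_PER_EVENT = 8 * 1024;

    private final int messageSize;
    private int written = 0;
    private boolean terminated = false;

    PartialWriteModel(int messageSize) {
        this.messageSize = messageSize;
    }

    /** One write event; returns true while more events are still needed. */
    boolean outEvent() {
        if (terminated || written >= messageSize) {
            return false;
        }
        written += Math.min(BYTES_PER_EVENT, messageSize - written);
        return written < messageSize;
    }

    /** Models the engine being shut down; later write events do nothing. */
    void terminate() {
        terminated = true;
    }

    int bytesLost() {
        return messageSize - written;
    }

    /** Runs the given number of write events, then terminates, and reports the bytes lost. */
    static int lostAfter(int messageSize, int events) {
        PartialWriteModel engine = new PartialWriteModel(messageSize);
        for (int i = 0; i < events; i++) {
            engine.outEvent();
        }
        engine.terminate();
        engine.outEvent(); // ignored: the engine is already terminated
        return engine.bytesLost();
    }

    public static void main(String[] args) {
        int size = 45 * 1024;
        for (int events = 1; events <= 6; events++) {
            System.out.println(events + " event(s) before terminate: "
                    + lostAfter(size, events) + " bytes lost");
        }
    }
}
```

In this model a message that fits in a single event is never lost, which would be consistent with only large messages being affected. The sleep workaround corresponds to giving the loop time to run all the events before terminate is called.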
I'm going to submit a pull request to resolve the issue.
Thanks
Min
On Jan 27, 2013, at 12:42 AM, Yu Dongmin <miniway at gmail.com> wrote:
> Hi,
>
> My guess was that the issue might be in libzmq (the zeromq C library) and show up when large messages are sent heavily.
>
> Thanks
> Min
>
> On Jan 26, 2013, at 4:01 PM, Ritesh Adval <riteshadval at gaikai.com> wrote:
>
>> Hi Min,
>>
>> Thanks for the update. Just to confirm,
>> are you saying that this issue is in the zeromq C library or in the jzmq C wrapper?
>>
>> Just an update: when I replaced the DEALER socket that connects to the broker's ROUTER socket with a REQ socket, and the DEALER socket that connects to the broker's DEALER socket with a REP socket, I no longer see message loss when running the same test. (The REQ socket does "send" and then "recv"; the REP socket does the opposite, "recv" and then "send".)
>>
>> -Ritesh
>>
>>
>> On Jan 25, 2013, at 8:42 PM, Min <miniway at gmail.com> wrote:
>>
>>> I was able to reproduce the issue with jzmq, even on zeromq 3.2.2.
>>>
>>> What I discovered is that about the last 30K bytes of a 45K message were sometimes not delivered to the in-router on raw close.
>>> I didn't build equivalent C code, but since jzmq is a thin wrapper around the native C library, the C library could have the same problem.
>>>
>>> I haven't found a clear solution yet.
>>>
>>> Thanks
>>> Min
>>>
>>>
>>> On Thu, Jan 24, 2013 at 6:39 AM, Ritesh Adval <riteshadval at gaikai.com> wrote:
>>> Hello,
>>>
>>> I have created a bug for this issue with instructions and a Java test case. It's at https://zeromq.jira.com/browse/LIBZMQ-497
>>>
>>> Thanks
>>> Ritesh
>>>
>>>
>>>
>>>
>>> On Tue, Jan 22, 2013 at 6:30 PM, Ritesh Adval <riteshadval at gaikai.com> wrote:
>>> Thanks Min,
>>>
>>> I will create a bug with instructions and a unit test. I was also experimenting with the Java-only version of zeromq (https://github.com/zeromq/jeromq). When running the same test it does not drop messages, but it has some other issue.
>>>
>>> -Ritesh
>>>
>>>
>>>
>>> On Mon, Jan 21, 2013 at 11:53 PM, Min <miniway at gmail.com> wrote:
>>> Ritesh,
>>>
>>> If you can reproduce the problem, Java code should be fine.
>>>
>>> The community could look into it.
>>>
>>> Thanks
>>> Min
>>>
>>> On Thursday, January 17, 2013, Ritesh Adval wrote:
>>>
>>> Hi Charles,
>>>
>>> I have a test program in Java. I am not a C programmer, so it will probably take me some time to reproduce this in C. Can someone first take a look at my Java program to check that I am not doing anything stupid? Should I create a bug and attach the Java Maven project?
>>> It's very easy to run: all you need is zeromq 2.2.0 installed and jzmq (https://github.com/zeromq/jzmq) built and installed.
>>> I can add instructions to the bug report. Once it is confirmed that the program looks right, I can try to create a C version of the test, but it will take me some time.
>>>
>>> Let me know.
>>>
>>> Thanks
>>> Ritesh
>>>
>>>
>>>
>>>
>>> On Wed, Jan 16, 2013 at 10:55 PM, Charles Remes <lists at chuckremes.com> wrote:
>>> On Jan 16, 2013, at 4:08 PM, Ritesh Adval <riteshadval at gaikai.com> wrote:
>>>
>>> > Hi Charles,
>>> >
>>> > Yes, I close the socket in my thread after sending 100 messages, and I expect that LINGER will make sure the messages are sent to the other end. I expected that context termination would block until any pending messages are sent, but that's not happening: context termination returns quickly.
>>> >
>>> > I just tried again in my unit test, setting LINGER to Integer.MAX_VALUE explicitly on all my sockets, and the test still failed with messages getting dropped.
>>> >
>>> > The interesting thing is that only the 100th message (the last one) from some of my concurrent threads is getting dropped.
>>>
>>> Time to show someone the code. That's the easiest way to figure it out. If you can reproduce this in C, that will get a lot more attention.
>>>
>>> Here's how to open an issue:
>>>
>>> http://www.zeromq.org/docs:issue-tracking
>>>
>>> cr
>>>
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>
>