[zeromq-dev] Assertion failed

Emmanuel TAUREL taurel at esrf.fr
Thu Apr 8 13:36:52 CEST 2010


Hi Martin,

Thanks very much for this fast action.
I will update my sources and keep you informed if I still see this
assertion.
The RATE is the same on all involved processes and the RECOVERY_IVL is 
at its default value.

BTW, I have noticed something strange using ZMQ.
The context is still the same: 1 publisher running on a host and 5 
subscribers running on another host linked with a 100 Mb network.
Both computers run Ubuntu 9.04. I am using the epgm protocol to send 4
kB messages. Each message carries a counter that allows me to
identify it.

I notice different behavior among my subscribers:
1 is receiving all my messages.
The others are "losing" messages: 1 missed 24 messages, another 19,
another 9 and the last one 6.
I also had Wireshark running on this host.
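Since each message carries a counter, the exact set of missing messages can be computed from what a subscriber actually received. A minimal sketch (hypothetical helper, not part of my actual subscriber code) of that gap detection:

```python
def find_gaps(counters):
    """Given the per-message counters a subscriber received, in
    arrival order and expected to be consecutive, return
    (missed_count, gap_ranges) describing the messages that never
    arrived."""
    missed = 0
    gaps = []
    expected = counters[0]
    for c in counters:
        if c > expected:
            # messages expected .. c-1 never arrived
            gaps.append((expected, c - 1))
            missed += c - expected
        expected = c + 1
    return missed, gaps

# e.g. a subscriber that saw counters 0,1,2,5,6,9 missed
# 4 messages: ranges (3,4) and (7,8)
print(find_gaps([0, 1, 2, 5, 6, 9]))
```

This is how a figure like "1 missed 24 messages, another 19" can be pinned down to specific counter ranges, which can then be compared against the sequence-number holes in the Wireshark capture.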

As one of my subscribers received all my messages, I think the
network delivered all of them. I have attached to this email the
Wireshark dump of the network transfer. What surprised me in this
graph are the "holes" we can see in the network usage. If I follow the
sequence numbers in the epgm packet stream,
I notice gaps in the packet sequence numbers at the moments where,
according to the graph, there is no traffic.
It looks as if some network packets were not delivered to
Wireshark. However, these packets were transmitted on the network;
otherwise none of my subscribers would have reported a correct transmission.
If some packets were not delivered to Wireshark, I guess the same
thing happened for my 4 subscribers which lost messages.

In one of your previous e-mails, you talked about the

kernel implementation of multicast packet dispatching.


My kernel is Linux 2.6.28-18. Do you think that this multicast packet
dispatching done by the kernel could be the reason for the behavior I am
observing?
If so, I guess the solution is to decrease the RATE until the kernel is
able to reliably dispatch all the received packets to all the involved
subscribers.
But this threshold will be a function of both CPU usage and the number
of subscribers running on the host.
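To get a feel for what lowering the RATE means in message terms: the ZMQ_RATE option is expressed in kilobits per second, so it translates into an upper bound on messages per second for a given message size. A small sketch of that arithmetic (ignoring PGM protocol overhead, so the real figure is somewhat lower):

```python
def max_msgs_per_sec(rate_kbps, msg_bytes):
    """Upper bound on message throughput permitted by a PGM rate
    limit of rate_kbps kilobits/second, ignoring protocol overhead."""
    bytes_per_sec = rate_kbps * 1000 / 8.0
    return bytes_per_sec / msg_bytes

# A RATE of 100000 (i.e. the full 100 Mb/s link) with 4096-byte
# messages allows at most about 3052 messages/second.
print(max_msgs_per_sec(100000, 4096))
```

Halving the RATE halves this ceiling, which is the knob I would be turning until the kernel keeps up for all subscribers.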

What do you think?

Regards

Emmanuel Taurel


On 08/04/2010 11:08, Martin Sustrik wrote:
> Emmanuel,
>
> Anyway, I've just fixed the original error. You can try with the trunk now.
>
> Martin
>
> Martin Sustrik wrote:
>    
>> Emmanuel TAUREL wrote:
>>      
>>> Hi Martin,
>>>
>>> First of all, thanks very much for your answer.
>>> What I find strange is that I had 5 subscribers on the same host (using
>>> epgm, the same multicast group and port number). For 4 of them,
>>> everything was fine but
>>> I had this assertion for the fifth one.
>>> If the problem is related to my network, the five processes should have
>>> been impacted, no?
>>>        
>> Hard to say. It depends on kernel implementation of multicast packet
>> dispatching.
>>
>> Just a sanity check: Are RATE and RECOVERY_IVL set to the same values in
>> all the applications?
>>
>> Martin
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>      
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: wireshark.jpg
Type: image/jpeg
Size: 151070 bytes
Desc: not available
URL: <http://lists.zeromq.org/pipermail/zeromq-dev/attachments/20100408/1ae01991/attachment.jpg>
