[zeromq-dev] Assertion failed

Emmanuel TAUREL taurel at esrf.fr
Thu Apr 8 13:45:11 CEST 2010


Hello,

I am re-sending this mail, this time without the attachment. I was not
aware that attachments are not allowed. Sorry for this.

Regards

Emmanuel Taurel

-------------------------------------------------------------------------------------
Original e-mail:

Hi Martin,

Thanks very much for this fast action.
I will update my sources and keep you informed if I still notice this
assertion.
The RATE is the same on all involved processes and the RECOVERY_IVL is
at its default value.

BTW, I have noticed something strange using ZMQ.
The context is still the same: 1 publisher running on one host and 5
subscribers running on another host, linked by a 100 Mb network.
Both computers run Ubuntu 9.04. I am using the epgm protocol to send
4 kByte messages. Each message carries a counter which allows me to
identify it.
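
For illustration, the per-message counter can be turned into a loss
count on the subscriber side. The following is a minimal sketch, not the
code used in the test: it assumes the publisher increments the counter
by 1 per message, and the helper name count_missing is hypothetical.

```python
def count_missing(counters):
    """Return (missing, gaps) for the counter values a subscriber received.

    Assumption: the publisher increments the counter by 1 per message.
    Duplicates and out-of-order values are ignored.
    """
    missing = 0
    gaps = []          # list of (first_missing, last_missing) ranges
    previous = None
    for value in counters:
        if previous is None:
            previous = value
            continue
        if value > previous:
            if value > previous + 1:
                gaps.append((previous + 1, value - 1))
                missing += value - previous - 1
            previous = value
    return missing, gaps


# Made-up sample: counters 4, 5, 6 and 9 never arrived.
received = [1, 2, 3, 7, 8, 10]
print(count_missing(received))
```

Logging the gap ranges rather than only the total makes it possible to
compare the moments of loss between subscribers and against a packet
capture.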

I notice different behavior among my subscribers.
One is receiving all my messages.
The others are "losing" messages: one missed 24 messages, another 19,
another 9 and the last one 6.
I also had Wireshark running on this host.

As one of my subscribers has received all my messages, I think the
network delivered all of them. I have attached to this email the
Wireshark dump of the network transfer. What surprised me in this
graph is the "holes" we can see in the network usage. If I follow the
sequence numbers in the epgm packet stream, I notice "holes" in the
packet sequence numbers at the moments where, according to the graph,
there is no traffic.
It looks as if some network packets were not passed to Wireshark.
However, these packets must have been transmitted on the network,
otherwise none of my subscribers would have reported a correct
transmission.
If some packets were not passed to Wireshark, I guess the same thing
happened to my 4 subscribers which lost messages.

In one of your previous e-mails, you talked about "kernel implementation
of multicast packet dispatching".

My kernel is Linux 2.6.28-18. Do you think that this multicast packet
dispatching done by the kernel could be the reason for the behavior I am
observing?
If yes, I guess the solution is to decrease the RATE until the kernel is
able to reliably dispatch all the received packets to all the involved
subscribers.
But this will depend on both the CPU usage and the number of subscribers
running on the host.
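
To put numbers on the RATE trade-off, here is a rough sketch of the
message throughput a given RATE allows. It assumes RATE is expressed in
kilobits per second (as the zmq_setsockopt documentation describes
ZMQ_RATE), that 1 kilobit is 1000 bits, that a "4 kByte" message is 4096
bytes, and it ignores PGM/UDP header overhead. The function name and the
sample rates are illustrative only.

```python
def max_messages_per_second(rate_kbps, message_bytes):
    """Upper bound on messages/s for a multicast rate limit.

    Assumptions: rate_kbps is in kilobits per second (1 kbit = 1000 bits)
    and protocol header overhead is ignored.
    """
    bits_per_message = message_bytes * 8
    return (rate_kbps * 1000) / bits_per_message


MESSAGE_BYTES = 4096  # assumed size of a "4 kByte" message

# Sample RATE values chosen for illustration only.
for rate in (100, 10_000, 40_000):
    print(rate, "kbit/s ->", max_messages_per_second(rate, MESSAGE_BYTES), "msg/s")
```

This only bounds what the sender may emit. Whether the receiving kernel
can dispatch that many packets to every subscriber still has to be
measured on the host itself.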

What do you think?

Regards

Emmanuel Taurel



On 08/04/2010 11:08, Martin Sustrik wrote:
> Emmanuel,
>
> Anyway, I've just fixed the original error. You can try with the trunk now.
>
> Martin
>
> Martin Sustrik wrote:
>    
>> Emmanuel TAUREL wrote:
>>      
>>> Hi Martin,
>>>
>>> First of all, thanks very much for your answer.
>>> What I find strange is that I had 5 subscribers on the same host (using
>>> epgm, the same multicast group and port number). For 4 of them,
>>> everything was fine but
>>> I had this assertion for the fifth one.
>>> If the problem is related to my network, the five processes should have
>>> been impacted, no?
>>>        
>> Hard to say. It depends on kernel implementation of multicast packet
>> dispatching.
>>
>> Just a sanity check: Are RATE and RECOVERY_IVL set to the same values in
>> all the applications?
>>
>> Martin
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>      