[zeromq-dev] epgm debugging help

Tom Wilberding tom at wilberding.com
Wed Apr 27 19:13:20 CEST 2011


Hi All,

We have a simple application that consumes PUB/SUB messages via TCP
(over an expensive WAN link) and fans them out PUB/SUB via epgm on the
local LAN.

We have a client that is SUBing to the epgm messages and things seem to
work fine for long stretches of time. This client is intended to stay
connected 24x7 and data flows nearly constantly at all times.

But occasionally the client stops processing data and debugging our code
suggests that calls to zmq_poll() are not returning data.

I did a tcpdump on the epgm PUB side and also on the epgm SUB side. In
both cases, data continues to flow from PUB to SUB (from the NIC's
perspective) even after the client stops processing data. I ran a
netstat -g after the failure and the IGMP subscription was still active.
I also did a tcpdump on the IGMP traffic and I continue to see my
subscriber reporting "V2 Membership Report / Join group".

Any suggestions on how to diagnose this further? Especially any tips for
being able to log or inspect state within the ZMQ layer? Is there known
behavior that would explain why ZMQ might "give up" and stop delivering
to an epgm/SUB client?

I don't have an easy-to-reproduce case. Rather, it happens randomly 2-3
times per day.
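
Because the failure is intermittent, it may help to timestamp exactly
when the poll loop goes quiet, so the stall can be lined up against the
tcpdump captures. Below is a minimal sketch of that idea in Python,
assuming the application can call a helper once per poll iteration. The
names StallDetector, threshold_s and record are hypothetical and are not
part of ZeroMQ; the sketch does not inspect any ZMQ internal state.

```python
import time


class StallDetector:
    """Tracks how long a poll loop has gone without receiving data.

    Hypothetical helper for application-side logging; not a ZeroMQ API.
    """

    def __init__(self, threshold_s=30.0, clock=time.monotonic):
        self.threshold_s = threshold_s  # silence (seconds) treated as a stall
        self.clock = clock              # injectable so it can be tested
        self.last_data = clock()        # time of the last successful receive
        self.stalled = False

    def record(self, got_data):
        """Call once per poll iteration; returns an event string or None."""
        now = self.clock()
        if got_data:
            was_stalled = self.stalled
            gap = now - self.last_data
            self.last_data = now
            self.stalled = False
            if was_stalled:
                return "recovered after %.1fs" % gap
            return None
        # No data this iteration: report once when the threshold is crossed.
        if not self.stalled and now - self.last_data >= self.threshold_s:
            self.stalled = True
            return "stall: no data for %.1fs" % (now - self.last_data)
        return None
```

In the client, the poll call would be given a timeout so the loop keeps
iterating while idle, and any non-None return from record() would be
logged along with the wall-clock time for comparison with the packet
capture.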

Thanks,
Tom



