[zeromq-dev] Apparent queuing on PUB/SUB fan-in despite HWM=1
rossabri at hotmail.com
Wed May 11 16:46:28 CEST 2011
Let me say first that I'm a newb, but that I *kinda* feel like I've grokked ZeroMQ. I'm working on a real-time vision system and we're using ZeroMQ to connect its major computational components. My review so far: ZeroMQ is awesome!
There are a number of design issues that I'd like help with in the medium term. For now, though, there's one that needs some urgent attention: *apparent* queuing of messages on PUB/SUB fan-in despite having HWM set to 1 for both the senders and receivers. I say "apparent" here to acknowledge that the problem is most likely with my implementation, and not with ZeroMQ. I can't isolate it on my side, though, and am generally at a loss to explain where it's coming from.
We use standard PUB/SUB bind/connect socket pairs throughout the system, all with HWM set to 1 on both sides. The alleged queuing shows up at the system endpoint, where messages from all the intermediate processing nodes are collected and displayed as a "movie." (For the most part, these messages are small; they're JSON representations of bounding boxes.) What we are finding, though, is that messages from the fastest intermediate nodes -- which are also the nodes generating the most messages, as a consequence of being fast -- tend to lag farther and farther behind messages from slower nodes.
My understanding of the HWM semantics is that, when the HWM is reached, new messages will not be queued. That means that the queued messages are not the newest messages. There are good reasons why this has to be the case. Fine. Still, this growing lag for prolific senders doesn't make any sense to me. The message that I receive at the system endpoint should only be as old as the last zmq_recv from that node. Yet I am experiencing situations in which one of the intermediate nodes will "take a break" -- e.g., it won't send any messages for a while -- and the endpoint will continue to process old messages from it for a few more minutes. Shouldn't it process JUST ONE message from a node that's "taking a break" if HWM is 1 on both sides of that socket pair?
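The HWM behavior described above can be sketched as a toy model in plain Python. This is not ZeroMQ code and makes no claim about ZeroMQ's internals; `DropNewestQueue` and `run_demo` are hypothetical names, and the drop-when-full policy is simply the assumption stated in the paragraph above.

```python
from collections import deque


class DropNewestQueue:
    """Toy bounded queue (an assumption, not ZeroMQ's implementation):
    once `hwm` messages are waiting, newly sent messages are discarded."""

    def __init__(self, hwm):
        self.hwm = hwm
        self.items = deque()
        self.dropped = 0

    def send(self, msg):
        # At the high-water mark the new message is dropped, so whatever
        # is already queued is older than what the sender just produced.
        if len(self.items) >= self.hwm:
            self.dropped += 1
            return False
        self.items.append(msg)
        return True

    def recv(self):
        # Return the oldest queued message, or None if nothing is waiting.
        return self.items.popleft() if self.items else None


def run_demo():
    q = DropNewestQueue(hwm=1)
    for frame in range(5):      # sender produces frames 0..4 with no reads
        q.send(frame)
    first = q.recv()            # the single queued (oldest) frame
    second = q.recv()           # nothing else should be waiting
    return first, second, q.dropped


print(run_demo())
```

Under this model, a sender that goes quiet can leave at most one stale message behind per queue, which is the expectation behind the "JUST ONE message" question.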
My secret concern here is that there is some "gotcha" in the fair-queuing semantics of PUB/SUB pairs that's causing this to manifest itself at the fan-in point. But again, IT'S PROBABLY ME. Any intuition that you guys might be able to share would be greatly appreciated. In the meantime I'll try to cook up a toy example that clearly demonstrates the problem.
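As a starting point for such a toy example, here is a minimal simulation in plain Python of fan-in with round-robin reads across per-sender queues. Everything in it is an assumption made for illustration: the `simulate` function, the tick-based timing, the send rates, and the `depth` parameter standing in for however many messages are actually buffered per sender. It does not use or model ZeroMQ itself.

```python
from collections import deque


def simulate(depth, ticks=100, fast_rate=3, slow_rate=1):
    """Return the worst age (in ticks) of a delivered message per sender.

    depth: assumed number of messages buffered per sender before drops.
    """
    queues = {"fast": deque(), "slow": deque()}
    rates = {"fast": fast_rate, "slow": slow_rate}
    worst_age = {"fast": 0, "slow": 0}
    for now in range(ticks):
        # Each sender stamps its messages with the current tick;
        # anything beyond `depth` waiting messages is dropped.
        for name, q in queues.items():
            for _ in range(rates[name]):
                if len(q) < depth:
                    q.append(now)
        # The receiver takes at most one message from each sender per tick
        # (a simple stand-in for fair queuing).
        for name, q in queues.items():
            if q:
                age = now - q.popleft()
                worst_age[name] = max(worst_age[name], age)
    return worst_age


print("depth=1: ", simulate(depth=1))
print("depth=50:", simulate(depth=50))
```

In this model the fast sender's messages never age when `depth` is 1, and their worst-case age grows toward the buffer depth when `depth` is larger. If the real system behaves like the second case, that would suggest more per-sender buffering somewhere than HWM=1 implies.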
Cheers! And great work on ZeroMQ!
~Brian