[zeromq-dev] question about HWM

Chuck Remes lists at chuckremes.com
Thu Jun 21 22:46:44 CEST 2012

On Jun 21, 2012, at 11:48 AM, Diffuser78 wrote:

> Hi,
> I just started playing with zmq, and I had a question about HWM.
> If my socket type is ZMQ_REP, and this socket enters an exceptional state because the HWM was reached, I read in the guide that it will drop the reply message being sent to the client. My question is: would it retry? Can I trust zmq to deliver a message once I hand it over?
> I have two apps, App1 and App2, on two different boxes. App1 sends a payload to App2 and waits for an application-level ACK from App2. Since ZMQ may drop messages if the HWM is reached, App1 would not know whether its message was delivered, and hence would not know whether to retry after a timeout.
> Can you share your experience around this scenario ?

So, let's walk through the full chain of events here. In your scenario you have a REP socket that is sending ACKs back to a REQ socket somewhere else. You have set the HWM to (for example) 10.

If there is a *single* REQ socket talking to a *single* REP socket, then you will never hit the HWM. The REQ/REP sockets enforce a very strict request/reply/request/reply pattern (so you can't do request/request/request/reply/reply/reply), which means at most one message is ever in flight between the pair.

A more realistic scenario is that you have several REQ sockets (perhaps 100s or 1000s) all sending requests to the same REP socket. Let's assume that the network is slow (or high latency) but the work done by the REP process is minimal. It processes the requests very quickly and sends each reply right away, but the replies are slow to go out.

From the FAQ you can see that the REQ socket has an incoming queue plus kernel buffers that can store messages. Likewise, the REP side has an outgoing queue, which in turn hands messages off to the kernel for transmission. Only when the REP socket finally gets backpressure from the kernel (e.g. it ran out of buffers) does it start to queue messages internally. Once that internal queue holds 10 messages (your HWM), the 11th message is dropped. There is *no* retry because this drop happens at a layer above TCP (and it works the same way for the other transport types that don't share TCP semantics, so the behavior is consistent).
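To make the drop behavior concrete, here is a toy model of an outgoing queue with a high-water mark. It is only a sketch in plain Python: the class name BoundedOutbox and its methods are made up for illustration and are not libzmq's real internals or API.

```python
from collections import deque


class BoundedOutbox:
    """Toy send queue with a high-water mark (hypothetical, not libzmq code)."""

    def __init__(self, hwm):
        self.hwm = hwm
        self.queue = deque()
        self.dropped = 0

    def send(self, msg):
        # Once the queue already holds `hwm` messages, the new message is
        # discarded. Nothing is retried later; the message is simply gone.
        if len(self.queue) >= self.hwm:
            self.dropped += 1
            return False
        self.queue.append(msg)
        return True

    def drain(self, n):
        """Simulate the kernel accepting up to n messages for transmission."""
        sent = []
        while self.queue and len(sent) < n:
            sent.append(self.queue.popleft())
        return sent


# The kernel is "stuck" (nothing drains), so replies pile up in the queue.
outbox = BoundedOutbox(hwm=10)
results = [outbox.send(f"reply-{i}") for i in range(11)]
```

The first ten sends are queued and the eleventh is counted as dropped. Note that the sender gets no error back from the network; the loss is purely local.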

To solve this, your REQ system needs to have a timer set to a reasonable period of time within which it expects a reply. If it doesn't get the reply before the timer expires, then the request should be assumed to be lost. Furthermore, your app will need to be able to handle the case where the timer expires, the request is sent a second time, and then the *original* reply arrives (it's just late).
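One way to sketch the timer and the late-reply handling is to tag every request with an ID and ignore any reply whose ID does not match the request currently outstanding. The code below is transport-agnostic and makes several assumptions: RetryingClient, send and recv are hypothetical names, recv(timeout) is assumed to return a (request_id, body) tuple or None on timeout, and the server is assumed to echo the request ID in its reply. With a real REQ socket you cannot send again while a reply is pending, so an actual client would need to close and reopen the socket (or use a different socket type) before resending.

```python
import time
import itertools
from collections import deque


class RetryingClient:
    """Request/retry helper over hypothetical send(msg) / recv(timeout) hooks."""

    def __init__(self, send, recv, timeout=1.0, max_attempts=3):
        self._send = send
        self._recv = recv
        self._timeout = timeout
        self._max_attempts = max_attempts
        self._ids = itertools.count(1)

    def request(self, payload):
        request_id = next(self._ids)  # retries reuse the same ID
        for _ in range(self._max_attempts):
            self._send((request_id, payload))
            deadline = time.monotonic() + self._timeout
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break  # timer expired: assume the request was lost
                reply = self._recv(remaining)
                if reply is None:
                    break  # nothing arrived in time: resend
                reply_id, body = reply
                if reply_id == request_id:
                    return body
                # Late reply to an earlier request: ignore it, keep waiting.
        raise TimeoutError(f"no reply to request {request_id}")


# Scripted fake transport: the first reply is "lost" (None), then the ACK
# for request 1 arrives twice (the original, late, plus the retry's).
sent = []
inbox = deque([None, (1, "ack-1"), (1, "ack-1"), (2, "ack-2")])
client = RetryingClient(sent.append,
                        lambda timeout: inbox.popleft() if inbox else None)
first = client.request("a")
second = client.request("b")
```

Here the duplicate ACK for request 1 shows up while request 2 is outstanding and is discarded because its ID does not match. Reusing one ID across retries means the receiving side can also use it to detect that a resent request is a duplicate.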

I don't know of a specific example in the guide that covers this, but several people have solved it in their applications. You may want to seek out the "Salt" system that uses zmq for its message bus; I imagine it has a mechanism to handle what I described above.


