[zeromq-dev] ZMQ spec update proposition / enhancement to core messaging

Justin Cook jhcook at gmail.com
Tue Dec 17 19:14:59 CET 2013


On Tuesday, 17 December 2013 at 17:35, artemv zmq wrote:

> Now imagine the server shuts down, for example via "ifdown eth0". The OS sends an RST packet to the client, and the client now recognizes that the server has become unresponsive. At this point I think it would be really great to have a socket option standing for "if the socket discovers at runtime that the remote peer is not responsive, don't queue a msg and raise an error".

0MQ abstracts the underlying socket implementation to a large degree. TCP is one transport-layer protocol, and from the list it seems UDP may be joining soon. Multicast (used only in PUB/SUB) is encapsulated in UDP.

FIN and RST exist only in TCP. TCP also has retransmission heuristics that it uses to determine whether a remote node has become unavailable, but these take time. Unless you receive an RST, for example one inserted by a firewall or triggered by an `ifcfg eth0 down`, it is not possible to know immediately that you should stop queuing messages. If you are sending 1000s of messages per second, and it takes several seconds to mark a host as unavailable, then what?
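To put a rough number on that question, here is a back-of-the-envelope sketch. The function name and the 5-second delay are assumptions chosen purely for illustration; only the "1000s of messages per second" figure comes from the paragraph above.

```python
def messages_queued_before_detection(rate_per_sec, detection_delay_sec):
    """Messages sent toward a dead peer before it is marked unavailable."""
    return rate_per_sec * detection_delay_sec


# Assumed figures: 1000 msg/s and a 5 s detection delay.
backlog = messages_queued_before_detection(1000, 5)
print(backlog)  # 5000
```

Whatever the real figures are, the backlog grows linearly with the detection delay, so every one of those messages has to be queued, dropped, or rejected by something.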

As of now, if you set HWM=1 and the connection breaks, send() will block once a message is already on the queue, depending on the messaging pattern.

There has been other traffic on the list today regarding a similar topic. As of now, since you want to detect QUICKLY that a host has gone down, you need to implement your own heartbeat. Relying on transport protocols to do that for you is very unreliable.
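A minimal sketch of the bookkeeping side of such an application-level heartbeat, in plain Python with no 0MQ calls. Everything here is hypothetical and not part of any 0MQ API: the names `PeerLiveness` and `send_or_fail`, the 1-second interval, and the three-missed-beats threshold are all assumptions. The `send` argument stands in for whatever actually puts the message on the socket.

```python
import time


class PeerLiveness:
    """Tracks when a peer was last heard from and decides whether it is alive."""

    def __init__(self, interval=1.0, max_missed=3, clock=time.monotonic):
        self.interval = interval      # seconds between expected heartbeats (assumed)
        self.max_missed = max_missed  # silent intervals tolerated before giving up
        self._clock = clock           # injectable so the logic can be tested
        self._last_seen = clock()

    def heard_from_peer(self):
        """Call on every message or heartbeat received from the peer."""
        self._last_seen = self._clock()

    def is_alive(self):
        """True while the peer has been silent for less than the allowed window."""
        silent_for = self._clock() - self._last_seen
        return silent_for < self.interval * self.max_missed


def send_or_fail(liveness, send, msg):
    """Raise instead of queuing when the peer has stopped answering."""
    if not liveness.is_alive():
        raise ConnectionError("peer missed too many heartbeats")
    send(msg)
```

With these assumed settings a dead peer is noticed after about 3 seconds of silence, regardless of what the transport underneath reports. The peer still has to send heartbeats, or at least some traffic, for the check to mean anything.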

Credit-based flow control has also been mentioned along with other possible approaches.  

> What do you think devs?

My opinion is that it would be great if we offered an option to establish a heartbeat, even though 0MQ already gives you the tools to do this yourself. I wouldn’t mind a socket option that did this, but it would have quite large implications depending on the messaging pattern and the queue. This is not something that would be easy or straightforward. It would require a lot of thought.

Since it was brought up today, it is definitely worth talking about how this should be done, but for now the advice stands: implement your own heartbeat. The biggest issue I can see in your case is that you control neither the remote node nor the protocol.

Justin Cook  
