[zeromq-dev] What exact networking problems ZMQ does solve?

Bruno D. Rodrigues bruno.rodrigues at litux.org
Thu Dec 12 19:03:19 CET 2013


On Dec 12, 2013, at 17:40, Randall Nortman <rnzmq at wonderclown.net> wrote:

> On Thu, Dec 12, 2013 at 06:46:12PM +0200, artemv zmq wrote:
> [...]
>>   Now my question is going more to networking field. I want that kind of
>>   situations when I can lose connection __but__ neither FIN nor RST will be
>>   generated. In other words, I want to lose the connection very silently (from
>>   the client's perspective). I would much appreciate a list of all such possible
>>   scenarios. Thanks in advance.
> 
> The situations are numerous -- anything happening to any piece of the
> chain in between one application-level socket and the other can cause
> this.  You have seen that at the host level a software firewall like
> iptables can do it.  Here's a short, incomplete list of other
> possibilities:
> 
> - Network cable unplugged on either end, or at any switch/router in
>  between
(…)

Let me take this to a concrete ZMQ case:

- node B binds a socket
- node A connects to B
- between A and B there is a bad network with bad NAT machines.

What happens:

1. The network slows down or stops, and the HWM fills up

1.1 A with a blocking socket (PUSH, DEALER, etc.):
Solution: once the HWM is hit, send will block, or fail if a send timeout is set or the call is non-blocking. When it fails, call socket.disconnect and reconnect. You lose the messages in the local buffer.

1.2 A with a non-blocking socket (PUB, ROUTER):
Solution: configure the socket to use timeouts so that it returns an error on failure, or use pollers, etc. Same as 1.1.

1.3 A socket with multiple connects, blocking (e.g. PUSH):
Send will only fail when all connections are stale. If n-1 are stale but one is still working, there is no easy way to know about it.

1.4 A socket with multiple connects, non-blocking (e.g. PUB):
Some subscribers will receive nothing, and A and B won’t know about it.

Additional trick: set linger to 0, or else the client's disconnect may still try to send bytes and never close the connection.
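
For what it's worth, the "send fails, so disconnect and reconnect" idea from 1.1 can be sketched roughly as below. This is only an illustration under assumptions: send_or_reconnect, timeout_exc and backoff_s are made-up names, and the helper is written against any object that has send, disconnect and connect methods, so it does not need a live peer to try out.

```python
import time


def send_or_reconnect(sock, endpoint, payload, timeout_exc, backoff_s=0.0):
    """Try one send; if it times out, recycle the connection to `endpoint`.

    Returns True if the socket accepted the message, False if the send
    timed out and the connection was dropped and dialled again. In the
    False case the message was NOT queued, so the caller decides whether
    to resend it.
    """
    try:
        sock.send(payload)
        return True
    except timeout_exc:
        # Dropping the connection also gives up on messages still queued
        # for it (assuming linger is 0), as described in 1.1.
        sock.disconnect(endpoint)
        if backoff_s:
            time.sleep(backoff_s)  # optional pause before dialling again
        sock.connect(endpoint)
        return False
```

With pyzmq, I would expect the wiring to look like this: create a PUSH socket, set zmq.LINGER to 0 and zmq.SNDTIMEO to some number of milliseconds, connect it, and then call send_or_reconnect(sock, endpoint, data, zmq.Again), since zmq.Again is what a timed-out send raises. As noted in 1.3, this only helps when every connection on the socket is stale.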


2. The network is broken in such a way that A’s side of the NAT is closed but B’s side is still ESTABLISHED
Solution: no idea. I can’t unbind the socket, and there is no way to tell that the TCP connection is dead. Even if ZMQ keep-alive packets were used, I don’t know how to kill that connection.
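
To make the keep-alive idea a bit more concrete: if the clients sent periodic heartbeat messages carrying some identifier, the bind side could at least notice which peers have gone quiet. The sketch below is a hypothetical helper (PeerLiveness, heard_from and expire are invented names, not part of ZMQ), and it only does the bookkeeping. It does not close the underlying TCP connection, which is exactly the part I don't know how to do.

```python
import time


class PeerLiveness:
    """Remember when each peer was last heard from (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock          # injectable so it can be tested
        self._last_seen = {}         # peer id -> timestamp of last message

    def heard_from(self, peer_id):
        """Call this for every message or heartbeat received from a peer."""
        self._last_seen[peer_id] = self._clock()

    def expire(self):
        """Forget and return the peers silent for more than timeout_s."""
        now = self._clock()
        dead = [p for p, seen in self._last_seen.items()
                if now - seen > self.timeout_s]
        for p in dead:
            del self._last_seen[p]
        return dead
```

Since a PULL socket does not expose who sent a message, the identifier would have to travel in the message itself. Separately, libzmq has the ZMQ_TCP_KEEPALIVE family of socket options, which ask the kernel to probe idle TCP connections. That may let the bind side eventually drop half-open ones, but whether it helps behind a particular NAT is something to test.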


This is what’s happening to me now. Using PUSH-PULL, linger(0) and send with a timeout, I can force the clients to try to reconnect. Sometimes connections will still hang on the client side, mostly in the FIN_WAIT1 state, although I’ve seen a couple in ESTABLISHED that I couldn’t explain. On the bind side it’s typical to have dozens of ESTABLISHED connections and not be able to clean them up. They don’t seem to affect performance, but if instead of dozens there were hundreds of thousands, it could become a problem.

In conclusion, it’s great that ZMQ abstracts the sockets for us, but when sxxx hits the fan, it would be nice to be able to press the panic button. In this case the only panic button available is closing the zmq socket and opening a new one, killing everything.

Please note I’m not complaining about anything here. I’m just quite confused by the current state of my LAN and struggling to find solutions to my problems and, hopefully, to help others and the project with that knowledge.