[zeromq-dev] ZMQ spec update proposition / enhancement to core messaging

Lindley French lindleyf at gmail.com
Thu Dec 19 17:26:14 CET 2013


RSTs can occur even if the remote node is still up.


On Thu, Dec 19, 2013 at 7:25 AM, artemv zmq <artemv.zmq at gmail.com> wrote:

> Hi Justin. Thanks for the reply!
>
> Can you please elaborate more on:
> >> FIN/RST are only part of the TCP protocol. TCP has other heuristics
> (retransmission and keepalive timers) it uses to decide whether a remote
> node has become unavailable, but this takes time
>
> Usually ITOps just "kill" a process and that's it. That's why RST always
> works. I don't know (and the architects I work with don't know either)
> what other heuristics TCP uses to determine if a remote node has become
> unavailable. They know FIN/RST and they're happy.
>
>
>
>
> 2013/12/17 Justin Cook <jhcook at gmail.com>
>
>> Artem,
>>
>> On Tuesday, 17 December 2013 at 17:35, artemv zmq wrote:
>>
>> > Now imagine the server shuts down, for example via "ifdown eth0". The OS
>> sends the client an RST packet and the client now recognizes that the
>> server has become unresponsive. At this point I think it would be very,
>> very useful to have a socket option meaning "if the socket discovers at
>> runtime that the remote peer is not responsive -- don't queue the message,
>> raise an error".
>>
>>
>> 0MQ abstracts — to a large degree — the underlying socket implementation.
>> TCP is one transport-layer protocol, and from the list it seems UDP may be
>> joining soon. Multicast (used only in PUB/SUB) is encapsulated in UDP.
>>
>> FIN/RST are only part of the TCP protocol. TCP has other heuristics
>> (retransmission and keepalive timers) it uses to decide whether a remote
>> node has become unavailable, but this takes time. Unless you receive an RST
>> inserted by a firewall or an `ifconfig eth0 down`, it is not possible to
>> know immediately that you should stop queuing messages. If you are sending
>> 1000s of messages per second, and it takes several seconds to mark a host
>> as unavailable, then what?
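
For what it's worth, that delay can be shortened (though not removed) by
tuning TCP keepalives per socket. A rough, untested sketch, assuming libzmq
4.0, which exposes the ZMQ_TCP_KEEPALIVE* options; the endpoint is made up:

    #include <zmq.h>

    int main (void)
    {
        void *ctx = zmq_ctx_new ();
        void *push = zmq_socket (ctx, ZMQ_PUSH);

        /* Ask the kernel to probe an idle connection: first probe after
           30 s, then every 5 s, and declare the peer dead after 3 missed
           probes, instead of the Linux default of two hours of idleness. */
        int on = 1, idle = 30, cnt = 3, intvl = 5;
        zmq_setsockopt (push, ZMQ_TCP_KEEPALIVE,       &on,    sizeof on);
        zmq_setsockopt (push, ZMQ_TCP_KEEPALIVE_IDLE,  &idle,  sizeof idle);
        zmq_setsockopt (push, ZMQ_TCP_KEEPALIVE_CNT,   &cnt,   sizeof cnt);
        zmq_setsockopt (push, ZMQ_TCP_KEEPALIVE_INTVL, &intvl, sizeof intvl);

        zmq_connect (push, "tcp://server.example:5555");
        /* ... send as usual; reconnect logic kicks in once the kernel
           declares the connection dead ... */
        zmq_close (push);
        zmq_ctx_term (ctx);
        return 0;
    }
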
>>
>> As of now, if you set HWM=1 and the connection breaks, send() will block
>> if a message is already on the queue, depending on the message pattern.
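
If blocking is not acceptable either, a send timeout turns it into an error
the application can act on, and on 4.0 the ZMQ_IMMEDIATE option stops
messages being queued for connections that have not completed. An untested
sketch along those lines (the endpoint is made up):

    #include <errno.h>
    #include <stdio.h>
    #include <zmq.h>

    int main (void)
    {
        void *ctx = zmq_ctx_new ();
        void *push = zmq_socket (ctx, ZMQ_PUSH);

        int hwm = 1;         /* at most one queued outbound message        */
        int timeout = 1000;  /* zmq_send() gives up after 1 s, not forever */
        int immediate = 1;   /* only queue to completed connections        */
        zmq_setsockopt (push, ZMQ_SNDHWM,    &hwm,       sizeof hwm);
        zmq_setsockopt (push, ZMQ_SNDTIMEO,  &timeout,   sizeof timeout);
        zmq_setsockopt (push, ZMQ_IMMEDIATE, &immediate, sizeof immediate);

        zmq_connect (push, "tcp://server.example:5555");

        if (zmq_send (push, "hello", 5, 0) == -1 && errno == EAGAIN)
            fprintf (stderr, "peer not accepting messages, backing off\n");

        zmq_close (push);
        zmq_ctx_term (ctx);
        return 0;
    }

This only bounds how long a send can stall; it still cannot tell a slow
peer from a dead one.
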
>>
>> There has been other traffic on the list today regarding a similar topic.
>> As of now, since you are interested in finding a host that has gone down
>> QUICKLY, you need to implement your own heartbeat. Relying on transport
>> protocols to do that for you is very unreliable.
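
The client side of such a heartbeat can stay quite small. An untested
sketch, with a made-up endpoint and a made-up PING/PONG framing that the
server is assumed to echo:

    #include <stdio.h>
    #include <string.h>
    #include <zmq.h>

    int main (void)
    {
        void *ctx = zmq_ctx_new ();
        void *peer = zmq_socket (ctx, ZMQ_DEALER);
        zmq_connect (peer, "tcp://server.example:5555");

        int missed = 0;
        while (missed < 3) {                   /* 3 s of silence = dead    */
            zmq_send (peer, "PING", 4, 0);

            zmq_pollitem_t items [] = { { peer, 0, ZMQ_POLLIN, 0 } };
            if (zmq_poll (items, 1, 1000) > 0) {
                char buf [16];
                int n = zmq_recv (peer, buf, sizeof buf, 0);
                if (n == 4 && memcmp (buf, "PONG", 4) == 0)
                    missed = 0;                /* peer answered in time    */
            }
            else
                missed++;                      /* one interval, no answer  */
        }
        fprintf (stderr, "peer considered dead, stop queuing to it\n");

        zmq_close (peer);
        zmq_ctx_term (ctx);
        return 0;
    }

In a real application this loop would run next to the normal traffic (in
its own thread or folded into the same zmq_poll), but the principle is the
same: the application, not the transport, decides when the peer is gone.
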
>>
>> Credit-based flow control has also been mentioned along with other
>> possible approaches.
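
The idea there is that the receiver hands the sender a budget of messages
("credit") and tops it up as it consumes them, so the sender stops within
one window once the receiver goes quiet. A sender-side sketch (untested;
the CREDIT framing, the grant size and the endpoint are all made up):

    #include <string.h>
    #include <zmq.h>

    int main (void)
    {
        void *ctx = zmq_ctx_new ();
        void *peer = zmq_socket (ctx, ZMQ_DEALER);
        zmq_connect (peer, "tcp://server.example:5555");

        int credit = 0;
        for (;;) {
            /* Absorb any CREDIT grants the receiver has sent so far. */
            char buf [16];
            int n;
            while ((n = zmq_recv (peer, buf, sizeof buf, ZMQ_DONTWAIT)) != -1)
                if (n == 6 && memcmp (buf, "CREDIT", 6) == 0)
                    credit += 10;      /* each grant is worth 10 messages */

            if (credit > 0) {
                zmq_send (peer, "data", 4, 0);
                credit--;
            }
            else {
                /* Out of credit: wait (bounded) for the next grant rather
                   than queuing blindly; silence means the peer is gone.  */
                zmq_pollitem_t items [] = { { peer, 0, ZMQ_POLLIN, 0 } };
                if (zmq_poll (items, 1, 1000) == 0)
                    break;
            }
        }
        zmq_close (peer);
        zmq_ctx_term (ctx);
        return 0;
    }
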
>>
>> > What do you think devs?
>>
>> My opinion is that it would be great if we offered an option to establish
>> a heartbeat, even though 0MQ already gives you the building blocks to do
>> this yourself. I wouldn't mind a socket option that did this, but it would
>> have quite large implications depending on the message pattern and the
>> queue. This is not something that would be easy and straightforward; it
>> would require a lot of thought.
>>
>> Since it was brought up today, it is definitely worth talking about how
>> this should be done, but for now the advice stands: implement your own
>> heartbeat. The biggest issue I can see in your case is that you have
>> control over neither the remote node nor the protocol.
>>
>> --
>> Justin Cook
>>
>>

