[zeromq-dev] Important: backward incompatible changes for 0MQ/3.0!
Martin Sustrik
sustrik at 250bpm.com
Mon Apr 4 08:16:27 CEST 2011
Hi Paul,
> The documentation is actually a bit misleading. After
> you call shutdown(s, SHUT_RD) you *can* read, up to the point
> when shutdown was called. It means everything already buffered will
> be read, and you will read until 0 (zero) is returned from the read call.
Which implementation is that? Both the POSIX spec and Stevens seem to
suggest that you can't read after shutdown(s, SHUT_RD).
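For what it's worth, the behaviour is easy to check empirically. Below is a
minimal probe (plain POSIX C; the local data source on port 9000 is just an
assumption for the test) that lets some data get buffered, calls
shutdown(s, SHUT_RD) and then reads until EOF or error, so you can see what
a particular stack actually does:

/* Probe: does read() still drain buffered data after SHUT_RD, or does it
   return 0 immediately?  Assumes something on 127.0.0.1:9000 is sending. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main (void)
{
    int fd = socket (AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset (&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons (9000);                /* hypothetical data source */
    inet_pton (AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect (fd, (struct sockaddr*) &addr, sizeof addr) != 0) {
        perror ("connect");
        return 1;
    }

    sleep (1);                                   /* let some data get buffered */
    shutdown (fd, SHUT_RD);

    char buf [256];
    ssize_t n;
    while ((n = read (fd, buf, sizeof buf)) > 0)
        printf ("read %zd bytes after SHUT_RD\n", n);
    printf ("final read returned %zd\n", n);     /* 0 = EOF, -1 = error */
    close (fd);
    return 0;
}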
>> 2. The handshake with all the peers during the shutdown can take
>> an arbitrarily long time and even cause a deadlock.
>
> Probably yes. It's up to you to use it. Many small applications will
> actually never care. Many huge applications will probably use failure
> resistance to overcome the reconfiguration problem. But there are plenty
> of others where you would stop the entire system when you add, remove
> or replace a node in the configuration if you have no way to
> shut down the socket cleanly. And time is not always very important.
Well, I don't like introducing an API that works well for small apps and
deadlocks for large apps. Let's rather think of something that works
consistently in either scenario.
> Consider the following scenario. You have a node A which
> pushes work to node B along with 10 other nodes. And
> you need to remove node B (probably because you are replacing it
> with C). Node A has the bound socket. Currently you have two
> ways:
>
> * stop producing on A until all the work is consumed by B, then
> disconnect it, connect C and continue. This can take a lot of time,
> during which the other workers are also stopped
> * never mind losing messages and react to them downstream,
> which takes a lot of time to notice (because of the timeout), and
> probably some more time to find the lost messages in the logs at node A.
You have to keep in mind that messages may be lost at any point en route
from A to B. Thus, you can't count on being notified about the loss. The
only way to handle it is a timeout. Btw, even in a simple single-hop
scenario, the TCP spec mandates keep-alives of at least 2 hours. So, if B is
killed brutally (such as when the power is switched off) A won't be notified
for at least 2 hours.
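As an aside, the 2-hour figure is only the default. On platforms that expose
the per-socket knobs, such as Linux's TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT,
an application that owns the raw TCP fd can shorten the schedule considerably.
A sketch, assuming fd is an already-connected TCP socket; note this is plain
TCP-level tuning, not something the 0MQ API exposes today:

/* Shorten the keep-alive probe schedule (Linux-specific socket options). */
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

static int enable_fast_keepalive (int fd)
{
    int on = 1;
    int idle = 30;       /* seconds of idleness before the first probe */
    int interval = 10;   /* seconds between probes */
    int count = 3;       /* unanswered probes before the peer is declared dead */

    if (setsockopt (fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;
    if (setsockopt (fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) != 0)
        return -1;
    if (setsockopt (fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof interval) != 0)
        return -1;
    return setsockopt (fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count);
}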
> With the shutdown function, you shut down the socket at B and consume
> messages until the shutdown message comes (you can probably
> even forward them back to A, to be injected again). It doesn't
> matter if it takes time, because the replacement node can be
> connected immediately, and the other nodes still work.
>
> This is only a single scenario where it's crucial; there are plenty of
> others.
>
> Ah, well, if you care about zmq_close needing to send this
> message, then I would say: if you have an outstanding queue
> of messages, then sending a few more bytes doesn't matter.
> And if you have no queue, then the write call on the socket
> will return immediately and the OS will take care of it, so it wouldn't
> add considerable time to application shutdown.
>
>> This kind of thing is
>> extremely vulnerable to DoS attacks.
>>
>
> Why? Timeouts are also applied to the "last" request served.
> An application is much more vulnerable if it must entirely stop
> one producer to replace one of the consumers (see above). And
> of course if you remove the only (or last) consumer from the
> chain, then it's vulnerable. But with the described semantics you
> can start a new one within a minimal amount of time (I guess
> it's about 100 ms, the time needed for the other side to reconnect).
It's easy. Just never send the sentinel message. That'll block the
resources on the peer forever.
>> 3. Note that the intention here is to improve the reliability rather
>> than make 0MQ "reliable". See the previous email for the detailed
>> discussion. Given that we are trying to provide some heuristic
>> semi-reliable behaviour, it should not change the API in any way.
>>
>
> Why do you want heuristics instead of clean behaviour? Of course it
> will actually not improve reliability against crashes or network failures,
> but it will improve the ability to reconfigure an application on the fly.
Ok, let me explain.
If there's something I've learned about messaging, it's that
reliability is complex and that there's no single magic solution. The
solutions are always domain-specific, in our case
messaging-pattern-specific. So let's have a look at the different patterns
(putting aside PAIR, which is not a real pattern, just a leftover from the
past):
1. Request/reply. In this case the requester can re-send the request after
the timeout has expired. There are a couple of nice properties of this
system: it's fully end-to-end, so you are resilient against middle-node
failures. What you get is actually TCP-level reliability ("as long as
the app is alive it will ultimately get the data through"). The downside
is that in some rare circumstances a message may be processed twice,
which does not really matter as the services are assumed to be stateless.
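To illustrate, here's a rough sketch of the resend loop on the requester
side, using the 2.x C API (note zmq_poll takes microseconds there); the
endpoint and the timeout/retry values are made up. Because a REQ socket
allows only one outstanding request, the sketch discards the socket and
opens a fresh one before re-sending:

#include <string.h>
#include <stdio.h>
#include <zmq.h>

#define ENDPOINT   "tcp://127.0.0.1:5555"   /* hypothetical service */
#define TIMEOUT_US (2500 * 1000L)           /* 2.5 seconds */
#define RETRIES    3

int main (void)
{
    void *ctx = zmq_init (1);
    int got_reply = 0;

    for (int attempt = 0; attempt < RETRIES && !got_reply; attempt++) {
        void *req = zmq_socket (ctx, ZMQ_REQ);
        int linger = 0;                      /* 2.1+: don't block on close */
        zmq_setsockopt (req, ZMQ_LINGER, &linger, sizeof linger);
        zmq_connect (req, ENDPOINT);

        zmq_msg_t request;
        zmq_msg_init_size (&request, 5);
        memcpy (zmq_msg_data (&request), "hello", 5);
        zmq_send (req, &request, 0);
        zmq_msg_close (&request);

        zmq_pollitem_t items [] = {{ req, 0, ZMQ_POLLIN, 0 }};
        if (zmq_poll (items, 1, TIMEOUT_US) > 0 &&
              (items [0].revents & ZMQ_POLLIN)) {
            zmq_msg_t reply;
            zmq_msg_init (&reply);
            zmq_recv (req, &reply, 0);
            printf ("got reply on attempt %d\n", attempt + 1);
            zmq_msg_close (&reply);
            got_reply = 1;
        }
        /* No reply in time: drop this socket and retry with a new one. */
        zmq_close (req);
    }
    zmq_term (ctx);
    return got_reply ? 0 : 1;
}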
2. Publish/subscribe. In this case we assume an infinite feed of messages
that individual subscribers consume. When shutting down, the subscriber
has to cut off the feed at some point. Whether it cuts off immediately,
dropping the messages in flight, or whether it waits for the messages in
flight to be delivered is irrelevant. The only difference is that the point
of cut-off is slightly delayed in the latter case.
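The two variants differ only in whether the subscriber drains what has
already arrived locally before closing. A sketch of the delayed variant
(2.x C API; sub is assumed to be a connected SUB socket):

#include <zmq.h>

/* Before closing, consume whatever is already sitting in the local queue
   instead of dropping it.  ZMQ_NOBLOCK makes recv fail with EAGAIN once
   the queue is empty. */
static void drain_and_close (void *sub)
{
    zmq_msg_t msg;
    for (;;) {
        zmq_msg_init (&msg);
        if (zmq_recv (sub, &msg, ZMQ_NOBLOCK) != 0) {
            zmq_msg_close (&msg);
            break;               /* nothing queued locally any more */
        }
        /* ... process the message ... */
        zmq_msg_close (&msg);
    }
    zmq_close (sub);
}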
3. Pipeline (push/pull). This is the interesting case. The communication
is uni-directional, meaning that we can't time out and resend, while at
the same time we want every message to be delivered. In this case
automatic acks from consumer to producer can improve reliability
considerably. When a consumer disconnects, the unacked messages would be
rescheduled to be delivered to a different peer. That messes with the
message ordering; however, we don't care, as the parallelised pipeline
architecture does not guarantee ordering anyway. NB: this is a
heuristic; it will improve the reliability, but won't make it perfect.
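To make the idea concrete, here's what an application-level approximation
could look like today (just a sketch, not the proposed built-in mechanism):
the producer keeps each pushed message until an ack carrying its id arrives
on a side channel, and re-pushes whatever is still unacked after a deadline.
The 4-byte id prefix, the separate ack socket and the deadline value are all
assumptions:

#include <stdint.h>
#include <string.h>
#include <time.h>
#include <zmq.h>

#define MAX_PENDING  1024
#define ACK_DEADLINE 30          /* seconds before a message is re-pushed */

typedef struct {
    uint32_t id;
    char body [256];
    size_t len;
    time_t sent_at;
    int acked;
} pending_t;

static pending_t pending [MAX_PENDING];
static size_t npending = 0;

/* (Re)send one tracked message: a 4-byte id prefix followed by the body. */
static void send_pending (void *push, pending_t *p)
{
    zmq_msg_t msg;
    zmq_msg_init_size (&msg, 4 + p->len);
    memcpy (zmq_msg_data (&msg), &p->id, 4);
    memcpy ((char*) zmq_msg_data (&msg) + 4, p->body, p->len);
    zmq_send (push, &msg, 0);
    zmq_msg_close (&msg);
    p->sent_at = time (NULL);
}

/* Push a new message and remember it until its ack arrives.  (Sketch only:
   assumes < MAX_PENDING unacked messages and bodies of at most 256 bytes.) */
static void push_tracked (void *push, uint32_t id, const char *body, size_t len)
{
    pending_t *p = &pending [npending++];
    p->id = id;
    memcpy (p->body, body, len);
    p->len = len;
    p->acked = 0;
    send_pending (push, p);
}

/* Drain acks (4-byte ids) from the side channel and mark messages as done. */
static void collect_acks (void *ack_pull)
{
    zmq_msg_t ack;
    for (;;) {
        zmq_msg_init (&ack);
        if (zmq_recv (ack_pull, &ack, ZMQ_NOBLOCK) != 0) {
            zmq_msg_close (&ack);
            break;                       /* EAGAIN: no more acks queued */
        }
        uint32_t id;
        memcpy (&id, zmq_msg_data (&ack), 4);
        for (size_t i = 0; i < npending; i++)
            if (pending [i].id == id)
                pending [i].acked = 1;
        zmq_msg_close (&ack);
    }
}

/* Anything still unacked past the deadline gets pushed again; with several
   connected consumers the PUSH socket may hand it to a different peer. */
static void reschedule_unacked (void *push)
{
    time_t now = time (NULL);
    for (size_t i = 0; i < npending; i++)
        if (!pending [i].acked && now - pending [i].sent_at > ACK_DEADLINE)
            send_pending (push, &pending [i]);
}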
The point is that all of the above can be done with no change to the API.
>> Yes. The reset seems to be a good idea. Use cases:
>>
>> 1. On REQ socket: I am not interested in this reply any more. Cancel the
>> request and start a new one.
>>
>> 2. On REP socket: The request I've got is malformed and possibly
>> malevolent. Drop the request without even responding to the requester.
>
> That would be great news. It means I can stop using XREQ and XREP
> where I want just REQ and REP. (Peter, I don't feel confident closing
> and opening a new socket; it also makes the system do a lot more work
> than needed if it's a bound socket.)
We should give some more thought to how the reset would interact with
half-sent/recvd multipart messages.
Martin