[zeromq-dev] Important: backward incompatible changes for 0MQ/3.0!
Paul Colomiets
tailhook at yandex.ru
Sun Apr 3 15:22:54 CEST 2011
Hi Martin,
03.04.2011, 11:58, "Martin Sustrik" <sustrik at 250bpm.com>:
> Hi Paul,
>
>>> I would say the question is how can we improve reliability of 0mq (NB:
>>> not make it perfect, just improve it) without dragging all this madness in.
>> That was exactly my intention. Maybe I wasn't clear about that. I'm thinking
>> about an API similar to POSIX shutdown. First we call:
>>
>> zmq_shutdown(sock, SHUT_RD)
>
> Ok. Couple of points:
>
> 1. Your proposal doesn't map to POSIX shutdown semantics. POSIX shutdown
> is a non-blocking operation, ie. it initiates a half-close and returns
> immediately. No more messages can be read/written.
>
Well, I don't understand your point. The 0MQ shutdown must also be
non-blocking. The POSIX documentation is actually a bit misleading. After
you call shutdown(s, SHUT_RD) you *can* still read, up to the point
when shutdown was called. Everything already buffered will
be read, and you keep reading until the read call returns 0 (zero).
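For reference, the POSIX pattern the analogy is drawn from would look roughly
like this. A minimal sketch only; exactly how much already-buffered data stays
readable after SHUT_RD varies a bit between platforms:

/* Minimal sketch of the POSIX half-close pattern used as the analogy.
 * shutdown() initiates the half-close and returns immediately; data
 * already buffered can (on most systems) still be read until read()
 * returns 0. */
#include <sys/socket.h>
#include <unistd.h>

static void drain_after_shutdown (int fd)
{
    char buf [4096];
    ssize_t n;

    shutdown (fd, SHUT_RD);        /* non-blocking; no new data accepted */

    while ((n = read (fd, buf, sizeof buf)) > 0) {
        /* process whatever was already buffered ... */
    }
    /* n == 0: end of stream, nothing more to read */
}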
> 2. The handshake with all the peers during the shutdown can take
> arbitrary long time and even cause a deadlock.
Probably yes. It's up to you whether to use it. Many small applications will
actually never care. Many huge applications will probably use failure
resilience to overcome the reconfiguration problem. But there are plenty
of others where you would have to stop the entire system when you add, remove
or replace a node in the configuration, if you have no way to
shut down the socket cleanly. And time is not always very important.
Consider the following scenario. You have a node A which
pushes work to node B along with 10 other nodes, and
you need to remove node B (probably to replace it
with C). Node A has the bound socket. Currently you have two
options:
* stop producing on A until all the work is consumed by B, then
disconnect it, connect C and continue. This can take a lot of time,
during which the other workers are also stopped;
* accept losing messages and react to them downstream,
which takes a lot of time to notice (because of the timeout), and
probably some more time to find the lost messages in the logs at node A.
With a shutdown function, you shut down the socket at B and consume
messages until the shutdown message arrives (you could probably
even forward them back to A to be injected again). It doesn't
matter if this takes time, because the replacement node can be
connected immediately, and the other nodes keep working.
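A rough sketch of what the drain at node B could look like, assuming the
proposed zmq_shutdown() existed and that a receive call signalled the shutdown
point by failing with ETERM. Both the function and that error convention are my
assumptions, not an existing API, and I'm using the buffer-style recv call just
for illustration:

#include <zmq.h>
#include <sys/socket.h>   /* for SHUT_RD, reused by the proposal */
#include <errno.h>
#include <stdio.h>

static void retire_worker (void *sock)
{
    char buf [1024];

    /* Proposed call: non-blocking half-close of the receiving side. */
    zmq_shutdown (sock, SHUT_RD);

    while (1) {
        int n = zmq_recv (sock, buf, sizeof buf, 0);
        if (n == -1 && errno == ETERM)
            break;                    /* assumed "shutdown point reached" signal */
        if (n == -1) {
            perror ("zmq_recv");
            break;
        }
        /* process (or forward back to A) the already-queued work ... */
    }

    zmq_close (sock);                 /* node C can already be connected at A */
}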
This is only a single scenario where it's crucial; there are plenty of
others.
Ah, well, if you are worried about zmq_close needing to send this
message: if you have an outstanding queue
of messages, then sending a few more
bytes doesn't matter. And if you have no queue, then the write call on the socket
will return immediately and the OS will take care of it, so it wouldn't
add considerable time to application shutdown.
> This kind of thing is
> extremely vulnerable to DoS attacks.
>
Why? Timeouts also apply to the "last" request served.
An application is much more vulnerable if it must entirely stop
one producer to replace one of its consumers (see above). And
of course, if you remove the only (or last) consumer from the
chain, then it's vulnerable. But with the described semantics you
can start a new one within a minimal amount of time (I guess
about 100 ms, the time needed for the other side to reconnect).
> 3. Note that the intention here is to improve the reliability rather
> than make 0MQ "reliable". See previous email for the detailed
> discussion. Given the case is that we are trying to provide some
> heuristic semi-reliable behaviour, it should not change the API in any way.
>
Why do you want heuristics instead of clean behaviour? Of course it
will not actually improve reliability against crashes or network failures;
it will improve the ability to reconfigure an application on the fly.
Maybe a socket option would work instead of an entirely new function, if
that is your major concern.
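If it went the socket-option route instead, usage could be as small as this.
ZMQ_CLEAN_SHUTDOWN is a name I'm making up purely to illustrate the idea; it
does not exist in libzmq:

#include <zmq.h>

/* Purely illustrative: a made-up option asking the library to emit the
 * shutdown sentinel and drain cleanly when the socket is closed. */
#define ZMQ_CLEAN_SHUTDOWN 9999       /* placeholder value, not a real option */

static void enable_clean_shutdown (void *sock)
{
    int on = 1;
    zmq_setsockopt (sock, ZMQ_CLEAN_SHUTDOWN, &on, sizeof on);
}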
>> Probably when we add sentinel messages we can make PUB/SUB more
>> reliable. When the connection from a publisher is closed unexpectedly we
>> can send the application an EIO error (or whatever we choose). For tcp we know
>> when the connection is broken; for ipc it is broken only on application crash,
>> and we also know it; for pgm we have the retry timeout. We also have to
>> inject this kind of message when the queue is full and we lose some
>> messages. This way you don't need to count messages to know when
>> to die if the message stream is broken (and don't need to duplicate complex
>> bookkeeping when there are several publishers). For devices it's up
>> to the application to decide what to do with the error. It has to forward it as some
>> application-specific message if it needs to.
>
> The problem here is that PUB/SUB allows for multiple publishers. Thus
> numbering the messages wouldn't help. The real solution AFAICS is
> breaking the pub/sub pattern into two distinct patterns: true pub/sub
> with a single stream of messages (numbering makes sense here) and
> "aggregation" where streams from multiple publishers are aggregated as
> they are forwarded towards the consumer (no point in numbering).
>
And again, you can do nothing about reliability for the aggregation pattern.
Actually, numbering works for aggregation too; you just need to send
(publisher_name, index) pairs. But it doesn't work when you filter messages.
And if you know when the buffer has overflowed and when the connection closed
uncleanly, you can implement the "Suicidal Snail" pattern without this
complication.
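On the subscriber side, the (publisher_name, index) bookkeeping could be as
simple as keeping the last index seen per publisher and flagging any jump. A
minimal sketch; the fixed-size table and the idea that each message carries its
publisher's name plus a 64-bit index are just my assumptions for illustration:

#include <stdint.h>
#include <string.h>

#define MAX_PUBLISHERS 16

struct pub_state {
    char     name [64];
    uint64_t last_index;
    int      used;
};

static struct pub_state table [MAX_PUBLISHERS];

/* Record a (publisher_name, index) pair; return how many messages from
 * this publisher appear to have been missed (0 if none). */
static uint64_t note_message (const char *name, uint64_t index)
{
    int i;
    for (i = 0; i != MAX_PUBLISHERS; i++) {
        if (table [i].used && strcmp (table [i].name, name) == 0) {
            uint64_t missed = index > table [i].last_index
                ? index - table [i].last_index - 1 : 0;
            table [i].last_index = index;
            return missed;
        }
        if (!table [i].used) {
            table [i].used = 1;
            strncpy (table [i].name, name, sizeof table [i].name - 1);
            table [i].last_index = index;
            return 0;                 /* first message from this publisher */
        }
    }
    return 0;                         /* table full; ignored in this sketch */
}

A "Suicidal Snail" subscriber would call this on every delivery and kill itself
(or resync) as soon as the missed count, or an explicit overflow/close
sentinel, shows the stream is broken.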
>
> Yes. The reset seems to be a good idea. Use cases:
>
> 1. On REQ socket: I am not interested in this reply any more. Cancel the
> request and start a new one.
>
> 2. On REP socket: The request I've got is malformed and possibly
> malevolent. Drop the request without even responding to the requester.
That would be great news. It means I can stop using XREQ and XREP
where I really want just REQ and REP. (Peter, I don't feel confident closing
and opening a new socket; it also makes the system do a lot more work
than needed if it's a bound socket.)
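The REQ-side use case would presumably look something like this. zmq_reset() is
the proposed call from this thread, not an existing function, and I'm using the
buffer-style send/recv for brevity:

#include <zmq.h>
#include <string.h>

/* Give up on a reply that never arrives and reissue the request on the
 * same REQ socket.  zmq_reset () is the proposal under discussion. */
static int request_with_retry (void *req, const char *payload)
{
    char reply [256];
    zmq_pollitem_t item = { req, 0, ZMQ_POLLIN, 0 };

    if (zmq_send (req, payload, strlen (payload), 0) == -1)
        return -1;

    /* Wait up to one second for the reply (timeout units depend on the
     * libzmq version). */
    if (zmq_poll (&item, 1, 1000) == 0) {
        zmq_reset (req);              /* proposed: forget the pending reply */
        return zmq_send (req, payload, strlen (payload), 0);
    }
    return zmq_recv (req, reply, sizeof reply, 0);
}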
--
Paul