[zeromq-dev] Handling disconnections; was: Questions about socket errors

Brian Granger ellisonbg at gmail.com
Thu Feb 11 18:45:35 CET 2010


Thanks, replies inline below...

> Exactly. Specifying the message type determines the algorithm used to handle
> broken connections. In the case of PUB/SUB, messages are simply dropped once
> the queue overflows. In the case of REQ/REP, inaccessible connections are
> skipped. Once no connection is accessible and the queue limits are reached,
> the send function will block.

OK, this clarifies the behavior for the different types of sockets.
But isn't it a bit dangerous to have send block when the queue fills
up? I would think there needs to be some way for the application logic
to learn about this and remedy the situation.
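To make the two overflow behaviors described above concrete, here is a minimal Python sketch of a bounded outbound queue. The class `OutboundQueue`, its `hwm` (high-water mark) parameter and the `"drop"`/`"report"` policies are hypothetical names for illustration, not the 0MQ API. `"drop"` mimics the PUB/SUB behavior (excess messages vanish and the sender is not told), while `"report"` returns `False` instead of blocking, which is one possible way for the application to learn that the queue is full.

```python
from collections import deque


class OutboundQueue:
    """Hypothetical bounded send queue; NOT the 0MQ API.

    policy="drop":   mimics PUB/SUB; once the queue is full, new messages
                     are discarded and send() still reports success.
    policy="report": instead of blocking, send() returns False so the
                     caller can decide how to remedy the situation.
    """

    def __init__(self, hwm, policy):
        if policy not in ("drop", "report"):
            raise ValueError("policy must be 'drop' or 'report'")
        self.hwm = hwm            # high-water mark: max queued messages
        self.policy = policy
        self.queue = deque()
        self.dropped = 0          # messages silently discarded ("drop")

    def send(self, msg):
        if len(self.queue) < self.hwm:
            self.queue.append(msg)
            return True
        if self.policy == "drop":
            self.dropped += 1     # the sender is never told
            return True
        return False              # "report": queue full, caller must react

    def drain(self, n):
        """Simulate the peer consuming up to n queued messages."""
        out = []
        while self.queue and len(out) < n:
            out.append(self.queue.popleft())
        return out


if __name__ == "__main__":
    pub = OutboundQueue(hwm=2, policy="drop")
    print([pub.send(i) for i in range(4)], pub.dropped)  # [True, True, True, True] 2

    q = OutboundQueue(hwm=2, policy="report")
    print([q.send(i) for i in range(3)])                 # [True, True, False]
```

With `"report"`, the application could back off, buffer elsewhere or raise an alarm; the sketch deliberately leaves that choice to the caller.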

>> I would imagine that a socket type that distributes messages round-robin
>> to a set of endpoints would just skip any endpoint that disconnects? What
>> about reply/request queues or multicast?
> REQ/REP doesn't work over multicast right now. By the way, I haven't seen a
> compelling use case for that functionality. If you have one, please do
> share it.

Sorry, I didn't mean REQ/REP over multicast.  I don't currently see a
use case for this.

> 1. It should be made clear what 'disconnection' means. At the networking
> level there are no disconnections. There are only packets either getting
> through or not getting through. Disconnection can mean various things:
> a.) I've sent a packet and haven't got an ACK for N seconds.
> b.) I've sent a message and haven't got an ACK for N seconds.
> c.) I've sent a message and the peer application hasn't acknowledged that
> it has processed the transaction for N seconds.
> d.) No data has been received from the peer for N seconds (heartbeats).
> etc.
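Definition (d) is the simplest to sketch. Below, a hypothetical `HeartbeatMonitor` treats any inbound traffic as proof of life and reports peers that have been silent for longer than `timeout` seconds. Timestamps are passed in explicitly to keep the sketch deterministic; a real caller would pass something like `time.monotonic()`. None of these names come from 0MQ.

```python
class HeartbeatMonitor:
    """Hypothetical liveness tracker for definition (d): a peer counts as
    disconnected once nothing has been received from it for `timeout` s."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}       # peer id -> time data was last received

    def on_receive(self, peer, now):
        # Any inbound message, data or explicit heartbeat, proves liveness.
        self.last_seen[peer] = now

    def dead_peers(self, now):
        """Return the sorted ids of peers silent for more than `timeout`."""
        return sorted(peer for peer, seen in self.last_seen.items()
                      if now - seen > self.timeout)


if __name__ == "__main__":
    mon = HeartbeatMonitor(timeout=5.0)
    mon.on_receive("A", now=0.0)
    mon.on_receive("B", now=3.0)
    print(mon.dead_peers(now=6.0))   # ['A']
```

This only detects silence: it cannot distinguish a crashed peer from a slow or congested path, which is why the choice of definition matters.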

I am thinking more at the level of TCP, where the various socket calls
can return a range of error codes that indicate something went wrong
with the connection. Obviously zeromq is handling those error codes
underneath it all.

> 2. When should the disconnection notification be delivered?
> a.) Immediately when it happens.
> b.) It should be stored and delivered on next 0mq function call.
> c.) It should be placed into the queue and delivered just after the last
> message we received before the disconnection.
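Option (c) can be sketched as an in-band marker: the notification is queued behind the messages already received from the peer, so the application reads them in their original order. `InboundQueue` and `Disconnected` are hypothetical names used only for illustration.

```python
from collections import deque


class Disconnected:
    """Hypothetical in-band disconnection marker for option (c)."""

    def __init__(self, peer):
        self.peer = peer

    def __repr__(self):
        return f"Disconnected({self.peer!r})"


class InboundQueue:
    def __init__(self):
        self.items = deque()

    def deliver(self, msg):
        self.items.append(msg)

    def connection_lost(self, peer):
        # The marker is ordered after every message already received,
        # so the application sees those messages first.
        self.items.append(Disconnected(peer))

    def recv(self):
        return self.items.popleft()


if __name__ == "__main__":
    q = InboundQueue()
    q.deliver("m1")
    q.deliver("m2")
    q.connection_lost("A")
    print([q.recv() for _ in range(3)])   # ['m1', 'm2', Disconnected('A')]
```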

Another approach, other than "notification", would be to provide a set
of functions for querying and manipulating the state of a queue. If
application logic could see how many messages are queued and how long
they have been queued, it could adjust how things are being handled.

For example, if my application saw that messages with topic "foo" were
not being recv'd by anyone, it could handle that situation. As it
currently stands, the application doesn't really have any way of
handling these types of things.
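A minimal sketch of the kind of introspection proposed above, assuming the queue records when each message was enqueued. `InspectableQueue`, `stats()` and `stale_topics()` are hypothetical; nothing like them is claimed to exist in 0MQ. The application-side policy (flagging topics whose oldest message is too old) is kept separate from the queue itself.

```python
from collections import deque


class InspectableQueue:
    """Hypothetical queue exposing per-topic backlog.
    Entries are (topic, enqueue_time, payload)."""

    def __init__(self):
        self.pending = deque()

    def put(self, topic, payload, now):
        self.pending.append((topic, now, payload))

    def take(self):
        """Remove and return the oldest entry (a receiver consumed it)."""
        return self.pending.popleft()

    def stats(self, now):
        """Return {topic: (queued_count, age_of_oldest_in_seconds)}."""
        result = {}
        for topic, enqueued, _ in self.pending:
            count, age = result.get(topic, (0, 0.0))
            result[topic] = (count + 1, max(age, now - enqueued))
        return result


def stale_topics(queue, now, max_age):
    """Application-side policy: topics whose oldest message is too old."""
    return sorted(topic for topic, (_, age) in queue.stats(now).items()
                  if age > max_age)


if __name__ == "__main__":
    q = InspectableQueue()
    q.put("foo", b"x", now=0.0)
    q.put("bar", b"y", now=8.0)
    print(stale_topics(q, now=10.0, max_age=5.0))   # ['foo']
```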

> 3. Each 0MQ socket handles N "connections". Supposing the connections are
> anonymous, the disconnection notification would simply state "one of the
> connections was broken", which is not of much use aside from keeping track
> of the number of open connections. What's the use case here?

I am coming from more of an RPC style of thinking, so thinking in terms
of messages is different for me. In an RPC context, it is typically
perfectly clear which connection was broken and when. It probably
doesn't make sense to track individual connections being broken.

> 4. With multicast transports, sender is not even aware of all the receivers
> (though receiver is aware of the senders) and thus it is certainly not aware
> of receiver "disconnections". How does this fit into a bigger picture?

> 5. If there's a middlebox on the path from sender to receiver (say
> zmq_forwarder), this way A->B->C, when does the disconnection have to be
> reported to A? When the A-B connection breaks? What about a B-C
> disconnection? It prevents passage of messages in the same way as an A-B
> disconnection does. How should the event be passed back to A?
> Overall, my feeling is that disconnection notifications are an inherently
> flawed concept (please do argue the point).

I need to think about this more, but I do agree that it is difficult to
see how an actual notification mechanism would work in a messaging
context. However, I think I have some use cases that are not covered
fully by the current design. I will post a new thread describing some
of these use cases.

Cheers and thanks for the discussion,


> What's needed instead is
> an ACK mechanism, moving the responsibility for message transfer between
> nodes on the path, dead letter queues etc. That brings us to the reliability
> topic: Can message delivery be acknowledged if the next node stored it in
> memory? Or should it be stored persistently, so that it survives power
> failures? Or should it be replicated on multiple boxes to survive HD
> failure? Or should we wait for ACK from the peer application? Should the
> application send the ACK itself once it processed the business transaction
> associated with the message? Should it do so within a DB-like transaction?
> Should we support XA distributed transactions so that sending ACK will
> happen atomically with committing the transaction results into the database?
> Etc.
> Martin
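A minimal sketch of the sender-side bookkeeping such an ACK mechanism might need, assuming in-memory storage only; none of the persistence, replication or XA questions raised above are addressed. `AckTracker`, its `timeout` and `max_attempts` parameters and the `dead_letters` list are hypothetical, and the caller is expected to actually resend whatever `due_for_resend()` returns.

```python
class AckTracker:
    """Hypothetical in-memory ACK bookkeeping; not a 0MQ feature.
    Unacknowledged messages are resent after `timeout` seconds and moved
    to a dead-letter list once they have been sent `max_attempts` times."""

    def __init__(self, timeout, max_attempts):
        self.timeout = timeout
        self.max_attempts = max_attempts
        self.unacked = {}         # msg_id -> [payload, attempts, last_sent]
        self.dead_letters = []    # (msg_id, payload) pairs given up on

    def sent(self, msg_id, payload, now):
        self.unacked[msg_id] = [payload, 1, now]

    def ack(self, msg_id):
        # Unknown or duplicate ACKs are ignored.
        self.unacked.pop(msg_id, None)

    def due_for_resend(self, now):
        """Return ids the caller should resend now; expire exhausted ones."""
        resend = []
        for msg_id in list(self.unacked):
            payload, attempts, last_sent = self.unacked[msg_id]
            if now - last_sent < self.timeout:
                continue          # still waiting for an ACK
            if attempts >= self.max_attempts:
                self.dead_letters.append((msg_id, payload))
                del self.unacked[msg_id]
            else:
                # Record the resend the caller is about to perform.
                self.unacked[msg_id] = [payload, attempts + 1, now]
                resend.append(msg_id)
        return resend


if __name__ == "__main__":
    t = AckTracker(timeout=1.0, max_attempts=2)
    t.sent(1, "a", now=0.0)
    t.sent(2, "b", now=0.0)
    t.ack(2)
    print(t.due_for_resend(now=1.0))                   # [1]
    print(t.due_for_resend(now=2.0), t.dead_letters)   # [] [(1, 'a')]
```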

Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu
ellisonbg at gmail.com
