[zeromq-dev] poll loop

Scott scott at quickeye.us
Thu Mar 21 17:04:46 CET 2013


I've got an app which works well. Now I need to make
it fault tolerant.

I have a polling loop where I listen for new workers
and maintain a list of them.

Before I use a worker, I need to verify that it is
still there.

I can perform a simple handshake right before I assign
the worker a job but I need to properly and quickly
detect when this handshake fails.

I can't have a 30 second timeout or an infinite wait
in the event of a fault.

If I am calling zmq_send and zmq_recv on what
I expect to be a really quick back and forth,
can I set a 2 second timeout so I can detect any
of the following?

the connection has dropped, the network is broken,
the worker won't reply right now, or the worker isn't
there anymore.

Then, after my quick handshake is successful,
I set the timeout back to a large value or infinite
and perform my normal zmq_send and resume polling.


Idea: set a socket option to not block, try the send
and retry for 2 seconds if it doesn't complete right
away.

Because zmq sends asynchronously, a successful send
would only mean the message was queued. But if I then
did the same thing on the following zmq_recv call, I
guess that would be an effective timeout.
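
The retry idea above can be sketched in C roughly as
follows. This is only a sketch under assumptions: the
names attempt_fn, retry_until and countdown_attempt are
made up for illustration and are not part of zmq. A real
attempt function would wrap a non-blocking call such as
zmq_send or zmq_recv with the ZMQ_DONTWAIT flag, which
fails with errno set to EAGAIN when it cannot complete
right away.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical callback type (not a zmq type): returns 0 on success,
   or -1 with errno set. errno == EAGAIN means "would block, retry". */
typedef int (*attempt_fn)(void *arg);

/* Monotonic clock reading in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Call attempt(arg) until it succeeds, fails with something other
   than EAGAIN, or budget_ms has elapsed.
   Returns 0 on success, -1 otherwise; on timeout errno is ETIMEDOUT. */
static int retry_until(attempt_fn attempt, void *arg, long budget_ms)
{
    assert(budget_ms >= 0);
    const long long deadline = now_ms() + budget_ms;
    const struct timespec pause = { 0, 1000000 };  /* 1 ms between tries */

    for (;;) {
        if (attempt(arg) == 0)
            return 0;
        if (errno != EAGAIN)
            return -1;              /* hard failure: do not retry */
        if (now_ms() >= deadline) {
            errno = ETIMEDOUT;
            return -1;
        }
        nanosleep(&pause, NULL);
    }
}

/* Stand-in attempt for demonstration only: arg points to a countdown,
   and the call reports EAGAIN until the countdown reaches zero. */
static int countdown_attempt(void *arg)
{
    int *remaining = arg;
    if (*remaining > 0) {
        --*remaining;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

Running the send and then the recv through retry_until
with a 2000 ms budget each would bound the handshake,
at the cost of polling once per millisecond while a
fault lasts.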

As long as I can properly kill that connection so
that if the worker comes back it will detect that
it needs to re-connect, this may work.

I don't mind busy looping in the event of a fault
(which should be exceedingly rare). I just need to
find a way to verify and recover from faults, without
halting production for more than two seconds.

Any opinions? This seems to be a weakness of zmq:
it is totally geared toward the normal case, and we
have to jump through major hoops to handle problems.


For example, it may help my cause to be able to
get the number of outstanding messages on a
socket. This would help me to detect when there's
a problem without blindly sending more messages
that won't send, making the backlog worse.

Then, when my state for a worker says that it's
idle, and I send a message and busy loop for
two seconds waiting for 'Outstanding messages'
to reach zero, this could be more graceful than
calling zmq_recv. Also, for load balancing
in other circumstances, this could be a great
feature.
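
Lacking such a socket-level count, something similar
can be approximated in the application. A minimal
sketch in C, assuming the app records every request it
sends to a worker and every reply it gets back. The
worker_state type and the note_sent, note_reply,
worker_idle and worker_overdue helpers are hypothetical
names and not part of zmq.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-worker record kept by the application. */
typedef struct {
    int       outstanding;  /* requests sent but not yet answered */
    long long oldest_ms;    /* start of the current wait, in ms */
} worker_state;

/* Record that a request was handed to the socket for this worker. */
static void note_sent(worker_state *w, long long now_ms)
{
    if (w->outstanding == 0)
        w->oldest_ms = now_ms;  /* first unanswered request starts the clock */
    w->outstanding++;
}

/* Record that a reply from this worker arrived. */
static void note_reply(worker_state *w, long long now_ms)
{
    assert(w->outstanding > 0);
    w->outstanding--;
    /* Approximation: restart the clock for whatever is still pending. */
    w->oldest_ms = now_ms;
}

/* True when nothing is pending for this worker. */
static bool worker_idle(const worker_state *w)
{
    return w->outstanding == 0;
}

/* True when something has been pending for longer than limit_ms. */
static bool worker_overdue(const worker_state *w, long long now_ms,
                           long long limit_ms)
{
    return w->outstanding > 0 && now_ms - w->oldest_ms > limit_ms;
}
```

With a 2000 ms limit, the poll loop could check
worker_overdue on each pass and stop assigning jobs to
a worker that has gone quiet, instead of queueing more
messages behind the stuck one.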

     scott
