[zeromq-dev] help/advice for using XREP pattern when running many FSM/co-routine actors in a single thread.

Chuck Remes cremes.devlist at mac.com
Sun Sep 25 17:44:05 CEST 2011


On Sep 25, 2011, at 5:14 AM, matthew riek wrote:

> Hi,
>  
> I have a server that I would dearly like to move from boost::asio to 0mq.  The server manages hundreds of client connections and has complex asynchronous statefulness for each connected client.  To avoid running hundreds to thousands of threads in this server, each client connection has a FSM instance per client request, which all run in a single thread - so it is very much unlike the actor-per-thread pattern which is well supported by 0mq.  I guess these FSM instances may best be likened to co-routines.
>  
> Anyway, I would love help with the following questions - I have looked at the API and played with zmq for a few days now so I hope I'm not missing the obvious.
>  
> 1) In this context, I can't readily use zmq_poll to "wake up" or schedule a client FSM instance when there is a message to read for that client.  zmq_poll works wonderfully for device-based threads, as exemplified in all the documentation, but in this context I can't see how I can refactor the FSM manager to block on zmq_poll.  Currently all asynchronous events that can "wake up" or schedule an FSM to run are placed in a prioritized thread-safe queue.  Some kind of callback from the zmq API to notify that a read can be performed on the XREP/ROUTER socket would be ideal!  Is such a setup possible with the current API?

I'm not sure I follow this explanation. I also don't know how much you understand zmq_poll(), so forgive me if I get a bit pedantic here.

If you have one 0mq socket per FSM, then you would create a zmq_pollitem_t for it and set the appropriate flags (ZMQ_POLLIN/ZMQ_POLLOUT) in its events field. The array of pollitems is passed to zmq_poll(), which returns an integer indicating how many items in the array have an event triggered. You need to iterate the pollitems array and examine the revents field of each item to see if it has (in your case) ZMQ_POLLIN set. If so, there is a message to be read which can be handed directly to your FSM. This is the typical reactor pattern.
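
A rough sketch of that reactor loop, in case it helps. To keep it runnable without libzmq, it uses Python's stdlib selectors module and socket.socketpair() as stand-ins for zmq_poll() and 0mq sockets, so it is not 0mq code. RecordingFsm, poll_once and demo are hypothetical names made up for the illustration.

```python
import selectors
import socket


class RecordingFsm:
    """Hypothetical FSM that just records every message handed to it."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def on_message(self, msg):
        self.received.append(msg)


def poll_once(selector, timeout=1.0):
    """One reactor pass: wait for readable sockets, hand each message to its FSM."""
    handled = 0
    for key, events in selector.select(timeout):
        if events & selectors.EVENT_READ:  # plays the role of ZMQ_POLLIN in revents
            msg = key.fileobj.recv(4096)
            key.data.on_message(msg)  # key.data is the FSM registered with this socket
            handled += 1
    return handled


def demo():
    selector = selectors.DefaultSelector()
    fsms, pairs = {}, []
    for name in ("a", "b"):
        server_side, client_side = socket.socketpair()
        fsms[name] = RecordingFsm(name)
        # One poll entry per FSM, like one zmq_pollitem_t per socket.
        selector.register(server_side, selectors.EVENT_READ, data=fsms[name])
        pairs.append((server_side, client_side))
    pairs[1][1].send(b"hello b")  # only client b sends anything
    handled = poll_once(selector)
    selector.close()
    for server_side, client_side in pairs:
        server_side.close()
        client_side.close()
    return handled, fsms["a"].received, fsms["b"].received
```

Calling demo() runs one pass of the loop with two FSMs, where only the second one has a message waiting.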

If you have a single 0mq socket mapped to several FSMs, then clearly you are responsible for dispatching the message to the appropriate FSM based upon some data within the delivered message (e.g. fsm_destination_id). I would assume the message would need to be decoded sufficiently to discover this application-level destination_id. In this situation, your application code is responsible for "waking up" the FSM and handing the correct message to it.
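
Something like the following could do that dispatching. It is only a sketch: it assumes each message has already been received as a list of two frames, [fsm_destination_id, payload], which is a made-up layout, and Dispatcher and RecordingFsm are hypothetical names. No 0mq calls are involved.

```python
class RecordingFsm:
    """Hypothetical FSM that just records the payloads it is handed."""

    def __init__(self):
        self.received = []

    def on_message(self, payload):
        self.received.append(payload)


class Dispatcher:
    """Routes messages from one shared socket to the FSM named in the message."""

    def __init__(self):
        self._fsms = {}

    def register(self, fsm_id, fsm):
        self._fsms[fsm_id] = fsm

    def dispatch(self, frames):
        """Assumed (hypothetical) layout: frames == [fsm_destination_id, payload]."""
        if len(frames) != 2:
            return False  # malformed message
        fsm_id, payload = frames
        fsm = self._fsms.get(fsm_id)
        if fsm is None:
            return False  # unknown destination; the caller decides what to do
        fsm.on_message(payload)  # this is the "wake up" step
        return True
```

For example, after dispatcher.register(b"fsm-1", fsm), a call to dispatcher.dispatch([b"fsm-1", b"query"]) hands b"query" to that FSM, and a message for an unregistered id is reported back to the caller as False.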

> Things are a little more difficult than this though... It is highly desirable for a client to have a full read queue (on the server), and not desire callback notification until the server FSM(s) for that client are ready to perform a read (they may be busy performing and write queueing a time intensive complex query). 

If you are using the reactor pattern, I would suggest removing the socket from the pollitems array while your FSM is processing its data. Further calls to zmq_poll() will not find any new messages for this FSM since the socket is no longer part of the poll set. When the FSM completes its work, add the socket back to the pollitems array. Subsequent calls to zmq_poll() can then find new messages for this FSM.
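
Here is a small sketch of the remove/re-add idea. Again it uses Python's stdlib selectors and socket.socketpair() as stand-ins for zmq_poll() and 0mq sockets (with the C API you would rebuild the pollitems array instead of calling unregister/register). The names ready_names and demo and the "fsm-a"/"fsm-b" labels are hypothetical.

```python
import selectors
import socket


def ready_names(selector, timeout):
    """Return the labels of the FSMs whose sockets are currently readable."""
    return sorted(key.data for key, _ in selector.select(timeout))


def demo():
    selector = selectors.DefaultSelector()
    a_server, a_client = socket.socketpair()
    b_server, b_client = socket.socketpair()
    selector.register(a_server, selectors.EVENT_READ, data="fsm-a")
    selector.register(b_server, selectors.EVENT_READ, data="fsm-b")

    # fsm-a starts a long job: take its socket out of the poll set.
    selector.unregister(a_server)
    a_client.send(b"for a")  # stays queued on the socket while fsm-a is busy
    b_client.send(b"for b")
    while_busy = ready_names(selector, timeout=1.0)

    b_server.recv(4096)  # fsm-b consumes its message

    # fsm-a finishes: put its socket back so the queued message is found.
    selector.register(a_server, selectors.EVENT_READ, data="fsm-a")
    after_resume = ready_names(selector, timeout=1.0)
    queued = a_server.recv(4096)

    selector.close()
    for sock in (a_server, a_client, b_server, b_client):
        sock.close()
    return while_busy, after_resume, queued
```

While fsm-a is out of the poll set only fsm-b is reported, and the message sent to fsm-a in the meantime is still waiting once its socket is added back.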

If you have one 0mq socket mapping messages to multiple FSMs, then the zmq_poll() trick described above isn't appropriate since it would block reception for messages to all other FSMs on that socket. In this case you would need another queue to push the message onto. When the FSM completes, it would check this queue for more work.
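
A minimal sketch of that extra queue, assuming one pending queue per FSM and a complete() call that your code makes when the FSM finishes its current job. QueuedFsm and its methods are hypothetical names, and no 0mq calls are involved.

```python
from collections import deque


class QueuedFsm:
    """Hypothetical FSM that parks incoming messages while it is busy."""

    def __init__(self):
        self.busy = False
        self.pending = deque()  # messages that arrived while busy
        self.handled = []       # messages the FSM has started working on

    def deliver(self, msg):
        """Called by the dispatcher for every message addressed to this FSM."""
        if self.busy:
            self.pending.append(msg)
        else:
            self._start(msg)

    def _start(self, msg):
        self.busy = True
        self.handled.append(msg)

    def complete(self):
        """Called when the current job finishes; picks up queued work, if any."""
        self.busy = False
        if self.pending:
            self._start(self.pending.popleft())
```

The shared socket keeps being read for every FSM, and a busy FSM simply accumulates its messages in pending until complete() is called.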

> 3) dropped/closed/connected client notification callback would also be ideal.  I have seen some comments about this possibly coming for 4.0?  Thought I would just state that it is highly desirable in this instance, to help with justification for this change to the API.

These notifications will be available with the new ZMQ_ROUTER socket, which is *completely* different from the current ZMQ_ROUTER socket even though it has the same name. (No, this won't cause any confusion as people migrate... :-P)

You can check out the current master branch of libzmq which represents the work being done on 4.0. I imagine the core dev(s) would appreciate your feedback on API and behavior. Perhaps this is a good opportunity for you to whip up a small proof of concept.

cr



