[zeromq-dev] process control revisited

Andrew Hume andrew at research.att.com
Sat Aug 6 21:09:20 CEST 2011

	previously on this list, i have described how i did process control
across a flotilla of processes across several servers connected via 0MQ.
basically, i caused upstream processes to send no-ops rather than regular data.
when everyone was processing no-ops, processes could be brought up and down
without losing data, and when the configuration was stable, real data
started flowing and processes would indicate they were alive and active.
(several details elided for brevity.)

	as the overall design has morphed over the last several months,
this scheme has proved unacceptable for several reasons, the principal
ones being:
	+ the data flow now has loops
	+ it proved unworkable for processes to guess how many no-ops
to send (nontrivial if you don't know how many processes have been forked)
	+ if a process died, then the flow of no-ops was interrupted.
	+ as every process potentially had to know where most of the other
processes were (for routing messages), this didn't smell scalable.
	+ the nuisance of having to specify an alarming number of tcp-port addresses.

	accordingly, i have a new scheme, which i describe below and on which
i would welcome any comments.

	each server is now a black box. all data flow into a server goes through
one of 2-3 portal processes (think 0MQ device), and there is a global config
specifying how to map the key field for each of these data flow types into
a server. thus, any process needing to send a datum d with key k, can simply
look up how to map k into a server name and the port number for the
portal for d on that server.
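
	a minimal sketch of that lookup, in python. everything named here is
hypothetical: the server names, flow names, port numbers, and the choice of
a crc32 hash modulo the server count are assumptions for illustration, not
the actual waldo config or mapping rule (which may differ per data flow type).

```python
import zlib

# Hypothetical global config. Server names, flow names, and port numbers
# are invented for illustration only.
CONFIG = {
    "servers": ["srv-a", "srv-b", "srv-c"],
    # one portal port per data flow type, the same on every server
    "portals": {"records": 5551, "alerts": 5552},
}

def route(flow, key, config=CONFIG):
    """Return the tcp endpoint of the portal handling `flow` for `key`."""
    servers = config["servers"]
    # crc32 is stable across processes and runs, unlike Python's built-in
    # hash() on strings, so every sender agrees on the mapping.
    index = zlib.crc32(key.encode("utf-8")) % len(servers)
    port = config["portals"][flow]
    return "tcp://%s:%d" % (servers[index], port)
```

	a sender only needs the shared config and the key: route("records", k)
yields the address to connect to, with no knowledge of what runs behind the portal.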

	control is simpler. config changes are done by pausing the portal processes
and simply waiting for all the internal processes to indicate they are idle
(for example, no data received since the last heartbeat). all the processes
internal to a server will have small queues, so queue lengths shouldn't be an issue.
once a server has an idle status, it can either reconfigure internally or be reconfigured
externally (for example, be taken out of action).
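
	the pause-then-wait-for-idle rule can be sketched as a small bookkeeping
class. this is an assumption about how it might be tracked, not the actual
implementation: the class name, method names, and the idea that each heartbeat
carries a count of data received since the previous one are all hypothetical.

```python
class IdleTracker:
    """Per-server record of which internal processes have reported idle."""

    def __init__(self, process_names):
        # Until a process reports, assume it is busy.
        self.idle = {name: False for name in process_names}
        self.paused = False

    def pause(self):
        """Called once the portals have stopped admitting data."""
        self.paused = True
        # Reports made before the pause say nothing about data that was
        # still in flight, so discard them and wait for fresh heartbeats.
        for name in self.idle:
            self.idle[name] = False

    def heartbeat(self, name, received_since_last):
        """Record a heartbeat; idle means no data since the last one."""
        self.idle[name] = (received_since_last == 0)

    def server_idle(self):
        """True only when paused and every process last reported idle."""
        return self.paused and all(self.idle.values())
```

	with loops in the data flow, a single quiet heartbeat from each process
may not be enough; requiring two consecutive quiet rounds would be a more
cautious variant of the same idea.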

	addressing is now simplified, because essentially all internal (to a server)
addresses now can use ipc names, and not global port numbers. (this also
allows multiple instances of a waldo server to share a server -- we need this
for dev and testing.)
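
	as an illustration of why ipc names make sharing a server easy, here is
one possible naming convention. the directory layout, the "waldo-" prefix on
the directory, and the process names are assumptions; only the ipc:// plus
filesystem path form of the endpoint comes from 0MQ itself.

```python
import posixpath

def ipc_endpoint(instance, process, base_dir="/tmp"):
    """Build a 0MQ ipc endpoint that is private to one waldo instance.

    `instance` (e.g. "dev" or "test") and `process` are hypothetical names;
    each instance gets its own directory, so endpoints never collide.
    """
    path = posixpath.join(base_dir, "waldo-" + instance, process + ".ipc")
    return "ipc://" + path
```

	two instances on the same server then differ only in the instance name,
and no global port numbers need to be allocated for internal traffic.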

	the portals also allow for automatic detection of processing rates and thus
self-configuration of the cluster guiding schemes (when i trust things enough).
and if i need to buffer stuff to disk, the portals can be written to do that
(although i use 2.1.7 where 0MQ does this itself, i am looking ahead to
3.0 and besides, i may want to reguide that data anyway.)
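
	rate detection at a portal could be as simple as counting messages per
reporting interval. the sketch below is an assumption about one way to do it;
the class and method names are hypothetical, and timestamps are passed in
explicitly so the arithmetic is easy to check.

```python
class RateMeter:
    """Counts messages through a portal and reports a rate per window."""

    def __init__(self, now):
        self.window_start = now  # start time of the current window, in seconds
        self.count = 0           # messages seen in the current window

    def record(self, n=1):
        """Note that n messages passed through the portal."""
        self.count += n

    def snapshot(self, now):
        """Return messages per second since the last snapshot, then reset."""
        elapsed = now - self.window_start
        rate = self.count / elapsed if elapsed > 0 else 0.0
        self.window_start = now
        self.count = 0
        return rate
```

	a portal would call record() for each message it forwards and snapshot()
at each heartbeat; the resulting rates could then feed whatever logic
adjusts the guiding scheme.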

	thanks for reading and for any comments,


Andrew Hume  (best -> Telework) +1 623-551-2845
andrew at research.att.com  (Work) +1 973-236-2014
AT&T Labs - Research; member of USENIX and LOPSA
