[zeromq-dev] Our router architecture - suggestions going forward?
Pieter Hintjens
ph at imatix.com
Thu Feb 2 21:51:32 CET 2012
Hi Noah,
There is no 4.x roadmap and any previous version calling itself "4.x"
no longer exists. We are working on the 3.x tree, currently pushing
3.1 towards stability.
In general anyone proposing incompatible patches to 3.x will have to
give extraordinarily good reasons for them. My advice to anyone,
including yourselves, who wants the fullest control over 0MQ's future
direction is to become a contributor.
-Pieter
On Thu, Feb 2, 2012 at 8:18 PM, Noah Gibbs <noah at ooyala.com> wrote:
>
> Hi! My team at Ooyala is putting together a zmq-based architecture for some monitoring we're doing. We're trying to figure out whether it's reasonable to keep our options open for compatibility with ZMQ 4.x. I'm hoping you might have suggestions, for that or in general.
>
> ** First, why we're doing this:
>
> The idea is that a monitoring client runs on each monitored machine. Local processes send it registrations, statistics, heartbeats and notifications (errors, warnings, etc.). They also declare plugins to run periodically to assess process and machine health, much as Nagios does. The set of clients is dynamic, and a lot of this runs in EC2.
>
> We send this information to Graphite, to our alerting system, to our scheduling system for running plugins and to a few other places. Then we can see the results and determine the health of our cluster: which machines are running, which applications they're running, and what the plugin health checks report.
>
> ** Next, what we're doing with ZMQ:
>
> The clients send JSON with the stats, notifications, etc. over a ZMQ_DEALER socket connected to central routers (six routers, to start with). The routers bind a ZMQ_ROUTER socket for client traffic and forward that traffic over a ZMQ_PUSH socket to our back-end message sinks.
>
> A few high-value messages like error notifications require acknowledgements from the sink, and will be resent periodically until the ack is received by the client. The router doesn't store any state about that, it just forwards messages.
>
> Each client has a UUID. It's sent in their JSON messages, and it's what they set as the socket identity (ZMQ_IDENTITY). That's how we address messages to them, and how we attribute statistics and other data to them. It persists across reboots, but we can easily generate new ones when provisioning new virtual machine instances.
>
> The message sinks connect to the routers with a ZMQ_PULL socket. They receive messages (stats, notifications, etc.) and put them in various back-end storage, including sending out notifications by email or pager where appropriate. Each message sink has a type (heartbeat sink, stats sink, registration sink, etc.), and the router's PUSH socket distributes the work among the connected sinks.
>
> The routers also bind a REP socket for traffic from the back end *to* the clients. At the moment, the traffic to the client is either acks or "run this plugin now" messages.
>
> A scheduler (in practice, several machines) looks at that storage, determines what plugins need to run, and then sends "run this plugin" messages to the REP socket on the router to be forwarded to the clients by UUID.
>
> ** What we're worried about with 4.0:
>
> From the mailing list, it sounds like ZMQ 4.0 router sockets won't support setting identity, which makes it difficult to send to a client by UUID. Presumably we could make each client, when it connects to the router, send its UUID in a "hello" message so that the router could then save its identity and forward messages to it. Does that sound like the right approach? Should we be doing this already in 3.1?
>
> Right now we're using ROUTER and DEALER for the client/router connection, which lets us send everything over a single socket - very nice for keeping our firewall rules simple. But it sounds like there's not any way to do this in a way that's both 3.1- and 4.0-compatible. Is that true, or am I misunderstanding?
>
> ** Suggestions?
>
> Right now we're in the early stages. We have a basic ZMQ topology running and a few tests, but there will never be a better time to change this architecture. What are we doing wrong?
>
> --
> Noah Gibbs
> Software Engineer |
> noah at ooyala.com | (510) 260-5409 (cell)
> www.ooyala.com | blog | @ooyala
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
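The "hello" approach Noah asks about can be sketched as a small routing table on the router. This is a minimal, hypothetical sketch, not anything from the thread: the class name `HelloRegistry`, the JSON fields `"type"` and `"uuid"`, and the frame layout are all assumptions, and the socket I/O is left out so the logic stands alone. With pyzmq, the incoming frames would come from `recv_multipart()` on the ROUTER socket, and the outgoing frames would be passed to `send_multipart()` on it. The point is that the router never relies on a client-chosen identity; it learns the mapping from the UUID that clients already put in their JSON.

```python
import json
from typing import Dict, List, Optional


class HelloRegistry:
    """Hypothetical helper mapping client UUIDs to ROUTER routing identities.

    Assumes each client message is JSON with a "uuid" field, and that the
    ROUTER delivers [routing_id, payload] (optionally with an empty
    delimiter frame in between).
    """

    def __init__(self) -> None:
        self._by_uuid: Dict[str, bytes] = {}

    def on_client_frames(self, frames: List[bytes]) -> Optional[dict]:
        """Handle one multipart message received on the ROUTER socket.

        Returns the decoded message if it should be forwarded to the
        back end (e.g. over PUSH), or None if it was only a hello.
        """
        routing_id, payload = frames[0], frames[-1]  # last frame is the JSON body
        msg = json.loads(payload)
        # Refresh the mapping on every message, not just on hello, so a
        # client that reconnects (and gets a new routing id) is still reachable.
        self._by_uuid[msg["uuid"]] = routing_id
        if msg.get("type") == "hello":
            return None
        return msg

    def frames_for_client(self, uuid: str, body: dict) -> List[bytes]:
        """Build ROUTER frames addressing the client with this UUID."""
        routing_id = self._by_uuid.get(uuid)
        if routing_id is None:
            raise KeyError(f"no connection seen yet for client {uuid}")
        return [routing_id, json.dumps(body).encode()]


if __name__ == "__main__":
    reg = HelloRegistry()
    reg.on_client_frames([b"id-1", json.dumps({"type": "hello", "uuid": "abc"}).encode()])
    print(reg.frames_for_client("abc", {"type": "run_plugin", "plugin": "disk"}))
```

A back-end "run this plugin" request arriving on the REP socket would then be turned into ROUTER frames with `frames_for_client`. Updating the table on every message, rather than only on hello, is a design choice in this sketch that tolerates reconnects where the client does not resend its hello.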