[zeromq-dev] Monitoring and management question

Martin Sustrik sustrik at 250bpm.com
Thu Dec 23 09:24:10 CET 2010


Hi Juan,

> I will insist on the observability matter. If you find my insistence
> inappropriate or unproductive, just tell me.

No problem. I'm cc'ing the list btw.

> > However, what I fear is that exposing this kind of info would
> > immediately result in people using it to drive business logic
> > rather than using it for system monitoring. That in turn breaks
> > the "not-connected" design of 0MQ, severely hurts scalability etc.
> > My feeling is that not having monitoring is less harmful than
> > introducing features that directly defeat the goals of the project.
>
> I don't know what exactly "severely hurts scalability" means. If adding
> these features affects people who are not using them, I agree that it
> is a bad thing. I think that "if you don't use it, you won't have to pay
> for it" should be a design principle in this kind of software (system
> software?).

The problem is this: When you add the feature, people are going to use
the "monitor connections" feature to check the presence of individual
peers. They are also going to use "queue size" to check whether a
particular peer is busy or not.

The problem is that while this kind of usage doesn't really work (having
a connection doesn't necessarily mean the peer is present; a small queue
doesn't necessarily mean the peer is not busy, etc.), it works well
enough to be adopted by clueless users, only to be bitterly regretted
later on when the production deployment starts to misbehave.

As for scalability, the main mechanism for scaling 0MQ deployments
is adding devices in the middle. However, when you depend on peer
monitoring in your business logic, adding a device in the middle
breaks that business logic. For example, having a connection to the
intermediary node doesn't necessarily mean there are any peers present
further away in the topology.
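
To make the point about intermediary nodes concrete, here is a minimal
sketch in Python. It is a toy model only, not 0MQ code: Node,
has_connection and reachable_workers are hypothetical names invented
for this illustration, and the topology is assumed to have no cycles.
It shows a "connection present" check returning True even though no
worker sits behind the device.

```python
class Node:
    """A node in a toy topology (hypothetical, not a 0MQ type)."""

    def __init__(self, name, is_worker=False):
        self.name = name
        self.is_worker = is_worker
        self.downstream = []  # nodes this node is directly connected to

    def connect(self, other):
        self.downstream.append(other)


def has_connection(node):
    """What a naive connection monitor sees: direct links only."""
    return len(node.downstream) > 0


def reachable_workers(node):
    """Workers actually reachable through any number of devices.

    Assumes the topology is a tree or DAG (no cycles).
    """
    found = []
    for peer in node.downstream:
        if peer.is_worker:
            found.append(peer.name)
        found.extend(reachable_workers(peer))
    return found


def demo():
    gateway = Node("gateway")
    device = Node("device")  # intermediary added for scaling
    gateway.connect(device)
    # The gateway is connected, yet nobody can do any work.
    before = (has_connection(gateway), reachable_workers(gateway))
    device.connect(Node("worker-1", is_worker=True))
    after = (has_connection(gateway), reachable_workers(gateway))
    return before, after


if __name__ == "__main__":
    print(demo())
```

The gateway's view (has_connection) is identical before and after the
worker joins, which is why business logic keyed on it stops being
meaningful once a device is inserted.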

> One use case for "monitoring" information in business logic is the
> treatment of congestion in telephony networks.
> Suppose a system is connected to a telephony network, and the network
> diverts some special calls, for example calling card calls, to the
> system. The system processes them: asking for the calling card number,
> verifying the balance, calculating the max call duration, etc.
> The system is composed of one computer connected to the network, acting
> as a gateway and distributing the work that arrives to a pool of worker
> computers through 0MQ.
> The workers must report whether they are in a "congestion state". There
> are several congestion states: near congestion, congestion, congestion
> discarding work (dropping calls). With this information, the gateway
> calculates the congestion state of the overall system and reports it to
> the network. The network, with this information, can redirect part of
> the workload to another system, or start a plan B, i.e. giving a polite
> message that means more or less "the network is in congestion, try
> later". This will certainly happen on Christmas Day.

Happy Christmas btw! :)

Well, you see, your use case is not about monitoring. It actually binds 
business logic (redirecting the traffic) to the data you get. So, your 
system is prone to the problems described above.

0MQ solves the problem of load balancing and peers signaling congestion 
in a different way. A worker simply stops reading messages when it is 
congested, which causes messages to queue up to HWM. When HWM is 
reached, the peer is removed from load-balancing, thus no more messages 
are sent to it. It gets more messages only when it leaves the congested
state and starts reading messages again.
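
The behaviour described above can be sketched as a small simulation.
This is a toy model in plain Python under stated assumptions, not the
actual 0MQ implementation: Pipe, LoadBalancer and simulate are
hypothetical names, each worker is modelled as a single bounded queue,
and round-robin is assumed as the distribution policy. A congested
worker is simply one that stops reading from its queue.

```python
from collections import deque


class Pipe:
    """Bounded queue between the sender and one worker (toy model)."""

    def __init__(self, hwm):
        self.hwm = hwm  # high-water mark: max queued messages
        self.queue = deque()

    def full(self):
        return len(self.queue) >= self.hwm


class LoadBalancer:
    """Round-robin over pipes that are below their high-water mark."""

    def __init__(self, pipes):
        self.pipes = pipes
        self.next = 0  # index of the pipe to try first

    def send(self, msg):
        n = len(self.pipes)
        for i in range(n):
            idx = (self.next + i) % n
            pipe = self.pipes[idx]
            if not pipe.full():  # pipes at HWM are skipped
                pipe.queue.append(msg)
                self.next = (idx + 1) % n
                return idx
        return None  # every pipe is at HWM in this model


def simulate():
    pipes = [Pipe(hwm=2), Pipe(hwm=2)]
    lb = LoadBalancer(pipes)
    routed = []
    # Worker 0 reads a message after every send.
    # Worker 1 is congested: it never reads.
    for msg in range(8):
        routed.append(lb.send(msg))
        if pipes[0].queue:
            pipes[0].queue.popleft()
    # Worker 1 recovers and reads everything queued for it.
    pipes[1].queue.clear()
    after = [lb.send(msg) for msg in range(8, 10)]
    return routed, after


if __name__ == "__main__":
    print(simulate())
```

In this model worker 1 receives messages only until its queue reaches
the HWM, after which all traffic goes to worker 0. Once worker 1 starts
reading again it is picked up by the load balancer without any explicit
congestion report having been sent.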

Martin


