[zeromq-dev] Reliability of zeromq based project

Paul Colomiets paul at colomiets.name
Wed Nov 9 10:20:52 CET 2011


We have some design problem with our zeromq application. We have two
clusters A and B. Each serving it's own pool of clients. We have RPC
based on zeromq request-reply pattern beetween them. RPC is used for
serving 1 to 10 percent of requests. And we expect at least 1000
requests per second at each cluster. The problem we encountered is
when cluster B is down. We have to timeout each request for example
for 5 seconds. This obviosly takes down cluster A because each worker
in cluster A is wating for RPC. Ideally we want requests which can be
served without RPC to B to be served with minimum latency changes. We
have several options:

1. Lower the timeout. One that probably solves problem of failed
cluster B is about 10ms or even 1ms. Which probably will fail most
requests under high load (at cluster B there is it's own request
queue, even if we prioritizing RPC over serving clients)

2. Make all workers asynchronous. Then we can set right HWM and begin
to get failures immediately after buffers are full. It's huge amount
of work (Well and the reason I like zeromq, is because usually
synchronous code using zeromq is as fast as asynchronous one)

3. Place a device between them, count unanswered requests and fail
following replies immediately until some timeout. The problem with
this one is when we have uUnanswered_request_count lower than number
of workers at A each worker will wait for response anyway. If we have
other way around, then if each worker at A will request RPC roughly at
the same time, we will get failures (the latter is fairly common as
client requests which do not need RPC to B are must faster)

4. Place monitoring and do not send requests to B. This solution
solves problem when B is totally down (like no pings, etc.), but not
when it's overloaded by requests. (We are probably monitoring another
service like ICMP, not RPC, as the only way to monitor RPC is in (3)).

5. We can also split topologies that need RPC to B and ones that don't
need. So the latter ones are much faster. Which means client must
choose topology, that doesn't sound very good. Also it sometimes can't
be determined ahead of time.

We are currently thinking about some smarter device for (3), but it
would be great if there were a better way. With traditional networking
the most dumb way would be to try to connect at each request, which
often will fail much faster than timeouted requests in zeromq. Also
you could connect each 10th request if previous connection/request
attempt failed and so on.

I'm open to both good ideas and some practical experience of solving
similar issues.


More information about the zeromq-dev mailing list