[zeromq-dev] [PATCH] Scalability improvements for large amounts of connections

Martin Lucina mato at kotelna.sk
Mon Oct 11 15:02:31 CEST 2010


Hi Jon,

jon at totient.co.uk said:
> Is this a fix to anything in the issue manager? Or was it observed in 
> the field?

Observed in the field.

> Also are there any further info on when it would be useful to increase 
> the snd buffer on the socketpair and to what sort of sizes?

Some background: in ØMQ 2.1 Martin Sustrik introduced the concept that
sockets are no longer "owned" by any specific application thread (as long
as you follow the rules about memory barriers, etc.), and that a
signalling file descriptor for a ØMQ socket can be obtained with the ZMQ_FD
socket option.

This subtly changes the way outstanding signalling commands are processed;
as I understand it, in ØMQ 2.0.x any outstanding commands on *all* sockets
belonging to a thread issuing an API call would be processed. In 2.1.x
this is no longer the case, and commands will only be processed if that
particular *socket* is touched by an API call.

In practice, this means that if you have an application with a lot of
clients connecting (hundreds), but for whatever reason you do not "kick"
the socket those clients are connecting to (say the app is busy doing
something else and only wants to talk to those clients "later"), the
signaler's buffer fills up and you can get a deadlock: the ØMQ I/O thread
blocks in the signaler's send(), and the application thread can end up
blocked as well.
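To make the failure mode concrete, here is a small POSIX C sketch. It
assumes, as the question about "the snd buffer on the socketpair" implies,
that the signaler is a UNIX-domain socketpair; `commands_until_full` and
the 48-byte record are illustrative stand-ins, not ØMQ code. It writes
command-sized records into one end with nothing reading the other end, in
non-blocking mode so that a full buffer shows up as EAGAIN instead of a
hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the size of one command_t record. */
#define CMD_SIZE 48

/* Count how many CMD_SIZE-byte records fit into one end of a UNIX-domain
   socketpair, with nobody reading the other end, before the kernel buffer
   is full. sndbuf_bytes > 0 requests that SO_SNDBUF (the kernel may
   double, round or clamp it); 0 keeps the system default.
   Returns -1 on setup failure or an unexpected error. */
long commands_until_full(int sndbuf_bytes)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    if (sndbuf_bytes > 0 &&
        setsockopt(sv[0], SOL_SOCKET, SO_SNDBUF,
                   &sndbuf_bytes, sizeof sndbuf_bytes) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Non-blocking, so a full buffer returns EAGAIN; a blocking writer
       (like the I/O thread's send()) would simply hang here instead. */
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    char cmd[CMD_SIZE];
    memset(cmd, 0, sizeof cmd);

    long count = 0;
    for (;;) {
        ssize_t n = write(sv[0], cmd, sizeof cmd);
        if (n == (ssize_t) sizeof cmd) {
            count++;
            continue;
        }
        /* A partial write or EAGAIN/EWOULDBLOCK means the buffer is full. */
        if (n >= 0 || errno == EAGAIN || errno == EWOULDBLOCK)
            break;
        if (errno == EINTR)
            continue;
        count = -1;
        break;
    }

    close(sv[0]);
    close(sv[1]);
    return count;
}
```

The exact count is platform-dependent (the kernel also charges
per-write bookkeeping against the buffer), which is why it is printed
rather than predicted here; the point is only that it is finite.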

So, in my specific use case, what we came up with is a threefold
workaround:

1) In the application, I make sure to register all sockets I use in a
single poller object, and I call zmq_poll () with no events on sockets I'm
not interested in just now simply to drain the signalling pipe. This can
also be done by calling e.g. zmq_getsockopt (ZMQ_EVENTS) on the socket.

2) In ØMQ, we added a compile-time option to increase the signaler's
send buffer (sndbuf).

3) Not done yet, but some time this week I will make a patch for ØMQ to
reduce the size of the data actually sent across the signalling pipes; at
the moment a command_t is 48 bytes on x86_64 and you can get something like
5 commands per client connection that need to be processed.
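The sizes in point 3 allow a rough estimate of how many un-drained
connections fit in the signalling pipe before it fills, and how points 2
and 3 each move that limit. This is plain arithmetic in a hypothetical
helper, `connections_before_full`; the 64 KiB buffer and the 16-byte
"smaller command" below are made-up figures for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Upper-bound estimate: connections whose commands fit in a buffer of
   buf_bytes if each connection queues cmds_per_conn commands of cmd_size
   bytes and nobody drains them. Ignores any per-write kernel overhead. */
size_t connections_before_full(size_t buf_bytes, size_t cmd_size,
                               size_t cmds_per_conn)
{
    assert(cmd_size > 0 && cmds_per_conn > 0);
    return buf_bytes / (cmd_size * cmds_per_conn);
}
```

With 48-byte commands and 5 per connection (240 bytes each), a 64 KiB
buffer would hold 273 connections' worth; doubling the buffer (point 2)
gives 546, and shrinking the command to 16 bytes (point 3) gives 819.
Since the kernel may also charge overhead per write, real limits are
likely lower.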

I wish there were a better way to handle this, but with sockets no longer
being "owned" by any one thread it's not possible for the application
thread to process commands for sockets it's not directly operating on. An
alternative, but complex (due to locking issues), approach would be to have
a linked-list buffer inside the signaler and only use the pipe to signal
with a single byte that something is present.
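The linked-list alternative might look roughly like the C sketch below.
It is not ØMQ code: `lsignaler_t`, `lsig_send`, `lsig_recv_all` and the
tiny `cmd_t` are hypothetical, and a single pthread mutex stands in for
whatever locking a real implementation would need. Commands go into an
unbounded queue, and a wake-up byte is written to the pipe only when the
queue goes from empty to non-empty, so the pipe never holds more than one
byte and the sender cannot block on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical command record standing in for command_t. */
typedef struct cmd {
    int type;
    struct cmd *next;
} cmd_t;

typedef struct {
    pthread_mutex_t lock;  /* protects head, tail and wakeups */
    cmd_t *head, *tail;    /* unbounded FIFO of pending commands */
    int fds[2];            /* pipe: fds[0] read end, fds[1] write end */
    unsigned long wakeups; /* wake-up bytes written so far */
} lsignaler_t;

int lsig_init(lsignaler_t *s)
{
    s->head = s->tail = NULL;
    s->wakeups = 0;
    if (pipe(s->fds) != 0)
        return -1;
    if (pthread_mutex_init(&s->lock, NULL) != 0) {
        close(s->fds[0]);
        close(s->fds[1]);
        return -1;
    }
    return 0;
}

/* Producer: append, and write one byte only on the empty -> non-empty
   transition, so at most one byte is ever sitting in the pipe. */
int lsig_send(lsignaler_t *s, int type)
{
    cmd_t *c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    c->type = type;
    c->next = NULL;

    int rc = 0;
    pthread_mutex_lock(&s->lock);
    int was_empty = (s->head == NULL);
    if (s->tail != NULL)
        s->tail->next = c;
    else
        s->head = c;
    s->tail = c;
    if (was_empty) {
        char b = 0;
        ssize_t n;
        do
            n = write(s->fds[1], &b, 1);
        while (n < 0 && errno == EINTR);
        if (n == 1)
            s->wakeups++;
        else
            rc = -1;
    }
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Consumer: wait for the wake-up byte, then detach the whole queue in one
   step. The caller owns the returned list and frees it with lsig_free. */
cmd_t *lsig_recv_all(lsignaler_t *s)
{
    char b;
    ssize_t n;
    do
        n = read(s->fds[0], &b, 1);
    while (n < 0 && errno == EINTR);
    if (n != 1)
        return NULL;

    pthread_mutex_lock(&s->lock);
    cmd_t *list = s->head;
    s->head = s->tail = NULL;
    pthread_mutex_unlock(&s->lock);
    return list;
}

void lsig_free(cmd_t *list)
{
    while (list != NULL) {
        cmd_t *next = list->next;
        free(list);
        list = next;
    }
}

void lsig_destroy(lsignaler_t *s)
{
    lsig_free(s->head);
    pthread_mutex_destroy(&s->lock);
    close(s->fds[0]);
    close(s->fds[1]);
}
```

The trade-off is that pending commands now cost unbounded heap memory
instead of a fixed kernel buffer, and every send takes a lock shared with
the consumer; that contention is the locking cost referred to above.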

Any good ideas on how to improve this/get rid of the problem entirely would
be appreciated.

Cheers,

-mato


