[zeromq-dev] [PATCH] Race condition in eventfd signaler fixed

Martin Sustrik sustrik at 250bpm.com
Sun Jul 3 16:40:13 CEST 2011

Hi Paul,

> Why do you check for 2? There can be any value > 1. Why do you check
> the value anyway? If I understand the code correctly, it puts pipe in
> active state, and reads until there are no more messages, so you just
> don't care about the number.

It follows from the way the ypipe works. This is not covered by the 
architecture wiki page :(

In short, when an item is read from the ypipe, the read can return false
("there's no item available"). The reader is then supposed not to read
from the ypipe anymore (the reader is "passive").

When an item is written to the ypipe, the write can return false ("the
reader is in passive state"). In that case the writer is responsible for
"waking up" the reader, i.e. getting it into the active state, where it
reads items.

"Waking up" happens OOB. There must be an external communication channel 
between writer and reader to transfer the wake-up signal.
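
To make the read/write contract above concrete, here is a minimal
single-threaded sketch. It is not the real lock-free ypipe: ToyPipe and
wakeups_needed are hypothetical names, and the assumption that the reader
starts out active is mine. A false return from write() marks the point
where the writer would have to send the out-of-band wake-up.

```cpp
#include <cassert>
#include <deque>

// Toy, single-threaded model of the contract described above.
// NOT the real lock-free ypipe; the class name and layout are hypothetical.
template <typename T>
class ToyPipe {
public:
    // Returns false when the reader was passive, i.e. the caller must now
    // wake the reader up through some external channel.
    bool write(const T &item) {
        items_.push_back(item);
        if (reader_passive_) {
            reader_passive_ = false;  // reader counts as active once signalled
            return false;
        }
        return true;
    }

    // Returns false when no item is available; the reader is then passive
    // and must not read again until it has been woken up.
    bool read(T &out) {
        if (items_.empty()) {
            reader_passive_ = true;
            return false;
        }
        out = items_.front();
        items_.pop_front();
        return true;
    }

private:
    std::deque<T> items_;
    bool reader_passive_ = false;  // assumption: the reader starts active
};

// Hypothetical walk-through: counts how many wake-up signals the writer owes.
int wakeups_needed() {
    ToyPipe<int> pipe;
    int item = 0;
    int signals = 0;
    bool got = pipe.read(item);     // nothing there: reader goes passive
    assert(!got);
    (void) got;
    if (!pipe.write(1)) ++signals;  // reader was passive: one signal owed
    if (!pipe.write(2)) ++signals;  // reader already woken: no second signal
    while (pipe.read(item)) {}      // reader drains, then goes passive again
    if (!pipe.write(3)) ++signals;  // passive again: one more signal owed
    return signals;
}
```

In this model wakeups_needed() returns 2: one signal per passive period,
no matter how many items are written while the reader is active.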

In the case of the mailbox, this external channel can be implemented
using eventfd (on Linux).

If you check the above algorithm, you'll find that there's at most one
signal in flight at any given moment, which explains the zmq_assert
(dummy == 1); thing.

However, the above statement is not 100% true. There's a race condition.

Note that the signal is not removed from the eventfd immediately when the
reader is activated; rather, it's left lingering there so that polling
(ZMQ_FD) reports that the associated socket is readable (POLLIN).

So, when the reader fails to read an item and gets into the passive
state, it should remove the signal from the eventfd (so that poll blocks
rather than signalling POLLIN).
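
The lingering behaviour can be seen with a bare eventfd on Linux. This is
only a sketch under my own assumptions: is_readable, send_signal and
remove_signal are hypothetical helper names, not functions from the 0MQ
sources, and the descriptor is assumed to come from eventfd(0, 0).

```cpp
#include <cassert>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

// Returns true if poll() would report the descriptor readable right now.
bool is_readable(int fd) {
    pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    int rc = poll(&pfd, 1, 0);  // zero timeout: never blocks
    return rc == 1 && (pfd.revents & POLLIN) != 0;
}

// Writer side: add one wake-up signal to the eventfd counter.
void send_signal(int fd) {
    const uint64_t inc = 1;
    ssize_t sz = write(fd, &inc, sizeof inc);
    assert(sz == static_cast<ssize_t>(sizeof inc));
    (void) sz;
}

// Reader side: take the whole counter out of the eventfd and return it.
// After this call the counter is zero, so poll() no longer reports POLLIN.
uint64_t remove_signal(int fd) {
    uint64_t value = 0;
    ssize_t sz = read(fd, &value, sizeof value);
    assert(sz == static_cast<ssize_t>(sizeof value));
    (void) sz;
    return value;
}
```

After send_signal(fd), is_readable(fd) stays true for as long as nobody
reads the descriptor. Once remove_signal(fd) has been called it becomes
false, which is when a poll on the descriptor would block again.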

There are two things happening at once:

1. Signal is removed from eventfd.
2. Writer may write an item to the ypipe, find out that reader is 
already passive and thus send it a signal.

If these steps happen in a particular order (reader gets passive, writer
sends command, reader retrieves signal), the reader may get two signals
instead of a single one.

The patch writes the signal back to the eventfd in that case.
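
For illustration, the ordering above can be replayed on a single thread
against a real eventfd. This is a sketch of the idea, not the patch
itself: send_signal, recv_signal and replay_race are hypothetical names,
and the replay assumes a descriptor created with eventfd(0, 0) or
eventfd(0, EFD_NONBLOCK). It relies on the fact that an eventfd in its
default (non-semaphore) mode returns the accumulated counter on read.

```cpp
#include <cassert>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

// Writer side: add one wake-up signal to the eventfd counter.
void send_signal(int fd) {
    const uint64_t inc = 1;
    ssize_t sz = write(fd, &inc, sizeof inc);
    assert(sz == static_cast<ssize_t>(sizeof inc));
    (void) sz;
}

// Reader side: consume one wake-up signal. If a second signal was
// swallowed together with it, write that one back to the eventfd.
// Returns the raw counter value that was read (1 or 2).
uint64_t recv_signal(int fd) {
    uint64_t value = 0;
    ssize_t sz = read(fd, &value, sizeof value);
    assert(sz == static_cast<ssize_t>(sizeof value));
    (void) sz;
    if (value == 2) {
        send_signal(fd);  // hand the extra signal back
        return value;
    }
    assert(value == 1);
    return value;
}

// Replays the problematic ordering deterministically.
uint64_t replay_race(int fd) {
    send_signal(fd);         // earlier wake-up, left lingering for poll
    // ... reader fails to read an item and becomes passive here ...
    send_signal(fd);         // writer finds the reader passive and signals
    return recv_signal(fd);  // reader only now removes the signal
}
```

In this replay recv_signal reads 2 and leaves a counter of 1 behind, so
the wake-up sent by the writer is not lost and a later poll still reports
POLLIN.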

