[zeromq-dev] Problem with blocking recv in V.3.0

Peter Caven inversematrix at hotmail.com
Tue Jul 19 17:07:58 CEST 2011

Hello 0MQ devs,

I've been experimenting with the new 0MQ version 3.0 (specifically the new raw buffer API), 
and I noticed that on my 4-core Windows Vista machine, the CPU load was averaging 15% to 25% 
of total CPU for a simple server with a blocking recv.

I wrote a barebones 'C' language server (another Hello World server) to test with.
After some time with the Visual Studio debugger, I think I found the problem.

In "signaler.cpp" there is a 'select' call that I think should simply 
block the thread making the 'zmq_recv' call on a socket created with 'zmq_socket(context, ZMQ_REP)'.

However, the 'timeout_' value passed to 'zmq::signaler_t::wait' is -1 (implying an infinite wait).
This causes the members of the 'timeval' struct to be set to: 
   timeout.tv_sec == 0
   timeout.tv_usec == -1000

resulting in a non-blocking call to 'select' that wastes a huge number of CPU cycles by
continually re-executing the 'select' call on the main thread. 
According to the doc for 'select', a NULL value should be passed instead of the pointer to the 'timeval' struct
when the call is meant to block indefinitely, and I've confirmed that by making the small change shown below.
After the change, the server's CPU usage is drastically reduced (< 1%).
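
The values above follow from integer division and remainder truncating toward zero (guaranteed from C++11 onward). As a minimal sketch to check the arithmetic, the two assignments from 'zmq::signaler_t::wait' are reproduced below; 'split_timeout_t' and 'split_timeout' are hypothetical names used only for illustration, and a plain struct stands in for 'timeval' to keep the example portable.

```cpp
#include <cassert>

// Hypothetical stand-in for the two 'timeval' members (not a 0MQ type).
struct split_timeout_t {
    long sec;
    long usec;
};

// Same arithmetic as the two assignments in zmq::signaler_t::wait:
// whole seconds, then the leftover milliseconds converted to microseconds.
split_timeout_t split_timeout (int timeout_)
{
    split_timeout_t t;
    t.sec = timeout_ / 1000;          // -1 / 1000 truncates toward zero
    t.usec = timeout_ % 1000 * 1000;  // (-1 % 1000) is -1, times 1000
    return t;
}
```

Calling 'split_timeout (-1)' gives sec == 0 and usec == -1000, matching what the debugger showed, while a positive value such as 1500 splits into 1 second and 500000 microseconds as intended.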

This is not the whole story, though: the problem should also be apparent on non-Windows platforms,
because the same badly initialized 'timeval' struct is used in both cases.

In "signaler.cpp":
@@ -176,11 +176,11 @@ int zmq::signaler_t::wait (int timeout_)
     FD_SET (r, &fds);
     struct timeval timeout;
     timeout.tv_sec = timeout_ / 1000;
     timeout.tv_usec = timeout_ % 1000 * 1000;
-    int rc = select (0, &fds, NULL, NULL, &timeout);
+    int rc = select (0, &fds, NULL, NULL, timeout_ < 0 ? NULL : &timeout);
     wsa_assert (rc != SOCKET_ERROR);
     int rc = select (r + 1, &fds, NULL, NULL, &timeout);
     if (unlikely (rc < 0)) {
         zmq_assert (errno == EINTR);

So, I'm fairly sure that this is a problem on all platforms, 
and that a real solution should also avoid the construction of the 'timeval' struct in the infinite blocking case.
Looking back through the archives of this list, it seems that this behavior is due to some recent changes, 
but I'm also quite unfamiliar with the codebase at this time.
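
One way to avoid filling in the 'timeval' struct in the infinite case, on every platform, might look like the sketch below. It is only an illustration under stated assumptions: 'make_select_timeout' is a hypothetical helper that does not exist in 0MQ, and the caller is assumed to own the 'timeval' storage.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <winsock2.h>   // struct timeval on Windows
#else
#include <sys/time.h>   // struct timeval on POSIX
#endif

// Hypothetical helper (not part of 0MQ): returns the pointer to hand to
// select(). A negative timeout_ means "wait forever", which select()
// expresses as a null timeout pointer; storage_ is left untouched then.
struct timeval *make_select_timeout (int timeout_, struct timeval *storage_)
{
    if (timeout_ < 0)
        return NULL;
    assert (storage_ != NULL);
    storage_->tv_sec = timeout_ / 1000;            // whole seconds
    storage_->tv_usec = timeout_ % 1000 * 1000;    // remainder in microseconds
    return storage_;
}
```

With such a helper, both branches could pass 'make_select_timeout (timeout_, &timeout)' as the last argument to 'select' instead of '&timeout'.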

Best Regards,