[zeromq-dev] Resource temporarily unavailable

john skaller skaller at users.sourceforge.net
Thu Jan 19 01:58:11 CET 2012


On 19/01/2012, at 5:35 AM, Chuck Remes wrote:

>> 
>> Yes, I did that because I got EAGAIN. If I take out the loop on EAGAIN,
>> I get .. well I get EAGAIN (code 35 on OSX).
> 
> It doesn't matter if you are using a REQ socket for blocking or non-blocking writes. For that socket type you must adhere to a strict send/recv/send/recv pattern. Don't do that.

Ok..
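
The strict send/recv/send/recv alternation described above can be pictured as a two-state machine. The sketch below is a toy model in plain C, not the libzmq implementation and not the Felix binding: `req_model`, `req_model_send` and `req_model_recv` are hypothetical names made up for illustration, and the model only reports whether a call is allowed in the current state.

```c
#include <stdbool.h>

/* Toy model of a request socket: a send must be followed by a recv,
   and a recv must be followed by a send. Names are hypothetical. */
typedef enum { REQ_READY_TO_SEND, REQ_AWAITING_REPLY } req_state;

typedef struct {
    req_state state;
} req_model;

/* Returns true and advances the state if a send is allowed now. */
bool req_model_send(req_model *m)
{
    if (m->state != REQ_READY_TO_SEND)
        return false;            /* out of order: a reply is still pending */
    m->state = REQ_AWAITING_REPLY;
    return true;
}

/* Returns true and advances the state if a recv is allowed now. */
bool req_model_recv(req_model *m)
{
    if (m->state != REQ_AWAITING_REPLY)
        return false;            /* out of order: nothing has been sent */
    m->state = REQ_READY_TO_SEND;
    return true;
}
```

In this model a second send without an intervening recv is rejected, which is the ordering rule the hwclient example has to respect.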

>> Note: I only get this problem when the client sends the message
>> to the server, so the server IS reading the message .. well,
>> it's doing something in response to the message from the client.
> 
> May I assume the server has connected via a REP socket?

The code is written in Felix, and it is intended to be the Felix
version of the Hello World example:

hwclient/hwserver

documented in the zguide. The Felix compiler generates C++, so I can
inspect the generated C++ code (to check that my binding is doing the "right thing").

It looks good to me: i.e. the zmq binding is right, and so is the use of it.
I'm hoping I'm wrong about that, because the alternative is a bug
in the Felix compiler or Felix run time system causing a corruption, and that
will be extremely hard to track down!

This happened once before, when integrating Google's RE2 regex library. The problem
turned out to be a missing "hint" to the garbage collector on the library binding,
and it took almost a year to find, because it only occurred when enough allocations
had happened to trigger the GC in a program that also used RE2, and none of my
regression tests do both.

> This is a fairly common error. You might want to scan the guide again... don't worry, we've all had to read it 3 or 4 times before it sank in. :)

As above: the problem is that I'm actually *implementing* the guide examples :)

The loop on EAGAIN was only added after I got the "Resource temporarily unavailable"
message (EAGAIN), and the correct behaviour for that is to retry, AFAIK...

If I should not retry, ZMQ should not issue that error code.
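
For reference, a bounded retry on EAGAIN can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `fake_sender`, `fake_send` and `send_with_retry` are hypothetical names, with `fake_send` standing in for a non-blocking send call. None of them are part of ZeroMQ or of the Felix binding.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a non-blocking send: it fails with EAGAIN
   until failures_left reaches zero, then reports success. */
typedef struct {
    int failures_left;
    int calls;
} fake_sender;

int fake_send(fake_sender *s)
{
    s->calls++;
    if (s->failures_left > 0) {
        s->failures_left--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}

/* Retry while the call fails with EAGAIN, making at most max_attempts
   calls in total. Returns 0 on success, -1 otherwise, with errno left
   as set by the last failed attempt. */
int send_with_retry(fake_sender *s, int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (fake_send(s) == 0)
            return 0;
        if (errno != EAGAIN)
            return -1;           /* a different error: retrying will not help */
    }
    return -1;                   /* still EAGAIN after max_attempts calls */
}
```

Bounding the number of attempts means that a persistent EAGAIN shows up as a failure instead of an endless loop, which makes it easier to tell a transient condition from a real bug.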

The C version of this code (from the zguide) works fine.

So there is a problem somewhere in the Felix-generated code. It is not
impossible that there is a memory corruption, and that the error code is a
spurious (and lucky) side effect of it.

--
john skaller
skaller at users.sourceforge.net






