[zeromq-dev] ETIMEDOUT when doing a non-blocking recv

Chuck Remes cremes.devlist at mac.com
Tue Jan 24 21:26:48 CET 2012


On Jan 6, 2012, at 3:23 PM, Martin Sustrik wrote:

> Hi Chuck,
> 
>> I am periodically receiving ETIMEDOUT (errno 60) when doing a
>> non-blocking read from either a SUB socket or a DEALER/XREQ socket.
>> What can I assume from this error?
>> 
>> My guess is that the socket has recently tried to connect to another
>> socket (in this particular case, *everything* is using 'inproc'
>> transport and they only bind once at startup) and it timed out.
>> Because zmq_connect() is async, we don't actually see the error until
>> we try to zmq_send()/zmq_recv() with that socket. At that point the
>> error is delivered.
>> 
>> Is that assumption correct? If so, what can I do about it?
>> 
>> OS => OSX
>> libzmq => 2.1.11
>> ulimit -n => 400000
>> 
>> At the time of the error, there has usually been about 2-3k xreq
>> sockets opened & closed with around 200 being open at any given
>> moment.
> 
> 0MQ itself doesn't seem to produce this error. I.e. it must be received 
> from the OS and forwarded via 0MQ to the user.
> 
> Given that the only transport you are using is inproc, there's not much OS 
> functionality involved so it shouldn't be that hard to track the source 
> of the error down.
> 
> My guess would be that it is generated by the signaler_t class, which 
> contains a system socketpair on the OSX platform. One of the OS functions 
> called there is probably returning ETIMEDOUT for some reason.
> 
> Unfortunately, I don't have a Mac so it's up to you to investigate.

Martin,

I see this ETIMEDOUT error quite a bit when my machine is under even a little load, so I agree that it's probably some OSX kernel resource running low or being exhausted. (OSX is *not* a good choice for server workloads.)

Do you have any specific suggestions on which components of libzmq I should instrument? I can add some printf's...
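
To make the printf idea concrete, here is a rough sketch of the kind of wrapper I could drop in around a suspect recv() call on the socketpair. It assumes the error really does come back from a plain recv() on that socketpair, which is unverified. The names logged_recv and check_logged_recv are made up for illustration and are not part of libzmq. The wrapper stays quiet for EAGAIN/EWOULDBLOCK/EINTR, which are normal on a non-blocking socket, and prints everything else (ETIMEDOUT included) along with a call-site label.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a libzmq function): call recv() and, if it
   fails with anything other than the usual "try again" errors, print the
   call site and errno to stderr. errno is preserved for the caller. */
static ssize_t logged_recv (int fd, void *buf, size_t len, int flags,
                            const char *where)
{
    ssize_t rc = recv (fd, buf, len, flags);
    if (rc == -1) {
        int saved = errno;
        if (saved != EAGAIN && saved != EWOULDBLOCK && saved != EINTR)
            fprintf (stderr, "%s: recv (fd=%d) failed: errno=%d (%s)\n",
                     where, fd, saved, strerror (saved));
        errno = saved;
    }
    return rc;
}

/* Stand-alone check on a plain non-blocking socketpair: reading while it
   is empty should fail with EAGAIN/EWOULDBLOCK, and after one byte is
   sent the read should return that byte. Returns 0 if both hold. */
static int check_logged_recv (void)
{
    int fds [2];
    char c = 0;
    int ok;
    ssize_t n;

    int rc = socketpair (AF_UNIX, SOCK_STREAM, 0, fds);
    assert (rc == 0);
    rc = fcntl (fds [0], F_SETFL, O_NONBLOCK);
    assert (rc == 0);

    n = logged_recv (fds [0], &c, 1, 0, "check (empty)");
    ok = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));

    n = send (fds [1], "x", 1, 0);
    ok = ok && n == 1;

    n = logged_recv (fds [0], &c, 1, 0, "check (one byte)");
    ok = ok && n == 1 && c == 'x';

    close (fds [0]);
    close (fds [1]);
    return ok ? 0 : 1;
}
```

The same pattern would apply to send() or poll()/select() calls on that descriptor. Passing a distinct label at each call site should show which OS call is the one handing back ETIMEDOUT.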

cr



