[zeromq-dev] Interrupted System Call: advice to handle it

Raphael Bauduin rblists at gmail.com
Mon Aug 6 13:58:00 CEST 2012


On Mon, Jul 9, 2012 at 11:40 AM, Raphael Bauduin <rblists at gmail.com> wrote:
> On Wed, Jul 4, 2012 at 5:20 PM, Chuck Remes <lists at chuckremes.com> wrote:
>>
>> On Jul 4, 2012, at 5:05 AM, Raphael Bauduin wrote:
>>
>>> Hi,
>>>
>>> I'm using the Ruby zmq bindings in a web application. I regularly get
>>> the error message "ZMQ::Error: Interrupted system call" on a send.
>>> This is in a Ruby on Rails application served with Passenger, which
>>> spawns worker processes. I think I have identified a process that
>>> generated this error, and an strace on it shows no activity at all.
>>> This process, however, keeps a connection to the MySQL server open. An
>>> accumulation of such errors will eventually become a problem on the
>>> server side, in addition to clients getting an error page and messages
>>> being lost.
>>
>> I'm assuming this happens under MRI. Is it 1.8.x or 1.9.x?
>
> It is REE: ruby 1.8.7 (2012-02-08 MBARI 8/0x6770 on patchlevel 358)
> [x86_64-linux], MBARI 0x6770, Ruby Enterprise Edition 2012.02
>
>>
>> Do you see the same behavior when running your app with JRuby or Rubinius?
>>
>
> The problem is that it doesn't happen systematically. It happens once
> every few days in production, where thousands of page views run the
> code in question, so it's very hard to reproduce.
>
>>> I'm looking for advice on avoiding this error and possibly for further
>>> debugging hints. Related to that, I have several questions:
>>> - Should I simply catch this exception and retry the send if needed?
>>> As this is done in the process sending the page content back to the
>>> client, might it make some requests too slow? (Even so, that could
>>> be better than the error we get now.)
>>
>> Using exception handling for flow control in Ruby can be slow, but unless you are building the next amazon.com it probably won't hurt you much. You could give this a try, though it's always better to figure out the actual underlying cause and fix it. Using exceptions here is just a band-aid.
>>
>>> - If my understanding is correct, the problem occurs with blocking
>>> syscalls, and requests hitting the error don't return any content to
>>> the client. But what happens if I make the send non-blocking?
>>> (http://zeromq.github.com/rbzmq/classes/ZMQ/Socket.html#M000010)
>>
>> Try it and see.
>
> My question was more about whether the same problem could occur with a
> non-blocking send. As mentioned above, I can't reproduce the problem
> systematically.
>
>>
>>> - Finally, what might interrupt the syscall? Any recommended reading on this?
>>
>> Something in your app is generating a signal. The technique I use to figure out these kinds of errors is to run my app under other Ruby runtimes. Most of the time they will fail differently and/or give me an exact backtrace pointing to the source of the problem.
>
> Can it also be a signal coming from outside the app, e.g. from Passenger?
>
> Or could it be caused by setting the LINGER option?
>   s.setsockopt(ZMQ::LINGER, 100)
>   ..
>   s.send(m)
>   s.close
>
> Any suggestion on this would be really welcome!
>
>>
>> Lastly, you may want to look at the ffi-rzmq gem (disclaimer: I'm its maintainer). It has a different API from the zmq gem, but it appears to enjoy wider usage in the community, so it may be a bit more stable.
>
> Thanks for the tip. I'll keep it as an option, but I'd like to understand
> what's going on too.


I think I have identified the cause of the problem: EINTR is not
handled in the rbzmq code.
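As background to the question quoted above about what might interrupt the syscall: when a signal handler runs while a thread is blocked in a system call, the call returns -1 with errno set to EINTR, unless the handler was installed with SA_RESTART and the call is one the kernel restarts. The following is a minimal POSIX sketch, independent of zmq and Ruby, that reproduces this; the names `on_alarm` and `blocked_read_errno` are hypothetical, and an empty pipe plus `alarm (1)` stand in for whatever actually signals the Passenger worker.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The handler does nothing; its only effect is that a system call
   blocked in this process returns early with EINTR. */
static void on_alarm (int sig)
{
    (void) sig;
}

/* Blocks in read() on an empty pipe until SIGALRM arrives, then returns
   the errno that read() reported (EINTR expected), or -1 if setup failed. */
static int blocked_read_errno (void)
{
    int fds[2];
    if (pipe (fds) != 0)
        return -1;

    /* No SA_RESTART: the kernel must not transparently restart read(). */
    struct sigaction sa, old;
    memset (&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset (&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction (SIGALRM, &sa, &old) != 0) {
        close (fds[0]);
        close (fds[1]);
        return -1;
    }

    alarm (1);                        /* deliver SIGALRM in about 1 s */

    char buf[1];
    ssize_t n = read (fds[0], buf, sizeof buf);
    int err = errno;                  /* save before later calls clobber it */
    assert (n == -1);                 /* nobody writes, so no data can arrive */

    alarm (0);                        /* cancel, in case it is still pending */
    sigaction (SIGALRM, &old, NULL);  /* restore the previous disposition */
    close (fds[0]);
    close (fds[1]);
    return err;
}
```

Calling `blocked_read_errno ()` should return `EINTR` after about one second; on Linux with glibc, `strerror (EINTR)` is "Interrupted system call", the same text as in the ZMQ::Error above. With `sa.sa_flags = SA_RESTART`, Linux restarts a read() on a pipe after the handler returns, so the same function would block indefinitely.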

I'm thinking of replacing this call (see the code at
https://github.com/zeromq/rbzmq/blob/master/rbzmq.c#L1573 ):

        rc = zmq_send (s, &msg, flags);

with this:

    /* Retry the send only when it was interrupted by a signal (EINTR). */
    do {
        rc = zmq_send (s, &msg, flags);
    } while (rc != 0 && zmq_errno () == EINTR);
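The retry loop in the patch can be exercised without libzmq. In this sketch `send_fn`, `send_retrying_on_eintr`, `fake_send` and `always_eagain` are hypothetical names; the fakes only imitate the convention of the zmq_send call above (0 on success, -1 with errno set on failure), and errno is read directly where rbzmq would call zmq_errno ().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical send callback: returns 0 on success, -1 with errno set
   on failure. The socket, message and flags are folded into ctx. */
typedef int (*send_fn) (void *ctx);

/* Retries the send for as long as it fails with EINTR; success and any
   other failure are returned to the caller unchanged. */
static int send_retrying_on_eintr (send_fn send_once, void *ctx)
{
    assert (send_once != NULL);
    int rc;
    do {
        rc = send_once (ctx);
    } while (rc != 0 && errno == EINTR);
    return rc;
}

/* Hypothetical stand-in for zmq_send(): fails with EINTR while the
   counter pointed to by ctx is positive, then succeeds. */
static int fake_send (void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        --*remaining;
        errno = EINTR;
        return -1;
    }
    return 0;
}

/* Hypothetical stand-in that always fails with a non-EINTR error. */
static int always_eagain (void *ctx)
{
    (void) ctx;
    errno = EAGAIN;
    return -1;
}
```

With a counter of 3, `send_retrying_on_eintr (fake_send, &counter)` makes four calls and returns 0, while `always_eagain` comes back immediately as -1 with errno left at EAGAIN, so only interruptions are retried. Like the patch, the loop is unbounded; capping the number of attempts would be a separate design decision.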

I've run it successfully in my staging environment. Any contraindications?

thx

Raph


