[zeromq-dev] 0MQ-based proxy worker crashes with "Assertion failed: pipe (bundled/zeromq/src/session_base.cpp:441)"
Tomas Krajca
t.l.krajca at gmail.com
Fri Sep 12 14:10:42 CEST 2014
Thanks Pieter. I don't have a minimal case yet. I am trying to narrow it
down, but I am struggling a bit since I am not a C++ programmer and I
don't know much about 0MQ internals.
The main problem is that I don't get any Python stack trace or anything
like that: the proxy worker crashes with the 0MQ assertion and that's
it. If I could at least narrow it down to a specific line of Python code,
that would really help me understand where the problem could be. I
might then be able to come up with some theories that I could test,
confirm, or discard.
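One way to get a Python-level traceback out of a native crash could look like the sketch below. It rests on two assumptions: that the failed libzmq assertion ends the process via abort() (SIGABRT), and that the faulthandler module is available (standard library in Python 3, a pip-installable backport for Python 2.7). The child script and the poll_loop name are hypothetical stand-ins for the worker, and os.abort() only simulates the assertion.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for the worker process. os.abort() raises SIGABRT,
# which is assumed here to be how the failed libzmq assertion ends the process.
CHILD_SOURCE = textwrap.dedent("""
    import faulthandler
    import os
    import sys

    # Dump the Python stack of every thread to stderr on SIGSEGV, SIGFPE,
    # SIGABRT, SIGBUS and SIGILL.
    faulthandler.enable(file=sys.stderr, all_threads=True)

    def poll_loop():
        os.abort()  # simulated crash inside native code

    poll_loop()
""")


def run_child():
    """Run the stand-in worker and return (exit status, captured stderr)."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD_SOURCE],
                            stderr=subprocess.PIPE)
    _, err = proc.communicate()
    return proc.returncode, err.decode("utf-8", "replace")


if __name__ == "__main__":
    status, stderr_text = run_child()
    print(status)
    print(stderr_text)
```

In the real worker only the faulthandler.enable(...) call would be needed, placed at the top of the worker's entry point. With gevent the dump shows whichever greenlet was running when the signal arrived, which may or may not be the one that triggered the problem.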
From digging through the zeromq 4.x source code, my best guess is that
the assertion is raised in timer_event(), which is called from
execute_timers() in poller_base.cpp, which in turn is called from a
system-specific poller (e.g. epoll or select) when polling for events.
That would then suggest that the crash happens in *socks =
dict(poller.poll())* in my Python code (http://pastebin.com/usi0FXDL).
Is that a correct observation/theory?
Can you think of any reasons why that assertion could fail? Why could it
fail after succeeding so many times before that? As I said, the proxy
processed about 80000 requests before crashing, so it feels like it's
hitting some sort of limit. I raised the ulimit for the number of open
files to 10000 but that didn't help.
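To test the limit theory, the worker could log its descriptor count periodically and see whether it climbs toward the limit before the crash. A minimal sketch, assuming Linux (it reads /proc/self/fd); fd_usage is a hypothetical helper name, not part of pyzmq:

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /proc/self/fd has one entry per open descriptor (Linux only). Listing
    # the directory briefly opens a descriptor itself, so the count can be
    # off by one.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


if __name__ == "__main__":
    open_fds, soft, hard = fd_usage()
    print("open fds: %d, soft limit: %s, hard limit: %s"
          % (open_fds, soft, hard))
```

If the count stays flat across tens of thousands of requests, a descriptor leak becomes a less likely explanation; if it grows steadily, that would be worth chasing first.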
If you can think of anything that you would like me to test, I am more
than happy to do so.
Thanks,
Tomas
On 09/12/2014 04:47 PM, Pieter Hintjens wrote:
> Is there any way you can cut down your code to a minimal case that you
> can get to crash? That is usually the best way to get the problem
> resolved.
>
> On Fri, Sep 12, 2014 at 8:26 AM, Tomas Krajca <t.l.krajca at gmail.com> wrote:
>> Does anybody have any idea about the original proxy crash? The proxy did about 80000 requests just fine today and then it crashed with the pipe assertion again, really weird.
>>
>> Thanks,
>> Tomas
>>
>>> On 10 Sep 2014, at 10:51 am, Tomas Krajca <t.l.krajca at gmail.com> wrote:
>>>
>>> Thanks Justin, zurl is definitely worth looking at.
>>>
>>> Regards,
>>> Tomas
>>>
>>>> On 9 Sep 2014, at 12:35 pm, Justin Karneges <justin at affinix.com> wrote:
>>>>
>>>> Hi Tomas,
>>>>
>>>> This does not answer your question at all, but you might be interested
>>>> in the Zurl project. It is a 0MQ daemon that does HTTP requests. You can
>>>> speak to it with REQ/REP.
>>>>
>>>> https://github.com/fanout/zurl
>>>>
>>>>> On 09/08/2014 06:44 PM, Tomas Krajca wrote:
>>>>> Hi,
>>>>>
>>>>> I've got a 0MQ-based proxy: clients talk 0MQ to the proxy, and the proxy
>>>>> then talks HTTP to do either a GET on a specific URL endpoint or a POST
>>>>> on a specific endpoint (it always goes to one of these two URL
>>>>> endpoints). I've got a master process that has a zmq.ROUTER towards its
>>>>> clients (zmq.REQ) and a zmq.DEALER towards its workers (zmq.REP). The
>>>>> master is a single process, no threading, normal 0MQ. It spawns a worker
>>>>> process via multiprocessing, and this worker process uses gevent and
>>>>> zmq.green to spawn the actual (green) workers (those use grequests to
>>>>> talk HTTP). The master uses 0MQ auth to authenticate its clients. It
>>>>> should all be pretty standard, but note that this is my first
>>>>> gevent/zmq.green based project.
>>>>>
>>>>> So this proxy runs pretty well until the worker process (I run only 1
>>>>> worker process) crashes with "Assertion failed: pipe
>>>>> (bundled/zeromq/src/session_base.cpp:441)" on its stderr. There is
>>>>> nothing else in the logs or on stdout that would give me any more idea
>>>>> of what is going on. I can see the master running and can netcat to its
>>>>> zmq.ROUTER, so it's definitely the worker that dies. Sorry, I have no
>>>>> idea how to reproduce this: the first time it crashed after 5 hours of
>>>>> working nicely, the second time after about a day.
>>>>>
>>>>> Here is a snippet of the worker code (the relevant bits):
>>>>> http://pastebin.com/usi0FXDL
>>>>>
>>>>> The STSDBResponder uses grequests to do the HTTP, there should be
>>>>> nothing special about that.
>>>>>
>>>>> This happens on CentOS 6.5, the proxy is running in virtualenv (pip
>>>>> install pyzmq):
>>>>>
>>>>> Python 2.7.6 (default, Jul 10 2014, 04:59:13)
>>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> >>> import zmq
>>>>> >>> zmq.zmq_version()
>>>>> '4.0.4'
>>>>> >>> zmq.__version__
>>>>> '14.3.1'
>>>>>
>>>>> I have no idea whether this is a libzmq bug or pyzmq bug or a bug in my
>>>>> code or a system misconfiguration (do I need to increase ulimit or
>>>>> something?). I run 64 gevent threads. I looked at session_base.cpp
>>>>> but it didn't help me understand why this could happen either.
>>>>>
>>>>> If anybody could please point me to a direction as to why the worker
>>>>> crashes, it would be much appreciated.
>>>>>
>>>>> Thanks,
>>>>> Tomas
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> zeromq-dev mailing list
>>>>> zeromq-dev at lists.zeromq.org
>>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev