[zeromq-dev] 0MQ-based proxy worker crashes with "Assertion failed: pipe (bundled/zeromq/src/session_base.cpp:441)"

Tomas Krajca t.l.krajca at gmail.com
Tue Sep 16 12:59:53 CEST 2014


So the proxy ran against libzmq master for about 15 hours without a crash (over 1,000,000 requests); then I had to stop it. :(

We are now working on a different theory. Apparently, using multiprocessing, threading and logging together in Python can cause problems (deadlocks, etc.). I think the proxy worker might have been hitting some sort of race condition that caused 0MQ to crash (is that possible?). Anyway, we'll keep working on this theory.
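
One commonly described way this combination goes wrong: a process that forks while another thread holds a logging lock leaves the child with a lock that is never released. A minimal sketch of one mitigation, assuming Python 3 and the standard library only (the proxy in this thread runs on Python 2.7), is to have workers push log records onto a queue and let a single listener write them out. The names worker_logger, start_listener and ListHandler are hypothetical; nothing here is taken from the proxy's code.

```python
import logging
import logging.handlers
import queue


def worker_logger(log_queue, name="proxy.worker"):
    """Return a logger whose only handler pushes records onto log_queue."""
    logger = logging.getLogger(name)
    logger.handlers = [logging.handlers.QueueHandler(log_queue)]
    logger.propagate = False  # keep records away from handlers on the root logger
    logger.setLevel(logging.INFO)
    return logger


def start_listener(log_queue, *handlers):
    """Start a background thread that feeds queued records to the handlers."""
    listener = logging.handlers.QueueListener(log_queue, *handlers)
    listener.start()
    return listener


class ListHandler(logging.Handler):
    """Demo-only handler that keeps the logged messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


# In-process demonstration with a plain queue.Queue.
log_queue = queue.Queue()
collected = ListHandler()
listener = start_listener(log_queue, collected)
worker_logger(log_queue).info("request %d done", 7)
listener.stop()  # handles everything queued so far, then joins the thread
print(collected.messages)
```

Across processes, the same two functions would take a queue created with multiprocessing.get_context("spawn").Queue() instead of queue.Queue. Whether any of this is related to the assertion is untested.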

Is there any plan for releasing libzmq 4.0.5?

Thanks,
Tomas

> On 16 Sep 2014, at 12:05 pm, Tomas Krajca <t.l.krajca at gmail.com> wrote:
> 
> Hi,
> I've got a few more observations that I made over the weekend.
> 
> It crashes whether I set linger=1 or linger=-1.
> It crashes whether it runs with gevent threads or POSIX threads.
> It crashes whether the DEALER in the master process talks over ipc or tcp with the REP workers.
> 
> I also tried to "recycle" my threaded workers: each one processes 1024 requests and then shuts down, and the worker process spawns a new one to replace the "dead" one. It crashed with the same assertion again.
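
The recycling scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the proxy's code: it uses a standard-library queue and one plain thread at a time in place of the 0MQ sockets and the pool of workers, and the names worker, supervise and MAX_REQUESTS are hypothetical.

```python
import queue
import threading

MAX_REQUESTS = 1024  # per-worker limit mentioned in the message


def worker(requests, results, finished, max_requests):
    """Handle at most max_requests items, then return so the thread ends."""
    for _ in range(max_requests):
        item = requests.get()
        if item is None:  # sentinel: no more work is coming
            finished.set()
            return
        results.append(item * 2)  # placeholder for the real HTTP round trip


def supervise(requests, results, max_requests=MAX_REQUESTS):
    """Run one worker at a time, replacing it each time it retires.

    Returns the number of worker threads that were started.
    """
    finished = threading.Event()
    started = 0
    while not finished.is_set():
        thread = threading.Thread(
            target=worker, args=(requests, results, finished, max_requests)
        )
        thread.start()
        started += 1
        thread.join()  # the worker retired (or saw the sentinel)
    return started


# Five requests with a limit of two per worker, followed by the sentinel.
requests = queue.Queue()
for n in range(5):
    requests.put(n)
requests.put(None)
results = []
started = supervise(requests, results, max_requests=2)
print(started, results)
```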
> 
> I have been running it against libzmq and libsodium masters with pyzmq 14.3.1 for about 12 hours, no crash yet - almost 1.000.000 successful requests.
> 
> The annoying part is that it takes a long time to crash: a few hundred thousand requests (hours and hours of uptime) before a crash is normal.
> 
> 
> 
>> On Sat, Sep 13, 2014 at 12:25 PM, Tomas Krajca <t.l.krajca at gmail.com> wrote:
>> Thanks, that's a good idea, I'll give that a go on Monday.
>> 
>> Tomas
>> 
>>> On 13 Sep 2014, at 1:59 am, Martin Hurton <hurtonm at gmail.com> wrote:
>>> 
>>> Hi Tomas, can you please check with the master and report back? Thanks.
>>> 
>>>> On Sep 9, 2014 3:44 AM, "Tomas Krajca" <t.l.krajca at gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> I've got a 0MQ-based proxy: clients talk 0MQ to the proxy, and the proxy then talks HTTP to do either a GET or a POST on a specific URL endpoint (it always goes to one of these two endpoints). I've got a master process that has a zmq.ROUTER towards its clients (zmq.REQ) and a zmq.DEALER towards its workers (zmq.REP). The master is a single process with no threading, using normal 0MQ. It spawns a worker process via multiprocessing; this worker process uses gevent and zmq.green to spawn the actual (green) workers, which use grequests to talk HTTP. The master uses 0MQ auth to authenticate its clients. It should all be pretty standard, but note that this is my first gevent/zmq.green based project.
>>>> 
>>>> So this proxy runs pretty well until the worker process (I run only 1 worker process) crashes with "Assertion failed: pipe (bundled/zeromq/src/session_base.cpp:441)" on its stderr. There is nothing else in the logs or on stdout that would give me any more idea of what is going on. I can see the master running and can netcat to its zmq.ROUTER, so it's definitely the worker that dies. Sorry, I have no idea how to reproduce this: the first time it crashed after 5 hours of working nicely, the second time after about a day.
>>>> 
>>>> Here is a snippet of the worker code (the relevant bits): http://pastebin.com/usi0FXDL
>>>> 
>>>> The STSDBResponder uses grequests to do the HTTP, there should be nothing special about that.
>>>> 
>>>> This happens on CentOS 6.5, the proxy is running in virtualenv (pip install pyzmq):
>>>> 
>>>> Python 2.7.6 (default, Jul 10 2014, 04:59:13) 
>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> >>> import zmq
>>>> >>> zmq.zmq_version()
>>>> '4.0.4'
>>>> >>> zmq.__version__
>>>> '14.3.1'
>>>> >>> 
>>>> 
>>>> I have no idea whether this is a libzmq bug, a pyzmq bug, a bug in my code or a system misconfiguration (do I need to increase ulimit or something?). I run 64 gevent threads. I looked at session_base.cpp, but it didn't help me understand why this could happen either.
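
On the ulimit question, a quick way to see how close a process is to its file-descriptor limit is sketched below. It assumes Linux (it reads /proc/self/fd, which matches the CentOS setup described here) and uses only the standard library; the function name fd_usage is hypothetical. It only reports numbers and does not show whether the limit is related to the assertion.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Each entry in /proc/self/fd is one open descriptor (Linux only).
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


open_fds, soft, hard = fd_usage()
print("open fds: %d, soft limit: %d, hard limit: %d" % (open_fds, soft, hard))
```

Calling fd_usage() periodically from inside the worker process and logging the result would show whether the descriptor count climbs over the hours before a crash.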
>>>> 
>>>> If anybody could please point me to a direction as to why the worker crashes, it would be much appreciated.
>>>> 
>>>> Thanks,
>>>> Tomas
>>>> 
>>>> _______________________________________________
>>>> zeromq-dev mailing list
>>>> zeromq-dev at lists.zeromq.org
>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> 