[zeromq-dev] load issue with 2.1.7

Andrew Hume andrew at research.att.com
Tue Jun 14 15:01:39 CEST 2011


at long last, i am starting to scale up my project.
that is, amongst the flotilla of processes doing stuff,
i am starting to increase the size of a couple of pools of worker processes.

i first ran into the per-process open file descriptor limit, so i raised it from 1024 to 8192.
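
A minimal Python sketch for confirming that a raised limit is what a process actually sees, and how close the process is to it. It assumes Linux (it reads /proc/self/fd), and the function name fd_report is made up for illustration; it is not part of the project described here.

```python
import os
import resource


def fd_report():
    """Return (soft_limit, hard_limit, open_count) for the current process.

    open_count is None when /proc/self/fd is unavailable (non-Linux systems).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Listing the directory briefly opens one extra descriptor,
        # so the count may be high by one.
        open_count = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_count = None
    return soft, hard, open_count


if __name__ == "__main__":
    soft, hard, open_count = fd_report()
    print(f"soft={soft} hard={hard} open={open_count}")
```

The soft limit is the one that matters for a running process; a limit raised in one shell or config file does not necessarily reach a process started some other way, so it is worth checking from inside the process itself.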

i then got this on stderr:

Assertion failed: new_sndbuf > old_sndbuf (mailbox.cpp:183)

although when i ran gdb on the core dump, the backtrace points to a different line (mailbox.cpp:178, not 183):
(gdb) bt
#0  0x00000032a1630265 in raise () from /lib64/libc.so.6
#1  0x00000032a1631d10 in abort () from /lib64/libc.so.6
#2  0x0000000004c1bd59 in zmq::mailbox_t::send (this=0x52d6fd0, cmd_=...)
   at mailbox.cpp:178
#3  0x0000000004c1d4cd in zmq::object_t::send_bind (this=0x8f645a0, 
   destination_=0x52d6ef0, in_pipe_=0x8f672f0, out_pipe_=0x0, 
   peer_identity_=..., inc_seqnum_=224) at object.cpp:267
#4  0x0000000004c24cc8 in zmq::session_t::process_attach (this=0x8f645a0, 
   engine_=0x8f5d4e0, peer_identity_=...) at session.cpp:263
#5  0x0000000004c1d7b4 in zmq::object_t::process_command (this=0x8f645a0, 
   cmd_=...) at object.cpp:88
#6  0x0000000004c1a8d0 in zmq::io_thread_t::in_event (
   this=<value optimized out>) at io_thread.cpp:83
#7  0x0000000004c1923a in zmq::epoll_t::loop (this=0x52d3ea0) at epoll.cpp:161
#8  0x0000000004c2b177 in thread_routine (arg_=0x52d3f10) at thread.cpp:73
#9  0x00000032a1e0673d in start_thread () from /lib64/libpthread.so.0
#10 0x00000032a16d40cd in clone () from /lib64/libc.so.6

the context is that this process is the overall coordinator, and all the sub processes
are sending status messages over a PUSH/PULL socket. the load shouldn't be too high;
we are going from 200ish processes to 300ish processes, and the messages are only sent every 15 seconds.
to me, it smells of file descriptor weirdness, and the gdb backtrace points the same way
(the assert on the stack is an fcntl failing on a file descriptor).
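
The assertion text compares new_sndbuf with old_sndbuf, which reads like an attempt to grow a socket send buffer. Assuming that reading is right (it is not checked here against the 2.1.7 source), a small Python probe can show whether the kernel still lets SO_SNDBUF grow on a local socket pair. The helper name try_double_sndbuf is hypothetical, and the sketch is a diagnostic aid, not a reproduction of what the library does.

```python
import socket


def try_double_sndbuf(sock):
    """Ask the kernel to double SO_SNDBUF; return (old, new) as reported."""
    old = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, old * 2)
    except OSError:
        # Some systems reject an oversized request outright.
        return old, old
    new = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return old, new


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        for _ in range(8):
            old, new = try_double_sndbuf(a)
            print(f"SO_SNDBUF {old} -> {new}")
            if new <= old:
                print("kernel did not grow the buffer any further")
                break
    finally:
        a.close()
        b.close()
```

On Linux the ceiling for an unprivileged setsockopt(SO_SNDBUF) is net.core.wmem_max, so once the reported size stops growing, that sysctl is one place to look.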

can anyone offer advice on what i might look at?
(the underlying OS is RHEL6.)


------------------
Andrew Hume  (best -> Telework) +1 623-551-2845
andrew at research.att.com  (Work) +1 973-236-2014
AT&T Labs - Research; member of USENIX and LOPSA



