[zeromq-dev] assertion at mailbox.cpp:182

Pavel Gushcha pavimus at gmail.com
Sun Jan 30 15:35:28 CET 2011

Hi, All!

It seems that I have the same problem as Zoufal Andreas:
At some point, 2 applications in my system started to crash
regularly with the assert zmq_assert (new_sndbuf > old_sndbuf). Earlier
I saw this assert occasionally while the programs were shutting down.
But now I see this error regularly (sometimes the applications run for
12 hours, sometimes only 3).

Info about my scenario:
application1 has an ST_PUSH TCP socket with ZMQ_HWM set to 10.
I'm running 3 instances of it on 3 different servers.
application2 has an ST_PULL TCP socket with ZMQ_HWM set to 1000.
I'm running 1 instance of it on a 4th server.

The last time, I saw the following: the first instance of application1
asserted, a few minutes later the second instance of application1
asserted, and a few minutes after that application2 asserted.

I'm using zeromq-2.1.0 on Hardened Gentoo (kernel 2.6.36).

Here are my limits:
# sysctl -a | grep net.core
error: permission denied on key 'net.ipv4.route.flush'
error: permission denied on key 'net.ipv6.route.flush'
net.core.somaxconn = 128
net.core.xfrm_aevent_etime = 10
net.core.xfrm_aevent_rseqth = 2
net.core.xfrm_larval_drop = 1
net.core.xfrm_acq_expires = 30
net.core.wmem_max = 131071
net.core.rmem_max = 131071
net.core.wmem_default = 126976
net.core.rmem_default = 126976
net.core.dev_weight = 64
net.core.netdev_max_backlog = 1000
net.core.netdev_tstamp_prequeue = 1
net.core.message_cost = 5
net.core.message_burst = 10
net.core.optmem_max = 20480
net.core.rps_sock_flow_entries = 0
net.core.netdev_budget = 300
net.core.warnings = 1

I'm not very familiar with zmq2 internals, but as I understand it, the
socket buffer must be larger than the message_t class. And message_t is
small compared to my net.core.wmem_default = 126976. Maybe
zmq::mailbox_t::send also tries to increase the socket buffer when the
buffer is already large but full?
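
To make that guess concrete, here is a minimal sketch in Python (not
zmq code) of what would happen if the sender doubled SO_SNDBUF every
time the buffer was full. It assumes the Linux rule described in
socket(7): the requested size is capped at net.core.wmem_max and the
kernel then doubles it. The function name and the doubling strategy are
assumptions for illustration only, and the kernel's minimum buffer size
is ignored.

```python
def simulate_sndbuf_growth(wmem_default, wmem_max, max_attempts=8):
    """Model repeated 'double SO_SNDBUF' attempts under the Linux cap.

    Returns the effective buffer sizes in order. The list ends at the
    last size that was actually larger than the previous one; the next
    attempt would leave the buffer unchanged.
    """
    sizes = [wmem_default]
    for _ in range(max_attempts):
        requested = sizes[-1] * 2
        # Assumed Linux behaviour: cap the request at wmem_max, then
        # double it to leave room for kernel bookkeeping overhead.
        effective = min(requested, wmem_max) * 2
        if effective <= sizes[-1]:
            break  # no growth: this is where new_sndbuf > old_sndbuf fails
        sizes.append(effective)
    return sizes


if __name__ == "__main__":
    # Values taken from the sysctl output above.
    print(simulate_sndbuf_growth(126976, 131071))
```

Under these assumptions, with the limits listed above the buffer can
grow only once (126976 to 262142); a second attempt returns the same
size, which would match an assertion of the form
new_sndbuf > old_sndbuf.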

It is very hard for me to write test programs and a scenario that
reproduce this error. And I can't say that the problem is in the
ST_PULL/ST_PUSH sockets between these applications, because in total I
have 15 different applications (about 120-150 running instances on
12 servers) that have many connections between each other.

Maybe somebody can suggest how I can localize the problem?
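
One possible check, sketched here in Python using only the standard
socket module: create a local socket pair, keep doubling SO_SNDBUF, and
record how far the kernel lets it grow on a given server. This assumes
the mailbox is backed by a local socket pair; the function name
probe_sndbuf_growth is hypothetical and the script does not touch zmq
itself.

```python
import socket


def probe_sndbuf_growth(max_attempts=16):
    """Return the SO_SNDBUF sizes a local socket pair actually reaches."""
    a, b = socket.socketpair()
    try:
        sizes = [a.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)]
        for _ in range(max_attempts):
            try:
                # Ask for twice the current size; the kernel may cap it.
                a.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF,
                             sizes[-1] * 2)
            except (OSError, OverflowError):
                break  # the platform rejected the request outright
            new_size = a.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
            if new_size <= sizes[-1]:
                break  # the buffer did not grow any further
            sizes.append(new_size)
        return sizes
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(probe_sndbuf_growth())
```

If the list stops after one or two entries on the affected servers,
that would suggest the buffer limit is reached quickly there.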

Thanks for help!

