[zeromq-dev] need help with linux platform

Chuck Remes cremes.devlist at mac.com
Wed Feb 16 01:52:10 CET 2011


On Feb 15, 2011, at 6:00 PM, Chuck Remes wrote:

> Due to some ongoing issues with 0mq on OSX, I switched over to using my linux box as the main dev and test server. 
> 
> I am running a very recent master from the last day or two, so it's all 2.1.0.
> 
> My systems do a lot of high-volume communication amongst 4 distributed components. They connect strictly via the tcp transport (no inproc or ipc). After switching to linux (archlinux running the 2.6.35 kernel) I started getting the mailbox assertion after it ran for a few hours.
> 
>  Assertion failed:  new_sndbuf > old_sndbuf (mailbox.cpp:182)

So I added a little debug print before the assertion in mailbox.cpp (which is why the assertion now reports line 183 instead of 182). Here is what prints out:

Assertion failed: new_sndbuf > old_sndbuf (mailbox.cpp:183)

new_sndbuf = 2097152, old_sndbuf = 524288

new_sndbuf = 8388608, old_sndbuf = 2097152

new_sndbuf = 2097152, old_sndbuf = 524288

new_sndbuf = 10485760, old_sndbuf = 8388608

new_sndbuf = 10485760, old_sndbuf = 10485760


As you can see, the buffer grows in a tight loop until the last attempt, where new_sndbuf comes back equal to old_sndbuf (10485760) and the assertion fires. My code isn't doing anything special either, though it's hard to say for certain because the abort prevents me from seeing which specific application code leads to this condition.
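
To make the pattern in those numbers easier to read, here is a small sketch that computes the growth factor for each printed pair and checks whether it would satisfy new_sndbuf > old_sndbuf. The pairs are copied from the debug output above; the names `samples` and `analyze` are made up for this illustration and are not part of 0mq.

```python
# (new_sndbuf, old_sndbuf) pairs copied verbatim from the debug output.
samples = [
    (2097152, 524288),
    (8388608, 2097152),
    (2097152, 524288),
    (10485760, 8388608),
    (10485760, 10485760),
]


def analyze(pairs):
    """Return one record per pair with the growth factor and whether
    the pair would pass the check new_sndbuf > old_sndbuf."""
    rows = []
    for new, old in pairs:
        rows.append({
            "old": old,
            "new": new,
            "factor": new / old,
            "grew": new > old,
        })
    return rows


for row in analyze(samples):
    print("%10d -> %10d  x%.2f  %s" % (
        row["old"], row["new"], row["factor"],
        "ok" if row["grew"] else "ASSERT",
    ))
```

The first three pairs grow by a factor of 4, the fourth only by 1.25, and the last not at all, so only the last one trips the assertion. That looks like the buffer running into a ceiling at 10485760 bytes (10 MiB), although the numbers alone do not say where that ceiling comes from.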

Also, no other messages are printed between these debug prints in the 0mq library. My code is doing a lot of work (in a reactor, so it's single-threaded) yet it doesn't get a chance to print anything out while this buffer is being expanded 5 times in a row.

Lastly, while watching this print to a console in real-time, I saw that there was a noticeable pause *before* the first print came up. I don't know what the pause signifies; perhaps the OS was blocked on something?

Unfortunately, the failing component is part of a distributed system, so creating a small reproducible example is likely impossible.
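
Even without a full reproduction, the send-buffer ceiling can be probed in isolation. The sketch below is not 0mq code. It assumes the mailbox's socket behaves like an ordinary local socket pair, and it simply keeps doubling SO_SNDBUF until the value reported by the OS stops growing. The function name `probe_sndbuf_ceiling` is made up for this example. On Linux the kernel caps the requested size at net.core.wmem_max and reports twice the value it stored, so the plateau will depend on that sysctl.

```python
import socket


def probe_sndbuf_ceiling(max_rounds=32):
    """Repeatedly request a doubled SO_SNDBUF on one end of a socket
    pair and return every size the OS reported, starting with the
    initial one. The loop stops once the size no longer grows."""
    a, b = socket.socketpair()
    history = []
    try:
        old = a.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        history.append(old)
        for _ in range(max_rounds):
            try:
                a.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, old * 2)
            except (OSError, OverflowError):
                # Some platforms reject an oversized request outright
                # instead of silently clamping it.
                break
            new = a.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
            history.append(new)
            if new <= old:
                # The condition that would fail new_sndbuf > old_sndbuf.
                break
            old = new
    finally:
        a.close()
        b.close()
    return history


if __name__ == "__main__":
    for size in probe_sndbuf_ceiling():
        print(size)
```

If the last two values printed are equal, the OS refused to grow the buffer any further, which is the same situation as the final debug line above. Comparing that plateau with the output of `sysctl net.core.wmem_max` on the affected box may show whether the limit is a system setting.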

Any suggestions?

cr



