[zeromq-dev] ZMQ_RCVMORE Set, But recv() Blocks

Gregory Szorc gregory.szorc at gmail.com
Sun Nov 28 02:58:27 CET 2010


"Martin Sustrik"  wrote in message news:4CF19DF0.2030806 at 250bpm.com...
> It looks like a bug IMO. If RCVMORE is set, the subsequent recv() should
> not block.
>
> A simple test program that reproduces the problem would be helpful.

I tried reducing my program to a simple repro case. Naturally, the reduced 
version doesn't reproduce the problem.

I've been doing more investigation of my program. It turns out that a thread 
exits right before the error occurs.

The flow is something like the following (PULL/PUSH and PUB/SUB are the 
related socket pairs):

tmain - main thread started
tmain - PULL.bind()
tmain - create thread "tworker"
tmain - create thread "tother"
tmain - message poll/process loop (from original email)
tworker - PUSH.connect()
tworker - PUSH.send(some message)
tworker - SUB.connect() - throws error_t 0MQ exception
tworker - Catches exception. Returns from thread start function. Thread 
exits.
tother - PUB.bind()
tother - does stuff
tmain - RCVMORE set on PULL
tmain - PULL.recv() blocks

It turns out there is a timing bug in my program: a SUB socket attempts to 
connect() before the PUB socket has called bind(), and since I'm using 
inproc:// (on Linux), the connect() fails with the exception shown above.

I can make my program work (and not hit this 0MQ bug) by fixing the timing 
problem and ensuring the SUB connects after the PUB binds. If I force the 
timing problem to occur, I can repro the RCVMORE+recv() blocking bug in my 
code 100% of the time.
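
For reference, here is a minimal sketch of one way to enforce the 
bind-before-connect ordering between two threads. It assumes C++11 
std::thread and std::promise are available. The function name 
run_bind_before_connect and the recorded event strings are made up for 
illustration, and the actual 0MQ bind()/connect() calls are only indicated in 
comments, so this is not the code from my program:

```cpp
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: one thread "binds", another "connects", and the
// connect step is only allowed to start after the bind step has finished.
// Returns the order in which the two steps actually ran.
std::vector<std::string> run_bind_before_connect() {
    std::vector<std::string> events;   // records the observed order
    std::mutex events_mutex;           // protects `events`
    std::promise<void> bound;          // fulfilled once the bind has completed
    std::shared_future<void> bound_signal = bound.get_future().share();

    std::thread tother([&] {
        // A real program would call bind() on the PUB socket here.
        {
            std::lock_guard<std::mutex> lock(events_mutex);
            events.push_back("PUB.bind");
        }
        bound.set_value();             // announce that connecting is now safe
    });

    std::thread tworker([&, bound_signal] {
        bound_signal.wait();           // block until the bind has happened
        // A real program would call connect() on the SUB socket here.
        std::lock_guard<std::mutex> lock(events_mutex);
        events.push_back("SUB.connect");
    });

    tother.join();
    tworker.join();
    return events;
}
```

The worker waits on the future before touching its socket, so the order no 
longer depends on which thread the scheduler happens to run first.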

I'm thinking the underlying 0MQ bug has something to do with the thread 
termination. The exiting thread exits cleanly by returning from its start 
routine. There are no uncaught exceptions, etc. (the thread does terminate 
because of an exception thrown by 0MQ's C++ binding, but that exception is 
caught within the thread).

I feel bad about not being able to come up with a clean repro, and I hate 
saying this because I hate hearing it on the other end, but since my program 
is open source and I can repro 100% of the time, I could send you a link and 
you should be able to get a repro running in about 5 minutes. Interested?

Greg 



