[zeromq-dev] ZMQ_RCVMORE Set, But recv() Blocks
Gregory Szorc
gregory.szorc at gmail.com
Sun Nov 28 02:58:27 CET 2010
"Martin Sustrik" wrote in message news:4CF19DF0.2030806 at 250bpm.com...
> It looks like a bug IMO. If RCVMORE is set, the subsequent recv() should
> not block.
>
> A simple test program that reproduces the problem would be helpful.
I tried reducing my program to a simple repro case. Naturally, the reduced
version doesn't reproduce the problem.
I've been doing more investigation of my program. It turns out that a thread
exits right before the error occurs.
The flow is something like the following (PULL/PUSH and PUB/SUB are the
related socket pairs):
tmain - main thread started
tmain - PULL.bind()
tmain - create thread "tworker"
tmain - create thread "tother"
tmain - message poll/process loop (from original email)
tworker - PUSH.connect()
tworker - PUSH.send(some message)
tworker - SUB.connect() - throws error_t 0MQ exception
tworker - Catches exception. Returns from thread start function. Thread
exits.
tother - PUB.bind()
tother - does stuff
tmain - RCVMORE set on PULL
tmain - PULL.recv() blocks
It turns out there is a timing bug in my program: a SUB socket attempts to
connect() before the PUB socket has called bind(), and since I'm using
inproc:// (on Linux), the connect() throws the exception shown above.
I can make my program work (and avoid this 0MQ bug) by fixing the timing
problem and ensuring the SUB connects after the PUB binds. If I force the
bad ordering, I can repro the RCVMORE+recv() block in my code 100% of the
time.
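A minimal sketch of that ordering fix, in Python with only the standard
library rather than 0MQ itself. The bind() and connect() functions and the
"inproc://events" name are hypothetical stand-ins that only mimic the
behaviour described above (connect fails unless the endpoint is already
bound); they are not the 0MQ API. The point is the threading.Event that
makes tworker wait until tother has bound.

```python
import threading


def run(endpoint="inproc://events"):
    """Start tworker and tother; return what tworker observed."""
    # Hypothetical stand-in for the inproc endpoint table (not 0MQ code).
    bound_endpoints = set()
    registry_lock = threading.Lock()
    pub_bound = threading.Event()
    results = []

    def bind(name):
        # Stand-in for PUB.bind(): register the endpoint name.
        with registry_lock:
            bound_endpoints.add(name)

    def connect(name):
        # Stand-in for SUB.connect(): fail if nobody has bound yet.
        with registry_lock:
            if name not in bound_endpoints:
                raise ConnectionError("connect before bind: " + name)

    def tother():
        bind(endpoint)
        pub_bound.set()      # signal that the endpoint now exists

    def tworker():
        pub_bound.wait()     # block until tother has bound
        try:
            connect(endpoint)
            results.append("connected")
        except ConnectionError as exc:
            results.append("failed: %s" % exc)

    # tworker is started first on purpose: the Event, not the start
    # order, is what guarantees bind-before-connect.
    threads = [threading.Thread(target=tworker),
               threading.Thread(target=tother)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Removing the pub_bound.wait() call brings back the race: tworker may then
reach connect() before tother has bound, which is the bad ordering that
triggers the problem in my program.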
I'm thinking the underlying 0MQ bug has something to do with the thread
termination. The exiting thread is exiting cleanly by returning from its
start routine. There are no uncaught exceptions, etc. (The thread does
terminate because of an exception from 0MQ's C++ binding, but that exception
is caught inside the thread.)
I feel bad not being able to come up with a clean repro and I hate saying
this because I hate hearing it on the other end, but since my program is
open source and I can repro 100%, I could send you a link and you should be
able to get a repro running in about 5 minutes. Interested?
Greg