[zeromq-dev] possible ZeroMQ bugs

Martin Sustrik sustrik at 250bpm.com
Tue Jan 11 08:32:47 CET 2011


Hi Shannon,

I'm going to answer at the libzmq level; the pyzmq devs may elaborate on 
interactions with Python.

> I think I may have found a few bugs, although I have a suspicion they
> may have been covered before.  Hopefully I can point out at least one
> or a couple of new ones, though ;)
>
> My sample code is here: http://pastebin.com/4FC89sNc
>
> 1. In the code, if you replace tcp://127.0.0.1:7676 with
> http://127.0.0.1:7676 (i.e. if your fingers accidentally type "http"
> instead of "tcp"), you don't get an error.

I'm getting "protocol not supported" error.
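
For anyone who wants to check this from Python, here is a minimal sketch, assuming pyzmq is installed. The helper name try_bind is made up for illustration; it only reports the errno that libzmq returns when bind() is given an endpoint.

```python
import errno
import zmq

def try_bind(endpoint):
    """Return None if bind() succeeds, else the errno reported by libzmq.

    try_bind is a hypothetical helper for this illustration, not part of pyzmq.
    """
    context = zmq.Context()
    sock = context.socket(zmq.REP)
    sock.setsockopt(zmq.LINGER, 0)  # do not block on pending data at shutdown
    try:
        sock.bind(endpoint)
        return None
    except zmq.ZMQError as exc:
        return exc.errno
    finally:
        sock.close()
        context.term()

code = try_bind("http://127.0.0.1:7676")
# On the setup described above this should be EPROTONOSUPPORT.
print(code == errno.EPROTONOSUPPORT, errno.errorcode.get(code))
```

If no error shows up in your environment, the pyzmq and libzmq version numbers would be useful in a report.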

> 2. In the code, if you replace tcp://127.0.0.1:7676 with
> tcp://localhost:7676, it says "No such device" which is kind of a
> mysterious error.  "Hostnames are not allowed" might be better when
> you're dealing with tcp://.

The meaning of the error is: with bind, you bind to a network interface, 
not to a host.

The error is ENODEV. Have a look at the POSIX error codes. Feel free to 
propose one that describes the problem better.
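
One possible workaround on the application side, sketched under the assumption that a numeric IPv4 address is acceptable for bind: resolve the name yourself and hand the numeric form to bind(). The function name bind_endpoint is hypothetical and uses only the Python standard library.

```python
import socket

def bind_endpoint(host, port):
    """Build a tcp:// endpoint whose host part is a numeric IPv4 address.

    bind_endpoint is a hypothetical helper, not part of 0MQ or pyzmq.
    """
    # getaddrinfo accepts both host names and numeric addresses;
    # AF_INET restricts the results to IPv4.
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    address = infos[0][4][0]  # first result, sockaddr tuple, address string
    return "tcp://%s:%d" % (address, port)

print(bind_endpoint("127.0.0.1", 7676))  # tcp://127.0.0.1:7676
```

Passing "localhost" instead of "127.0.0.1" would go through the system resolver, so the result then depends on the local hosts configuration.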

> 3. If you run the code as is, it'll hang forever in context.term().  I
> think it has to do with the fact that I have two threads.  However,
> the code is very simple, and by the time context.term() is run, one
> thread has joined with the other.  I don't see any reason why the code
> should hang.

To avoid the hang you have to:

1. Close all the sockets prior to calling zmq_term().
2. Set the ZMQ_LINGER socket option to 0 on all sockets to prevent 0MQ 
from trying to send the pending data on zmq_term().
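
The two steps above can be sketched in pyzmq roughly as follows. This is an illustration only, assuming pyzmq is installed; the function name run_and_shutdown and the PUSH socket type are choices made for the example, not taken from the pastebin code.

```python
import zmq

def run_and_shutdown(endpoint="tcp://127.0.0.1:7676"):
    """Queue one message to a peer that may not exist, then shut down cleanly."""
    context = zmq.Context()
    sock = context.socket(zmq.PUSH)

    # Step 2: with LINGER set to 0, pending messages are discarded on close
    # instead of keeping the context alive until they are delivered.
    sock.setsockopt(zmq.LINGER, 0)

    sock.connect(endpoint)
    try:
        sock.send(b"hello", zmq.NOBLOCK)  # may stay queued forever without a peer
    except zmq.Again:
        pass  # nothing could be queued; irrelevant for the shutdown sequence

    # Step 1: close every socket before terminating the context.
    sock.close()
    context.term()  # should return promptly once both steps are done
    return context.closed

print(run_and_shutdown())
```

With more than one thread, each thread would need to do the same for the sockets it owns before the context is terminated.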

> By the way, it hangs hard--you have to kill -9 it.

That's interesting. Possibly one more strange interaction with the Python VM.

> 4. In general, the code doesn't respond well to keyboard interrupts.
> For instance, even if you comment out all the lines that start with
> thread in my code sample, hitting Ctrl-C doesn't really do the right
> thing.  I've seen it fail in 3 different ways, but I've never seen it
> raise a KeyboardInterrupt and exit the program.  I'm just going to
> take a guess and say that it may be necessary to mark ZeroMQ's thread
> as a daemon thread, but I haven't looked at the code.

There has been extensive discussion about that one. The conclusion is 
that the POSIX signal system is fundamentally broken and 100% correct 
handling of SIGINT is *impossible*.

That said, individual cases can be fixed so that they behave correctly 
most of the time. Please do report the problems you have seen.
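
As an illustration of the kind of case that can usually be made to behave, here is a general Python pattern (not a 0MQ API, and not something prescribed in this thread): keep each blocking step short and check a flag set by a SIGINT handler. The names stop_requested and run_until_interrupted are hypothetical, and the sketch assumes it runs in the main thread, where Python allows signal handlers to be installed.

```python
import signal
import threading

stop_requested = threading.Event()

def _on_sigint(signum, frame):
    # Do as little as possible inside the handler: just record the request.
    stop_requested.set()

def run_until_interrupted(step, interval=0.1):
    """Call step() repeatedly until SIGINT arrives; return the number of calls."""
    stop_requested.clear()
    previous = signal.signal(signal.SIGINT, _on_sigint)
    steps = 0
    try:
        while not stop_requested.is_set():
            step()  # should return quickly, e.g. a poll with a timeout
            steps += 1
            stop_requested.wait(interval)
    finally:
        signal.signal(signal.SIGINT, previous)  # restore the old handler
    return steps
```

This does not contradict the conclusion above: the handler still only runs when the interpreter gets control back, so a step that blocks indefinitely inside native code would defeat it.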

Martin


