[zeromq-dev] 0MQ/2.0 segmentation fault for PUB/bind, SUB/connect
Ben Dyer
ben.dyer at taguchimail.com
Mon Oct 19 08:27:37 CEST 2009
Hi,
I've been experimenting with the PUB/SUB functionality in 0MQ/2.0
(both alpha 3 and the latest revision from GitHub), and have come
across a segmentation fault which could either be the result of me
doing something unsupported, or a bug in pipe termination.
The following works fine:
* Create a PUB socket in Process A, connect to tcp://127.0.0.1:5660
* Create a SUB socket in Process B, bind to tcp://0.0.0.0:5660
* Send a message from Process A
* Receive the message from Process B, then terminate
* Send another message from Process A
However, using connect/bind the other way round crashes Process A when
Process B terminates:
* Create a PUB socket in Process A, bind to tcp://0.0.0.0:5660
* Create a SUB socket in Process B, connect to tcp://127.0.0.1:5660
* Send a message from Process A
* Receive the message from Process B, then terminate
* Send another message from Process A
* Process A will terminate with a segmentation fault in pipe.cpp:84
(process_revive)
It seems to me that the second scenario -- binding on the PUB side and
connecting on the SUB side -- would be the most common case, as it
enables subscribers to connect/disconnect on an ad-hoc basis. However,
perf/python/local_thr.py and perf/python/remote_thr.py work as
described in the first scenario, which makes me wonder if I'm missing
something?
If the second scenario is meant to be supported, here's some more
detail on the issue, and a potential fix:
process-a.py:
import libpyzmq, time
ctx = libpyzmq.Context(1,1)
s = libpyzmq.Socket(ctx, libpyzmq.PUB)
s.bind('tcp://0.0.0.0:5660')
while True:
    s.send("\x04a.xymessage")
    time.sleep(1.0)
process-b.py:
import libpyzmq, time
ctx = libpyzmq.Context(1,1)
s = libpyzmq.Socket(ctx, libpyzmq.SUB)
s.connect('tcp://127.0.0.1:5660')
s.setsockopt(libpyzmq.SUBSCRIBE, '*')
while True:
    msg = s.recv()
    print repr(msg)
And run:
~ python process-a.py &
~ python process-b.py
^C
I believe this is caused by the reader_t peer of the PUB socket's
writers never having an endpoint set; if flushing the pipe fails,
writer_t::flush returns false, which causes a revive command to be
sent to the reader_t peer, but since endpoint is NULL, calling
endpoint->revive obviously crashes the process.
Including an "if (endpoint)" check prior to calling endpoint->revive
fixes the issue, and allows proper termination of the old connection:
// pipe.cpp:82
void zmq::reader_t::process_revive ()
{
    if (endpoint)
        endpoint->revive (this);
}
This enables the second scenario to work the way I expected. However,
I'm not familiar enough with the codebase to tell if this is the right
thing to do, if it would cause other problems, or if it's not meant to
be supported in the first place and therefore doesn't matter.
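The guard can also be tried in isolation. The following is only a sketch
under assumptions: reader_model_t, endpoint_model_t and set_endpoint are
hypothetical stand-ins invented for illustration, not the real 0MQ
classes, and the bool return value is added purely so the behaviour can
be observed from outside.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only. These are NOT the real
// 0MQ classes, just the smallest model of the reader/endpoint relationship.
struct reader_model_t;

struct endpoint_model_t {
    int revived = 0;  // counts how many revive notifications arrived
    void revive(reader_model_t *) { ++revived; }
};

struct reader_model_t {
    // Stays null until set_endpoint() is called, mirroring the case
    // described above where the reader never has an endpoint set.
    endpoint_model_t *endpoint = nullptr;

    void set_endpoint(endpoint_model_t *ep) {
        assert(ep != nullptr);
        endpoint = ep;
    }

    // Guarded handler: returns true if the revive was forwarded to the
    // endpoint, false if it was dropped because no endpoint is attached.
    // (The bool return is an assumption added for observability.)
    bool process_revive() {
        if (!endpoint)
            return false;
        endpoint->revive(this);
        return true;
    }
};
```

Calling process_revive() on a reader_model_t that never had
set_endpoint() called returns false instead of dereferencing a null
pointer; once an endpoint is attached, the call is forwarded and the
endpoint's revived counter increments.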
Regards,
Ben