[zeromq-dev] 0MQ/2.0 segmentation fault for PUB/bind, SUB/connect
Martin Sustrik
sustrik at fastmq.com
Tue Oct 20 12:26:25 CEST 2009
Hi Ben,
I've created a ticket for the issue (ZMQII-20). We'll have a look at it
shortly.
Martin
> I've been experimenting with the PUB/SUB functionality in 0MQ/2.0
> (both alpha 3 and the latest revision from GitHub), and have come
> across a segmentation fault which could either be the result of me
> doing something unsupported, or a bug in pipe termination.
>
> The following works fine:
> * Create a PUB socket in Process A, connect to tcp://127.0.0.1:5660
> * Create a SUB socket in Process B, bind to tcp://0.0.0.0:5660
> * Send a message from Process A
> * Receive the message from Process B, then terminate
> * Send another message from Process A
>
> However, using connect/bind the other way round crashes Process A when
> Process B terminates:
> * Create a PUB socket in Process A, bind to tcp://0.0.0.0:5660
> * Create a SUB socket in Process B, connect to tcp://127.0.0.1:5660
> * Send a message from Process A
> * Receive the message from Process B, then terminate
> * Send another message from Process A
> * Process A will terminate with a segmentation fault in pipe.cpp:84
> (process_revive)
>
> It seems to me that the second scenario -- binding on the PUB side and
> connecting on the SUB side -- would be the most common case, as it
> enables subscribers to connect/disconnect on an ad-hoc basis. However,
> perf/python/local_thr.py and perf/python/remote_thr.py work as
> described in the first scenario, which makes me wonder if I'm missing
> something?
>
> If the second scenario is meant to be supported, here's some more
> detail on the issue, and a potential fix:
>
> process-a.py:
> import libpyzmq, time
> ctx = libpyzmq.Context(1,1)
> s = libpyzmq.Socket(ctx, libpyzmq.PUB)
> s.bind('tcp://0.0.0.0:5660')
> while True:
>     s.send("\x04a.xymessage")
>     time.sleep(1.0)
>
> process-b.py:
> import libpyzmq, time
> ctx = libpyzmq.Context(1,1)
> s = libpyzmq.Socket(ctx, libpyzmq.SUB)
> s.connect('tcp://127.0.0.1:5660')
> s.setsockopt(libpyzmq.SUBSCRIBE, '*')
> while True:
>     msg = s.recv()
>     print repr(msg)
>
> And run:
> ~ python process-a.py &
> ~ python process-b.py
> ^C
>
> I believe this is caused by the reader_t peer of the PUB socket's
> writers never having an endpoint set; if flushing the pipe fails,
> writer_t::flush returns false, which causes a revive command to be
> sent to the reader_t peer, but since endpoint is NULL, calling
> endpoint->revive obviously crashes the process.
>
> Including an "if (endpoint)" check prior to calling endpoint->revive
> fixes the issue, and allows proper termination of the old connection:
>
> // pipe.cpp:82
> void zmq::reader_t::process_revive ()
> {
>     if (endpoint)
>         endpoint->revive (this);
> }
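
The guard quoted above can be illustrated with a minimal sketch. This is a toy model in Python, not the 0MQ code: the classes `Endpoint` and `Reader` and the method names `process_revive_unguarded` and `process_revive_guarded` are hypothetical stand-ins for the reader_t/endpoint relationship described in the report, and Python's AttributeError stands in for the NULL dereference that segfaults in C++.

```python
class Endpoint:
    """Hypothetical stand-in for the object a reader notifies on revive."""

    def __init__(self):
        self.revived = []

    def revive(self, reader):
        self.revived.append(reader)


class Reader:
    """Toy model of reader_t: `endpoint` may never have been set."""

    def __init__(self, endpoint=None):
        self.endpoint = endpoint

    def process_revive_unguarded(self):
        # Models the reported code path: endpoint is used unconditionally.
        self.endpoint.revive(self)

    def process_revive_guarded(self):
        # Models the proposed fix: skip the call when no endpoint is set.
        if self.endpoint is not None:
            self.endpoint.revive(self)


# A reader whose endpoint was never set, as described for the PUB side.
orphan = Reader()
try:
    orphan.process_revive_unguarded()
    crashed = False
except AttributeError:
    # Python raises here; the C++ equivalent dereferences NULL.
    crashed = True

# With the guard, the revive command is ignored instead of crashing.
orphan.process_revive_guarded()

# A reader with an endpoint still forwards the revive as before.
ep = Endpoint()
attached = Reader(ep)
attached.process_revive_guarded()
```

The sketch only shows that the guard turns the failing call into a no-op while leaving the normal path unchanged. Whether silently dropping the revive is the right behaviour inside 0MQ is the open question raised in the report.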
>
> This makes the second scenario work the way I expected. However, I'm
> not familiar enough with the codebase to tell whether this is the right
> thing to do, whether it would cause other problems, or whether the
> scenario isn't meant to be supported in the first place, in which case
> it doesn't matter.
>
> Regards,
> Ben