[zeromq-dev] 0MQ/2.0 segmentation fault for PUB/bind, SUB/connect

Martin Sustrik sustrik at fastmq.com
Tue Oct 20 12:26:25 CEST 2009


Hi Ben,

I've created a ticket for the issue (ZMQII-20). We'll have a look at it 
shortly.

Martin

> I've been experimenting with the PUB/SUB functionality in 0MQ/2.0  
> (both alpha 3 and the latest revision from GitHub), and have come  
> across a segmentation fault which could either be the result of me  
> doing something unsupported, or a bug in pipe termination.
> 
> The following works fine:
> * Create a PUB socket in Process A, connect to tcp://127.0.0.1:5660
> * Create a SUB socket in Process B, bind to tcp://0.0.0.0:5660
> * Send a message from Process A
> * Receive the message from Process B, then terminate
> * Send another message from Process A
> 
> However, using connect/bind the other way round crashes Process A when  
> Process B terminates:
> * Create a PUB socket in Process A, bind to tcp://0.0.0.0:5660
> * Create a SUB socket in Process B, connect to tcp://127.0.0.1:5660
> * Send a message from Process A
> * Receive the message from Process B, then terminate
> * Send another message from Process A
> * Process A will terminate with a segmentation fault in pipe.cpp:84  
> (process_revive)
> 
> It seems to me that the second scenario -- binding on the PUB side and  
> connecting on the SUB side -- would be the most common case, as it  
> enables subscribers to connect/disconnect on an ad-hoc basis. However,  
> perf/python/local_thr.py and perf/python/remote_thr.py work as  
> described in the first scenario, which makes me wonder if I'm missing  
> something?
> 
> If the second scenario is meant to be supported, here's some more  
> detail on the issue, and a potential fix:
> 
> process-a.py:
> import libpyzmq, time
> ctx = libpyzmq.Context(1,1)
> s = libpyzmq.Socket(ctx, libpyzmq.PUB)
> s.bind('tcp://0.0.0.0:5660')
> while True:
>      s.send("\x04a.xymessage")
>      time.sleep(1.0)
> 
> process-b.py:
> import libpyzmq
> ctx = libpyzmq.Context(1,1)
> s = libpyzmq.Socket(ctx, libpyzmq.SUB)
> s.connect('tcp://127.0.0.1:5660')
> s.setsockopt(libpyzmq.SUBSCRIBE, '*')
> while True:
>      msg = s.recv()
>      print repr(msg)
> 
> And run:
> ~ python process-a.py &
> ~ python process-b.py
> ^C
> 
> I believe this is caused by the reader_t peer of the PUB socket's
> writers never having an endpoint set. If flushing the pipe fails,
> writer_t::flush returns false, which causes a revive command to be
> sent to the reader_t peer. Since endpoint is NULL, calling
> endpoint->revive crashes the process.
> 
> Including an "if (endpoint)" check prior to calling endpoint->revive  
> fixes the issue, and allows proper termination of the old connection:
> 
> // pipe.cpp:82
> void zmq::reader_t::process_revive ()
> {
>      if (endpoint)
>          endpoint->revive (this);
> }
> 
> This enables the second scenario to work the way I expected. However,
> I'm not familiar enough with the codebase to tell if this is the right
> thing to do, if it would cause other problems, or if it's not meant to
> be supported in the first place and therefore doesn't matter.
> 
> Regards,
> Ben
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
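
The null-endpoint guard proposed in the quoted message can be illustrated outside of 0MQ with a small stand-alone model. This is only a sketch: reader_model, endpoint_model, set_endpoint and the bool return value are hypothetical stand-ins invented for illustration, not the real zmq::reader_t or its endpoint interface. It assumes, as the report describes, that a reader may receive a revive command before any endpoint has been attached.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are NOT the
// actual 0MQ classes and do not reproduce their interfaces.
struct reader_model;

struct endpoint_model {
    int revived = 0;                      // number of revive notifications seen
    void revive(reader_model *) { ++revived; }
};

struct reader_model {
    endpoint_model *endpoint = nullptr;   // stays null until explicitly attached

    void set_endpoint(endpoint_model *e) { endpoint = e; }

    // Mirrors the shape of the proposed fix: forward the revive command
    // only when an endpoint is attached. Returns true if the command was
    // forwarded and false if it was dropped (the return value is an
    // addition made here so the behaviour can be observed).
    bool process_revive() {
        if (!endpoint)
            return false;                 // unguarded code would dereference null here
        endpoint->revive(this);
        return true;
    }
};
```

With this model, calling process_revive() on a reader that never had set_endpoint() called returns false instead of dereferencing a null pointer, while a reader with an attached endpoint forwards the command as before. Whether silently dropping the command is the right behaviour inside 0MQ itself is the open question raised in the report.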



