[zeromq-dev] wedging bug

Andrew Hume andrew at research.att.com
Wed Mar 14 23:19:59 CET 2012


i have a program called portal that takes one input socket and several output sockets.
i have a thread R that receives messages from the input and a thread S that
sends messages out on one of the output sockets. pseudocode is

tmp_in and tmp_out are the input and output ends of a PUSH/PULL inproc socket
with no queue bounds.

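R:
	// 2.1.x API: zmq_recv and zmq_send return 0 on success, -1 on error
	while(zmq_recv(isock, &msg, 0) == 0){
		// do statistics
		zmq_send(tmp_out, &msg, 0);
	}

S:
	while(zmq_recv(tmp_in, &msg, 0) == 0){
		// do statistics
		// determine which output socket osock
		zmq_send(osock, &msg, 0);	// blocks when osock's queue is full
	}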

the input socket is a PUSH/PULL with a bound of about 20000 messages, and maybe
	a hundred or so inputs (PUSHers).
the output sockets are PUSH/PULL with a bound of 5000 messages, each going to a
	single process.

ordinarily, this works great; the internal inproc socket remains empty (we drain
it as fast as input comes in). under heavy load, about once or twice a day, this setup wedges;
that is, S is blocked on the zmq_send and the destination process is blocked on a
zmq_recv.
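
one way to narrow down a wedge like this is to keep S from blocking forever, so it
can at least report which output socket is stuck. the following is only a minimal
sketch and not part of the portal code: send_with_limit, try_send_fn and the stub
are hypothetical names. in real use the callback would wrap a non-blocking send,
which under 2.1.x is zmq_send(osock, &msg, ZMQ_NOBLOCK), failing with errno set to
EAGAIN when the queue is full.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: one non-blocking send attempt.
 * Returns 0 on success, -1 with errno == EAGAIN if the queue is full. */
typedef int (*try_send_fn)(void *arg);

/* Retry a non-blocking send up to max_tries times, sleeping pause_ms
 * between attempts. Returns 0 on success. Returns -1 with errno == EAGAIN
 * if every attempt found the queue full, or -1 with the callback's errno
 * on any other error. */
static int send_with_limit(try_send_fn try_send, void *arg,
                           int max_tries, long pause_ms)
{
    struct timespec pause;
    int i;

    assert(max_tries > 0);
    pause.tv_sec = pause_ms / 1000;
    pause.tv_nsec = (pause_ms % 1000) * 1000000L;

    for (i = 0; i < max_tries; i++) {
        if (try_send(arg) == 0)
            return 0;
        if (errno != EAGAIN)
            return -1;              /* real error, do not retry */
        if (pause_ms > 0 && i + 1 < max_tries)
            nanosleep(&pause, NULL);
    }
    errno = EAGAIN;
    return -1;                      /* still full: the peer looks wedged */
}

/* Stub standing in for the real send: fails fail_count times, then succeeds. */
struct stub_state { int fail_count; int calls; };

static int stub_try_send(void *arg)
{
    struct stub_state *st = arg;
    if (st->calls++ < st->fail_count) {
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

when send_with_limit returns -1 with EAGAIN, S could log the identity of osock and
its statistics before deciding whether to keep retrying, which would show whether
the same destination is always the one that stops draining.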

this wedging occurs with both TCP transport and ipc transport.
when it occurs, killing just the receiving process does not fix the problem;
all the receiving processes have to be killed.
this occurs under 2.1.7, and under 2.1.11.
i have several portals, each handling messages of different sizes and contents, on each
server (there are 8 servers). when the portal on one server wedges, the portal of the same
type on all the other servers soon (within 5-10 minutes) will wedge.

	any clues or advice?

		andrew

------------------
Andrew Hume  (best -> Telework) +1 623-551-2845
andrew at research.att.com  (Work) +1 973-236-2014
AT&T Labs - Research; member of USENIX and LOPSA



