[zeromq-dev] wedging bug

Jon Dyte jon at totient.co.uk
Thu Mar 15 21:11:08 CET 2012


Hi Andrew

Just reading this and trying to make sense of what you are describing.

Each S thread has its own set of output sockets, yes?

And each one of these sockets is connected to an external process over
either tcp or ipc?

Could you create a simple example which just replicates a few 'S'
threads spinning very fast, pushing messages out over the various
output sockets to these external processes?
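
Something along these lines is the shape I have in mind. Treat it as a
rough model and not a test case for the bug itself: it uses plain Python
queues where you have 0MQ sockets and threads where you have external
processes, and every name in it (run_model, out_bound, stalled,
send_timeout) is made up for illustration. The one idea worth carrying
over is sending with a timeout, so that S reports which output it is
stuck on rather than blocking silently.

```python
import queue
import threading

STOP = object()  # sentinel that tells a thread to finish


def run_model(n_msgs=300, n_outputs=3, out_bound=10, stalled=None,
              send_timeout=0.5):
    """Model one S thread fanning out to bounded outputs.

    Returns (delivered, stuck): messages consumed per output, and the index
    of the output S gave up on (None if every message went through).
    """
    tmp = queue.Queue()  # unbounded, like the inproc pipe between R and S
    outs = [queue.Queue(maxsize=out_bound) for _ in range(n_outputs)]
    delivered = [0] * n_outputs
    state = {"stuck": None}

    def consumer(i):
        # Stand-in for one external receiving process.
        if i == stalled:
            return  # hypothetical receiver that never drains its queue
        while outs[i].get() is not STOP:
            delivered[i] += 1

    def s_thread():
        while True:
            msg = tmp.get()
            if msg is STOP:
                break
            i = msg % n_outputs  # "determine which output socket"
            try:
                # Bounded put with a timeout: a full queue is reported
                # instead of blocking S forever.
                outs[i].put(msg, timeout=send_timeout)
            except queue.Full:
                state["stuck"] = i
                break
        # Let the consumers that are still draining shut down cleanly.
        for i, q in enumerate(outs):
            if i != stalled:
                q.put(STOP)

    # R's role: feed the internal pipe, then signal end of input.
    for n in range(n_msgs):
        tmp.put(n)
    tmp.put(STOP)

    threads = [threading.Thread(target=consumer, args=(i,))
               for i in range(n_outputs)]
    threads.append(threading.Thread(target=s_thread))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return delivered, state["stuck"]
```

run_model() should come back with every message delivered and stuck set
to None. run_model(stalled=1) should come back with stuck set to 1 and
the other outputs cut short as well, since a single S thread that is
stuck on one output cannot service the rest.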

jon

On 14/03/12 22:19, Andrew Hume wrote:
> i have a program called portal that has one input socket and several
> output sockets. i have a thread R that receives messages from the input
> and a thread S that sends messages out on one of the output sockets.
> pseudocode follows.
>
> tmp_out (PUSH) and tmp_in (PULL) are the two ends of an inproc socket
> pair with no queue bounds.
>
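> R:
> while (zmq_recv(isock, &msg, 0) == 0) {   // 2.1 API: returns 0 on success
>     // do statistics
>     zmq_send(tmp_out, &msg, 0);           // hand off to S over inproc
> }
>
> S:
> while (zmq_recv(tmp_in, &msg, 0) == 0) {
>     // do statistics
>     // determine which output socket osock
>     zmq_send(osock, &msg, 0);             // blocks while osock is at its bound
> }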
>
> the input socket is a PUSH/PULL with a bound of about 20000 messages,
> and maybe a hundred or so inputs (PUSHers).
> the output sockets are PUSH/PULL with a bound of 5000 messages, each
> going to a single process.
>
> ordinarily, this works great; the internal inproc socket remains empty
> (we drain it as fast as input comes in). under heavy load, about once or
> twice a day, this setup wedges; that is, S is blocked on the zmq_send
> and the destination process is blocked on a zmq_recv.
>
> this wedging occurs with both TCP transport and ipc transport.
> when it occurs, killing just the receiving process does not fix the
> problem; all the receiving processes have to be killed.
> this occurs under 2.1.7, and under 2.1.11.
> i have several portals, each handling messages of different sizes and
> contents, on each server (there are 8 servers). when the portal on one
> server wedges, the portal of the same type on all the other servers
> soon (within 5-10 minutes) will wedge.
>
> any clues or advice?
>
> andrew
>
> ------------------
> Andrew Hume (best -> Telework) +1 623-551-2845
> andrew at research.att.com (Work) +1 973-236-2014
> AT&T Labs - Research; member of USENIX and LOPSA



