[zeromq-dev] wedging bug

Andrew Hume andrew at research.att.com
Thu Mar 15 22:55:38 CET 2012


answers below.

after talking about this to a colleague for an hour today,
i have an experiment to try which i expect to resolve the issue.
(the experiment involves avoiding something clever i do
in the destination process.) but i am still keen to see if others have
any other ideas.

	thanks

On Mar 15, 2012, at 1:11 PM, Jon Dyte wrote:

> Hi Andrew
> 
> Just reading this trying to make sense of what you are describing
> 
> each S thread has its own set of output sockets, yes?
yes
> 
> and each one of these sockets is connected to an external process over 
> either tcp or ipc?
yes
> 
> could you create a simple example which just replicates a few 'S' 
> threads spinning very fast, just pushing messages out
> over the various output sockets to these external processes?
> 
i have, but that example has never shown this bug.
 
> jon
> 
> On 14/03/12 22:19, Andrew Hume wrote:
>> i have a program called portal that takes a socket as input and 
>> several output sockets.
>> i have a thread R that receives messages from the input and a thread S
>> that sends messages out on one of the output sockets. pseudocode is
>> 
>> tmp_in and tmp_out are the input and output ends of a PUSH/PULL inproc 
>> socket
>> with no queue bounds.
>> 
>> R:
>> while(zmq_recv(isock, &msg, 0) == 0){	// 2.x API returns 0 on success
>> 	// do statistics
>> 	zmq_send(tmp_out, &msg, 0);
>> }
>> 
>> S:
>> while(zmq_recv(tmp_in, &msg, 0) == 0){
>> 	// do statistics
>> 	// determine which output socket osock
>> 	zmq_send(osock, &msg, 0);
>> }
>> 
>> the input socket is a PUSH/PULL with a bound of about 20000 messages, 
>> and maybe
>> a hundred or so inputs (PUSHers).
>> the output sockets are PUSH/PULL with a bound of 5000 messages, each 
>> going to a
>> single process.
>> 
>> ordinarily, this works great; the internal inproc socket remains empty
>> (we drain it as fast as input comes in). under heavy load, about once or
>> twice a day, this setup wedges; that is, S is blocked on the zmq_send and
>> the destination process is blocked on a zmq_recv.
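
a minimal sketch of one way to make the wedge visible rather than silent: have S try the send without blocking and give up after a bounded number of retries, so the log names the output socket that stayed full. send_or_report, try_send_fn, fake_sock and fake_try_send are hypothetical names for illustration, not part of portal or of zeromq. with the 2.1 API the hook would presumably wrap zmq_send(osock, &msg, ZMQ_NOBLOCK), which fails with errno set to EAGAIN when the queue is at its bound.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical hook: one non-blocking send attempt.
 * Returns 0 on success, -1 with errno == EAGAIN if the queue is full.
 * Assumption: with libzmq 2.1 this wraps zmq_send(sock, msg, ZMQ_NOBLOCK). */
typedef int (*try_send_fn)(void *sock, void *msg);

/* Try to send up to max_tries times, sleeping pause_ms after each full queue.
 * Returns 0 once sent; -1 if the queue stayed full (errno == EAGAIN)
 * or if the hook reported some other error (errno left as the hook set it). */
int send_or_report(try_send_fn try_send, void *sock, void *msg,
                   int max_tries, long pause_ms, const char *name)
{
    struct timespec delay = { pause_ms / 1000, (pause_ms % 1000) * 1000000L };
    int i;

    assert(max_tries > 0);
    for (i = 0; i < max_tries; i++) {
        if (try_send(sock, msg) == 0)
            return 0;
        if (errno != EAGAIN)
            return -1;          /* a real error, not back-pressure */
        nanosleep(&delay, NULL);
    }
    fprintf(stderr, "output %s still full after %d tries\n", name, max_tries);
    errno = EAGAIN;
    return -1;
}

/* Stand-in for a socket in this sketch: the next full_for attempts fail. */
struct fake_sock { int full_for; };

int fake_try_send(void *sock, void *msg)
{
    struct fake_sock *s = sock;
    (void)msg;
    if (s->full_for > 0) {
        s->full_for--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

in S, a failed call could bump a counter or dump the per-socket statistics instead of blocking forever; whether dropping or re-queueing the message at that point is acceptable for portal is a separate question.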
>> 
>> this wedging occurs with both TCP transport and ipc transport.
>> when it occurs, killing just the receiving process does not fix the
>> problem; all the receiving processes have to be killed.
>> this occurs under 2.1.7, and under 2.1.11.
>> i have several portals, each handling messages of different sizes and 
>> contents, on each
>> server (there are 8 servers). when the portal on one server wedges, 
>> the portal of the same
>> type on all the other servers soon (within 5-10 minutes) will wedge.
>> 
>> any clues or advice?
>> 
>> andrew
>> 
>> ------------------
>> Andrew Hume (best -> Telework) +1 623-551-2845
>> andrew at research.att.com (Work) +1 973-236-2014
>> AT&T Labs - Research; member of USENIX and LOPSA
>> 
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> 


------------------
Andrew Hume  (best -> Telework) +1 623-551-2845
andrew at research.att.com  (Work) +1 973-236-2014
AT&T Labs - Research; member of USENIX and LOPSA



