[zeromq-dev] Forwarder stops forwarding

Robin Weisberg robin at scout-trading.com
Tue Jun 15 15:08:25 CEST 2010


I think we've figured out how this happens, and I don't think it's specific to the forwarder.

The exact setup is zmq_forwarder A connecting to zmq_forwarder B, which is binding. The messages are forwarded in the reverse direction, i.e. from B to A. This is done because of firewall restrictions. When there is a network interruption, the server (B) detects it and drops the connection. Forwarder A, however, never realizes this: the connection isn't shut down cleanly, because the host B runs on is unreachable at that point.

Netstat confirms that forwarder A still believes it's connected to B, but B does not see the connection. In fact, you can restart forwarder B many hours after the network disconnect and forwarder A still believes it's connected. I'm guessing the issue is that, since there are no messages flowing from A to B, writes never fail, so A never realizes it has been disconnected. I'm somewhat surprised by this; I thought TCP would be able to figure out that a connection was dead after a minute or two on both client and server. Can any TCP experts confirm that this behavior is expected?
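
For reference, a minimal sketch of how TCP keepalive can be switched on for a plain socket, so that an idle side that only receives (like forwarder A here) eventually notices a dead peer. This is not 0MQ code and does not go through the 0MQ API. The helper name `enable_keepalive` and the timing values are illustrative assumptions, and the three tuning options are only applied where the platform exposes them (they are Linux-specific names). Without keepalive, an idle TCP connection exchanges no packets at all, so a silently vanished peer can go unnoticed indefinitely.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive probes on an existing TCP socket.

    idle, interval and count are illustrative values, not recommendations:
    start probing after `idle` seconds of silence, probe every `interval`
    seconds, and give up after `count` unanswered probes.
    """
    # Portable part: ask the kernel to send keepalive probes at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Platform-specific tuning; silently skipped where the option is missing.
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # Non-zero means keepalive is switched on for this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With settings like these, a blocked read on the connecting side would be expected to fail with an error once the probes go unanswered, instead of waiting forever for data that can no longer arrive.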

We'll try to reproduce it, but that will have to wait until we have an opening to mess w/ our networking infrastructure (probably the weekend).



-----Original Message-----
From: zeromq-dev-bounces at lists.zeromq.org [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Martin Sustrik
Sent: Thursday, June 03, 2010 4:07 PM
To: 0MQ development list
Subject: Re: [zeromq-dev] Forwarder stops forwarding

Robert,

Sorry, I missed the backtrace. However, it doesn't say much. It 
looks like the forwarder is just sitting there waiting for more 
messages. Strange.

Anyway, I cannot help much without being able to reproduce the problem...

Martin


Robert Zhang wrote:
> Backtrace from gdb is at the bottom of the email. When this problem occurs, netstat shows the TCP connection as established, but no messages are flowing through. If we restart the forwarder, everything recovers. 
> We're now running a debug build version, I'll try to get a more detailed backtrace when it happens again.
> 
> -----Original Message-----
> From: zeromq-dev-bounces at lists.zeromq.org [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Martin Sustrik
> Sent: Thursday, June 03, 2010 2:11 AM
> To: 0MQ development list
> Subject: Re: [zeromq-dev] Forwarder stops forwarding
> 
> Robert,
> 
> The problem with the forwarder hanging hasn't been reported yet. Have 
> you been able to get back traces?
> 
> As for PGM examples these are a leftover from previous versions of 0MQ 
> and should be removed IMO.
> 
> Martin
> 
> Robert Zhang wrote:
>> Bit more details. We're running the standard download zeromq-2.0.6 from 
>> the website.
>>
>> Linux 2.6.28-18-server #60-Ubuntu SMP
>>
>> Compiled using gcc 4.3.3
>>
>>  
>>
>> Btw, when I configure the package with "--with-cpp --with-pgm 
>> --with-pgm-examples",  the Makefile in the perf directory seems to have 
>> a path problem. It's looking for 
>>  `../../foreign/openpgm/libpgm-2.0.24/openpgm/pgm/examples/pgmsend.c' 
>> when it should be 
>>  `../foreign/openpgm/libpgm-2.0.24/openpgm/pgm/examples/pgmsend.c'
>>
>>  
>>
>> *From:* Robert Zhang
>> *Sent:* Tuesday, June 01, 2010 11:12 AM
>> *To:* 'zeromq-dev at lists.zeromq.org'
>> *Subject:* Forwarder stops forwarding
>>
>>  
>>
>> Has anyone seen this problem? We have a forwarder that stops forwarding 
>> messages after running for about a day pretty consistently. Seems to be 
>> stuck in a lock. I'll try to run forwarder in a debug build and maybe we 
>> can get a better stack trace.  Thanks.
>>
>>  
>>
>> Config looks like this:
>>
>> <forwarder>
>>     <in>
>>         <connect addr = "tcp://host1:30006"/>
>>     </in>
>>     <out>
>>         <bind addr = "tcp://eth0:30008"/>
>>     </out>
>> </forwarder>
>>
>>  
>>
>> (gdb) info thread
>>   2 Thread 0x7fd0e1d40910 (LWP 14401)  0x00007fd0e292fe48 in epoll_wait () from /lib/libc.so.6
>>   1 Thread 0x7fd0e3c20750 (LWP 14398)  0x00007fd0e3067c34 in __lll_lock_wait () from /lib/libpthread.so.0
>>
>> (gdb) where
>> #0  0x00007fd0e292fe48 in epoll_wait () from /lib/libc.so.6
>> #1  0x00007fd0e37b0b72 in zmq::epoll_t::loop (this=0x1332b70) at epoll.cpp:161
>> #2  0x00007fd0e37c38a7 in zmq::thread_t::thread_routine (arg_=0x1332bb0) at thread.cpp:99
>> #3  0x00007fd0e3060a04 in start_thread () from /lib/libpthread.so.0
>> #4  0x00007fd0e292f80d in clone () from /lib/libc.so.6
>> #5  0x0000000000000000 in ?? ()
>>
>> (gdb) where
>> #0  0x00007fd0e3067c34 in __lll_lock_wait () from /lib/libpthread.so.0
>> #1  0x00007fd0e3063295 in _L_lock_949 () from /lib/libpthread.so.0
>> #2  0x00007fd0e30630b8 in pthread_mutex_lock () from /lib/libpthread.so.0
>> #3  0x00007fd0e37c5328 in zmq::ypollset_t::poll (this=<value optimized out>) at simple_semaphore.hpp:127
>> #4  0x00007fd0e37ac632 in zmq::app_thread_t::process_commands (this=0x1331ec0, block_=128, throttle_=false) at app_thread.cpp:93
>> #5  0x00007fd0e37bf6ed in zmq::socket_base_t::recv (this=0x13338c0, msg_=0x7fffe0c9b300, flags_=0) at socket_base.cpp:380
>> #6  0x0000000000408225 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../include/zmq.hpp:241
>>
>>  
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> 



