[zeromq-dev] Forwarder stops forwarding

Robin Weisberg robin at scout-trading.com
Thu Jun 10 15:55:00 CEST 2010


Robert and I have a new manifestation of what may be a related problem. The forwarder crashed in this scenario after running for a few days. The backtrace is below. I can send a core file if it's helpful, or provide more info.

Thx!
Robin

GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /home/stprod/lib/zmq2/lib/libzmq.so.0...done.
Loaded symbols for /home/stprod/lib/zmq2/lib/libzmq.so.0
Reading symbols from /lib/libuuid.so.1...done.
Loaded symbols for /lib/libuuid.so.1
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libgthread-2.0.so.0...done.
Loaded symbols for /usr/lib/libgthread-2.0.so.0
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libglib-2.0.so.0...done.
Loaded symbols for /usr/lib/libglib-2.0.so.0
Reading symbols from /lib/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libpcre.so.3...done.
Loaded symbols for /lib/libpcre.so.3
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Core was generated by `/home/stprod/lib/zmq2/bin/zmq_forwarder ./config/nsdq_to_hq_out.xml'.
Program terminated with signal 6, Aborted.
[New process 20674]
[New process 20671]
#0  0x00007f6f239f2fb5 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007f6f239f2fb5 in raise () from /lib/libc.so.6
#1  0x00007f6f239f4bc3 in abort () from /lib/libc.so.6
#2  0x00007f6f24934c53 in zmq::tcp_connecter_t::connect (this=0x7f6f1c00ed98)
    at tcp_connecter.cpp:283
#3  0x00007f6f24938545 in zmq::zmq_connecter_t::out_event (this=0x7f6f1c00ed50)
    at zmq_connecter.cpp:81
#4  0x00007f6f24922c14 in zmq::epoll_t::loop (this=0xf30f30) at epoll.cpp:189
#5  0x00007f6f249358a7 in zmq::thread_t::thread_routine (arg_=0xf30f70)
    at thread.cpp:99
#6  0x00007f6f241d63ba in start_thread () from /lib/libpthread.so.0
#7  0x00007f6f23aa5fcd in clone () from /lib/libc.so.6
#8  0x0000000000000000 in ?? ()
(gdb) info threads
  2 process 20671  0x00007f6f241dca94 in __lll_lock_wait ()
   from /lib/libpthread.so.0
* 1 process 20674  0x00007f6f239f2fb5 in raise () from /lib/libc.so.6
(gdb) thread 2
[Switching to thread 2 (process 20671)]#0  0x00007f6f241dca94 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007f6f241dca94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f6f241d8190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f6f241d7a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x00007f6f24937328 in zmq::ypollset_t::poll (this=<value optimized out>)
    at simple_semaphore.hpp:127
#4  0x00007f6f2491e632 in zmq::app_thread_t::process_commands (this=0xf303f0,
    block_=128, throttle_=192) at app_thread.cpp:93
#5  0x00007f6f249316ed in zmq::socket_base_t::recv (this=0xf31c80,
    msg_=0x7fffe16e7fc0, flags_=0) at socket_base.cpp:380
#6  0x0000000000408225 in main (argc=<value optimized out>,
    argv=<value optimized out>) at ../../include/zmq.hpp:241
(gdb)
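
For readers following the trace above: frame #2 aborts inside zmq::tcp_connecter_t::connect, reached from out_event, so presumably at the point where epoll reports the connecting socket as writable. What tcp_connecter.cpp:283 actually checks in 2.0.6 is not shown here, so the following is only a sketch of the general pattern for finishing a non-blocking connect, written in Python for brevity. The RETRYABLE set and the function names are assumptions made for illustration and are not taken from the 0MQ source. The idea is that a connect failure caused by the network should lead to a retry, and only a genuinely unexpected error should be treated as fatal.

```python
import errno
import socket

# Assumption: errors that mean "peer or network unavailable, try again later".
# This list is illustrative only; it is not copied from any 0MQ release.
RETRYABLE = {
    errno.ECONNREFUSED,
    errno.ETIMEDOUT,
    errno.ECONNRESET,
    errno.EHOSTUNREACH,
    errno.ENETUNREACH,
    errno.ENETDOWN,
}


def classify(err):
    """Map an SO_ERROR value to 'connected', 'retry' or 'fatal'."""
    if err == 0:
        return "connected"
    if err in RETRYABLE:
        return "retry"
    return "fatal"


def connect_result(sock):
    """Check a socket whose non-blocking connect was just reported writable.

    SO_ERROR holds the pending error of the connect attempt (0 on success).
    """
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    return classify(err)
```

With this shape, an unreachable host would give "retry" and the caller could schedule a reconnect instead of aborting the process. Whether that matches the failure in this core is something only the errno value at the abort can confirm.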

Here is the config file:
<forwarder>
    <in>
        <bind addr = "tcp://eth7:31001"/>
        <bind addr = "tcp://eth7:31011"/>
        <bind addr = "tcp://eth7:31004"/>
        <bind addr = "tcp://eth7:31018"/>
        <connect addr = "tcp://prd02-local:31006"/>
    </in>
    <out>
        <bind addr = "tcp://eth0:30006"/>
    </out>
</forwarder>
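
To make the topology explicit, here is a small sketch that lists the endpoints in a config of this shape. It uses only Python's standard xml.etree.ElementTree. The function name endpoints and the tuple layout are made up for illustration, and this is not how zmq_forwarder itself reads the file.

```python
import xml.etree.ElementTree as ET

# The config from this message, copied verbatim.
CONFIG = """<forwarder>
    <in>
        <bind addr = "tcp://eth7:31001"/>
        <bind addr = "tcp://eth7:31011"/>
        <bind addr = "tcp://eth7:31004"/>
        <bind addr = "tcp://eth7:31018"/>
        <connect addr = "tcp://prd02-local:31006"/>
    </in>
    <out>
        <bind addr = "tcp://eth0:30006"/>
    </out>
</forwarder>"""


def endpoints(xml_text):
    """Return (side, action, address) tuples in document order."""
    root = ET.fromstring(xml_text)
    result = []
    for side in ("in", "out"):
        node = root.find(side)
        if node is None:
            continue
        for child in node:
            # Only <bind> and <connect> elements carry an endpoint address.
            if child.tag in ("bind", "connect"):
                result.append((side, child.tag, child.get("addr")))
    return result


if __name__ == "__main__":
    for side, action, addr in endpoints(CONFIG):
        print(side, action, addr)
```

For the config above this lists four bind endpoints and one connect endpoint on the in side, plus a single bind on the out side. The connect to tcp://prd02-local:31006 is the only outgoing connection, so it is presumably the one handled by the tcp_connecter_t in the trace.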

This is again version 2.0.6 of zmq. Output of uname -a:
Linux prd01 2.6.28-18-server #59-Ubuntu SMP Thu Jan 28 02:25:03 UTC 2010 x86_64 GNU/Linux


-----Original Message-----
From: zeromq-dev-bounces at lists.zeromq.org [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Martin Sustrik
Sent: Thursday, June 03, 2010 4:07 PM
To: 0MQ development list
Subject: Re: [zeromq-dev] Forwarder stops forwarding

Robert,

Sorry, I missed the backtrace. However, it doesn't say much. It 
looks like the forwarder is just sitting there waiting for more 
messages. Strange.

Anyway, I cannot help much without being able to reproduce the problem...

Martin


Robert Zhang wrote:
> Backtrace from gdb is at the bottom of the email. When this problem occurs, netstat shows that the TCP connection is established, but no messages are flowing through. If we restart the forwarder, everything recovers.
> We're now running a debug build, and I'll try to get a more detailed backtrace when it happens again.
> 
> -----Original Message-----
> From: zeromq-dev-bounces at lists.zeromq.org [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Martin Sustrik
> Sent: Thursday, June 03, 2010 2:11 AM
> To: 0MQ development list
> Subject: Re: [zeromq-dev] Forwarder stops forwarding
> 
> Robert,
> 
> The problem with the forwarder hanging hasn't been reported yet. Have 
> you been able to get backtraces?
> 
> As for the PGM examples, these are a leftover from previous versions of 0MQ 
> and should be removed IMO.
> 
> Martin
> 
> Robert Zhang wrote:
>> A few more details. We're running the standard zeromq-2.0.6 download from 
>> the website.
>>
>> Linux 2.6.28-18-server #60-Ubuntu SMP
>>
>> Compiled using gcc 4.3.3
>>
>>  
>>
>> Btw, when I configure the package with "--with-cpp --with-pgm 
>> --with-pgm-examples", the Makefile in the perf directory seems to have 
>> a path problem. It's looking for 
>>  `../../foreign/openpgm/libpgm-2.0.24/openpgm/pgm/examples/pgmsend.c' 
>> when it should be 
>>  `../foreign/openpgm/libpgm-2.0.24/openpgm/pgm/examples/pgmsend.c'
>>
>>  
>>
>> *From:* Robert Zhang
>> *Sent:* Tuesday, June 01, 2010 11:12 AM
>> *To:* 'zeromq-dev at lists.zeromq.org'
>> *Subject:* Forwarder stops forwarding
>>
>>  
>>
>> Has anyone seen this problem? We have a forwarder that fairly consistently 
>> stops forwarding messages after running for about a day. It seems to be 
>> stuck in a lock. I'll try to run the forwarder in a debug build and maybe 
>> we can get a better stack trace. Thanks.
>>
>>  
>>
>> Config looks like this:
>>
>> <forwarder>
>>     <in>
>>         <connect addr = "tcp://host1:30006"/>
>>     </in>
>>     <out>
>>         <bind addr = "tcp://eth0:30008"/>
>>     </out>
>> </forwarder>
>>
>>  
>>
>> (gdb) info thread
>>   2 Thread 0x7fd0e1d40910 (LWP 14401)  0x00007fd0e292fe48 in epoll_wait () from /lib/libc.so.6
>>   1 Thread 0x7fd0e3c20750 (LWP 14398)  0x00007fd0e3067c34 in __lll_lock_wait () from /lib/libpthread.so.0
>>
>> (gdb) where
>> #0  0x00007fd0e292fe48 in epoll_wait () from /lib/libc.so.6
>> #1  0x00007fd0e37b0b72 in zmq::epoll_t::loop (this=0x1332b70) at epoll.cpp:161
>> #2  0x00007fd0e37c38a7 in zmq::thread_t::thread_routine (arg_=0x1332bb0) at thread.cpp:99
>> #3  0x00007fd0e3060a04 in start_thread () from /lib/libpthread.so.0
>> #4  0x00007fd0e292f80d in clone () from /lib/libc.so.6
>> #5  0x0000000000000000 in ?? ()
>>
>> (gdb) where
>> #0  0x00007fd0e3067c34 in __lll_lock_wait () from /lib/libpthread.so.0
>> #1  0x00007fd0e3063295 in _L_lock_949 () from /lib/libpthread.so.0
>> #2  0x00007fd0e30630b8 in pthread_mutex_lock () from /lib/libpthread.so.0
>> #3  0x00007fd0e37c5328 in zmq::ypollset_t::poll (this=<value optimized out>) at simple_semaphore.hpp:127
>> #4  0x00007fd0e37ac632 in zmq::app_thread_t::process_commands (this=0x1331ec0, block_=128, throttle_=false) at app_thread.cpp:93
>> #5  0x00007fd0e37bf6ed in zmq::socket_base_t::recv (this=0x13338c0, msg_=0x7fffe0c9b300, flags_=0) at socket_base.cpp:380
>> #6  0x0000000000408225 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../include/zmq.hpp:241
>>
>>  
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> 


