[zeromq-dev] router socket hangs on write (was detecting dead MDP workers)

Gyorgy Szekely hoditohod at gmail.com
Thu Feb 16 12:22:25 CET 2017


Hi,
Continuing my journey on detecting dead workers, I reduced the design to the
bare minimum and eliminated the messy file descriptors.
I only have:
- a router socket, with some number of peers
- a monitor socket attached to the router socket

When the monitor detects a disconnect on the router socket, I:
- call setsockopt(ZMQ_ROUTER_MANDATORY, 1);
- send a heartbeat message to every known peer;
- remove the peer if EHOSTUNREACH is returned;
- call setsockopt(ZMQ_ROUTER_MANDATORY, 0).
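
The probing steps above can be sketched roughly as follows. This is only an
illustration, not the broker's actual code: the socket is hidden behind a
hypothetical trySend callable so that the sketch compiles without libzmq, and
the names probePeers, TrySend and ProbeResult are made up for this example.
Handling EAGAIN is an assumption that goes beyond the steps listed: as far as
I read the zmq_setsockopt documentation, with ZMQ_ROUTER_MANDATORY set a send
can block when a peer's send queue is full unless ZMQ_DONTWAIT is passed, in
which case EAGAIN is returned instead.

```cpp
#include <cerrno>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical hook: sends one heartbeat to `peer` and returns 0 on success
// or an errno value on failure. In a real broker this would wrap a send on
// the router socket (assumption: done with ZMQ_DONTWAIT so it cannot block).
using TrySend = std::function<int(const std::string& peer)>;

struct ProbeResult {
    std::vector<std::string> removed;   // peers reported as unroutable
    std::vector<std::string> deferred;  // peers whose send would have blocked
};

// Probes every known peer once and erases those reported as EHOSTUNREACH.
// Peers that return EAGAIN are kept and reported, so the caller can retry.
ProbeResult probePeers(std::set<std::string>& peers, const TrySend& trySend) {
    ProbeResult result;
    for (auto it = peers.begin(); it != peers.end();) {
        const int err = trySend(*it);
        if (err == EHOSTUNREACH) {
            result.removed.push_back(*it);
            it = peers.erase(it);  // erase returns the next valid iterator
            continue;
        }
        if (err == EAGAIN) {
            result.deferred.push_back(*it);
        }
        ++it;
    }
    return result;
}
```

In this sketch the two setsockopt calls would bracket the call to probePeers,
and a peer that ends up in `deferred` is neither removed nor waited on.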

What happens: _my app regularly hangs_ in zmq_msg_send(), in roughly 20% of
the invocations. The call never returns, and I have to kill the application.

What am I doing wrong? According to the RFCs, router sockets should never
block.
I attached a full stack trace with info locals and args for each relevant
frame (sorry for the machine-readable format).

Env:
libzmq 4.2.1 stable, debug build
Ubuntu 16.04 64-bit (the same happens with the Ubuntu-packaged library)

Regards,
  Gyorgy
-------------- next part --------------

<741bt
>&"bt\n"
>~"#0  0x00007f827d342e8d in poll () at ../sysdeps/unix/syscall-template.S:84\n"
>~"#1  0x0000000000524c06 in zmq::signaler_t::wait (this=0x2392138, timeout_=-1) at src/signaler.cpp:228\n"
>~"#2  0x00000000005144d1 in zmq::mailbox_t::recv (this=0x23920d0, cmd_=0x7fff10f6bf80, timeout_=-1) at src/mailbox.cpp:81\n"
>~"#3  0x000000000052a443 in zmq::socket_base_t::process_commands (this=0x2393510, timeout_=-1, throttle_=false) at src/socket_base.cpp:1328\n"
>~"#4  0x0000000000529c1c in zmq::socket_base_t::send (this=0x2393510, msg_=0x7fff10f6c190, flags_=2) at src/socket_base.cpp:1142\n"
>~"#5  0x0000000000500bf1 in s_sendmsg (s_=0x2393510, msg_=0x7fff10f6c190, flags_=2) at src/zmq.cpp:375\n"
>~"#6  0x0000000000501aeb in zmq_msg_send (msg_=0x7fff10f6c190, s_=0x2393510, flags_=2) at src/zmq.cpp:642\n"
>~"#7  0x00000000004c7ede in zmq::socket_t::send (this=0x7fff10f6cba0, flags_=2, msg_=...) at /home/twinsen/Git/gehc-av-broker/import/zmq-20160511/include/zmq.hpp:612\n"
>~"#8  zmq::multipart_t::send (this=0x7fff10f6c2d0, socket=...) at /home/twinsen/Git/gehc-av-broker/import/zmq-20160511/include/zmq_addon.hpp:124\n"
>~"#9  0x00000000004c155c in reqrep::Service::handleWorkerDisconnect (this=0x7fff10f6cb70, event=<optimized out>, fd=<optimized out>) at /home/twinsen/Git/gehc-av-broker/src/main/src/reqrepService.cpp:132\n"
>~"#10 0x00000000004d54de in std::function<void (int, int)>::operator()(int, int) const (__args#1=46, __args#0=512, this=0x7fff10f6c610) at /usr/include/c++/5/functional:2267\n"
>~"#11 broker::Monitor::monitorEvent(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (int, int)>) (this=0x7fff10f6cb70, socketId=<optimized out>, name=..., callback=...) at /home/twinsen/Git/gehc-av-broker/src/main/src/monitor.cpp:134\n"
>~"#12 0x00000000004d613b in broker::Monitor::<lambda()>::operator() (__closure=0x23c8d30) at /home/twinsen/Git/gehc-av-broker/src/main/src/monitor.cpp:69\n"
>~"#13 std::_Function_handler<void(), broker::Monitor::watchSocket(zmq::socket_t&, const string&, std::function<void(int, int)>, int)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/5/functional:1871\n"
>~"#14 0x00000000004d2d84 in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/5/functional:2267\n"
>~"#15 broker::Reactor::processPollSet (this=this@entry=0x7fff10f6c8b0) at /home/twinsen/Git/gehc-av-broker/src/main/src/reactor.cpp:210\n"
>~"#16 0x00000000004d2f92 in broker::Reactor::start (this=this@entry=0x7fff10f6c8b0, testMode=testMode@entry=false) at /home/twinsen/Git/gehc-av-broker/src/main/src/reactor.cpp:163\n"
>~"#17 0x00000000004780ae in broker::Broker::start (this=this@entry=0x7fff10f6d78f, reqrepParams=..., pubsubParams=..., httpParams=...) at /home/twinsen/Git/gehc-av-broker/src/main/src/broker.cpp:71\n"
>~"#18 0x0000000000472c86 in main (argc=1, argv=0x7fff10f6d960) at /home/twinsen/Git/gehc-av-broker/src/main/src/main.cpp:30\n"
>741^done


<744frame 1
>&"frame 1\n"
>~"#1  0x0000000000524c06 in zmq::signaler_t::wait (this=0x2392138, timeout_=-1) at src/signaler.cpp:228\n"
>~"228\t    int rc = poll (&pfd, 1, timeout_);\n"
>744^done


<745info locals
>&"info locals\n"
>~"pfd = {fd = 14"
>~", events = 1"
>~", revents = 0"
>~"}"
>~"\n"
>~"rc = 0"
>~"\n"
>745^done


<746info args
>&"info args\n"
>~"this = 0x2392138"
>~"\n"
>~"timeout_ = -1"
>~"\n"
>746^done


<747frame 2
>&"frame 2\n"
>~"#2  0x00000000005144d1 in zmq::mailbox_t::recv (this=0x23920d0, cmd_=0x7fff10f6bf80, timeout_=-1) at src/mailbox.cpp:81\n"
>~"81\t    int rc = signaler.wait (timeout_);\n"
>747^done


<748info locals
>&"info locals\n"
>~"rc = 32767"
>~"\n"
>~"ok = 16"
>~"\n"
>748^done


<749info args
>&"info args\n"
>~"this = 0x23920d0"
>~"\n"
>~"cmd_ = 0x7fff10f6bf80"
>~"\n"
>~"timeout_ = -1"
>~"\n"
>749^done


<750frame 3
>&"frame 3\n"
>~"#3  0x000000000052a443 in zmq::socket_base_t::process_commands (this=0x2393510, timeout_=-1, throttle_=false) at src/socket_base.cpp:1328\n"
>~"1328\t        rc = mailbox->recv (&cmd, timeout_);\n"
>750^done


<751info locals
>&"info locals\n"
>~"rc = 32767"
>~"\n"
>~"cmd = {destination = 0x7fff10f6bfe0"
>~", type = 284606496"
>~", args = {stop = {<No data fields>}"
>~", plug = {<No data fields>}"
>~", own = {object = 0x7fff10f6c060"
>~"}"
>~", attach = {engine = 0x7fff10f6c060"
>~"}"
>~", bind = {pipe = 0x7fff10f6c060"
>~"}"
>~", activate_read = {<No data fields>}"
>~", activate_write = {msgs_read = 140733477994592"
>~"}"
>~", hiccup = {pipe = 0x7fff10f6c060"
>~"}"
>~", pipe_term = {<No data fields>}"
>~", pipe_term_ack = {<No data fields>}"
>~", term_req = {object = 0x7fff10f6c060"
>~"}"
>~", term = {linger = 284606560"
>~"}"
>~", term_ack = {<No data fields>}"
>~", reap = {socket = 0x7fff10f6c060"
>~"}"
>~", reaped = {<No data fields>}"
>~", done = {<No data fields>}"
>~"}"
>~"}"
>~"\n"
>751^done


<752info args
>&"info args\n"
>~"this = 0x2393510"
>~"\n"
>~"timeout_ = -1"
>~"\n"
>~"throttle_ = false"
>~"\n"
>752^done


<753frame 4
>&"frame 4\n"
>~"#4  0x0000000000529c1c in zmq::socket_base_t::send (this=0x2393510, msg_=0x7fff10f6c190, flags_=2) at src/socket_base.cpp:1142\n"
>~"1142\t        if (unlikely (process_commands (timeout, false) != 0)) {\n"
>753^done


<754info locals
>&"info locals\n"
>~"sync_lock = {mutex = 0x0"
>~"}"
>~"\n"
>~"rc = -1"
>~"\n"
>~"timeout = -1"
>~"\n"
>~"end = 0"
>~"\n"
>754^done


<755info args
>&"info args\n"
>~"this = 0x2393510"
>~"\n"
>~"msg_ = 0x7fff10f6c190"
>~"\n"
>~"flags_ = 2"
>~"\n"
>755^done


<756frame 5
>&"frame 5\n"
>~"#5  0x0000000000500bf1 in s_sendmsg (s_=0x2393510, msg_=0x7fff10f6c190, flags_=2) at src/zmq.cpp:375\n"
>~"375\t    int rc = s_->send ((zmq::msg_t *) msg_, flags_);\n"
>756^done


<757info locals
>&"info locals\n"
>~"sz = 5"
>~"\n"
>~"rc = 32767"
>~"\n"
>~"max_msgsz = 37303568"
>~"\n"
>757^done


<758info args
>&"info args\n"
>~"s_ = 0x2393510"
>~"\n"
>~"msg_ = 0x7fff10f6c190"
>~"\n"
>~"flags_ = 2"
>~"\n"
>758^done


<759frame 6
>&"frame 6\n"
>~"#6  0x0000000000501aeb in zmq_msg_send (msg_=0x7fff10f6c190, s_=0x2393510, flags_=2) at src/zmq.cpp:642\n"
>~"642\t    int result = s_sendmsg (s, msg_, flags_);\n"
>759^done


<760info locals
>&"info locals\n"
>~"s = 0x2393510"
>~"\n"
>~"result = 32767"
>~"\n"
>760^done


<761info args
>&"info args\n"
>~"msg_ = 0x7fff10f6c190"
>~"\n"
>~"s_ = 0x2393510"
>~"\n"
>~"flags_ = 2"
>~"\n"
>761^done


<762frame 7
>&"frame 7\n"
>~"#7  0x00000000004c7ede in zmq::socket_t::send (this=0x7fff10f6cba0, flags_=2, msg_=...) at /home/twinsen/Git/gehc-av-broker/import/zmq-20160511/include/zmq.hpp:612\n"
>~"612\t            int nbytes = zmq_msg_send (&(msg_.msg), ptr, flags_);\n"
>762^done


<763info locals
>&"info locals\n"
>~"nbytes = <optimized out>\n"
>763^done


<764info args
>&"info args\n"
>~"this = 0x7fff10f6cba0"
>~"\n"
>~"flags_ = 2"
>~"\n"
>~"msg_ = @0x7fff10f6c190: {msg = {_ = \"\\000\\000\\000\\000\\000\\000\\000\\000\\000\\344<\\230o\\000\\000\\000\\020\\277<\\002\\000\\000\\000\\000x~\\232}\\202\\177\\000\\000\\b\\277<\\002\\000\\000\\000\\000\\242\\005e\\001\\000\\000\\000\\000\\330\\302\\366\\020\\377\\177\\000\\000\\b\\277<\\002\\000\\000\\000\""
>~", p = 0x0"
>~"}"
>~"}"
>~"\n"
>764^done


<765frame 8
>&"frame 8\n"
>~"#8  zmq::multipart_t::send (this=0x7fff10f6c2d0, socket=...) at /home/twinsen/Git/gehc-av-broker/import/zmq-20160511/include/zmq_addon.hpp:124\n"
>~"124\t            if (!socket.send(message, more ? ZMQ_SNDMORE : 0))\n"
>765^done


<766info locals
>&"info locals\n"
>~"message = {msg = {_ = \"\\000\\000\\000\\000\\000\\000\\000\\000\\000\\344<\\230o\\000\\000\\000\\020\\277<\\002\\000\\000\\000\\000x~\\232}\\202\\177\\000\\000\\b\\277<\\002\\000\\000\\000\\000\\242\\005e\\001\\000\\000\\000\\000\\330\\302\\366\\020\\377\\177\\000\\000\\b\\277<\\002\\000\\000\\000\""
>~", p = 0x0"
>~"}"
>~"}"
>~"\n"
>~"more = true"
>~"\n"
>766^done


<767info args
>&"info args\n"
>~"this = 0x7fff10f6c2d0"
>~"\n"
>~"socket = @0x7fff10f6cba0: {ptr = 0x2393510"
>~", ctxptr = 0x238daa0"
>~"}"
>~"\n"
>767^done

