[zeromq-dev] Process eating 100 % of one core

Emmanuel TAUREL taurel at esrf.fr
Fri Nov 7 11:30:57 CET 2014


Hi Gerry,

On 07/11/2014 11:26, Gerry Steele wrote:
>
> How long does the CPU use last when it does happen? Or does it 
> stay at 100 % until restart?
>
It stays at 100 % until we force ZMQ to do something, like publishing a 
message.
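
In case it helps, the "nudge" is nothing fancy, just one extra publication 
on the PUB socket, roughly along these lines (the function name, socket 
handle and payload below are placeholders, not our real code):

#include <zmq.h>
#include <string.h>

/* Publishing anything on the stuck PUB socket makes the I/O thread drop
   out of its busy loop (observed behaviour on our side, not a documented fix). */
static void nudge_publisher (void *pub_socket)
{
    const char *dummy = "keep_alive";
    zmq_send (pub_socket, dummy, strlen (dummy), 0);
}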

Emmanuel

> On 7 Nov 2014 10:21, "Emmanuel TAUREL" <taurel at esrf.fr> wrote:
>
>     Hello all,
>
>     We are using ZMQ (still release 3.2.4), mainly on Linux boxes, with
>     the PUB/SUB model.
>     Our system runs 24/7. From time to time, some of our PUB processes
>     start eating 100 % of one CPU core.
>     We don't know yet what exactly triggers this phenomenon and are
>     therefore not able to reproduce it. It does not happen often (once
>     every 3 to 6 months!).
>     Nevertheless, we did some analysis the last time it happened.
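>
>     For context, the publisher side is just the standard PUB pattern; a
>     minimal sketch (not our real code, the endpoint and payload are
>     placeholders) looks like this:
>
>     #include <zmq.h>
>     #include <string.h>
>
>     int main (void)
>     {
>         void *ctx = zmq_ctx_new ();
>         void *pub = zmq_socket (ctx, ZMQ_PUB);
>         zmq_bind (pub, "tcp://*:5556");      /* placeholder endpoint */
>
>         /* Messages are published as events occur; in between, the ZMQ I/O
>            thread (the one shown spinning below) is normally idle. */
>         const char *msg = "some_event payload";
>         zmq_send (pub, msg, strlen (msg), 0);
>
>         zmq_close (pub);
>         zmq_ctx_destroy (ctx);
>         return 0;
>     }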
>
>     Here is the result of "strace" on the PUB process:
>
>     2889  10:53:18.021013 epoll_wait(19, {{EPOLLERR|EPOLLHUP, {u32=335547808, u64=140097873776032}}}, 256, 4294967295) = 1
>     2889  10:53:18.021041 epoll_ctl(19, EPOLL_CTL_MOD, 49, {0, {u32=335547808, u64=140097873776032}}) = 0
>     2889  10:53:18.021068 epoll_wait(19, {{EPOLLERR|EPOLLHUP, {u32=335547808, u64=140097873776032}}}, 256, 4294967295) = 1
>     2889  10:53:18.021096 epoll_ctl(19, EPOLL_CTL_MOD, 49, {0, {u32=335547808, u64=140097873776032}}) = 0
>     2889  10:53:18.021123 epoll_wait(19, {{EPOLLERR|EPOLLHUP, {u32=335547808, u64=140097873776032}}}, 256, 4294967295) = 1
>     2889  10:53:18.021151 epoll_ctl(19, EPOLL_CTL_MOD, 49, {0, {u32=335547808, u64=140097873776032}}) = 0
>     2889  10:53:18.021178 epoll_wait(19, {{EPOLLERR|EPOLLHUP, {u32=335547808, u64=140097873776032}}}, 256, 4294967295) = 1
>     2889  10:53:18.021206 epoll_ctl(19, EPOLL_CTL_MOD, 49, {0, {u32=335547808, u64=140097873776032}}) = 0
>     2889  10:53:18.021233 epoll_wait(19, {{EPOLLERR|EPOLLHUP, {u32=335547808, u64=140097873776032}}}, 256, 4294967295) = 1
>     2889  10:53:18.021260 epoll_ctl(19, EPOLL_CTL_MOD, 49, {0, {u32=335547808, u64=140097873776032}}) = 0
>     2889  10:53:18.021288 epoll_wait(19, {{EPOLLERR|EPOLLHUP, {u32=335547808, u64=140097873776032}}}, 256, 4294967295) = 1
>
>     From the number of epoll_wait()/epoll_ctl() pairs and their period
>     (roughly one pair every 50 us), it is clear that it is this thread
>     which eats the CPU.
>     From the flags returned by epoll_wait() (EPOLLERR|EPOLLHUP), it seems
>     that something wrong has happened on one of the file descriptors
>     (number 49, judging from the epoll_ctl() argument). This is confirmed
>     by the result of "lsof" on the same PUB process:
>
>     Starter 2863 dserver   49u  sock                0,6      0t0 7902 can't identify protocol
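>
>     If I read the epoll(7) man page correctly (my interpretation, I have
>     not dug further into the ZMQ code), EPOLLERR and EPOLLHUP are always
>     reported by epoll_wait() even when the interest mask is empty, so the
>     EPOLL_CTL_MOD with events = 0 done by reset_pollin() cannot silence a
>     broken fd. A small standalone test (illustration only, not ZMQ code,
>     all names are mine) shows the behaviour:
>
>     #include <stdio.h>
>     #include <string.h>
>     #include <unistd.h>
>     #include <sys/epoll.h>
>     #include <sys/socket.h>
>
>     int main (void)
>     {
>         int sv[2];
>         socketpair (AF_UNIX, SOCK_STREAM, 0, sv);
>         close (sv[1]);                   /* peer closed -> EPOLLHUP on sv[0] */
>
>         int ep = epoll_create (1);
>         struct epoll_event ev;
>         memset (&ev, 0, sizeof ev);
>         ev.events = 0;                   /* empty interest mask, like reset_pollin() */
>         ev.data.fd = sv[0];
>         epoll_ctl (ep, EPOLL_CTL_ADD, sv[0], &ev);
>
>         struct epoll_event out;
>         int n = epoll_wait (ep, &out, 1, 1000);
>         /* Prints "1 0x10": EPOLLHUP is reported although events was 0, so a
>            level-triggered loop around epoll_wait() never blocks on this fd. */
>         printf ("%d 0x%x\n", n, n > 0 ? out.events : 0);
>         return 0;
>     }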
>
>     If I attach to the PUB process with gdb and request this thread's
>     stack trace, I get:
>
>     #0  0x00007fb65d3205ca in epoll_ctl () from /lib/x86_64-linux-gnu/libc.so.6
>     #1  0x00007fb65e23c298 in zmq::epoll_t::reset_pollin (this=<optimized out>, handle_=<optimized out>) at epoll.cpp:101
>     #2  0x00007fb65e253da1 in zmq::stream_engine_t::in_event (this=0x7fb6509d8c10) at stream_engine.cpp:216
>     #3  0x00007fb65e23c46b in zmq::epoll_t::loop (this=0x7fb6611c5b70) at epoll.cpp:154
>     #4  0x00007fb65e257de6 in thread_routine (arg_=0x7fb6611c5be0) at thread.cpp:83
>     #5  0x00007fb65de0d0a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
>     #6  0x00007fb65d32004d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>
>     Even if something wrong has happened on the socket associated with
>     fd 49, I think ZMQ should not enter such a busy loop.
>     Is this a known issue?
>     Is there something we could do to prevent it from happening again?
>
>     Thanks in advance for your help.
>
>     Emmanuel
>
>
>
>     _______________________________________________
>     zeromq-dev mailing list
>     zeromq-dev at lists.zeromq.org
>     http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
