[zeromq-dev] hangs when opening/closing sockets "frequently" [2.0.7, OS/X] [re-post]
Martin Hurton
hurtonm at gmail.com
Thu Jul 15 12:17:39 CEST 2010
Hi Matt,
I think I have a fix for this bug. You can find it in my private
repository (http://github.com/hurtonm/zeromq2/commit/f6dc098024ad44f07cf0c05d40e8c9c446cb699c).
Please test and report back.
- Martin
On Thu, Jul 8, 2010 at 11:30 PM, Matt Weinstein
<matt_weinstein at yahoo.com> wrote:
>
> On Jul 8, 2010, at 5:01 PM, Matt Weinstein wrote:
>
> [Moderator: please kill my prior message; I've joined the list ;-) ]
> Folks,
>
> I have a client and server using REQ and REP, connected through a ZMQ_QUEUE
> device:
> REQ -- [TCP localhost] - XREP - ZMQ_QUEUE - XREQ - [INPROC] - REP
> To handle timeouts, the client closes its socket and opens a new one
> whenever it receives an empty (zero-length) reply from the server:
>
> // check to see if we're a special case
> if (reply.size() == 0) {
> delete psocket;
> psocket = new zmq::socket_t(*pctx, ZMQ_REQ);
> assert(psocket != NULL);
> psocket->connect(client_connect);
> }
> The server sends an empty ("close") reply every 10th message.
> After a few hundred cycles, things hang, see below.
> I've pulled the latest 2.0.7 from git, as I needed the fix for bug 38
> (Assertion failed: fetched (xrep.cpp:196)), which had been biting me.
> Any thoughts?
>
> I played around a bit, and the problem goes away if I insert a usleep()
> in one of two places (marked "helps here" below). My feeling is that
> there may be a race condition in tearing down the underlying TCP socket,
> or a timing problem in allocating and deallocating a ypipe. I tried
> OSMemoryBarrier() (OS/X), but that didn't help. I haven't tried different
> usleep() values:
> if (reply.size() == 0) {
> // usleep(10000); -- does not help
> delete psocket;
> // usleep(10000); //-- helps here
> psocket = new zmq::socket_t(*pctx, ZMQ_REQ);
> assert(psocket != NULL);
> usleep(10000); //-- helps here
> psocket->connect(client_connect);
> }
>
> In a long-running test, however, this workaround did not hold up: threads
> slowly hang, and eventually I got a SEGV.
>
> The problem is easily reproducible on OS/X.
>
> Code is available. Environment: OS/X Leopard.
>
> Thanks,
> Best,
> Matt
> client recv: Xthread# 0x10040a000 request# 297
> client send: thread# 0x10040a000 request# 298
> server recv: thread# 0x10040a000 request# 298
> server send thread# 0x10040a000 request# 298
> server send complete
> client recv: Xthread# 0x10040a000 request# 298
> client send: thread# 0x10040a000 request# 299
> server recv: thread# 0x10040a000 request# 299
> server send thread# 0x10040a000 request# 299
> server send complete
> client recv: Xthread# 0x10040a000 request# 299
> client send: thread# 0x10040a000 request# 300
> server recv: thread# 0x10040a000 request# 300
> server send null for thread# 0x10040a000 request# 300
> client recv:
> client send: thread# 0x10040a000 request# 301
> server recv: thread# 0x10040a000 request# 301
> server send thread# 0x10040a000 request# 301
> server send complete
> --- I expected to see this, but it never showed up:
> client recv: Xthread# 0x10040a000 request# 301
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>