[zeromq-dev] zmq-signaler-port-sync

KIU Shueng Chuan nixchuan at gmail.com
Thu Feb 14 01:46:21 CET 2013


Are we okay with using assertions to catch both
- programming errors
- rare but known situations not handled in the code? (in this case resource
exhaustion)

How about this: Have make_fdpair() return -1 (and release the critical
section) on error returns from the calls to connect() and accept() only.
The failure will be caught by signaler_t() which calls make_fdpair()



On Wed, Feb 13, 2013 at 7:22 PM, Pau <pau at teleopsia.com> wrote:

>  Hi, thanks
>
> I do not want to look bungler, but wouldn't be a shortcut to implement
> asserts that clean the event before aborting?
>
> El 13/02/2013 9:54, KIU Shueng Chuan escribió:
>
> Hi Pau,
>
>  The system-wide critical section is currently implemented using a win32
> Event which, as you observed, has the possibility of resulting in a
> deadlock in the following situation:
> 1) Process A takes the Event
> 2) Process B tries to take the Event and blocks
> 3) Process A aborts within the critical section (due to an assertion being
> raised)
> 4) Since Process B has opened the Event, the OS will not clean up the
> Event.
> 5) Process B and any subsequent process will now block forever for the
> Event.
>
>  As I mentioned in the previous mail, if the critical section were to be
> implemented using a Mutex instead, then in step 5, Process B would be able
> to enter the critical section with a return code of WAIT_ABANDONED from
> WaitForSingleObject. (Or at least that's what I read from MSDN)
>
>  Note: If Process A aborted due to some exhaustion of resources, then
> Process B would likely hit the same assertion too.
>
>  The question is how to convert the Event to a Mutex and yet not break
> compatibility with existing applications using older versions of the
> library.
>
>
>
> On Wed, Feb 13, 2013 at 3:28 PM, Pau <pau at teleopsia.com> wrote:
>
>>  Hi,
>>
>> I am back with the asserts happening inside a critical section in
>> signaler.cpp.
>> The problem still is that in signale.cpp make_fdpair(..) creates
>> system-wide critical section and does a number of things that can generate
>> a wsa_assert() or win_assert() before releasing the session.
>>
>> I have seen that in the trunk someone has added a CloseHandle(sync) at
>> the end of the function, I do not know if it had something related with
>> this but I understand that the problem is still there. wsa_assert() and
>> wsa_windows() end up in  RaiseException (0x40000015,
>> EXCEPTION_NONCONTINUABLE, 1, extra_info) which I understand is a cul de sac
>> that has no way out to clean up before leaving.
>>
>> I guess we need a special assert function to use inside this critical but
>> I'd like a more documented opinion (Kiu?).
>>
>> thanks,
>>
>> Pau Ceano
>>
>> El 21/01/2013 23:37, KIU Shueng Chuan escribió:
>>
>> Hi Pau, a fix for the assertion on connection to port 5905 is in trunk
>> branch.
>>
>> I think the dangling critical section possibility could be fixed by
>> changing the Event to a Mutex. When an assertion occurs the mutex would
>> just be abandoned. However this change will cause backward compatibility
>> issues with older versions.
>>  On Jan 22, 2013 2:04 AM, "Pieter Hintjens" <ph at imatix.com> wrote:
>>
>>> Hi Pau,
>>>
>>> So there are two different problems here, one is that we're hitting a
>>> socket limit on WXP, and the other is that the asserts are happening
>>> inside a critical section.
>>>
>>> I don't think we can fix the first one easily but we can presumably
>>> assert in a smarter way. Do you want to try making a patch for this?
>>>
>>> -Pieter
>>>
>>> On Mon, Jan 21, 2013 at 6:23 PM, Pau <pau at teleopsia.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> >
>>> > I am using (not yet in production) ZMQ on Windows and I have found what
>>> > I think is a big problem for Windows users.
>>> > We use WXP and W7 and Visual C++ different versions. ZMQ version 3.2.0
>>> > (as far as I see the same problem happens in 3.2.2)
>>> >
>>> > I do not fully understand ZMQ internals but I've seen that every time a
>>> > socket is created the function make_fdpair(..) is called and in
>>> > signaler.cpp(line244) a system event "zmq-signaler-port-sync" is
>>> created.
>>> > This event is used as a system-wide critical section and, so all
>>> > applications that try to create an event will WaitForSingleObject
>>> (sync,
>>> > INFINITE) until  SetEvent (...) is called.
>>> > The problem is that the code between:
>>> >   HANDLE sync = CreateEvent (NULL, FALSE, TRUE, TEXT
>>> > ("zmq-signaler-port-sync"));
>>> > and
>>> > SetEvent (sync);
>>> > is full of wsa_asserts(..) that will terminate the application if
>>> > something goes wrong.
>>> >
>>> > It is clear that terminating the application not leaving the
>>> system-wide
>>> > critical section is a bad idea because all applications in the system
>>> > will hang and you will have to stop all them to start again.
>>> > I understand that no errors should happen but anyway to escape from the
>>> > error is not a good idea in this case.
>>> >
>>> > I do not know all possible reasons to generate a fatal wsa_assert(..)
>>> > but there is at least one:
>>> >
>>> > I have seen that in XP it is possible that line 301  rc = connect (*w_,
>>> > (sockaddr *) &addr, sizeof (addr)); returns an error when a socket
>>> tries
>>> > to connect to 5905 and this has happened many times.
>>> > Windows uses port numbers starting near 1400 and XP has a limit at
>>> 5000.
>>> > In W7 this does not look as a problem because maximum is 65000
>>> > It sounds as if the number was big enough but apart from the fact that
>>> > ZMQ uses a big number of connections (at least in my tests) I have
>>> > experienced that Windows jumps port numbers by 7, so 5000 happens
>>> > sometimes with catastrophic consequences.
>>> >
>>> > best,
>>> >
>>> > Pau Ceano
>>> > _______________________________________________
>>> > zeromq-dev mailing list
>>> > zeromq-dev at lists.zeromq.org
>>> > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>
>>
>>
>> _______________________________________________
>> zeromq-dev mailing listzeromq-dev at lists.zeromq.orghttp://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>
>
> _______________________________________________
> zeromq-dev mailing listzeromq-dev at lists.zeromq.orghttp://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20130214/b75cdcdd/attachment.html>


More information about the zeromq-dev mailing list