[zeromq-dev] zmq-signaler-port-sync
Pau
pau at teleopsia.com
Thu Feb 14 09:09:29 CET 2013
Sounds good to me.
El 14/02/2013 1:46, KIU Shueng Chuan escribió:
> Are we okay with using assertions to catch both
> - programming errors
> - rare but known situations not handled in the code? (in this case
> resource exhaustion)
>
> How about this: Have make_fdpair() return -1 (and release the critical
> section) on error returns from the calls to connect() and accept() only.
> The failure will be caught by signaler_t() which calls make_fdpair()
>
>
>
> On Wed, Feb 13, 2013 at 7:22 PM, Pau <pau at teleopsia.com
> <mailto:pau at teleopsia.com>> wrote:
>
> Hi, thanks
>
> I do not want to look bungler, but wouldn't be a shortcut to
> implement asserts that clean the event before aborting?
>
> El 13/02/2013 9:54, KIU Shueng Chuan escribió:
>> Hi Pau,
>>
>> The system-wide critical section is currently implemented using a
>> win32 Event which, as you observed, has the possibility of
>> resulting in a deadlock in the following situation:
>> 1) Process A takes the Event
>> 2) Process B tries to take the Event and blocks
>> 3) Process A aborts within the critical section (due to an
>> assertion being raised)
>> 4) Since Process B has opened the Event, the OS will not clean up
>> the Event.
>> 5) Process B and any subsequent process will now block forever
>> for the Event.
>>
>> As I mentioned in the previous mail, if the critical section were
>> to be implemented using a Mutex instead, then in step 5, Process
>> B would be able to enter the critical section with a return code
>> of WAIT_ABANDONED from WaitForSingleObject. (Or at least that's
>> what I read from MSDN)
>>
>> Note: If Process A aborted due to some exhaustion of resources,
>> then Process B would likely hit the same assertion too.
>>
>> The question is how to convert the Event to a Mutex and yet not
>> break compatibility with existing applications using older
>> versions of the library.
>>
>>
>>
>> On Wed, Feb 13, 2013 at 3:28 PM, Pau <pau at teleopsia.com
>> <mailto:pau at teleopsia.com>> wrote:
>>
>> Hi,
>>
>> I am back with the asserts happening inside a critical
>> section in signaler.cpp.
>> The problem still is that in signale.cpp make_fdpair(..)
>> creates system-wide critical section and does a number of
>> things that can generate a wsa_assert() or win_assert()
>> before releasing the session.
>>
>> I have seen that in the trunk someone has added a
>> CloseHandle(sync) at the end of the function, I do not know
>> if it had something related with this but I understand that
>> the problem is still there. wsa_assert() and wsa_windows()
>> end up in RaiseException (0x40000015,
>> EXCEPTION_NONCONTINUABLE, 1, extra_info) which I understand
>> is a cul de sac that has no way out to clean up before leaving.
>>
>> I guess we need a special assert function to use inside this
>> critical but I'd like a more documented opinion (Kiu?).
>>
>> thanks,
>>
>> Pau Ceano
>>
>> El 21/01/2013 23:37, KIU Shueng Chuan escribió:
>>>
>>> Hi Pau, a fix for the assertion on connection to port 5905
>>> is in trunk branch.
>>>
>>> I think the dangling critical section possibility could be
>>> fixed by changing the Event to a Mutex. When an assertion
>>> occurs the mutex would just be abandoned. However this
>>> change will cause backward compatibility issues with older
>>> versions.
>>>
>>> On Jan 22, 2013 2:04 AM, "Pieter Hintjens" <ph at imatix.com
>>> <mailto:ph at imatix.com>> wrote:
>>>
>>> Hi Pau,
>>>
>>> So there are two different problems here, one is that
>>> we're hitting a
>>> socket limit on WXP, and the other is that the asserts
>>> are happening
>>> inside a critical section.
>>>
>>> I don't think we can fix the first one easily but we can
>>> presumably
>>> assert in a smarter way. Do you want to try making a
>>> patch for this?
>>>
>>> -Pieter
>>>
>>> On Mon, Jan 21, 2013 at 6:23 PM, Pau <pau at teleopsia.com
>>> <mailto:pau at teleopsia.com>> wrote:
>>> >
>>> > Hi,
>>> >
>>> >
>>> > I am using (not yet in production) ZMQ on Windows and
>>> I have found what
>>> > I think is a big problem for Windows users.
>>> > We use WXP and W7 and Visual C++ different versions.
>>> ZMQ version 3.2.0
>>> > (as far as I see the same problem happens in 3.2.2)
>>> >
>>> > I do not fully understand ZMQ internals but I've seen
>>> that every time a
>>> > socket is created the function make_fdpair(..) is
>>> called and in
>>> > signaler.cpp(line244) a system event
>>> "zmq-signaler-port-sync" is created.
>>> > This event is used as a system-wide critical section
>>> and, so all
>>> > applications that try to create an event will
>>> WaitForSingleObject (sync,
>>> > INFINITE) until SetEvent (...) is called.
>>> > The problem is that the code between:
>>> > HANDLE sync = CreateEvent (NULL, FALSE, TRUE, TEXT
>>> > ("zmq-signaler-port-sync"));
>>> > and
>>> > SetEvent (sync);
>>> > is full of wsa_asserts(..) that will terminate the
>>> application if
>>> > something goes wrong.
>>> >
>>> > It is clear that terminating the application not
>>> leaving the system-wide
>>> > critical section is a bad idea because all
>>> applications in the system
>>> > will hang and you will have to stop all them to start
>>> again.
>>> > I understand that no errors should happen but anyway
>>> to escape from the
>>> > error is not a good idea in this case.
>>> >
>>> > I do not know all possible reasons to generate a fatal
>>> wsa_assert(..)
>>> > but there is at least one:
>>> >
>>> > I have seen that in XP it is possible that line 301
>>> rc = connect (*w_,
>>> > (sockaddr *) &addr, sizeof (addr)); returns an error
>>> when a socket tries
>>> > to connect to 5905 and this has happened many times.
>>> > Windows uses port numbers starting near 1400 and XP
>>> has a limit at 5000.
>>> > In W7 this does not look as a problem because maximum
>>> is 65000
>>> > It sounds as if the number was big enough but apart
>>> from the fact that
>>> > ZMQ uses a big number of connections (at least in my
>>> tests) I have
>>> > experienced that Windows jumps port numbers by 7, so
>>> 5000 happens
>>> > sometimes with catastrophic consequences.
>>> >
>>> > best,
>>> >
>>> > Pau Ceano
>>> > _______________________________________________
>>> > zeromq-dev mailing list
>>> > zeromq-dev at lists.zeromq.org
>>> <mailto:zeromq-dev at lists.zeromq.org>
>>> > http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org
>>> <mailto:zeromq-dev at lists.zeromq.org>
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>>
>>>
>>>
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> zeromq-dev at lists.zeromq.org <mailto:zeromq-dev at lists.zeromq.org>
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org <mailto:zeromq-dev at lists.zeromq.org>
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>>
>>
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org <mailto:zeromq-dev at lists.zeromq.org>
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org <mailto:zeromq-dev at lists.zeromq.org>
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
>
>
>
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.zeromq.org/pipermail/zeromq-dev/attachments/20130214/143c9c0e/attachment.htm>
More information about the zeromq-dev
mailing list