[zeromq-dev] Java API is not notified of C++ assert failures.
Martin Sustrik
sustrik at fastmq.com
Mon Apr 20 13:00:37 CEST 2009
Holger Hoffstätte wrote:
> Hi,
>
> sorry for the late reply. I somehow thought I had already answered but
> apparently did not.
>
> Martin Sustrik wrote:
>> This is a pretty contentious issue. However, let's have a look at
>> the possibilities here:
>
> No. :-)
> There may or may not be reasons why asserts are raised, but that really
> does not matter. zmq (when viewed as a library) is not responsible for the
> lifetime of the process and should not suddenly assume it is. Other C
> libraries don't just call "exit()" when they encounter an error either.
>
> By trying to predict the exact reason for the assert - bug, incorrect
> usage or whatever - you're only getting yourself deeper into trouble,
> because there will always be a "surprise" reason that you didn't foresee.
>
> Like I said, I understand the motivation for terminate-fast scenarios -
> when zmq "is" the app or in safety-critical environments with n-version
> agreement where single-party disagreement implies possible corruption and
> requires immediate system shutdown. I think we can rule that out for now.
Hm. The use case is a bit different. The software is primarily intended
for mission-critical environments like banks or exchanges. Before
anything gets into production in such an environment you do a lot of
testing (several months, even a year). The asserts ensure that if
there's a bug in the system it won't be ignored or silently coded around
but fixed before the system gets into production.
I am perfectly aware that on the other end of the spectrum there are
systems (like GUI applications) that prefer staying alive though
mutilated to being dead :)
> Again, stopping work/shutting down zmq is not the issue: assuming control
> over the process is.
>
>> In short, if you are having a problem with a specific assert, let's fix
>> the problem. Don't try to introduce complex generic mechanisms to handle
>> the unhandleable.
>
> I am not trying to "handle the unhandleable", I'm trying to isolate
> failure behaviour so that I can shut down the subsystem in a controlled
> way. Or rather let it do that itself and then tell me about it, so that I
> can shut down the rest of the containing process cleanly or at least limp
> along on the remaining legs for a few more minutes.
Yes. That's a valid point. However, I have no idea how to handle this
situation without possibly causing even more damage. Any suggestions?
> The only other alternative would be to start any zmq messaging as an
> external process and talk to it via some other IPC mechanism as
> trampoline, but that would make things even more fragile (managing ext.
> process lifetime etc.) and also slower.
Bleh. That would be really slow :(
>
> Hope that explains my reasoning. It's just a suggestion and concern that
> came up during a review.
Martin