[zeromq-dev] zeromq, abort(), and high reliability environments

Thomas Rodgers rodgert at twrodgers.com
Wed Aug 13 02:08:46 CEST 2014

> At least for languages that support exceptions, I believe throwing an
> exception for invalid arguments is far preferable to just killing the
> process.

I write a lot of C++ code for automated trading systems, so I come at this
from the view that there is no way in this world to light yourself on fire
faster than making the same stupid trade over and over in a tight loop.  My
experience has been that error recovery logic is almost always poorly
exercised and never works entirely as intended when Bad Things happen.

Spending time writing Erlang-based systems has also changed my view to
favor the "just let it crash" approach (note, Erlang also has exceptions,
but they are not the most important feature of its error handling/recovery
model).  These days, I do not generally expect exceptions to be
recoverable.  They are used as a mechanism where I can hang additional
reporting on what the failure was, and the context within which it
happened, on the way to a top-level handler that does nothing but log the
failure and terminate the process.  It is then the responsibility of an
external process to put the system back into a known good state and restart
the failed process.
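
A minimal sketch of that pattern, assuming C++11 nested exceptions are
available.  The names send_order, process_signal, describe and run_or_abort
are made up for illustration; they are not from any real system or from
ZeroMQ.

```cpp
#include <cstdlib>
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical low-level operation that rejects a bad argument.
void send_order(int quantity) {
    if (quantity <= 0)
        throw std::invalid_argument("quantity must be positive");
}

// Intermediate layer: no recovery attempt, it only attaches context
// and lets the failure keep travelling upwards.
void process_signal(int quantity) {
    try {
        send_order(quantity);
    } catch (...) {
        std::throw_with_nested(std::runtime_error(
            "process_signal failed for quantity " + std::to_string(quantity)));
    }
}

// Flattens a chain of nested exceptions into one log line, outermost first.
std::string describe(const std::exception& e) {
    std::string text = e.what();
    try {
        std::rethrow_if_nested(e);
    } catch (const std::exception& inner) {
        text += ": " + describe(inner);
    } catch (...) {
        text += ": unknown failure";
    }
    return text;
}

// Top-level handler: log the accumulated context, then terminate.
// Restarting is left to an external supervisor (an assumption of this sketch).
template <typename Work>
void run_or_abort(Work work) {
    try {
        work();
    } catch (const std::exception& e) {
        std::cerr << "fatal: " << describe(e) << '\n';
        std::abort();  // may also leave a core file, depending on system settings
    }
}
```

Calling run_or_abort([] { process_signal(0); }) would log both layers of
context in a single line and then abort, which is about all I want a
top-level handler to do.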

To some extent, a library that aborts the process out from underneath me
denies me the opportunity to gather more context into my logs before
terminating, but core files are useful things for post-mortem debugging.

On Tue, Aug 12, 2014 at 6:35 PM, Michi Henning <michi at triodia.com> wrote:

> > My current view on what constitutes a sane API and behavior from the
> > library is heavily driven by what I want, as a user. That is, my C
> > libraries are things I primarily make to use, not to sell. I think
> > it's been about 30 years that I wrote my first C libraries, and my
> > style and view has shifted massively since then, to what we have in
> > cases like CZMQ today.
>
> I can attest to having undergone a similar change of view over the past 30
> years :-)
>
> > Mainly, the API enforces its style upwards, so that you simply
> > *cannot* get strange code paths and bizarre arguments. If you do, your
> > application is corrupt, or incompetent, and the library has a
> > responsibility to stop things immediately, not allow them to continue.
> >
> > It is a safety cord that has proven its usefulness many times. Indeed,
> > some of the hardest bugs to catch in recent months were from older
> > APIs that precisely returned EINVAL on bad arguments, and where the
> > calling code forgot to check the return code. Stuff is then...
> > bizarrely broken and tracking that down can be insanely hard.
>
> I hear you, and there is probably not a single one true answer here.
> Part of the problem is C, which makes it possible to ignore error codes
> and just blithely stumble on regardless.
> In languages with exception handling, it's a different matter though,
> because I can force the caller to pay attention to invalid arguments.
>
> My main concern is that, by aborting in the library, it becomes very
> difficult to write something that needs to have high reliability.
> Basically, I can be sure that my program won't dump core only if I have
> exercised it to the extent that all possible code paths with all possible
> argument values are tested under all possible combinations. For any
> sizeable program (especially with lots of threads and asynchronous things
> going on), that can be damn near impossible.
>
> In turn, if I still want to persist, I now have to wrap the underlying C
> API and check all the preconditions for every API call myself, just so I
> can throw an exception when a pre-condition is violated instead of having
> the program aborted by the library. But validating the pre-conditions
> myself may well be very difficult or very expensive. For example, the cost
> of verifying that a valid socket pointer is passed to every API call is
> quite high.
>
> If I'm given the option of catching an exception, I may be able to recover
> from my own programming error, for example, by terminating only the current
> operation. At least, the program keeps running, instead of dumping core,
> and I can splatter my log with error messages or whatever I deem
> appropriate. The point here is that general-purpose libraries should avoid
> setting policy, because what should happen under certain error conditions
> is something that needs to be under control of the caller.
>
> I hear you about the difficulty of debugging code that ignores EINVAL from
> API calls. But that is the price of programming in C. It's no different
> from making system calls and ignoring the return value; do so at your
> peril. But a system call is policy-free: it allows me to decide what should
> happen when I have passed bad arguments, instead of taking that decision
> away from me.
>
> At least for languages that support exceptions, I believe throwing an
> exception for invalid arguments is far preferable to just killing the
> process.
>
> Cheers,
> Michi.
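
For reference, the wrapping approach described in the quoted message might
look roughly like the sketch below.  Everything in it is hypothetical:
legacy_send stands in for a C API that reports bad arguments by returning
-1 and setting errno to EINVAL, and checked_send is an illustrative name,
not part of ZeroMQ or any of its bindings.

```cpp
#include <cerrno>
#include <cstddef>
#include <system_error>

// Hypothetical C-style call: returns the number of bytes "sent",
// or -1 with errno set to EINVAL on bad arguments.
long legacy_send(int handle, const char* data, std::size_t len) {
    if (handle < 0 || data == nullptr) {
        errno = EINVAL;
        return -1;
    }
    return static_cast<long>(len);
}

// Wrapper that converts the error code into an exception, so a caller
// that forgets to check cannot blithely carry on.
std::size_t checked_send(int handle, const char* data, std::size_t len) {
    const long rc = legacy_send(handle, data, len);
    if (rc < 0)
        throw std::system_error(errno, std::generic_category(), "legacy_send");
    return static_cast<std::size_t>(rc);
}
```

Note that this only works when the underlying call returns an error code.
If the library aborts instead, the wrapper would have to validate every
precondition itself before making the call, which is the cost the quoted
message is complaining about.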