[zeromq-dev] zeromq, abort(), and high reliability environments

Goswin von Brederlow goswin-v-b at web.de
Thu Aug 14 12:12:40 CEST 2014

On Wed, Aug 13, 2014 at 09:35:18AM +1000, Michi Henning wrote:
> > My current view on what constitutes a sane API and behavior from the
> > library is heavily driven by what I want, as a user. That is, my C
> > libraries are things I primarily make to use, not to sell. I think
> > it's been about 30 years that I wrote my first C libraries, and my
> > style and view has shifted massively since then, to what we have in
> > cases like CZMQ today.
> I can attest to having undergone a similar change of view over the
> past 30 years :-)
> > Mainly, the API enforces its style upwards, so that you simply
> > *cannot* get strange code paths and bizarre arguments. If you do, your
> > application is corrupt, or incompetent, and the library has a
> > responsibility to stop things immediately, not allow them to continue.
> > 
> > It is a safety cord that has proven its usefulness many times. Indeed,
> > some of the hardest bugs to catch in recent months were from older
> > APIs that precisely returned EINVAL on bad arguments, and where the
> > calling code forgot to check the return code. Stuff is then...
> > bizarrely broken and tracking that down can be insanely hard.
> I hear you, and there is probably not a single one true answer here.
> Part of the problem is C, which makes it possible to ignore error
> codes and just blithely stumble on regardless.
> In languages with exception handling, it's a different matter
> though, because I can force the caller to pay attention to invalid
> arguments.
> My main concern is that, by aborting in the library, it becomes very
> difficult to write something that needs to have high reliability.

My concern is more that I can't give a proper error message and shut
down cleanly. E.g. a distributed system might want to send a "goodbye"
message to its peers before going down, or a server might want to log
an error.

> Basically, I can be sure that my program won't dump core only if I
> have exercised it to the extent that all possible code paths with all
> possible argument values are tested under all possible combinations.
> For any sizeable program (especially with lots of threads and
> asynchronous things going on), that can be damn near impossible.
> In turn, if I still want to persist, I now have to wrap the
> underlying C API and check all the preconditions for every API call
> myself, just so I can throw an exception when a pre-condition is
> violated instead of having the program aborted by the library. But

And you have to do that for the C++ API, the Python API, the Ruby API,
the OCaml API, the Go API, the Objective-C API, the ..... And each and
every one has to match exactly, to the tiniest detail, the checks made
in zmq itself. And those checks might change over time.

How often do we need to duplicate argument validation that libzmq
already has?

> validating the pre-conditions myself may well be very difficult or
> very expensive. For example, the cost of verifying that a valid socket
> pointer is passed to every API call is quite high.

Well, use a better language. E.g. in OCaml, code that tries to pass a
non-socket to something expecting a socket won't even compile; the
type checker rejects it.

On that note: Why is zmq using "void *" instead of declaring abstract
types? If a context were a context_t and a socket a socket_t, then
even in C you couldn't accidentally pass a context in place of a
socket. And yes, I've passed the wrong thing to zmq in C by accident
because I got the argument order wrong and both values were "void *".
No compiler warning or error. It just fails at runtime.
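
To make that concrete, here is a rough sketch of what I mean by
abstract types. All names (my_context_t, my_socket_t, my_ctx_new and
so on) are made up for illustration; this is not libzmq's API, just
one way such an interface could look:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names, not libzmq's API: two distinct struct types
   instead of "void *" for contexts and sockets. In a real library
   the struct bodies would live in a private source file, so callers
   only ever see the incomplete types. */
typedef struct my_context my_context_t;
typedef struct my_socket  my_socket_t;

struct my_context { int io_threads; };
struct my_socket  { my_context_t *ctx; int type; };

/* Allocate a context; returns NULL if out of memory. */
my_context_t *my_ctx_new(void)
{
    my_context_t *ctx = malloc(sizeof *ctx);
    if (ctx != NULL)
        ctx->io_threads = 1;
    return ctx;
}

/* Create a socket that belongs to ctx; returns NULL on a NULL
   context or if out of memory. */
my_socket_t *my_socket_new(my_context_t *ctx, int type)
{
    my_socket_t *s;

    if (ctx == NULL)
        return NULL;
    s = malloc(sizeof *s);
    if (s != NULL) {
        s->ctx = ctx;
        s->type = type;
    }
    return s;
}

/* Only accepts a socket; handing it a my_context_t * is a type
   mismatch the compiler can see. */
int my_socket_type(const my_socket_t *s)
{
    return s->type;
}

void my_socket_free(my_socket_t *s) { free(s); }
void my_ctx_free(my_context_t *ctx) { free(ctx); }
```

With this, a call like my_socket_type(ctx) with a my_context_t *ctx
is an incompatible pointer type, which GCC and Clang diagnose at
compile time (as a warning or an error, depending on version and
flags). With "void *" everywhere the same mistake compiles silently.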

> If I'm given the option of catching an exception, I may be able to
> recover from my own programming error, for example, by terminating
> only the current operation. At least, the program keeps running,
> instead of dumping core, and I can splatter my log with error messages
> or whatever I deem appropriate. The point here is that general-purpose
> libraries should avoid setting policy, because what should happen
> under certain error conditions is something that needs to be under
> control of the caller.

Or a server can log the request that started the sequence leading to
the invalid call before going down.
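
A minimal sketch of that, assuming the library reports the bad
argument through errno instead of aborting. Both lib_send() and
handle_request() are hypothetical stand-ins, not real libzmq calls:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical library call (not part of libzmq): reports a bad
   argument by setting errno to EINVAL and returning -1 instead of
   calling abort(). On success it returns the message length. */
int lib_send(const void *socket, const char *msg)
{
    if (socket == NULL || msg == NULL) {
        errno = EINVAL;
        return -1;
    }
    return (int) strlen(msg);
}

/* Hypothetical server-side handler. Returns 0 on success, or -1
   after logging the request that led to the invalid call, so the
   caller can decide to shut down cleanly. */
int handle_request(const void *socket, const char *request, FILE *log)
{
    if (lib_send(socket, request) == -1) {
        fprintf(log, "request \"%s\" led to invalid call: %s\n",
                request != NULL ? request : "(null)", strerror(errno));
        return -1;
    }
    return 0;
}
```

The caller still goes down if it wants to, but it gets to write the
log line (or say goodbye to its peers) first. With abort() inside the
library there is no such point at which to do that.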

> I hear you about the difficulty of debugging code that ignores
> EINVAL from API calls. But that is the price of programming in C. It's
> no different from making system calls and ignoring the return value;
> do so at your peril. But a system call is policy-free: it allows me to
> decide what should happen when I have passed bad arguments, instead of
> taking that decision away from me. 
> At least for languages that support exceptions, I believe throwing
> an exception for invalid arguments is far preferable to just killing
> the process.
> Cheers,
> Michi.

Many languages can also be used interactively. For example, I can
start an OCaml toplevel and then interactively enter commands to
quickly try out stuff. But as soon as I type in the wrong thing, libzmq
would abort the whole session, losing all the work entered before. An
error or exception is far better there since the toplevel will catch
that, display it and continue.

