[zeromq-dev] zeromq, abort(), and high reliability environments

Pieter Hintjens ph at imatix.com
Wed Aug 13 01:15:28 CEST 2014


My current view on what constitutes a sane API and behavior from the
library is heavily driven by what I want, as a user. That is, my C
libraries are things I primarily make to use, not to sell. I think
it's been about 30 years since I wrote my first C libraries, and my
style and views have shifted massively since then, to what we have in
cases like CZMQ today.

Mainly, the API enforces its style upwards, so that you simply
*cannot* get strange code paths and bizarre arguments. If you do, your
application is corrupt, or incompetent, and the library has a
responsibility to stop things immediately, not allow them to continue.

It is a safety cord that has proven its usefulness many times. Indeed,
some of the hardest bugs to catch in recent months were from older
APIs that precisely returned EINVAL on bad arguments, and where the
calling code forgot to check the return code. Stuff is then...
bizarrely broken and tracking that down can be insanely hard.
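
To make the contrast concrete, here is a minimal sketch in C. The
queue_t type and both setter functions are hypothetical, invented for
illustration; they are not part of libzmq or CZMQ. The first setter
reports a bad argument via errno and a return code, the second treats
it as a caller bug and stops on the spot.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical toy type for illustration only. */
typedef struct {
    int limit;
} queue_t;

/* Style 1: report bad arguments through the return code.
   Nothing happens unless the caller checks the result. */
static int
queue_set_limit_rc (queue_t *self, int limit)
{
    if (!self || limit <= 0) {
        errno = EINVAL;
        return -1;
    }
    self->limit = limit;
    return 0;
}

/* Style 2: treat bad arguments as a bug in the caller and stop
   immediately. Note that assert() compiles away under NDEBUG. */
static void
queue_set_limit_strict (queue_t *self, int limit)
{
    assert (self);
    assert (limit > 0);
    self->limit = limit;
}
```

A caller that writes queue_set_limit_rc (&queue, -5) and ignores the
result keeps running with the old limit, and the mistake only shows up
later, somewhere else. The same bad value passed to
queue_set_limit_strict stops the program at the call that made the
mistake.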

If you read the code in a project like zbroker or zyre or czmq you'll
appreciate how this works. Theory is fine in theory, but in practice,
practice beats theory.

-Pieter


On Wed, Aug 13, 2014 at 12:57 AM, Thomas Rodgers <rodgert at twrodgers.com> wrote:
>> I agree with you in the sense that, specifically speaking about 0mq,
>> errors that can be checked synchronously (you called the function, and
>> the check happens then) should return sensible error codes that
>> binding authors should be responsible for checking and throwing (or
>> whatever) their language specific construct to indicate that.
>
>
> I tend to agree with this, but I also wonder how many of these EINVAL
> results fall outside the category of options not supported by a given socket
> type?  It would seem things like passing something that wasn't a zmq_msg_t*
> to a function expecting it, or a socket, would be fundamental problems for
> any language binding to allow in the first place.
>
>>  Having just
>> got done writing way too much C++ in the last week it's almost
>> refreshing.
>
>
> Even with exceptions, assert() or BOOST_ASSERT[_MSG]() are reasonable
> things to do in C++ ... and probably just as refreshing :)
>
>
> On Tue, Aug 12, 2014 at 5:22 PM, Michel Pelletier
> <pelletier.michel at gmail.com> wrote:
>>
>> On Tue, Aug 12, 2014 at 2:42 PM, Michi Henning <michi at triodia.com> wrote:
>> >
>> > That seems a bit too simplistic to me. It's possible for an application
>> > to have some code path that is tickled only under highly unusual
>> > circumstances, then causing invalid arguments to be passed, even though the
>> > application is otherwise doing just fine. If the library aborts in this
>> > case, it sets policy in a way it isn't entitled to, IMO. Throwing an
>> > InvalidArgumentException instead, or returning an error in a C API is far
>> > better. Imagine the kernel were to apply the same strict policy and were to
>> > abort my process whenever I pass an invalid argument to a system call. It's
>> > just not the done thing.
>> >
>> > I believe the only time a library is entitled to abort is when it
>> > realizes that its own internal invariants are violated. Any other condition,
>> > such as resource exhaustion or pre-condition violation should be reported to
>> > the caller in a way that allows the caller to handle the error. It's up to
>> > the caller to call abort, not the library.
>>
>> I agree with you in the sense that, specifically speaking about 0mq,
>> errors that can be checked synchronously (you called the function, and
>> the check happens then) should return sensible error codes that
>> binding authors should be responsible for checking and throwing (or
>> whatever) their language specific construct to indicate that.
>>
>> However, there is a class of errors in 0mq that happen asynchronously.
>> There is no caller to return a code to or throw an exception for.
>> Aborting is the only sensible option.  Unless we want to go down the
>> road CUDA went, where EVERY function documentation contains the words
>> "Note that this function may also return error codes from previous,
>> asynchronous launches."  It's impossible to use exceptions in that
>> case, because every caller would have to try to catch every possible
>> exception!  Madness.
>>
>> Pieter's style is a bit more aggressive, but I accept it.  Having just
>> got done writing way too much C++ in the last week it's almost
>> refreshing.
>>
>> -Michel
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev


