[zeromq-dev] czmq: Error Traceability with assert(...) and release code

Pieter Hintjens ph at imatix.com
Mon Mar 10 10:36:22 CET 2014


This theory is fine in theory. In practice, could you provide a case
that reproduces the crash you got?


On Mon, Mar 10, 2014 at 10:10 AM, Christoph Zach
<czach at rst-automation.de> wrote:
> On Friday 07 March 2014 17:36:21 Pieter Hintjens wrote:
>> On Fri, Mar 7, 2014 at 3:13 PM, Christoph Zach <czach at rst-automation.de> wrote:
>>
>> > To continue using zyre/czmq, we are planning to replace all the assert(...) statements
>> > with actual error handling routines.
>>
>> As Olaf explains, the asserts cannot ever happen in practice unless
>> there is a coding bug in your app or in CZMQ.
>>
>> If you can reproduce an assert under "normal" conditions, that is a
>> bug that we take very seriously and fix.
>>
>> Code that has hit an internal error _cannot_ continue to operate
>> sanely. The extensive use of asserts is a deliberate and long-standing
>> design choice, and though you may do what you like with your forks of
>> the codebase, such patches would be rejected without much pity.
>>
>> I'd not trust a system that had asserts disabled. Production code (and
>> I've made that my profession for decades) should run with all asserts
>> enabled. The correct response to an internal failure is to crash fast and
>> recover fast. You cannot run a software system reliably when you have
>> internal errors. Adding error handling to recover from (by definition)
>> unforeseen internal errors makes things less, not more reliable.
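One way to read "crash fast, recover fast" is a supervisor that restarts a failed process from clean state instead of trying to repair it in place. The sketch below is a hypothetical, POSIX-only illustration (the supervise() helper and its restart policy are assumptions, not part of CZMQ): the worker runs in a forked child, and an abnormal exit, such as SIGABRT from a failed assert(), simply triggers a fresh start.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical supervisor (not part of CZMQ) illustrating "crash fast,
// recover fast". Each attempt runs work() in a fresh child process; any
// abnormal exit (e.g. SIGABRT from a failed assert()) or non-zero exit
// status leads to a restart with clean state, up to max_restarts times.
// Returns the number of the attempt that succeeded (0 = first try), or -1
// if fork()/waitpid() failed or every attempt failed.
int supervise(int (*work)(int attempt), int max_restarts) {
    for (int attempt = 0; attempt <= max_restarts; ++attempt) {
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {
            // Child: run the worker and report its result as the exit status.
            // _exit() avoids flushing stdio buffers copied from the parent.
            _exit(work(attempt));
        }
        int status = 0;
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR) return -1;  // retry only if interrupted
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            return attempt;  // clean exit: done
        }
        // Crashed or failed: discard that process and start a new one.
    }
    return -1;
}
```

The design choice is that the supervisor never inspects or repairs the crashed worker's memory; recovery consists only of starting over.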
> Semantically, we agree on detecting invalid/fatal states. Let me explain
> in more detail why error codes, and not assertions, should be used to
> detect these:
>
> 1) Context Awareness
> The issue with old-school assert statements is that, when they are
> enabled and a check fails, they simply abort your application
> immediately. For example, if you have a C++ app with RAII:
> [...]
> {
>     RAIIWrapperX x (...);
>
>     libraryPotentiallyGoingToAssert(...);
>
> } // Never reached: the destructor of x is never called!
>
> The issue is that when the library has detected that it has reached
> an invalid/unknown/fatal state, it just quits and does not allow the
> RAIIWrapperX to clean up nicely.
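The RAII problem described above can be shown concretely. In this sketch, libraryCall() and the g_cleanups counter are hypothetical stand-ins: when the failure is reported as an error code, the scope is left normally and the destructor runs; when the library uses assert() instead, a failing check calls std::abort(), which ends the process without unwinding the stack.

```cpp
#include <cassert>
#include <cstdio>

// Counts RAIIWrapperX destructor calls so the cleanup is observable.
static int g_cleanups = 0;

// Stand-in for RAIIWrapperX from the example; its destructor represents
// whatever cleanup the wrapper performs.
struct RAIIWrapperX {
    ~RAIIWrapperX() { ++g_cleanups; }
};

// Hypothetical library call that reports failure through its return
// value (0 = success, -1 = failure) instead of asserting.
int libraryCall(bool fail) { return fail ? -1 : 0; }

// Error-code style: the function returns normally on both paths, so
// x's destructor runs whether or not libraryCall() failed.
int doWork(bool fail) {
    RAIIWrapperX x;
    int rc = libraryCall(fail);
    if (rc != 0) {
        std::fprintf(stderr, "doWork: libraryCall failed (rc=%d)\n", rc);
        return rc;
    }
    return 0;
}

// Assert style, for contrast: if the check fails, assert() calls
// std::abort(), which terminates the process without unwinding the
// stack, so ~RAIIWrapperX() never runs. With NDEBUG defined, the check
// is compiled out entirely and the failure goes unnoticed.
int doWorkWithAssert(bool fail) {
    RAIIWrapperX x;
    int rc = libraryCall(fail);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    return 0;
}
```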
>
> The issue with the assumption
>     "You cannot run a software system reliably when you have internal errors"
> is that 'reduced functionality' states are ignored.
> When a library has entered an unknown/invalid state, that does NOT
> mean the other parts of the system have too!
> Therefore, the other parts must be given a chance to clean up as much
> as possible.
> Please note that this does not protect against Machiavellian errors, where
> someone simply corrupts the whole memory of your application. But then
> again, there are unit tests and valgrind to detect such things.
>
> 2) Unit Testing
> When unit testing a library, there are different kinds of tests. For
> example, one test can validate that the function f() does what it should do.
> Another test can validate that f() protects itself against invalid input:
> no matter how invalid the given arguments are, f() reports an error
> and does not crash the application.
> This kind of test (a.k.a. 'invalid parameter detection') is only possible
> with error codes. If assert(...) statements are used, it can never be fully tested.
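A minimal sketch of 'invalid parameter detection' with error codes, assuming a hypothetical parse_port() function and ParseError enum (neither is part of CZMQ). Because every bad argument is reported through the return value, an ordinary unit test can exercise each error path and also check that the output parameter is left untouched on failure.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical error codes for this illustration (not a CZMQ API).
enum class ParseError { Ok, NullArgument, Empty, NotANumber, OutOfRange };

// Parses a TCP port number (1..65535) from a C string. Every invalid
// argument is reported through the return value and *port_out is only
// written on success, so the function never aborts the caller.
ParseError parse_port(const char* text, int* port_out) {
    if (text == nullptr || port_out == nullptr) return ParseError::NullArgument;
    if (*text == '\0') return ParseError::Empty;
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    if (*end != '\0') return ParseError::NotANumber;  // trailing junk or no digits
    if (errno == ERANGE || value < 1 || value > 65535) return ParseError::OutOfRange;
    *port_out = static_cast<int>(value);
    return ParseError::Ok;
}

// Unit test covering the good case and each bad-input case.
void test_parse_port() {
    int port = 0;
    assert(parse_port("5555", &port) == ParseError::Ok);
    assert(port == 5555);
    assert(parse_port(nullptr, &port) == ParseError::NullArgument);
    assert(parse_port("5555", nullptr) == ParseError::NullArgument);
    assert(parse_port("", &port) == ParseError::Empty);
    assert(parse_port("55x", &port) == ParseError::NotANumber);
    assert(parse_port("0", &port) == ParseError::OutOfRange);
    assert(parse_port("70000", &port) == ParseError::OutOfRange);
    assert(port == 5555);  // failed calls left the output untouched
}
```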
>
> 3) Design Principle: "An API must be easy to use correctly and hard to
> use incorrectly".
> This comes from Scott Meyers' article "The Most Important Design
> Guideline?". Besides this article, he also wrote some pretty good books
> on how to write/design good C++ software, on a par with Herb Sutter's
> books.
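One common way to apply this principle in C++ is to give each parameter a distinct type, so incorrect calls are rejected by the compiler rather than detected at run time. The Port, Settings and configure() names below are hypothetical; std::chrono::milliseconds is the standard duration type, whose constructor from a raw number is explicit.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <type_traits>

// Hypothetical strong type for a TCP port. The explicit constructor
// means a bare integer cannot silently become a Port.
struct Port {
    explicit Port(std::uint16_t v) : value(v) {}
    std::uint16_t value;
};

// Hypothetical settings object used only for this illustration.
struct Settings {
    Port port;
    std::chrono::milliseconds timeout;
};

// Both parameters carry their meaning (and, for the timeout, its unit)
// in the type, so swapping them or passing raw numbers fails to compile.
Settings configure(Port port, std::chrono::milliseconds timeout) {
    return Settings{port, timeout};
}

// Compile-time checks that the incorrect calls are rejected:
static_assert(!std::is_convertible<int, Port>::value,
              "a raw int must not convert to Port");
static_assert(!std::is_convertible<int, std::chrono::milliseconds>::value,
              "a raw int must not convert to a duration");
// configure(5555, 250);  // error: no implicit conversions available
```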
>
>>
>> What can be helpful is to replace the assert() macro with a more
>> extensive error reporting system.
> That was my original intention. Instead of calling assert() and killing the
> program, simply provide the user with a verbose error code & message. Then it's
> the user's responsibility to handle it correctly and clean up everything else.
>
>> However, be careful you don't try to
>> do too much: the state of the application when it hits an assert is
>> unknown. You can have arbitrary memory corruption, for instance. Doing
>> *anything* more than "print error & exit" leaves you open to worse
>> damage.
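Pieter's suggestion of a more extensive error reporting system, combined with his warning to do no more than "print error & exit", might look like the following sketch. The CHECKED_ASSERT macro and checked_divide() are hypothetical names: unlike the standard assert(), this check stays active when NDEBUG is defined, matching the view that production code should run with asserts enabled.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical replacement for assert(), not part of CZMQ. Unlike
// assert(), it is not compiled out when NDEBUG is defined, and it
// reports the file, line, function and failed expression. It does
// nothing beyond "print error & exit": program state is unknown at this
// point, so it attempts no cleanup and calls std::abort().
#define CHECKED_ASSERT(expr)                                               \
    do {                                                                   \
        if (!(expr)) {                                                     \
            std::fprintf(stderr, "%s:%d: %s: check '%s' failed\n",         \
                         __FILE__, __LINE__, __func__, #expr);             \
            std::abort();                                                  \
        }                                                                  \
    } while (0)

// Example use: a zero divisor here means a bug in the caller, so the
// process stops immediately instead of continuing in a bad state.
int checked_divide(int numerator, int denominator) {
    CHECKED_ASSERT(denominator != 0);
    return numerator / denominator;
}
```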
> To protect against such an issue, the only thing we can do is to write
> defensive code:
>  * use const as much as possible
>  * validate all input
>  * report verbose errors (to better track the issue when the customer
>    reports it)
>  * use unit testing (test against good and bad cases)
>  * use the type system as much as possible
>  * use valgrind when running unit tests
>  * etc.
>
> By applying all these (and many more) methods, it's possible to
> reduce the probability of such an event. That's all we can do,
> because at run time, when we detect a violated invariant, we cannot tell
> whether it's wise to shut down immediately. Therefore, we should try to
> clean up as much as possible.
>
>>
>> -Pieter
>> _______________________________________________
>> zeromq-dev mailing list
>> zeromq-dev at lists.zeromq.org
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
> Best Regards
>
> Christoph Zach
>
> -----------------------------------------------------------------------------
> RST Industrie Automation GmbH * Carl-Zeiss-Str. 51, D-85521 Ottobrunn
> Tel. +49-89-9616018-00 * Fax +49-89-9616018-10 * http://www.rst-automation.de
>
> Geschäftsführer: Dipl.-Ing.(FH) Robert Schachner
> Amtsgericht München: HRB 103 626 * ID-Nr. DE 811 466 035
> -----------------------------------------------------------------------------


