[zeromq-dev] error handling? assert?

Aamir M intuitionist at gmail.com
Tue Dec 16 15:35:33 CET 2008


Hi Martin,

I'm writing a 0MQ communication adapter as a shared object library for a CEP
(Complex Event Processing) engine. When I'm working inside the CEP engine,
if I create two instances of my 0MQ adapter and pass them the same 0MQ
global exchange name, the 0MQ library crashes the host process (which in
this case is the CEP engine). Before the CEP engine crashes, its start-up
behavior has already been configured such that it creates two instances of my
0MQ adapter using the same global exchange name. So when I try to restart
the CEP engine, it crashes on start-up ... I get stuck in a catch-22
situation where I need to start the engine in order to reconfigure the 0MQ
adapters, but the engine crashes on start-up because the 0MQ adapters are
causing exchange name collisions. The only way to break out of this loop (as
far as I know) is to temporarily delete the shared object library so that
the CEP engine cannot dynamically load it during startup. I guess this is
more of an annoyance for the developer (me) rather than a production
reliability issue ... but strictly speaking, it means that my 0MQ adapter is
not fault-tolerant ... the user has to be very careful when working with it
inside the CEP engine.

I must admit I had never thought of program crashes as a form of
reliability. You do have a point that this makes things quite reliable, as
long as you can get the program to a state where it doesn't crash. Do you
usually compile 0MQ with asserts when you put it into production use? Or do
you compile it with NDEBUG and give up error checking in favor of
performance?
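
To make the callback option you mentioned concrete, here is a rough sketch of
what I have in mind. Every name in it (error_handler_t, set_error_handler,
checked_assert) is hypothetical and invented for illustration; none of it is
existing 0MQ API. The default handler still aborts, so the current
crash-early policy is unchanged unless the host application opts out, and the
check does not depend on NDEBUG.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical names throughout; this is not existing 0MQ API.
typedef void (*error_handler_t) (const char *expr, const char *file, int line);

// Default behaviour mirrors a failed assert: report the check and abort.
static void default_error_handler (const char *expr, const char *file,
    int line)
{
    std::fprintf (stderr, "%s:%d: check '%s' failed\n", file, line, expr);
    std::abort ();
}

static error_handler_t current_handler = default_error_handler;

// Lets the host application install its own handler.
// Passing a null pointer restores the default. Returns the previous handler.
error_handler_t set_error_handler (error_handler_t handler)
{
    error_handler_t old = current_handler;
    current_handler = handler ? handler : default_error_handler;
    return old;
}

// Unlike assert (), this check is not compiled out when NDEBUG is defined.
#define checked_assert(x) \
    do { \
        if (!(x)) \
            current_handler (#x, __FILE__, __LINE__); \
    } while (false)
```

With something like this, my adapter could install a handler that logs the
failure and throws an exception, so the CEP engine could refuse to load the
misconfigured adapter instead of going down with it. Note that if a custom
handler simply returns, execution continues past the failed check, so a
handler that wants to stop the operation has to throw or terminate itself.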

Thanks,
Aamir


On Tue, Dec 16, 2008 at 5:15 AM, Martin Sustrik <sustrik at fastmq.com> wrote:

> Hi,
>
>> Does ZeroMQ provide error handling for all kinds of errors? Or just for
>> when a client disconnects?
>>
>> I am calling ZeroMQ from inside a larger program. When I try to create two
>> global exchanges using the same exchange name, ZeroMQ crashes with the
>> message: "locator.cpp:97: virtual void zmq::locator_t::create( ...) Assertion
>> 'cmd == create_ok_id' failed"
>>
>
> Yes. That's the case. How would you like it to behave? Invoking a callback
> function (error handler) is an option...
>
>> So it appears that if ZeroMQ encounters errors it just does an assert and
>> ends up crashing the whole process? In my use case, this is causing a much,
>> much larger server process to crash along with ZeroMQ. This makes it
>> difficult to make my code fault-tolerant.
>>
>
> The idea is that 0MQ is intended primarily for environments with a need for
> high reliability (financial services), so a policy of hiding errors is
> not tolerable. If there's a bug, the application should crash (hopefully
> pretty early, during tests in the test environment) and the bug should be
> corrected, rather than the application staying in some half-defined state
> and possibly causing erroneous business transactions to succeed.
>
> Let's consider your case: I assume the problem appears when two instances
> of an application that should have at most one instance running, are
> started, right? In this case, crashing seems to be a pretty good thing to
> do. Letting the second instance run would be potentially harmful to your
> business logic so it's better to crash and thus let the user know that
> there's a bug to fix in the app.
>
> Thoughts?
> Martin
>