[zeromq-dev] zeromq, abort(), and high reliability environments

Goswin von Brederlow goswin-v-b at web.de
Mon Aug 11 11:33:49 CEST 2014


On Mon, Aug 11, 2014 at 07:52:02AM +0100, Gerry Steele wrote:
> How about not sending an ack to your users until the unit of work they
> input has cleared the pipeline? That way the input application can decide
> what to do. Obviously depends on your application...

What if the input application gets the SIGABRT?

ZeroMQ should IMHO never fail an assertion because of an exceptional
runtime condition. Assertions should be reserved for bugs.

That said, when memory runs out the application typically just gets
killed by the OOM killer, or segfaults because of memory overcommit.
There isn't much you can do there.

My suggestion: if you find an assertion that gets triggered, patch it
out, handle the error properly, and send a pull request with the fix.
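
As a rough illustration of what "handle the error properly" could look
like, here is a generic POSIX sketch. It is not actual libzmq code, and
open_asserting() and open_checked() are hypothetical names made up for
this example. The first function turns any failure, including running
out of file descriptors, into an abort; the second reports the failure
to the caller, so that a binding could raise an exception instead.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical "before": any open() failure, including EMFILE,
 * aborts the whole process via SIGABRT (unless NDEBUG is defined). */
int open_asserting(const char *path)
{
    int fd = open(path, O_RDONLY);
    assert(fd != -1);
    return fd;
}

/* Hypothetical "after": the failure is handed back to the caller.
 * Returns a descriptor and sets *err_out to 0 on success, or returns
 * -1 and stores the errno value from open() in *err_out. */
int open_checked(const char *path, int *err_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        *err_out = errno;   /* e.g. EMFILE or ENOENT */
        return -1;
    }
    *err_out = 0;
    return fd;
}
```

A caller of open_checked() can then decide for itself whether EMFILE is
fatal, worth a retry, or something to report upwards.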

MfG
	Goswin

> On 9 Aug 2014 03:12, "Dylan Cali" <calid1984 at gmail.com> wrote:
> 
> > Hey guys,
> >
> > What is the right way to use zeromq in high reliability environments?  In
> > certain insane/impossible situations (e.g. out of memory, out of file
> > descriptors, etc) libzmq assertions will fail and it will abort.
> >
> > I came across a thread by Martin where he addresses a similar situation
> > [1].  If I'm reading his argument correctly, the gist in general is: If
> > it's impossible to connect due to some error, then you're dead in the
> > water anyways.  Crash loudly and immediately with the error (the
> > Fail-Fast paradigm), fix the error, and then restart the process.
> >
> > I actually agree with this philosophy, but a user would say "You
> > terminated my entire application stack and didn't give me a chance to
> > clean up!  I had very important data in memory and it's gone!"  This is
> > especially the case with Java programmers who Always Expect an Exception.
> >
> > For example, in the case of being out of file descriptors, the jzmq
> > bindings will abort, but a Java programmer would expect to get an
> > Exception with the "Too Many Open Files" error.
> >
> > I guess one possible retort is: if the data in memory was so important, why
> > didn't you have redundancy/failover/some kind of playback log? Why did you
> > put all your eggs in one basket assuming your process would never crash?
> >
> > Is that the right answer here (basically blame the user for not having
> > disaster recovery), or is there a different/better way to address the
> > high reliability scenario?
> >
> > I came across another thread where Martin gets this very complaint
> > (zeromq aborted my application!), and basically says well, if you
> > really, really want to, you can install a signal handler for SIGABRT,
> > but caveat emptor [2].
> >
> > To me, this is playing with fire, dangerous, and just a Bad Idea. But
> > maybe it's worth the risk in high reliability environments?
> >
> >
> > Thanks in advance for any advice or thoughts.
> >
> > [1] http://lists.zeromq.org/pipermail/zeromq-dev/2009-May/000784.html
> > [2] http://lists.zeromq.org/pipermail/zeromq-dev/2011-October/013608.html
