[zeromq-dev] Can we customize a behaviour for in-mem queued messages after reconnect?

Lindley French lindleyf at gmail.com
Fri Dec 20 21:01:04 CET 2013


I'll agree to a limited extent, though I don't see things exactly as you do.

The problem, in my view, is that normally you can trust TCP to get your
packets through intact. When something goes wrong and a connection fails,
you can take appropriate action to test what got through and what didn't,
and fix it. But when TCP connections go down and then come back within
0MQ, there's no way to react to that, and 0MQ doesn't do a whole lot
(from my understanding) to make sure no messages got lost in the ether.
So nothing is done automatically and nothing can be done manually when a
fault occurs, which means you are forced to write a higher-level protocol
on top of 0MQ *as if* it were totally unreliable and failures could
happen at any time, even though in reality TCP is pretty good 99% of the
time.
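To make "as if it were totally unreliable" concrete: what I end up
writing is essentially the Guide's Lazy Pirate pattern: ack every request
end-to-end, and if the ack doesn't arrive in time, throw the socket away
and retry. A rough, untested sketch (the endpoint, timeout, and retry
count are made up):

/* Rough sketch, not production code: treat 0MQ as unreliable, require
 * an end-to-end ack for every request, and rebuild the REQ socket on
 * timeout so a stale request can't be delivered later. */
#include <zmq.h>

#define TIMEOUT_MS 2500     /* made-up ack timeout */
#define RETRIES    3        /* made-up retry budget */

static void *make_socket (void *ctx)
{
    void *s = zmq_socket (ctx, ZMQ_REQ);
    int timeout = TIMEOUT_MS, linger = 0;
    zmq_setsockopt (s, ZMQ_RCVTIMEO, &timeout, sizeof timeout);
    /* LINGER=0: discard anything still queued when we close. */
    zmq_setsockopt (s, ZMQ_LINGER, &linger, sizeof linger);
    zmq_connect (s, "tcp://localhost:5555");   /* made-up endpoint */
    return s;
}

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *req = make_socket (ctx);
    char reply [256];
    int tries;

    for (tries = 0; tries < RETRIES; tries++) {
        zmq_send (req, "place-bet:42", 12, 0);
        if (zmq_recv (req, reply, sizeof reply, 0) >= 0)
            break;          /* got the end-to-end ack */
        /* Timed out: the request may or may not have arrived. Close
         * the socket so the queued request is dropped, then retry
         * (or give up and tell the user). */
        zmq_close (req);
        req = make_socket (ctx);
    }
    zmq_close (req);
    zmq_ctx_destroy (ctx);
    return 0;
}

All of that machinery exists only because there's no way to ask 0MQ what
happened to the message.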

I'm not going to request that 0MQ do its own acking and retransmissions
or anything like that; I've been down that road, and sooner or later
you're basically writing TCP over TCP. But I do think there should be
hooks to let you know what range of messages might be at risk when a
connection goes down, so you can give them whatever special treatment
you like.
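The closest I can get to such a hook today is pairing
zmq_socket_monitor() (available since 3.2) with my own sequence numbers:
the monitor tells you *that* the connection dropped, and the sequence
numbers bound *which* messages are suspect, namely everything after the
last one acked. Another rough, untested sketch; the two-frame event
layout is per the 4.x man page, and the endpoints are made up:

#include <zmq.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *push = zmq_socket (ctx, ZMQ_PUSH);

    /* Publish disconnect events for this socket on an inproc pipe. */
    zmq_socket_monitor (push, "inproc://push-events",
                        ZMQ_EVENT_DISCONNECTED);
    void *mon = zmq_socket (ctx, ZMQ_PAIR);
    zmq_connect (mon, "inproc://push-events");
    zmq_connect (push, "tcp://localhost:5556");

    /* Elsewhere: prefix every outgoing message with ++seqno and record
     * last_acked as application-level acks come back. */
    uint64_t last_acked = 0;

    /* A 4.x monitor message is two frames: 6 bytes (uint16 event id +
     * uint32 value), then the endpoint string. */
    zmq_msg_t frame;
    zmq_msg_init (&frame);
    zmq_msg_recv (&frame, mon, 0);
    uint16_t event;
    memcpy (&event, zmq_msg_data (&frame), sizeof event);
    zmq_msg_close (&frame);
    zmq_msg_init (&frame);
    zmq_msg_recv (&frame, mon, 0);  /* endpoint frame; not needed here */
    zmq_msg_close (&frame);

    if (event == ZMQ_EVENT_DISCONNECTED)
        printf ("dropped: everything after seqno %llu is suspect\n",
                (unsigned long long) last_acked);

    zmq_close (mon);
    zmq_close (push);
    zmq_ctx_destroy (ctx);
    return 0;
}

It works, but it's a workaround; I'd rather the library told me directly.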


On Fri, Dec 20, 2013 at 2:42 PM, artemv zmq <artemv.zmq at gmail.com> wrote:

> hi Gregg,
>
> As for the "acks": the game on the mobile device waits (with a
> timeout) for acks. So yes, we do acks.
>
> I was also thinking about
> >>  timestamping the messages and giving them a TTL
>
> and considered it unreliable in my case. The problem is that we don't
> control where our software is deployed. We can't verify that the time
> settings are the same on all nodes in a cluster, and we can't ask our
> customers: "you have to ensure that the time settings are the same on
> all nodes in your datacenter." I'm pretty sure that wouldn't work (at
> least, in my company).
>
> As for
> >> This sounds like an application problem, not a 0MQ problem
>
> I wouldn't put it like that. It's not a problem so much as a missing
> feature in 0mq. I think behaviour like "_unconditionally_ deliver
> messages on a reconnected socket" is somewhat too strict. It's designed
> to support a historical data flow, where you don't want to lose even
> one message. What might that be? E.g. weather data from sensors, or
> quotes from a stock exchange. But it is not well suited when you deal
> with something like "place a bet", "create a purchase order", or "book
> a hotel room". Agree?
>
>
>
>
> 2013/12/20 Gregg Irwin <gregg at pointillistic.com>
>
>> Hi Artem,
>>
>> az> A real example from gambling:
>>
>> az> We have thousands of users betting from their phones. For the end
>> az> user a bet is just a click in the UI, but for the backend it's a
>> az> bunch of remote calls to services. If a service is not available,
>> az> the bet message will get stuck in the 0mq in-mem message queue (up
>> az> to the HWM). The game UI can wait up to a certain timeout and then
>> az> render something akin to "We have a communication problem with our
>> az> backend. Try again later." So at this point the user believes that
>> az> the bet didn't succeed (.. this is important). What happens then --
>> az> IT Ops get paged, and for the next hour they do their best to
>> az> restart the failed service. Ok?
>>
>> az> After 1hr or so the service restarts, and now what? Now the queued
>> az> bet will be delivered to the restarted service. And this is not
>> az> good, because 1hr earlier we assured the user that "we had a
>> az> backend issue" and his bet didn't succeed.
>>
>> az> So the question arose -- how do we avoid redelivering messages
>> az> upon reconnect?
>>
>> This sounds like an application problem, not a 0MQ problem. A request
>> to place the bet can be received, which doesn't guarantee that the bet
>> has been placed (if other work needs to be done). To know that the bet
>> was placed, you need an ack. You can also ack that the *request* was
>> received. In your scenario above, timestamping the messages and giving
>> them a TTL lets you handle cases where requests could not be processed
>> in a timely manner, and possibly ask the user what they want to do.
>>
>> -- Gregg