[zeromq-dev] Can we customize a behaviour for in-mem queued messages after reconnect?
artemv zmq
artemv.zmq at gmail.com
Mon Dec 23 11:30:20 CET 2013
hello there,
Gregg Irwin said:
>> In your case, when you hit your timeout, why not close the socket
>> (setting LINGER to 0) and connect again?
That would work if we were talking about a single connection address. I
will only hit the timeout when _all_ message queues have filled up to
the HWM. And I don't want to limit the solution space by saying: "look,
we have 0MQ, which allows connecting to several endpoints, _but_ we
should connect to only one address."
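For reference, here's roughly what that trick looks like per socket. A
minimal sketch in C; the DEALER type and the single illustrative
endpoint are my assumptions, and error handling is omitted:

    #include <zmq.h>

    /* On send timeout: discard everything still queued by closing
     * with LINGER=0, then reconnect on a fresh socket. */
    static void *reset_socket (void *ctx, void *sock, const char *endpoint)
    {
        int linger = 0;
        zmq_setsockopt (sock, ZMQ_LINGER, &linger, sizeof linger);
        zmq_close (sock);        /* queued messages are dropped here */

        void *fresh = zmq_socket (ctx, ZMQ_DEALER);
        zmq_connect (fresh, endpoint);
        return fresh;
    }

Note that with several endpoints connected to one socket, closing drops
the queues for _all_ of them, which is part of my objection above.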
I'm glad that my question sparked such a broad discussion about TCP and
reliability, but can we step back a bit to the original question?
Here's the application architecture (to keep the discussion focused):
iOS/Android (game UI) <----ssl----> Tomcat <--------> bet_service
Basically there are three layers: the game UI (L0), a Java webserver
(L1), and a concrete service layer (L2).
Layer0:
- doesn't host the 0MQ library.
- talks to L1 via SSL.
- blocks (with a timeout) on every call to L1, awaiting the response
from L2.
Layer1:
- does host the 0MQ library.
- it's a gateway for the game UI: an async layer between the UI and the
world of services.
- it's _asynchronous_: a sort of "delegator/router/etc." for calls
coming from L0. Basically, the game UI may call any service
asynchronously. (A sketch of such a gateway socket follows this list.)
- this layer doesn't wait for a response from L2.
Layer2:
- does host the 0MQ library.
- a concrete business service layer.
- L0 and L1 don't care whether this layer produces a response or not.
If it doesn't, L0 hits its call timeout and L1 simply doesn't care.
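To make the L1 side concrete, here's a minimal sketch of such a gateway
socket in C. The DEALER type, the HWM/timeout values and the endpoint
names are my assumptions, not a prescription:

    #include <zmq.h>

    /* One DEALER connected to several L2 endpoints, with a bounded
     * send queue and a send timeout so the gateway thread never
     * blocks forever once all queues are full. */
    void *make_gateway_socket (void *ctx)
    {
        void *sock = zmq_socket (ctx, ZMQ_DEALER);

        int hwm = 1000;            /* per-connection queue limit */
        zmq_setsockopt (sock, ZMQ_SNDHWM, &hwm, sizeof hwm);

        int timeout_ms = 2000;     /* fail the send instead of blocking */
        zmq_setsockopt (sock, ZMQ_SNDTIMEO, &timeout_ms, sizeof timeout_ms);

        /* 0MQ load-balances outgoing messages across all endpoints. */
        zmq_connect (sock, "tcp://bet-service-1:5555");
        zmq_connect (sock, "tcp://bet-service-2:5555");
        return sock;
    }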
So, the original question can be narrowed down to this:
When L2 goes down (for whatever reason), L1 will queue messages and, in
turn, L0 will hit its call timeout. After a certain amount of time
(usually up to 1 hour) L2 will be restarted.
The question is: when L1 recognizes that a restart occurred (I suppose
0MQ can detect it), how do I make L1 _not_ deliver the queued messages
to the newly started L2? How can that be done?
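One direction I imagine could work (a sketch in C, assuming libzmq >=
4.0 and its two-frame monitor event format; the endpoint names are made
up): watch the socket with zmq_socket_monitor(), and on a DISCONNECTED
event recreate the socket with LINGER=0, as in the reset_socket()
sketch above, so the stale queue never reaches the restarted L2. Is
this the right direction?

    #include <zmq.h>
    #include <stdint.h>
    #include <string.h>

    /* Read one monitor event; returns the event id (0 on error). */
    static uint16_t wait_for_event (void *mon)
    {
        zmq_msg_t msg;

        /* Frame 1: 16-bit event id followed by a 32-bit value. */
        zmq_msg_init (&msg);
        if (zmq_msg_recv (&msg, mon, 0) == -1)
            return 0;
        uint16_t event;
        memcpy (&event, zmq_msg_data (&msg), sizeof event);
        zmq_msg_close (&msg);

        /* Frame 2: the affected endpoint address; not needed here. */
        zmq_msg_init (&msg);
        zmq_msg_recv (&msg, mon, 0);
        zmq_msg_close (&msg);
        return event;
    }

    void watch_l2 (void *ctx, void *dealer)
    {
        /* Publish this socket's connection events on inproc. */
        zmq_socket_monitor (dealer, "inproc://l2-monitor",
                            ZMQ_EVENT_DISCONNECTED);

        void *mon = zmq_socket (ctx, ZMQ_PAIR);
        zmq_connect (mon, "inproc://l2-monitor");

        if (wait_for_event (mon) == ZMQ_EVENT_DISCONNECTED) {
            /* L2 is gone: recreate the dealer with LINGER=0 here
             * (reset_socket() above) to drop the queued messages. */
        }
        zmq_close (mon);
    }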
2013/12/22 Andrew Hume <andrew at research.att.com>
> two things come to mind:
>
> 1) you speak of efficiently recovering from unreliable tcp transmission:
> why? it can’t possibly be that commonplace. if it is, you have more
> pressing problems.
>
> 2) i spent 7+ years working on reliable cluster computing and i would
> assess
> the relative contributors to error being
> a) app (85%)
> b) OS — mainly linux (10%)
> c) networking (4%)
> d) hardware (1%)
>
> this is why our whole scheme was based on verified end-to-end and,
> like promise theory, coping with the fact that entities promising to do
> stuff may end up lying but making progress anyway.
>
> trying to optimize how to handle intermediate errors seems pointless;
> generally, they happen rarely and i liked the fact that
> 1) they were handled by the system automatically
> 2) they raised a fuss so i noticed.
>
> for example, this is how i discovered that in our environment, a static
> file
> became corrupted every 10 TB-years or so.
>
> but to return to the original subject of zeromq and reliability and tcp,
> i have found the TCP buffering to be just a nuisance and its effects on
> dealing with errors to be just about nonexistent. i would worry more
> about awful things like network splits.
>
> On Dec 22, 2013, at 2:45 AM, Pieter Hintjens <ph at imatix.com> wrote:
>
> On Fri, Dec 20, 2013 at 10:18 PM, Lindley French <lindleyf at gmail.com>
> wrote:
>
> I'm starting to think a *lot* of reliability protocols built on top of TCP
> could be done more efficiently if TCP could expose some read-only
> information about its internal ACKing....
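(Side note: Linux already exposes a read-only snapshot of part of this
via TCP_INFO. A rough, Linux-only sketch, assuming fd is a connected
TCP socket:

    #include <netinet/tcp.h>   /* struct tcp_info, TCP_INFO (Linux) */
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <stdio.h>

    /* Print sent-but-unacknowledged segment count and current RTT. */
    void print_unacked (int fd)
    {
        struct tcp_info info;
        socklen_t len = sizeof info;
        if (getsockopt (fd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0)
            printf ("unacked segments: %u, rtt: %u us\n",
                    info.tcpi_unacked, info.tcpi_rtt);
    }

Though 0MQ doesn't hand you the underlying TCP fd, so this is hard to
apply from 0MQ itself.)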
>
>
> You are making such assumptions about "reliability". The network is
> unreliable. The receiving application is unreliable. The box it runs
> on is unreliable. The database it writes to is unreliable.
>
> Before addressing "unreliability" you must, please, itemize and
> quantify. What is the actual chance of network failure in your use
> case? How does this compare to application failure?
>
> In most use cases, the #1 cause of failure is poor application code.
> Why are we even talking about TCP then?
>
> The only use case I know of where network failures are more common is
> mobile/WiFi networks, and even then, the case of TCP accepting a
> message but not successfully delivering it, without reporting an
> error, is extremely rare, in my experience.
>
> Thus you must in any case solve the end-to-end reliability problem,
> i.e. sender app to receiver app, so e.g. a protocol like FILEMQ would
> acknowledge received file data only after writing to disk (perhaps
> even with an fsync).
>
> There's nothing TCP nor ZeroMQ can do to solve unreliability in
> calling applications.
>
> -Pieter
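A minimal sketch of that end-to-end rule as I understand it (my own
illustration in C, not FILEMQ's actual protocol; the endpoint, file
name and "ACK" reply are made up): the receiver acknowledges only after
the payload is fsynced to disk.

    #include <zmq.h>
    #include <stdio.h>
    #include <unistd.h>

    int main (void)
    {
        void *ctx = zmq_ctx_new ();
        void *rep = zmq_socket (ctx, ZMQ_REP);
        zmq_bind (rep, "tcp://*:6000");

        char buf [65536];
        for (;;) {
            int n = zmq_recv (rep, buf, sizeof buf, 0);
            if (n < 0)
                break;
            if (n > (int) sizeof buf)
                n = sizeof buf;          /* oversized message truncated */

            FILE *f = fopen ("payload.dat", "ab");
            fwrite (buf, 1, n, f);
            fflush (f);
            fsync (fileno (f));          /* force the data to disk first */
            fclose (f);

            /* Only now does the sender learn the message is durable. */
            zmq_send (rep, "ACK", 3, 0);
        }
        zmq_close (rep);
        zmq_ctx_destroy (ctx);
        return 0;
    }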
>
>
> -----------------------
> Andrew Hume
> 949-707-1964 (VO and best)
> 732-420-0907 (NJ)
> andrew at research.att.com
>