[zeromq-dev] Can we customize a behaviour for in-mem queued messages after reconnect?

artemv zmq artemv.zmq at gmail.com
Mon Dec 23 11:30:20 CET 2013


hello there,

Gregg Irwin said:
>> In your case, when you hit your timeout, why not close the socket
>> (setting LINGER to 0) and connect again?
That would work if we were talking about a single connection address. I
will hit the timeout only when _all_ message queues are filled up to the
HWM. I don't want to limit the solution by saying: "look, we have 0MQ,
which allows connecting to several endpoints, _but_ we should connect to
only one address."
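
For clarity, here's roughly what I understand Gregg's suggestion to look
like. A minimal sketch in plain libzmq C -- our L1 is actually Java behind
Tomcat, and the socket type (DEALER), variable names and endpoint are just
placeholders:

    /* sock is the L1-side socket that talks to bet_service */
    int linger = 0;
    /* drop whatever is still queued for this peer, then close */
    zmq_setsockopt (sock, ZMQ_LINGER, &linger, sizeof linger);
    zmq_close (sock);

    /* recreate the socket and reconnect to the single known address */
    sock = zmq_socket (ctx, ZMQ_DEALER);
    zmq_connect (sock, "tcp://bet-service-1:5555");

That is fine with one endpoint, but with a socket connected to several
endpoints it throws away the queues for all of them at once, which is
exactly the limitation I'd like to avoid.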

I'm glad that my question raised such a broad discussion about TCP and
reliability, but can we move back a little to the original question?
Here's the application architecture (to keep the discussion focused):

iOS/Android (game UI)  <----ssl----> Tomcat <-------->  bet_service

Basically there are three layers: game UI (0), Java webserver (1), and a
concrete service layer (2).
Layer 0:
  - doesn't host the 0MQ library.
  - talks to L1 via SSL.
  - blocks (with a timeout) on every call to L1, awaiting the response from
L2.
Layer 1:
  - does host the 0MQ library.
  - it's the gateway for the game UI: an async layer between the UI and the
world of services.
  - it's _asynchronous_: a sort of "delegator/router/etc." for calls coming
from L0. Basically, the game UI may call any service asynchronously.
  - this layer doesn't wait for a response from L2 (see the sketch after
this list for roughly how its sending side looks).
Layer 2:
  - does host the 0MQ library.
  - a concrete business service layer.
  - L0 and L1 don't care whether this layer produces a response or not. If
it doesn't, L0 hits its call timeout and L1 simply doesn't care.
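
To make that concrete, here is roughly what L1's sending side does. This
is a simplified sketch in plain libzmq C -- the real code is Java behind
Tomcat, and the endpoints and HWM value here are made up:

    #include <zmq.h>
    #include <errno.h>
    #include <string.h>
    #include <stdint.h>

    int main (void)
    {
        void *ctx  = zmq_ctx_new ();
        void *push = zmq_socket (ctx, ZMQ_DEALER);  /* async, no reply awaited */

        int hwm = 1000;                             /* per-peer send queue limit */
        zmq_setsockopt (push, ZMQ_SNDHWM, &hwm, sizeof hwm);

        /* L1 may fan out to several bet_service instances */
        zmq_connect (push, "tcp://bet-service-1:5555");
        zmq_connect (push, "tcp://bet-service-2:5555");

        /* fire-and-forget: once every peer's queue has reached the HWM, a
           non-blocking send fails with EAGAIN, L1 gives up on the message,
           and L0 eventually hits its own call timeout */
        const char *req = "place_bet:...";
        if (zmq_send (push, req, strlen (req), ZMQ_DONTWAIT) == -1
            && errno == EAGAIN) {
            /* all queues full; don't block, don't retry here */
        }
        return 0;
    }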

So, the original question can be narrowed down to this:
When L2 goes down (for whatever reason), L1 will queue messages and, in
turn, L0 will hit its call timeout. After a certain amount of time (usually
up to 1 hour) L2 will be restarted.
The question is: once L1 recognizes that a "restart occurred" (I suppose
0MQ can detect it), how do I make L1 _not_ deliver the queued messages to
the newly started L2? How can that be done?
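
The only approach I've come up with so far -- no idea whether it's the
idiomatic one -- is to put a monitor on L1's socket and, when it reports a
disconnect, close that socket with LINGER=0 (so everything queued is
dropped) and open a fresh one before L2 comes back. Continuing the sketch
above; the inproc name is arbitrary and the two-frame event layout is only
what I understand of the libzmq 4.0 monitor protocol:

    /* ask libzmq to publish disconnect events for the push socket */
    zmq_socket_monitor (push, "inproc://l2-monitor", ZMQ_EVENT_DISCONNECTED);

    void *mon = zmq_socket (ctx, ZMQ_PAIR);
    zmq_connect (mon, "inproc://l2-monitor");

    zmq_msg_t frame;
    zmq_msg_init (&frame);
    zmq_msg_recv (&frame, mon, 0);               /* frame 1: event id + value */
    uint16_t event = *(uint16_t *) zmq_msg_data (&frame);
    zmq_msg_close (&frame);
    zmq_msg_init (&frame);
    zmq_msg_recv (&frame, mon, 0);               /* frame 2: peer address */
    zmq_msg_close (&frame);

    if (event == ZMQ_EVENT_DISCONNECTED) {
        int linger = 0;
        zmq_setsockopt (push, ZMQ_LINGER, &linger, sizeof linger);
        zmq_close (push);                        /* queued messages are discarded */
        push = zmq_socket (ctx, ZMQ_DEALER);     /* fresh socket, empty queues */
        zmq_connect (push, "tcp://bet-service-1:5555");
    }

The obvious catch -- and the reason I'm asking -- is that closing the whole
socket also drops the queues for endpoints that are still alive, and a
disconnect event doesn't by itself tell me the peer was restarted rather
than briefly unreachable. Is there a cleaner way?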




2013/12/22 Andrew Hume <andrew at research.att.com>

> two things come to mind:
>
> 1) you speak of efficiently recovering from unreliable tcp transmission:
> why? it can’t possibly be that commonplace. if it is, you have more
> pressing problems.
>
> 2) i spent 7+ years working on reliable cluster computing and i would
> assess
> the relative contributors to error being
> a) app (85%)
> b) OS — mainly linux (10%)
> c) networking (4%)
> d) hardware (1%)
>
> this is why our whole scheme was based on verified end-to-end and,
> like promise theory, coping with the fact that entities promising to do
> stuff may end up lying but making progress anyway.
>
> trying to optimize how to handle intermediate errors seems pointless;
> generally, they happen rarely and i liked the fact that
> 1) they were handled by the system automatically
> 2) they raised a fuss so i noticed.
>
> for example, this is how i discovered that in our environment, a static
> file
> became corrupted every 10 TB-years or so.
>
> but to return to the original subject of zeromq and reliability and tcp,
> i have found the TCP buffering to be just a nuisance and its effects on
> dealing with errors to be just about nonexistent. i would worry more
> about awful things like network splits.
>
> On Dec 22, 2013, at 2:45 AM, Pieter Hintjens <ph at imatix.com> wrote:
>
> On Fri, Dec 20, 2013 at 10:18 PM, Lindley French <lindleyf at gmail.com>
> wrote:
>
> I'm starting to think a *lot* of reliability protocols built on top of TCP
> could be done more efficiently if TCP could expose some read-only
> information about its internal ACKing....
>
>
> You are making such assumptions about "reliability". The network is
> unreliable. The receiving application is unreliable. The box it runs
> on is unreliable. The database it writes to is unreliable.
>
> Before addressing "unreliability" you must, please, itemize and
> quantify. What is the actual chance of network failure in your use
> case? How does this compare to application failure?
>
> In most use cases, the #1 cause of failure is poor application code.
> Why are we even talking about TCP then?
>
> The only use case I know of where network failures are more common is
> mobile/WiFi networks, and even then, the case of TCP accepting a
> message but not successfully delivering it, without reporting an error,
> is extremely rare, in my experience.
>
> Thus you must in any case satisfy the end-to-end reliability problem,
> i.e. sender app to receiver app, so e.g. a protocol like FILEMQ would
> acknowledge received file data only after writing to disk (perhaps
> even with an fsync).
>
> There's nothing TCP, nor ZeroMQ can do to solve unreliability in
> calling applications.
>
> -Pieter
>
>
> -----------------------
> Andrew Hume
> 949-707-1964 (VO and best)
> 732-420-0907 (NJ)
> andrew at research.att.com
>