[zeromq-dev] How to tell from an exception thrown by recv_string() if this exception is for the listening socket or for the client socket?

Francesco francesco.montorsi at gmail.com
Sun Oct 9 17:02:51 CEST 2022


Hi Torsten, Yuri,
I'm not a core developer of ZMQ, but I have been a ZMQ user for many
years... here's my take on this:

On Sun, 9 Oct 2022 at 09:49, Torsten Wierschin <
torsten.wierschin at gmail.com> wrote:

> Yuri <yuri at rawbw.com> wrote on Sat, 8 Oct 2022, 20:46:
>
>> On 10/8/22 07:09, orzodk wrote:
>> > My understanding of ZMQ is that the implementation details of the "under
>> > the hood" socket are hidden from you intentionally. I'm not sure how one
>> > would catch that. Hopefully someone else can answer.
>>
>>
>> But the abstraction level is too deep and it prevents access to
>> important and relevant information.
>>
> I agree.
>
> The scenario is:
> - connection established and working
> - server now vanishes unintentionally
> - client side is not able to reestablish the connection when the server
> reappears
>
> The abstraction level at first seems unable to handle such a simple
> scenario, perhaps.

But if you read the ZMQ guide, one of the concepts it conveys is that you
need to build some protocol on top of the ZMQ transport that fulfills all
your application's needs.
In other words: if you need to ensure that, 100% of the time, there is a
working point-to-point (server-client) connection, usable to move
bytes/information between the two endpoints, then you should e.g. design
"keep alive" (or "ping/pong") frames into your protocol, so that both sides
have the ability to detect an unhealthy connection and react.
In your scenario above, if you have ping/pong frames and logic to check how
much time has elapsed since the last "ping", both application sides will be
able to detect that the TCP server has vanished.
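To make the "time elapsed since the last ping" logic concrete, here is a
minimal stdlib-only sketch of such a staleness check. The class name
HeartbeatMonitor and the injectable clock are my own illustrative choices,
not part of any ZMQ API:

```python
import time


class HeartbeatMonitor:
    """Tracks when the last heartbeat frame was seen from the peer and
    reports the connection as stale after `timeout` seconds.
    Illustrative sketch only; not part of any ZMQ API."""

    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock            # injectable so tests can fake time
        self.last_seen = clock()

    def on_pong(self) -> None:
        """Call whenever a ping/pong frame arrives from the peer."""
        self.last_seen = self.clock()

    def is_stale(self) -> bool:
        """True if no heartbeat arrived within the timeout window."""
        return self.clock() - self.last_seen > self.timeout


# Example with a fake clock so the behavior is deterministic:
now = [0.0]
mon = HeartbeatMonitor(timeout=3.0, clock=lambda: now[0])
now[0] = 2.0
print(mon.is_stale())   # False: last heartbeat was 2s ago, within timeout
now[0] = 4.0
print(mon.is_stale())   # True: 4s > 3s timeout, peer presumed gone
```

In a real application each side would send a "ping" periodically, call
on_pong() when the peer answers, and tear down / reconnect the socket once
is_stale() returns True.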

You might argue that just handling the "listening TCP socket error" would
be easier than building ping/pong frames, timeout logic, etc.
However, consider that handling such TCP-server-level errors would not be
enough to detect "stale connections" or dysfunctional networking. I'll give
a very practical example from my own experience: I've written applications
that are deployed inside a Kubernetes cluster using the Istio service mesh (
https://istio.io/latest/docs/ops/deployment/architecture/):
[image: Istio service mesh architecture diagram (image.png)]
In such a context, all TCP connections between servers and clients are
transparently redirected to an Envoy sidecar; the real data flow happens
only between the two Envoy sidecars.
Sometimes it happens (for a number of reasons) that the TCP connection
between the two Envoys breaks. My app would never notice that by just
handling the "listening TCP socket error": the listening TCP server on
"Service A" in that Istio architecture picture is running just fine; if the
problem happens on the green "Mesh traffic" line, the only way you can
detect it is to have ping/pong frames (or some other protocol-level
indication).
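Since the thread started from pyzmq's recv_string(), here is one way a
receive timeout can surface a missing "pong" at the protocol level. This is
a sketch, assuming pyzmq is installed; the PAIR sockets, the inproc
endpoint name, and the 200 ms timeout are arbitrary choices of mine:

```python
import zmq

ctx = zmq.Context.instance()
server = ctx.socket(zmq.PAIR)
server.bind("inproc://ping-demo")   # inproc keeps the sketch self-contained
client = ctx.socket(zmq.PAIR)
client.connect("inproc://ping-demo")
client.rcvtimeo = 200               # ms: how long recv_string() may block

# Healthy round trip: the peer answers the ping.
client.send_string("ping")
server.recv_string()
server.send_string("pong")
print(client.recv_string())         # "pong": the connection is usable

# This time the server never answers; the timeout raises zmq.Again,
# which the client can treat as "connection unhealthy, reconnect".
client.send_string("ping")
try:
    client.recv_string()
except zmq.Again:
    print("no pong within 200 ms -> treat the connection as unhealthy")

client.close(linger=0)
server.close(linger=0)
ctx.term()
```

Note that the exception here signals a property of the application-level
protocol (no answer in time), not of any particular underlying TCP socket,
which is exactly the kind of end-to-end check the Istio example calls for.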

This is just one example of a problem that does not directly impact the TCP
sockets of the two servers where your applications are running, but that
still results in their inability to communicate.

So in some sense I can say that the ZMQ abstraction forces you to write
reliable protocols/applications that take a "holistic" approach to
networking, instead of restricting your focus to just the most obvious
networking issues (like e.g. a server that fails to bind with errno
EADDRINUSE, "address already in use").

HTH,
Francesco