[zeromq-dev] Malamute (reconnection and some more questions)

Kevin Sapper kevinsapper88 at gmail.com
Mon Apr 18 13:13:06 CEST 2016


Hi Alena,

In mlm_client.xml there is a state named "defaults", which many other
states inherit, including "disconnecting". When the client is in the
"disconnecting" state and the server reconnects, it will send a heartbeat,
which the client answers with a connection ping. When the connection pong
from the server arrives, the client moves from the "disconnecting" state
to the "connected" state.
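
To illustrate the inheritance described above, here is a minimal sketch of
how an event lookup can fall back from a state to the state it inherits
from. It is a toy model, not the engine generated from mlm_client.xml: the
types, the function find_transition and the event names "heartbeat" and
"CONNECTION PONG" are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of state inheritance.
   This is NOT the code generated from mlm_client.xml. */
typedef struct {
    const char *event;   /* event name (assumed spelling) */
    const char *next;    /* next state, or NULL to stay in the current one */
} transition_t;

typedef struct state_t {
    const char *name;
    const struct state_t *inherit;     /* parent state, or NULL */
    const transition_t *transitions;   /* transitions defined in this state */
    size_t count;
} state_t;

/* Transitions assumed to live in "defaults" */
const transition_t defaults_transitions [] = {
    { "heartbeat", NULL },               /* answered with a connection ping */
    { "CONNECTION PONG", "connected" }   /* the recovery path */
};
const state_t defaults_state =
    { "defaults", NULL, defaults_transitions, 2 };

/* "disconnecting" defines nothing itself here and inherits from "defaults" */
const state_t disconnecting_state =
    { "disconnecting", &defaults_state, NULL, 0 };

/* Look in the state first, then walk up the inheritance chain */
const transition_t *
find_transition (const state_t *state, const char *event)
{
    for (; state; state = state->inherit)
        for (size_t i = 0; i < state->count; i++)
            if (strcmp (state->transitions [i].event, event) == 0)
                return &state->transitions [i];
    return NULL;
}

/* Usage: the PONG transition is found through "defaults" */
void
example_lookup (void)
{
    const transition_t *t =
        find_transition (&disconnecting_state, "CONNECTION PONG");
    assert (t && strcmp (t->next, "connected") == 0);
}
```

In this model a state only needs to list the events it handles differently;
everything else is resolved through the inherited state.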

//Kevin

2016-04-18 8:47 GMT+02:00 Alena Chernikava <e.c.6078570 at gmail.com>:

> Hi,
>
> I would like to ask some questions and point out some problems in the
> Malamute broker.
>
> I am facing a problem with the client reconnect procedure in Malamute. A
> formal description usually helps me understand a problem better, so I
> started my investigation by drawing a visualization of the state machine
> of the Malamute client. It helped me a lot :) Right away I found some
> "strange behaviors". I would like to ask some questions to clarify them
> (maybe they are intentional) before I try to "experiment" with fixes.
>
> In the attachment you can find my hand-made visualization of the state
> machine (I made it for myself, so it has my thoughts written down).
> (GREEN - states, RED - events, BLUE - actions). It is not complete, but it
> has already helped me spot some potential and real problems. Here I
> describe the issues I found (the numbering is the same as in the picture).
>
> 1. Re-connection problem. It is actually the main problem I want to
> discuss.
>
> Situation:
> The client sends 3 PINGs and does not receive any PONGs back. After this
> the client ends up in the "disconnected" state. I would call it a black
> hole state, as the client cannot normally recover from it (to the
> "connected" state) or even move anywhere else.
>
> Analysis:
> * We can destroy the client. We move out of the "disconnected" state, but
> we have destroyed the client. :) End of work, nothing to do. Everything is
> fine.
> * We can move to the "connected" state if the client receives "PONG" from
> the server, or to the "HAVE ERROR" state if the client receives "ERROR"
> from the server. To receive a response from the server, we need to send
> something to it. And here is the problem: the client does not send
> anything to the server :( PINGs are disabled in "mlm_client.xml" from
> the very beginning.
>
> Questions:
> * Why were PINGs disabled in the "disconnected" state?
> * What was the basic idea behind the "reconnect" implementation?
>
> Proposal:
> Enable PINGs. When the server receives a PING from an "unknown client", it
> will send "ERROR" back, which will trigger the "reconnection" procedure. I
> am still not sure whether the client would reconnect correctly, but at
> least we would give it a chance to do so, because right now the client has
> no chance to reconnect (if the server is off for a longer period).
>
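
The proposal above can be sketched as a toy state machine. This is only an
illustration under stated assumptions, not the generated mlm_client engine:
all type, event and function names are hypothetical, and the reaction to
"ERROR" (reopening the connection) is a guess at what the "reconnection"
procedure would do. The flag ping_when_disconnected switches between the
behavior described in the analysis (0) and the proposed one (1).

```c
#include <assert.h>

/* Hypothetical toy model of the proposal, NOT the real mlm_client engine */
typedef enum { ST_CONNECTING, ST_CONNECTED, ST_DISCONNECTED } toy_state_t;
typedef enum { EV_HEARTBEAT, EV_EXPIRED, EV_PONG, EV_ERROR, EV_OK } toy_event_t;
typedef enum { SEND_NOTHING, SEND_PING, SEND_OPEN } toy_output_t;

typedef struct {
    toy_state_t state;
    int ping_when_disconnected;   /* 0 = behavior as described, 1 = proposal */
} toy_client_t;

/* Handle one event; return what the client would send to the server */
toy_output_t
toy_client_handle (toy_client_t *self, toy_event_t event)
{
    switch (self->state) {
    case ST_CONNECTED:
        if (event == EV_HEARTBEAT)
            return SEND_PING;
        if (event == EV_EXPIRED)          /* no PONG after several PINGs */
            self->state = ST_DISCONNECTED;
        break;
    case ST_DISCONNECTED:
        if (event == EV_HEARTBEAT && self->ping_when_disconnected)
            return SEND_PING;             /* proposal: keep probing */
        if (event == EV_PONG)
            self->state = ST_CONNECTED;
        if (event == EV_ERROR) {          /* server does not know this client */
            self->state = ST_CONNECTING;  /* assumed reconnection step */
            return SEND_OPEN;
        }
        break;
    case ST_CONNECTING:
        if (event == EV_OK)
            self->state = ST_CONNECTED;
        break;
    }
    return SEND_NOTHING;
}

/* Usage: with PINGs enabled the client can leave "disconnected" */
void
example_reconnect (void)
{
    toy_client_t client = { ST_DISCONNECTED, 1 };
    assert (toy_client_handle (&client, EV_HEARTBEAT) == SEND_PING);
    assert (toy_client_handle (&client, EV_ERROR) == SEND_OPEN);
    toy_client_handle (&client, EV_OK);
    assert (client.state == ST_CONNECTED);
}
```

With the flag set to 0 the model sends nothing while disconnected, so no
"PONG" or "ERROR" can ever arrive and the state is never left.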
> 2. Take a look at the right corner of the picture.
>
> in the mlm_client.xml:
>
>     <state name = "connecting" inherit = "defaults">
>         <event name = "OK" next = "connected">
>             <action name = "signal success" />
>             <action name = "client is connected" />
>         </event>
> As a result, the following code can pass (and I have actually seen this
> behavior a couple of times):
>       int rv = mlm_client_connect();
>       assert (rv == 0);
>       assert (mlm_client_connected () == false);
>
> Proposal: do "signal success" after "client is connected".
> Question: is there any reason to leave the order as it is?
>
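
The effect of the action order in item 2 can be shown with a small
single-threaded model. It is a sketch under assumptions, not Malamute code:
the struct and function names are hypothetical, and the field
seen_at_signal stands for the earliest moment at which a caller woken by
"signal success" could query the connected flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT the real mlm_client implementation */
typedef struct {
    bool connected;        /* what a connected() query would report */
    bool seen_at_signal;   /* value of `connected` when the caller is woken */
} toy_client_t;

typedef void (action_fn) (toy_client_t *self);

/* Models "signal success": the caller may look at the flag from now on */
static void
signal_success (toy_client_t *self)
{
    self->seen_at_signal = self->connected;
}

/* Models "client is connected": sets the flag */
static void
client_is_connected (toy_client_t *self)
{
    self->connected = true;
}

/* Run the two actions of the "OK" event in the chosen order and return
   what the caller could observe at the moment it was signalled */
bool
run_ok_event (bool signal_first)
{
    toy_client_t client = { false, false };
    action_fn *current_order [] = { signal_success, client_is_connected };
    action_fn *proposed_order [] = { client_is_connected, signal_success };
    action_fn **actions = signal_first ? current_order : proposed_order;
    for (int i = 0; i < 2; i++)
        actions [i] (&client);
    return client.seen_at_signal;
}

/* Usage: only the proposed order guarantees the flag is set in time */
void
example_order (void)
{
    assert (run_ok_event (true) == false);    /* order from the XML excerpt */
    assert (run_ok_event (false) == true);    /* proposed order */
}
```

In a real client the two sides run concurrently, so the first order would
fail only some of the time, which matches seeing it "a couple of times".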
> 3+4. One point is not clear to me from the code: when is the client
> supposed to start heartbeating?
> I thought it should happen after the client gets the "OK" response from
> the server, but the state machine shows that heartbeating starts in the
> "connecting" state (while waiting for the response from the server). Is
> this a bug, or was it done intentionally?
>
> 5. This is just a bug; I will fix it later. If mlm_client_connect does not
> work the first time, the client should remain in the "start" state.
>
> 6. This is a potential problem. If "PONG" arrives before the "OK" message
> from the server, mlm_client_set_producer/consumer/worker will not finish
> correctly and may never return. I propose returning to the "confirming"
> state and waiting for the "OK" response from the server. Do you think this
> would break anything?
>
>
>
>
>
> Thank you for reading this; I look forward to your reply.
> Alena Chernikava