[zeromq-dev] SUB socket can fail to subscribe if PUB socket is reset

Martin Hurton hurtonm at gmail.com
Mon Aug 12 21:19:46 CEST 2013


Hi Shan, thanks for the report. I am going to look at it in 2-3 days
if nobody beats me to it.

- Martin

On Sun, Aug 11, 2013 at 11:01 PM, Shan Wang <Shan.Wang at ig.com> wrote:
> Hi All,
>
> I'm trying to use 0MQ in our project, but I recently found a very tricky problem with the pub/sub pattern.
>
> The setup is as follows:
>
> Version: 3.2.3, C++ API.
>
> Transport protocol: TCP
>
> One SUB socket, subscribed to everything, high-water mark = 1
> One PUB socket, high-water mark = 1
>
> There is an application-level heartbeat between PUB and SUB, so if the SUB side detects that heartbeats from PUB have stopped, it assumes PUB is gone and resets itself.
> This means that whenever PUB shuts down, SUB closes its socket, creates it again, reconnects, and resubscribes.
> (The heartbeat is not needed for a normal PUB shutdown, but it is essential in situations such as the PUB server losing power or a cut cable.)
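The heartbeat-and-reset scheme described above can be sketched as a small watchdog object. This is a minimal sketch under stated assumptions: HeartbeatWatchdog, on_heartbeat and check are hypothetical names (not part of the 0MQ C++ API), the clock is std::chrono::steady_clock, and the actual close/recreate/reconnect/resubscribe steps are left to a caller-supplied reset callback. Time is passed in explicitly so the logic can be driven from a poll loop or exercised on its own.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical application-level watchdog for the SUB side (not a 0MQ API).
// Time is passed in explicitly so the logic can be driven by a poll loop or a test.
class HeartbeatWatchdog {
public:
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

    HeartbeatWatchdog(Clock::duration timeout, Clock::time_point start,
                      std::function<void()> reset)
        : timeout_(timeout), last_seen_(start), reset_(std::move(reset)) {}

    // Record a heartbeat (or any other message) received from the PUB.
    void on_heartbeat(Clock::time_point now) { last_seen_ = now; }

    // Call periodically, e.g. whenever a poll on the SUB socket times out.
    // Returns true if the PUB was declared gone and the reset action ran.
    bool check(Clock::time_point now) {
        if (now - last_seen_ < timeout_) {
            return false;  // heartbeat is still fresh
        }
        reset_();          // assumed: close SUB, create a new one, reconnect, resubscribe
        last_seen_ = now;  // give the fresh socket a full timeout window
        return true;
    }

private:
    Clock::duration timeout_;
    Clock::time_point last_seen_;
    std::function<void()> reset_;
};
```

In a receive loop, on_heartbeat(now) would be called for each message that arrives and check(now) whenever a poll times out without data.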
>
> What I found is that if I keep restarting the PUB server, say once every 5 seconds, the SUB socket can fail to subscribe and then never receives anything from PUB.
> This is very hard to reproduce; it happens only about once every few hundred times.
>
> Unfortunately we are not allowed to use tcpdump here, so all I can do is use strace on the I/O thread to monitor socket calls.
> When it's broken, the messages SUB exchanges with PUB after reconnecting look like this:
>
> recvfrom(53, "\377\0\0\0\0\0\0\0\1\177", 12, 0, NULL, NULL) = 10
> | 00000  ff 00 00 00 00 00 00 00  01 7f
> recvfrom(53, 0x7f881000b9db, 2, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> sendto(53, "\377\0\0\0\0\0\0\0\1\177\1\2", 12, 0, NULL, 0) = 12
> | 00000  ff 00 00 00 00 00 00 00  01 7f 01 02
> recvfrom(53, "\1\1", 2, 0, NULL, NULL)  = 2
> | 00000  01 01
> recvfrom(53, "\0\0", 8192, 0, NULL, NULL) = 2
> | 00000  00 00
> sendto(53, "\0\0", 2, 0, NULL, 0)       = 2
>
> The last line is where it breaks; in the normal case, the last line looks like this:
> sendto(54, "\0\0\0\1\1", 5, 0, NULL, 0) = 5
>
> After reading this ZMTP document:
> http://rfc.zeromq.org/spec:23
> I still can't work out the exact meaning of these messages, but here is my attempt to interpret them:
> After connecting, SUB tries to read 12 bytes from PUB; it gets 10 initially and receives the last two later as 01 01. (I guess the 12 bytes are some kind of signature.)
> Then SUB sends 12 bytes to PUB, with the same first 10 bytes but 01 02 as the last two.
> Then SUB tries to read up to 8192 bytes but gets only two: 00 00.
> Then SUB sends 00 00 back to PUB, and that's it: nothing more is ever sent from this SUB socket.
> The TCP connection is still up, so something must be wrong with the subscription.
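The bytes in the trace appear to follow ZMTP/2.0, the wire protocol of the libzmq 3.2 series, rather than the ZMTP/3.0 document at http://rfc.zeromq.org/spec:23. Read that way, the 12 bytes are a greeting rather than a pure signature: 0xFF, eight padding bytes (here seven 00 bytes followed by 01), 0x7F, then a revision byte and a socket-type byte. PUB's trailing 01 01 would then mean revision 1, socket type PUB, and SUB's 01 02 revision 1, socket type SUB. The sketch below decodes such a greeting; Greeting, parse_greeting and socket_type_name are hypothetical names, and only the PUB and SUB codes are mapped.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <string>

// Fields carried after the signature in a ZMTP/2.0 greeting.
struct Greeting {
    std::uint8_t revision;     // 0x01 for ZMTP/2.0
    std::uint8_t socket_type;  // the sender's socket type
};

// Decode a 12-byte greeting: 0xFF, 8 padding bytes, 0x7F, revision, socket type.
// The padding bytes are not checked in this sketch.
std::optional<Greeting> parse_greeting(const std::array<std::uint8_t, 12>& g) {
    if (g[0] != 0xFF || g[9] != 0x7F) {
        return std::nullopt;  // does not start with the expected signature
    }
    return Greeting{g[10], g[11]};
}

// Only the two socket types that appear in this trace are mapped.
std::string socket_type_name(std::uint8_t type) {
    switch (type) {
        case 0x01: return "PUB";
        case 0x02: return "SUB";
        default:   return "other";
    }
}
```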
>
> And in the normal case, SUB sends 00 00 00 01 01 instead of 00 00. I guess this must be the subscription message, but I don't understand why it's 5 bytes.
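Under the same ZMTP/2.0 reading, the five bytes look like two frames back to back. Each frame is a flags byte (bit 0 = MORE, bit 1 = LONG), a size, then the body: 00 00 is an empty identity frame, and 00 01 01 is the subscription message, whose body is a 01 byte (subscribe) followed by an empty topic prefix, i.e. everything. The broken trace would then show SUB sending its identity frame but never the subscription. This is a minimal sketch assuming only short frames (1-byte size); Frame, decode_short_frames and encode_subscribe are hypothetical helpers for reading the trace, not libzmq functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// One ZMTP/2.0 frame: flags byte (bit 0 = MORE, bit 1 = LONG), size, body.
struct Frame {
    std::uint8_t flags;
    std::vector<std::uint8_t> body;
};

// Split a byte stream into frames. Only short frames (LONG bit clear,
// 1-byte size) are handled, which covers the bytes in the traces above.
std::vector<Frame> decode_short_frames(const std::vector<std::uint8_t>& bytes) {
    std::vector<Frame> frames;
    std::size_t i = 0;
    while (i < bytes.size()) {
        if (i + 2 > bytes.size()) throw std::runtime_error("truncated frame header");
        const std::uint8_t flags = bytes[i];
        if (flags & 0x02) throw std::runtime_error("long frames not handled in this sketch");
        const std::size_t size = bytes[i + 1];
        if (i + 2 + size > bytes.size()) throw std::runtime_error("truncated frame body");
        const auto first = bytes.begin() + static_cast<std::ptrdiff_t>(i + 2);
        frames.push_back({flags, std::vector<std::uint8_t>(first, first + static_cast<std::ptrdiff_t>(size))});
        i += 2 + size;
    }
    return frames;
}

// Build the frame a SUB sends to subscribe: body = 0x01 followed by the topic
// prefix. An empty topic subscribes to everything.
std::vector<std::uint8_t> encode_subscribe(const std::string& topic) {
    if (topic.size() > 254) throw std::length_error("topic too long for a short frame");
    std::vector<std::uint8_t> out{0x00, static_cast<std::uint8_t>(topic.size() + 1), 0x01};
    out.insert(out.end(), topic.begin(), topic.end());
    return out;
}
```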
>
> So can someone please explain the PUB/SUB exchange sequence, and maybe shed some light on what is broken here?
>
> Thanks very much
> Shan
>
