[zeromq-dev] SUB socket can fail to subscribe if PUB socket is reset

KIU Shueng Chuan nixchuan at gmail.com
Tue Aug 13 13:12:05 CEST 2013


The 0x00 0x00 sent by both PUB and SUB is the zero-length identity frame,
i.e. no identity was explicitly set.

I made a mistake in my previous mail: "final-short" is 2 bytes long (a flags
byte followed by a length byte), so the length is part of it.
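
For illustration, here is a minimal Python sketch of that framing applied to
the two sendto() payloads from the strace output. It assumes only final-short
frames (flags byte 0x00, one length byte, then the body), which is all that
appears in the trace. parse_short_frames is a made-up helper, not part of
libzmq, and it does not handle the greeting, long frames or multipart
messages.

```python
# Sketch only: split a byte string into ZMTP 2.0 final-short frames.
# Assumption: every frame is "flags=0x00, 1-byte length, body".
# parse_short_frames is a hypothetical helper, not a libzmq API.

def parse_short_frames(data: bytes):
    """Return a list of (flags, body) tuples for final-short frames."""
    frames = []
    pos = 0
    while pos < len(data):
        if len(data) - pos < 2:
            raise ValueError("truncated frame header")
        flags = data[pos]
        if flags != 0x00:
            raise ValueError("only final-short frames are handled here")
        length = data[pos + 1]
        body = data[pos + 2 : pos + 2 + length]
        if len(body) != length:
            raise ValueError("truncated frame body")
        frames.append((flags, body))
        pos += 2 + length
    return frames


# Broken case: SUB sends only the empty identity frame.
print(parse_short_frames(b"\x00\x00"))
# [(0, b'')]

# Normal case: empty identity frame, then a 1-byte subscription frame
# whose body is 0x01 ("subscribe") with no topic prefix after it.
print(parse_short_frames(b"\x00\x00\x00\x01\x01"))
# [(0, b''), (0, b'\x01')]
```

Read this way, the broken connection is one where the identity frame went out
but the 3-byte subscription frame never followed it.
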
On Aug 13, 2013 6:00 PM, "Shan Wang" <Shan.Wang at ig.com> wrote:

> Thanks,
>
> That document makes sense. But do you know in what situation the SUB
> socket can send 0x00 0x00 to PUB? Does that mean a final-short with length
> 0?
>
> Also, why does PUB send 0x00 0x00 to SUB after the signature?
>
> One thing I forgot to mention: once the SUB socket is broken, it cannot
> recover. If I keep restarting PUB sockets, the SUB keeps receiving the
> wrong sequence of messages and no other data.
>
>
> *From:* zeromq-dev-bounces at lists.zeromq.org [mailto:
> zeromq-dev-bounces at lists.zeromq.org] *On Behalf Of *KIU Shueng Chuan
> *Sent:* 12 August 2013 22:56
> *To:* ZeroMQ development list
> *Subject:* Re: [zeromq-dev] SUB socket can fail to subscribe if PUB
> socket is reset
>
> You should read http://rfc.zeromq.org/spec:15 instead.
>
> The five bytes are:
> 2 bytes identity frame.
> 3 bytes subscription frame
>   0: final-short
>   1: payload length
>   1: code for "subscribe"
>
> On Aug 12, 2013 7:05 AM, "Shan Wang" <Shan.Wang at ig.com> wrote:
>
> Hi All,
>
> I'm trying to use 0MQ in our project, but recently I found a very tricky
> problem in the pub/sub pattern.
>
> the setup is like this:
>
> version: 3.2.3, C++ API.
>
> transport protocol: tcp
>
> one SUB socket, subscribe to everything, high water mark = 1
> one PUB socket, high water mark = 1
>
> There is an application-level heartbeat between pub and sub, so if the SUB
> socket discovers there is no heartbeat from PUB, it assumes PUB is gone and
> resets itself.
> This means whenever PUB is shut down, SUB will close the socket, create it
> again, reconnect, and resubscribe.
> (The heartbeat is not necessary on a normal PUB shutdown, but is essential
> in situations like a PUB server power-down or a cable cut.)
>
> What I found is that if I keep restarting the PUB server, i.e. once every 5
> seconds, the SUB socket can fail to subscribe and never receive anything
> from PUB.
> This is very hard to reproduce and happens once in a few hundred times.
>
> Unfortunately we are not allowed to use tcpdump here, so all I can do is
> use strace on the io thread to monitor socket functions.
> When it's broken, the messages SUB exchanges with PUB after reconnecting
> look like this:
>
> recvfrom(53, "\377\0\0\0\0\0\0\0\1\177", 12, 0, NULL, NULL) = 10
> | 00000  ff 00 00 00 00 00 00 00  01 7f
> recvfrom(53, 0x7f881000b9db, 2, 0, 0, 0) = -1 EAGAIN (Resource temporarily
> unavailable)
> sendto(53, "\377\0\0\0\0\0\0\0\1\177\1\2", 12, 0, NULL, 0) = 12
> | 00000  ff 00 00 00 00 00 00 00  01 7f 01 02
> recvfrom(53, "\1\1", 2, 0, NULL, NULL)  = 2
> | 00000  01 01
> recvfrom(53, "\0\0", 8192, 0, NULL, NULL) = 2
> | 00000  00 00
> sendto(53, "\0\0", 2, 0, NULL, 0)       = 2
>
> The last line is what broke it. In the normal case, the last line looks
> like this:
> sendto(54, "\0\0\0\1\1", 5, 0, NULL, 0) = 5
>
> After reading this ZMTP document:
> http://rfc.zeromq.org/spec:23
> I still can't work out the exact meaning of these messages, but here is my
> attempt at explaining them:
> After connecting, SUB tries to read 12 bytes from PUB. It gets 10 initially
> and receives the last two later as 01 01. (I guess the 12 bytes are
> some kind of signature.)
> Then SUB sends 12 bytes to PUB, with the first 10 bytes the same but the
> last two bytes as 01 02.
> Then SUB tries to read 8192 bytes but only gets two, which are 00 00.
> Then SUB decides to send 00 00 back to PUB, and that's it: nothing more
> comes from this SUB socket.
> The TCP connection is still there, so there must be something wrong with
> the subscription.
>
> And in the normal case, SUB sends 00 00 00 01 01 instead of 00 00. I
> guess this must be the subscription message, but I don't understand why
> it's 5 bytes.
>
> So can someone please explain the exchange sequence on PUB/SUB, and maybe
> shed some light on what is broken here?
>
> Thanks very much
> Shan
>
> _______________________________________________
> zeromq-dev mailing list
> zeromq-dev at lists.zeromq.org
> http://lists.zeromq.org/mailman/listinfo/zeromq-dev