[zeromq-dev] SUB socket can fail to subscribe if PUB socket is reset
Shan Wang
Shan.Wang at ig.com
Tue Aug 13 11:08:26 CEST 2013
Thanks,
That document makes sense. But do you know in what situation the SUB socket would send only 0x00 0x00 to PUB? Is that a final-short frame with length 0?
Also, why does PUB send 0x00 0x00 to SUB after the signature?
One thing I forgot to mention: once the SUB socket is broken, it cannot recover. If I keep restarting the PUB socket, the SUB keeps receiving the same wrong sequence of messages and no other data.
From: zeromq-dev-bounces at lists.zeromq.org [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of KIU Shueng Chuan
Sent: 12 August 2013 22:56
To: ZeroMQ development list
Subject: Re: [zeromq-dev] SUB socket can fail to subscribe if PUB socket is reset
You should read http://rfc.zeromq.org/spec:15 instead.
The five bytes are:
2 bytes: identity frame
3 bytes: subscription frame
    0: final-short
    1: payload length
    1: code for "subscribe"
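As a rough illustration of the breakdown above, the following Python sketch splits a captured byte string into frames. It assumes the ZMTP 2.0 framing described in spec:15: a flags octet where bit 0 means "more frames follow" and bit 1 means "8-byte length", then the length, then the body. The name parse_frames is made up for this example and is not part of libzmq.

```python
def parse_frames(data: bytes):
    """Split raw bytes into (more, body) tuples, assuming ZMTP 2.0 framing."""
    frames = []
    pos = 0
    while pos < len(data):
        flags = data[pos]
        more = bool(flags & 0x01)               # bit 0: more frames follow
        header_len = 9 if flags & 0x02 else 2   # bit 1: 8-byte length field
        if pos + header_len > len(data):
            raise ValueError("truncated frame header")
        # The length is big-endian: 1 octet (short) or 8 octets (long).
        length = int.from_bytes(data[pos + 1:pos + header_len], "big")
        pos += header_len
        if pos + length > len(data):
            raise ValueError("truncated frame body")
        frames.append((more, data[pos:pos + length]))
        pos += length
    return frames


# The five bytes seen in the normal case, and the two seen in the broken case.
normal = parse_frames(bytes.fromhex("0000000101"))
broken = parse_frames(bytes.fromhex("0000"))
```

Under these assumptions, the five bytes from the normal case split into an empty identity frame followed by a frame whose one-byte body is 0x01 (subscribe, with an empty prefix). The two bytes from the broken case contain only the empty identity frame, with no subscription frame after it.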
On Aug 12, 2013 7:05 AM, "Shan Wang" <Shan.Wang at ig.com> wrote:
Hi All,
I'm trying to use 0MQ in our project, but recently I found a very tricky problem in the pub/sub pattern.
The setup is as follows:
version: 3.2.3, c++ API.
transport protocol: tcp
one SUB socket, subscribe to everything, high water mark = 1
one PUB socket, high water mark = 1
There is an application-level heartbeat between PUB and SUB, so if the SUB side sees no heartbeat from PUB, it assumes PUB is gone and resets itself.
This means that whenever PUB is shut down, SUB closes its socket, creates it again, reconnects, and resubscribes.
(The heartbeat is not necessary for a normal PUB shutdown, but it is essential in situations such as the PUB server losing power or a cable being cut.)
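The heartbeat rule described above can be sketched in a few lines of Python. This is only an illustration of the timeout logic, not the poster's code: HeartbeatMonitor, timeout_s and the injectable clock are hypothetical names, and the actual close / recreate / reconnect / resubscribe step is left to the caller.

```python
import time


class HeartbeatMonitor:
    """Tracks when the last heartbeat arrived (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock            # injectable so the logic can be tested
        self.last_seen = clock()

    def beat(self):
        """Call whenever a heartbeat message is received from PUB."""
        self.last_seen = self.clock()

    def expired(self):
        """True once no heartbeat has been seen for longer than timeout_s."""
        return self.clock() - self.last_seen > self.timeout_s
```

A receive loop would call beat() for each heartbeat and, once expired() returns True, tear down and recreate the SUB socket before resetting the monitor.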
What I found is that if I keep restarting the PUB server, for example once every 5 seconds, the SUB socket can fail to subscribe and never receive anything from PUB.
This is very hard to reproduce and happens about once in a few hundred restarts.
Unfortunately we are not allowed to use tcpdump here, so all I can do is use strace on the I/O thread to monitor the socket calls.
When it is broken, the messages SUB exchanges with PUB after reconnecting look like this:
recvfrom(53, "\377\0\0\0\0\0\0\0\1\177", 12, 0, NULL, NULL) = 10
| 00000 ff 00 00 00 00 00 00 00 01 7f
recvfrom(53, 0x7f881000b9db, 2, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
sendto(53, "\377\0\0\0\0\0\0\0\1\177\1\2", 12, 0, NULL, 0) = 12
| 00000 ff 00 00 00 00 00 00 00 01 7f 01 02
recvfrom(53, "\1\1", 2, 0, NULL, NULL) = 2
| 00000 01 01
recvfrom(53, "\0\0", 8192, 0, NULL, NULL) = 2
| 00000 00 00
sendto(53, "\0\0", 2, 0, NULL, 0) = 2
The last line is what breaks it. In the normal case, the last line looks like this:
sendto(54, "\0\0\0\1\1", 5, 0, NULL, 0) = 5
After reading this ZMTP document:
http://rfc.zeromq.org/spec:23
I still can't work out the exact meaning of these messages, but my attempt at explaining them is:
After connecting, SUB tries to read 12 bytes from PUB. It gets 10 initially and eventually receives the last two as 01 01. (I guess the 12 bytes are some kind of signature.)
Then SUB sends 12 bytes to PUB, with the first 10 bytes the same but the last two bytes as 01 02.
Then SUB tries to read 8192 bytes but gets only two, which are 00 00.
Then SUB decides to send 00 00 back to PUB, and that's it: nothing more comes from this SUB socket.
The TCP connection is still there, so there must be something wrong with the subscription.
In the normal case, SUB sends 00 00 00 01 01 instead of 00 00. I guess this must be the subscription message, but I don't understand why it is 5 bytes.
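To make the handshake bytes in the trace easier to read, here is a small Python sketch that decodes the first 12 bytes of a greeting. It assumes the ZMTP 2.0 layout from spec:15 (a 10-byte signature that starts with 0xFF and ends with 0x7F, then one revision octet and one socket-type octet) and the socket-type numbering given there. parse_greeting and SOCKET_TYPES are illustrative names, not libzmq API.

```python
# Socket-type codes as assumed from the ZMTP 2.0 spec (spec:15).
SOCKET_TYPES = {
    0: "PAIR", 1: "PUB", 2: "SUB", 3: "REQ", 4: "REP",
    5: "DEALER", 6: "ROUTER", 7: "PULL", 8: "PUSH",
}


def parse_greeting(data: bytes):
    """Return (revision, socket_type_name) from the first 12 greeting bytes."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    if data[0] != 0xFF or data[9] != 0x7F:
        raise ValueError("not a ZMTP 2.0 style signature")
    revision = data[10]
    socket_type = SOCKET_TYPES.get(data[11], "unknown")
    return revision, socket_type


# Bytes taken from the strace output: what SUB sent, and what it received.
sent_by_sub = parse_greeting(bytes.fromhex("ff00000000000000017f0102"))
sent_by_pub = parse_greeting(bytes.fromhex("ff00000000000000017f0101"))
```

Under these assumptions, the 12 bytes SUB sends decode as revision 1 with socket type SUB, and the bytes received from PUB decode as revision 1 with socket type PUB. The 00 00 that follows in each direction would then be the empty identity frame mentioned in the reply quoted earlier in this thread.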
So can someone please explain the exchange sequence on PUB/SUB, and maybe shed some light on what is broken here?
Thanks very much
Shan