[zeromq-dev] [External] Re: A PGM/EPGM question

Montero, Antonio UTC CCS Antonio.Montero at fs.utc.com
Fri Mar 23 18:43:10 CET 2018

Understood; however, that is not the behavior I am seeing. That is likely the case for EPGM, since those are UDP packets. From my understanding, though, regardless of whether incoming data is multicast or unicast, PGM binds to the wildcard address and a specific port, and the kernel passes all data received on an interface to any listening socket as long as the destination port matches that of the socket binding.

Now let’s put UDP aside for a second: what about the pgm transport? These are raw sockets, and any unicast NAKs are sent from the remote SUB to the PUB’s unicast address and source port (which is randomly selected when the raw PGM PUB socket is created). At that point the PUB socket should be the only one listening on its own unicast address and source port. Correct?

This is a snapshot of what my netstat -ln output looks like at the moment, with both PUB and SUB created and running on the same host.

Proto Recv-Q Send-Q Local Address                               Foreign Address State
Sockets associated with PUB:
raw   164672      0 2001:db8::2b0:19ff:fe73:d890%2147479552:113 ::%622984:*     113
raw   164672      0 2001:db8::2b0:19ff:fe73:d890%2147479552:113 ::%623304:*     113
raw        0      0 ::%2147479552:113                           ::%622984:*     113
Sockets associated with SUB:
raw   164672      0 2001:db8::2b0:19ff:fe73:d890%2147479552:113 ::%622984:*     113
raw   164672      0 2001:db8::2b0:19ff:fe73:d890%2147479552:113 ::%623304:*     113
raw        0      0 ::%2147479552:113                           ::%622984:*     113

Notice that the Recv-Q is full on the send and router-alert send sockets of both PUB and SUB.
Here are my thoughts on why they are full; the reason is not the same in each case:

For the SUB-associated sockets, the ones bound to the 2001 address are used to send NAKs to the remote PUB:
These fill up as soon as a remote PUB starts sending multicast data. I think the SUB send socket is connecting with the destination port used to send multicast traffic: whenever a SUB sends a NAK to the PUB, I can see that the source port of the unicast packet matches the destination port of the multicast group. However, this is not really an issue, since the SUB socket is configured in PGM as receive-only, so any ODATA/SPM received on its send socket is not processed. The SUB socket also gets the multicast data via the local binding ::%2147479552:113, which, as seen above, is draining its queue fine, and I have verified that the node is receiving data at the application level.

For the PUB-associated sockets, the ones bound to the 2001 address are used to send ODATA/SPM/RDATA/NCF to the remote SUB:
Its local binding, ::%2147479552:113, also receives the multicast data sent by the remote PUB, but since the PUB socket is configured as send-only at the PGM level, any ODATA/SPM received is thrown out.
However, its send-associated sockets do receive unicast NAKs from the remote SUB. As seen above, these are placed on the socket’s Recv-Q, but the queue is full because the NAKs are not being processed by the PUB socket.

Note: The exact same behavior is seen with EPGM. The only difference is that none of the sockets’ Recv-Q fill up, because they are emptied at the UDP layer upon arrival. However, I suspect that once the data is forwarded to the PGM layer, the PGM socket buffers would show the same thing as the netstat -ln output above.

Even though I think it is redundant, and probably not a good idea, to run the same code when creating either a ZMQ PUB or SUB socket, since those socket types are essentially restricted to specific things such as send-only or receive-only, that does not appear to be the cause of the issue here. I have read in some of the OpenPGM documentation that the application must call pgm_recv frequently, as that somehow drives the PGM state machine. My issue, however, is how to accomplish that from the ZMQ API layer; using that layer is the whole point of using ZMQ in my case in the first place.

Any thoughts? And thanks for the comments.

From: zeromq-dev [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Steven McCoy
Sent: Friday, March 23, 2018 12:53 PM
To: ZeroMQ development list
Subject: Re: [zeromq-dev] [External] Re: A PGM/EPGM question

The problem is that the kernel will not fan out UDP unicast packets to every listening socket, so it is probable that the wrong socket is hearing the NAK.
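
The unicast delivery point can be checked with a small experiment. The following is a minimal sketch, not taken from ZeroMQ or OpenPGM. It assumes plain UDP over IPv4 loopback (the thread uses IPv6 and EPGM), and the helper names open_shared_udp_socket and count_receivers_of_one_unicast_datagram are made up for illustration. Two sockets bind to the same port, one unicast datagram is sent, and the function counts how many of the two sockets actually receive it.

```python
import select
import socket
import time


def open_shared_udp_socket(port: int = 0) -> socket.socket:
    """Open a non-blocking UDP socket on loopback that allows a shared port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if hasattr(socket, "SO_REUSEPORT"):
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        except OSError:
            pass  # option is defined but not supported on this platform
    sock.bind(("127.0.0.1", port))
    sock.setblocking(False)
    return sock


def count_receivers_of_one_unicast_datagram() -> int:
    """Send one unicast datagram to a shared port and count who receives it."""
    first = open_shared_udp_socket()        # port 0: the OS picks a free port
    port = first.getsockname()[1]
    second = open_shared_udp_socket(port)   # second listener on the same port
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(b"pretend-NAK", ("127.0.0.1", port))
        # Wait until at least one listener is readable, then give a possible
        # second copy a moment to arrive before checking both sockets.
        select.select([first, second], [], [], 1.0)
        time.sleep(0.1)
        received = 0
        for sock in (first, second):
            try:
                sock.recvfrom(2048)
                received += 1
            except BlockingIOError:
                pass  # nothing was queued on this socket
        return received
    finally:
        for sock in (first, second, sender):
            sock.close()


if __name__ == "__main__":
    print(count_receivers_of_one_unicast_datagram())
```

A unicast datagram is handed to a single socket, so the count should be 1; which of the two listeners gets it is up to the kernel, which is how a NAK can end up on a socket that nobody reads.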

On Fri, Mar 23, 2018 at 12:07 Montero, Antonio UTC CCS <Antonio.Montero at fs.utc.com> wrote:
ZMQ’s implementation of the PUB socket type does not allow receive calls to be made (zmq_recv is disabled), which is why I am trying to figure out how one triggers ZMQ to call “pgm_recv” on the PUB socket so that it processes NAKs received from a remote SUB socket.
I have tried querying the PUB socket state via ZMQ_EVENTS to trigger the processing of any commands available for the socket; however, that does not seem to move the PGM state machine in terms of processing NAKs.

I am running both a PUB and a SUB in the same application on the same host. I see the same set of sockets being created at the PGM level for both the PUB and SUB ZMQ sockets, including multiple sockets binding to the same port, but this does not appear to cause any issues: my SUB socket is able to receive multicast messages from a remote PUB and respond with unicast NAKs when data loss is detected.

Any ideas as to how a user should get the ZMQ library to trigger NAK processing for a PUB socket using either the pgm or epgm transport?

Antonio Montero.
From: zeromq-dev [mailto:zeromq-dev-bounces at lists.zeromq.org] On Behalf Of Steven McCoy
Sent: Friday, March 23, 2018 9:55 AM
To: ZeroMQ development list
Subject: [External] Re: [zeromq-dev] A PGM/EPGM question

You should check that the PUB socket has a loop processing the incoming NAK requests; this is usually recv-call based.  The symptoms indicate that the protocol is operating TX-only.
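
To make the TX-only symptom concrete, here is a toy model in Python. It is only a sketch under stated assumptions: ToyPublisher, nak_arrives, and pump_receive are hypothetical names, and none of this is OpenPGM or libzmq code. It assumes a sender that keeps sent packets in a transmit window and answers a NAK with an NCF followed by RDATA, but only when its receive path is actually run.

```python
from collections import deque


class ToyPublisher:
    """Toy sender with a transmit window; a simplification, not OpenPGM."""

    def __init__(self, window_size: int = 8) -> None:
        self.window_size = window_size
        self.window = {}              # sequence number -> payload kept for repair
        self.next_sqn = 0
        self.pending_naks = deque()   # NAKs that arrived but were never read
        self.wire = []                # log of (packet type, sequence number) sent

    def send(self, payload: bytes) -> int:
        """Send original data and remember it in the transmit window."""
        sqn = self.next_sqn
        self.next_sqn += 1
        self.window[sqn] = payload
        if len(self.window) > self.window_size:
            del self.window[min(self.window)]   # oldest packet leaves the window
        self.wire.append(("ODATA", sqn))
        return sqn

    def nak_arrives(self, sqn: int) -> None:
        """Model the kernel queueing a NAK on the socket's receive queue."""
        self.pending_naks.append(sqn)

    def pump_receive(self) -> int:
        """Drain queued NAKs; confirm and repair those still in the window."""
        handled = 0
        while self.pending_naks:
            sqn = self.pending_naks.popleft()
            handled += 1
            # Simplification: a NAK for data outside the window is dropped.
            if sqn in self.window:
                self.wire.append(("NCF", sqn))
                self.wire.append(("RDATA", sqn))
        return handled


if __name__ == "__main__":
    pub = ToyPublisher()
    for chunk in (b"a", b"b", b"c"):
        pub.send(chunk)
    pub.nak_arrives(1)
    print(len(pub.pending_naks), pub.wire)   # state before the receive path runs
    pub.pump_receive()
    print(len(pub.pending_naks), pub.wire)   # state after the receive path runs
```

Until pump_receive is called, the NAK just sits in the queue and no NCF or RDATA is logged, even though the requested packet is still in the window.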


On Wed, Mar 21, 2018 at 19:50 Montero, Antonio UTC CCS <Antonio.Montero at fs.utc.com> wrote:
I am having a bit of a hard time getting a ZMQ PUB socket to react to PGM NAKs, which means that at this point I am not able to recover lost packets.
I have tried both protocols (pgm and epgm) and get the same result.

I have a setup where I create a PUB and a SUB socket, in that order, in the same ZMQ context on the same host, both connected to the same IPv6 multicast address and port.
I have N nodes, and each node has a PUB and a SUB. All N nodes send messages asynchronously, and all N nodes receive all messages. My multicast network is working fine whether I use pgm or epgm, and all N nodes communicate with each other over IPv6 multicast.
The issue arises when packet loss occurs: a remote SUB sends a unicast NAK back to the source PUB, but I do not see any NCF or RDATA being sent by the source PUB. I have verified that the packets in question are still in the Tx window, as reported by the SPMs sent by the source PUB. I have ongoing periodic traffic, which triggers a send on the PUB socket and a receive on the SUB socket, and I clear out ZMQ_EVENTS after every send and/or receive. I also have a polling thread running every 150ms to check ZMQ_EVENTS on both PUB and SUB.

Nothing seems to work in terms of triggering the PUB to react and process the NAKs received from the remote SUB. Looking at the code a bit, I see the function zmq::pgm_socket_t::process_upstream, but I can’t tell if or how it is being triggered. From my perspective, it does not appear to be.

Any help or direction would be appreciated. Thanks.


zeromq-dev mailing list
zeromq-dev at lists.zeromq.org