[zeromq-dev] PUB/SUB on an epgm socket stops receiving eventually …

Steven McCoy steven.mccoy at miru.hk
Wed Jun 1 22:41:14 CEST 2011


On 2 June 2011 04:17, Ladan Gharai <lgharai at gmail.com> wrote:

> Hi:
>
> I am trying to use PUB/SUB with epgm, and what I am observing is that
> sometimes one of the receivers stops receiving data.
>
> With iperf and UDP, the boxes I am using can sustain 500Mbps with no loss
> (or very little), but the epgm receiver is clunking out at 100Mbps or even
> lower data rates.
>
The default rate limit is now 40 Mbps; you can turn it off or raise it as
required.
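To make the rate limit concrete, a send-rate cap behaves roughly like a token bucket: bytes may only go out as fast as tokens refill. The sketch below is a generic token bucket written for illustration, not OpenPGM's rate engine. The class name, the helper `simulate_capped_mbps`, the 15000-byte burst depth and the 10 µs offer interval are all assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Generic token-bucket rate cap (an illustration, NOT OpenPGM's rate engine).
// Tokens are bytes; they refill at the capped rate up to a fixed burst depth.
class TokenBucket {
public:
    TokenBucket(double rate_bits_per_sec, double burst_bytes)
        : bytes_per_sec_(rate_bits_per_sec / 8.0),
          capacity_(burst_bytes),
          tokens_(burst_bytes) {
        assert(rate_bits_per_sec > 0 && burst_bytes > 0);
    }

    // Advance the clock by dt seconds, refilling tokens but never above capacity.
    void advance(double dt_sec) {
        tokens_ = std::min(capacity_, tokens_ + dt_sec * bytes_per_sec_);
    }

    // Spend tokens for one packet; returns false if the packet must wait.
    bool try_send(double packet_bytes) {
        if (tokens_ < packet_bytes) return false;
        tokens_ -= packet_bytes;
        return true;
    }

private:
    double bytes_per_sec_;
    double capacity_;
    double tokens_;
};

// Hypothetical helper: offer one packet every `tick_sec` seconds for `ticks`
// ticks and return the throughput that got through the cap, in Mbit/s.
double simulate_capped_mbps(double cap_bits_per_sec, double burst_bytes,
                            double packet_bytes, double tick_sec, long ticks) {
    TokenBucket bucket(cap_bits_per_sec, burst_bytes);
    long sent = 0;
    for (long i = 0; i < ticks; ++i) {
        bucket.advance(tick_sec);
        if (bucket.try_send(packet_bytes)) ++sent;
    }
    double elapsed = tick_sec * static_cast<double>(ticks);
    return sent * packet_bytes * 8.0 / elapsed / 1e6;
}
```

For example, offering one 1500-byte packet every 10 µs (1.2 Gbit/s) against a 40 Mbit/s cap, `simulate_capped_mbps(40e6, 15000, 1500, 10e-6, 100000)` lets only about 40 Mbit/s through, however fast the link itself is.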


>
> I am using RHEL5 and zeromq-2.1.7.
>
> I’ve turned on the OpenPGM trace/debug messages. AFAICT, once the epgm
> receiver sustains “a lot” of packet loss, it’s just not able to start
> over again.
>
Every time the receiver sees packet loss, it closes the socket and schedules
a new socket to be created to reconnect to the PGM stream.


>
> My questions are:
>
>    1. Is there a way to reset the receiver once this happens?
>
Reconnects happen automatically, through the same engine that handles TCP reconnects.
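The close-and-reschedule behaviour can be modelled as a tiny state machine: loss closes the current socket and arms a timer, and when the timer expires a fresh socket rejoins the stream. This is a hypothetical sketch, not zeromq source; the class, its methods and the 100 ms interval in the usage note are assumptions.

```cpp
#include <cassert>

// Hypothetical model of "close the socket on loss, reconnect after a delay".
// Not zeromq source: names and the interval are illustrative assumptions.
class ReconnectingReceiver {
public:
    explicit ReconnectingReceiver(long reconnect_ivl_ms) : ivl_ms_(reconnect_ivl_ms) {
        assert(reconnect_ivl_ms >= 0);
    }

    bool connected() const { return connected_; }
    int generation() const { return generation_; }  // sockets created so far

    // Loss reported at now_ms: close the socket and arm the reconnect timer.
    void on_data_loss(long now_ms) {
        if (!connected_) return;  // a reconnect is already scheduled
        connected_ = false;
        reconnect_at_ms_ = now_ms + ivl_ms_;
    }

    // Called from the event loop; recreates the socket once the timer expires.
    void on_timer(long now_ms) {
        if (!connected_ && now_ms >= reconnect_at_ms_) {
            connected_ = true;
            ++generation_;
        }
    }

private:
    long ivl_ms_;
    long reconnect_at_ms_ = 0;
    bool connected_ = true;
    int generation_ = 1;
};
```

With `ReconnectingReceiver r(100)`, a call to `r.on_data_loss(1000)` leaves the receiver disconnected until `r.on_timer(1100)`, at which point `generation()` reports the second socket. No manual reset is involved; the timer drives the rejoin.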


>
>    2. Has anyone experimented with changing the size of the rxw (it
>    currently uses 33333) and the various timers NAK_RB_IVL, NAK_RPT_IVL and
>    NAK_RDATA_IVL (something akin to TCP tuning?)
>
If you find PGM is unproductive, you should investigate tightening the
recovery settings so that failure is raised sooner rather than later. The
default settings are friendly towards 10 Mb networks, so running at high
speed on 1 Gb networks may pose a problem under heavy data loss.
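On sizing the receive window: the window has to cover roughly rate × recovery interval worth of data, counted in packets (TPDUs). As a back-of-the-envelope check, a 40 Mbit/s rate, a 10 second recovery interval and 1500-byte TPDUs give 50,000,000 bytes / 1500, which rounds down to the 33333 quoted in the question. Those three inputs are assumptions that happen to reproduce the number, not values confirmed from the zeromq source, and `window_sqns` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: receive-window size in sequence numbers (packets)
// needed to hold `recovery_ivl_sec` seconds of data sent at `rate_kbps`,
// given `tpdu_bytes` per packet. Assumes 1 kbit = 1000 bits.
std::uint64_t window_sqns(std::uint64_t rate_kbps, std::uint64_t recovery_ivl_sec,
                          std::uint64_t tpdu_bytes) {
    assert(tpdu_bytes > 0);
    const std::uint64_t bytes_per_sec = rate_kbps * 1000 / 8;  // kbit/s -> bytes/s
    const std::uint64_t window_bytes = bytes_per_sec * recovery_ivl_sec;
    return window_bytes / tpdu_bytes;  // integer division rounds down
}
```

By the same arithmetic, `window_sqns(500000, 10, 1500)` gives 416666 sequence numbers (about 625 MB of buffering) for the 500 Mbit/s that iperf sustained, which shows how quickly raising the rate inflates the window.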

For example, drop the retry counts for DATA and NCF from the default of 50 to 2.

~line 211 in pgm_socket.cpp:

                  //  ...other settings in the same declaration...
                  nak_data_retries = 2,   //  default: 50
                  nak_ncf_retries  = 2;   //  default: 50
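A crude estimate of why lowering the retry counts helps: if every NCF retry waits NAK_RPT_IVL and every DATA retry waits NAK_RDATA_IVL, the time a receiver spends chasing one gap before declaring it lost grows linearly with both counts. The function name and the 200 ms intervals used below are assumptions for illustration, and the model ignores the random NAK back-off and the other details of the PGM state machine.

```cpp
#include <cassert>

// Crude, hypothetical estimate of how long a receiver keeps retrying one gap
// before giving up: every NCF retry waits nak_rpt_ivl_ms and every DATA retry
// waits nak_rdata_ivl_ms. Real PGM also adds a random NAK back-off and can
// re-enter the NCF wait after an RDATA timeout, so this is only a rough figure.
long nak_giveup_estimate_ms(int nak_ncf_retries, long nak_rpt_ivl_ms,
                            int nak_data_retries, long nak_rdata_ivl_ms) {
    assert(nak_ncf_retries >= 0 && nak_data_retries >= 0);
    return nak_ncf_retries * nak_rpt_ivl_ms + nak_data_retries * nak_rdata_ivl_ms;
}
```

With both intervals at an assumed 200 ms, `nak_giveup_estimate_ms(50, 200, 50, 200)` gives 20000 ms while `nak_giveup_estimate_ms(2, 200, 2, 200)` gives 800 ms, so the tuned receiver gives up on a lost gap, and moves on, about 25 times sooner.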



>
>    3. Also, occasionally I see the following assertion failure some time
>    after everything has gone to zero:
>
>        Assertion failed: pending_bytes == 0 (pgm_receiver.cpp:142)
>

This was also raised a couple of weeks ago, and there is an issue for it on
GitHub. It requires additional debugging to find the cause. The first step
is to add an assertion to ensure that "pending_bytes" is always positive.

~line 226 in pgm_receiver.cpp:

        //  Push all the data to the decoder.
        ssize_t processed = it->second.decoder->process_buffer (data, received);
        assert (processed >= 0);  //  new check: decoder must not report a negative count
        if (processed < received) {
            //  Save some state so we can resume the decoding process later.
            pending_bytes = received - processed;
            //  ...
        }
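Reading the failed assertion together with the excerpt, the receiver appears to expect any leftover bytes to be drained before new data is read. The toy model below reproduces that resume-later pattern; `ToyDecoder`, `ToyReceiver`, `on_data` and `resume` are hypothetical stand-ins, not zeromq classes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a message decoder that accepts at most `room`
// bytes per call and reports how many it consumed.
struct ToyDecoder {
    std::size_t room;
    long process_buffer(const unsigned char *, std::size_t size) {
        return static_cast<long>(size < room ? size : room);
    }
};

// Hypothetical receiver reproducing the "save state, resume later" pattern.
struct ToyReceiver {
    explicit ToyReceiver(std::size_t room) : decoder{room} {}

    ToyDecoder decoder;
    std::size_t pending_bytes = 0;
    const unsigned char *pending_ptr = nullptr;

    // New data arrived: only legal when nothing is left over from last time,
    // which is the invariant that "pending_bytes == 0" guards.
    void on_data(const unsigned char *data, std::size_t received) {
        assert(pending_bytes == 0);
        long processed = decoder.process_buffer(data, received);
        assert(processed >= 0);  // same sanity check as in the excerpt
        if (static_cast<std::size_t>(processed) < received) {
            pending_bytes = received - static_cast<std::size_t>(processed);
            pending_ptr = data + processed;
        }
    }

    // Decoder has room again: drain leftovers before any new data is read.
    void resume() {
        if (pending_bytes == 0) return;
        long processed = decoder.process_buffer(pending_ptr, pending_bytes);
        assert(processed >= 0);
        pending_bytes -= static_cast<std::size_t>(processed);
        pending_ptr += processed;
    }
};
```

With a decoder that accepts 4 bytes per call, `on_data` on a 10-byte buffer leaves `pending_bytes` at 6, two `resume` calls bring it to 0, and only then may `on_data` run again without tripping the assertion.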


-- 
Steve-o