[zeromq-dev] PUB/SUB on an epgm socket stops receiving eventually …
Ladan Gharai
lgharai at gmail.com
Fri Jun 3 21:49:33 CEST 2011
On Wed, Jun 1, 2011 at 4:41 PM, Steven McCoy <steven.mccoy at miru.hk> wrote:
> On 2 June 2011 04:17, Ladan Gharai <lgharai at gmail.com> wrote:
>
>> Hi:
>>
>>
>>
>> I am trying to use PUB/SUB with epgm – and what I am observing is that
>> sometimes one of the receivers stops receiving data.
>>
>>
>>
>> With Iperf and UDP, the boxes I am using can sustain 500Mbps with no
>> loss (or very little) – but the epgm receiver is clunking out at 100Mbps or
>> even lower data rates
>>
>>
> The default rate limit is now 40mbps, you can turn it off or raise it up as
> required.
>
Yes, I have ZMQ_RATE set to 500Mbps
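For reference, ZMQ_RATE in the 2.1 API is expressed in kilobits per second, so 500Mbps corresponds to a value of 500000. A minimal sketch of the conversion follows. The helper name mbps_to_zmq_rate is hypothetical, and the zmq_setsockopt call shown in the comment assumes the 2.1.x API, where the option value is an int64_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of libzmq): convert megabits per second
// into the kilobits-per-second unit that ZMQ_RATE expects.
int64_t mbps_to_zmq_rate (int64_t mbps)
{
    return mbps * 1000;
}

// Assumed usage with libzmq 2.1.x, applied before zmq_connect:
//
//   int64_t rate = mbps_to_zmq_rate (500);
//   int rc = zmq_setsockopt (socket, ZMQ_RATE, &rate, sizeof (rate));
//   assert (rc == 0);
```

With this helper, mbps_to_zmq_rate (500) yields 500000.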
>
>
>>
>>
>> I am using RHEL5 and zeromq-2.1.7
>>
>>
>>
>> I’ve turned on the openpgm trace/debug messages – afaict once the epgm
>> receiver sustains “a lot” of packet loss it's just not able to start over
>> again
>>
>>
> Every time the receiver sees packet loss it closes the socket and schedules
> a new socket to be created to reconnect to the PGM stream.
>
I am not sure I understand this - do you mean the zmq socket gets a new
zmq socket if the ePGM receiver experiences unrecoverable loss? (I don't see
any new socket opening I just see the zmq recv not receiving anymore)
>
>
>>
>>
>> My questions are:
>>
>> 1. Is there a way to reset the receiver once this happens?
>>
>>
> Reconnects occur with the same engine as TCP reconnects.
>
>
>>
>> 2. Has anyone experimented with changing the size of the rxw (it
>> currently uses 33333) – and the various timers NAK_RB_IVL, NAK_RPT_IVL and
>> NAK_RDATA_IVL (something akin to TCP tuning?)
>>
>>
> If you find PGM is non-productive you should investigate tightening the
> recovery settings so failure is raised sooner rather than later. The
> default settings are friendly towards 10mb networks and so running at high
> speed on 1gb networks may pose a problem with high data loss.
>
> For example, drop the retry count for DATA & NCF from the default 50 to 2.
>
> ~line 211 in pgm_socket.cpp:
> nak_data_retries = 2,
>
> nak_ncf_retries = 2;
>
Yes - this seems the most sensible approach, except now it crashes -
Segmentation fault - once it falls into a long series of packet losses.
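To put rough numbers on why the retry count matters at these rates, here is a back-of-the-envelope sketch. It is a simplified model, not OpenPGM's actual timer logic: it treats the time spent chasing one lost packet as retries times a fixed interval, and it measures how much traffic the 33333-packet rxw spans at a given rate. The 200 ms interval and the 1500-byte packet size used in the usage note are assumptions for illustration only, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rough upper bound on how long a receiver keeps retrying one lost packet.
// Simplification: ignores back-off randomisation and timer interaction.
int64_t retry_budget_ms (int64_t retries, int64_t interval_ms)
{
    return retries * interval_ms;
}

// Milliseconds of traffic covered by a receive window of `sqns` packets,
// assuming every packet is `tpdu_bytes` long and data arrives at `rate_kbps`.
// 1 kbit/s equals 1 bit per millisecond, so bits / kbps gives milliseconds.
int64_t window_span_ms (int64_t sqns, int64_t tpdu_bytes, int64_t rate_kbps)
{
    return (sqns * tpdu_bytes * 8) / rate_kbps;
}
```

Under those assumed values, window_span_ms (33333, 1500, 500000) is 799, so the window holds roughly 0.8 seconds of data at 500Mbps. retry_budget_ms (50, 200) is 10000, far longer than the window covers, while retry_budget_ms (2, 200) is 400, which fits inside it.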
>
>> 3. Also occasionally I see the following assertion failed sometime
>> after everything has gone to zero:
>>
>> Assertion failed: pending_bytes == 0
>> (pgm_receiver.cpp:142)
>>
>
> Also raised a couple of weeks ago and there is a case in Github. Requires
> additional debugging to find the cause. The first step is to add an
> assertion to ensure "pending_bytes" is always positive.
>
> ~line 226 in pgm_receiver.cpp:
>
> // Push all the data to the decoder.
> ssize_t processed = it->second.decoder->process_buffer (data, received);
> assert (processed >= 0);
> if (processed < received) {
>     // Save some state so we can resume the decoding process later.
>     pending_bytes = received - processed;
>
>
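A standalone sketch of why that check is worth having: if pending_bytes is an unsigned type (assumed here to be size_t), a negative or oversized processed value would wrap around to a huge count instead of failing visibly. Both functions below are hypothetical stand-ins for the bookkeeping, not code from pgm_receiver.cpp, and ssize_t assumes a POSIX system.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/types.h>  // ssize_t (POSIX)

// Bookkeeping without any checks: the signed difference is converted
// straight to size_t, so a negative difference wraps around.
size_t pending_unchecked (ssize_t processed, ssize_t received)
{
    return static_cast<size_t> (received - processed);
}

// Same bookkeeping with the invariant 0 <= processed <= received enforced,
// so a misbehaving decoder trips an assertion instead of corrupting state.
size_t pending_checked (ssize_t processed, ssize_t received)
{
    assert (processed >= 0);
    assert (processed <= received);
    return static_cast<size_t> (received - processed);
}
```

For example, pending_checked (60, 100) returns 40, whereas pending_unchecked (101, 100) returns the maximum size_t value rather than reporting an error.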
> --
> Steve-o