[zeromq-dev] Last call for Extensible Statistics Transmission Protocol (ESTP) v1.0

Paul Colomiets paul at colomiets.name
Thu Jun 14 22:10:40 CEST 2012


Hi Schmurfy,

On Thu, Jun 14, 2012 at 2:20 PM, Schmurfy <schmurfy at gmail.com> wrote:
> DERIVE: that's where I am getting skeptical. Once we agree that the type
> used at the end of the chain (say RRD; you advise using the DERIVE RRD type
> to store counters) is not the one used in the protocol, is there really any
> reason to have a DERIVE type in the protocol?

Note that it is COUNTER that is represented by the DERIVE type, not
DERIVE that is represented as a COUNTER. So I don't think the argument
applies, as not every DERIVE is a COUNTER (in particular, one whose
value can go down is not).


> Would a client send the derive computed by itself, or send the raw value
> and let the server do this? In the latter case this is just a counter.
>

Yes, the type is intended for stateless implementations. All COUNTER
uses are also for stateless implementations, and the value could
potentially be calculated by the sender too. The difference is mostly
in how message loss is handled: a lost COUNTER message just results in
the value being averaged over a longer period, while a lost DELTA
message means that the period is lost entirely (and the system may have
been under high load during that period, which may be what caused the
loss; the lost traffic will then not count against the network quota,
as described below).
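
A minimal sketch of that difference, assuming a 10-second reporting
interval and made-up sample values. The helper names (counter_rates,
delta_rates) are hypothetical and are not part of ESTP:

```python
def counter_rates(samples):
    """samples: (timestamp, counter_value) pairs that actually arrived.

    The rate is computed between neighbouring received samples, so a
    lost sample simply widens the period that is averaged over.
    """
    return [(t0, t1, (v1 - v0) / (t1 - t0))
            for (t0, v0), (t1, v1) in zip(samples, samples[1:])]


def delta_rates(samples, interval):
    """samples: (timestamp, delta_value) pairs that actually arrived.

    Each delta only describes the `interval` seconds before its
    timestamp, so a lost sample leaves a hole in the data.
    """
    return [(t - interval, t, d / interval) for t, d in samples]


# Real activity: 100, 900 and 100 events in three 10-second periods.
counter_sent = [(0, 0), (10, 100), (20, 1000), (30, 1100)]
delta_sent = [(10, 100), (20, 900), (30, 100)]

# The message sent at t=20 (the busy period) is lost in both cases.
counter_recv = [s for s in counter_sent if s[0] != 20]
delta_recv = [s for s in delta_sent if s[0] != 20]

counter_result = counter_rates(counter_recv)
delta_result = delta_rates(delta_recv, 10)

# COUNTER: the 900 events are still accounted for, averaged over 10..30.
# DELTA: nothing is known about 10..20, and the 900 events are gone.
```

With COUNTER the total (1100 events) survives the loss; with DELTA only
200 of them are ever seen by the collector.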

By the way, collectd uses DERIVE only with a zero minimum, which is
essentially the COUNTER type in ESTP. I see the lack of a real DERIVE
type as a bug in collectd (which I'll discuss on their ML soon), but
it also means that no use cases for a real DERIVE type have emerged
in collectd.
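
To illustrate why DERIVE with a zero minimum behaves like a COUNTER,
here is a rough sketch. The function derive_rate and its minimum
parameter are invented for this example and are not part of ESTP, RRD
or collectd; returning None (unknown) for a rate below the minimum is
an assumption modelled on the usual RRD convention:

```python
def derive_rate(prev, cur, seconds, minimum=None):
    """Rate of change between two readings taken `seconds` apart.

    Without a minimum the result may be negative (a "real" DERIVE).
    With minimum=0 a decreasing value is treated as unknown, which is
    how a monotonically growing COUNTER is expected to behave.
    """
    rate = (cur - prev) / seconds
    if minimum is not None and rate < minimum:
        return None  # assumption: below-minimum rates are stored as unknown
    return rate


falling = derive_rate(100, 40, 10)                  # real DERIVE: -6.0
as_counter = derive_rate(100, 40, 10, minimum=0)    # COUNTER-like: None
growing = derive_rate(100, 160, 10, minimum=0)      # both agree: 6.0
```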

All in all, I've changed my mind 3 times while writing this email, so
I will probably take a break and think about it some more :) But here
is a basic roundup:

Pros:
1. Seems a natural, complementary type to COUNTER
2. A familiar type for RRD users
3. Some imaginary use cases exist, and real ones may appear in the future
4. A type that may be hard to add in the future (all implementations
would have to be updated)

Cons:
1. Additional type (maintenance burden)
2. All imaginary use cases are covered by GAUGE and a good GUI (which
shows the change over a period)
3. No real use cases found *(anyone, please propose some!)*
4. Resets of the value (if the underlying value is volatile) work badly
5. Can be calculated by the sender (same as COUNTER)


> DELTA: what is the difference between GAUGE and DELTA ?
>

The difference is that the scale of a GAUGE doesn't depend on the
interval (e.g. CPU %), while the scale of a DELTA very much does: e.g.
10 messages per second means 600 messages per minute. In other words,
10 with an interval of 10 is not the same as 10 with an interval of 60
(compare with CPU %). So if a client for some reason chooses to change
the reporting interval of a value, the old values (e.g. stored as
messages per second) can still be used. Changing the interval is also
a very frequent use case for a GUI: it may be more interesting to see
messages per hour instead of messages per second at hourly intervals
(the latter may be nice too, but this type gives you a choice).
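
The arithmetic can be sketched as follows. The function names are
hypothetical, chosen only for this illustration:

```python
def delta_to_rate(value, interval):
    """Normalise a DELTA reported over `interval` seconds to a per-second rate."""
    return value / interval


def rescale_delta(value, interval, new_interval):
    """Express a DELTA reported over `interval` seconds for another period."""
    return value * new_interval / interval


# The same reported number means different things at different intervals.
rate_10s = delta_to_rate(10, 10)   # 10 messages in 10 seconds -> 1.0 msg/s
rate_60s = delta_to_rate(10, 60)   # 10 messages in 60 seconds -> about 0.17 msg/s

# 600 messages per minute shown as messages per hour for a GUI.
per_hour = rescale_delta(600, 60, 3600)

# A GAUGE such as CPU % needs no such conversion: 50 % is 50 % whatever
# the reporting interval is.
```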

Basically, a DELTA value is usually implemented as a COUNTER that
remembers its value at the start of the interval and sends the
difference (or alternatively by resetting the counter at read). The
statistics collection application then divides it by the seconds
elapsed to store a nice messages-per-second value.
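
A minimal sketch of both sides, assuming a single-threaded sender. The
class DeltaReporter and the function per_second are hypothetical names
for this example, not part of any ESTP implementation:

```python
class DeltaReporter:
    """Sender side: a growing counter that reports differences."""

    def __init__(self):
        self._total = 0          # ever-growing internal counter
        self._last_reported = 0  # counter value at the start of the interval

    def incr(self, n=1):
        self._total += n

    def read_delta(self):
        """Return the growth since the previous read and start a new interval."""
        delta = self._total - self._last_reported
        self._last_reported = self._total
        return delta


def per_second(delta, seconds_elapsed):
    """Collector side: turn a received DELTA into a per-second rate."""
    return delta / seconds_elapsed


reporter = DeltaReporter()
reporter.incr(600)                  # 600 messages handled in this interval
first = reporter.read_delta()       # 600
second = reporter.read_delta()      # 0, nothing happened since the last read
stored = per_second(first, 60)      # 10.0 messages per second
```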

> Here is something to think about on types, say I have a probe sending the
> number of bytes sent by the network card to my central server,
> now what I want to graph is:
> - the speed at which data are sent
> - the total number of bytes sent (say I want to check how much I will pay at
> the end of the month)
>
> For this I would prefer having my client send one metric, which is the
> number of bytes sent, and then let the server store the data. One metric
> received could lead to storing two, three or maybe more RRD metrics (if
> used), but why should the client care about that?
>

Sure, use a COUNTER type. Storing both values is the collection
application's problem. Note that just storing the bytes-per-second
rate with double precision (as RRD does) gives you a quite precise
estimate of the traffic per month (and per day or week) without
storing values twice (and it's actually better in case of a counter
reset or wrap than the difference between the counter values at the
start and at the end of the month).
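
A rough sketch of that estimate, with made-up byte counts and a
10-second step. The helper names are hypothetical, and treating a
negative difference as "the counter restarted from zero" is an
assumption of this example, not a description of what RRD does:

```python
def rates_from_counter(samples, step):
    """Per-second rates between consecutive COUNTER samples `step` seconds apart."""
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        diff = cur - prev
        if diff < 0:
            # Assumption: the counter was reset and restarted from zero,
            # so everything counted since the reset is `cur`.
            diff = cur
        rates.append(diff / step)
    return rates


def total_from_rates(rates, step):
    """Estimate the total by integrating the stored rates over time."""
    return sum(rate * step for rate in rates)


# Byte counter sampled every 10 seconds; it resets between 7000 and 500.
samples = [1000, 4000, 7000, 500, 3500]
step = 10

rates = rates_from_counter(samples, step)
estimate = total_from_rates(rates, step)   # 9500.0 bytes
naive = samples[-1] - samples[0]           # 2500 bytes, misled by the reset
```

The estimate from the rates only misses what was sent between the last
sample before the reset and the reset itself, while the start/end
difference is off by the whole pre-reset counter value.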


> I may be completely wrong on the goal of the types but for me they should
> just define what the client is sending and not how
> the data will be ultimately stored on disk if we want something flexible.
> So are the current types a definition of what is sent or an indication of
> how to store the data ?
>

Yes, it is the definition of what the client is sending. But the DELTA
type gives the semantics of the value, not how it is stored. It can be
stored in various ways:

1. Original values
2. Data rates per time unit (note: any time unit)
3. Growing counter

But compare that to GAUGE: storing a GAUGE as a data rate per second
or as a counter is useless.

Similarly, if you are making a proxy application that aggregates
statistics over a longer period, the difference between the DELTA and
GAUGE types is a hint to sum the former's values and average the
latter's over that period.
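
For such a proxy the rule might look roughly like this. The function
aggregate and the plain string type names are assumptions made for the
example; a real proxy would also have to deal with COUNTER values and
uneven intervals:

```python
def aggregate(metric_type, values):
    """Combine values from several short intervals into one long interval."""
    if not values:
        raise ValueError("nothing to aggregate")
    if metric_type == "DELTA":
        # Amounts per interval add up over the longer period.
        return sum(values)
    if metric_type == "GAUGE":
        # A level such as CPU % is averaged, not added.
        return sum(values) / len(values)
    raise ValueError("unsupported type: %s" % metric_type)


messages = aggregate("DELTA", [10, 20, 30])   # 60 messages in total
cpu = aggregate("GAUGE", [10, 20, 30])        # 20.0 % on average
```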

-- 
Paul


