[zeromq-dev] Last call for Extensible Statistics Transmission Protocol (ESTP) v1.0

Schmurfy schmurfy at gmail.com
Fri Jun 15 23:11:27 CEST 2012


I did not give it much thought before starting to implement it, but why
use ":" as a separator (which also appears later in the time) and then
stop halfway and switch to spaces, which can even appear multiple times?

Why not just go for something like this:
ESTP*org.example*sys**cpu*2012-06-02T09:36:45*10*7.2


In Ruby (and I am sure it is just as easy in most high-level languages) it
could be parsed as simply as:

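require 'time'  # Time.parse comes from the 'time' standard library

header, host, app, res, metric, time, interval, value =
  "ESTP*org.example*sys**cpu*2012-06-02T09:36:45*10*7.2:a".split("*")
time = Time.parse(time)
# the last field carries the value and its type, e.g. "7.2:a"
value, value_type = value.split(":")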


Extension fields are omitted, but I am sure you see my point.
I don't see why we should have something much more complex than needed
unless there is a good reason.
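
To make the idea a bit more concrete, here is a minimal sketch of the same
split wrapped in a function with some validation. The names Metric and
parse_star_line are made up for illustration, the "*" layout is only the
proposal from this mail (not the ESTP spec), and it assumes "*" never
appears inside a field.

```ruby
require "time"

# Hypothetical container for one parsed line; field names follow the example above.
Metric = Struct.new(:host, :app, :resource, :metric, :time, :interval, :value, :value_type)

# Hypothetical parser for the proposed "*"-separated layout (not the ESTP spec).
def parse_star_line(line)
  # A limit of -1 keeps empty trailing fields instead of dropping them.
  fields = line.chomp.split("*", -1)
  raise ArgumentError, "expected 8 fields, got #{fields.size}" unless fields.size == 8

  header, host, app, res, metric, time, interval, value = fields
  raise ArgumentError, "not an ESTP line" unless header == "ESTP"

  # The last field is "<number>:<type>", e.g. "7.2:a".
  number, value_type = value.split(":", 2)
  Metric.new(host, app, res, metric, Time.parse(time),
             Integer(interval), Float(number), value_type)
end

m = parse_star_line("ESTP*org.example*sys**cpu*2012-06-02T09:36:45*10*7.2:a")
puts m.metric    # cpu
puts m.interval  # 10
```

The empty resource field between the two "*" comes back as an empty string,
so optional fields need no special casing.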

On 15 June 2012 11:33, Schmurfy <schmurfy at gmail.com> wrote:

> After reading your answer and the type definitions in the spec I admit
> they all make sense. I did not understand what DELTA was supposed to be,
> but I am not sure why, since the definition is not that long.
>
>
> On 14 June 2012 22:10, Paul Colomiets <paul at colomiets.name> wrote:
>
>> Hi Schmurfy,
>>
>> On Thu, Jun 14, 2012 at 2:20 PM, Schmurfy <schmurfy at gmail.com> wrote:
>> > DERIVE: that's where I am getting skeptical. Once we agree that the
>> > type used at the end of the chain (say RRD; you advise using the
>> > DERIVE RRD type to store counters) is not the one used in the
>> > protocol, is there really any reason to have a DERIVE type in the
>> > protocol?
>>
>> Note that it is COUNTER that is represented by the DERIVE type, not
>> DERIVE represented as a COUNTER.  So I don't think the argument applies,
>> as not every DERIVE is a COUNTER (in particular, one whose value
>> can go down).
>>
>>
>> > Would a client send the derived value it computed itself, or send
>> > the raw value and let the server do this? In the latter case this is
>> > just a counter.
>> >
>>
>> Yes, the type is intended for stateless implementations.
>> All COUNTER uses are also for stateless implementations, and could
>> also be calculated by the sender. The difference is mostly in the
>> handling of message loss: a lost COUNTER message results in the value
>> being averaged over the gap, while a lost DELTA message means that the
>> period is lost entirely (and the system may have been under high load
>> during that period, which was the reason for the message loss, and the
>> lost message will not count against the network quota as described
>> below).
>>
>> By the way, collectd uses only DERIVE with a zero minimum, which is
>> essentially the COUNTER type for ESTP. I see the lack of a real DERIVE
>> type as a bug in collectd (which I'll discuss on their ML soon), but
>> it also means that no use cases for a real DERIVE type have
>> emerged in collectd.
>>
>> All in all, I've changed my mind 3 times while writing this email, so
>> I will probably take a break and think more :) But here is a basic
>> roundup:
>>
>> Pros:
>> 1. Seems natural, complementary type to COUNTER
>> 2. Usual type for RRD users
>> 3. Some imaginary use cases are there, the real ones may appear in the
>> future
>> 4. A type that may be hard to add in the future (all implementations
>> would have to be updated)
>>
>> Cons:
>> 1. Additional type (maintenance burden)
>> 2. All imaginary use-cases are covered by GAUGE and good GUI (which
>> shows change over period)
>> 3. No real use cases found *(anyone, please propose some!)*
>> 4. Resets of the value (if the underlying value is volatile) work badly
>> 5. Can be calculated by sender (same as COUNTER)
>>
>>
>> > DELTA: what is the difference between GAUGE and DELTA ?
>> >
>>
>> The difference is that the scale of a GAUGE doesn't depend on the
>> interval (e.g. CPU %). But the scale of a DELTA very much does: e.g. 10
>> messages per second means 600 messages per minute. In other words: 10
>> with an interval of 10 is not the same as 10 with an interval of 60
>> (compare with CPU %). So if a client for some reason chooses to change
>> the reporting interval of the value, the old values (e.g. stored as
>> messages per second) can still be used. Changing the interval is a very
>> frequent use case for a GUI: it may be more interesting to see messages
>> per hour instead of messages per second at hour intervals (the latter
>> may be nice too, but this type gives you a choice).
>>
>> Basically, a DELTA value is usually implemented as a COUNTER by
>> remembering the value at the start of the interval and sending the
>> difference (or alternatively by resetting the counter on read). The
>> statistics collection application then divides it by the seconds
>> elapsed to store a nice messages-per-second value.
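
For what it is worth, here is how I read that in Ruby. This is only a
sketch under my own assumptions: DeltaReporter and per_second are names I
made up, and the interval is taken to be in seconds as in the example line.

```ruby
# Hypothetical sender-side helper: remembers the counter value seen at the
# start of the interval and reports only the difference (a DELTA value).
class DeltaReporter
  def initialize(initial_count)
    @last = initial_count
  end

  # Returns the increase since the previous call and remembers the new value.
  def report(current_count)
    delta = current_count - @last
    @last = current_count
    delta
  end
end

# Collector side: normalise a DELTA to a per-second rate using the interval
# carried in the message, so values stay comparable if the interval changes.
def per_second(delta, interval_seconds)
  delta.to_f / interval_seconds
end

reporter = DeltaReporter.new(1_000)
d = reporter.report(1_100)   # 100 messages during the interval
puts per_second(d, 10)       # 10.0
puts per_second(600, 60)     # 10.0, same rate reported once a minute
```
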
>>
>> > Here is something to think about on types: say I have a probe
>> > sending the number of bytes sent by the network card to my central
>> > server. Now what I want to graph is:
>> > - the speed at which data is sent
>> > - the total number of bytes sent (say I want to check how much I
>> >   will pay at the end of the month)
>> >
>> > For this I would prefer having my client send one metric, which is
>> > the number of bytes sent, and then let the server store the data.
>> > One metric received could lead to storing two, three or maybe more
>> > RRD metrics (if used), but why should the client care about that?
>> >
>>
>> Sure, use the COUNTER type. Storing both values is the collection
>> application's problem. Note that storing the bytes-per-second rate
>> with double precision (as RRD does) actually gives you quite a precise
>> estimate of the traffic per month (and per day or week) without storing
>> values twice (and it's actually better in case of a counter reset or
>> wrap than the difference between the counter values at the start and
>> at the end of the month).
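
If I follow, the monthly total is then just the stored rates multiplied by
their intervals and summed up. A tiny sketch, where the [rate, interval]
sample layout is my own assumption for illustration and not an RRD API:

```ruby
# Each sample is [rate in bytes per second, interval in seconds].
# Hypothetical layout, chosen only to illustrate the arithmetic.
def total_bytes(samples)
  samples.sum { |rate, interval| rate * interval }
end

samples = [[1500.0, 60], [0.0, 60], [250.5, 60]]
puts total_bytes(samples)   # 105030.0
```
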
>>
>>
>> > I may be completely wrong on the goal of the types, but for me they
>> > should just define what the client is sending and not how the data
>> > will ultimately be stored on disk, if we want something flexible.
>> > So are the current types a definition of what is sent or an
>> > indication of how to store the data?
>> >
>>
>> Yes, it is the definition of what the client is sending. But the DELTA
>> type gives the semantics of the value, not how it is stored. It can be
>> stored in various ways:
>>
>> 1. Original values
>> 2. Data rates per time unit (note: any time unit)
>> 3. Growing counter
>>
>> But compare that to GAUGE: it's useless to store a GAUGE as a data rate
>> per second or as a counter.
>>
>> Similarly, if you are making a proxy application that aggregates
>> statistics over a longer period, the difference between the DELTA and
>> GAUGE types is a hint to sum the former's values and average the
>> latter's over a period.
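
So a proxy would do something like the following. Again only a sketch: the
aggregate function and the :delta / :gauge symbols are hypothetical, and
the plain average assumes all merged samples cover equal intervals.

```ruby
# Hypothetical aggregation step for a proxy that merges several short
# intervals into one longer one. Type names follow this thread.
def aggregate(type, values)
  raise ArgumentError, "no values" if values.empty?

  case type
  when :delta then values.sum                      # amounts per interval add up
  when :gauge then values.sum.to_f / values.size   # levels are averaged
  else raise ArgumentError, "unsupported type: #{type}"
  end
end

puts aggregate(:delta, [10, 12, 8])          # 30
puts aggregate(:gauge, [50.0, 70.0, 60.0])   # 60.0
```
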
>>
>> --
>> Paul