[zeromq-dev] Statistics protocol v0.1

Paul Colomiets paul at colomiets.name
Fri Jun 1 09:41:33 CEST 2012


Hi Marten,

On Fri, Jun 1, 2012 at 9:04 AM, Marten Feldtmann <itlists at schrievkrom.de> wrote:
> we are actually in the process of implementing a statistic system for
> our C#/Smalltalk application system using ZeroMQ and we have included
> the following information in our telegrams.
>

Thanks for sharing this. My questions about your use case are below. Would
you like to participate in developing the protocol? And how likely is it
that your system will make use of it?

> We used a UDP approach (subscribe method) and we had several statistic
> collectors within our network to save the information sent from all our
> 0MQ communication nodes.
>

So you use plain UDP instead of ZeroMQ pub/sub? (Or is ZeroMQ running
over UDP somewhere?)
Why don't you use pub/sub?

> We included the following information in our implementation:
>
> -> subscription filter

What does it consist of?

> -> location (computer) of the process sending this information
> (ip-number - no name)

Why do you use IP numbers instead of names? I suppose it's an internal
policy in your company or some such.
Or is it just debugging info, along with the node name?

> -> name of the process
> -> process id

Is the PID just debugging info, or is it meaningful? (Do you aggregate
info from several processes?)

> -> start-time of statistic interval (in ASCII to make it more readable
> in a format like: 01.06.2012T21:00:00.000+12 .. in that well known
> format) and including timezone information.

This is interesting from two points of view:

1. A textual date format may be nicer than a Unix timestamp, but
I'd prefer UTC-only timestamps. Since the value is not intended to be
shown to the user as is (except for debugging), it's easier to deal
with UTC timestamps than with diverse time zones.

2. We usually send the timestamp taken at the time of sending, not
the start of the interval. It's easier to produce, and it's more logical
when we send a counter instead of a rate value (see below).
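A minimal Python sketch of point 1, assuming the sender has a timestamp with a UTC offset like the one in your example (21:00 on 1 June 2012 at +12): normalize it to a UTC ISO 8601 string before sending. The helper name to_utc_iso and the trailing "Z" are illustrative assumptions, not part of any agreed protocol.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(ts: datetime) -> str:
    """Hypothetical helper: render an aware datetime as UTC ISO 8601 with milliseconds."""
    if ts.tzinfo is None:
        # A naive datetime has no offset, so it cannot be converted safely.
        raise ValueError("timestamp must carry timezone information")
    utc = ts.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


# 21:00 local time at UTC+12 is 09:00 UTC on the same day.
local = datetime(2012, 6, 1, 21, 0, 0, tzinfo=timezone(timedelta(hours=12)))
print(to_utc_iso(local))  # 2012-06-01T09:00:00.000Z
```

The receiver then never has to know the sender's time zone; any local presentation can be done at display time.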

> -> duration length of the statistic interval in milliseconds

This one I obviously missed. I'll add it shortly.

> -> symbolic name of the statistic producer (0MQ node)
> -> sub-symbolic name of the statistic producer (0MQ node)

So, judging from the text below, I think the symbolic name is the DNS
name of the node, and the sub-symbolic name is the name of a subsystem
inside the process. Am I right?

> -> number of bytes received/sent in that interval
> -> number of telegrams received/sent in that interval
>

1. The proposed protocol sends one value per message (that is barely
practical over UDP, so some batching should probably be implemented there).

2. We usually send a counter value instead of the number of bytes in the
interval. That is, you have a long integer counter of bytes that is only
ever incremented, and the rate is calculated by subtracting the previous
value from the current one (yes, counter wraparound is also accounted
for). This is how collectd usually works. Both approaches have pros and
cons; this should be discussed further.
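A rough Python sketch of the counter approach in point 2: the receiver keeps the previous sample and turns the difference into a per-second rate using the interval length. The function name counter_rate and the assumption that the counter wraps at most once between samples, at a fixed bit width, are illustrative choices, not collectd's exact rules.

```python
def counter_rate(prev: int, curr: int, interval_ms: int, width_bits: int = 64) -> float:
    """Per-second rate from two samples of an only-incremented counter.

    Assumes (hypothetically) that if curr < prev the counter wrapped exactly
    once at 2**width_bits between the two samples.
    """
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    delta = curr - prev
    if delta < 0:
        # Counter wrapped around: add back the full range of the counter.
        delta += 1 << width_bits
    return delta * 1000.0 / interval_ms


print(counter_rate(1000, 6000, 5000))                        # 1000.0
print(counter_rate(2**32 - 100, 400, 1000, width_bits=32))   # 500.0
```

One advantage of counters shows up here: if a sample is lost, the next one still yields a correct average over the longer interval, whereas a lost per-interval value is simply gone.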


All in all, it seems to fit the protocol nicely. However, the items have
to be sorted out to determine what is essential, what is inherently
debugging info (e.g. the IP address and the node name duplicate each
other), and what can be put in extension data because it's not
universally understood.
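To make that split concrete, here is a hypothetical Python sketch of one statistics message. All field names, the dotted topic, and the JSON body are assumptions for illustration only, not the proposed protocol's actual layout. Essential fields are top-level; IP address and PID go into extensions. ZeroMQ SUB sockets match subscriptions against the start of a message, which is why the node and subsystem names are put first in the topic frame.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class StatSample:
    """Hypothetical statistics message; every field name here is illustrative."""

    # Essential fields: needed by any consumer to interpret the value.
    node: str          # symbolic name of the producer (e.g. its DNS name)
    subsystem: str     # sub-symbolic name: a subsystem inside the process
    metric: str        # e.g. "bytes_sent" or "messages_received"
    value: int         # only-incremented counter value
    timestamp: str     # UTC timestamp taken when the sample is sent
    duration_ms: int   # length of the statistics interval in milliseconds
    # Site-specific or debugging data (IP address, PID, ...).
    extensions: Dict[str, str] = field(default_factory=dict)

    def to_frames(self) -> List[bytes]:
        """Encode as [topic, body]; the topic prefix allows subscription filtering."""
        topic = f"{self.node}.{self.subsystem}.{self.metric}".encode()
        body = json.dumps(asdict(self), sort_keys=True).encode()
        return [topic, body]


sample = StatSample(
    node="app1.example.com",
    subsystem="router",
    metric="bytes_sent",
    value=123456,
    timestamp="2012-06-01T09:00:00.000Z",
    duration_ms=1000,
    extensions={"ip": "192.0.2.10", "pid": "4242"},
)
topic, body = sample.to_frames()
print(topic)  # b'app1.example.com.router.bytes_sent'
```

A consumer that does not understand a given extension key can simply ignore it, while the essential fields stay the same for everyone.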

-- 
Paul
