[zeromq-dev] Statistics reporting using zeromq/crossroads

Paul Colomiets paul at colomiets.name
Tue May 29 21:50:43 CEST 2012

Hi Schmurfy,

On Tue, May 29, 2012 at 3:35 PM, Schmurfy <schmurfy at gmail.com> wrote:
> What I am trying to do is to have a common infrastructure for application
> (my applications) and system (disk io, memory used, disk used, ...)
> statistics. I showed what the collectd protocol could do, but I am not
> entirely satisfied with it either. Here are my problems with collectd:
> - I want to have statistics available as soon as possible: collectd has a
> cycle time (default is 10s) and plugins have no way to know when the cycle
> starts or ends; they are notified when data is received or ready to be sent
> but that's all. One problem I had with my zeromq attempt (my original branch
> is on github: https://github.com/schmurfy/collectd/tree/zeromq but based on
> collectd 4.x) is that you cannot send one "frame" per cycle which is what I
> wanted to do with it (the original network plugin fills a buffer and when
> the buffer reaches a threshold it sends the packet, so your values can be sent
> now or at now+cycle time, which means 10s later with a 10s cycle).

Well, I have no problem with cycle time. We actually don't even flush
the statistics while viewing (which puts latency at about 15
minutes). However, we use collectd only for statistics and use a
separate monitoring system (nagios). And actually, all the
monitoring/statistics systems I've seen have only bigger delays, not
smaller ones.

Making more tools support libxs, and using SURVEY sockets to get
fresh data, may fix the problem.

> - I never used encryption, I consider it can be better handled at lower
> level and bonus point is that you don't cripple your code with it


> - I don't like the need to predefine the "types" used, it is a pain when
> using multiple collectd servers since they need to all have the same
> configuration file to understand each other; I prefer a more open way.

Me too.

> That
> said you don't need to know the types definitions to actually parse the data
> stream; the number of parts is in the packet itself, but you need the
> definitions to match each cell with its label.

Yes. The most obvious way to implement that is to use
one packet per value.
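For illustration, here is a minimal Python sketch (with a purely hypothetical message layout, not anything specified in this thread) of what one message per value might look like:

```python
# Hypothetical sketch: encoding one statistics message per value,
# with a textual name as the first field so that zeromq PUB/SUB
# prefix subscriptions can match on it. The 'name timestamp value'
# layout is illustrative only.
import time

def encode_value(name, value, timestamp=None):
    """Build a single plain-text message: 'name timestamp value'."""
    ts = timestamp if timestamp is not None else int(time.time())
    return "%s %d %r" % (name, ts, value)

msg = encode_value("myhost.cpu.user", 42.5, timestamp=1338320000)
print(msg)  # -> myhost.cpu.user 1338320000 42.5
```

With one value per message, a subscriber never has to know type definitions in advance to split a packet into cells.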

> - I am not really fond of the plugin/plugin_instance/type/type_instance,
> most of the time I don't remember which one is supposed to be what and many
> existing collectd plugins do not use all fields so it is more annoying than
> anything else.

Me too. But I'm not sure what format is best here. For simple values it's:


Then there are plugins:


Then there is a namespace inside a plugin (plugin instances in terms
of collectd):


Then there are complex values:


And there are also host-pair values:


It seems that collectd does things mostly right, except that it puts the
type into the name of the value. This is one of the things I want to fix.
Also, pings usually should be viewed the other way around (not by the host
which collects the values, but by the host which is pinged), but probably
this case should be fixed in the GUI.
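As a purely hypothetical illustration of the naming cases above (simple values, plugin values, plugin instances, complex values, host-pair values) — none of these names come from collectd — a flattened dotted hierarchy might look like:

```python
# Purely hypothetical naming sketch: flatten host, plugin, plugin
# instance, value component, and peer host into one dotted name.
# This only illustrates one possible convention, not a specification.
def metric_name(host, plugin=None, instance=None, value=None, peer=None):
    parts = [host]
    if peer is not None:          # host-pair values, e.g. ping targets
        parts.append(peer)
    if plugin is not None:
        parts.append(plugin)
    if instance is not None:      # namespace inside a plugin
        parts.append(instance)
    if value is not None:         # component of a complex value
        parts.append(value)
    return ".".join(parts)

print(metric_name("web1", "memory", value="used"))   # web1.memory.used
print(metric_name("web1", "ping", peer="web2"))      # web1.web2.ping
```

Note how the host-pair case puts the pinged host right after the collecting host, so grouping by target stays a simple prefix operation.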

Do you have other point of view on naming?

> I don't really like text protocols, sure they may seem easier to parse (and
> debug since a human can read them) but for statistics which could be sent at
> a high rate you waste a lot of space (binary parsers are not that hard to
> write). For reference here is my ruby
> parser: https://github.com/schmurfy/rrd-grapher/blob/master/lib/rrd-grapher/notifier/parsers/ruby_parser.rb

I can't think of a system where the rate of the data is really so big
that statistics become slow, or bandwidth is wasted. Statistics are sent
at regular intervals (10s is a pretty big value), so it's easy to
calculate how much bandwidth you need. I'm not strongly against a binary
protocol, but to be able to use subscriptions with zeromq you must make
the name of the value textual (at least without a length prefix). There
are also things I dislike about the collectd protocol:

1. It consists of unordered data parts. It's unclear whether the order
matters (AFAICS, yes) and whether other fields are preserved after the
"value" field (AFAICS, yes). It's also unclear how to deal with
under-specified data (when not all fields are present).

2. It allows several packets to be concatenated, which is bad for
subscriptions, and also bad when concatenating under-specified packets.

All in all, it's easy to write a wrong parser for the packets. And this
point is crucial for wide adoption of the protocol.
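To illustrate why the name should be textual and come first: zeromq PUB/SUB subscriptions match a raw byte prefix of each message. A minimal simulation (no zeromq dependency; the message format here is a hypothetical 'name timestamp value' line):

```python
# Sketch of zeromq-style subscription filtering: a subscription is
# just a byte-prefix match on the message. A length-prefixed binary
# name would break this, because the length byte lands in front of
# the text a subscriber wants to match.
def matches(subscription, message):
    """Emulate zeromq PUB/SUB: plain prefix match on message bytes."""
    return message.startswith(subscription)

messages = [
    b"web1.cpu.user 1338320000 42.5",
    b"web1.memory.used 1338320000 1024",
    b"db1.cpu.user 1338320000 7.0",
]
# Subscribe to all cpu statistics from web1:
selected = [m for m in messages if matches(b"web1.cpu.", m)]
print(selected)  # only the first message survives the filter
```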

> I am currently doing some experiments using a hash like structure serialized
> with messagepack and udp as transport; since messagepack supports a lot of
> languages, it virtually means any language could serialize/deserialize the
> packets very easily, but I have not much to show currently since I am in the
> early phases of the project.

Messagepack is great. But it's not very easy to parse in C (without a
library), and it doesn't play well with subscriptions.

So unless you have strong objections, I'd rather try plain-text protocol.
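A sketch of how such a plain-text protocol might be parsed (the 'name timestamp value' layout is an assumption for illustration, not anything agreed in this thread):

```python
# Hedged sketch of a plain-text statistics parser: one message is a
# single 'name timestamp value' line. A parser this small is hard to
# get wrong, which is the point being argued above.
def parse_value(message):
    """Split a message into (name, timestamp, value)."""
    name, ts, value = message.split(" ", 2)
    return name, int(ts), float(value)

name, ts, value = parse_value("web1.cpu.user 1338320000 42.5")
print(name, ts, value)  # web1.cpu.user 1338320000 42.5
```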

> I am currently using https://github.com/schmurfy/rrd-grapher to graph the
> data read directly from the rrd files (I have a rrdcached server to help
> keep my I/O from destroying the machine xD), unlike many tools out there the
> server does not generate any images but sends the data to the clients, which
> then create the graph.
> I tested some of the existing solutions but I don't like the idea of
> generating images server side, today's browsers are more than capable of
> rendering one or more graphs, and it allows shifting some work to the client
> which is a good thing for me since they don't do much ;)

We are at the same point here. I'm using https://github.com/tailhook/jarred
which is very similar (it uses flot too), but the rrd -> json conversion is
written in C.

> The good thing with collectd is the large number of existing plugins to
> collect nearly any statistic you could desire from your running servers, I
> even made an attempt to interface a ruby daemon directly with the plugins
> which was not a real success ^^
> (the plugins exports functions but also use exported functions from collectd
> itself so interfacing with them is a bit tricky)

This is both a strength and a weakness of collectd. The weakness comes
from the fact that it doesn't encourage software to implement the collectd
protocol, but instead encourages writing a plugin to read the data.

The goal of the project is to make most of the zeromq/libxs-centric
software use a single protocol, and then expand its dominance to
other areas. Pretty ambitious goal, right? :) To achieve it I volunteer
to implement the protocol at least in my own products, in mongrel2
(if it is accepted by the project owners) and in a collectd plugin.
Implementing compatible statistics between zerogw, mongrel2 and
nginx (rio?) is a midterm goal.

