Just thought I would share some of my progress with my work on scaling -- just in case anyone was interested.
(see http://postman.memetic.org/pipermail/observium/attachments/20130829/6208eb98/attachment-0001.png for a concept diagram)
I decided I needed to stop 'theory-ing' around and have a crack.
As such I created a fresh install, stuffed in my code, and bashed and bashed and bashed.
Create/Update
-------------

I overloaded rrdtool.inc.php so that after rrdtool_create it 'touch'es the filename on disk (solves that is_file issue for the moment on a single poller).
The CREATE gets sent to the otsdb-rrdcd (shim). It stores all the metadata in both redis and hbase, and creates all the metrics in OpenTSDB. No problems here.
Updates are sent to the otsdb-rrdcd, which uses the metadata to map them to metric updates and pushes them into OpenTSDB.
The only thing I noted here is that OpenTSDB does not support 'Unknown' or 'U' or NaN or NULL or any of that -- if you don't have a value for a metric, it simply doesn't write one.
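For illustration, here is a rough Python sketch (not the actual PHP shim code) of how an update could be mapped to OpenTSDB put commands while skipping unknown values; the metadata layout, metric prefix, and tag names are all assumptions:

```python
# Hypothetical sketch of otsdb-rrdcd's update path: translate an
# rrdtool-style update into OpenTSDB 'put' lines using metadata
# stored at CREATE time. All names here are invented for illustration.

def update_to_puts(filename, timestamp, values, metadata, prefix="observium"):
    """Turn 'rrdtool update <file> <ts>:<v1>:<v2>...' into put commands.

    metadata maps the rrd filename to its ordered DS names and tags,
    as stored in redis/hbase at CREATE time.
    """
    meta = metadata[filename]
    puts = []
    for ds_name, value in zip(meta["ds_names"], values):
        # OpenTSDB has no notion of 'U'/NaN/NULL: if a value is
        # unknown we simply don't write a data point for that metric.
        if value in ("U", "NaN", None):
            continue
        tags = " ".join(f"{k}={v}" for k, v in meta["tags"].items())
        puts.append(f"put {prefix}.{ds_name} {timestamp} {value} {tags}")
    return puts
```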
The polling times on all the devices I have tried are similar on both native rrd and otsdb-rrdcd (like identical -- I assume that on a loaded platform rrd could potentially be slower, as it would be disk-whacking and doing all those consolidations and writes all the time).
Fetch (aka Graph)
-----------------

A bit trickier.
The data is stored in 'raw' format in OpenTSDB, so when we pull out data of 'type' DERIVE or COUNTER we turn it into a rate (well, OpenTSDB does that if we ask nicely).
RRDtool is expecting that the data that is returned is at the highest resolution available for the period (start->end).
I use the RRA data stored during the 'CREATE' to work out the appropriate 'sample' size.
When I query OpenTSDB I ask it to 'downsample' to that sample size, then I create a 2D array and fill it -- my first attempt didn't do this, but giving back 5 minute samples for a 2 year period seems to slow things up a bit... (*cough*)
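As an illustration of the sample-size selection (a Python sketch; the way RRAs are stored as (step, rows) pairs is my assumption about the CREATE metadata), picking the finest RRA step whose retention still covers the requested period mirrors what rrdtool itself would return:

```python
# Illustrative sketch: choose a downsample interval from the RRA
# definitions captured at CREATE time. RRAs are assumed to be stored
# as (step_seconds, rows) pairs; the real shim's metadata may differ.

def pick_sample_interval(rras, start, end):
    """Return the finest RRA step whose retention covers start->end,
    mimicking rrdtool's 'best resolution for the period' behaviour."""
    period = end - start
    # Walk candidates from finest to coarsest resolution.
    for step, rows in sorted(rras):
        if step * rows >= period:
            return step
    # Period longer than any RRA covers: fall back to the coarsest step.
    return max(step for step, rows in rras)
```

The returned step can then go straight into the OpenTSDB downsample spec (e.g. `f"{step}s-avg"`), so a 2-year query comes back at daily resolution instead of 5-minute samples.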
Interesting things about RRDtool -- when it does a FETCH it doesn't ask for a single metric, it expects all the metrics in the RRD.
So that makes for a bit of additional data get/packing. It also appears that if you reference the same RRD for, say, INOCTETS and OUTOCTETS it does 2 fetch operations (even though it got all the DataSources in the first...).
So I cache any FETCH return for 'TTL=period' (limited to max 1800s).
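A minimal in-process stand-in for that cache might look like this (the real shim presumably keeps it in redis; the key shape and API here are invented):

```python
# Sketch of the FETCH result cache: TTL equals the requested period,
# capped at 1800s. Timestamps are passed explicitly so the behaviour
# is easy to reason about (a real version would use time.monotonic()).
import time

class FetchCache:
    MAX_TTL = 1800  # never cache longer than 30 minutes

    def __init__(self):
        self._store = {}  # key -> (expires_at, result)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._store.get(key)
        if entry is None or entry[0] <= now:
            return None  # missing or expired
        return entry[1]

    def put(self, key, result, period, now=None):
        now = time.monotonic() if now is None else now
        ttl = min(period, self.MAX_TTL)
        self._store[key] = (now + ttl, result)
```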
I needed to make some small modifications to a few graph definition files, as the 'trunk' version of rrdtool doesn't like some of the formatting (meh).
Ok -- so how does it work?
Simple and small single- or couple-of-item (ie mem usage) data stacks work pretty well. It is a bit slower than the native version in a side-by-side vmware comparison, but performance is acceptable, and I assume it could benefit from some tuning of my hbase, my code, my deployment, or some dedicated fast hardware (or a couple of UI hosts behind a ha-proxy/load-balancer).
The composite traffic graphs are a bit of a different beast.
A 3750G 24-port switch 'traffic aggregate' squiggle-line or overview graph pulls in 34 RRD files (the ports). It hits IN and OUT octets, and then it sums them.
On a 3750 with 34 ports, ___pre-cached___ example (ie it had a query response ready to go):
  Otsdb-rrdcd    = Runtime 0.21555590629578 secs
  RRD            = Runtime 0.141361951828 secs
  OTSDB-GRAPH/UI = 38144 points retrieved, 11654 points plotted in 245ms
  (Query /#start=2013/09/04-19:42:00&m=sum:5m-avg:rate:observium.INOCTETS{host=sw1.dev1,port=*}&o=&m=sum:5m-avg:rate:observium.OUTOCTETS{host=sw1.dev1,port=*}&o=axis x1y2&yrange=[0:]&key=out bottom center&wxh=1400x600)

On a 7600 with 88 ports with no pre-cache (ie we haven't accessed the data beforehand):
  Otsdb-rrdcd    = Runtime 30.283470869064 secs
  RRD            = Runtime 0.18092679977417 secs
  OTSDB-GRAPH/UI = 52272 points retrieved, 28838 points plotted in 670ms
  (Query http://.../#start=2013/09/04-19:42:00&m=sum:5m-avg:rate:observium.INOCTE... host=7600.devtest,port=*}&o=&m=sum:5m-avg:rate:observium.OUTOCTETS{host=7600.devtest,port=*}&o=axis%20x1y2&yrange=[0:]&key=out%20bottom%20center&wxh=1400x600)

Same 7600 if we are getting 'cached' results from otsdb-rrdcd:
  Otsdb-rrdcd    = Runtime 0.68489003181458 secs

A 2811 with 3 ports with no pre-cache:
  Otsdb-rrdcd    = Runtime 0.70869016647339 secs
  RRD            = Runtime 0.12454390525818 secs
  OTSDB-GRAPH/UI = 1740 points retrieved, 1110 points plotted in 117ms
  (Query http://.../#start=2013/09/04-19:42:00&m=sum:5m-avg:rate:observium.INOCTE... host=tr1.adl6,port=*}&o=&m=sum:5m-avg:rate:observium.OUTOCTETS{host=tr1.adl6,port=*}&o=axis%20x1y2&yrange=[0:]&key=out%20bottom%20center&wxh=1400x600)
As you can see, with multiple ports the 'stacked' squiggles can bog down.
If you watch what happens, RRDtool does a sequential set of 'FETCH' commands (twice per RRD, for IN and OUT). Each one needs to complete before the next is started (single thread).
I'm not actually as unhappy about this performance as I was initially (you should have seen it doing 5 minute windows over 1 year *cough*) -- there are only a couple of bits of the UI with large multi-port devices where it makes things 'feel' sucky.
Some things I'm thinking about....
Quick Hax Solution #1
---------------------

In rrdtool_graph, if more than 5 RRDs are referenced in the graph, then for all the RRDs beyond the first 5, push a (PRE)'FETCH ..filename... CF start end' command to otsdb-rrdcd, then fire off rrdtool.
So whilst the first 5 queries run sequentially, all the others are running in parallel, and once the first 5 are completed the rest should _probably_ be cached, and hence the overall graph will complete faster.
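Building the pre-fetch list is trivial -- a hypothetical Python sketch (the PREFETCH command syntax is invented, and the commands would be fired at otsdb-rrdcd asynchronously rather than collected like this):

```python
# Sketch of Hax #1: generate warm-up commands for every RRD beyond the
# first 'budget', which rrdtool will end up fetching serially anyway.
# The PREFETCH wire command is an invented placeholder.

def prefetch_commands(rrd_files, cf, start, end, budget=5):
    """Commands to warm the cache for all RRDs past the first 'budget'."""
    return [f"PREFETCH {f} {cf} {start} {end}" for f in rrd_files[budget:]]
```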
Quick Hax Solution #2
---------------------

In rrdtool_graph, if more than 5 RRDs are referenced in the graph, then talk to otsdb-rrdcd and say "run one query for all these metrics for this time period and pack the result sets into single-metric cached objects". When that completes, run 'rrdtool graph' (all the data is pre-cached).
I sort of like this better than #1, as in theory (for the 7600 example) if you take the ~700ms of 'graph fetch, munge, and store in cache' time for the combined dataset (all ports), plus the ~700ms of rrdtool time (for pre-cached data), it looks like about 1.4 seconds -- I assume a reasonable hbase deployment might see that drop a little, but even if not it beats the pants off 30 seconds :)
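The 'pack into single-metric cached objects' step could look roughly like this (a sketch assuming an OpenTSDB-style JSON response with one series per metric/tag combination; the cache key shape is invented):

```python
# Sketch of Hax #2's unpacking step: one combined all-ports query comes
# back as a list of series; split it into per-metric cache entries so
# each of rrdtool's subsequent FETCHes is a cache hit.

def pack_results(series, cache):
    """Store each (metric, host, port) series separately; return count."""
    for s in series:
        key = (s["metric"], s["tags"]["host"], s["tags"]["port"])
        cache[key] = s["dps"]  # dps: timestamp -> value, as OpenTSDB returns
    return len(series)
```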
Quick Hax Solution #3
---------------------

If the runtime of rrdtool_graph exceeds 5 seconds, just kill it and draw an 'Under Construction' graph, or just don't draw graphs hitting more than 5/10 port .rrd files.
Probably Better Solution #1
---------------------------

Have some AJAX graph foo (#1) pulling the metric(s) and drawing the UI graphs.
There are obviously some challenges here, since the graph definitions are rrdtool-specific, but it is not beyond possible (even if just starting with multi-stacked traffic).
There are some probable advantages as well, such as:
. Only getting the metrics you ask for (FETCH makes you get all the DataSources in the RRD)
. Being able to determine the 'downsample' size based on the target size of the graph -- or not downsample at all for, say, 24h graphs (ie compute the resolution of 1px in 'time' and use that to lower AJAX data transfer)
. Having some 'interactive' foo (such as point identification ... ie at what time was that spike and what was its real value (#2))
. Possibly having some other 'interactive' foo (like pan left, zoom in, zoom out, etc...)
#1) http://code.shutterstock.com/rickshaw/ #2) http://code.shutterstock.com/rickshaw/examples/lines.html
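The 'resolution of 1px in time' idea above can be sketched like so (the 300s polling interval is an assumption; the downsample spec format matches OpenTSDB's `<interval>s-avg` style):

```python
# Sketch: derive an OpenTSDB downsample interval from the pixel width
# of the target graph, so the browser never receives more points than
# it can actually draw.

def downsample_interval(start, end, width_px, poll_interval=300):
    """Return a downsample spec like '45051s-avg', or None when one
    pixel already covers less than one polling interval (ie just fetch
    the raw samples -- e.g. a 24h window on a 1400px-wide graph)."""
    seconds_per_px = (end - start) / width_px
    if seconds_per_px <= poll_interval:
        return None
    return f"{int(seconds_per_px)}s-avg"
```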
Anyway just thought I would check in.
Every day I stumble on something new in the UI and I smile -- nice work guys.
Cheers, Peter