Just thought I would share some of my progress with my work on scaling -- just in case anyone was interested.
(see http://postman.memetic.org/pipermail/observium/attachments/20130829/6208eb98/attachment-0001.png for a concept diagram)
I decided I needed to stop 'theory-ing' around and have a crack.
As such I created a fresh install, stuffed in my code, and bashed and bashed and bashed.
Create/Update
-------------

I overloaded rrdtool.inc.php so that after rrdtool_create it 'touch'es the filename on disk (solves that is_file issue for the moment on a single poller).
The CREATE gets sent to the otsdb-rrdcd (shim). It stores all the metadata in both redis and hbase, and creates all the metrics in OpenTSDB. No problems here.
Updates are sent to the otsdb-rrdcd, which uses the metadata to map them to metric updates and pushes them into OpenTSDB.
The only thing I noted here is that OpenTSDB does not support 'Unknown' or 'U' or NaN or NULL or any of that -- if you don't have a value for a metric, it simply doesn't write one.
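For illustration, here is a rough Python sketch (not the actual PHP shim code) of how an update could be mapped to OpenTSDB put commands while skipping unknown values; the metadata layout, metric prefix, and tag names are all assumptions:

```python
# Hypothetical sketch of otsdb-rrdcd's update path: translate an
# rrdtool-style update into OpenTSDB 'put' lines using metadata
# stored at CREATE time. All names here are invented for illustration.

def update_to_puts(filename, timestamp, values, metadata, prefix="observium"):
    """Turn 'rrdtool update <file> <ts>:<v1>:<v2>...' into put commands.

    metadata maps the rrd filename to its ordered DS names and tags,
    as stored in redis/hbase at CREATE time.
    """
    meta = metadata[filename]
    puts = []
    for ds_name, value in zip(meta["ds_names"], values):
        # OpenTSDB has no notion of 'U'/NaN/NULL: if a value is
        # unknown we simply don't write a data point for that metric.
        if value in ("U", "NaN", None):
            continue
        tags = " ".join(f"{k}={v}" for k, v in meta["tags"].items())
        puts.append(f"put {prefix}.{ds_name} {timestamp} {value} {tags}")
    return puts
```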
The polling times on all the devices I have tried are similar on both native rrd and otsdb-rrdcd (like identical -- I assume that on a loaded platform rrd could potentially be slower, as it would be disk-whacking and doing all those consolidations and writes all the time).
Fetch (aka Graph)
-----------------

A bit trickier.
The data is stored in 'raw' format in OpenTSDB, so when we pull out data of 'type' DERIVE or COUNTER we turn it into a rate (well, OpenTSDB does that if we ask nicely).
RRDtool is expecting that the data that is returned is at the highest resolution available for the period (start->end).
I use the RRA data stored during the 'CREATE' to work out the appropriate 'sample' size.
When I query OpenTSDB I ask it to 'downsample' to that sample size, then I create a 2D array and fill it -- my first attempt didn't do this, but giving back 5 minute samples for a 2 year period seems to slow things up a bit... (*cough*)
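As an illustration of the sample-size selection (a Python sketch; the way RRAs are stored as (step, rows) pairs is my assumption about the CREATE metadata), picking the finest RRA step whose retention still covers the requested period mirrors what rrdtool itself would return:

```python
# Illustrative sketch: choose a downsample interval from the RRA
# definitions captured at CREATE time. RRAs are assumed to be stored
# as (step_seconds, rows) pairs; the real shim's metadata may differ.

def pick_sample_interval(rras, start, end):
    """Return the finest RRA step whose retention covers start->end,
    mimicking rrdtool's 'best resolution for the period' behaviour."""
    period = end - start
    # Walk candidates from finest to coarsest resolution.
    for step, rows in sorted(rras):
        if step * rows >= period:
            return step
    # Period longer than any RRA covers: fall back to the coarsest step.
    return max(step for step, rows in rras)
```

The returned step can then go straight into the OpenTSDB downsample spec (e.g. `f"{step}s-avg"`), so a 2-year query comes back at daily resolution instead of 5-minute samples.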
Interesting things about RRDtool -- when it does a FETCH it doesn't ask for a single metric, it expects all the metrics in the RRD.
So that makes for a bit of additional data get/packing. It also appears that if you reference the same RRD for, say, INOCTETS and OUTOCTETS it does 2 fetch operations (even though it got all the DataSources in the first...).
So I cache any FETCH return for 'TTL=period' (limited to max 1800s).
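A minimal in-process stand-in for that cache might look like this (the real shim presumably keeps it in redis; the key shape and API here are invented):

```python
# Sketch of the FETCH result cache: TTL equals the requested period,
# capped at 1800s. Timestamps are passed explicitly so the behaviour
# is easy to reason about (a real version would use time.monotonic()).
import time

class FetchCache:
    MAX_TTL = 1800  # never cache longer than 30 minutes

    def __init__(self):
        self._store = {}  # key -> (expires_at, result)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._store.get(key)
        if entry is None or entry[0] <= now:
            return None  # missing or expired
        return entry[1]

    def put(self, key, result, period, now=None):
        now = time.monotonic() if now is None else now
        ttl = min(period, self.MAX_TTL)
        self._store[key] = (now + ttl, result)
```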
I needed to make some small modifications to a few graph definition files, as the 'trunk' version of rrdtool doesn't like some of the formatting (meh).
Ok -- so how does it work?
Simple and small single- or couple-of-item (ie mem usage) data stacks work pretty well. It is a bit slower than the native version in a side-by-side vmware comparison, but performance is acceptable, and I assume it could benefit from some tuning of my hbase, my code, my deployment, or some dedicated fast hardware (or a couple of UI hosts behind a ha-proxy/load-balancer).
The composite traffic graphs are a bit of a different beast.
A 3750G 24-port switch 'traffic aggregate' squiggle-line or overview graph pulls in 34 RRD files (the ports). It hits IN and OUT octets, and then it sums them.
On a 3750 with 34 ports, ___pre-cached___ example (ie it had a query response ready to go):
  Otsdb-rrdcd    = Runtime 0.21555590629578 secs
  RRD            = Runtime 0.141361951828 secs
  OTSDB-GRAPH/UI = 38144 points retrieved, 11654 points plotted in 245ms
  (Query /#start=2013/09/04-19:42:00&m=sum:5m-avg:rate:observium.INOCTETS{host=sw1.dev1,port=*}&o=&m=sum:5m-avg:rate:observium.OUTOCTETS{host=sw1.dev1,port=*}&o=axis x1y2&yrange=[0:]&key=out bottom center&wxh=1400x600)

On a 7600 with 88 ports with no pre-cache (ie we haven't accessed the data beforehand):
  Otsdb-rrdcd    = Runtime 30.283470869064 secs
  RRD            = Runtime 0.18092679977417 secs
  OTSDB-GRAPH/UI = 52272 points retrieved, 28838 points plotted in 670ms
  (Query http://.../#start=2013/09/04-19:42:00&m=sum:5m-avg:rate:observium.INOCTE... host=7600.devtest,port=*}&o=&m=sum:5m-avg:rate:observium.OUTOCTETS{host=7600.devtest,port=*}&o=axis%20x1y2&yrange=[0:]&key=out%20bottom%20center&wxh=1400x600)

Same 7600 if we are getting 'cached' results from otsdb-rrdcd:
  Otsdb-rrdcd    = Runtime 0.68489003181458 secs

A 2811 with 3 ports with no pre-cache:
  Otsdb-rrdcd    = Runtime 0.70869016647339 secs
  RRD            = Runtime 0.12454390525818 secs
  OTSDB-GRAPH/UI = 1740 points retrieved, 1110 points plotted in 117ms
  (Query http://.../#start=2013/09/04-19:42:00&m=sum:5m-avg:rate:observium.INOCTE... host=tr1.adl6,port=*}&o=&m=sum:5m-avg:rate:observium.OUTOCTETS{host=tr1.adl6,port=*}&o=axis%20x1y2&yrange=[0:]&key=out%20bottom%20center&wxh=1400x600)
As you can see, with multiple ports the 'stacked' squiggles can bog down.
If you watch what happens, RRDtool does a sequential set of 'FETCH' commands (twice per RRD, for IN and OUT). Each one needs to complete before the next is started (single thread).
I'm not actually as unhappy about this performance as I was initially (you should have seen it doing 5 minute windows over 1 year *cough*) -- there are only a couple of bits of the UI with large multi-port devices where it makes things 'feel' sucky.
Some things I'm thinking about....
Quick Hax Solution #1
---------------------

In rrdtool_graph, if more than 5 RRDs are referenced in the graph, then for all the RRDs beyond the first 5, push a (PRE)'FETCH ..filename... CF start end' command to otsdb-rrdcd, then fire off rrdtool.
So whilst the first 5 queries run sequentially, all the others are running in parallel, and once the first 5 are completed the rest should _probably_ be cached, and hence the overall graph will complete faster.
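Building the pre-fetch list is trivial -- a hypothetical Python sketch (the PREFETCH command syntax is invented, and the commands would be fired at otsdb-rrdcd asynchronously rather than collected like this):

```python
# Sketch of Hax #1: generate warm-up commands for every RRD beyond the
# first 'budget', which rrdtool will end up fetching serially anyway.
# The PREFETCH wire command is an invented placeholder.

def prefetch_commands(rrd_files, cf, start, end, budget=5):
    """Commands to warm the cache for all RRDs past the first 'budget'."""
    return [f"PREFETCH {f} {cf} {start} {end}" for f in rrd_files[budget:]]
```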
Quick Hax Solution #2
---------------------

In rrdtool_graph, if more than 5 RRDs are referenced in the graph, then talk to otsdb-rrdcd and say "run one query for all these metrics for this time period and pack the result sets into single-metric cached objects". When that completes, run 'rrdtool graph' (all the data is pre-cached).
I sort of like this better than #1, as in theory (for the 7600 example) if you take the ~700ms of 'graph fetch, munge, and store in cache' time for the combined dataset (all ports), plus the ~700ms of rrdtool time (for pre-cached data), it looks like about 1.4 seconds -- I assume a reasonable hbase deployment might see that drop a little, but even if not it beats the pants off 30 seconds :)
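The 'pack into single-metric cached objects' step could look roughly like this (a sketch assuming an OpenTSDB-style JSON response with one series per metric/tag combination; the cache key shape is invented):

```python
# Sketch of Hax #2's unpacking step: one combined all-ports query comes
# back as a list of series; split it into per-metric cache entries so
# each of rrdtool's subsequent FETCHes is a cache hit.

def pack_results(series, cache):
    """Store each (metric, host, port) series separately; return count."""
    for s in series:
        key = (s["metric"], s["tags"]["host"], s["tags"]["port"])
        cache[key] = s["dps"]  # dps: timestamp -> value, as OpenTSDB returns
    return len(series)
```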
Quick Hax Solution #3
---------------------

If the runtime of rrdtool_graph exceeds 5 seconds, just kill it and draw an 'Under Construction' graph, or just don't draw graphs hitting more than 5/10 port .rrd files.
Probably Better Solution #1
---------------------------

Have some AJAX graph foo (#1) pulling the metric(s) and drawing the UI graphs.
There are obviously some challenges here, since the graph definitions are rrdtool-specific, but it is not beyond possible (even if just starting with multi-stacked traffic).
There are some probable advantages as well, such as:
. Only getting the metrics you ask for (FETCH makes you get all the DataSources in the RRD)
. Being able to determine the 'downsample' size based on the target size of the graph -- or not downsample at all for, say, 24h graphs (ie compute the resolution of 1px in 'time' and use that to lower AJAX data transfer)
. Having some 'interactive' foo (such as point identification ... ie at what time was that spike and what was its real value (#2))
. Possibly having some other 'interactive' foo (like pan left, zoom in, zoom out, etc...)
#1) http://code.shutterstock.com/rickshaw/ #2) http://code.shutterstock.com/rickshaw/examples/lines.html
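The 'resolution of 1px in time' idea above can be sketched like so (the 300s polling interval is an assumption; the downsample spec format matches OpenTSDB's `<interval>s-avg` style):

```python
# Sketch: derive an OpenTSDB downsample interval from the pixel width
# of the target graph, so the browser never receives more points than
# it can actually draw.

def downsample_interval(start, end, width_px, poll_interval=300):
    """Return a downsample spec like '45051s-avg', or None when one
    pixel already covers less than one polling interval (ie just fetch
    the raw samples -- e.g. a 24h window on a 1400px-wide graph)."""
    seconds_per_px = (end - start) / width_px
    if seconds_per_px <= poll_interval:
        return None
    return f"{int(seconds_per_px)}s-avg"
```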
Anyway just thought I would check in.
Every day I stumble on something new in the UI and I smile -- nice work guys.
Cheers, Peter