As a bit of food for thought: I have had some interesting success decoupling a 'UI' instance of Observium from the instance that does the polling etc. (I will call that the 'main' instance).
I installed a second deployment of Observium that uses the same SQL instance as the 'main' instance.
The issue then is access to the RRDtool data.
I compiled a 'new' version of rrdtool from the development 'trunk' on both the 'main' and 'UI' boxes. The current trunk supports the 'FETCH' command in rrdcached (usually rrdcached is only used to cache writes and reduce IO load).
I ran an rrdcached instance on the 'main' box, and configured $config['rrdcached'] to point to the IP/port of the main box's rrdcached process.
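For illustration, the config entry might look roughly like this. The IP address is a placeholder, the rrd_dir path is an assumed install layout, and 42217 is rrdcached's default TCP port; adjust all of them to your setup.

```php
<?php
// Hypothetical address of the 'main' box running rrdcached.
// 42217 is the default rrdcached TCP port; yours may differ.
$config['rrdcached'] = "192.0.2.10:42217";

// Assumed local RRD directory; file names under it are what the
// daemon on the 'main' box needs to be able to resolve.
$config['rrd_dir'] = "/opt/observium/rrd";
```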
As rrdcached expects names 'relative' to the path it is running in, I hacked rrdtool.inc.php (see below for an example; do not apply this patch).
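A minimal sketch of the path rewriting, pulled out into a standalone helper. The function name rrd_strip_dir() and the sample DEF string are made up for illustration and are not part of Observium. Note that the substitution has to happen before the final command string is assembled, otherwise it has no effect.

```php
<?php
// Hypothetical helper: strip the local RRD directory prefix so that
// file names become relative, as the remote rrdcached expects.
function rrd_strip_dir($options, $rrd_dir)
{
    // Normalise to exactly one trailing slash before replacing.
    $prefix = rtrim($rrd_dir, '/') . '/';
    return str_replace($prefix, '', $options);
}

// Made-up graph option referencing a local RRD file.
$options = "DEF:in=/opt/observium/rrd/host1/port-1.rrd:INOCTETS:AVERAGE";
echo rrd_strip_dir($options, "/opt/observium/rrd"), "\n";
// prints: DEF:in=host1/port-1.rrd:INOCTETS:AVERAGE
```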
There were various issues with graph syntax that cropped up due to changes in the syntax parsing in the 'trunk' version of RRDtool, but they were pretty easy to work around with some minor formatting changes (I'm sure I haven't found them all).
The other issue is that for various composite graphs 'is_file( $rrd_filename )' is called. I hacked around this by just dumping all the RRD files onto the local box via rsync (nasty). I assume this could be worked around with an rrd-file-exists() type function that could say 'yes' for 'remote' RRD files (I'm not sure yet how to determine that via the rrdcached protocol, if it is possible at all; perhaps there are other possible solutions).
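One very rough way such a function could look is sketched here. The name rrd_file_exists() and its signature are hypothetical, and it simply assumes the file exists whenever rrdcached is configured, rather than asking the daemon.

```php
<?php
// Hypothetical replacement for the bare is_file() checks.
function rrd_file_exists($filename, $config)
{
    if (!empty($config['rrdcached'])) {
        // Assumption: the remote daemon has the file. A missing file
        // would then only surface as an rrdtool error at graph time.
        return true;
    }
    // No rrdcached configured: fall back to the local filesystem.
    return is_file($filename);
}
```

The trade-off is that graphs for RRD files that really are missing would fail at draw time instead of being skipped up front.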
When rrdtool graph now draws a graph it hits rrdcached on the main box, gets all the metrics, and presto.
So far I notice that:
- Misc graphs fail (see above), usually because some syntax makes the 'new' parser sad.
- Speed: there are noticeable delays in drawing some of the graphs as you whip around the UI. It's not 'glacial', and it is potentially not the end of the world, but I'd like to understand where these delays come from with decoupled rrdtool. I have a feeling that rrdcached is potentially not dealing well with a pile of concurrent requests; there may be some ways to make this better and/or work around it.
Why oh why?
Currently in my 'day job' we run lots of different systems for different networks doing metrics gathering and display. Most of them are 'topped out' despite large amounts of time spent over the years optimising them, putting in 'faster' stuff etc. I'd like a metrics backend that can eat whatever we need, and eat more as we add more commodity hardware.
If my metrics backend can 'talk' the rrdcached protocol, and systems like Observium can use this, then I feel like this might be a 'good thing'. Need more polling? Just split the pollers. Need more UI goodness? Just add more UI nodes behind an HTTP load balancer.
Some of the new metrics backends have some interesting stuff; specifically I'm looking at the 'not-released-yet' OpenTSDB. I'm pretty sure it is not the 'right' thing for simple installations, but for larger platforms it looks appealing.
Thoughts?
Cheers,
Peter
Various hacks
pjchilds@uitest:/opt/observium$ svn diff includes/rrdtool.inc.php
Index: includes/rrdtool.inc.php
===================================================================
--- includes/rrdtool.inc.php (revision 4254)
+++ includes/rrdtool.inc.php (working copy)
@@ -107,6 +107,7 @@
   if ($config['rrdcached'])
   {
+    $options = str_replace( $config['rrd_dir']."/", '', $options );
     fwrite($rrd_pipes[0], "graph --daemon " . $config['rrdcached'] . " $graph_file $options");
   } else {
     fwrite($rrd_pipes[0], "graph $graph_file $options");
@@ -158,6 +159,7 @@
   $cmd = "$command $filename $options";
   if ($command != "create" && $config['rrdcached'])
   {
+    $options = str_replace( $config['rrd_dir']."/", '', $options );
     $cmd .= " --daemon " . $config['rrdcached'];
   }