Attached is my 'concept' diagram. Current rrdtool versions can send 'CREATE' and 'UPDATE' messages via rrdcached (or, in this case, a replacement) .. 'trunk' versions have 'FETCH' enabled as well.
Currently the 'concept' code (which is pretty terrible) can do a simple rrdtool create, update, and fetch, and never touches local disk.
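For anyone curious what the replacement actually has to speak: the rrdcached wire protocol is just line-based text, one command per line, with a status line back ("<status> <message>", where a positive status is the number of payload lines that follow and negative is an error). A rough Python sketch of the client side, heavily simplified; the hostname is invented and the exact CREATE/FETCH argument syntax should be checked against the rrdcached docs for your version:

    import socket

    def rrdcached_command(sock_file, command):
        # Send one command line, read the status line back.
        # status >= 0: that many payload lines follow; < 0: error.
        sock_file.write(command + "\n")
        sock_file.flush()
        status, _, message = sock_file.readline().partition(" ")
        status = int(status)
        if status < 0:
            raise RuntimeError("daemon error: " + message.strip())
        return [sock_file.readline().rstrip("\n") for _ in range(status)]

    # Hypothetical replacement daemon on the default rrdcached TCP port.
    conn = socket.create_connection(("metrics-backend.example.com", 42217))
    f = conn.makefile("rw")

    rrdcached_command(f, "CREATE test.rrd -s 300 "
                         "DS:octets:COUNTER:600:0:U RRA:AVERAGE:0.5:1:288")
    rrdcached_command(f, "UPDATE test.rrd N:12345")
    for line in rrdcached_command(f, "FETCH test.rrd AVERAGE"):
        print(line)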
I sort of assume that a single node of this would suck vs rrd storage tuned on disk. I need to create some 'real-world' scenarios and perform some appropriate benchmarking.
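Something along these lines is probably enough for a first raw-throughput number (completely unscientific; host, port, file names and timestamps are all placeholders, and a real test should mix in BATCH to cut round trips):

    import socket
    import time

    N = 100000
    conn = socket.create_connection(("metrics-backend.example.com", 42217))
    f = conn.makefile("rw")

    start = time.time()
    for i in range(N):
        # Spread updates over 1000 files with strictly increasing
        # timestamps so no file sees a duplicate-timestamp update.
        f.write("UPDATE bench-%d.rrd %d:%d\n" % (i % 1000, 1400000000 + i, i))
        f.flush()
        f.readline()  # consume the status line
    elapsed = time.time() - start
    print("%d updates in %.1fs (%.0f/sec)" % (N, elapsed, N / elapsed))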
Here is an interesting talk from the guys at Box regarding their usage of OpenTSDB, with over 350,000 updates/sec on average, peaking at 1M/sec. I assume they size the metric cluster to suit.
http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/hbasecon-2013-opentsdb-at-box-video.html
Will have a bit more of a fiddle...
My 'target' is a service provider network with over 3000 active network elements (not counting the servers), which currently has multiple metric collection/display systems (think cacti, customised mrtg, cricket, collectd), none of which can handle all the elements, and most of which do not gather the comprehensive metrics set that observium currently does out-of-the-box.
Interesting.
I was thinking of trying to replace rrdtool with graphite/ceres, but I don't really want to give up the rrdtool graph generation stuff.
I know of a fair few installs which would benefit from a method of scaling the rrdtool storage throughput across multiple hosts.
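The obvious way to do that without touching rrdtool itself would be to shard on the rrd path in the poller and point each update at one of several rrdcached (or replacement) instances. A minimal sketch of the idea, hostnames invented; note that plain modulo hashing means adding or removing a host remaps most files, so something closer to consistent hashing would be wanted in practice:

    import hashlib

    # Hypothetical pool of rrdcached (or replacement) backends.
    HOSTS = ["rrd1.example.com", "rrd2.example.com", "rrd3.example.com"]

    def host_for(rrd_path):
        # Stable hash of the file path so a given rrd always lands
        # on the same backend; md5 just gives a uniform spread.
        digest = hashlib.md5(rrd_path.encode()).hexdigest()
        return HOSTS[int(digest, 16) % len(HOSTS)]

    print(host_for("port-counters/router1/Gi0-1.rrd"))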
adam.