On 29/08/13 6:37 PM, "Tom Laermans" tom.laermans@powersource.cx wrote:
On 08/29/2013 08:32 AM, Adam Armstrong wrote:
Attached is my 'concept' diagram. Current rrdtool versions can send 'CREATE' and 'UPDATE' messages via rrdcached (or, in this case, a replacement for it) .. 'trunk' versions also have 'FETCH' enabled.
Currently the 'concept' code (which is pretty terrible) can do a simple rrdtool create, update, and fetch, and never touches local disk.
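For anyone curious what that looks like on the wire, here is a minimal sketch of speaking the rrdcached text protocol from Python. It assumes a daemon (or a drop-in replacement) listening on TCP 42217 (the usual rrdcached port); the hostname, filename and DS/RRA definitions are made up, and CREATE/FETCH need the newer daemon versions mentioned above.

    # Minimal sketch: send CREATE/UPDATE/FETCH over the rrdcached text protocol.
    # Daemon hostname, rrd filename and DS/RRA definitions are made up.
    import socket

    sock = socket.create_connection(("metrics-daemon.example.net", 42217))
    reader = sock.makefile("r")

    def command(cmd):
        """Send one command line; return (status_code, message, extra_lines)."""
        sock.sendall((cmd + "\n").encode())
        code, _, message = reader.readline().rstrip("\n").partition(" ")
        code = int(code)
        # A positive status code means that many additional response lines follow.
        lines = [reader.readline().rstrip("\n") for _ in range(max(code, 0))]
        return code, message, lines

    print(command("CREATE port1.rrd -s 300 DS:traffic_in:COUNTER:600:0:U RRA:AVERAGE:0.5:1:2016"))
    print(command("UPDATE port1.rrd N:1234567"))
    print(command("FETCH port1.rrd AVERAGE"))
    sock.close()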
I sort of assume that a single node of this would suck vs rrd storage tuned on disk. I need to create some 'real-world' scenarios and perform some appropriate benchmarking.
Here is an interesting talk from the guys at Box about their usage of OpenTSDB, with over 350,000 updates/sec on average, peaking at 1M/sec. I assume they size the metric cluster to suit.
http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/hbasecon-2013-opentsdb-at-box-video.html
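For reference, pushing a datapoint into OpenTSDB is just a line of text over its telnet-style interface (default port 4242). A rough sketch, with the host, metric and tag names made up:

    # Rough sketch: one datapoint into OpenTSDB's line protocol.
    # Format: put <metric> <unix_timestamp> <value> <tag>=<value> ...
    import socket
    import time

    sock = socket.create_connection(("opentsdb.example.net", 4242))
    line = "put net.ifHCInOctets %d 1234567890 host=cr1.syd iface=xe-0/0/0\n" % int(time.time())
    sock.sendall(line.encode())
    sock.close()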
Will have a bit more of a fiddle...
My 'target' is a service provider network with over 3000 active network elements (not counting the servers), which currently has multiple metric collection/display systems (think cacti, customised mrtg, cricket, collectd), none of which can handle all the elements, and most of which do not gather the comprehensive metric set that observium currently does out-of-the-box.
Interesting.
I was thinking of trying to replace rrdtool with graphite/ceres, but I don't really want to give up the rrdtool graph generation stuff.
I know of a fair few installs which would benefit from a method of scaling the rrdtool storage throughput across multiple hosts.
Simply sharding the rrd directory over multiple NFS servers is not a solution?
(ie rrd/u/ubuntunoob.observium.org..., rrd/s/shisco1.observium.org, etc)
Tom
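Just to make that layout concrete, here is a rough sketch of what the sharding would look like; the mount points and the split point ('a'-'m' vs 'n'-'z') are made up for illustration.

    # Rough sketch: map each device's rrd directory onto one of several NFS
    # mounts, keyed by the first letter of its hostname. Mount points are
    # hypothetical; the split point is arbitrary.
    import os

    NFS_MOUNTS = ["/mnt/nfs-rrd-a-m", "/mnt/nfs-rrd-n-z"]

    def rrd_dir(hostname):
        first = hostname[0].lower()
        mount = NFS_MOUNTS[0] if first <= "m" else NFS_MOUNTS[1]
        return os.path.join(mount, "rrd", first, hostname)

    print(rrd_dir("ubuntunoob.observium.org"))  # /mnt/nfs-rrd-n-z/rrd/u/ubuntunoob.observium.org
    print(rrd_dir("shisco1.observium.org"))     # /mnt/nfs-rrd-n-z/rrd/s/shisco1.observium.org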
Two interesting points.
Graphite (the web UI) and carbon/ceres/whisper (the engine/storage) have some of the same characteristics as rrd:
. You aggregate stats over time into buckets, and you define these retention characteristics up front (see the sketch after this list)
. It doesn't bring you any availability benefits .. it's processes running on boxes with storage, so availability is pushed to another layer (i.e. if you care, you need to HA those boxes/services and use reliable storage, box-level HA)
. It doesn't scale at its core .. you can shard the metrics over multiple instances, and the web UI can access multiple instances, but that's not really horizontal scaling, and for each of those instances you need to consider your HA again
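To illustrate the retention point: in both worlds you declare up front how long each resolution is kept. A minimal sketch, assuming the 'rrdtool' and 'whisper' Python modules are installed; the retention values below are made up.

    # rrd: 5-minute points kept for a week, 1-hour averages for ~2 months
    import rrdtool
    import whisper

    rrdtool.create("port1.rrd", "--step", "300",
                   "DS:traffic_in:COUNTER:600:0:U",
                   "RRA:AVERAGE:0.5:1:2016",
                   "RRA:AVERAGE:0.5:12:1488")

    # whisper: the same idea, expressed as (seconds_per_point, points_to_keep)
    whisper.create("port1.wsp", [(300, 2016), (3600, 1488)],
                   xFilesFactor=0.5, aggregationMethod="average")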
I had thought about a distributed, shared-nothing clustered filesystem approach (say Ceph), or you could use multiple NFS nodes .. though in my experience, when an NFS box dies, all the hosts mounting it shit themselves. Again, with an NFS-type setup you need to consider how you do the HA component of it: does each box need RAID, does each box need capacity management, does each box need an HA failover pair, shared NFS on a SAN, or something else...
I also believe that writing an RRD update is moderately IO intensive, as it updates the various aggregated metrics as it goes, so you are not just writing to one point of the file but to various places within it (I could be wrong on this) -- rrdcached is supposed to help with this by collecting multiple writes to the same file so that a single write occurs. If you are going to use rrdcached for update caching, you could probably have it (or a proxy in front of it) do the sharding across multiple rrdcached servers, and hence you could probably also use rrdcached for 'reads' and throw out the NFS bit entirely (rough sketch below).
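As a very rough sketch of what that proxy could look like: hash the rrd filename to pick a backend rrdcached instance, so UPDATEs and FETCHes for a given file always land on the same server. The hostnames are made up, and this ignores batching, persistent connections and error handling entirely.

    import socket
    import zlib

    BACKENDS = [("rrdcached1.example.net", 42217),
                ("rrdcached2.example.net", 42217),
                ("rrdcached3.example.net", 42217)]

    def backend_for(rrd_path):
        # The same file always hashes to the same backend.
        return BACKENDS[zlib.crc32(rrd_path.encode()) % len(BACKENDS)]

    def forward(command_line, rrd_path):
        # Open a connection to the chosen backend, send one command,
        # and return just the status line of the response.
        host, port = backend_for(rrd_path)
        with socket.create_connection((host, port)) as sock:
            sock.sendall((command_line + "\n").encode())
            return sock.makefile("r").readline().rstrip("\n")

    print(forward("UPDATE port1.rrd N:1234567", "port1.rrd"))
    print(forward("FETCH port1.rrd AVERAGE", "port1.rrd"))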
My other thought on backing the filesystem onto a shared 'storage pool' (NFS or other) is that you still end up with all your metrics in RRD files, all your data aggregated, and you lose it after your retention period (sometimes I think I would like to see aggregated traffic stats going back more than 1 year...)
Having the metrics in something that can be queried 'out-of-the-box' opens, I think, interesting possibilities for making it easy to get at this data from other systems. For example you might want to
. Write a monitor that looks at backhaul link percentage utilisation and generates alarms (see the sketch after this list)
. Generate a 'php-weathermap' type interface for large networks, or pipe realtime collection data into http://code.google.com/p/webgl-globe/
. Feed 'annotations' into OpenTSDB against bgp-peer metrics when you generate BGP policy changes, and have a custom view of prefixes sent/received per peer vs peering traffic
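As a sketch of the first idea above, assuming OpenTSDB 2.x's HTTP /api/query endpoint; the host, metric name, tags and threshold are all made up.

    # Rough sketch: query a backhaul link's utilisation from OpenTSDB and
    # alarm when the most recent value is over a threshold.
    import json
    import urllib.parse
    import urllib.request

    params = urllib.parse.urlencode({
        "start": "15m-ago",
        "m": "avg:backhaul.util.percent{link=syd-mel-1}",
    })
    url = "http://opentsdb.example.net:4242/api/query?" + params

    for series in json.load(urllib.request.urlopen(url)):
        latest_ts = max(series["dps"], key=int)   # dps maps timestamp -> value
        value = series["dps"][latest_ts]
        if value > 90.0:
            print("ALARM: %s at %.1f%% utilisation" % (series["tags"].get("link", "?"), value))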
I guess from a 'scale' perspective there are issues involving
. Metric storage
. Metric consumption (ie making the graphs/UI/UX)
. Metric collection (the pollers)
If the capability of the platform scales with the addition of 1RU/2RU boxes then that sounds like a good thing.
If none of the 1RU/2RU boxes are single points of failure, that's a good thing.
If I don't need expensive boxes (think VMware vSphere vMotion licenses, dual-pathed SAN fabrics, dual-DC 'smart' switching fabrics), this is also a good thing.
If I can drop a poller in London and the US because SNMP runs like crap over long distances (AU->LON = 300ms, AU->LAX = 160ms), that is a 'good thing'.
If I can put 'web-ui' instances (or instances behind a load balancer) in any of our NOC follow-the-sun locations, that might be a 'good thing' as well (not sure that is going to make much of a difference -- one is 200ms from AU, another 80ms from AU ..)
Just some thoughts I am knocking around.