It is an interesting problem.
If I get a bit of time I will have a look at running OpenVPN to the remote node over SSL/TCP, which might make life a bit simpler and make things like 'ping' travel down the same path without any 'code hacks'.
I assume there are a few reasons for wanting 'remote node' type setups:
1. Administrative domain issues: your NMS platform for some reason does not have access to device 'X', but can reach some 'server' that does, e.g. because the device sits in a 'highly trusted' network segment. I assume in this case your upstream network should really sort this out, but perhaps something like OpenVPN, an ssh tunnel, or your agent/server framework would provide an acceptable solution, i.e. a pinhole to the 'remote-proxy-node' via TCP over a single port that a security person can be happy with. (The remote node's OpenVPN would need to NAT the traffic sourced from Observium.)
2. Throughput issues: SNMP over UDP over long distances and high latencies appears to be ... a little sad. It would be interesting to see which devices support SNMP over TCP and/or SNMP over TLS (RFC 5953); perhaps not many. OpenVPN over SSL/TCP might be an interesting solution, as might remote ssh'ing or other methods of 'doing' things remotely.
3. You want lots of pollers to scale your platform. Um, a potentially non-trivial problem. I assume you could:
   - Make a remote node 'aware' of a central node.
   - Check for RRD files via some remote API call to the central node rather than is_file().
   - Do RRD CREATE/UPDATE via the rrdcached protocol, back to an rrdcached instance running on the 'central box'.
   - Configure the remote node to use the central MySQL.
   - Give the pollers some indicator of which hosts should be polled from which remote nodes.
   Personally I don't think that really helps you 'scale', as I would assume that for deployments hitting size issues RRD I/O is probably what hurts.
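To make the rrdcached idea in point 3 a little more concrete, here is a minimal Python sketch of a remote node pushing one sample to a central rrdcached over TCP. The function names, the central host name and the file path are hypothetical, and it assumes the central rrdcached has been told to listen on a TCP address (by default it only listens on a local socket). This is not how Observium does it, just an illustration of the protocol's UPDATE command.

```python
import socket

RRDCACHED_PORT = 42217  # default rrdcached TCP port


def build_update_command(rrd_path, timestamp, values):
    """Build one rrdcached UPDATE line for a single sample."""
    if " " in rrd_path or "\n" in rrd_path:
        raise ValueError("rrd_path must not contain spaces or newlines")
    # The sample has the same form rrdtool update takes: time:value[:value...]
    sample = ":".join(str(v) for v in [timestamp, *values])
    return f"UPDATE {rrd_path} {sample}\n"


def send_update(central_host, command, timeout=5.0):
    """Send one command to rrdcached and return its status line.

    central_host is a hypothetical name for the 'central box'.
    """
    with socket.create_connection((central_host, RRDCACHED_PORT), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        with sock.makefile("r", encoding="ascii") as reply:
            # The first reply line starts with a status number;
            # a negative number signals an error.
            return reply.readline().strip()
```

A remote poller would call something like send_update("central-box.example.org", build_update_command("faraway.router.com/port-1.rrd", 1406100000, [10, 20])). Note that each update still costs a round trip to the central box unless updates are batched, which is part of why this may not help much with scaling.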
Trawling the net-snmp docs, in theory you could run SNMP/TLS to a remote node with the remote snmpd configured to map communities -> remote host, so you would 'switch out' the 'hostname' for the snmp-proxy hostname and the appropriate community. It looks a bit messy to configure, and would break your 'ping' tests etc.
http://www.net-snmp.org/wiki/index.php/TUT:Using_TLS
http://www.net-snmp.org/wiki/index.php/Snmpd_proxy
On 24/07/2014 7:34 pm, "Adam Armstrong" adama@memetic.org wrote:
Hi Peter,
My vague plan is to do this via HTTP with a PHP script proxying SNMP queries.
adam.
On 2014-07-24 03:33, Peter Childs wrote:
We have a host about 320ms from our observium node (yes, the other side of the planet).
A full poll takes circa 650-700ms (including MAC and BGP tables; it's an IOS-XR ASR9K). Adjusting the -Cr max-repetitions value did increase speed, but also caused several nasty snmpd explosions on the chassis.
As an exercise in interest I spun up a VM much closer to the host, unpacked /opt/observium/mibs etc. and configured snmp on that remote host.
Then I did this:
root@obs1:/opt/observium# diff -c includes/snmp.inc.php.orig includes/snmp.inc.php
*** includes/snmp.inc.php.orig 2014-07-23 17:43:53.720274321 +0930
--- includes/snmp.inc.php 2014-07-23 17:56:23.092286235 +0930
*** 333,338 ****
--- 333,345 ----
    // Add the OID(s) to the strong
    $cmd .= " ".$oids;

+   if( $device['hostname'] == 'faraway.router.com' ){
+     $cmd = "ssh -i ~x/x.pem x@remote-poller-node -C " .
+       escapeshellarg($cmd) ;
+   }
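For anyone who wants to play with the idea outside of snmp.inc.php, the same wrapping can be sketched in a few lines of Python. This is only an illustration of the patch above, not Observium code: the set of hosts and the helper name are hypothetical, and shlex.quote stands in for PHP's escapeshellarg.

```python
import shlex

# Hypothetical configuration, copied from the values used in the patch.
REMOTE_POLLED_HOSTS = {"faraway.router.com"}
SSH_PREFIX = "ssh -i ~x/x.pem x@remote-poller-node -C "


def wrap_for_remote(hostname, cmd):
    """Return cmd unchanged, or wrapped in ssh for hosts polled remotely."""
    if hostname in REMOTE_POLLED_HOSTS:
        # Quote the whole snmp command so it reaches the remote shell
        # as a single argument.
        return SSH_PREFIX + shlex.quote(cmd)
    return cmd
```

Quoting the complete command string matters here: without it the local shell would split the snmp command and any shell metacharacters in OIDs or communities would be interpreted locally.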
This was interesting, but establishing a new ssh session for each snmp execution added a lot of overhead, so I also did this:
root@obs1:/opt/observium# cat ~/.ssh/config
Host remote-poller-node
  # HostName machine1.example.org
  ControlPath ~/.ssh/controlmasters/%r@%h:%p
  ControlMaster auto
  ControlPersist 10m
This means that an ssh 'master instance' is set up at the start of the poll and hangs around in the background between executions of the various snmp commands during the poll.
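The same multiplexing settings can also be passed on the command line with -o, which may be handy if you would rather not touch ~/.ssh/config. Below is a rough Python sketch that builds such an argument list; the helper names and defaults are hypothetical, while the option names are the ones from the config above. As far as I know ssh will not create the ControlPath directory for you, so it has to exist beforehand.

```python
def ssh_multiplexed_argv(target, remote_cmd,
                         control_dir="~/.ssh/controlmasters",
                         persist="10m"):
    """Build an ssh argument list that reuses a shared master connection."""
    return [
        "ssh",
        "-o", "ControlMaster=auto",
        "-o", f"ControlPath={control_dir}/%r@%h:%p",
        "-o", f"ControlPersist={persist}",
        target,
        remote_cmd,
    ]


def ssh_check_master_argv(target):
    """Build the ssh invocation that asks whether a master is running."""
    return ["ssh", "-O", "check", target]
```

The resulting lists can be handed to subprocess.run(); the first call opens the master, and later calls within the persist window skip the connection setup.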
Full poll went from 650-700ms -> ~170ms
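A rough way to see why this helps: every SNMP request/response pair sent directly pays the long round trip, whereas via the remote poller each snmp command pays the long round trip roughly once and its individual requests only pay the short hop. The Python sketch below is a back-of-the-envelope model under that assumption; the function names and any numbers you feed it are illustrative, not measurements from this poll.

```python
def direct_poll_time(commands, requests_per_command, long_rtt):
    """Estimate poll time when every SNMP request crosses the long link."""
    return commands * requests_per_command * long_rtt


def remote_poll_time(commands, requests_per_command, long_rtt, short_rtt):
    """Estimate poll time when each command is run on a nearby poller.

    Assumes one long round trip per command (over an already established
    ssh master) plus short round trips for the SNMP requests themselves.
    """
    return commands * (long_rtt + requests_per_command * short_rtt)
```

The model ignores device processing time and ssh overhead, so treat it as a lower bound; the point is only that the long round trip is multiplied by the number of commands rather than the number of requests.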
Thought this might be of interest to some people.
Cheers, Peter
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium