OBS-2925 - Ping Only Hosts / OBS_SNMP_SKIP flag / snmp_skip attribute
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
1. Add/remove/rename ping only hosts via CLI 2. Add/remove ping only hosts via webUI 3. View/interact ping only hosts via webUI - SNMP specific features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
1. Location override - poller/discovery doesn't seem to perform geocoding and whatever else is happening there
Things that could be improved:
1. Unix Agent poller module etc is enabled by default for all hosts, for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.
Totally untested:
1. Use of autodiscovery SNMP skip - should work in theory, unsure if those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com
Ohh my dear..
Which really benefit to use Observium for ping_only hosts?
Pls show (screenshots) how this devices displayed on your install.
Colin Stubbs via observium wrote on 03/03/2019 07:43:
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
- Add/remove/rename ping only hosts via CLI
- Add/remove ping only hosts via webUI
- View/interact ping only hosts via webUI - SNMP specific features/menus etc are hidden while skip SNMP is enabled
- Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events
- Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
- Location override - poller/discovery doesn't seem to perform geocoding and whatever else is happening there
Things that could be improved:
- Unix Agent poller module etc is enabled by default for all hosts, for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.
Totally untested:
- Use of autodiscovery SNMP skip - should work in theory, unsure if those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
Hi Mike,
They're in the JIRA ticket, but sure, here's some inserted to this email.
I'm aware of Adam/your preference to reject or simply ignore the regular requests for this kind of feature, and the confusion it causes new users considering Observium for use in their organisations when they find out they can't do the most basic of monitoring to something using Observium.
A great many of your users would gain a lot of value from being able to add a few ping only hosts.
Why build, manage and integrate (to other tools) another ping based monitoring system when all you need is to monitor a handful of devices with ping only; simply because those devices fail to offer SNMP for some reason?
Anything that does have a working SNMP service can and should be monitored with SNMP to gain greater insight into what's happening on it... but not everything does.
No one in their right mind should attempt to use Observium as a system which monitors massive numbers of DNS names or IP's with ping only, there are better systems out there for that, but there are so many situations in a which an Observium install can and should be able to monitor some devices ping only.
I've associated the ticket with the two previous JIRA requests for this feature; which Adam rejected.
They were just feature requests as I recall; no code proposed. Totally understandable you guys didn't want to work on the feature when it's not really what you feel Observium is for.
Rejecting this now when you have just been gifted working code, that modifies things fairly minimally, and simply leverages the ping information that's already occurring to each and every device (that ping is not disabled for), and uses an existing method to mark devices so that further SNMP based polling does not occur... would make no sense.
That said; I totally understand if you want me to fix/add/improve things before considering it further - simply communicate what's necessary and I'll do that.
So, for anyone else on the list, who wants the feature; now is your best chance to add your voice and explain why this kind of feature is of use to you. Because you're probably not going to get it otherwise.
We've been using this internally since early 2017... I just got around to updating Observium to the latest and greatest (reasons, terrible terrible reasons...), and had to update my patch.
But, I'd really prefer not to have to maintain a patch, and I definitely don't want to have to publish it so others can use it and cause Observium Ltd further dramas because of it.
How many hosts do we monitor ping only?
Six. A grand total of six. Everything else has SNMP in some form; and we want as much info as we can extract from them via it - which is why we love Observium. Nothing else compares for SNMP based monitoring IMHO.
Most of those ping only devices are next hop service provider devices just outside our network, for which they won't give us any kind of SNMP access. Ping'ing them offers a small (very small!) degree of insight into link latency/loss from us to them (manual comparison I must admit), and/or potential device problems for the thing that's currently responding for that IP.
Hey look... a perfect use case for this feature...
The others are devices that we use but which simply don't offer SNMP because vendor X didn't bother to add net-snmp to their standard build for CentOS/Ubuntu/whatever weird and wonderful Linux distro they built them on; when they really should have. Don't care much just need to know they're on the network in some way and want a long track record of response time to them.
Hey look... a perfect use case for this feature...
Not surprisingly we don't want to have to operate and maintain a separate system for six IP's. Alerting can be flappy because Observium/fping is only sending a single ICMP echo - we're OK with that and don't even really want more pings which would lead to average values anyway - and we've worked around flaps with delays and automatic suppression of duplicate alerts in our alert management and escalation tool.
The six are monitored by IP; the DNS hostname test below was simply to double check there was no issue with this and a DNS name. Just a hostname that I've never managed to forget and use a lot to ping test connectivity out.
Also, I just realised that the "Skip ping" option is no longer automatically hidden in device settings when "Skip SNMP" is checked; I'll add an updated diff in JIRA shortly after I look at the JS for that again.
[image: Capture.PNG]
[image: observium_ping_only_screens3.PNG]
[image: observium_ping_only_screens4.png]
[image: observium_ping_only_screens5.PNG]
On Sun, 3 Mar 2019 at 18:07, Mike Stupalov mike@observium.org wrote:
Ohh my dear..
Which really benefit to use Observium for ping_only hosts?
Pls show (screenshots) how this devices displayed on your install.
Colin Stubbs via observium wrote on 03/03/2019 07:43:
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
- Add/remove/rename ping only hosts via CLI
- Add/remove ping only hosts via webUI
- View/interact ping only hosts via webUI - SNMP specific
features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping
- will trigger alerts for host down/recovery events
- Shifting a previously SNMP contactable host to ping only by ticking
skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
- Location override - poller/discovery doesn't seem to perform
geocoding and whatever else is happening there
Things that could be improved:
- Unix Agent poller module etc is enabled by default for all hosts,
for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.
Totally untested:
- Use of autodiscovery SNMP skip - should work in theory, unsure if
those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only
ping_only_hosts.diff
[root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com
observium mailing listobservium@observium.orghttp://postman.memetic.org/cgi-bin/mailman/listinfo/observium
-- Mike Stupalov Observium Limited, http://observium.org
This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.
Add me to the chorus of those who would benefit from this feature:
1. We want to use Observium as *the* platform for our corporate IT team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request: 1. Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency
I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.
So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.
Aaron
On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
- Add/remove/rename ping only hosts via CLI
- Add/remove ping only hosts via webUI
- View/interact ping only hosts via webUI - SNMP specific
features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping
- will trigger alerts for host down/recovery events
- Shifting a previously SNMP contactable host to ping only by ticking
skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
- Location override - poller/discovery doesn't seem to perform
geocoding and whatever else is happening there
Things that could be improved:
- Unix Agent poller module etc is enabled by default for all hosts,
for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.
Totally untested:
- Use of autodiscovery SNMP skip - should work in theory, unsure if
those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only
ping_only_hosts.diff
[root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
Yeah, that's somewhat common when Adam doesn't like something.
Or possibly it was another developer who is now enrolled in the same attitude/culture.
However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.
They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.
This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.
We're yet to test and use it properly, but it's looking like it's going to be a good move.
There still does not appear to be any documentation yet though, from my quick check right now.
So you'll likely need to read the code to understand how to use it at present.
JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22
I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.
e.g. using check_fping to monitor a router by IP,
[image: obs_probe_menu.png]
[image: obs_probes.png]
It appears as though alerting may now exist for this too.
And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.
First scan of that code suggests that they won't be accessible though.
[image: obs_probe_alerts.png]
On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:
This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.
Add me to the chorus of those who would benefit from this feature:
- We want to use Observium as *the* platform for our corporate IT
team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:
- Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency
I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.
So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.
Aaron
On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
- Add/remove/rename ping only hosts via CLI
- Add/remove ping only hosts via webUI
- View/interact ping only hosts via webUI - SNMP specific
features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
- Location override - poller/discovery doesn't seem to perform
geocoding and whatever else is happening there
Things that could be improved:
- Unix Agent poller module etc is enabled by default for all hosts,
for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.
Totally untested:
- Use of autodiscovery SNMP skip - should work in theory, unsure if
those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using the Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or migrating to another system like LibreNMS).
Thanks,
Aaron
On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs cstubbs@gmail.com wrote:
Yeah, that's somewhat common when Adam doesn't like something.
Or possibly it was another developer who is now enrolled in the same attitude/culture.
However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.
They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.
This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.
We're yet to test and use it properly, but it's looking like it's going to be a good move.
There still does not appear to be any documentation yet though, from my quick check right now.
So you'll likely need to read the code to understand how to use it at present.
JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22
I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.
e.g. using check_fping to monitor a router by IP,
[image: obs_probe_menu.png]
[image: obs_probes.png]
It appears as though alerting may now exist for this too.
And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.
First scan of that code suggests that they won't be accessible though.
[image: obs_probe_alerts.png]
On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:
This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.
Add me to the chorus of those who would benefit from this feature:
- We want to use Observium as *the* platform for our corporate IT
team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:
- Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency
I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.
So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.
Aaron
On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
- Add/remove/rename ping only hosts via CLI
- Add/remove ping only hosts via webUI
- View/interact ping only hosts via webUI - SNMP specific
features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
- Location override - poller/discovery doesn't seem to perform
geocoding and whatever else is happening there
Things that could be improved:
- Unix Agent poller module etc is enabled by default for all hosts,
for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.
Totally untested:
- Use of autodiscovery SNMP skip - should work in theory, unsure if
those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
This bit of code was written specifically for a large user (hence no doc yet) right around the time Colin was having a tantrum over having his patch rejected. :)
At the time it was intended to allow monitoring of extra-Devices devices, though I had intended to add a simplified non-Devices device list system, rather than squeezing all of the checks into other devices, before it was announced as the solution to that issue.
This hasn't been completed yet (it'd be a new alerting mode, like syslog, so a bit more involved compared to just a list and a foreach), but I guess it's still usable to put the checks on the observium host itself with overridden hostnames.
Adam.
Sent from BlueMail
On 14 Feb 2020, 00:18, at 00:18, Aaron Finney via observium observium@observium.org wrote:
That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using the Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or migrating to another system like LibreNMS).
Thanks,
Aaron
On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs cstubbs@gmail.com wrote:
Yeah, that's somewhat common when Adam doesn't like something.
Or possibly it was another developer who is now enrolled in the same attitude/culture.
However, while I am critical of how requests like this are handled,
and
customer engagement (or the lack thereof) more generally; they have
started
to add a better way of handling this kind of situation recently.
They've added a feature to utilise Nagios probes, and while you do
need to
associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments
you want
to the probe.
This allows you to monitor other hosts in your network using methods
and
protocols that would otherwise not be supported by Observium's core
code
without major modifications. It also means you can easily develop
your own
probes to meet more bespoke requirements quite easily.
We're yet to test and use it properly, but it's looking like it's
going to
be a good move.
There still does not appear to be any documentation yet though, from
my
quick check right now.
So you'll likely need to read the code to understand how to use it at present.
JIRA tickets are likely your other best source of info, I had to log
one
to fix adding a second probe awhile ago:
https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22
I'm guessing it's in Professional only at this point, but I havn't
checked
the community code so perhaps it's there too.
e.g. using check_fping to monitor a router by IP,
[image: obs_probe_menu.png]
[image: obs_probes.png]
It appears as though alerting may now exist for this too.
And contrary to screenshot below, from the look of the code the
metrics
seem to be based on KVP's that the nagios plugin returns, which makes
sense.
First scan of that code suggests that they won't be accessible
though.
[image: obs_probe_alerts.png]
On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com
wrote:
This issue appears to have been deleted from Jira. It also appears
that
multiple people are asking for this feature, which does exist on
other
platforms, and are all bring told that it's a stupid request.
Add me to the chorus of those who would benefit from this feature:
- We want to use Observium as *the* platform for our corporate
IT
team to see the status of our environment, and to generate
appropriate
alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a
rotating group
of MSP-provided technicians can easily ramp up to and work with.
We believe
Observium fits this use case perfectly. 3. There is a logical flow to this request:
- Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN
with
that we want to monitor for basic reachability, and alert if
it's down or
latency is high 4. The VPN tunnels between us and our SaaS partners terminate
in
Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to
receive
alerts when an endpoint on the far side of a VPN tunnel is not
reachable,
we need it to be a device in Observium that uses ping only to
determine
up/down/latency
I appreciate that people have a wide range of approaches to this
issue,
including third-party applications running via cron with sendmail
for
alerts, but this is antithetical to how our IT landscape is
changing. We
just moved > 16k physical servers worth of compute, 7PB of storage,
and
multiple terabits of connectivity to GCP, all within a six-month
window.
Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which
includes
corporate IT systems. Everything will migrate to managed services
wherever
possible, operated by contracted L1 and L2 technicians. Everything
lives as
Terraform code in github; in fact, nobody even has the permissions
to
instantiate resources manually.
So at the risk of causing the maintainers to dig their heels in
more, I
would ask you to reconsider your position on this topic, and either
vet the
original poster's contributed patch, or consider adding a "no snmp"
flag in
a way that you're comfortable maintaining it.
Aaron
On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:
For the benefit of anyone on the list who doesn't use JIRA... and
also
so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have
hosts
that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
- Add/remove/rename ping only hosts via CLI
- Add/remove ping only hosts via webUI
- View/interact ping only hosts via webUI - SNMP specific
features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type
equals
ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and
remain
available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
- Location override - poller/discovery doesn't seem to perform
geocoding and whatever else is happening there
Things that could be improved:
- Unix Agent poller module etc is enabled by default for all
hosts,
for ping only hosts, perhaps it should be disabled by default?
Will improve
performance by reducing the number of processes that hang while
the 10s
default connect timeout happens.
Totally untested:
- Use of autodiscovery SNMP skip - should work in theory,
unsure if
those parts of the patch should actually be used though. Some
people out
there may actually want to add anything that does respond to
ping and can
be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
I have this working using "status equals alert" as the condition; is it possible to use the output or message of the probe for the condition instead? I.e. I'd love to use something like
message regexp CRITICAL(?=.*10.10.1.1)+
to match the specific test for the alert. This way, you could have multiple check_fping checks under the local machine and alert different groups depending on which test fails.
Thanks!
Aaron
On Thu, Feb 13, 2020 at 4:18 PM Aaron Finney aaron.finney@openx.com wrote:
That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using the Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or migrating to another system like LibreNMS).
Thanks,
Aaron
On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs cstubbs@gmail.com wrote:
Yeah, that's somewhat common when Adam doesn't like something.
Or possibly it was another developer who is now enrolled in the same attitude/culture.
However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.
They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.
This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.
We're yet to test and use it properly, but it's looking like it's going to be a good move.
There still does not appear to be any documentation yet though, from my quick check right now.
So you'll likely need to read the code to understand how to use it at present.
JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22
I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.
e.g. using check_fping to monitor a router by IP,
[image: obs_probe_menu.png]
[image: obs_probes.png]
It appears as though alerting may now exist for this too.
And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.
First scan of that code suggests that they won't be accessible though.
[image: obs_probe_alerts.png]
On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:
This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.
Add me to the chorus of those who would benefit from this feature:
- We want to use Observium as *the* platform for our corporate IT
team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:
- Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency
I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.
So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.
Aaron
On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
- Add/remove/rename ping only hosts via CLI
- Add/remove ping only hosts via webUI
- View/interact ping only hosts via webUI - SNMP specific
features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
- Location override - poller/discovery doesn't seem to perform
geocoding and whatever else is happening there
Things that could be improved:
- Unix Agent poller module etc is enabled by default for all
hosts, for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.
Totally untested:
- Use of autodiscovery SNMP skip - should work in theory, unsure
if those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
I can add that tomorrow.
Adam.
Sent from BlueMail
On 14 Feb 2020, 17:54, at 17:54, Aaron Finney via observium observium@observium.org wrote:
I have this working using "status equals alert" as the condition; is it possible to use the output or message of the probe for the condition instead? I.e. I'd love to use something like
message regexp CRITICAL(?=.*10.10.1.1)+
to match the specific test for the alert. This way, you could have multiple check_fping checks under the local machine and alert different groups depending on which test fails.
Thanks!
Aaron
On Thu, Feb 13, 2020 at 4:18 PM Aaron Finney aaron.finney@openx.com wrote:
That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using
the
Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or
migrating
to another system like LibreNMS).
Thanks,
Aaron
On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs cstubbs@gmail.com
wrote:
Yeah, that's somewhat common when Adam doesn't like something.
Or possibly it was another developer who is now enrolled in the same attitude/culture.
However, while I am critical of how requests like this are handled,
and
customer engagement (or the lack thereof) more generally; they have
started
to add a better way of handling this kind of situation recently.
They've added a feature to utilise Nagios probes, and while you do
need
to associate the probe config to an existing host in Observium for
alert
association purposes, you can override and pass whatever arguments
you want
to the probe.
This allows you to monitor other hosts in your network using methods
and
protocols that would otherwise not be supported by Observium's core
code
without major modifications. It also means you can easily develop
your own
probes to meet more bespoke requirements quite easily.
We're yet to test and use it properly, but it's looking like it's
going
to be a good move.
There still does not appear to be any documentation yet though, from
my
quick check right now.
So you'll likely need to read the code to understand how to use it
at
present.
JIRA tickets are likely your other best source of info, I had to log
one
to fix adding a second probe awhile ago:
https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22
I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.
e.g. using check_fping to monitor a router by IP,
[image: obs_probe_menu.png]
[image: obs_probes.png]
It appears as though alerting may now exist for this too.
And contrary to screenshot below, from the look of the code the
metrics
seem to be based on KVP's that the nagios plugin returns, which
makes sense.
First scan of that code suggests that they won't be accessible
though.
[image: obs_probe_alerts.png]
On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:
This issue appears to have been deleted from Jira. It also appears
that
multiple people are asking for this feature, which does exist on
other
platforms, and are all bring told that it's a stupid request.
Add me to the chorus of those who would benefit from this feature:
- We want to use Observium as *the* platform for our corporate
IT
team to see the status of our environment, and to generate
appropriate
alerts 2. We have aggressively moved away from FTEs for our corporate
IT
staff, and need to provide a simplified environment that a
rotating group
of MSP-provided technicians can easily ramp up to and work with.
We believe
Observium fits this use case perfectly. 3. There is a logical flow to this request:
- Observium is our sole monitoring and alerting platform for
our
entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN
with
that we want to monitor for basic reachability, and alert if
it's down or
latency is high 4. The VPN tunnels between us and our SaaS partners terminate
in
Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN
tunnel is not
reachable, we need it to be a device in Observium that uses
ping only to
determine up/down/latency
I appreciate that people have a wide range of approaches to this
issue,
including third-party applications running via cron with sendmail
for
alerts, but this is antithetical to how our IT landscape is
changing. We
just moved > 16k physical servers worth of compute, 7PB of storage,
and
multiple terabits of connectivity to GCP, all within a six-month
window.
Now that our production environment is migrated and our data
centers
decommissioned, we are cleaning up the rest of the mess, which
includes
corporate IT systems. Everything will migrate to managed services
wherever
possible, operated by contracted L1 and L2 technicians. Everything
lives as
Terraform code in github; in fact, nobody even has the permissions
to
instantiate resources manually.
So at the risk of causing the maintainers to dig their heels in
more, I
would ask you to reconsider your position on this topic, and either
vet the
original poster's contributed patch, or consider adding a "no snmp"
flag in
a way that you're comfortable maintaining it.
Aaron
On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:
For the benefit of anyone on the list who doesn't use JIRA... and
also
so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip
device
attribute, similar to OBS_PING_SKIP and ping_skip, in order to
have hosts
that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled
(untested).
Tested:
- Add/remove/rename ping only hosts via CLI
- Add/remove ping only hosts via webUI
- View/interact ping only hosts via webUI - SNMP specific
features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type
equals
ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and
remain
available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
- Location override - poller/discovery doesn't seem to perform
geocoding and whatever else is happening there
Things that could be improved:
- Unix Agent poller module etc is enabled by default for all
hosts, for ping only hosts, perhaps it should be disabled by
default? Will
improve performance by reducing the number of processes that
hang while the
10s default connect timeout happens.
Totally untested:
- Use of autodiscovery SNMP skip - should work in theory,
unsure
if those parts of the patch should actually be used though.
Some people out
there may actually want to add anything that does respond to
ping and can
be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep
-v
^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
Awesome, that would be super useful for us. Thanks!
On Fri, Feb 14, 2020 at 1:21 PM Adam Armstrong via observium < observium@observium.org> wrote:
I can add that tomorrow.
Adam.
Sent from BlueMail http://www.bluemail.me/r?b=15774 On 14 Feb 2020, at 17:54, Aaron Finney via observium < observium@observium.org> wrote:
I have this working using "status equals alert" as the condition; is it possible to use the output or message of the probe for the condition instead? I.e. I'd love to use something like
message regexp CRITICAL(?=.*10.10.1.1)+
to match the specific test for the alert. This way, you could have multiple check_fping checks under the local machine and alert different groups depending on which test fails.
Thanks!
Aaron
On Thu, Feb 13, 2020 at 4:18 PM Aaron Finney aaron.finney@openx.com wrote:
That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using the Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or migrating to another system like LibreNMS).
Thanks,
Aaron
On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs cstubbs@gmail.com wrote:
Yeah, that's somewhat common when Adam doesn't like something.
Or possibly it was another developer who is now enrolled in the same attitude/culture.
However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.
They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.
This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.
We're yet to test and use it properly, but it's looking like it's going to be a good move.
There still does not appear to be any documentation yet though, from my quick check right now.
So you'll likely need to read the code to understand how to use it at present.
JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22
I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.
e.g. using check_fping to monitor a router by IP,
[image: obs_probe_menu.png]
[image: obs_probes.png]
It appears as though alerting may now exist for this too.
And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.
First scan of that code suggests that they won't be accessible though.
[image: obs_probe_alerts.png]
On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:
This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.
Add me to the chorus of those who would benefit from this feature:
- We want to use Observium as *the* platform for our corporate IT
team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:
- Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency
I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.
So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.
Aaron
On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
- Add/remove/rename ping only hosts via CLI
- Add/remove ping only hosts via webUI
- View/interact ping only hosts via webUI - SNMP specific
features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
- Location override - poller/discovery doesn't seem to perform
geocoding and whatever else is happening there
Things that could be improved:
- Unix Agent poller module etc is enabled by default for all
hosts, for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.
Totally untested:
- Use of autodiscovery SNMP skip - should work in theory, unsure
if those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
Should be available in r10282.
Adam.
From: observium observium-bounces@observium.org On Behalf Of Aaron Finney via observium Sent: 14 February 2020 21:37 To: Observium observium@observium.org Cc: Aaron Finney aaron.finney@openx.com; Adam Armstrong adama@memetic.org Subject: Re: [Observium] OBS-2925 - Ping Only Hosts / OBS_SNMP_SKIP flag / snmp_skip attribute
Awesome, that would be super useful for us. Thanks!
On Fri, Feb 14, 2020 at 1:21 PM Adam Armstrong via observium <observium@observium.org mailto:observium@observium.org > wrote:
I can add that tomorrow.
Adam.
Sent from BlueMail http://www.bluemail.me/r?b=15774
On 14 Feb 2020, at 17:54, Aaron Finney via observium <observium@observium.org mailto:observium@observium.org > wrote:
I have this working using "status equals alert" as the condition; is it possible to use the output or message of the probe for the condition instead? I.e. I'd love to use something like
message regexp CRITICAL(?=.*10.10.1.1)+
to match the specific test for the alert. This way, you could have multiple check_fping checks under the local machine and alert different groups depending on which test fails.
Thanks!
Aaron
On Thu, Feb 13, 2020 at 4:18 PM Aaron Finney <aaron.finney@openx.com mailto:aaron.finney@openx.com > wrote:
That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using the Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or migrating to another system like LibreNMS).
Thanks,
Aaron
On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs <cstubbs@gmail.com mailto:cstubbs@gmail.com > wrote:
Yeah, that's somewhat common when Adam doesn't like something.
Or possibly it was another developer who is now enrolled in the same attitude/culture.
However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.
They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.
This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.
We're yet to test and use it properly, but it's looking like it's going to be a good move.
There still does not appear to be any documentation yet though, from my quick check right now.
So you'll likely need to read the code to understand how to use it at present.
JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22
I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.
e.g. using check_fping to monitor a router by IP,
It appears as though alerting may now exist for this too.
And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.
First scan of that code suggests that they won't be accessible though.
On Fri, 14 Feb 2020 at 06:48, Aaron Finney <aaron.finney@openx.com mailto:aaron.finney@openx.com > wrote:
This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.
Add me to the chorus of those who would benefit from this feature:
1. We want to use Observium as *the* platform for our corporate IT team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:
1. Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency
I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.
So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.
Aaron
On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium <observium@observium.org mailto:observium@observium.org > wrote:
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
1. Add/remove/rename ping only hosts via CLI
2. Add/remove ping only hosts via webUI
3. View/interact ping only hosts via webUI - SNMP specific features/menus etc are hidden while skip SNMP is enabled
4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events
5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
1. Location override - poller/discovery doesn't seem to perform geocoding and whatever else is happening there
Things that could be improved:
1. Unix Agent poller module etc is enabled by default for all hosts, for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.
Totally untested:
1. Use of autodiscovery SNMP skip - should work in theory, unsure if those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com
_______________________________________________ observium mailing list observium@observium.org mailto:observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
Tested, works perfectly. Many thanks for the guidance (Colin) and the fix (Adam). Just one big, happy, dysfunctional family. :D
On Sat, Feb 15, 2020 at 1:58 AM Adam Armstrong via observium < observium@observium.org> wrote:
Should be available in r10282.
Adam.
*From:* observium observium-bounces@observium.org *On Behalf Of *Aaron Finney via observium *Sent:* 14 February 2020 21:37 *To:* Observium observium@observium.org *Cc:* Aaron Finney aaron.finney@openx.com; Adam Armstrong < adama@memetic.org> *Subject:* Re: [Observium] OBS-2925 - Ping Only Hosts / OBS_SNMP_SKIP flag / snmp_skip attribute
Awesome, that would be super useful for us. Thanks!
On Fri, Feb 14, 2020 at 1:21 PM Adam Armstrong via observium < observium@observium.org> wrote:
I can add that tomorrow.
Adam.
Sent from BlueMail http://www.bluemail.me/r?b=15774
On 14 Feb 2020, at 17:54, Aaron Finney via observium < observium@observium.org> wrote:
I have this working using "status equals alert" as the condition; is it possible to use the output or message of the probe for the condition instead? I.e. I'd love to use something like
message regexp CRITICAL(?=.*10.10.1.1)+
to match the specific test for the alert. This way, you could have multiple check_fping checks under the local machine and alert different groups depending on which test fails.
Thanks!
Aaron
On Thu, Feb 13, 2020 at 4:18 PM Aaron Finney aaron.finney@openx.com wrote:
That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using the Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or migrating to another system like LibreNMS).
Thanks,
Aaron
On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs cstubbs@gmail.com wrote:
Yeah, that's somewhat common when Adam doesn't like something.
Or possibly it was another developer who is now enrolled in the same attitude/culture.
However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.
They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.
This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.
We're yet to test and use it properly, but it's looking like it's going to be a good move.
There still does not appear to be any documentation yet though, from my quick check right now.
So you'll likely need to read the code to understand how to use it at present.
JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22
I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.
e.g. using check_fping to monitor a router by IP,
It appears as though alerting may now exist for this too.
And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.
First scan of that code suggests that they won't be accessible though.
On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:
This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.
Add me to the chorus of those who would benefit from this feature:
- We want to use Observium as *the* platform for our corporate IT
team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:
- Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency
I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.
So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.
Aaron
On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
Add/remove/rename ping only hosts via CLI
Add/remove ping only hosts via webUI
View/interact ping only hosts via webUI - SNMP specific
features/menus etc are hidden while skip SNMP is enabled
Alerting - device_status equals 0 && device_status_type equals
ping - will trigger alerts for host down/recovery events
Shifting a previously SNMP contactable host to ping only by
ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
Location override - poller/discovery doesn't seem to perform
geocoding and whatever else is happening there
Things that could be improved:
Unix Agent poller module etc is enabled by default for all hosts,
for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.
Totally untested:
Use of autodiscovery SNMP skip - should work in theory, unsure if
those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only
ping_only_hosts.diff
[root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
You ignored my reply on that ticket explaining how it'd need to be done if we were to accept a patch for it, so...
Adam.
Sent from BlueMail
On 13 Feb 2020, 23:31, at 23:31, Colin Stubbs via observium observium@observium.org wrote:
Yeah, that's somewhat common when Adam doesn't like something.
Or possibly it was another developer who is now enrolled in the same attitude/culture.
However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.
They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.
This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.
We're yet to test and use it properly, but it's looking like it's going to be a good move.
There still does not appear to be any documentation yet though, from my quick check right now.
So you'll likely need to read the code to understand how to use it at present.
JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22
I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.
e.g. using check_fping to monitor a router by IP,
[image: obs_probe_menu.png]
[image: obs_probes.png]
It appears as though alerting may now exist for this too.
And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.
First scan of that code suggests that they won't be accessible though.
[image: obs_probe_alerts.png]
On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:
This issue appears to have been deleted from Jira. It also appears
that
multiple people are asking for this feature, which does exist on
other
platforms, and are all bring told that it's a stupid request.
Add me to the chorus of those who would benefit from this feature:
- We want to use Observium as *the* platform for our corporate IT
team to see the status of our environment, and to generate
appropriate
alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a
rotating group
of MSP-provided technicians can easily ramp up to and work with.
We believe
Observium fits this use case perfectly. 3. There is a logical flow to this request:
- Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if
it's down or
latency is high 4. The VPN tunnels between us and our SaaS partners terminate
in
Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to
receive
alerts when an endpoint on the far side of a VPN tunnel is not
reachable,
we need it to be a device in Observium that uses ping only to
determine
up/down/latency
I appreciate that people have a wide range of approaches to this
issue,
including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing.
We
just moved > 16k physical servers worth of compute, 7PB of storage,
and
multiple terabits of connectivity to GCP, all within a six-month
window.
Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which
includes
corporate IT systems. Everything will migrate to managed services
wherever
possible, operated by contracted L1 and L2 technicians. Everything
lives as
Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.
So at the risk of causing the maintainers to dig their heels in more,
I
would ask you to reconsider your position on this topic, and either
vet the
original poster's contributed patch, or consider adding a "no snmp"
flag in
a way that you're comfortable maintaining it.
Aaron
On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:
For the benefit of anyone on the list who doesn't use JIRA... and
also so
that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have
hosts
that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
- Add/remove/rename ping only hosts via CLI
- Add/remove ping only hosts via webUI
- View/interact ping only hosts via webUI - SNMP specific
features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and
remain
available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
- Location override - poller/discovery doesn't seem to perform
geocoding and whatever else is happening there
Things that could be improved:
- Unix Agent poller module etc is enabled by default for all
hosts,
for ping only hosts, perhaps it should be disabled by default?
Will improve
performance by reducing the number of processes that hang while
the 10s
default connect timeout happens.
Totally untested:
- Use of autodiscovery SNMP skip - should work in theory, unsure
if
those parts of the patch should actually be used though. Some
people out
there may actually want to add anything that does respond to ping
and can
be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
No, I read that, but I'm not part of the development team nor can I be.
You should be taking that kind of prototype and feedback from a customer, defining how it should be really done, and delegating that to the development team.
If the JIRA ticket has actually been deleted, and now I've logged in a checked, it definitely appears to have been; that was clearly your choice to remove evidence of the request. It didn't need to be deleted, simply closed as fixed or rejected.
Hell it could even just be left as unresolved, another feature some annoying customer keeps asking you for, like all the others in there.
A good reference point when in future someone finally decides that it might be a feature worthy of developing.
Or, a ticket that someone can respond to and say "Hey, for the original reporter and anyone reading who wants this, did you know we've developed this feature that addresses this kind of problem but in a better way." as with the Nagios probes code that's now in there.
This is what I mean by lack of engagement.
Your seem to think your customers are annoying lepers who constantly bother you with demands to make the product do insane things.
Which is not the case... they're reporting legitimate problems that they need to solve. Solving those is valuable.
Observium as a product, and your business behind it, would go bananas if you simply addressed those core problems that people need.
-Colin
On Fri, 14 Feb 2020 at 16:44, Adam Armstrong via observium < observium@observium.org> wrote:
You ignored my reply on that ticket explaining how it'd need to be done if we were to accept a patch for it, so...
Adam.
Sent from BlueMail http://www.bluemail.me/r?b=15774 On 13 Feb 2020, at 23:31, Colin Stubbs via observium < observium@observium.org> wrote:
Yeah, that's somewhat common when Adam doesn't like something.
Or possibly it was another developer who is now enrolled in the same attitude/culture.
However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.
They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.
This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.
We're yet to test and use it properly, but it's looking like it's going to be a good move.
There still does not appear to be any documentation yet though, from my quick check right now.
So you'll likely need to read the code to understand how to use it at present.
JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22
I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.
e.g. using check_fping to monitor a router by IP,
[image: obs_probe_menu.png]
[image: obs_probes.png]
It appears as though alerting may now exist for this too.
And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.
First scan of that code suggests that they won't be accessible though.
[image: obs_probe_alerts.png]
On Fri, 14 Feb 2020 at 06:48, Aaron Finney < aaron.finney@openx.com> wrote:
This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.
Add me to the chorus of those who would benefit from this feature:
- We want to use Observium as *the* platform for our corporate IT
team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:
- Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency
I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.
So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.
Aaron
On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.
https://jira.observium.org/browse/OBS-2925
Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.
Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).
Tested:
- Add/remove/rename ping only hosts via CLI
- Add/remove ping only hosts via webUI
- View/interact ping only hosts via webUI - SNMP specific
features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again
Things I know kind of don't work right now:
- Location override - poller/discovery doesn't seem to perform
geocoding and whatever else is happening there
Things that could be improved:
- Unix Agent poller module etc is enabled by default for all
hosts, for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.
Totally untested:
- Use of autodiscovery SNMP skip - should work in theory, unsure
if those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??
Patch generated from recent trunk, touches files as below,
[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff
-Colin
Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
--
*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
participants (5)
-
Aaron Finney
-
Adam Armstrong
-
adama@memetic.org
-
Colin Stubbs
-
Mike Stupalov