OBS-2925 - Ping Only Hosts / OBS_SNMP_SKIP flag / snmp_skip attribute - observium

OBS-2925 - Ping Only Hosts / OBS_SNMP_SKIP flag / snmp_skip attribute

Colin Stubbs

2 Mar 2019 2 Mar '19

11:43 p.m.

For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:

1. Add/remove/rename ping only hosts via CLI 2. Add/remove ping only hosts via webUI 3. View/interact ping only hosts via webUI - SNMP specific features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

1. Location override - poller/discovery doesn't seem to perform geocoding and whatever else is happening there

Things that could be improved:

1. Unix Agent poller module etc is enabled by default for all hosts, for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.

Totally untested:

1. Use of autodiscovery SNMP skip - should work in theory, unsure if those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com

Attachments:

attachment.html (text/html — 6.3 KB)
ping_only_hosts_r9704.diff (application/octet-stream — 51.4 KB)

Show replies by date

Mike Stupalov

3 Mar 3 Mar

3:07 a.m.

Ohh my dear..

Which really benefit to use Observium for ping_only hosts?

Pls show (screenshots) how this devices displayed on your install.

Colin Stubbs via observium wrote on 03/03/2019 07:43:

...

For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:

Add/remove/rename ping only hosts via CLI

Add/remove ping only hosts via webUI

View/interact ping only hosts via webUI - SNMP specific features/menus etc are hidden while skip SNMP is enabled

Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events

Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

Location override - poller/discovery doesn't seem to perform geocoding and whatever else is happening there

Things that could be improved:

Unix Agent poller module etc is enabled by default for all hosts, for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.

Totally untested:

Use of autodiscovery SNMP skip - should work in theory, unsure if those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

-- Mike Stupalov Observium Limited, http://observium.org

Colin Stubbs

5:39 a.m.

Hi Mike,

They're in the JIRA ticket, but sure, here's some inserted to this email.

I'm aware of Adam/your preference to reject or simply ignore the regular requests for this kind of feature, and the confusion it causes new users considering Observium for use in their organisations when they find out they can't do the most basic of monitoring to something using Observium.

A great many of your users would gain a lot of value from being able to add a few ping only hosts.

Why build, manage and integrate (to other tools) another ping based monitoring system when all you need is to monitor a handful of devices with ping only; simply because those devices fail to offer SNMP for some reason?

Anything that does have a working SNMP service can and should be monitored with SNMP to gain greater insight into what's happening on it... but not everything does.

No one in their right mind should attempt to use Observium as a system which monitors massive numbers of DNS names or IP's with ping only, there are better systems out there for that, but there are so many situations in a which an Observium install can and should be able to monitor some devices ping only.

I've associated the ticket with the two previous JIRA requests for this feature; which Adam rejected.

They were just feature requests as I recall; no code proposed. Totally understandable you guys didn't want to work on the feature when it's not really what you feel Observium is for.

Rejecting this now when you have just been gifted working code, that modifies things fairly minimally, and simply leverages the ping information that's already occurring to each and every device (that ping is not disabled for), and uses an existing method to mark devices so that further SNMP based polling does not occur... would make no sense.

That said; I totally understand if you want me to fix/add/improve things before considering it further - simply communicate what's necessary and I'll do that.

So, for anyone else on the list, who wants the feature; now is your best chance to add your voice and explain why this kind of feature is of use to you. Because you're probably not going to get it otherwise.

We've been using this internally since early 2017... I just got around to updating Observium to the latest and greatest (reasons, terrible terrible reasons...), and had to update my patch.

But, I'd really prefer not to have to maintain a patch, and I definitely don't want to have to publish it so others can use it and cause Observium Ltd further dramas because of it.

How many hosts do we monitor ping only?

Six. A grand total of six. Everything else has SNMP in some form; and we want as much info as we can extract from them via it - which is why we love Observium. Nothing else compares for SNMP based monitoring IMHO.

Most of those ping only devices are next hop service provider devices just outside our network, for which they won't give us any kind of SNMP access. Ping'ing them offers a small (very small!) degree of insight into link latency/loss from us to them (manual comparison I must admit), and/or potential device problems for the thing that's currently responding for that IP.

Hey look... a perfect use case for this feature...

The others are devices that we use but which simply don't offer SNMP because vendor X didn't bother to add net-snmp to their standard build for CentOS/Ubuntu/whatever weird and wonderful Linux distro they built them on; when they really should have. Don't care much just need to know they're on the network in some way and want a long track record of response time to them.

Hey look... a perfect use case for this feature...

Not surprisingly we don't want to have to operate and maintain a separate system for six IP's. Alerting can be flappy because Observium/fping is only sending a single ICMP echo - we're OK with that and don't even really want more pings which would lead to average values anyway - and we've worked around flaps with delays and automatic suppression of duplicate alerts in our alert management and escalation tool.

The six are monitored by IP; the DNS hostname test below was simply to double check there was no issue with this and a DNS name. Just a hostname that I've never managed to forget and use a lot to ping test connectivity out.

Also, I just realised that the "Skip ping" option is no longer automatically hidden in device settings when "Skip SNMP" is checked; I'll add an updated diff in JIRA shortly after I look at the JS for that again.

[image: Capture.PNG]

[image: observium_ping_only_screens3.PNG]

[image: observium_ping_only_screens4.png]

[image: observium_ping_only_screens5.PNG]

On Sun, 3 Mar 2019 at 18:07, Mike Stupalov mike@observium.org wrote:

...

Ohh my dear..

Which really benefit to use Observium for ping_only hosts?

Pls show (screenshots) how this devices displayed on your install.

Colin Stubbs via observium wrote on 03/03/2019 07:43:

For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:

Add/remove/rename ping only hosts via CLI

Add/remove ping only hosts via webUI

View/interact ping only hosts via webUI - SNMP specific

features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping

will trigger alerts for host down/recovery events

Shifting a previously SNMP contactable host to ping only by ticking

skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

Location override - poller/discovery doesn't seem to perform

geocoding and whatever else is happening there

Things that could be improved:

Unix Agent poller module etc is enabled by default for all hosts,

for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.

Totally untested:

Use of autodiscovery SNMP skip - should work in theory, unsure if

those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only

...
ping_only_hosts.diff

[root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com

observium mailing listobservium@observium.orghttp://postman.memetic.org/cgi-bin/mailman/listinfo/observium

-- Mike Stupalov Observium Limited, http://observium.org

Aaron Finney

13 Feb 13 Feb

3:48 p.m.

This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.

Add me to the chorus of those who would benefit from this feature:

1. We want to use Observium as *the* platform for our corporate IT team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request: 1. Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency

I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.

So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.

Aaron

On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:

...

For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:

Add/remove/rename ping only hosts via CLI

Add/remove ping only hosts via webUI

View/interact ping only hosts via webUI - SNMP specific

features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping

will trigger alerts for host down/recovery events

Shifting a previously SNMP contactable host to ping only by ticking

skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

Location override - poller/discovery doesn't seem to perform

geocoding and whatever else is happening there

Things that could be improved:

Unix Agent poller module etc is enabled by default for all hosts,

for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.

Totally untested:

Use of autodiscovery SNMP skip - should work in theory, unsure if

those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only

...
ping_only_hosts.diff

[root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

-- *Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

Colin Stubbs

6:30 p.m.

Yeah, that's somewhat common when Adam doesn't like something.

Or possibly it was another developer who is now enrolled in the same attitude/culture.

However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.

They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.

This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.

We're yet to test and use it properly, but it's looking like it's going to be a good move.

There still does not appear to be any documentation yet though, from my quick check right now.

So you'll likely need to read the code to understand how to use it at present.

JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22

I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.

e.g. using check_fping to monitor a router by IP,

[image: obs_probe_menu.png]

[image: obs_probes.png]

It appears as though alerting may now exist for this too.

And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.

First scan of that code suggests that they won't be accessible though.

[image: obs_probe_alerts.png]

On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:

...

This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.

Add me to the chorus of those who would benefit from this feature:

We want to use Observium as *the* platform for our corporate IT

team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:

Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency

I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.

So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.

Aaron

On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:

...
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:

Add/remove/rename ping only hosts via CLI

Add/remove ping only hosts via webUI

View/interact ping only hosts via webUI - SNMP specific

features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

Location override - poller/discovery doesn't seem to perform

geocoding and whatever else is happening there

Things that could be improved:

Unix Agent poller module etc is enabled by default for all hosts,

for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.

Totally untested:

Use of autodiscovery SNMP skip - should work in theory, unsure if

those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

Aaron Finney

7:18 p.m.

That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using the Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or migrating to another system like LibreNMS).

Thanks,

Aaron

On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs cstubbs@gmail.com wrote:

...

Yeah, that's somewhat common when Adam doesn't like something.

Or possibly it was another developer who is now enrolled in the same attitude/culture.

However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.

They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.

This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.

We're yet to test and use it properly, but it's looking like it's going to be a good move.

There still does not appear to be any documentation yet though, from my quick check right now.

So you'll likely need to read the code to understand how to use it at present.

JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22

I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.

e.g. using check_fping to monitor a router by IP,

[image: obs_probe_menu.png]

[image: obs_probes.png]

It appears as though alerting may now exist for this too.

And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.

First scan of that code suggests that they won't be accessible though.

[image: obs_probe_alerts.png]

On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:

...
This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.

Add me to the chorus of those who would benefit from this feature:

We want to use Observium as *the* platform for our corporate IT

team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:

Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency

I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.

So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.

Aaron

On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:

...
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:

Add/remove/rename ping only hosts via CLI

Add/remove ping only hosts via webUI

View/interact ping only hosts via webUI - SNMP specific

features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

Location override - poller/discovery doesn't seem to perform

geocoding and whatever else is happening there

Things that could be improved:

Unix Agent poller module etc is enabled by default for all hosts,

for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.

Totally untested:

Use of autodiscovery SNMP skip - should work in theory, unsure if

those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

-- *Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

Adam Armstrong

14 Feb 14 Feb

1:58 a.m.

This bit of code was written specifically for a large user (hence no doc yet) right around the time Colin was having a tantrum over having his patch rejected. :)

At the time it was intended to allow monitoring of extra-Devices devices, though I had intended to add a simplified non-Devices device list system, rather than squeezing all of the checks into other devices, before it was announced as the solution to that issue.

This hasn't been completed yet (it'd be a new alerting mode, like syslog, so a bit more involved compared to just a list and a foreach), but I guess it's still usable to put the checks on the observium host itself with overridden hostnames.

Adam.

⁣Sent from BlueMail

On 14 Feb 2020, 00:18, at 00:18, Aaron Finney via observium observium@observium.org wrote:

...

That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using the Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or migrating to another system like LibreNMS).

Thanks,

Aaron

On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs cstubbs@gmail.com wrote:

...
Yeah, that's somewhat common when Adam doesn't like something.

Or possibly it was another developer who is now enrolled in the same attitude/culture.

However, while I am critical of how requests like this are handled,

and

...
customer engagement (or the lack thereof) more generally; they have

started

...
to add a better way of handling this kind of situation recently.

They've added a feature to utilise Nagios probes, and while you do

need to

...
associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments

you want

...
to the probe.

This allows you to monitor other hosts in your network using methods

and

...
protocols that would otherwise not be supported by Observium's core

code

...
without major modifications. It also means you can easily develop

your own

...
probes to meet more bespoke requirements quite easily.

We're yet to test and use it properly, but it's looking like it's

going to

...
be a good move.

There still does not appear to be any documentation yet though, from

my

...
quick check right now.

So you'll likely need to read the code to understand how to use it at present.

JIRA tickets are likely your other best source of info, I had to log

one

...
to fix adding a second probe awhile ago:

https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22

...
I'm guessing it's in Professional only at this point, but I havn't

checked

...
the community code so perhaps it's there too.

e.g. using check_fping to monitor a router by IP,

[image: obs_probe_menu.png]

[image: obs_probes.png]

It appears as though alerting may now exist for this too.

And contrary to screenshot below, from the look of the code the

metrics

...
seem to be based on KVP's that the nagios plugin returns, which makes

sense.

...
First scan of that code suggests that they won't be accessible

though.

...
[image: obs_probe_alerts.png]

On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com

wrote:

...
...
This issue appears to have been deleted from Jira. It also appears

that

...
...
multiple people are asking for this feature, which does exist on

other

...
...
platforms, and are all bring told that it's a stupid request.

Add me to the chorus of those who would benefit from this feature:

We want to use Observium as *the* platform for our corporate

IT

...
...
team to see the status of our environment, and to generate

appropriate

...
...
alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a

rotating group

...
...
of MSP-provided technicians can easily ramp up to and work with.

We believe

...
...
Observium fits this use case perfectly. 3. There is a logical flow to this request:

Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN

with

...
...
  that we want to monitor for basic reachability, and alert if
it's down or

...
...
  latency is high
  4. The VPN tunnels between us and our SaaS partners terminate
in

...
...
  Google Cloud, so IPSLA is not an option
  5. Therefore, the logical conclusion is that if we want to
receive

...
...
  alerts when an endpoint on the far side of a VPN tunnel is not
reachable,

...
...
  we need it to be a device in Observium that uses ping only to
determine

...
...
  up/down/latency
I appreciate that people have a wide range of approaches to this
issue,

...
...
including third-party applications running via cron with sendmail

for

...
...
alerts, but this is antithetical to how our IT landscape is

changing. We

...
...
just moved > 16k physical servers worth of compute, 7PB of storage,

and

...
...
multiple terabits of connectivity to GCP, all within a six-month

window.

...
...
Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which

includes

...
...
corporate IT systems. Everything will migrate to managed services

wherever

...
...
possible, operated by contracted L1 and L2 technicians. Everything

lives as

...
...
Terraform code in github; in fact, nobody even has the permissions

to

...
...
instantiate resources manually.

So at the risk of causing the maintainers to dig their heels in

more, I

...
...
would ask you to reconsider your position on this topic, and either

vet the

...
...
original poster's contributed patch, or consider adding a "no snmp"

flag in

...
...
a way that you're comfortable maintaining it.

Aaron

On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:

...
For the benefit of anyone on the list who doesn't use JIRA... and

also

...
...
...
so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have

hosts

...
...
...
that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:

Add/remove/rename ping only hosts via CLI

Add/remove ping only hosts via webUI

View/interact ping only hosts via webUI - SNMP specific

features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type

equals

...
...
...
ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and

remain

...
...
...
available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

Location override - poller/discovery doesn't seem to perform

geocoding and whatever else is happening there

Things that could be improved:

Unix Agent poller module etc is enabled by default for all

hosts,

...
...
...
for ping only hosts, perhaps it should be disabled by default?

Will improve

...
...
...
performance by reducing the number of processes that hang while

the 10s

...
...
...
default connect timeout happens.

Totally untested:

Use of autodiscovery SNMP skip - should work in theory,

unsure if

...
...
...
those parts of the patch should actually be used though. Some

people out

...
...
...
there may actually want to add anything that does respond to

ping and can

...
...
...
be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

Aaron Finney

12:54 p.m.

I have this working using "status equals alert" as the condition; is it possible to use the output or message of the probe for the condition instead? I.e. I'd love to use something like

message regexp CRITICAL(?=.*10.10.1.1)+

to match the specific test for the alert. This way, you could have multiple check_fping checks under the local machine and alert different groups depending on which test fails.

Thanks!

Aaron

On Thu, Feb 13, 2020 at 4:18 PM Aaron Finney aaron.finney@openx.com wrote:

...

That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using the Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or migrating to another system like LibreNMS).

Thanks,

Aaron

On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs cstubbs@gmail.com wrote:

...
Yeah, that's somewhat common when Adam doesn't like something.

Or possibly it was another developer who is now enrolled in the same attitude/culture.

However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.

They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.

This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.

We're yet to test and use it properly, but it's looking like it's going to be a good move.

There still does not appear to be any documentation yet though, from my quick check right now.

So you'll likely need to read the code to understand how to use it at present.

JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22

I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.

e.g. using check_fping to monitor a router by IP,

[image: obs_probe_menu.png]

[image: obs_probes.png]

It appears as though alerting may now exist for this too.

And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.

First scan of that code suggests that they won't be accessible though.

[image: obs_probe_alerts.png]

On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:

...
This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.

Add me to the chorus of those who would benefit from this feature:

We want to use Observium as *the* platform for our corporate IT

team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:

Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency

I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.

So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.

Aaron

On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:

...
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:

Add/remove/rename ping only hosts via CLI

Add/remove ping only hosts via webUI

View/interact ping only hosts via webUI - SNMP specific

features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

Location override - poller/discovery doesn't seem to perform

geocoding and whatever else is happening there

Things that could be improved:

Unix Agent poller module etc is enabled by default for all

hosts, for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.

Totally untested:

Use of autodiscovery SNMP skip - should work in theory, unsure

if those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

-- *Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

Adam Armstrong

4:21 p.m.

I can add that tomorrow.

Adam.

⁣Sent from BlueMail

On 14 Feb 2020, 17:54, at 17:54, Aaron Finney via observium observium@observium.org wrote:

...

I have this working using "status equals alert" as the condition; is it possible to use the output or message of the probe for the condition instead? I.e. I'd love to use something like

message regexp CRITICAL(?=.*10.10.1.1)+

to match the specific test for the alert. This way, you could have multiple check_fping checks under the local machine and alert different groups depending on which test fails.

Thanks!

Aaron

On Thu, Feb 13, 2020 at 4:18 PM Aaron Finney aaron.finney@openx.com wrote:

...
That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using

the

...
Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or

migrating

...
to another system like LibreNMS).

Thanks,

Aaron

On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs cstubbs@gmail.com

wrote:

...
...
Yeah, that's somewhat common when Adam doesn't like something.

Or possibly it was another developer who is now enrolled in the same attitude/culture.

However, while I am critical of how requests like this are handled,

and

...
...
customer engagement (or the lack thereof) more generally; they have

started

...
...
to add a better way of handling this kind of situation recently.

They've added a feature to utilise Nagios probes, and while you do

need

...
...
to associate the probe config to an existing host in Observium for

alert

...
...
association purposes, you can override and pass whatever arguments

you want

...
...
to the probe.

This allows you to monitor other hosts in your network using methods

and

...
...
protocols that would otherwise not be supported by Observium's core

code

...
...
without major modifications. It also means you can easily develop

your own

...
...
probes to meet more bespoke requirements quite easily.

We're yet to test and use it properly, but it's looking like it's

going

...
...
to be a good move.

There still does not appear to be any documentation yet though, from

my

...
...
quick check right now.

So you'll likely need to read the code to understand how to use it

at

...
...
present.

JIRA tickets are likely your other best source of info, I had to log

one

...
...
to fix adding a second probe awhile ago:

https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22

...
...
I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.

e.g. using check_fping to monitor a router by IP,

[image: obs_probe_menu.png]

[image: obs_probes.png]

It appears as though alerting may now exist for this too.

And contrary to screenshot below, from the look of the code the

metrics

...
...
seem to be based on KVP's that the nagios plugin returns, which

makes sense.

...
...
First scan of that code suggests that they won't be accessible

though.

...
...
[image: obs_probe_alerts.png]

On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:

...
This issue appears to have been deleted from Jira. It also appears

that

...
...
...
multiple people are asking for this feature, which does exist on

other

...
...
...
platforms, and are all bring told that it's a stupid request.

Add me to the chorus of those who would benefit from this feature:

We want to use Observium as *the* platform for our corporate

IT

...
...
...
team to see the status of our environment, and to generate

appropriate

...
...
...
alerts 2. We have aggressively moved away from FTEs for our corporate

IT

...
...
...
staff, and need to provide a simplified environment that a

rotating group

...
...
...
of MSP-provided technicians can easily ramp up to and work with.

We believe

...
...
...
Observium fits this use case perfectly. 3. There is a logical flow to this request:

Observium is our sole monitoring and alerting platform for

our

...
...
...
  entire corporate office infrastructure
  2. Observium can only alert on metrics/events for devices
  3. We have multiple SaaS partners we connect with over VPN
with

...
...
...
  that we want to monitor for basic reachability, and alert if
it's down or

...
...
...
  latency is high
  4. The VPN tunnels between us and our SaaS partners terminate
in

...
...
...
  Google Cloud, so IPSLA is not an option
  5. Therefore, the logical conclusion is that if we want to
  receive alerts when an endpoint on the far side of a VPN
tunnel is not

...
...
...
  reachable, we need it to be a device in Observium that uses
ping only to

...
...
...
  determine up/down/latency
I appreciate that people have a wide range of approaches to this
issue,

...
...
...
including third-party applications running via cron with sendmail

for

...
...
...
alerts, but this is antithetical to how our IT landscape is

changing. We

...
...
...
just moved > 16k physical servers worth of compute, 7PB of storage,

and

...
...
...
multiple terabits of connectivity to GCP, all within a six-month

window.

...
...
...
Now that our production environment is migrated and our data

centers

...
...
...
decommissioned, we are cleaning up the rest of the mess, which

includes

...
...
...
corporate IT systems. Everything will migrate to managed services

wherever

...
...
...
possible, operated by contracted L1 and L2 technicians. Everything

lives as

...
...
...
Terraform code in github; in fact, nobody even has the permissions

to

...
...
...
instantiate resources manually.

So at the risk of causing the maintainers to dig their heels in

more, I

...
...
...
would ask you to reconsider your position on this topic, and either

vet the

...
...
...
original poster's contributed patch, or consider adding a "no snmp"

flag in

...
...
...
a way that you're comfortable maintaining it.

Aaron

On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:

...
For the benefit of anyone on the list who doesn't use JIRA... and

also

...
...
...
...
so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip

device

...
...
...
...
attribute, similar to OBS_PING_SKIP and ping_skip, in order to

have hosts

...
...
...
...
that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled

(untested).

...
...
...
...
Tested:

Add/remove/rename ping only hosts via CLI

Add/remove ping only hosts via webUI

View/interact ping only hosts via webUI - SNMP specific

features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type

equals

...
...
...
...
ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and

remain

...
...
...
...
available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

Location override - poller/discovery doesn't seem to perform

geocoding and whatever else is happening there

Things that could be improved:

Unix Agent poller module etc is enabled by default for all

hosts, for ping only hosts, perhaps it should be disabled by

default? Will

...
...
...
...
improve performance by reducing the number of processes that

hang while the

...
...
...
...
10s default connect timeout happens.

Totally untested:

Use of autodiscovery SNMP skip - should work in theory,

unsure

...
...
...
...
if those parts of the patch should actually be used though.

Some people out

...
...
...
...
there may actually want to add anything that does respond to

ping and can

...
...
...
...
be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep

-v

...
...
...
...
^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

Aaron Finney

4:36 p.m.

Awesome, that would be super useful for us. Thanks!

On Fri, Feb 14, 2020 at 1:21 PM Adam Armstrong via observium < observium@observium.org> wrote:

...

I can add that tomorrow.

Adam.

Sent from BlueMail http://www.bluemail.me/r?b=15774 On 14 Feb 2020, at 17:54, Aaron Finney via observium < observium@observium.org> wrote:

...
I have this working using "status equals alert" as the condition; is it possible to use the output or message of the probe for the condition instead? I.e. I'd love to use something like

message regexp CRITICAL(?=.*10.10.1.1)+

to match the specific test for the alert. This way, you could have multiple check_fping checks under the local machine and alert different groups depending on which test fails.

Thanks!

Aaron

On Thu, Feb 13, 2020 at 4:18 PM Aaron Finney aaron.finney@openx.com wrote:

...
That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using the Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or migrating to another system like LibreNMS).

Thanks,

Aaron

On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs cstubbs@gmail.com wrote:

...
Yeah, that's somewhat common when Adam doesn't like something.

Or possibly it was another developer who is now enrolled in the same attitude/culture.

However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.

They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.

This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.

We're yet to test and use it properly, but it's looking like it's going to be a good move.

There still does not appear to be any documentation yet though, from my quick check right now.

So you'll likely need to read the code to understand how to use it at present.

JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22

I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.

e.g. using check_fping to monitor a router by IP,

[image: obs_probe_menu.png]

[image: obs_probes.png]

It appears as though alerting may now exist for this too.

And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.

First scan of that code suggests that they won't be accessible though.

[image: obs_probe_alerts.png]

On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:

...
This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.

Add me to the chorus of those who would benefit from this feature:

We want to use Observium as *the* platform for our corporate IT

team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:

Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency

I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.

So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.

Aaron

On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:

...
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:

Add/remove/rename ping only hosts via CLI

Add/remove ping only hosts via webUI

View/interact ping only hosts via webUI - SNMP specific

features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

Location override - poller/discovery doesn't seem to perform

geocoding and whatever else is happening there

Things that could be improved:

Unix Agent poller module etc is enabled by default for all

hosts, for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.

Totally untested:

Use of autodiscovery SNMP skip - should work in theory, unsure

if those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

-- *Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

adama＠memetic.org

15 Feb 15 Feb

4:58 a.m.

Should be available in r10282.

Adam.

From: observium observium-bounces@observium.org On Behalf Of Aaron Finney via observium Sent: 14 February 2020 21:37 To: Observium observium@observium.org Cc: Aaron Finney aaron.finney@openx.com; Adam Armstrong adama@memetic.org Subject: Re: [Observium] OBS-2925 - Ping Only Hosts / OBS_SNMP_SKIP flag / snmp_skip attribute

Awesome, that would be super useful for us. Thanks!

On Fri, Feb 14, 2020 at 1:21 PM Adam Armstrong via observium <observium@observium.org mailto:observium@observium.org > wrote:

I can add that tomorrow.

Adam.

Sent from BlueMail http://www.bluemail.me/r?b=15774

On 14 Feb 2020, at 17:54, Aaron Finney via observium <observium@observium.org mailto:observium@observium.org > wrote:

I have this working using "status equals alert" as the condition; is it possible to use the output or message of the probe for the condition instead? I.e. I'd love to use something like

message regexp CRITICAL(?=.*10.10.1.1)+

to match the specific test for the alert. This way, you could have multiple check_fping checks under the local machine and alert different groups depending on which test fails.

Thanks!

Aaron

On Thu, Feb 13, 2020 at 4:18 PM Aaron Finney <aaron.finney@openx.com mailto:aaron.finney@openx.com > wrote:

Thanks,

Aaron

On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs <cstubbs@gmail.com mailto:cstubbs@gmail.com > wrote:

Yeah, that's somewhat common when Adam doesn't like something.

Or possibly it was another developer who is now enrolled in the same attitude/culture.

We're yet to test and use it properly, but it's looking like it's going to be a good move.

There still does not appear to be any documentation yet though, from my quick check right now.

So you'll likely need to read the code to understand how to use it at present.

JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22

I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.

e.g. using check_fping to monitor a router by IP,

It appears as though alerting may now exist for this too.

And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.

First scan of that code suggests that they won't be accessible though.

On Fri, 14 Feb 2020 at 06:48, Aaron Finney <aaron.finney@openx.com mailto:aaron.finney@openx.com > wrote:

Add me to the chorus of those who would benefit from this feature:

1. Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency

Aaron

On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium <observium@observium.org mailto:observium@observium.org > wrote:

For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:

1. Add/remove/rename ping only hosts via CLI

2. Add/remove ping only hosts via webUI

3. View/interact ping only hosts via webUI - SNMP specific features/menus etc are hidden while skip SNMP is enabled

4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events

5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

1. Location override - poller/discovery doesn't seem to perform geocoding and whatever else is happening there

Things that could be improved:

Totally untested:

Patch generated from recent trunk, touches files as below,

-Colin

Email: cstubbs @ gmail . com

_______________________________________________ observium mailing list observium@observium.org mailto:observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

-- Aaron Finney Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com mailto:aaron.finney@openx.com -- Aaron Finney Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com mailto:aaron.finney@openx.com -- Aaron Finney Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com mailto:aaron.finney@openx.com _____ observium mailing list observium@observium.org mailto:observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium _______________________________________________ observium mailing list observium@observium.org mailto:observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium -- Aaron Finney Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com mailto:aaron.finney@openx.com

Aaron Finney

17 Feb 17 Feb

2:17 p.m.

Tested, works perfectly. Many thanks for the guidance (Colin) and the fix (Adam). Just one big, happy, dysfunctional family. :D

On Sat, Feb 15, 2020 at 1:58 AM Adam Armstrong via observium < observium@observium.org> wrote:

...

Should be available in r10282.

Adam.

*From:* observium observium-bounces@observium.org *On Behalf Of *Aaron Finney via observium *Sent:* 14 February 2020 21:37 *To:* Observium observium@observium.org *Cc:* Aaron Finney aaron.finney@openx.com; Adam Armstrong < adama@memetic.org> *Subject:* Re: [Observium] OBS-2925 - Ping Only Hosts / OBS_SNMP_SKIP flag / snmp_skip attribute

Awesome, that would be super useful for us. Thanks!

On Fri, Feb 14, 2020 at 1:21 PM Adam Armstrong via observium < observium@observium.org> wrote:

I can add that tomorrow.

Adam.

Sent from BlueMail http://www.bluemail.me/r?b=15774

On 14 Feb 2020, at 17:54, Aaron Finney via observium < observium@observium.org> wrote:

I have this working using "status equals alert" as the condition; is it possible to use the output or message of the probe for the condition instead? I.e. I'd love to use something like

message regexp CRITICAL(?=.*10.10.1.1)+

to match the specific test for the alert. This way, you could have multiple check_fping checks under the local machine and alert different groups depending on which test fails.

Thanks!

Aaron

On Thu, Feb 13, 2020 at 4:18 PM Aaron Finney aaron.finney@openx.com wrote:

That's a good callout; I'm using probes for LDAP and SSH checks with alerting, I'll see if I can leverage this for the same purpose, using the Observium instance itself as the host device for the probes. Still a bummer, but better than the alternatives (a separate system, or migrating to another system like LibreNMS).

Thanks,

Aaron

On Thu, Feb 13, 2020 at 3:31 PM Colin Stubbs cstubbs@gmail.com wrote:

Yeah, that's somewhat common when Adam doesn't like something.

Or possibly it was another developer who is now enrolled in the same attitude/culture.

However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.

They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.

This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.

We're yet to test and use it properly, but it's looking like it's going to be a good move.

There still does not appear to be any documentation yet though, from my quick check right now.

So you'll likely need to read the code to understand how to use it at present.

JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22

I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.

e.g. using check_fping to monitor a router by IP,

It appears as though alerting may now exist for this too.

And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.

First scan of that code suggests that they won't be accessible though.

On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:

This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.

Add me to the chorus of those who would benefit from this feature:

We want to use Observium as *the* platform for our corporate IT

team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:

Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency

I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.

So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.

Aaron

On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:

For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:
Add/remove/rename ping only hosts via CLI
Add/remove ping only hosts via webUI
View/interact ping only hosts via webUI - SNMP specific
features/menus etc are hidden while skip SNMP is enabled
Alerting - device_status equals 0 && device_status_type equals
ping - will trigger alerts for host down/recovery events
Shifting a previously SNMP contactable host to ping only by
ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:
Location override - poller/discovery doesn't seem to perform
geocoding and whatever else is happening there

Things that could be improved:
Unix Agent poller module etc is enabled by default for all hosts,
for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.

Totally untested:
Use of autodiscovery SNMP skip - should work in theory, unsure if
those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only

...
ping_only_hosts.diff

[root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com

observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

-- *Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

Adam Armstrong

14 Feb 14 Feb

1:44 a.m.

You ignored my reply on that ticket explaining how it'd need to be done if we were to accept a patch for it, so...

Adam.

⁣Sent from BlueMail

On 13 Feb 2020, 23:31, at 23:31, Colin Stubbs via observium observium@observium.org wrote:

...

Yeah, that's somewhat common when Adam doesn't like something.

Or possibly it was another developer who is now enrolled in the same attitude/culture.

However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.

They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.

This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.

We're yet to test and use it properly, but it's looking like it's going to be a good move.

There still does not appear to be any documentation yet though, from my quick check right now.

So you'll likely need to read the code to understand how to use it at present.

JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22

I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.

e.g. using check_fping to monitor a router by IP,

[image: obs_probe_menu.png]

[image: obs_probes.png]

It appears as though alerting may now exist for this too.

And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.

First scan of that code suggests that they won't be accessible though.

[image: obs_probe_alerts.png]

On Fri, 14 Feb 2020 at 06:48, Aaron Finney aaron.finney@openx.com wrote:

...
This issue appears to have been deleted from Jira. It also appears

that

...
multiple people are asking for this feature, which does exist on

other

...
platforms, and are all bring told that it's a stupid request.

Add me to the chorus of those who would benefit from this feature:

We want to use Observium as *the* platform for our corporate IT

team to see the status of our environment, and to generate

appropriate

...
alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a

rotating group

...
of MSP-provided technicians can easily ramp up to and work with.

We believe

...
Observium fits this use case perfectly. 3. There is a logical flow to this request:

Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if

it's down or

...
  latency is high
  4. The VPN tunnels between us and our SaaS partners terminate
in

...
  Google Cloud, so IPSLA is not an option
  5. Therefore, the logical conclusion is that if we want to
receive

...
  alerts when an endpoint on the far side of a VPN tunnel is not
reachable,

...
  we need it to be a device in Observium that uses ping only to
determine

...
  up/down/latency
I appreciate that people have a wide range of approaches to this
issue,

...
including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing.

We

...
just moved > 16k physical servers worth of compute, 7PB of storage,

and

...
multiple terabits of connectivity to GCP, all within a six-month

window.

...
Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which

includes

...
corporate IT systems. Everything will migrate to managed services

wherever

...
possible, operated by contracted L1 and L2 technicians. Everything

lives as

...
Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.

So at the risk of causing the maintainers to dig their heels in more,

I

...
would ask you to reconsider your position on this topic, and either

vet the

...
original poster's contributed patch, or consider adding a "no snmp"

flag in

...
a way that you're comfortable maintaining it.

Aaron

On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:

...
For the benefit of anyone on the list who doesn't use JIRA... and

also so

...
...
that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have

hosts

...
...
that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:

Add/remove/rename ping only hosts via CLI

Add/remove ping only hosts via webUI

View/interact ping only hosts via webUI - SNMP specific

features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and

remain

...
...
available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

Location override - poller/discovery doesn't seem to perform

geocoding and whatever else is happening there

Things that could be improved:

Unix Agent poller module etc is enabled by default for all

hosts,

...
...
for ping only hosts, perhaps it should be disabled by default?

Will improve

...
...
performance by reducing the number of processes that hang while

the 10s

...
...
default connect timeout happens.

Totally untested:

Use of autodiscovery SNMP skip - should work in theory, unsure

if

...
...
those parts of the patch should actually be used though. Some

people out

...
...
there may actually want to add anything that does respond to ping

and can

...
...
be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

Colin Stubbs

1:59 a.m.

No, I read that, but I'm not part of the development team nor can I be.

You should be taking that kind of prototype and feedback from a customer, defining how it should be really done, and delegating that to the development team.

If the JIRA ticket has actually been deleted, and now I've logged in a checked, it definitely appears to have been; that was clearly your choice to remove evidence of the request. It didn't need to be deleted, simply closed as fixed or rejected.

Hell it could even just be left as unresolved, another feature some annoying customer keeps asking you for, like all the others in there.

A good reference point when in future someone finally decides that it might be a feature worthy of developing.

Or, a ticket that someone can respond to and say "Hey, for the original reporter and anyone reading who wants this, did you know we've developed this feature that addresses this kind of problem but in a better way." as with the Nagios probes code that's now in there.

This is what I mean by lack of engagement.

Your seem to think your customers are annoying lepers who constantly bother you with demands to make the product do insane things.

Which is not the case... they're reporting legitimate problems that they need to solve. Solving those is valuable.

Observium as a product, and your business behind it, would go bananas if you simply addressed those core problems that people need.

-Colin

On Fri, 14 Feb 2020 at 16:44, Adam Armstrong via observium < observium@observium.org> wrote:

...

You ignored my reply on that ticket explaining how it'd need to be done if we were to accept a patch for it, so...

Adam.

Sent from BlueMail http://www.bluemail.me/r?b=15774 On 13 Feb 2020, at 23:31, Colin Stubbs via observium < observium@observium.org> wrote:

...
Yeah, that's somewhat common when Adam doesn't like something.

Or possibly it was another developer who is now enrolled in the same attitude/culture.

However, while I am critical of how requests like this are handled, and customer engagement (or the lack thereof) more generally; they have started to add a better way of handling this kind of situation recently.

They've added a feature to utilise Nagios probes, and while you do need to associate the probe config to an existing host in Observium for alert association purposes, you can override and pass whatever arguments you want to the probe.

This allows you to monitor other hosts in your network using methods and protocols that would otherwise not be supported by Observium's core code without major modifications. It also means you can easily develop your own probes to meet more bespoke requirements quite easily.

We're yet to test and use it properly, but it's looking like it's going to be a good move.

There still does not appear to be any documentation yet though, from my quick check right now.

So you'll likely need to read the code to understand how to use it at present.

JIRA tickets are likely your other best source of info, I had to log one to fix adding a second probe awhile ago: https://jira.observium.org/browse/OBS-3113?jql=text%20~%20%22probes%22

I'm guessing it's in Professional only at this point, but I havn't checked the community code so perhaps it's there too.

e.g. using check_fping to monitor a router by IP,

[image: obs_probe_menu.png]

[image: obs_probes.png]

It appears as though alerting may now exist for this too.

And contrary to screenshot below, from the look of the code the metrics seem to be based on KVP's that the nagios plugin returns, which makes sense.

First scan of that code suggests that they won't be accessible though.

[image: obs_probe_alerts.png]

On Fri, 14 Feb 2020 at 06:48, Aaron Finney < aaron.finney@openx.com> wrote:

...
This issue appears to have been deleted from Jira. It also appears that multiple people are asking for this feature, which does exist on other platforms, and are all bring told that it's a stupid request.

Add me to the chorus of those who would benefit from this feature:

We want to use Observium as *the* platform for our corporate IT

team to see the status of our environment, and to generate appropriate alerts 2. We have aggressively moved away from FTEs for our corporate IT staff, and need to provide a simplified environment that a rotating group of MSP-provided technicians can easily ramp up to and work with. We believe Observium fits this use case perfectly. 3. There is a logical flow to this request:

Observium is our sole monitoring and alerting platform for our entire corporate office infrastructure 2. Observium can only alert on metrics/events for devices 3. We have multiple SaaS partners we connect with over VPN with that we want to monitor for basic reachability, and alert if it's down or latency is high 4. The VPN tunnels between us and our SaaS partners terminate in Google Cloud, so IPSLA is not an option 5. Therefore, the logical conclusion is that if we want to receive alerts when an endpoint on the far side of a VPN tunnel is not reachable, we need it to be a device in Observium that uses ping only to determine up/down/latency

I appreciate that people have a wide range of approaches to this issue, including third-party applications running via cron with sendmail for alerts, but this is antithetical to how our IT landscape is changing. We just moved > 16k physical servers worth of compute, 7PB of storage, and multiple terabits of connectivity to GCP, all within a six-month window. Now that our production environment is migrated and our data centers decommissioned, we are cleaning up the rest of the mess, which includes corporate IT systems. Everything will migrate to managed services wherever possible, operated by contracted L1 and L2 technicians. Everything lives as Terraform code in github; in fact, nobody even has the permissions to instantiate resources manually.

So at the risk of causing the maintainers to dig their heels in more, I would ask you to reconsider your position on this topic, and either vet the original poster's contributed patch, or consider adding a "no snmp" flag in a way that you're comfortable maintaining it.

Aaron

On Sat, Mar 2, 2019 at 8:44 PM Colin Stubbs via observium < observium@observium.org> wrote:

...
For the benefit of anyone on the list who doesn't use JIRA... and also so that others who support (want/need) this feature can comment.

https://jira.observium.org/browse/OBS-2925

Attached patch defines OBS_SNMP_SKIP flag and uses snmp_skip device attribute, similar to OBS_PING_SKIP and ping_skip, in order to have hosts that are ping only.

Ping only hosts can still have the Observium Unix Agent installed (tested), and other poller modules such as IPMI enabled (untested).

Tested:

Add/remove/rename ping only hosts via CLI

Add/remove ping only hosts via webUI

View/interact ping only hosts via webUI - SNMP specific

features/menus etc are hidden while skip SNMP is enabled 4. Alerting - device_status equals 0 && device_status_type equals ping - will trigger alerts for host down/recovery events 5. Shifting a previously SNMP contactable host to ping only by ticking skip SNMP box - old SNMP graphs/etc are maintained and remain available - remove skip SNMP box and SNMP polling begins again

Things I know kind of don't work right now:

Location override - poller/discovery doesn't seem to perform

geocoding and whatever else is happening there

Things that could be improved:

Unix Agent poller module etc is enabled by default for all

hosts, for ping only hosts, perhaps it should be disabled by default? Will improve performance by reducing the number of processes that hang while the 10s default connect timeout happens.

Totally untested:

Use of autodiscovery SNMP skip - should work in theory, unsure

if those parts of the patch should actually be used though. Some people out there may actually want to add anything that does respond to ping and can be found thru adjacency and routing protocol info etc??

Patch generated from recent trunk, touches files as below,

[root@desktop observium]# diff -r -u observium-trunk root | grep -v ^Only > ping_only_hosts.diff [root@desktop observium]# cd root [root@desktop root]# svn status M add_device.php M html/pages/addhost.inc.php M html/pages/device/edit/device.inc.php M html/pages/device/edit.inc.php M html/pages/device/graphs.inc.php M html/pages/device/perf.inc.php M includes/config-variables.inc.php M includes/defaults.inc.php M includes/definitions.inc.php M includes/discovery/functions.inc.php M includes/functions.inc.php M includes/polling/functions.inc.php M poller.php M rename_device.php [root@desktop root]# svn info | grep ^Revision Revision: 9704 [root@desktop root]# [root@desktop observium]# mv ping_only_hosts.diff ping_only_hosts_r9704.diff

-Colin

Email: cstubbs @ gmail . com _______________________________________________ observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

--

*Aaron Finney*Infrastructure Engineering | OpenX 888 East Walnut Street, 2nd Floor | Pasadena, CA 91101 o: +1 (626) 466-1141 x6035 | aaron.finney@openx.com

observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

2229

Age (days ago)

2580

Last active (days ago)

List overview

Download

13 comments

5 participants

tags (0)

participants (5)

Aaron Finney
Adam Armstrong
adama＠memetic.org
Colin Stubbs
Mike Stupalov