Change alert checkers/method to make alerts more flexible?
Me again. :) I love Observium, but the alerts are a bit of a pain point for me. Admittedly I don't have full, in-depth knowledge of the alerting mechanism but I was wondering why the check conditions and entities are separate. If it were possible (without completely rewriting the whole alert code) to have the entities be a part of the check conditions then that would, by design, allow us to set multiple check conditions on different entities within the same checker. See attached, I realize this doesn't work but this might help explain what I'm wanting. Since there's no entity for active used memory it's very hard to create actionable alerts for high memory utilization until it's too late and the host(s) start to swap. If the check conditions had metric, condition, value, and entity together then I could create such an alert by checking for physical memory used >= 95% *and* cached memory <= 10% (and... if so desired).
On the other hand, if the metrics included the entity then there would be no need to have the entities at all? mempool_physical_perc, mempool_cached_perc, etc.
I have high hopes for Observium and I'm trying to do what I can to tweak it to do what we need, but my PHP experience is limited. If I can manage to find a resource internally here at work to help out then hopefully we can contribute something back to the community if it's deemed worthy.
Just some thoughts. Thanks for all the time and effort being put into Observium!
-Hogan
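To make the proposal above concrete, here is a minimal Python sketch of a checker in which every condition names its own entity. It is only an illustration: Observium itself is written in PHP, and the `Condition` class, the `evaluate` function, and the entity and metric names are hypothetical, not Observium's alerting code or schema.

```python
import operator
from dataclasses import dataclass

# Comparison operators a condition may use.
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


@dataclass(frozen=True)
class Condition:
    entity: str   # hypothetical entity name, e.g. "physical"
    metric: str   # hypothetical metric name, e.g. "mempool_perc"
    op: str       # one of the keys in OPS
    value: float  # threshold to compare against


def evaluate(conditions, readings):
    """Return True only if every condition holds (logical AND).

    readings maps (entity, metric) to the latest polled value.
    A missing reading counts as not matching.
    """
    for cond in conditions:
        current = readings.get((cond.entity, cond.metric))
        if current is None or not OPS[cond.op](current, cond.value):
            return False
    return True


# The alert described in the post: physical used >= 95% and cached <= 10%.
high_memory_pressure = [
    Condition("physical", "mempool_perc", ">=", 95),
    Condition("cached", "mempool_perc", "<=", 10),
]
```

Because each condition carries its entity, one checker can combine readings from several memory pools, which is the flexibility the post asks for.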
Hi,
This is a longstanding problem with collecting memory usage from UNIX hosts.
I think the best solution is to add an additional memory pool to UNIX devices which returns a "processed" value with actual used only. We did that with processors on UNIX hosts, where we return an "average" value, which is the only one you should really alert on.
Adam.
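As a rough illustration of the "processed" pool suggested above, the following Python sketch derives an "actual used" percentage by excluding buffers and cache. It is a sketch under stated assumptions, not Observium's implementation: the function name is hypothetical, and feeding it from the UCD-SNMP-MIB objects memTotalReal, memAvailReal, memBuffer and memCached is an assumption about where the inputs would come from.

```python
def actual_used_percent(total_kb, free_kb, buffers_kb, cached_kb):
    """Percentage of physical memory in use, excluding buffers and cache.

    All arguments are in kB. The mapping to SNMP objects (for example
    memTotalReal, memAvailReal, memBuffer, memCached) is an assumption.
    """
    if total_kb <= 0:
        raise ValueError("total_kb must be positive")
    used_kb = total_kb - free_kb - buffers_kb - cached_kb
    # Clamp at zero in case the counters were sampled at slightly
    # different moments and briefly add up to more than the total.
    return 100.0 * max(used_kb, 0) / total_kb
```

With hypothetical readings of 16,000,000 kB total, 1,000,000 kB free, 500,000 kB buffers and 6,500,000 kB cached, the function returns 50.0, whereas a naive "total minus free" calculation would report about 94% used.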
It may have been an issue at some point, but there is data via SNMP that can be used without any processing:
UCD-SNMP-MIB::memTotalFree.0 = INTEGER: 12890168 kB
This appears to be very close to (used - (free + free cache)), and if this could be alerted on by setting a minimum total free % then that would be huge. I've looked at this value on a few of my systems, some with high memory pressure and some with low, and it is a much better representation of memory pressure than the processed "used" value that shows up in the Observium graphs.
In fact, on a host that has swapped out 1.7 GB, the Observium graphs that process the memory info say that only 9% of the memory is used, but obviously there is more going on given the amount of swap. On that host the SNMP memTotalFree number is roughly 6.9%, which falls in line with the point where Linux can start to swap based on default sysctl settings. It also matches the guideline I pass on to our teams: keep 10% of memory free for OS overhead, otherwise the systems are likely to swap and hurt performance.
If I could alert on "TotalFree < 10%" then that would likely catch hosts that are about to start swapping. At the very least it would be more meaningful than alerting on the current Physical used % or on swap usage, since by that point the alert is reactive rather than proactive.
-Hogan
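The "TotalFree < 10%" check described above could be sketched in Python roughly as follows. The function names are hypothetical and the choice of denominator is an assumption: the post does not say which total the percentage is taken against, and the UCD-SNMP-MIB description of memTotalFree says the value typically covers both real memory and swap, so the matching total may need to include swap as well.

```python
def total_free_percent(total_free_kb, total_kb):
    """Return free memory as a percentage of the chosen total (both in kB)."""
    if total_kb <= 0:
        raise ValueError("total_kb must be positive")
    return 100.0 * total_free_kb / total_kb


def total_free_alert(total_free_kb, total_kb, min_free_percent=10.0):
    """Return True when the free percentage drops below the minimum."""
    return total_free_percent(total_free_kb, total_kb) < min_free_percent
```

For example, the 12890168 kB reading against a hypothetical 64 GiB total (67108864 kB) is roughly 19% free and would not alert, while a host at 6.9% free, like the swapping host in the post, would.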
On Tuesday, February 24, 2015 1:07 PM, Adam Armstrong adama@memetic.org wrote:
Hi,
This is a longstanding problem with collecting memory usage from UNIX hosts.
I think the best solution is to add an additional memory pool to UNIX devices which returns a "processed" value with actual used only. We did that with processors on UNIX hosts, where we return an "average" value, which is the only one you should really alert on.
Adam.
participants (2)
- Adam Armstrong
- Hogan Whittall