Heya,
Hm, I had missed the screenshot because I read the original thread on mobile. Sorry :-) Those are indeed the status indicators from the Synology MIB.
Ok so...
Entity type: Status
Device match: os equals synology
Entity match: status_descr match Disk*
Checker condition: status_event notequals ok
Should be good to go.
You can totally ignore the physical class stuff :-)
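As a rough illustration of how those three conditions combine, here is a minimal Python sketch. It is not Observium code: the dictionary keys simply mirror the attribute names above, the function name and sample rows are made up, "alert" as a non-ok event value is an assumption, and it assumes "match" behaves like a shell-style wildcard.

```python
from fnmatch import fnmatchcase

def disk_status_alerts(device, status):
    """Return True when the checker described above would fire (illustrative only)."""
    # Device match: os equals synology
    if device.get("os") != "synology":
        return False
    # Entity match: status_descr match Disk* (assumed to be a shell-style wildcard)
    if not fnmatchcase(status.get("status_descr", ""), "Disk*"):
        return False
    # Checker condition: status_event notequals ok
    return status.get("status_event") != "ok"

nas = {"os": "synology"}
# Hypothetical status rows; "alert" as an event value is an assumption.
samples = [
    {"status_descr": "Disk 11", "status_event": "ok"},
    {"status_descr": "Disk 12", "status_event": "alert"},
    {"status_descr": "Fan 1", "status_event": "alert"},
]
firing = [s["status_descr"] for s in samples if disk_status_alerts(nas, s)]
print(firing)
```

Only the "Disk 12" row satisfies all three rules; the fan is excluded by the entity match even though its event is not ok.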
Tom
On 15/01/2016 23:35, Henrik Cednert (Filmlance) wrote:
Hi Tom
I'm not sure how to display the issue in a clearer way than with the screenshots I've already provided. But let's try.
And yes, the global issue is exactly that: get an alert when one disk dies, or **** up. But hey, let's make an alert to report all disks that have 'status' 'normal', since that's easier to test at the moment...
From my limited understanding of how Observium works under the hood, I thought this would be possible. New picture attached, not sure if it makes it clearer though.
1. When a disk fails, its graph and "normal" label disappear in the UI. The disk is still listed, though.
2. When the disk is replaced with a healthy one, the graph and the status "Normal" are back in the UI.
3. Looking at the graphs of the event, it certainly looks like "0" was logged for the 'status' for part of the failure.
4. The MIBs in use are visible in the image too: SYNOLOGY-DISK-MIB.
5. In the big graph at the bottom, physical class is listed as 'storage', 'event' as ok and 'status' as normal.
What I want is to build an alert group that alerts when 'event' in point 5 above ISN'T ok or 'status' ISN'T normal. And from what I understand it logs 1 for 'normal' and 0 for not normal.
One thing I have noticed is that on the Synology NAS the 'physical class' is 'storage' for individual disks while it's 'hrDeviceDiskStorage' for our DDN NAS, but maybe that's not significant?
Not sure if this made it any clearer?
Cheers and thanks.
--
Henrik Cednert
cto | td | compositor
Filmlance International
Direct + 46 (0)704 71 89 54
From: observium <observium-bounces@observium.org> on behalf of Tom Laermans <tom.laermans@powersource.cx>
Reply-To: Observium Network Observation System <observium@observium.org>
Date: Friday 15 January 2016 at 22:55
To: Observium Network Observation System <observium@observium.org>
Subject: Re: [Observium] [ALERT-CHECKER] Alert to pick up on physical failure of disk.
We're all a bit lost - I don't think anyone 100% understands what you're talking about specifically (the global issue is clear: get an alert when one of your disks **** up).
Does your Synology not support SYNOLOGY-DISK-MIB? If it does, we have a boatload of per-disk status indicators on the main page you can alert on. If it doesn't, you may need to update its OS...
Tom
On 15/01/2016 22:42, Henrik Cednert (Filmlance) wrote:
Sorry. I feel lost... So the fact that it's NOT returning 1 isn't good enough to act on?
--
Henrik Cednert
cto | td | compositor
Filmlance International
Direct + 46 (0)704 71 89 54
From: observium <observium-bounces@observium.org> on behalf of Adam Armstrong <adama@memetic.org>
Reply-To: Observium Network Observation System <observium@observium.org>
Date: Friday 15 January 2016 at 22:38
To: "observium@observium.org" <observium@observium.org>
Subject: Re: [Observium] [ALERT-CHECKER] Alert to pick up on physical failure of disk.
In this case, I'm not sure what the status entry will be returning. You kinda need this information to make sure the check would pick the failure up :)
adam.
On 15/01/2016 22:36:01, Henrik Cednert (Filmlance) <henrik.cednert@filmlance.se> wrote:
I assume it did, since there's a line at 0 on the attached graph. It's not there for the entire failure, only from the start up until the morning I replaced the disk. Since the disk is replaced and the status is back to 1 (ok/normal), I don't think the command will help me/us now. =/
--
Henrik Cednert
cto | td | compositor
Filmlance International
Direct + 46 (0)704 71 89 54
From: observium <observium-bounces@observium.org> on behalf of Adam Armstrong <adama@memetic.org>
Reply-To: Observium Network Observation System <observium@observium.org>
Date: Friday 15 January 2016 at 22:30
To: "observium@observium.org" <observium@observium.org>
Subject: Re: [Observium] [ALERT-CHECKER] Alert to pick up on physical failure of disk.
If it's writing RRDs, it should be calling the alerting code.
You can test this by running the poller in debug mode:
./poller.php -h <host> -m status -d -r
(-r disables rrd writing, so you don't dirty up the rrd files)
adam.
On 15/01/2016 22:24:38, Henrik Cednert (Filmlance) <henrik.cednert@filmlance.se> wrote:
Mkay. =/ So even if the "Status" variable/data/entry/cell (or what it is) in this particular case and for this disk is storing and logging "0" into the database and the rrd files, it's nothing we can alert or react on? =/
--
Henrik Cednert
cto | td | compositor
Filmlance International
Direct + 46 (0)704 71 89 54
From: observium <observium-bounces@observium.org> on behalf of Adam Armstrong <adama@memetic.org>
Reply-To: Observium Network Observation System <observium@observium.org>
Date: Friday 15 January 2016 at 22:13
To: "observium@observium.org" <observium@observium.org>
Subject: Re: [Observium] [ALERT-CHECKER] Alert to pick up on physical failure of disk.
The issue with this is that our alerting code is called during the polling process, so it's only called for entities which exist.
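To make that concrete, here is a toy Python sketch of a poll loop. It is an illustration under assumptions, not Observium's actual poller: the function and entity names are invented. The point is only that a check running inside the per-entity loop never sees an entity the device has stopped reporting.

```python
def poll_and_check(reported_entities, check):
    """Run `check` on each entity the device reported in this poll cycle."""
    alerts = []
    for name, value in reported_entities.items():
        # The check is only reached for entities present in this poll.
        if check(value):
            alerts.append(name)
    return alerts

def not_normal(value):
    # 1 = normal, anything else = not normal (as described in the thread)
    return value != 1

# Poll 1: disk 12 still reports a status, so a bad value is caught.
print(poll_and_check({"Disk 11": 1, "Disk 12": 0}, not_normal))
# Poll 2: disk 12 has vanished from the device's table, so nothing is checked for it.
print(poll_and_check({"Disk 11": 1}, not_normal))
```

In the second call the failed disk produces no alert at all, because there is no row left for the check to run against.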
adam.
On 15/01/2016 22:11:12, Henrik Cednert (Filmlance) <henrik.cednert@filmlance.se> wrote:
But you do log status for it. And I assume everything is and/or could be watchable. So, maybe a bit ignorant, but isn't it "just" a matter of monitoring "status" as seen on my first screenshot? When != 1, alert...? I mean, there's still data in Observium that it could in theory react on.
--
Henrik Cednert
cto | td | compositor
Filmlance International
Direct + 46 (0)704 71 89 54
From: observium <observium-bounces@observium.org> on behalf of Adam Armstrong <adama@memetic.org>
Reply-To: Observium Network Observation System <observium@observium.org>
Date: Friday 15 January 2016 at 22:08
To: "observium@observium.org" <observium@observium.org>
Subject: Re: [Observium] [ALERT-CHECKER] Alert to pick up on physical failure of disk.
This means that the device stops reporting stats for that entity, which is a little... unfortunate.
It's difficult (impossible) to alert on things which are removed from the device upon failure.
adam.
On 15/01/2016 20:22:56, Henrik Cednert (Filmlance) <henrik.cednert@filmlance.se> wrote:
Hi Adam
Thanks. The thing is, I already have that alert and it doesn't pick up on this particular event. Not sure if it's the way the Synology handles it or something else. But when a disk dies, the graphs and everything else just go missing; see disk 12 on the screenshot.
Looking at the Synology UI when a disk is dead, it's pretty much the same. It's not flagged or tagged as failed, just removed from the lists. Pretty stupid, yes I know, but nonetheless I have to deal with it. =/ Hence the q about alerting on status != 1. =)
Cheers
--
Henrik Cednert
cto | td | compositor
Filmlance International
Direct + 46 (0)704 71 89 54
From: observium <observium-bounces@observium.org> on behalf of Adam Armstrong <adama@memetic.org>
Reply-To: Observium Network Observation System <observium@observium.org>
Date: Friday 15 January 2016 at 20:03
To: "observium@observium.org" <observium@observium.org>
Subject: Re: [Observium] [ALERT-CHECKER] Alert to pick up on physical failure of disk.
Hi Henrik,
This is a "status" entity check. You probably just want to create a check for all status, since it's easier to manage.
http://alpha.memetic.org/~adama/snaps/Observium_Dev____Alert_check_-_Google_Chrome_2016-01-15_19.58.56.png
http://alpha.memetic.org/~adama/snaps/Observium_Dev____Alert_check_-_Google_Chrome_2016-01-15_19.59.17.png
adam.
On 15/01/2016 18:31:02, Henrik Cednert (Filmlance) <henrik.cednert@filmlance.se> wrote:
Hi there
I have gotten some hints on IRC but can't really wrap my head around how to set it up. Yeah, I know I'm probably stupid. But I can't find a complete list of all the commands available, and it doesn't complain if I feed it something invalid. So it's really a guessing game. =/
So the checker needs to do something like:
if status of physical_class('storage' or 'hrDeviceDiskStorage') != 1
send alert
Possible? Please give detailed instructions if possible.
Would be sweet if all the different possible combinations of alert checkers were added to the demo instance so one can look there for guidance. =)
Cheers and thanks
--
Henrik Cednert
cto | td | compositor
Filmlance International
Direct + 46 (0)704 71 89 54
_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium