I'll be damned.

That was probably pure luck, but it alerted faster than expected.

Please see this screenshot of the alert mails, because they look a tad odd: https://www.dropbox.com/s/5ynr67ior7at3e3/mailalert.png. The condition lasted for 5 minutes, or actually just a brief moment within the polling period, but look at the duration in the recovery email. Really odd. By the way, are there templates where we can configure the layout of the email alerts (subject, what is displayed in the body, and so on)?

Here is a screenshot of the graph created from the data collected by the script in my previous mail: https://www.dropbox.com/s/nw47m75tj601gmh/csv_graph.png
So yes, there are spikes in the data. I guess they don't show up in Data Center Expert because it runs some filter on the data or simply ignores random peaks.

The value of the peaks in the data collected so far is ALWAYS 20480, which feels significant: it's constant and looks like some sort of maximum value for the unit. I'm not sure whether there are other, higher peaks as well, because what I see in Observium is higher, unless the scale of the axis there is off. Here's another screenshot: https://www.dropbox.com/s/0whwzf7wf34j8uu/observium_graph.png

But it feels like an APC issue then, so I guess I'll have the pleasure of once again dealing with their support. =/

Cheers and thanks for the help, Adam.


--
Henrik Cednert
cto | compositor

Filmlance International | www.filmlance.se
mobile [ + 46 (0)704 71 89 54 ]
skype  [ cednert ]

From: observium <observium-bounces@observium.org> on behalf of Henrik Cednert <henrik.cednert@filmlance.se>
Reply-To: Observium Network Observation System <observium@observium.org>
Date: Friday 10 June 2016 at 15:27
To: Observium Network Observation System <observium@observium.org>
Subject: Re: [Observium] Cool Output of APC in-row cooler, random enormous spikes.

Thanks Adam

I'm now polling it every second and storing the values in a log file for investigation, using this script:

#!/bin/bash
# Poll the sensor OID once per second and append "time,value" lines to a log file.
while true; do
        sleep 1
        timestamp=$(date +"%T")
        sensor_value=$(/usr/bin/snmpbulkwalk -v2c -c 'public' -Pu -OQUs -m PowerNet-MIB -M /opt/observium/mibs/rfc:/opt/observium/mibs/net-snmp:/opt/observium/mibs/apc 'udp':'acrc02':'161' coolingUnitStatusAnalogValue.1.10)
        # Pass the values as arguments so a '%' or '\' in the SNMP output is not read as format syntax.
        printf '%s,%s\n' "$timestamp" "$sensor_value" >>/home/neo/Documents/acrc02_coolingUnitStatusAnalogValue.1.10.txt
done
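
To pull the spikes back out of that log later, something like the following might do. It is only a sketch: it assumes each log line looks like "15:27:01,coolingUnitStatusAnalogValue.1.10 = 20480" (which is what -OQUs should produce), and the function name find_spikes and the threshold of 10000 are made up for illustration.

```shell
#!/bin/bash
# find_spikes THRESHOLD: read log lines on stdin and print "time,value"
# for every sample whose value is above THRESHOLD.
# Assumed line format: HH:MM:SS,<oid name> = <number>
# Lines that do not match (timeouts, empty reads) are skipped.
find_spikes() {
    local threshold="$1"
    awk -F' = ' -v t="$threshold" '
        NF == 2 && $2 + 0 > t {
            split($1, parts, ",")      # parts[1] is the timestamp
            print parts[1] "," $2
        }
    '
}

# Usage (hypothetical threshold):
#   find_spikes 10000 < /home/neo/Documents/acrc02_coolingUnitStatusAnalogValue.1.10.txt
```

Piping the result through "cut -d, -f2 | sort -n | uniq -c" would then show how many distinct peak values there are.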


I'll be back to ask for advice when I get the next alert. =)

Cheers and thanks


From: observium <observium-bounces@observium.org> on behalf of Adam Armstrong <adama@memetic.org>
Reply-To: Observium Network Observation System <observium@observium.org>
Date: Friday 10 June 2016 at 14:45
To: "observium@observium.org" <observium@observium.org>
Subject: Re: [Observium] Cool Output of APC in-row cooler, random enormous spikes.

Hi Henrik,

Sensors are all "gauges", so there's no real concept of spikes the way there is for ports (which are counters).

If we're getting a spike, it's either that the device is sending the wrong data, or that the device is sending a number which hasn't been adjusted for the scale the device uses. It'll be the first if the 1.6-1.9 doesn't have any relation to the "real" number, and the latter if it's just off by a few orders of magnitude.
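
As a rough illustration of telling those two cases apart, the sketch below checks whether a spike is close to the normal reading multiplied by a whole power of ten. The function name scale_check, the tolerance of roughly 5%, and the example numbers are all assumptions for illustration, not anything Observium does.

```shell
#!/bin/bash
# scale_check SPIKE NORMAL: report whether SPIKE looks like NORMAL
# scaled by a whole power of ten (a missing scale adjustment) or not.
scale_check() {
    awk -v spike="$1" -v normal="$2" 'BEGIN {
        if (spike <= 0 || normal <= 0) { print "undetermined"; exit }
        e = log(spike / normal) / log(10)                    # log10 of the ratio
        r = (e >= 0) ? int(e + 0.5) : -int(-e + 0.5)         # nearest whole exponent
        d = e - r; if (d < 0) d = -d
        # 0.02 in log10 terms is roughly a 5% tolerance (arbitrary choice)
        if (r != 0 && d < 0.02) print "scale error, 10^" r
        else print "unrelated value"
    }'
}

# Usage with made-up readings:
#   scale_check 1800000 1800
#   scale_check 20480 1700
```

A single comparison proves little; it only becomes meaningful if many spikes land on the same power of ten.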

It's pretty unlikely that we're doing any bad maths: maths is easy, and such bugs would show up much more regularly. These sorts of things are pretty common in many vendors' SNMP implementations. You might be able to see it in action by frequently polling the OID of the sensor in question and logging the value.

adam.


On 10/06/2016 13:40:59, Henrik Cednert (Filmlance) <henrik.cednert@filmlance.se> wrote:

Hello

I'm monitoring two APC in-row coolers. One of them shows some weird 1.6-1.9M watt spikes every now and then, about once a day, without any regularity. I have started to monitor the same device in Schneider Data Center Expert to see whether it's a device issue or a monitoring issue. There are no spikes there, so I think it's something between Observium and this particular unit. The other unit is fine, with no spikes. The only difference is that the load on this one is smaller and at times drops to 0.

I see that there's an option to log spikes in the config file, but I'm not sure I can monitor a sensor with it, since it says port and wants an ID. Can I debug these spikes in some way?

Cheers and thanks




