Hi Richard,

 

We are also using some IOS-XR devices.

Do you sometimes see gaps in your traffic graphs at the times when the thresholds are set to NULL or otherwise change?

 

Kind regards

Stadtwerke Norderstedt

 

Denis Klimek

 

Professional Network Engineer

IP-Systemtechnik

 

Tel:        040 / 521 04 – 1049

Mobile:   0151 / 652 219 06

 

dklimek@stadtwerke-norderstedt.de

www.stadtwerke-norderstedt.de

 

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Richard Savage via observium
Sent: Monday, 24 June 2019 10:51
To: Observium
Cc: Richard Savage; Adam Armstrong
Subject: Re: [Observium] Sensor limits creating false alerts

 

This was the event log at the time of the incident:


When it next happens, is it worth me running some commands from the CLI to capture debug output?

Thanks

Richard

On 24/06/2019 09:47, Adam Armstrong via observium wrote:

Thresholds for this MIB come from the device itself. It seems likely that the device returned an incorrect threshold setting for some reason. 

 

Does it always change the threshold to the same value? Does that value match the thresholds of any other similar sensors?

 

We wouldn't autogenerate a threshold where one already exists.

 

adam.

On 2019-06-24 09:39:59, Richard Savage via observium <observium@observium.org> wrote:

Hi Adam

I've managed to reproduce the issue.  You can see from the screenshot that the alert triggered because the threshold value disappeared at 08:24 on 18 June.  The threshold changed to 0V - (∞) 2.76V, which meant that the current value of 3.25 threw an alert.




Later that day, at 14:35 on 18 June, the alert cleared with no action from anyone, and the sensor thresholds returned to the correct values:






This is just one example from one sensor, but at the time I had alerts for pretty much all sensors on the ASR.

Can this please be looked into and fixed?

If you need any more info then please let me know.

Thanks

Richard


On 31/05/2019 10:39, Klimek, Denis via observium wrote:

Hi all,

 

I think it's related to the device NOT returning any values for limits, so Observium creates the limits dynamically based on the current sensor value that the device returned.

When the value changes (e.g. voltage, temperature), we also see a lot of sensor updates in the event logs.
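
To illustrate the hypothesis above, here is a minimal sketch, assuming limits are derived as a fixed percentage margin around the current reading. The function name `derive_limits` and the 10% margin are hypothetical and are not taken from Observium's code; the point is only that value-derived limits would move on every poll in which the reading moves.

```python
def derive_limits(value, margin=0.10):
    """Hypothetical: derive (low, high) limits as +/- margin around the current reading."""
    spread = abs(value) * margin
    return (round(value - spread, 3), round(value + spread, 3))

# Three consecutive polls of a voltage sensor that drifts slightly (made-up readings).
readings = [3.25, 3.27, 3.24]
limits_per_poll = [derive_limits(v) for v in readings]

# Count polls whose derived limits differ from the previous poll's limits;
# each such change would appear as a "Sensor updated (limits)" style event.
changes = sum(1 for prev, cur in zip(limits_per_poll, limits_per_poll[1:]) if prev != cur)
print(limits_per_poll, changes)
```

If limits were derived this way, every small drift in the reading would produce a limits update, which would match the volume of sensor updates in the event log.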

 

Tested with latest 6.3.3 and 6.5.3 on Cisco ASR.

 

Kind regards

Stadtwerke Norderstedt

 

Denis Klimek

 

Professional Network Engineer

IP-Systemtechnik

 

Tel:        040 / 521 04 – 1049

Mobile:   0151 / 652 219 06

 

dklimek@stadtwerke-norderstedt.de

www.stadtwerke-norderstedt.de

 

From: observium [mailto:observium-bounces@observium.org] On Behalf Of Adam Armstrong via observium
Sent: Thursday, 30 May 2019 11:55
To: Richard Savage via observium
Cc: Adam Armstrong
Subject: Re: [Observium] Sensor limits creating false alerts

 

These limits are likely being updated because the device has returned a different set of limits.

 

We'd need to know where these sensors came from to know.

 

adam.

On 2019-05-30 09:07:45, Richard Savage via observium <observium@observium.org> wrote:

From what I can see, the sensor limits are being updated on every poll.  Surely they should be static values that don't need to change?



Thanks

Richard

 

On 29/05/2019 20:22, Richard Savage via observium wrote:

I'll try to get a screenshot of the sensors next time it happens. It shows that the sensor value (the actual voltage) is still there, but the threshold limit isn't there at all and just shows 0 or the infinity symbol, from memory.

 

Richard




On 29 May 2019, at 20:16, Richard Savage via observium <observium@observium.org> wrote:

 

It's a 9006 running 5.3.4 with 2 x A9K-RSP440-SE and some A9K-40GE-L and A9K-16T/8-B line cards.

 

Thanks

 

Richard




On 29 May 2019, at 19:24, Klimek, Denis <DKlimek@Stadtwerke-Norderstedt.de> wrote:

 

Hi Richard,

 

Which ASR9K model do you have, and which IOS-XR version are you running?

 

Regards,

Denis

 

From: observium <observium-bounces@observium.org> On Behalf Of Richard Savage via observium
Sent: Wednesday, 29 May 2019 17:16
To: Observium <observium@observium.org>
Cc: Richard Savage <richard@zananet.com>; Adam Armstrong <adama@memetic.org>
Subject: Re: [Observium] Sensor limits creating false alerts

 

It seems to have become more of an issue on the ASR9Ks in the v19.5.9916 release that we are running.

It's happening more frequently now and is causing lots of false positives.  Once the thresholds have been polled, is it possible to just set them in stone?  I can't see any reason why the thresholds need to change.  For example, a voltage should always be between two values, and those threshold values shouldn't change.  From what I can see, it's the actual limits that disappear and not the values.

Thanks

Richard

 

On 29/05/2019 14:55, Adam Armstrong via observium wrote:

Ahh ok.

 

I assume this is related to unstable polling. These are all from CISCO-ENTITY-SENSOR-MIB, which is pretty heavy, and Nexus has iffy SNMP.

 

I'm not sure how we can work around this. It's hard to know if something is not there forever, or just for now, and storing state on that stuff is pretty messy.
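
One way to picture the "gone forever or just for now" problem is a small amount of per-sensor state. The following is a rough sketch only, not how Observium handles limits: the `LimitState` class, the `update_limit` function, the `max_missing` cutoff of 3 polls, and the 3.6 limit value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LimitState:
    limit_high: Optional[float]  # last limit accepted from the device
    missing_polls: int = 0       # consecutive polls with no limit returned

def update_limit(state: LimitState, polled: Optional[float], max_missing: int = 3) -> LimitState:
    """Keep the stored limit until the device has omitted it max_missing times in a row."""
    if polled is not None:
        # Device returned a limit: accept it and reset the miss counter.
        return LimitState(limit_high=polled, missing_polls=0)
    missing = state.missing_polls + 1
    if missing >= max_missing:
        # Repeated misses: treat the limit as genuinely gone.
        return LimitState(limit_high=None, missing_polls=missing)
    # Transient miss: hold on to the last known limit.
    return LimitState(limit_high=state.limit_high, missing_polls=missing)

# Two missed polls, a good poll, then another miss (hypothetical sequence).
state = LimitState(limit_high=3.6)
for polled in [None, None, 3.6, None]:
    state = update_limit(state, polled)
print(state)
```

The trade-off is the messiness mentioned above: a counter has to be stored per limit, and a limit that really has been removed on the device lingers for a few polls before it is dropped.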

 

Is this a more recent thing, or has it always happened?

 

Disabling faster polling modes on a device like this would probably make it unpollable.

 

adam.

On 2019-05-29 14:52:14, Richard Savage via observium <observium@observium.org> wrote:

Hi Adam

This is mainly seen on Cisco ASR9Ks, but we have also seen it on Cisco Nexus switches.

No, this affects voltage, temperature, and dBm levels on SFPs.

Happy to provide debug if required.




Those are a few examples.  It seems to affect a large number of sensors at any one time.

Thanks

Richard

On 29/05/2019 14:46, Adam Armstrong via observium wrote:

Hi Richard,

 

I assume this is because the OIDs from which the limits come sometimes aren't polled?

 

Is this only a single (os) device? Only a single type of sensor, or multiple? (you should now be able to see the mib/oid on the sensor page)

 

adam.

On 2019-05-28 10:24:00, Richard Savage via observium <observium@observium.org> wrote:

Hi All

Just wondering if anyone has any ideas on this? (Mike / Adam)

Thanks

Richard

 

On 21/05/2019 14:50, Richard Savage via observium wrote:

Hi All


I have found a potential bug with the sensor data output which doesn't seem to be related to any particular version of Observium or device being polled (useful, I know!)

However, it looks like some sensors' limits are regularly being updated to NULL, which is throwing alerts and scaring staff ;-)
Noticed on both 19.3.9774 and 19.4.9840.

e.g. from a device event log:

2019-05-10 12:36:07

Sensor updated (limits): limit_high -> "0", limit_high_warn -> "NULL", limit_low -> "0", limit_low_warn -> "NULL"

 

2019-05-10 13:08:08

Dbm Ethernet1/49 Lane 2 Transceiver Transmit Power above threshold: 0.03 dBm (> 0 dBm)
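
The two log entries above are consistent with a plain comparison against the updated limits. Here is a minimal sketch, assuming a limit of `None` (NULL) means "no limit" while `0` is treated as a real threshold; the function `above_high_limit` is hypothetical and is not Observium's actual check.

```python
from typing import Optional

def above_high_limit(value: float, limit_high: Optional[float]) -> bool:
    """Return True when value exceeds limit_high; a limit of None disables the check."""
    return limit_high is not None and value > limit_high

# Values from the event log: limit_high was set to "0" and the reading was 0.03 dBm.
print(above_high_limit(0.03, 0.0))   # True: 0 is a real limit, so 0.03 > 0 alerts
print(above_high_limit(0.03, None))  # False: a NULL limit would not alert
```

Under that assumption, it is the limit being rewritten to 0 rather than to NULL that produces the false "above threshold" alert.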



A re-discovery of the affected device will fix the issue, until the next time it happens.

Happy to provide any further debug required.

Thanks

Richard





_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

 




