Dear Mike,
I can confirm that I didn't have the issue with NetIron 5.7 and 5.8. It started when I upgraded Observium to the latest stable.
Is that something you can correct on your end (as Adam proposed previously)? Or do I have to go through the painful process of opening a case with BTAC and trying to convince them about the issue?
Best regards.
On 10 August 2017 at 11:29, Mike Stupalov mike@observium.org wrote:
Yah,
I see the trouble..
and the answer (as always for Brocade): I think the issue is in the firmware.
How sensor polling works (after some of our changes for polling speedup):
- fetch the list of all sensors' numeric OIDs from the DB
- try to pre-cache these OIDs with multi-OID snmpget requests, in chunks of 16 OIDs
- in the sensors/status poll process, check whether the OID is cached and use it; if not, try to snmpget that single OID
- process each sensor/status value
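The steps above can be sketched roughly as follows. This is only an illustration in Python, not Observium's actual code: `multi_get` and `single_get` are hypothetical callables standing in for the two kinds of snmpget call, and a timed-out request is modelled as `None`.

```python
from typing import Callable, Dict, Iterator, List, Optional

CHUNK_SIZE = 16  # chunk size mentioned in the message above

# Hypothetical signatures: a multi-OID get returns {oid: value}, or None on timeout.
MultiGet = Callable[[List[str]], Optional[Dict[str, str]]]
SingleGet = Callable[[str], Optional[str]]


def chunked(oids: List[str], size: int = CHUNK_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` OIDs."""
    for start in range(0, len(oids), size):
        yield oids[start:start + size]


def precache(oids: List[str], multi_get: MultiGet) -> Dict[str, str]:
    """Fetch OIDs chunk by chunk; a chunk that times out caches nothing."""
    cache: Dict[str, str] = {}
    for chunk in chunked(oids):
        result = multi_get(chunk)
        if result is not None:
            cache.update(result)
    return cache


def poll_sensor(oid: str, cache: Dict[str, str], single_get: SingleGet) -> Optional[str]:
    """Use the cached value if present, otherwise fall back to a single get."""
    if oid in cache:
        return cache[oid]
    return single_get(oid)
```

With this structure, every OID in a chunk that timed out is requested again one by one, so a device that stops answering pays the timeout once per failed chunk and then once more per uncached sensor.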
OK, what happened on your devices:
- the first 3 chunks (16 OIDs each) were cached normally, 48 OIDs in total
- none of the other chunks were fetched, because snmpget exited with a timeout error:
CMD[/usr/bin/snmpget -v2c -c *** -Pu -OQUsn -M /opt/observium/mibs 'udp':'xxx':'161' .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.3 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.4 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.65 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.66 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.67 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.68 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.129 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.130 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.131 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.132 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.133 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.134 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.135 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.136 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.139 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.141]
CMD EXITCODE[1] CMD RUNTIME[6.021s] STDOUT[
] STDERR[ Timeout: No Response from udp:xxx:161. ]
- and later, in the sensor poll process, these OIDs also can't be fetched, with the same timeout error:
CMD[/usr/bin/snmpget -v2c -c *** -Pu -OQv -m SNMPv2-MIB -M /opt/observium/mibs/rfc:/opt/observium/mibs/net-snmp 'udp':'xxx':'161' .1.3.6.1.4.1.1991.1.1.3.3.6.1.4.1]
CMD EXITCODE[1] CMD RUNTIME[6.0108s] STDOUT[
] STDERR[ Timeout: No Response from udp:xxx:161. ] SNMP STATUS[FALSE] SNMP ERROR[#1002 - Request timeout]
I could not find any of my own devices with the same firmware (5.8.x), but on 5.7.x and older I do not see this issue.
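To gauge how much of a poll is lost to these timeouts, the debug output can be summarised with a small script. The sketch below is Python and assumes only the `CMD EXITCODE[...] CMD RUNTIME[...s]` and `Timeout: No Response` strings visible in the output quoted above; the function name `timeout_cost` is made up for this example, and other Observium versions may format the debug lines differently.

```python
import re
from typing import Tuple

# Matches the result header seen in the debug output quoted above.
RESULT_RE = re.compile(r"CMD EXITCODE\[(\d+)\]\s+CMD RUNTIME\[([0-9.]+)s\]")
TIMEOUT_MARK = "Timeout: No Response"


def timeout_cost(debug_text: str) -> Tuple[int, float]:
    """Return (number of timed-out commands, total seconds spent in them)."""
    matches = list(RESULT_RE.finditer(debug_text))
    count, seconds = 0, 0.0
    for i, match in enumerate(matches):
        # The text between this header and the next one holds STDOUT/STDERR.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(debug_text)
        if TIMEOUT_MARK in debug_text[match.end():end]:
            count += 1
            seconds += float(match.group(2))
    return count, seconds
```

Feeding it the saved output of a full poller debug run gives the number of timed-out snmpget calls and the time they consumed.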
Youssef BENGELLOUN - ZAHR wrote:
Dear Mike,
Because of mail size limitation, I sent those to Adam directly.
I will PM you links and credentials to download them from a secure platform.
Best regards.
P.S.: Regarding mail signatures, there is nothing I can do, as they are controlled by our corporate IT.
On 09/08/2017 22:23, "observium on behalf of Mike Stupalov" <observium-bounces@observium.org on behalf of mike@observium.org> wrote:
I don't see debug output for this device (in the current thread).
Please attach debug for device polling:
./poller.php -d -h <device>
Please keep the full output (not just some parts).
P.S. Is it possible to use a mail signature without embedded images? They complicate searching for mails with attachments.
Youssef BENGELLOUN - ZAHR wrote:
Dear Adam,
Does release 8709 have anything to do with the issue?
r8709 | adama | 2017-08-06 22:44:53 +0200 (Sun, 06 Aug 2017) | 2 lines
[IMPROVE] Improve sensor status table entry
Best regards.
On 20 Jul 2017, at 11:58, Adam Armstrong <adama@memetic.org> wrote:
We will probably put in an OS definition toggle to disable that on some OSes.
I'll see when Mike gets back from the dark depths of central Russia :)
Adam.
Sent from BlueMail http://www.bluemail.me/r?b=10066
*Youssef BENGELLOUN - ZAHR* - Consultant Expert Prodware France T : +33 979 999 000 - F : +33 988 814 001 - ybzahr@prodware.fr
mailto:ybzahr@prodware.fr
Web : prodware.fr http://www.prodware.fr http://twitter.com/Prodware/ http://www.facebook.com/Prodware/ https://www.linkedin.com/company/prodwarefrance https://www.youtube.com/c/ProdwareFrance http://www.viadeo.com/fr/company/prodware http://www.prodware.fr/social-network/
On 20 Jul 2017, at 11:48, Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr> wrote:
Is that something you can correct?
Best regards.
On 20 Jul 2017, at 12:46, Adam Armstrong <adama@memetic.org> wrote:
Yeah. We now use a single get request to pull all sensor data. It seems these queries are timing out on that device.
So it's just sitting idle whilst timing out, no CPU impact.
Adam.
On 20 Jul 2017, at 06:13, Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr> wrote:
OK, good to hear that. Is this behavior related to something you changed between the last two merges for the stable train code?
Best regards.
On 20 Jul 2017, at 00:37, Adam Armstrong <adama@memetic.org> wrote:
Seems to be getting some timeouts when trying to do snmpgets. We might need to limit this on the Brocades somehow.
Adam.
On 19 Jul 2017, at 12:46, Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr> wrote:
I'm sending it directly to you as I'm hitting the mail size limitation.
Best regards.
*From:* observium <observium-bounces@observium.org> on behalf of Adam Armstrong <adama@memetic.org>
*Sent:* Wednesday 19 July 2017 13:29
*To:* 'Observium'
*Cc:* 'Observium'
*Subject:* Re: [Observium] Important polling time increase since upgrade to r8697
Lol, that isn't a full poller debug, it's just the device being marked down and then the poller exiting :D
Adam.
On 19 Jul 2017, at 10:48, Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr> wrote:
Hi,
Apparently, the attachments were too big. Retrying with only one of the devices.
Best regards.
*From:* Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr>
*Date:* Wednesday 19 July 2017 at 11:38
*To:* "observium@observium.org" <observium@observium.org>
*Subject:* RE: [Observium] Important polling time increase since upgrade to r8697
Dear Tom,
Please find a full poller debug for a CER-RT and an MLXe.
Best regards.
*From:* observium <observium-bounces@observium.org> on behalf of Tom Laermans <tom.laermans@powersource.cx>
*Sent:* Wednesday 19 July 2017 11:28
*To:* observium@observium.org
*Subject:* Re: [Observium] Important polling time increase since upgrade to r8697
Hi Youssef,
Run the poller with -d on one of the devices and send the output. If you add "-m sensors" it'll only poll the sensors.
With regard to the SNMP errors, it does look like plenty of them are from a long time ago. Seems we don't have a housekeeping module for that yet...
Tom
On 07/19/2017 11:22 AM, Youssef BENGELLOUN - ZAHR wrote:
Dear Adam,
How can I do that?
Also, I'm seeing tons of SNMP errors under Performance Data > MIBs. See the attached file for a CER-RT example.
Best regards.
*From:* observium <observium-bounces@observium.org> on behalf of Adam Armstrong <adama@memetic.org>
*Reply-To:* Observium <observium@observium.org>
*Date:* Wednesday 19 July 2017 at 11:11
*To:* 'Observium' <observium@observium.org>
*Subject:* Re: [Observium] Important polling time increase since upgrade to r8697
How odd. Seems to be something that only affects one SNMP stack. Can you see what method it's using to poll these sensors?
Adam.
On 19 Jul 2017, at 08:08, Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr> wrote:
Dear Adam,
Previously installed was r8580 on the stable train code. No configuration changes or version upgrades happened in the last weeks. We only upgraded Observium.
As for devices whose polling time has increased, I have clearly identified 4 devices acting as MPLS PEs:
· 3 Brocade MLXe routers sitting in different cities: Amsterdam, Paris 1, Paris 2
· 1 Brocade CER-RT sitting in Frankfurt
Looking at the poller module stats, I can clearly see that the BGP and sensors modules are the most time-consuming. Sensors is even more time-consuming than BGP now. For example, in Frankfurt:
When did sensors become so time-consuming?
Best regards.
*From:* observium <observium-bounces@observium.org> on behalf of Adam Armstrong <adama@observium.org>
*Reply-To:* Observium <observium@observium.org>
*Date:* Tuesday 18 July 2017 at 19:39
*To:* "observium@observium.org" <observium@observium.org>
*Subject:* Re: [Observium] Important polling time increase since upgrade to r8697
Can you see any particular device which has increased? What was the previous version?
adam.
On 18/07/2017 08:23:37, Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr> wrote:
Dear Observium community,
I don't know if I'm the only one noticing this, but polling cycle time has roughly doubled since I upgraded to r8697 yesterday around 9 AM:
I used to be around the 120-150 s average, now it's up to 270-300 s. As you can see, no devices or pollers were added.
From a system perspective, the box running Observium is fine (CPU / mem wise).
Best regards.
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
-- Mike Stupalov Observium Limited, http://observium.org