Yes, I see the problem.
And the answer (as always with Brocade): I think the issue is in the firmware.
How sensor polling works (after some of our changes to speed up polling):
1. Fetch the list of all sensor numeric OIDs from the DB.
2. Try to pre-cache these OIDs with multi-OID snmpget requests, in chunks of 16 OIDs.
3. In the sensors/status poll process, check whether the OID is cached and use the cached value; if not, try to snmpget that single OID.
4. Process each sensor/status value.
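The steps above can be sketched roughly as follows. This is only an illustration in Python, not the actual Observium code (which is PHP): the names chunked, precache and poll_value are hypothetical, and snmp_get stands for any callable that takes a list of OIDs and returns a dict of OID to value, raising TimeoutError when the device does not answer.

```python
from typing import Callable, Dict, Iterable, List, Optional

CHUNK_SIZE = 16  # chunk size described in step 2

# Hypothetical signature: takes a list of OIDs, returns {oid: value},
# raises TimeoutError when the device does not respond.
SnmpGet = Callable[[List[str]], Dict[str, str]]


def chunked(oids: List[str], size: int = CHUNK_SIZE) -> Iterable[List[str]]:
    """Split the OID list into consecutive chunks of at most `size` OIDs."""
    for start in range(0, len(oids), size):
        yield oids[start:start + size]


def precache(oids: List[str], snmp_get: SnmpGet) -> Dict[str, str]:
    """Step 2: try to fetch all OIDs chunk by chunk; skip chunks that time out."""
    cache: Dict[str, str] = {}
    for chunk in chunked(oids):
        try:
            cache.update(snmp_get(chunk))
        except TimeoutError:
            # The whole chunk is lost; its OIDs are retried one by one in step 3.
            continue
    return cache


def poll_value(oid: str, cache: Dict[str, str], snmp_get: SnmpGet) -> Optional[str]:
    """Step 3: use the cached value if present, otherwise try a single-OID get."""
    if oid in cache:
        return cache[oid]
    try:
        return snmp_get([oid]).get(oid)
    except TimeoutError:
        return None  # sensor value unavailable in this poll
```

Note that in this sketch every OID from a timed-out chunk costs one more request in step 3, so a device that stops answering pays the timeout once per chunk and then again once per sensor.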
OK, here is what happened on your devices:
1. The first 3 chunks (16 OIDs each) were cached normally, 48 OIDs in total.
2. All other chunks were not fetched, because snmpget exited with a timeout error:
CMD[/usr/bin/snmpget -v2c -c *** -Pu -OQUsn -M /opt/observium/mibs 'udp':'xxx':'161' .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.3 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.4 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.65 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.66 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.67 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.68 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.129 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.130 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.131 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.132 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.133 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.134 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.135 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.136 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.139 .1.3.6.1.4.1.1991.1.1.3.3.6.1.2.141]
CMD EXITCODE[1] CMD RUNTIME[6.021s] STDOUT[
] STDERR[ Timeout: No Response from udp:xxx:161. ]
3. Later, in the sensor poll process, these OIDs also can't be fetched, with the same timeout error:
CMD[/usr/bin/snmpget -v2c -c *** -Pu -OQv -m SNMPv2-MIB -M /opt/observium/mibs/rfc:/opt/observium/mibs/net-snmp 'udp':'xxx':'161' .1.3.6.1.4.1.1991.1.1.3.3.6.1.4.1]
CMD EXITCODE[1] CMD RUNTIME[6.0108s] STDOUT[
] STDERR[ Timeout: No Response from udp:xxx:161. ] SNMP STATUS[FALSE] SNMP ERROR[#1002 - Request timeout]
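To get a rough idea of how much polling time these failed requests cost, the debug output can be scanned for the "CMD EXITCODE[...] CMD RUNTIME[...]" fragments shown above. The following Python sketch is only an assumption about how one might do that; failed_cmd_cost is a hypothetical helper, and it counts every command with a non-zero exit code, not only timeouts.

```python
import re
from typing import Tuple

# Matches fragments such as "CMD EXITCODE[1] CMD RUNTIME[6.021s]" from the debug output.
CMD_RE = re.compile(r"CMD EXITCODE\[(\d+)\]\s+CMD RUNTIME\[([\d.]+)s\]")


def failed_cmd_cost(debug_text: str) -> Tuple[int, float]:
    """Return (number of failed commands, total seconds spent in them)."""
    failed = 0
    wasted = 0.0
    for exit_code, runtime in CMD_RE.findall(debug_text):
        if exit_code != "0":  # any non-zero exit code counts as a failure
            failed += 1
            wasted += float(runtime)
    return failed, wasted
```

With about 6 seconds lost per failed snmpget, as in the two examples above, even a few dozen such requests add minutes to a single device poll.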
I couldn't find any devices of my own with the same firmware (5.8.x), but on 5.7.x and older I don't see this issue.
Youssef BENGELLOUN - ZAHR wrote:
Dear Mike,
Because of the mail size limitation, I sent those to Adam directly.
I will PM you links and credentials to download them from a secure platform.
Best regards.
P.S : Regarding mail signatures, nothing I can do as it’s controlled by our corp IT.
On 09/08/2017 22:23, "observium on behalf of Mike Stupalov" <observium-bounces@observium.org on behalf of mike@observium.org> wrote:
I don't see debug output for this device (in the current thread).
Please attach debug output for device polling:

./poller.php -d -h <device>

Please keep the full output (not just some parts).

P.S. Is it possible to use a mail signature without nested images? They complicate searching for mails with attachments.

Youssef BENGELLOUN - ZAHR wrote:
> Dear Adam,
>
> Does release 8709 have anything to do with the issue?
>
> r8709 | adama | 2017-08-06 22:44:53 +0200 (Sun, 06 Aug 2017) | 2 lines
> [IMPROVE] Improve sensor status table entry
>
> Best regards.
>
> On 20 Jul 2017, at 11:58, Adam Armstrong <adama@memetic.org> wrote:
>
>> We will probably put in an OS definition toggle to disable that on
>> some OSes.
>>
>> I'll see when Mike gets back from the dark depths of central Russia :)
>>
>> Adam.
>>
>> Sent from BlueMail <http://www.bluemail.me/r?b=10066>
>
> Youssef BENGELLOUN - ZAHR - Consultant Expert
> Prodware France
> T : +33 979 999 000 - F : +33 988 814 001 - ybzahr@prodware.fr
> Web : prodware.fr <http://www.prodware.fr>
>
>> On 20 Jul 2017, at 11:48, Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr> wrote:
>>
>> Is that something you can correct?
>>
>> Best regards.
>>
>> On 20 Jul 2017, at 12:46, Adam Armstrong <adama@memetic.org> wrote:
>>
>>> Yeah. We now use a single get request to pull all sensor data.
>>> It seems these queries are timing out on that device.
>>>
>>> So it's just sitting idle whilst timing out, no CPU impact.
>>>
>>> Adam.
>>>
>>> On 20 Jul 2017, at 06:13, Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr> wrote:
>>>
>>> Ok, good to hear that.
>>>
>>> Is this behavior related to something you changed between
>>> the last two merges for stable train code?
>>>
>>> Best regards.
>>>
>>> On 20 Jul 2017, at 00:37, Adam Armstrong <adama@memetic.org> wrote:
>>>
>>>> Seems to be getting some timeouts when trying to do
>>>> snmpgets. We might need to limit this on the Brocades somehow.
>>>>
>>>> Adam.
>>>>
>>>> On 19 Jul 2017, at 12:46, Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr> wrote:
>>>>
>>>> I'm sending it directly to you as I'm hitting the mail size
>>>> limitation.
>>>>
>>>> Best regards.
>>>>
>>>> ------------------------------------------------------------------------
>>>> From: observium <observium-bounces@observium.org> on behalf of Adam Armstrong <adama@memetic.org>
>>>> Sent: Wednesday, 19 July 2017 13:29
>>>> To: 'Observium'
>>>> Cc: 'Observium'
>>>> Subject: Re: [Observium] Important polling time increase since upgrade to r8697
>>>>
>>>> Lol, that isn't a full poller debug, it's just the
>>>> device being marked down and then the poller exiting
>>>>
>>>> :D
>>>>
>>>> Adam.
>>>>
>>>> On 19 Jul 2017, at 10:48, Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Apparently, attachments were too big. Retrying with
>>>> only one of the devices.
>>>>
>>>> Best regards.
>>>>
>>>> From: Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr>
>>>> Date: Wednesday, 19 July 2017 at 11:38
>>>> To: "observium@observium.org" <observium@observium.org>
>>>> Subject: RE: [Observium] Important polling time increase since upgrade to r8697
>>>>
>>>> Dear Tom,
>>>>
>>>> Please find a full poller debug for a CER-RT and an
>>>> MLXe.
>>>>
>>>> Best regards.
>>>>
>>>> ------------------------------------------------------------------------
>>>> From: observium <observium-bounces@observium.org> on behalf of Tom Laermans <tom.laermans@powersource.cx>
>>>> Sent: Wednesday, 19 July 2017 11:28
>>>> To: observium@observium.org
>>>> Subject: Re: [Observium] Important polling time increase since upgrade to r8697
>>>>
>>>> Hi Youssef,
>>>>
>>>> Run the poller with -d on one of the devices and
>>>> send the output. If you add "-m sensors" it'll only
>>>> poll the sensors.
>>>>
>>>> With regards to the SNMP errors, it does look like
>>>> plenty of them are from a long time ago.
>>>>
>>>> Seems we don't have a housekeeping module for that
>>>> yet...
>>>>
>>>> Tom
>>>>
>>>> On 07/19/2017 11:22 AM, Youssef BENGELLOUN - ZAHR wrote:
>>>>
>>>> Dear Adam,
>>>>
>>>> How can I do that?
>>>>
>>>> Also, I'm seeing tons of SNMP errors under
>>>> Performance Data > MIBs. See attached file for
>>>> a CER-RT example.
>>>>
>>>> Best regards.
>>>>
>>>> From: observium <observium-bounces@observium.org> on behalf of Adam Armstrong <adama@memetic.org>
>>>> Reply-To: Observium <observium@observium.org>
>>>> Date: Wednesday, 19 July 2017 at 11:11
>>>> To: 'Observium' <observium@observium.org>
>>>> Subject: Re: [Observium] Important polling time increase since upgrade to r8697
>>>>
>>>> How odd.
>>>>
>>>> Seems to be something that only affects one
>>>> SNMP stack.
>>>>
>>>> Can you see what method it's using to poll
>>>> these sensors?
>>>>
>>>> Adam.
>>>>
>>>> On 19 Jul 2017, at 08:08, Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr> wrote:
>>>>
>>>> Dear Adam,
>>>>
>>>> The previously installed version was r8580 on stable
>>>> train code. No configuration changes or
>>>> version upgrades happened in the last
>>>> weeks. We only upgraded Observium.
>>>>
>>>> As for devices whose polling time has
>>>> increased, I have clearly identified 4
>>>> devices acting as MPLS PEs:
>>>>
>>>> · 3 Brocade MLXe routers sitting in
>>>> different cities:
>>>>
>>>> Amsterdam: [image]
>>>>
>>>> Paris 1: [image]
>>>>
>>>> Paris 2: [image]
>>>>
>>>> · 1 Brocade CER-RT sitting in
>>>> Frankfurt: [image]
>>>>
>>>> Looking at poller module stats, I can
>>>> clearly see BGP and sensors modules are the
>>>> most time consuming. Sensors is even more
>>>> time consuming than BGP now. For example,
>>>> in Frankfurt: [image]
>>>>
>>>> When did sensors become so time consuming?
>>>>
>>>> Best regards.
>>>>
>>>> From: observium <observium-bounces@observium.org> on behalf of Adam Armstrong <adama@observium.org>
>>>> Reply-To: Observium <observium@observium.org>
>>>> Date: Tuesday, 18 July 2017 at 19:39
>>>> To: "observium@observium.org" <observium@observium.org>
>>>> Subject: Re: [Observium] Important polling time increase since upgrade to r8697
>>>>
>>>> Can you see any particular device which has
>>>> increased?
>>>>
>>>> What was the previous version?
>>>>
>>>> adam.
>>>>
>>>> On 18/07/2017 08:23:37, Youssef BENGELLOUN - ZAHR <ybzahr@prodware.fr> wrote:
>>>>
>>>> Dear Observium community,
>>>>
>>>> I don't know if I'm the only one
>>>> noticing this, but polling cycle time
>>>> has increased twofold since I upgraded
>>>> to r8697 yesterday around 9AM: [image]
>>>>
>>>> I used to be around the 120-150s
>>>> average, now it's up to 270-300s.
>>>>
>>>> As you can see, no devices or pollers were
>>>> added. From a system perspective, the
>>>> box running Observium is fine (CPU / mem
>>>> wise).
>>>>
>>>> Best regards.
>>>>
>>>> ------------------------------------------------------------------------
>>>> observium mailing list
>>>> observium@observium.org
>>>> http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

--
Mike Stupalov
Observium Limited, http://observium.org
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium