Re: my server freaks out sometimes!
Adam, I hear what you are saying about UDP flooding. There are routers (no filters) in between, but no firewalls. I don't see any indication in the logs that there are UDP flooding issues. Also, why only every couple of months? Why not every night?
Tony Guadagno O +1 585 577 1003 C +1 585 703 6700 E tonyg@guadagnoconsulting.com
From: Adam Armstrong adama@observium.org
Sent: Tuesday, August 22, 2023 12:14 PM
To: Observium observium@lists.observium.org; Tony Guadagno via observium observium@lists.observium.org
Cc: Observium Network Observation System observium@observium.org; Tony Guadagno tonyg@guadagno.org
Subject: Re: [Observium] my server freaks out sometimes!
This is likely a firewall or similar intervening device being unhappy at UDP volume and rate limiting.
It's likely that your discovery run starts at midnight, so increases SNMP UDP traffic.
adam.
Tony Guadagno via observium wrote on 22/08/2023 14:01:
Hi, I am using the pro version, Observium 23.8.12912 (stable) <https://www.observium.org/>. I am having an issue where, over the course of 2 hours or so, all my devices go down and then come back up. Observium claims that it missed pings, but there is no issue with any of the devices. This has happened to me 2 times in the past 3 months, both times starting at midnight. When I look in the event log, I am seeing some odd things on that date: [image: event log screenshot]
First, notice that there are a bunch of SNMP timeouts, which do not occur on any other day. Also, it seems that all the devices have changes. Again, this does not happen on any other night. After the first incident I increased the RAM on the server, thinking that it was constrained, but the poller info does not look bad: [image: poller info screenshot]
I would appreciate help in determining what to look at for the cause of this. Thanks
_______________________________________________
observium mailing list -- observium@lists.observium.org
To unsubscribe send an email to observium-leave@lists.observium.org
What kind of routers? What firewall is between your Observium and those devices?
Is this all being polled over WAN? Are there any VPNs etc. between them? All those kinds of devices might have (even unpublished) limits you are breaking.
We have a similar issue with *some* DrayTek routers (on the client edge), which we poll via WAN, with a site-to-site tunnel used to monitor everything behind them, in some cases 200+ devices. Despite having no rules or filtering in place (besides an allow-all rule from <our observium ip>, repeated for both the WAN and LAN (or what the VPN sees) IPs), we get issues like this about once a week on average. During these times (assuming we didn't *know* the VPN was up and stable), what we see is equivalent to the WAN being up but the VPN having dropped, so there's a brief period, usually 2-3 polling cycles at most, where nothing internal will respond. But here's the kicker: the internal stuff responds to ping (ICMP) but gives no UDP/SNMP response. I have also tested with the few devices which support it, and SNMP over TCP continues to respond.
It's definitely annoying, but IMHO not the kind of issue I would bother Adam etc. with, as it's clearly not Observium at fault here but "something else in the stack", even if it is hard to pin down exactly what! Our simple and ~80% effective workaround is that, for these sites, we don't let alerts trigger until 3 consecutive failures, which on client-edge stuff is generally OK (noting that the client has perfectly working outbound access the entire time!).
Regards, James Tandy TandyUK Servers Limited
Tel: 01903 247 011 Www: http://www.tandyukservers.co.uk Email: support@tandyukservers.co.uk
TandyUK Servers Limited Registered in England and Wales, Company number 8314911 VAT Registered in the UK, number 182 0661 19 Registered Office: Amelia House, Crescent Road, Worthing, BN11 1QR
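The workaround James describes above (hold alerts until 3 consecutive failures) can be sketched in a few lines. This is only a standalone illustration of the idea, not Observium's alerting code or configuration; the names `FailureDebouncer` and `alert_states` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class FailureDebouncer:
    """Report an alert only after `threshold` consecutive failed polls."""
    threshold: int = 3  # the "3 consecutive failures" from the post above
    failures: int = 0   # length of the current run of failed polls

    def record(self, poll_ok: bool) -> bool:
        """Record one poll result; return True while an alert should be active."""
        if poll_ok:
            self.failures = 0  # any successful poll resets the run
        else:
            self.failures += 1
        return self.failures >= self.threshold


def alert_states(results, threshold=3):
    """Return the alert state after each poll in `results` (True = poll succeeded)."""
    debouncer = FailureDebouncer(threshold=threshold)
    return [debouncer.record(ok) for ok in results]


if __name__ == "__main__":
    # Two missed polls followed by a recovery never alert; three in a row do.
    print(alert_states([True, False, False, True, False, False, False]))
```

The trade-off is the one James notes: a short blip of 1-2 polling cycles stays quiet, at the cost of a real outage being reported a couple of cycles later.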
I don't know.
This is a connectivity issue. We fork snmpwalk; if it doesn't get data from the end machine, there's not a lot we can do about that, especially if it's bizarrely intermittent.
It seems like you're having ICMP *and* UDP traffic being dropped, which might make it easier to work out why.
You'd likely never notice this kind of connectivity issue if the connectivity wasn't being relied on by something like Observium, because humans tend to just ignore momentary things.
Is there a large backup operation or some other scheduled thing happening at that time that's saturating the link?
Thanks, adam.
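One rough way to follow up on Adam's question about a scheduled job is to count the timeout events per hour of day and see whether they cluster around one time. The sketch below assumes the event timestamps have already been pulled out as ISO-style strings; that export step, the function name `timeouts_by_hour`, and the sample values are all hypothetical and not taken from the thread's logs.

```python
from collections import Counter
from datetime import datetime


def timeouts_by_hour(timestamps):
    """Count events per hour of day from ISO 8601 timestamp strings."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    # Sort by hour so the result reads like a timeline of the day.
    return dict(sorted(hours.items()))


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = [
        "2023-08-22 00:05:12",
        "2023-08-22 00:40:03",
        "2023-08-22 01:15:47",
        "2023-08-22 13:02:00",
    ]
    print(timeouts_by_hour(sample))
```

If nearly all the timeouts fall into the same hour or two, that window is the place to look for backups, discovery runs, or other scheduled traffic.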
participants (3)
- Adam Armstrong
- James Tandy
- Tony Guadagno