
Good morning. I have a problem I cannot solve. I have two Observium installations, CE and Subscription, on two different servers.
On both servers, all my hosts go down and then come back up 5 minutes later.
When I check, none of the hosts had actually gone down, and the logs show no instability.
I installed another server, running only SNMP on a dedicated link, and even then the hosts go down and come back up within 5 minutes.
What could it be? All three machines are dedicated to Observium; one is a physical machine and another is virtual. The client hosts, which run only SNMP, are physical. All are in three different datacenters. See an example below.
2014-03-10 13:25:02 Machine1 System Device status changed to Up
2014-03-10 13:25:02 Machine2 System Device status changed to Up
2014-03-10 13:20:04 Machine1 System Device status changed to Down (ping)
2014-03-10 13:20:04 Machine2 System Device status changed to Down (ping)
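In the example, each device is marked Down at 13:20:04 and Up again at 13:25:02, 298 seconds later. If the poller runs every 5 minutes (the usual cron setup for Observium, assumed here rather than stated in the thread), that is one failed poll followed by a successful one. A minimal Python sketch that pulls those gaps out of event lines in exactly this format; the parsing rule (date, time, host, then the message) is an assumption based only on the lines shown, and `outage_seconds` is a hypothetical name:

```python
from datetime import datetime

# The four event lines from the log above, newest first as shown there.
EVENTS = """\
2014-03-10 13:25:02 Machine1 System Device status changed to Up
2014-03-10 13:25:02 Machine2 System Device status changed to Up
2014-03-10 13:20:04 Machine1 System Device status changed to Down (ping)
2014-03-10 13:20:04 Machine2 System Device status changed to Down (ping)
"""


def outage_seconds(log_text):
    """Return {host: [outage lengths in seconds]} from Down/Up event lines."""
    events = []
    for line in log_text.strip().splitlines():
        date, clock, host, message = line.split(" ", 3)
        stamp = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
        if "changed to Down" in message:
            events.append((stamp, host, "down"))
        elif "changed to Up" in message:
            events.append((stamp, host, "up"))
    events.sort()  # oldest first, whatever order the log was printed in
    went_down = {}
    outages = {}
    for stamp, host, state in events:
        if state == "down":
            went_down[host] = stamp
        elif host in went_down:
            start = went_down.pop(host)
            outages.setdefault(host, []).append((stamp - start).total_seconds())
    return outages


if __name__ == "__main__":
    for host, gaps in outage_seconds(EVENTS).items():
        print(host, gaps)
```

A gap that is always close to one poller interval points at a single missed check per run rather than a real outage.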

I don't think this is an Observium issue...
If you ping the host for 10 minutes, does it go down in Observium then?
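One way to follow this suggestion while also recording exactly when replies go missing is a small probe loop. A minimal sketch, assuming a Linux host with iputils `ping` (the `-c`/`-W` flags differ on other platforms); `probe_ping` and `watch` are hypothetical helper names, and the failure offsets can then be lined up against the timestamps of Observium's Down events:

```python
import subprocess
import time


def probe_ping(host):
    """Send a single ping; True if a reply came back.

    Assumes Linux iputils ping: -c 1 sends one packet, -W 1 waits up to 1 s.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(probe, duration_s=600.0, interval_s=1.0,
          clock=time.monotonic, sleep=time.sleep):
    """Call probe() repeatedly, sleeping interval_s seconds between calls,
    until duration_s seconds have elapsed.

    Returns the offsets (seconds since start) of every failed probe.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    start = clock()
    failures = []
    while clock() - start < duration_s:
        offset = clock() - start
        if not probe():
            failures.append(round(offset, 1))
        sleep(interval_s)
    return failures
```

For example, `watch(lambda: probe_ping("192.0.2.10"))` runs for the default 10 minutes; any other check, such as a wrapper around `snmpget`, can be passed as the probe instead.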
/P
In reply to Joarli Leandro [INITNET] (jinitnet@gmail.com), 2014-03-10 18:49 GMT+01:00.
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

I am seeing something similar, but only with two hosts. They go down and come back up in a seemingly random pattern; when it happens, the Down and Up emails are about 5 minutes apart. Both are domain controllers. They are VMs on hosts with many other VMs, so I know the network isn't going down; otherwise I'd lose them all at once. The host servers are also monitored, and they never show down. It's just these two DCs. They run Server 2012, and we have other Server 2012 VMs. I tried running snmpget from the CLI as the FAQ suggests, and it returns fine with no errors. Observium has even told me these DCs were down while I was in active RDP sessions to them.
My gut says Observium is not to blame, since only these two hosts are affected, but I don't have much experience troubleshooting SNMP. I've been digging through the Event Viewer on the DCs trying to find a cause, but so far no luck.
John :-)
John Fano
Systems Administrator
North Canton City Schools
john@northcantonschools.org
330.497.5600 x309
"Well, we'll not risk another frontal assault. That rabbit's dynamite." - King Arthur, "Monty Python and the Holy Grail"
In reply to Peter Persson (peter.persson@bredband2.se), Mon, Mar 10, 2014 at 2:01 PM.

Almost certainly firewalls.
Observium will mark a host down in only two circumstances: if it fails to get a ping reply from the host, or if it fails to snmpget sysDescr.0 from the host at the beginning of the poller run. You can turn the machine off after these two checks have run and Observium still won't mark it as down, even if all of the subsequent SNMP gets fail.
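The two conditions above can be written down as a small decision function. This is a toy model for reasoning about the symptom, not Observium's source: the order of the checks follows the order they are listed in, and the reason labels and the name `device_status` are made up for illustration. It makes the key point concrete: one dropped ping or sysDescr.0 request at the start of a run is enough to mark the device down, while failures later in the run are ignored.

```python
from typing import Callable, Iterable, Optional, Tuple


def device_status(ping_ok: Callable[[], bool],
                  sysdescr_ok: Callable[[], bool],
                  later_gets: Iterable[Callable[[], bool]] = ()) -> Tuple[str, Optional[str]]:
    """Toy model of the up/down rule described above (not Observium's code)."""
    # Check 1: no ping reply at the start of the run -> down.
    if not ping_ok():
        return ("down", "ping")
    # Check 2: snmpget of sysDescr.0 fails -> down. Only reached if ping worked.
    if not sysdescr_ok():
        return ("down", "snmp")
    # Everything polled after the two checks still runs, but its failures
    # never change the device's up/down status.
    for get in later_gets:
        get()
    return ("up", None)
```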
If you have unreliable connectivity, you can make the check more forgiving by modifying the retries/timeout settings in the config:
// PING Settings - Retries/Timeouts
#$config['ping']['retries'] = 3;   // How many times to retry ping (1 - 10)
#$config['ping']['timeout'] = 500; // Timeout in milliseconds (50 - 2000)

// SNMP Settings - Timeouts/Retries disabled as default
#$config['snmp']['timeout'] = 1;   // timeout in seconds
#$config['snmp']['retries'] = 5;   // how many times to retry the query
Though, if you have to modify these very much, you probably have bigger issues to fix...
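For a rough sense of how much slack these settings buy, here is a sketch under a stated assumption: each check makes one initial attempt plus `retries` retries and waits the full timeout on each. fping by default also multiplies its per-target timeout by a backoff factor of 1.5 on each successive try (its `-B` option); whether Observium's ping check inherits that depends on how it invokes fping, which this thread doesn't say, so treat the numbers as illustrative. `worst_case_wait_ms` is a hypothetical helper.

```python
def worst_case_wait_ms(retries, timeout_ms, backoff=1.0):
    """Upper bound on how long one check waits before giving up.

    Assumes one initial attempt plus `retries` retries, where attempt n
    waits timeout_ms * backoff**n (backoff=1.0 means a constant timeout).
    """
    return sum(timeout_ms * backoff ** n for n in range(retries + 1))


if __name__ == "__main__":
    # Values from the commented-out config lines above.
    print("ping, constant timeout:", worst_case_wait_ms(3, 500), "ms")
    print("ping, 1.5x backoff:", worst_case_wait_ms(3, 500, 1.5), "ms")
    print("snmp:", worst_case_wait_ms(5, 1000), "ms")
```

Even the larger figures are a few seconds, which is tiny next to a 5-minute polling interval; that is consistent with the point that needing much more slack signals a network problem rather than a tuning problem.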
adam.
In reply to John Fano, 2014-03-10 12:27.

One of the hosts is a pfSense firewall; the other runs Ubuntu 12.04 LTS with no firewall rules. I disable problem after the start and not start again.
I tested with this command line:
fping -t 50 -c 1 -q iis-xxxxxxx.net.br
iis-easy.xxxxxx.br : xmt/rcv/%loss = 1/1/0%, min/avg/max = 42.6/42.6/42.6
If I increase it to 10, will that increase the time before the host is marked down?
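With `-t 50`, fping waits only 50 ms for a reply (its initial target timeout, in milliseconds), while the reply measured in the test above took 42.6 ms, leaving roughly 7 ms of headroom; the ping timeout in the Observium config quoted earlier is 500 ms. A small sketch, assuming fping's usual `-q -c` summary format as shown in the output above, that parses the summary line and reports how close the slowest reply came to the timeout. `parse_fping_summary` and `timeout_margin_ms` are hypothetical names.

```python
import re

# fping -q -c summary line, e.g.
# "host : xmt/rcv/%loss = 1/1/0%, min/avg/max = 42.6/42.6/42.6"
# (the min/avg/max part is absent when no replies were received)
SUMMARY = re.compile(
    r"^(?P<host>\S+)\s*:\s*xmt/rcv/%loss\s*=\s*(?P<xmt>\d+)/(?P<rcv>\d+)/(?P<loss>\d+)%"
    r"(?:,\s*min/avg/max\s*=\s*(?P<min>[\d.]+)/(?P<avg>[\d.]+)/(?P<max>[\d.]+))?"
)


def parse_fping_summary(line):
    """Turn one fping summary line into a dict of counts and the max RTT."""
    m = SUMMARY.match(line.strip())
    if m is None:
        raise ValueError(f"not an fping summary line: {line!r}")
    d = m.groupdict()
    result = {
        "host": d["host"],
        "xmt": int(d["xmt"]),
        "rcv": int(d["rcv"]),
        "loss_pct": int(d["loss"]),
    }
    if d["max"] is not None:
        result["max_ms"] = float(d["max"])
    return result


def timeout_margin_ms(summary, timeout_ms):
    """How far the slowest reply was below the timeout (negative = too slow)."""
    return timeout_ms - summary["max_ms"]
```

Running the same test with a larger `-c` count gives a more useful maximum than a single packet does.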
In reply to Adam Armstrong (adama@memetic.org), 2014-03-10 16:48 GMT-03:00.

Your devices are not replying to ping, or the ping packets are being dropped.
Check your firewalls. It's always the firewall. Always.
adam.
In reply to Joarli Leandro [INITNET], 2014-03-10 11:49.

Hi,
I agree. This kind of random behaviour is mostly seen when some sort of firewall traffic policer is in place. Observium sends a lot of traffic to each host, which is fine, but sometimes that traffic triggers a policer. I've seen the same thing happen with other monitoring software.
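Policers of this kind are commonly built as a token bucket. The sketch below is a generic illustration, not the algorithm of pfSense or any other particular firewall, and the class name and parameters are hypothetical. It shows how a short burst of polling traffic larger than the bucket gets partly dropped even when the average rate is low; if a dropped packet happens to be the poller's ping or its sysDescr.0 request, the device is marked down for that run.

```python
class TokenBucket:
    """Generic token-bucket policer.

    Tokens refill at `rate` per second, up to `burst` tokens. Each packet
    costs one token; a packet arriving when no whole token is left is dropped.
    """

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous packet, in seconds

    def allow(self, now):
        """Return True if a packet arriving at time `now` is forwarded."""
        # Refill for the time elapsed since the last packet, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=10` and `burst=5`, twenty packets arriving at the same instant get only five through; after one second of quiet the bucket has refilled and the next packet passes again.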
Regards, Onno.
In reply to Adam Armstrong, Monday, 10 March 2014 20:44. Subject: Re: [Observium] Host down and up every 5 min.
participants (5)
- Adam Armstrong
- Fano, John
- Joarli Leandro [INITNET]
- Onno van der Leun
- Peter Persson