Look for other, newly added cron jobs. Discovery is only run once every 6 hours.
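
If it helps with the hunt, here is a rough Python sketch that flags crontab entries whose minute field fires exactly twice an hour, 30 minutes apart. The helper names (`expand_minutes`, `find_half_hourly`) are made up for illustration, the parser only handles plain numeric minute fields (`*`, `*/n`, `a-b`, `a-b/n`, comma lists), and the paths scanned are the usual Linux locations, which is an assumption about this host.

```python
from pathlib import Path


def expand_minutes(field):
    """Expand a cron minute field into the set of minutes it matches.

    Raises ValueError for anything that is not a plain numeric field.
    """
    minutes = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            low, high = 0, 59
        elif "-" in rng:
            low_text, high_text = rng.split("-", 1)
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(rng)
        minutes.update(range(low, high + 1, step))
    return minutes


def find_half_hourly(crontab_text):
    """Return crontab lines whose minute field fires twice an hour, 30 minutes apart."""
    hits = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) < 6:  # five time fields plus at least a command
            continue
        try:
            minutes = sorted(expand_minutes(fields[0]))
        except ValueError:
            continue  # environment assignment, @reboot, or an unsupported field
        if len(minutes) == 2 and minutes[1] - minutes[0] == 30:
            hits.append(stripped)
    return hits


if __name__ == "__main__":
    # Assumed locations; user crontabs (crontab -l) would need checking separately.
    candidates = [Path("/etc/crontab"), *Path("/etc/cron.d").glob("*")]
    for path in candidates:
        if not path.is_file():
            continue
        try:
            text = path.read_text()
        except OSError:
            continue
        for entry in find_half_hourly(text):
            print(f"{path}: {entry}")
```

This only narrows the search. A job that runs every 5 or 10 minutes but misbehaves on the half hour would not be flagged, so it is still worth reading the crontabs by eye.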

It's pretty difficult to make the poller believe that something is offline without network issues, as all it does to decide is a ping and an SNMP get.
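
To see which of the two checks is failing during the bad window, the same kind of test can be repeated by hand. Below is a minimal Python sketch, not the poller's actual code: the function names are invented, the probe OID (sysUpTime.0) and the timeouts are assumptions, and it presumes the Linux `ping` and Net-SNMP `snmpget` commands are on the PATH.

```python
import subprocess

# Assumed probe OID (sysUpTime.0); the OID the poller really queries may differ.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def command_succeeds(argv, run=subprocess.run):
    """Run a command quietly and report whether it exited with status 0."""
    try:
        result = run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary or a hung command both count as a failed check.
        return False
    return result.returncode == 0


def check_device(host, community, run=subprocess.run):
    """Ping the host once and do one SNMP get; 'up' requires both to succeed."""
    ping_ok = command_succeeds(["ping", "-c", "1", "-W", "2", host], run)
    snmp_ok = command_succeeds(
        ["snmpget", "-v", "2c", "-c", community, "-t", "2", "-r", "1",
         host, SYS_UPTIME_OID],
        run,
    )
    return {"host": host, "ping": ping_ok, "snmp": snmp_ok, "up": ping_ok and snmp_ok}
```

Calling `check_device()` for a few of the affected devices at the moment they are marked down, and logging the result with a timestamp, should show whether it is the ping or the SNMP get that fails. The `run` parameter exists only so the function can be exercised without touching the network.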

adam.

On 26/11/2012 08:14, Robert Williams wrote:

Hi Guys,

 

Got a weird issue which has just started, seemingly by itself but I imagine there was a cause (I just don’t know it yet!).

 

In short, every 30 minutes the poller decides that approximately a third of all devices (92 in total) are ‘Offline’. They then magically recover on the next poller interval.

 

The devices are not offline, and the Observium host does not lose connectivity (I’ve tested with numerous pings etc. during this predictable failure period).

 

Interestingly, the host on which Observium runs does record these interesting CPU and RAM metrics during that particular polling run:

 

 

 

Now, I’m guessing that there is maybe a more substantial ‘discovery’ run or similar every 30 minutes. For some reason, this more intensive run seems to be resulting in a load of devices going allegedly offline.

 

The problem started on Friday around 11pm and has repeated like clockwork since. We are running the latest SVN.

 

I’m a bit uncertain where to start with this one: although it’s predictable, I can’t really see anything that would cause it to happen. Pointers for diagnosing further are very welcome :)

 

Cheers as always!


Robert Williams

Custodian DataCentre

email: Robert@CustodianDC.com




_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium