You're looking in completely the wrong place, and you've totally ignored everything I've said, so I give up.
Have fun.
On 26/11/2012 12:24, Robert Williams wrote:
Right -- I've disabled a few of the recent additions and each time I disable one the number of hosts which are 'down' decreases. Now, with 3 hosts disabled I only have 2 hosts which are failing.
Also, the hosts which fail are always numbered sequentially and are always high-numbered, say above host ID 90.
Is it possible the poller is simply running out of time? Can the time be extended? Either way, why only on exactly every 6 polls / 30 minutes? Weird...
*Robert Williams*
Custodian DataCentre
email: Robert@CustodianDC.com
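One way to test the "running out of time" question above is to log how long each poll run takes and compare it with the cron interval. The following is a minimal sketch, not part of Observium: the 300-second interval is an assumption inferred from "6 polls / 30 minutes", the names `timed_run` and `format_result` are hypothetical, and the demo command is a placeholder for whatever the existing cron entry runs.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumption: a 5-minute cron interval, inferred from "6 polls / 30 minutes".
POLL_INTERVAL_SECONDS = 300


@dataclass
class RunResult:
    started_at: str   # UTC timestamp taken just before launch
    duration: float   # wall-clock seconds the command took
    returncode: int   # exit status of the command
    overran: bool     # True if the run took longer than the interval


def timed_run(command, interval=POLL_INTERVAL_SECONDS):
    """Run `command` once and report whether it outlasted `interval` seconds."""
    started_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    start = time.monotonic()
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    duration = time.monotonic() - start
    return RunResult(started_at, duration, completed.returncode, duration > interval)


def format_result(result):
    """Build one log line per run, suitable for appending to a file from cron."""
    flag = "OVERRAN" if result.overran else "ok"
    return (
        f"{result.started_at} duration={result.duration:.1f}s "
        f"rc={result.returncode} {flag}"
    )


if __name__ == "__main__":
    # Placeholder command: substitute whatever the existing cron entry runs.
    demo_command = [sys.executable, "-c", "pass"]
    print(format_result(timed_run(demo_command)))
```

If the logged durations stay well under the interval on the runs where hosts are marked down, a time limit is unlikely to be the cause, which would point back at the host system rather than the poller.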
*From:* observium-bounces@observium.org [mailto:observium-bounces@observium.org] *On Behalf Of* Adam Armstrong
*Sent:* 26 November 2012 16:58
*To:* Observium Network Observation System
*Subject:* Re: [Observium] 30 minute poll issue
Btw, when one single installation out of thousands starts doing something like this, it's almost never related to the code, and almost always related to the system it's installed on.
adam.
On 26/11/2012 10:38, Robert Williams wrote:
Hi -- no other cron jobs on that box, it's purely running Observium and the only jobs are the poller itself and a selection of jobs which run at weekly or daily intervals for various system functions. There is also a weekly SVN pull for Observium :)
As a test we have removed the most recently added device to see if that helps, but I could do with some way of recording what is happening on that particular poll. It's like clockwork but I can't see anything that would cause that on the network side, and everything definitely responds (from the Observium console) 100% during the poll itself.
Cheers!
*Robert Williams*
Custodian DataCentre
email: Robert@CustodianDC.com
*From:* observium-bounces@observium.org [mailto:observium-bounces@observium.org] *On Behalf Of* Adam Armstrong
*Sent:* 26 November 2012 16:07
*To:* Observium Network Observation System
*Subject:* Re: [Observium] 30 minute poll issue
Look for other, newly added cron jobs. Discovery is only run once every 6 hours.
It's pretty difficult to make the poller believe that something is offline without network issues, as all it does to decide is a ping and an SNMP get.
adam.
On 26/11/2012 08:14, Robert Williams wrote:
Hi Guys,
Got a weird issue which has just started, seemingly by itself, but I imagine there was a cause (I just don't know it yet!).
In short, every 30 minutes the poller decides that approximately a third of all devices (92 in total) are 'Offline'. They then magically recover on the next poller interval.
The devices are not offline, and the Observium host does not lose connectivity (I've tested with numerous pings etc. during this predictable failure period).
Interestingly, the host on which Observium runs does record these interesting CPU and RAM metrics during that particular polling run:
Now, I'm guessing that there is maybe a more substantial 'discovery' run or similar every 30 minutes. For some reason, this more intensive run seems to be resulting in a load of devices going allegedly offline.
The problem started on Friday around 11pm and has repeated like clockwork since. We are running the latest SVN.
I'm a bit uncertain where to start with this one, as although it's predictable I can't really see anything which would cause it to happen. Pointers for diagnosing further very welcome :)
Cheers as always!
*Robert Williams*
Custodian DataCentre
email: Robert@CustodianDC.com
_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium