We see the issue on the latest stable release (we
even tried the just-committed 5898), and then switched to the
trunk branch to try incremental updates until the issue appeared.
Everything is OK up to revision 5883. However, when I run a poll
with debug enabled, everything works OK (even in the latest revisions).
We use the poller-wrapper script in our crontab, and when I edited
it to pass the debug argument to the poller.php script, the issue
went away. So, strangely, only non-debug polling via the
poller-wrapper script seems (partially) broken.
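
A minimal sketch of that kind of wrapper change, assuming poller.php
takes the device with -h and enables debug output with -d. Both flags
are assumptions here, as is the helper name build_poller_command; the
real poller-wrapper script may build its command differently.

```python
import shlex


def build_poller_command(poller_path, device_id, debug=False):
    """Return the argv list for a single poller.php run.

    Assumption: "-h <device>" selects the device and "-d" turns on
    debug output. Check these against your poller.php before use.
    """
    argv = ["/usr/bin/env", "php", poller_path, "-h", str(device_id)]
    if debug:
        argv.append("-d")
    return argv


if __name__ == "__main__":
    path = "/usr/local/www/apache24/observium/poller.php"
    # Print the non-debug and debug variants side by side.
    for debug in (False, True):
        print(shlex.join(build_poller_command(path, 26, debug)))
```

Building the command as a list rather than a shell string makes the
only difference between the two runs explicit: the trailing debug flag.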
I can provide the full debug and non-debug polling logs if
required, but there isn't really anything useful there, since
everything works as expected with debug turned on, and pretty
much nothing gets output on the non-debug runs that fail. Here
are the first lines of the non-debug poll run log, with a host
"xxx1" that failed at the top:
Observium v0.14.10.5898
Poller
Starting polling run:
xxx1.nyi.net 26 ios (cisco)
Observium v0.14.10.5898
Poller
Starting polling run:
xxx2.nyi.net 9 ios (cisco)
Observium v0.14.10.5898
Poller
Also, when the non-debug poll runs and fails, it takes less than
1 second to finish. Whatever is failing happens very early in
the polling process, before any SQL is run. Here is a snippet of
some failures from the poll log:
[2014/10/21 14:35:00 -0400] discovery.php(84391):
/usr/local/www/apache24/observium/discovery.php: new - 0 devices
discovered in 0.002 secs
[2014/10/21 14:35:01 -0400] poller.php(84397):
/usr/local/www/apache24/observium/poller.php: xxx1.nyi.net - 1
devices polled in 0.294 secs
[2014/10/21 14:35:01 -0400] poller.php(84402):
/usr/local/www/apache24/observium/poller.php: xxx2.nyi.net - 1
devices polled in 0.516 secs
[2014/10/21 14:35:08 -0400] poller.php(84601):
/usr/local/www/apache24/observium/poller.php: xxx3.nyi.net - 1
devices polled in 0.405 secs
[2014/10/21 14:35:11 -0400] poller.php(84689):
/usr/local/www/apache24/observium/poller.php: xxx4.nyi.net - 1
devices polled in 0.259 secs
[2014/10/21 14:35:12 -0400] poller.php(84711):
/usr/local/www/apache24/observium/poller.php: xxx5.nyi.net - 1
devices polled in 0.334 secs
[2014/10/21 14:35:17 -0400] poller.php(84855):
/usr/local/www/apache24/observium/poller.php: xxx6.nyi.net - 1
devices polled in 0.170 secs
[2014/10/21 14:35:17 -0400] poller.php(84871):
/usr/local/www/apache24/observium/poller.php: xxx7.nyi.net - 1
devices polled in 0.270 secs
These servers normally take about 15 to 30 seconds each to poll.
Here is the start of the successful debug-enabled poll run for the
same host that failed first above (xxx1):
xxx1.nyi.net 26 ios (cisco)
Observium v0.14.10.5898
Poller
Starting polling run:
SQL[SELECT `device_id` FROM `devices` WHERE `disabled` = 0 AND
`device_id` = '9' ORDER BY `device_id` ASC]
SQL[SELECT * FROM `devices` WHERE `device_id` = '9']
And the poll log times for the debug-enabled runs:
[2014/10/21 14:37:58 -0400] poller.php(86759):
/usr/local/www/apache24/observium/poller.php: xxx1.nyi.net - 1
devices polled in 21.44 secs
[2014/10/21 14:38:00 -0400] poller.php(86755):
/usr/local/www/apache24/observium/poller.php: xxx2.nyi.net - 1
devices polled in 23.24 secs
[2014/10/21 14:38:04 -0400] poller.php(86753):
/usr/local/www/apache24/observium/poller.php: xxx3.nyi.net - 1
devices polled in 27.00 secs
[2014/10/21 14:38:05 -0400] poller.php(87587):
/usr/local/www/apache24/observium/poller.php: xxx4.nyi.net - 1
devices polled in 6.771 secs
[2014/10/21 14:38:07 -0400] poller.php(87645):
/usr/local/www/apache24/observium/poller.php: xxx5.nyi.net - 1
devices polled in 7.678 secs
[2014/10/21 14:38:07 -0400] poller.php(86760):
/usr/local/www/apache24/observium/poller.php: xxx6.nyi.net - 1
devices polled in 30.35 secs
[2014/10/21 14:38:16 -0400] poller.php(87247):
/usr/local/www/apache24/observium/poller.php: xxx7.nyi.net - 1
devices polled in 27.67 secs
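
To spot the failing runs without reading the log by eye, the poll log
can be scanned for polls that finish suspiciously fast. This is only a
sketch: the regular expression is inferred from the entries quoted
above, the 1-second threshold comes from the observation that failed
polls finish in under a second, and the name fast_polls is
hypothetical.

```python
import re

# Matches poll log entries of the form quoted above. Whitespace is
# matched loosely so entries wrapped across lines still parse.
POLL_ENTRY = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+poller\.php\((?P<pid>\d+)\):\s+\S+:\s+"
    r"(?P<host>\S+)\s+-\s+\d+\s+devices\s+polled\s+in\s+"
    r"(?P<secs>[\d.]+)\s+secs"
)


def fast_polls(log_text, threshold_secs=1.0):
    """Return (host, seconds) for polls faster than threshold_secs."""
    hits = []
    for m in POLL_ENTRY.finditer(log_text):
        secs = float(m.group("secs"))
        if secs < threshold_secs:
            hits.append((m.group("host"), secs))
    return hits


SAMPLE = """\
[2014/10/21 14:35:01 -0400] poller.php(84397):
/usr/local/www/apache24/observium/poller.php: xxx1.nyi.net - 1
devices polled in 0.294 secs
[2014/10/21 14:37:58 -0400] poller.php(86759):
/usr/local/www/apache24/observium/poller.php: xxx1.nyi.net - 1
devices polled in 21.44 secs
"""

if __name__ == "__main__":
    for host, secs in fast_polls(SAMPLE):
        print(f"{host}: polled in {secs} secs (suspiciously fast)")
```

On the sample, only the 0.294-second non-debug run is flagged; the
21.44-second debug run is not.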
On 10/21/2014 11:13 AM, Mike Stupalov wrote: