Thanks all. Yes, it's a new setup, and a lot of new devices were added about 4 weeks ago.
An update here: we have increased the number of cores from 4 to 16, and performance looks to have improved.
But there is a new issue. Quite frequently (about 1 in 4 times), Observium returns HTTP response code 500 with a blank page when logging in, or when clicking around after login.
Any thoughts or suggestions on this?
Here are the logs when it happens:
/opt/observium/logs/access_log:
171.68.241.12 - - [30/Oct/2018:21:29:53 -0700] "GET / HTTP/1.1" 500 1728 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:63.0) Gecko/20100101 Firefox/63.0"
/opt/observium/logs/error_log:
[Tue Oct 30 21:29:54.329931 2018] [:error] [pid 21511] [client 171.68.241.12:54771] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 135168 bytes) in /opt/observium/includes/db/mysqli.inc.php on line 316
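For reference, the "Allowed memory size" figure in this error is PHP's memory_limit setting, and 134217728 bytes is 128 MiB. A minimal sketch for pulling those numbers out of an error_log line is below; the function name parse_memory_error is made up for illustration and is not part of Observium or PHP.

```python
import re

# Matches PHP's "Allowed memory size of N bytes exhausted
# (tried to allocate M bytes)" fatal error text.
_MEM_RE = re.compile(
    r"Allowed memory size of (\d+) bytes exhausted "
    r"\(tried to allocate (\d+) bytes\)"
)

def parse_memory_error(line):
    """Return (limit_mib, requested_kib) from a log line, or None if no match.

    Hypothetical helper for illustration only.
    """
    m = _MEM_RE.search(line)
    if m is None:
        return None
    limit_bytes = int(m.group(1))
    requested_bytes = int(m.group(2))
    return limit_bytes / 2**20, requested_bytes / 2**10

line = (
    "[Tue Oct 30 21:29:54.329931 2018] [:error] [pid 21511] "
    "[client 171.68.241.12:54771] PHP Fatal error: Allowed memory size of "
    "134217728 bytes exhausted (tried to allocate 135168 bytes) in "
    "/opt/observium/includes/db/mysqli.inc.php on line 316"
)
print(parse_memory_error(line))  # (128.0, 132.0)
```

So the request failed while allocating a further 132 KiB on top of a 128 MiB limit.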
Thanks.
- Gordon
From: observium <observium-bounces@observium.org> on behalf of Mike Stupalov via observium <observium@observium.org>
Reply-To: Observium <observium@observium.org>
Date: Sunday, October 28, 2018 at 4:09 AM
To: Observium <observium@observium.org>
Cc: Mike Stupalov <mike@observium.org>
Subject: Re: [Observium] Performance Issue - High CPU with 'mysqld'?
Simon Mousey Smith via observium wrote on 27/10/2018 23:55:
Hi Gordon
I think, from what I can see in the pics (the 3rd graph along, bottom row),
it looks like about 3-4 weeks ago the wrapper processing time jumped up, which means it's taking too long to execute on each device or on the main server.
As I see it, many new devices were added 4 weeks ago.
That shouldn't be a big problem, but it seems you need more optimizations, such as:
- use rrdcached
- switch to SSD disks
For this number of devices, the RRDs and DB must be on faster disks; SSD is mandatory for you.
Did you change any configs then?
Any updates to hardware, software, etc?
Have you also tried the latest stable (9472)?
Have you also tried running /opt/observium/poller.php -h gz-core01 -dd to see where it might be getting stuck?
Regards
Simon
On 26 Oct 2018, at 03:15, Gordon Cheng (gocheng) via observium <observium@observium.org> wrote:
Thanks Adam. Our pollerlog pages are below. Should we still try to disable fdb-table as suggested? If so, any docs we can follow to disable it?
Any other thoughts or suggestions? Thanks.
- Gordon
<image001.png>
<image002.png>
<image003.png>
From: observium <observium-bounces@observium.org> on behalf of Adam Armstrong via observium <observium@observium.org>
Reply-To: Observium <observium@observium.org>
Date: Tuesday, October 23, 2018 at 12:50 PM
To: Chris Neam via observium <observium@observium.org>
Cc: Adam Armstrong <adama@memetic.org>
Subject: Re: [Observium] Performance Issue - High CPU with 'mysqld'?
Hi,
You probably (definitely) need more cores and/or more I/O throughput capacity.
It's likely that you have multiple overlapping poller_wrapper processes running because they can't finish quickly enough.
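One rough way to check for overlapping wrappers is to count how many are running at a given moment. The sketch below does that from `ps` output. The match string "poller-wrapper" is an assumption about how the wrapper appears in the process list, and both function names are hypothetical; adjust the string to whatever your cron entry actually runs.

```python
import subprocess

def count_matching(ps_lines, needle="poller-wrapper"):
    """Count process command lines that contain `needle`.

    `needle` is an assumed name for the wrapper script; change as needed.
    """
    return sum(1 for line in ps_lines if needle in line)

def running_wrappers(needle="poller-wrapper"):
    """Count currently running processes whose command line contains `needle`."""
    # `ps -eo args` prints one full command line per process, after a header.
    out = subprocess.run(
        ["ps", "-eo", "args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_matching(out.splitlines()[1:], needle)  # skip the header line
```

Calling running_wrappers() a few times across a polling interval and seeing more than one would suggest that the previous run had not finished when the next one started.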
what do the graphs on this page look like?
and what are the slowest devices on this :
You may be able to disable a module or two you don't care about that's taking time :
fdb-table is usually a good candidate.
adam.
On 2018-10-23 19:54:32, Gordon Cheng (gocheng) via observium <observium@observium.org> wrote:
Hi all,
I have a new Observium setup (version 18.9.9428 (9th September 2018)) with the following VM spec:
Cores 4
Memory 64GB
HD 500GB
CentOS 7
PHP 7.0.31
MySQL 5.5.60-MariaDB
and with the following network devices to be monitored:
Devices - 634 (total), 597 (up), 37 (down)
Ports - 58,032 (total), 27,664 (up), 19,967 (down)
Sensors - 47,639 (total), 43,243 (ok), 4,396 (down)
Statuses - 5,880 (total), 5,854 (ok), 10 (alert)
---
The issue is that we frequently see the following error when connecting via the GUI/CLI, along with slow response times on both:
DB Error 1040: Too many connections
And we are seeing gaps (broken graphs) like the following:
We have tried bumping 'max_connections' up from the original 151 to 500 and then to 200, but it didn't help:
sjc-observium-1:/etc# grep max_conn my.cnf
max_connections=200
sjc-observium-1:/etc#
And it's observed that the CPU load is heavy with 'mysqld':
top - 09:36:17 up 35 days, 18:54, 2 users, load average: 74.18, 89.49, 82.67
Tasks: 665 total, 79 running, 586 sleeping, 0 stopped, 0 zombie
%Cpu(s): 62.9 us, 36.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 65810376 total, 2511836 free, 5989292 used, 57309248 buff/cache
KiB Swap: 4193020 total, 4141016 free, 52004 used. 59172312 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5152 mysql 20 0 2749132 664216 9164 S 20.7 1.0 4058:17 mysqld
31879 root 20 0 419676 27564 8264 R 4.2 0.0 0:00.13 php
31769 root 20 0 417212 25532 8336 S 3.6 0.0 0:00.12 php
31933 root 20 0 416868 25116 8148 S 3.2 0.0 0:00.10 php
31935 root 20 0 416736 24700 7832 S 3.2 0.0 0:00.10 php
<snip>
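To put the load average in context, a common rule of thumb is to divide it by the number of cores; values well above 1 per core mean runnable tasks are queuing for CPU. A small arithmetic sketch using the 1-minute figure from the top output above (the 16-core row is purely illustrative, showing the same load spread over more cores, not a measured result):

```python
def load_per_core(load_avg, cores):
    """Load average divided by core count (rule-of-thumb saturation measure)."""
    return load_avg / cores

# 1-minute load average taken from the top output above.
one_minute_load = 74.18

for cores in (4, 16):
    print(cores, "cores:", round(load_per_core(one_minute_load, cores), 1))
# 4 cores: 18.5
# 16 cores: 4.6
```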
My 'config.php' is pretty much all default values.
---
I'm new to Observium. It'd be greatly appreciated if someone could shed some light on how this can be resolved.
Thanks.
- Gordon
_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
--
Mike Stupalov
Observium Limited, http://observium.org