Thanks all.  Yes, it's a new setup, and a lot of new devices were added about 4 weeks ago.

 

An update: we just increased the number of cores from 4 to 16, and performance looks to have improved.

 

But we are now seeing a new issue.  Quite frequently (about 1 in 4 requests), Observium responds with HTTP 500 and a blank page when logging in, or when clicking around after login.

 

Any thoughts or suggestions on this?

 

Here are the logs when it happens:

 

/opt/observium/logs/access_log:

171.68.241.12 - - [30/Oct/2018:21:29:53 -0700] "GET / HTTP/1.1" 500 1728 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:63.0) Gecko/20100101 Firefox/63.0"

 

/opt/observium/logs/error_log:

[Tue Oct 30 21:29:54.329931 2018] [:error] [pid 21511] [client 171.68.241.12:54771] PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 135168 bytes) in /opt/observium/includes/db/mysqli.inc.php on line 316
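
For reference, the limit in the error above works out to 128 MiB, which matches PHP's stock `memory_limit` default of `128M`. A small sketch of the arithmetic (the helper name `bytes_to_mib` is made up for illustration):

```shell
# Convert a byte count, as printed in the PHP fatal error, to MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

bytes_to_mib 134217728   # prints 128
```

If that is the limit the web server's PHP is really running with, raising `memory_limit` in the php.ini that Apache loads and then restarting the web server is one thing to try. Which php.ini applies on this host is an assumption to verify, for example with a `phpinfo()` page.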

Thanks.

 

- Gordon

 

From: observium <observium-bounces@observium.org> on behalf of Mike Stupalov via observium <observium@observium.org>
Reply-To: Observium <observium@observium.org>
Date: Sunday, October 28, 2018 at 4:09 AM
To: Observium <observium@observium.org>
Cc: Mike Stupalov <mike@observium.org>
Subject: Re: [Observium] Performance Issue - High CPU with 'mysqld'?

 



Simon Mousey Smith via observium wrote on 27/10/2018 23:55:

Hi Gordon

 

From what I can see in the pics, look at the 3rd graph along on the bottom row.

 

It looks like about 3-4 weeks ago the wrapper processing times jumped up, which means polling is taking too long, either on each device or on the main server.

As I see it, many new devices were added 4 weeks ago.

That shouldn't be a big problem, but it looks like you need more optimizations, such as:
- use rrdcached
- switch to SSD disks

For this number of devices, the RRDs and DB must be on faster disks; SSD is mandatory for you.
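
A quick way to see what the kernel thinks the disks are is the sysfs `rotational` flag. The sketch below is only an illustration: the function name `disk_kind` is made up, and inside a VM the flag describes the virtual disk, so it may not reflect the real storage behind it.

```shell
# Report whether a block device is flagged as rotational by the kernel.
disk_kind() {
    # $1: path to the flag file, e.g. /sys/block/sda/queue/rotational
    case "$(cat "$1")" in
        0) echo "non-rotational (SSD-like)" ;;
        1) echo "rotational (spinning disk)" ;;
        *) echo "unknown" ;;
    esac
}

# Check every block device the kernel exposes, if any.
for flag in /sys/block/*/queue/rotational; do
    if [ -r "$flag" ]; then
        echo "$flag: $(disk_kind "$flag")"
    fi
done
```

On a virtualised host the storage backend is usually a question for whoever runs the hypervisor, so the result here is a hint at best.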


 

Did you change any configs then?

 

Any updates to hardware, software, etc?

 

Have u also tried the latest stable ? 9472

 

Have u also tried doing   /opt/observium/poller.php -h gz-core01 -dd   to see where it might be getting stuck?

 

Regards

 

Simon

 

 



On 26 Oct 2018, at 03:15, Gordon Cheng (gocheng) via observium <observium@observium.org> wrote:

 

Thanks Adam.  Our pollerlog pages are below.  Should we still try to disable fdb-table as suggested?  If so, any docs we can follow to disable it?

 

Any other thoughts or suggestions?  Thanks.

 

- Gordon

 

<image001.png>

 

<image002.png>

<image003.png>

 

From: observium <observium-bounces@observium.org> on behalf of Adam Armstrong via observium <observium@observium.org>
Reply-To: Observium <observium@observium.org>
Date: Tuesday, October 23, 2018 at 12:50 PM
To: Chris Neam via observium <observium@observium.org>
Cc: Adam Armstrong <adama@memetic.org>
Subject: Re: [Observium] Performance Issue - High CPU with 'mysqld'?

 

Hi,

 

You probably (definitely) need more cores and/or more I/O throughput capacity.

 

It's likely that you have multiple overlapping poller_wrapper processes running because they can't finish quickly enough.
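
One way to check for overlapping runs is to count wrapper processes in `ps` output. This is a sketch only: it assumes the wrapper shows up with `poller-wrapper` in its command line, the helper name `count_wrappers` is invented, and the sample listing is made up for the demonstration rather than taken from the affected host.

```shell
# Count poller-wrapper processes in a `ps` listing read from stdin.
# The [p] bracket keeps a live grep from matching its own command line.
count_wrappers() {
    grep -c '[p]oller-wrapper'
}

# Made-up sample listing (not real output from this server).
sample='  PID     ELAPSED COMMAND
 2101       07:12 python /opt/observium/poller-wrapper.py
 2987       02:11 python /opt/observium/poller-wrapper.py
 3020       00:03 php /opt/observium/poller.php -h gz-core01'

printf '%s\n' "$sample" | count_wrappers   # prints 2

# On the server itself:
#   ps -eo pid,etime,args | count_wrappers
```

More than one wrapper at a time, with an elapsed time longer than the polling interval, would suggest that runs are piling up.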

 

what do the graphs on this page look like?

 

 

and what are the slowest devices on this : 

 

 

You may be able to disable a module or two you don't care about that's taking time : 

 

 

fdb-table is usually a good candidate.
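
It should be possible to switch a poller module off globally in `config.php` through the `$config['poller_modules']` array. Treat that key as an assumption to confirm against the Observium documentation for your version. The helper below (the name `module_off_line` is invented) just prints the line to add:

```shell
# Print the config.php line that would disable one poller module globally.
# The $config['poller_modules'] key is an assumption, not confirmed here.
module_off_line() {
    # $1: module name, e.g. fdb-table
    printf "\$config['poller_modules']['%s'] = 0;\n" "$1"
}

module_off_line fdb-table
```

Pasting the printed line into `/opt/observium/config.php` by hand is safer than appending it from a script, since the file may end in a way that makes a blind append invalid.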

 

adam.

On 2018-10-23 19:54:32, Gordon Cheng (gocheng) via observium <observium@observium.org> wrote:

Hi all,

 

I have a new Observium setup (version 18.9.9428 (9th September 2018)) with the following VM spec:

 

Cores 4

Memory 64GB

HD 500GB

CentOS 7

PHP 7.0.31

MySQL 5.5.60-MariaDB

 

and with the following network devices to be monitored:

 

Devices - 634 (total), 597 (up), 37 (down)

Ports - 58,032 (total), 27,664 (up), 19,967 (down)

Sensors - 47,639 (total), 43,243 (ok), 4,396 (down)

Statuses - 5,880 (total), 5,854 (ok), 10 (alert)

 

---

 

The issue is that we frequently see the following error when connecting via the GUI/CLI, along with slow response times:

 

DB Error 1040: Too many connections

 

And we are seeing gaps (broken graphs) like the following:

 

 

We have tried to bump up 'max_connections' from the original 151 to 500 and then to 200, but it didn't help:

 

sjc-observium-1:/etc# grep max_conn my.cnf

max_connections=200

sjc-observium-1:/etc#
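
Since a value in my.cnf only takes effect once mysqld has been restarted, it can help to compare the file with what the running server reports. A sketch under assumptions: the helper name `configured_max_connections` is made up, and the commented `mysql` commands assume a client that can log in without extra options.

```shell
# Print the max_connections value(s) configured in a my.cnf-style file.
configured_max_connections() {
    # $1: path to the config file (the thread uses /etc/my.cnf)
    awk -F= '/^[ \t]*max_connections[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2 }' "$1"
}

# Example:
#   configured_max_connections /etc/my.cnf
#
# Compare with the running server (needs the mysql client and credentials):
#   mysql -e "SHOW GLOBAL VARIABLES LIKE 'max_connections'"
#   mysql -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections'"
```

If `Max_used_connections` keeps reaching the limit, the pile-up of clients is the thing to chase; a higher limit mostly moves the failure point.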

 

We also see heavy CPU load, with 'mysqld' at the top of the list:

 

top - 09:36:17 up 35 days, 18:54,  2 users,  load average: 74.18, 89.49, 82.67

Tasks: 665 total,  79 running, 586 sleeping,   0 stopped,   0 zombie

%Cpu(s): 62.9 us, 36.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st

KiB Mem : 65810376 total,  2511836 free,  5989292 used, 57309248 buff/cache

KiB Swap:  4193020 total,  4141016 free,    52004 used. 59172312 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND

5152 mysql     20   0 2749132 664216   9164 S  20.7  1.0   4058:17 mysqld

31879 root      20   0  419676  27564   8264 R   4.2  0.0   0:00.13 php

31769 root      20   0  417212  25532   8336 S   3.6  0.0   0:00.12 php

31933 root      20   0  416868  25116   8148 S   3.2  0.0   0:00.10 php

31935 root      20   0  416736  24700   7832 S   3.2  0.0   0:00.10 php

<snip>

 

My 'config.php' uses mostly default values.
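
To put the `top` figures above in perspective: a 1-minute load average of 74.18 on 4 cores is roughly 18.5 tasks per core, and with 0.0 id and 0.0 wa the CPUs look saturated rather than waiting on disk, at least in this snapshot. A small sketch of that arithmetic (the helper name `load_per_core` is made up):

```shell
# Divide a load average by the core count.
load_per_core() {
    # $1: load average, $2: number of cores
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.1f\n", load / cores }'
}

load_per_core 74.18 4    # prints 18.5
load_per_core 74.18 16   # same load spread over 16 cores

# On a live Linux host:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A value well above 1 per core means more work is queued than the CPUs can run, which fits the advice in this thread to add cores.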

 

---

 

I'm new to Observium.  It'd be greatly appreciated if someone could shed some light on how this can be resolved.


Thanks.

 

- Gordon


_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

 





 

--
Mike Stupalov
Observium Limited, http://observium.org