Just following up on this as I've further narrowed down the bug.

I disabled the pop-up feature in the global settings:

Mouseover popups
Define the mouseover popups with extra information and graphs.



That fixed the problem completely, so the issue is related to the popup script (entity_popup.php). I don't know exactly where it goes wrong: most of the time the script works, but sometimes it hangs, locks up the Apache webserver session, and eventually locks up the entire webserver.

/Jesper

On Wed, Nov 9, 2016 at 10:25 AM Jesper Frank Nemholt <jfn@dassic.com> wrote:
Hi!

No answer from anyone so far, but I did get a bit closer to the issue today. The lockup happens exactly at the time I hover the mouse over the interface name/IP in the picture attached.
Hovering above will normally create a popup with the traffic information etc., but instead it locks up the entire http session.

In Chrome, when this happens, the status bar at the bottom says "Waiting for socket". I tried opening other browsers in parallel (Safari and Firefox). These can initially connect to Observium (new session), but if I go back to the same interface and hover the mouse over it, they lock up as well. Eventually no sessions are left on the webserver and it stops responding to anything.
I saw elsewhere in a forum that the "Waiting for socket" can also relate to Chrome running out of sockets, but since I get the same error on two other browsers (with no other pages open), and since Apache is clearly locking up on something, I guess the browser itself is not the issue.

So the issue appears to be related to the Observium entity_popup.php script.

Funny thing though is that in most cases this script does work, but it seems that sometimes it doesn't.



Screen Shot 2016-11-09 at 6.59.42 AM.jpg

/Jesper

On Sun, Nov 6, 2016 at 1:18 PM Jesper Frank Nemholt <jfn@dassic.com> wrote:
Just as an add-on, an strace on the Apache processes gives this:

[root@observium-1-vm ~]# strace -p 9221
Process 9221 attached
flock(13, LOCK_EX

[root@observium-1-vm ~]# strace -p 9157
Process 9157 attached
select(17, [14 16], [], [], NULL
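
A worker sitting in flock(13, LOCK_EX) is waiting for an exclusive lock on whatever file descriptor 13 refers to, and it stays there until the current holder releases the lock. Which file that is does not appear in this thread. The Python sketch below only illustrates the general blocking behaviour; the temporary file and the helper names (try_exclusive_lock, demo) are made up for the example. It passes LOCK_NB so that the second attempt fails immediately instead of hanging the way the Apache worker does.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Try to take an exclusive flock on path without blocking.

    Returns the open file object if the lock was acquired, or None if
    another open file description already holds it.
    """
    handle = open(path, "a")
    try:
        # Without LOCK_NB this call would block, which is the state the
        # strace output shows for the Apache worker.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def demo():
    fd, path = tempfile.mkstemp()  # hypothetical lock file for the demo
    os.close(fd)
    try:
        holder = try_exclusive_lock(path)     # first taker gets the lock
        contender = try_exclusive_lock(path)  # second taker is refused
        holder.close()                        # closing releases the lock
        retry = try_exclusive_lock(path)      # the lock is free again
        outcome = (holder is not None, contender is not None, retry is not None)
        if retry is not None:
            retry.close()
        return outcome
    finally:
        os.remove(path)


result = demo()
print(result)
```

strace prints a blocking call without its closing parenthesis until the call returns, which is why the flock and select lines above look cut off. On Linux, something like ls -l /proc/9221/fd/13 or lsof -p 9221 would be one way to see which file the blocked worker is waiting on.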


/Jesper


On Sun, Nov 6, 2016 at 12:56 PM Jesper Frank Nemholt <jfn@dassic.com> wrote:
Hi!

So related to this, I enabled server-status on Apache and here's what I see on the webserver when it locks up.

Based upon my past experience it seems it's somehow related to the script /ajax/entity_popup.php as I've seen it lock up several times when calling that script.

Any clues?

Apache Server Status for 1.2.3.4 (via 1.2.3.4)

Server Version: Apache/2.4.6 (CentOS) PHP/5.4.16
Server MPM: prefork
Server Built: Jul 18 2016 15:30:14

Current Time: Sunday, 06-Nov-2016 12:51:50 PST
Restart Time: Friday, 04-Nov-2016 21:28:13 PDT
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 1 day 16 hours 23 minutes 37 seconds
Server load: 2.27 2.89 3.05
Total accesses: 6840 - Total Traffic: 165.9 MB
CPU Usage: u954.14 s117.84 cu292.79 cs133.22 - 1.03% CPU load
.047 requests/sec - 1196 B/second - 24.8 kB/request
9 requests currently being processed, 20 idle workers
______W_W__W_W_WW___W___WW___...................................
................................................................
................................................................
........
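
The scoreboard can be cross-checked against the "requests currently being processed" summary line above it. Here is a small Python sketch, using the active part of the scoreboard copied from this output (the trailing "." open slots are left out). The helper name summarize_scoreboard is made up for the example, and treating every state other than "_" and "." as busy is a simplification.

```python
from collections import Counter

# Active part of the scoreboard from the status page above; the trailing
# "." characters (open slots with no process) are omitted.
SCOREBOARD = "______W_W__W_W_WW___W___WW___"


def summarize_scoreboard(scoreboard):
    """Return (busy, idle, busy_slots) for an Apache scoreboard string.

    Simplification: "_" counts as idle, "." is ignored, and every other
    state counts as busy.
    """
    counts = Counter(scoreboard)
    idle = counts["_"]
    busy = len(scoreboard) - idle - counts["."]
    busy_slots = [i for i, state in enumerate(scoreboard) if state not in "_."]
    return busy, idle, busy_slots


busy, idle, busy_slots = summarize_scoreboard(SCOREBOARD)
print(busy, idle)   # 9 20
print(busy_slots)   # [6, 8, 11, 13, 15, 16, 20, 24, 25]
```

The busy slot numbers line up with the Srv numbers of the rows in mode W in the worker table further down.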

Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

Srv   PID    Acc        M  CPU    SSReq   Conn  Child  Slot  Client         VHost                 Request
0-0   14261  0/257/257  _  53.97  371178  0.0   5.66   5.66  10.31.212.162  observium.xyz.com:80  NULL
1-0   14262  0/286/286  _  59.05  284253  0.0   6.46   6.46  10.31.212.162  observium.xyz.com:80  NULL
2-0   14263  0/256/256  _  51.48  284263  0.0   6.23   6.23  10.31.212.162  observium.xyz.com:80  NULL
3-0   14266  0/235/235  _  54.04  371218  0.0   6.63   6.63  10.31.212.162  observium.xyz.com:80  NULL
4-0   14268  0/226/226  _  51.20  24517   0.0   6.71   6.71  10.31.212.162
5-0   14270  0/229/229  _  56.39  284420  0.0   6.45   6.45  10.31.212.162  observium.xyz.com:80  NULL
6-0   14272  8/305/305  W  69.47  1390    75.7  7.03   7.03  10.31.212.162  observium.xyz.com:80  POST /ajax/entity_popup.php HTTP/1.1
7-0   14273  0/212/212  _  48.27  420951  0.0   5.82   5.82  10.31.212.162  observium.xyz.com:80  NULL
8-0   14276  7/228/228  W  53.04  1400    14.0  7.03   7.03  10.31.212.162  observium.xyz.com:80  GET /graph.php?type=port_bits&legend=yes&height=100&width=275&t
9-0   14278  0/235/235  _  52.99  24449   0.0   6.66   6.66  10.31.212.162
10-0  15279  0/230/230  _  51.95  110     0.0   6.19   6.19  10.31.212.162  observium.xyz.com:80  NULL
11-0  16648  9/244/244  W  60.07  1400    32.9  7.38   7.38  10.31.212.162  observium.xyz.com:80  GET /graph.php?type=port_bits&legend=yes&height=100&width=275&t
12-0  16649  0/225/225  _  48.25  9194    0.0   5.43   5.43  10.31.212.162
13-0  17132  6/225/225  W  47.62  1400    14.7  5.66   5.66  10.31.212.162  observium.xyz.com:80  POST /ajax/entity_popup.php HTTP/1.1
14-0  17133  0/278/278  _  55.62  241     0.0   5.50   5.50  10.31.212.162
15-0  17135  0/269/269  W  58.24  1200    0.0   6.39   6.39  10.31.212.162  observium.xyz.com:80  GET /addhost/ HTTP/1.1
16-0  17136  0/353/353  W  69.36  970     0.0   5.48   5.48  10.31.212.162  observium.xyz.com:80  POST /ajax/entity_popup.php HTTP/1.1
17-0  18015  0/195/195  _  47.28  420515  0.0   5.30   5.30  10.31.212.162  observium.xyz.com:80  NULL
18-0  18016  0/352/352  _  72.91  133244  0.0   7.34   7.34  10.31.212.162  observium.xyz.com:80  NULL
19-0  18017  0/241/241  _  65.89  284305  0.0   7.79   7.79  10.31.212.162  observium.xyz.com:80  NULL
20-0  18018  0/352/352  W  72.88  1050    0.0   5.98   5.98  10.31.212.162  observium.xyz.com:80  GET /overview/ HTTP/1.1
21-0  18019  0/222/222  _  52.68  284297  0.0   5.99   5.99  10.31.212.162  observium.xyz.com:80  NULL
22-0  18020  0/270/270  _  54.70  371225  0.0   5.52   5.52  10.31.212.162  observium.xyz.com:80  NULL
23-0  18021  0/220/220  _  49.63  9155    0.0   6.18   6.18  10.31.212.162
24-0  18022  5/266/266  W  59.03  00      49.3  7.15   7.15  10.31.212.162  observium.xyz.com:80  GET /server-status HTTP/1.1
25-0  16178  0/324/324  W  62.23  350     0.0   6.18   6.18  10.31.212.162  observium.xyz.com:80  GET / HTTP/1.1
26-0  8624   0/61/61    _  11.56  135189  0.0   1.09   1.09  10.31.212.162  observium.xyz.com:80  NULL
27-0  11067  0/44/44    _  8.19   284240  0.0   0.70   0.70  10.31.212.162  observium.xyz.com:80  NULL


Srv    Child Server number - generation
PID    OS process ID
Acc    Number of accesses this connection / this child / this slot
M      Mode of operation
CPU    CPU usage, number of seconds
SS     Seconds since beginning of most recent request
Req    Milliseconds required to process most recent request
Conn   Kilobytes transferred this connection
Child  Megabytes transferred this child
Slot   Total megabytes transferred this slot

On Thu, Oct 27, 2016 at 10:07 AM Jesper Frank Nemholt <jfn@dassic.com> wrote:
Hi!

The server is fairly idle and good on memory and CPU. Apache on this server (it's a QEMU KVM VM) is only serving Observium.

/Jesper

On Thu, Oct 27, 2016 at 10:04 AM Derek <dandenoth@gmail.com> wrote:
How's the server itself doing when it happens? If you run a top command, is Apache maxing out CPU or hogging memory? I had something similar happen when I was trying to configure SSL/TLS on my Apache server, which was running multiple virtual hosts for Observium, an SVN frontend for RANCID, NIPAP, and Smokeping.

Derek

On Thu, Oct 27, 2016 at 8:46 AM, Jesper Frank Nemholt <jfn@dassic.com> wrote:
Hi!

I've started to see Apache lock up on specific actions within the Observium GUI. Which action triggers it seems random, but all Apache workers stop responding, and restarting Apache afterwards takes forever (5-10 minutes for a service httpd restart).

Has anyone had the same issue?

I see no errors in the Apache logs, the Observium logs, or any other logs on the Linux server (CentOS 7.2). Observium polling still runs as usual, so it's only the web interface that stops.


/Jesper

_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium

