Yes, it also bugs me that it stalls the webserver while nothing shows up in the webserver logs. At least I can avoid the problem by disabling the popups, even though they are a nice feature to have.

To narrow it down further I'd probably have to add some logging to the entity_popup.php script to see on which line it stalls.

/Jesper

On Sun, Nov 20, 2016 at 2:13 PM Tom Laermans <tom.laermans@powersource.cx> wrote:
Jesper,

Thanks for the followup sleuthing. Unfortunately we can't reproduce this issue, and, well, a page should normally not DoS your Apache server regardless of the code being run by PHP. Yea, that's a big "should".


Tom


On 20/11/2016 21:59, Jesper Frank Nemholt wrote:
Just following up on this as I've further narrowed down the bug.

I disabled the pop-up feature in the global settings:

Mouseover popups
Define the mouseover popups with extra information and graphs.



That fixed the problem completely. So the issue is related to the popup script (entity_popup.php). I do not know exactly where it goes wrong as most of the time this script works, but sometimes it doesn't and locks up the Apache webserver session and eventually locks up the entire webserver.

/Jesper

On Wed, Nov 9, 2016 at 10:25 AM Jesper Frank Nemholt <jfn@dassic.com> wrote:
Hi!

No answer from anyone so far, but I did get a bit closer to the issue today. The lockup happens exactly at the time I hover the mouse over the interface name/IP in the picture attached.
Hovering above will normally create a popup with the traffic information etc., but instead it locks up the entire http session.

When this happens, the status bar at the bottom of Chrome says "Waiting for socket". I tried opening other browsers in parallel (Safari & Firefox). These can initially connect to Observium (new session), but if I go back to the same interface and hover the mouse over it, they lock up as well. Eventually no sessions are left on the webserver and it stops responding to anything.
I saw elsewhere in a forum that "Waiting for socket" can also mean Chrome is running out of sockets, but since I get the same behaviour in two other browsers (with no other pages open), and since Apache is clearly locking up on something, I guess the browser itself is not the issue.

So it appears the issue is clearly related to the Observium entity_popup.php script.

Funny thing though is that in most cases this script does work, but it seems that sometimes it doesn't.



[Attachment: Screen Shot 2016-11-09 at 6.59.42 AM.jpg]

/Jesper

On Sun, Nov 6, 2016 at 1:18 PM Jesper Frank Nemholt <jfn@dassic.com> wrote:
Just as an add-on, an strace on the Apache processes gives this:

[root@observium-1-vm ~]# strace -p 9221
Process 9221 attached
flock(13, LOCK_EX

[root@observium-1-vm ~]# strace -p 9157
Process 9157 attached
select(17, [14 16], [], [], NULL
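
The unfinished `flock(13, LOCK_EX` call suggests PID 9221 is blocked waiting for an exclusive lock on whatever file descriptor 13 refers to. Below is a minimal sketch of how that descriptor could be mapped back to a path, assuming a Linux /proc filesystem and enough privileges (typically root) to inspect the process. The helper name `fd_target` and its `proc_root` parameter are made up for illustration, and Python is used only as a convenient scripting language:

```python
import os


def fd_target(pid: int, fd: int, proc_root: str = "/proc") -> str:
    """Return what file descriptor `fd` of process `pid` points to.

    Reads the <proc_root>/<pid>/fd/<fd> symlink, so it only works on
    systems that expose a Linux-style /proc and only for processes the
    caller is allowed to inspect.
    """
    return os.readlink(os.path.join(proc_root, str(pid), "fd", str(fd)))


# Hypothetical usage with the PID and descriptor from the strace output:
#   print(fd_target(9221, 13))
```

If the path turns out to be a PHP session file, that would be consistent with PHP's default file-based session handler, which holds an exclusive lock on the session file while a request runs. That is only a possibility; the trace itself does not confirm it.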


/Jesper


On Sun, Nov 6, 2016 at 12:56 PM Jesper Frank Nemholt <jfn@dassic.com> wrote:
Hi!

So related to this, I enabled server-status on Apache and here's what I see on the webserver when it locks up.

Based upon my past experience it seems it's somehow related to the script /ajax/entity_popup.php as I've seen it lock up several times when calling that script.

Any clues?

Apache Server Status for 1.2.3.4 (via 1.2.3.4)

Server Version: Apache/2.4.6 (CentOS) PHP/5.4.16
Server MPM: prefork
Server Built: Jul 18 2016 15:30:14

Current Time: Sunday, 06-Nov-2016 12:51:50 PST
Restart Time: Friday, 04-Nov-2016 21:28:13 PDT
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 1 day 16 hours 23 minutes 37 seconds
Server load: 2.27 2.89 3.05
Total accesses: 6840 - Total Traffic: 165.9 MB
CPU Usage: u954.14 s117.84 cu292.79 cs133.22 - 1.03% CPU load
.047 requests/sec - 1196 B/second - 24.8 kB/request
9 requests currently being processed, 20 idle workers
______W_W__W_W_WW___W___WW___...................................
................................................................
................................................................
........

Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-0 14261 0/257/257 _ 53.97 371 178 0.0 5.66 5.66 10.31.212.162 observium.xyz.com:80 NULL
1-0 14262 0/286/286 _ 59.05 284 253 0.0 6.46 6.46 10.31.212.162 observium.xyz.com:80 NULL
2-0 14263 0/256/256 _ 51.48 284 263 0.0 6.23 6.23 10.31.212.162 observium.xyz.com:80 NULL
3-0 14266 0/235/235 _ 54.04 371 218 0.0 6.63 6.63 10.31.212.162 observium.xyz.com:80 NULL
4-0 14268 0/226/226 _ 51.20 24 517 0.0 6.71 6.71 10.31.212.162
5-0 14270 0/229/229 _ 56.39 284 420 0.0 6.45 6.45 10.31.212.162 observium.xyz.com:80 NULL
6-0 14272 8/305/305 W 69.47 139 0 75.7 7.03 7.03 10.31.212.162 observium.xyz.com:80 POST /ajax/entity_popup.php HTTP/1.1
7-0 14273 0/212/212 _ 48.27 420 951 0.0 5.82 5.82 10.31.212.162 observium.xyz.com:80 NULL
8-0 14276 7/228/228 W 53.04 140 0 14.0 7.03 7.03 10.31.212.162 observium.xyz.com:80 GET /graph.php?type=port_bits&legend=yes&height=100&width=275&t
9-0 14278 0/235/235 _ 52.99 24 449 0.0 6.66 6.66 10.31.212.162
10-0 15279 0/230/230 _ 51.95 11 0 0.0 6.19 6.19 10.31.212.162 observium.xyz.com:80 NULL
11-0 16648 9/244/244 W 60.07 140 0 32.9 7.38 7.38 10.31.212.162 observium.xyz.com:80 GET /graph.php?type=port_bits&legend=yes&height=100&width=275&t
12-0 16649 0/225/225 _ 48.25 9 194 0.0 5.43 5.43 10.31.212.162
13-0 17132 6/225/225 W 47.62 140 0 14.7 5.66 5.66 10.31.212.162 observium.xyz.com:80 POST /ajax/entity_popup.php HTTP/1.1
14-0 17133 0/278/278 _ 55.62 24 1 0.0 5.50 5.50 10.31.212.162
15-0 17135 0/269/269 W 58.24 120 0 0.0 6.39 6.39 10.31.212.162 observium.xyz.com:80 GET /addhost/ HTTP/1.1
16-0 17136 0/353/353 W 69.36 97 0 0.0 5.48 5.48 10.31.212.162 observium.xyz.com:80 POST /ajax/entity_popup.php HTTP/1.1
17-0 18015 0/195/195 _ 47.28 420 515 0.0 5.30 5.30 10.31.212.162 observium.xyz.com:80 NULL
18-0 18016 0/352/352 _ 72.91 133 244 0.0 7.34 7.34 10.31.212.162 observium.xyz.com:80 NULL
19-0 18017 0/241/241 _ 65.89 284 305 0.0 7.79 7.79 10.31.212.162 observium.xyz.com:80 NULL
20-0 18018 0/352/352 W 72.88 105 0 0.0 5.98 5.98 10.31.212.162 observium.xyz.com:80 GET /overview/ HTTP/1.1
21-0 18019 0/222/222 _ 52.68 284 297 0.0 5.99 5.99 10.31.212.162 observium.xyz.com:80 NULL
22-0 18020 0/270/270 _ 54.70 371 225 0.0 5.52 5.52 10.31.212.162 observium.xyz.com:80 NULL
23-0 18021 0/220/220 _ 49.63 9 155 0.0 6.18 6.18 10.31.212.162
24-0 18022 5/266/266 W 59.03 0 0 49.3 7.15 7.15 10.31.212.162 observium.xyz.com:80 GET /server-status HTTP/1.1
25-0 16178 0/324/324 W 62.23 35 0 0.0 6.18 6.18 10.31.212.162 observium.xyz.com:80 GET / HTTP/1.1
26-0 8624 0/61/61 _ 11.56 135 189 0.0 1.09 1.09 10.31.212.162 observium.xyz.com:80 NULL
27-0 11067 0/44/44 _ 8.19 284 240 0.0 0.70 0.70 10.31.212.162 observium.xyz.com:80 NULL

Srv Child Server number - generation
PID OS process ID
Acc Number of accesses this connection / this child / this slot
M Mode of operation
CPU CPU usage, number of seconds
SS Seconds since beginning of most recent request
Req Milliseconds required to process most recent request
Conn Kilobytes transferred this connection
Child Megabytes transferred this child
Slot Total megabytes transferred this slot
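
The stuck workers in the table above (mode "W" with a large SS value) can also be picked out mechanically. Here is a rough sketch, assuming the whitespace-separated column layout shown above. The names `Worker` and `busy_workers` and the 60-second threshold are illustrative choices, not part of Apache or Observium:

```python
from typing import List, NamedTuple


class Worker(NamedTuple):
    pid: int
    mode: str
    seconds: int  # SS column: seconds since the request began
    request: str


def busy_workers(rows: List[str], min_seconds: int = 60) -> List[Worker]:
    """Return workers in mode "W" whose SS value is at least min_seconds."""
    found = []
    for row in rows:
        cols = row.split()
        # Complete rows: Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request...
        # Header lines and truncated rows are skipped.
        if len(cols) < 13 or not cols[1].isdigit() or not cols[5].isdigit():
            continue
        mode, seconds = cols[3], int(cols[5])
        if mode == "W" and seconds >= min_seconds:
            found.append(Worker(int(cols[1]), mode, seconds, " ".join(cols[12:])))
    return found


# Three rows copied from the status page above.
sample = [
    "0-0 14261 0/257/257 _ 53.97 371 178 0.0 5.66 5.66 10.31.212.162 observium.xyz.com:80 NULL",
    "6-0 14272 8/305/305 W 69.47 139 0 75.7 7.03 7.03 10.31.212.162 observium.xyz.com:80 POST /ajax/entity_popup.php HTTP/1.1",
    "24-0 18022 5/266/266 W 59.03 0 0 49.3 7.15 7.15 10.31.212.162 observium.xyz.com:80 GET /server-status HTTP/1.1",
]

for w in busy_workers(sample):
    print(w.pid, w.seconds, w.request)
# prints: 14272 139 POST /ajax/entity_popup.php HTTP/1.1
```

The threshold keeps out the /server-status request itself, which is also in "W" but has only just started.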

On Thu, Oct 27, 2016 at 10:07 AM Jesper Frank Nemholt <jfn@dassic.com> wrote:
Hi!

The server is fairly idle and good on memory and CPU. Apache on this server (it's a QEMU KVM VM) is only serving Observium.

/Jesper

On Thu, Oct 27, 2016 at 10:04 AM Derek <dandenoth@gmail.com> wrote:
How's the server itself doing when it happens? If you run top, is Apache maxing out CPU or hogging memory? I had something similar happen when I was trying to configure SSL/TLS on my Apache server, which was running multiple virtual hosts for Observium, an SVN frontend for RANCID, NIPAP, and Smokeping.

Derek

On Thu, Oct 27, 2016 at 8:46 AM, Jesper Frank Nemholt <jfn@dassic.com> wrote:
Hi!

I've started to see Apache lock up on certain actions in the Observium GUI. Which action triggers it seems random, but all Apache workers stop responding, and restarting Apache afterwards takes forever (5-10 min for a service httpd restart).

Has anyone had the same issue?

I see no errors in the Apache logs, the Observium logs, or any other logs on the Linux server (CentOS 7.2). Observium polling still runs as usual, so only the web interface stops.


/Jesper

_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium