Hi! Has anybody considered scaling out by adding more servers instead of moving to a bigger one? Using something like Open Grid Scheduler, for example.
Currently I'm maxing out our server (2 x Xeon E5-2630, 16 cores) with 37500 ports and 548 devices:
cplanning:~ # uptime
 13:19pm up 151 days 3:03, 2 users, load average: 43.37, 43.00, 42.60
Regards, CI.-
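For readers who want to put the load average in context, here is a minimal Python sketch that normalises it by core count. The function names are hypothetical; the figures 43.37 and 16 are taken from the uptime output and server description above. A value above 1.0 only means more runnable (or uninterruptibly waiting) tasks than cores on average; it does not say by itself whether CPU or I/O is the limit.

```python
import os


def load_per_core(load_average, cores):
    """Return the load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


# Figures quoted in the message above: 1-minute load 43.37 on 16 cores.
reported = load_per_core(43.37, 16)


def current_load_per_core():
    """Same ratio for the machine this runs on (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)


if __name__ == "__main__":
    print(f"reported load per core: {reported:.2f}")
    print(f"local load per core:    {current_load_per_core():.2f}")
```

On the numbers posted, this comes to roughly 2.7 queued tasks per core.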
On 08/23/2013 07:20 PM, Ciro Iriarte wrote:
> Currently I'm maxing out our server (2 x Xeon E5-2630, 16 cores) with 37500 ports, 548 devices
> cplanning:~ # uptime 13:19pm up 151 days 3:03, 2 users, load average: 43.37, 43.00, 42.60
Maxing out I/O I presume, not CPU?
Tom
2013/8/23 Tom Laermans <tom.laermans@powersource.cx>:
> Maxing out I/O I presume, not CPU?
> Tom
> _______________________________________________
> observium mailing list
> observium@observium.org
> http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
Some stats:
procs -----------memory------------- ---swap-- ----io---- --system--- -----cpu------
 r  b  swpd    free    buff    cache   si   so   bi    bo    in    cs us sy id wa st
27  0 29200 1312344 1270820 58582484    0    0    0 19822 20043 30668 54 38  7  0  0
22  1 29200 1330072 1270820 58582496    0    0    0 17254 18590 30620 56 39  5  0  0
27  0 29200 1314016 1270820 58582500    0    0    0  7118 16939 27376 57 41  2  0  0
35  0 29200 1389532 1270820 58582504    0    0    0 29470 20672 29516 61 39  0  0  0
47  1 29200 1427376 1270820 58582568    0    0   39 27003 19989 27377 68 32  0  0  0
49  0 29200 1457816 1270820 58583952    0    0  241 12389 16634 25146 69 31  0  0  0
52  0 29200 1434356 1270820 58584000    0    0    0  8750 16340 25067 68 32  0  0  0
46  0 29200 1476840 1270820 58584104    0    0    1 17590 17662 25817 69 31  0  0  0
Hmm, now that I check, the system has had additional load since Aug 5. I'll have a look at that.
Regards,
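As a rough way to read the vmstat samples above, the following Python sketch averages the CPU columns. The sample rows are copied from the message; the names `COLUMNS`, `SAMPLES`, `parse_vmstat` and `mean` are made up for illustration. It is only a summary of eight samples, not a diagnosis.

```python
# Column names as printed in the vmstat header.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

# The eight samples posted in the message above.
SAMPLES = """\
27 0 29200 1312344 1270820 58582484 0 0 0 19822 20043 30668 54 38 7 0 0
22 1 29200 1330072 1270820 58582496 0 0 0 17254 18590 30620 56 39 5 0 0
27 0 29200 1314016 1270820 58582500 0 0 0 7118 16939 27376 57 41 2 0 0
35 0 29200 1389532 1270820 58582504 0 0 0 29470 20672 29516 61 39 0 0 0
47 1 29200 1427376 1270820 58582568 0 0 39 27003 19989 27377 68 32 0 0 0
49 0 29200 1457816 1270820 58583952 0 0 241 12389 16634 25146 69 31 0 0 0
52 0 29200 1434356 1270820 58584000 0 0 0 8750 16340 25067 68 32 0 0 0
46 0 29200 1476840 1270820 58584104 0 0 1 17590 17662 25817 69 31 0 0 0
"""


def parse_vmstat(text):
    """Turn whitespace-separated vmstat rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        values = [int(v) for v in line.split()]
        if len(values) != len(COLUMNS):
            raise ValueError(f"unexpected column count in: {line!r}")
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def mean(rows, key):
    """Arithmetic mean of one column across all samples."""
    return sum(row[key] for row in rows) / len(rows)


rows = parse_vmstat(SAMPLES)
summary = {key: mean(rows, key) for key in ("r", "us", "sy", "id", "wa")}

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key:>2}: {value:.2f}")
```

Across these samples, user plus system time averages about 98% while I/O wait stays at 0, which would suggest the box is CPU-bound rather than I/O-bound during this window.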
Tried the poller wrapper as well?
But looks like you're likely limited on IO.
Tried rrd storage on ssd yet?
Sent from my iPhone
On 23 Aug. 2013, at 19:21, "Ciro Iriarte" <cyruspy@gmail.com> wrote:
> Currently I'm maxing out our server (2 x Xeon E5-2630, 16 cores) with 37500 ports, 548 devices
Yes, we're using the Python poller wrapper. Unfortunately, SSD is not available. CPU I/O wait is near 0 and IOPS seem to top out at about 2k, so I/O shouldn't be the issue...
2013/8/23 Moerman, Maarten <mmoerman@ebay.com>:
> Tried the poller wrapper as well?
> But looks like you're likely limited on IO.
> Tried rrd storage on ssd yet?
Btw, do you have issues? Load is unlikely to be an issue, unless you don't like high numbers...
Sent from my iPhone
Well, a coworker reported some broken graphs, which I'm still investigating. I'm also being proactive and looking for the best way to scale.
Regards, CI.-
2013/8/23 Moerman, Maarten <mmoerman@ebay.com>:
> Btw, do you have issues? Load is unlikely to be an issue, unless you don't like high numbers...
That has definitely not been my experience. Load is the main issue on our server, and it seems to be largely due to contention during polls. I've tuned vm.dirty_writeback_centisecs to 5000 (default is 500), so the kernel flusher threads wake up less often, and I/O is relatively low.
But we monitor a lot of systems across bad ADSL links, which means we have to run a lot of pollers in parallel to get a poll done in 5 minutes (some of our devices actually take more than 300 seconds to poll; they're disabled at the moment). This pushes up the load (we run 5 pollers per core) and makes the system very sluggish.
I've ordered a new server to replace this one this week, and I ended up going for 2 x 8 cores for a much smaller install than Ciro's:
Devices   157   139 up,   1 down,    4 ignored,   13 disabled
Ports    9150   918 up,   6 down, 1083 ignored, 7006 shutdown
Paul
On 08/24/2013 03:36 AM, Moerman, Maarten wrote:
> Btw, do you have issues? Load is unlikely to be an issue, unless you don't like high numbers...
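Paul's point about needing many parallel pollers to fit a 5-minute window can be sketched with some simple arithmetic in Python. The per-device durations below are invented for illustration (not measured data), and the function names are hypothetical; only the 300-second window, the "5 pollers per core" figure and the 2 x 8 cores come from the message above.

```python
import math

POLL_INTERVAL = 300  # seconds; the 5-minute poll window mentioned above


def split_devices(poll_seconds, interval=POLL_INTERVAL):
    """Separate devices that fit in the window from those that never can."""
    pollable = {d: s for d, s in poll_seconds.items() if s <= interval}
    too_slow = {d: s for d, s in poll_seconds.items() if s > interval}
    return pollable, too_slow


def min_parallel_pollers(poll_seconds, interval=POLL_INTERVAL):
    """Lower bound on concurrent pollers: total poll time / window length."""
    pollable, _ = split_devices(poll_seconds, interval)
    return math.ceil(sum(pollable.values()) / interval)


def configured_pollers(cores, per_core=5):
    """Poller count when running a fixed number of pollers per core."""
    return cores * per_core


# Hypothetical per-device poll durations in seconds.
example = {"core-sw": 20, "adsl-cpe-1": 240, "adsl-cpe-2": 280, "adsl-cpe-3": 320}

pollable, too_slow = split_devices(example)
needed = min_parallel_pollers(example)

if __name__ == "__main__":
    print("cannot finish in one window:", sorted(too_slow))
    print("minimum parallel pollers:", needed)
    print("pollers on 16 cores at 5 per core:", configured_pollers(16))
```

The bound is optimistic, since it ignores scheduling gaps and contention, and no amount of parallelism helps a single device that takes longer than the window, which matches the devices Paul had to disable.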
participants (4)
- Ciro Iriarte
- Moerman, Maarten
- Paul Gear
- Tom Laermans