Hi guys, just wanted to show off the new Observium install I deployed for a customer this week :)
Everything is polled from one single server without problems (polling time is about 140s for all devices). Big thanks to Mike, who helped me identify and fix a scaling issue when running a very large number of threads. (If you run an Observium install that runs poller-wrapper with 100+ threads, you should check out r9333.)
/Markus
Wow! What hardware are you using? Where are your devices located? I'm running about 90s for around 900 devices (21,000+ ports), but I have to traverse WAN links for 95% of it.
Tommy
CONFIDENTIALITY NOTICE: This electronic mail transmission and any accompanying documents contain information belonging to the sender which may be confidential and legally privileged. This information is only for the use of the individual or entity to whom this electronic mail transmission was intended. If you are not the intended recipient, any disclosure, copying, distribution, or action taken in reliance on the contents of the information contained in this transmission is strictly prohibited. If you have received this transmission in error, please immediately contact the sender and delete the message. Thank you.
Hi, the server has dual 20-core Xeon CPUs, 64GB RAM and 2x400GB NVMe disks specified for write-heavy operation. The NVMe disks are really great for this type of workload, as they can handle a ridiculous number of IOPS without trouble.
Yeah, the key to getting this to work is that all 5000+ switches are close by and connected with fiber, so latency is really low, and the switches themselves are very fast at replying to SNMP queries (3-5s to poll everything on a single switch).
/Markus
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
You should probably reduce threads to even out the load. The goal is to be as close to 300 seconds as you can manage, otherwise you'll get spiky I/O and MySQL load.
Adam.
Sent from BlueMail
Yeah, I usually even out the server load by splitting the devices into 5 groups and starting their polling 1 minute apart with cron, like this:

0-59/5 * * * * observium /opt/observium/observium-wrapper polling -i 5 -n 0 >> /dev/null 2>&1
1-59/5 * * * * observium /opt/observium/observium-wrapper polling -i 5 -n 1 >> /dev/null 2>&1
2-59/5 * * * * observium /opt/observium/observium-wrapper polling -i 5 -n 2 >> /dev/null 2>&1
3-59/5 * * * * observium /opt/observium/observium-wrapper polling -i 5 -n 3 >> /dev/null 2>&1
4-59/5 * * * * observium /opt/observium/observium-wrapper polling -i 5 -n 4 >> /dev/null 2>&1
This makes it work with a lot fewer threads, and the load on disk and database is much lower, as only 1/5 of the data needs to be processed at the same time.
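For illustration, the `-i INSTANCES -n NUMBER` split can be thought of as a simple modulo partition over device IDs. This is only a sketch of the idea, not Observium's actual partitioning logic:

```python
def partition(device_ids, instances, number):
    """Return the devices this poller instance should handle,
    assuming a simple round-robin split by device ID."""
    return [d for d in device_ids if d % instances == number]

devices = list(range(1, 5001))  # hypothetical device IDs
groups = [partition(devices, 5, n) for n in range(5)]
# Each group gets 1000 devices, and every device is polled exactly once.
assert sum(len(g) for g in groups) == len(devices)
```

Whatever the real split key is, the property that matters is that the groups are disjoint and cover all devices, so each cron entry polls its fifth of the fleet.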
/Markus
But why though? :D
The poller-wrapper's entire purpose is to do what you're doing here :D
adam.
The same reason you recommend keeping polling time as close to 300s as possible. Doing this lets you run the poller with far fewer threads. Instead of starting every 5 minutes with 128 threads, you can run 32 threads every minute, which means far fewer CPU context switches and much lower I/O spikes. The system stays under light load all the time instead of one huge spike followed by ~150s of idle :) /Markus
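As a back-of-the-envelope illustration of the trade-off (the 140s poll time is the measured figure from earlier in the thread; everything else is arithmetic):

```python
# One big run per 5-minute cycle vs. five staggered runs.
devices, cycle = 5000, 300          # device count, polling interval (s)

# Big run: 128 threads poll everything in ~140s, then the box idles.
big_threads, big_busy = 128, 140
big_idle = cycle - big_busy         # seconds per cycle with no polling load

# Staggered: 32 threads handle 1000 devices each minute, so the load
# is spread across the whole cycle instead of one spike.
per_group = devices // 5

print(f"big run: {big_threads} threads busy {big_busy}s, idle {big_idle}s")
print(f"staggered: 32 threads polling {per_group} devices every minute")
```

Total work per cycle is identical either way; only the concurrency and the shape of the load curve change.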
I do not think you are right, Herr Swe.
Adam.
I must agree with Adam here; you're still polling all devices within 5 minutes, one after another...
With your setup you're just sometimes leaving gaps between the high-load periods instead of running under load continuously and then dropping off, but the end result is pretty much the same :-)
But hey, whatever works! The number of devices is damn impressive :-)
Hello!
Markus, nice result! Didn't you consider a compressed RAM disk for the RRDs? A month ago I did another Observium installation with the RRDs in RAM, and it's super fast in both polling and the web UI, while also sparing the underlying SSDs from heavy I/O, helping them live longer :) That installation has about 28GB of raw RRDs, compressed roughly 6x in RAM. The only drawback is that the RAM disk takes about 8 minutes to populate on system startup, as it has to read all the RRDs from the physical drive.
Sent with [ProtonMail](https://protonmail.com) Secure Email.
Interesting, how did you set up a compressed RAM disk? And what compression algorithm is used? It would be pretty expensive for this setup though, as I have about 430GB of RRDs :) /Markus
https://en.m.wikipedia.org/wiki/Zram
I think we might have this in our documentation, I forget.
Adam.
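For reference, a minimal zram setup on Linux might look like this (device name, sizes and mountpoint are illustrative; requires root and the zram kernel module):

```shell
# Create an lz4-compressed zram block device and mount it for the RRDs.
modprobe zram num_devices=1
echo lz4 > /sys/block/zram0/comp_algorithm   # must be set before disksize
echo 40G > /sys/block/zram0/disksize         # uncompressed capacity
mkfs.ext4 -q /dev/zram0
mkdir -p /opt/observium/rrd
mount /dev/zram0 /opt/observium/rrd
```

Like any RAM-backed store, the contents are lost on reboot, so the RRDs need to be synced to persistent storage periodically and restored at boot.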
I'm doing compressed RAM in a weirder but still working way :) I create a regular tmpfs, a file inside it, and then a compressed ZFS pool on top of that file. The reason for this approach is that I wasn't able to get zram to work. The compression algorithm I use is LZ4; for 430GB of raw RRDs it would need no more than about 80GB of RAM (assuming a 5x compression ratio). 80GB of DDR4 RDIMM in turn costs about $1000, which sounds well worth it for a 5000-device setup ;)
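A sketch of that tmpfs-plus-ZFS approach (paths, pool name and sizes are illustrative, not the poster's exact commands; requires root and ZFS installed):

```shell
# Back a compressed ZFS pool with a file on tmpfs.
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=90G tmpfs /mnt/ramdisk
truncate -s 85G /mnt/ramdisk/zpool.img       # sparse backing file in RAM
zpool create -O compression=lz4 -O atime=off rrdram /mnt/ramdisk/zpool.img
zfs create -o mountpoint=/opt/observium/rrd rrdram/rrd
```

As with zram, the pool lives in volatile memory, so the RRDs have to be copied back to the physical SSDs on shutdown and the pool rebuilt from them at boot.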
Sent with [ProtonMail](https://protonmail.com) Secure Email.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On July 21, 2018 12:16 AM, Adam Armstrong adama@memetic.org wrote:
https://en.m.wikipedia.org/wiki/Zram I think we might have this in our documentation, I forget. Adam. Sent from [BlueMail](http://www.bluemail.me/r?b=13187) On 20 Jul 2018, at 14:32, Markus Klock markus@best-practice.se wrote:
Interesting, how did you set up a compressed RAM disk? And what compression algorithm is used? It would be pretty expensive for this setup though, as I have about 430GB of RRDs :) /Markus
On 20 July 2018 at 21:43, "xomka686" xomka686@protonmail.com wrote:
Hello!
Markus, nice result! Didn't you consider a compressed RAM disk for the RRDs? A month ago I did another installation of Observium with RRDs in RAM, and it's super fast in both polling and the web UI while sparing the underlying SSDs the heavy IO, extending their lifespan :) That installation has about 28GB of raw RRDs, compressed about 6x in RAM. The only drawback is that the RAM disk takes about 8 minutes to come up at system startup, as it needs to read all the RRDs from the physical drive.
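The figures in that message are easy to sanity-check. A quick back-of-envelope calculation (all input numbers taken from the message; the implied disk read rate is my inference, not something the poster stated):

```python
# Back-of-envelope check of the compressed-RAM-disk numbers above.
raw_rrd_gb = 28           # raw RRD size reported in the message
compress_ratio = 6        # in-RAM compression reported in the message
startup_seconds = 8 * 60  # reported time to preload the RAM disk at boot

# RAM actually consumed by the compressed RRDs.
ram_needed_gb = raw_rrd_gb / compress_ratio

# Implied sequential read rate when loading RRDs from the physical drive.
read_rate_mb_s = raw_rrd_gb * 1024 / startup_seconds

print(f"RAM footprint: ~{ram_needed_gb:.1f} GB")
print(f"Implied preload read rate: ~{read_rate_mb_s:.0f} MB/s")
```

So roughly 4.7GB of RAM holds the 28GB of RRDs, and the 8-minute warm-up corresponds to reading at about 60 MB/s, plausible for a busy SATA SSD or spinning disk.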
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On July 19, 2018 4:28 PM, Tom Laermans tom.laermans@powersource.cx wrote:
I must agree with Adam here: you're still polling all devices within 5 minutes, one after the other...
Your setup just leaves gaps between the bursts of high load instead of one sustained burst that then drops off, but the end result is pretty much the same :-)
But hey, whatever works! The number of devices is damn impressive :-)
On 7/17/2018 11:24 PM, Adam Armstrong wrote:
I do not think you are right, Herr Swe.
Adam.
I see, interesting solution :) /Markus
2018-07-21 13:12 GMT+02:00 xomka686 xomka686@protonmail.com:
Markus,
Will you share the specs and resources of the single Ubuntu server running this site?
*Sam Jones*
Idaho State University
jonesamu@isu.edu
On Wed, Jul 11, 2018 at 7:56 AM, Markus Klock markus@best-practice.se wrote:
Hi guys, just wanted to show the new Observium I deployed for a customer this week :)
Everything polled from one single server without problems (polling time is about 140s for all devices). Big thanks to Mike, who helped me identify and fix a scaling issue when running a huge number of threads. (If you run an Observium install that runs poller-wrapper with 100+ threads, you should check out r9333.)
/Markus
participants (6)
- Adam Armstrong
- Balistreri, Thomas C - DOC
- Markus Klock
- Sam Jones
- Tom Laermans
- xomka686