Hi guys, just wanted to show off the new Observium install I deployed for a customer this week :)
Everything is polled from one single server without problems (polling time is about 140s for all devices). Big thanks to Mike, who helped me identify and fix a scaling issue when running a very large number of threads. (If you run an Observium install that runs poller-wrapper with 100+ threads, you should check out r9333.)
/Markus
Wow! What hardware are you using? Where are your devices located? I'm running about 90s for around 900 devices (21,000+ ports), but I have to traverse WAN links for 95% of it.
Tommy
CONFIDENTIALITY NOTICE: This electronic mail transmission and any accompanying documents contain information belonging to the sender which may be confidential and legally privileged. This information is only for the use of the individual or entity to whom this electronic mail transmission was intended. If you are not the intended recipient, any disclosure, copying, distribution, or action taken in reliance on the contents of the information contained in this transmission is strictly prohibited. If you have received this transmission in error, please immediately contact the sender and delete the message. Thank you.
Hi, the server has dual 20-core Xeon CPUs, 64GB RAM and 2x400GB NVMe disks specified for write-heavy operation. The NVMe disks are really great for this type of workload, as they can handle a ridiculous number of IOPS without trouble.
Yeah, the key to getting this to work is that all 5000+ switches are close by and connected with fiber, so latency is really low, and the switches themselves are very fast at replying to SNMP queries (3-5s to poll everything on a single switch).
/Markus
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
You should probably reduce threads to even out the load. The goal is to be as close to 300 seconds as you can manage, otherwise you'll get spiky I/O and MySQL load.
Adam.
Sent from BlueMail
Yeah, I usually even out the server load by splitting the devices into 5 groups and starting their polling 1 minute apart with cron, like this:

0-59/5 * * * * observium /opt/observium/observium-wrapper polling -i 5 -n 0 >> /dev/null 2>&1
1-59/5 * * * * observium /opt/observium/observium-wrapper polling -i 5 -n 1 >> /dev/null 2>&1
2-59/5 * * * * observium /opt/observium/observium-wrapper polling -i 5 -n 2 >> /dev/null 2>&1
3-59/5 * * * * observium /opt/observium/observium-wrapper polling -i 5 -n 3 >> /dev/null 2>&1
4-59/5 * * * * observium /opt/observium/observium-wrapper polling -i 5 -n 4 >> /dev/null 2>&1
This makes it work with a lot fewer threads, and the load on disk and database is much lower, as only 1/5 of the data needs to be processed at the same time.
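For illustration, the `-i INSTANCES -n NUMBER` split can be thought of as a simple modulo partition over device IDs. This is only a sketch of the idea, not Observium's actual partitioning logic:

```python
def partition(device_ids, instances, number):
    """Return the devices this poller instance should handle,
    assuming a simple round-robin split by device ID."""
    return [d for d in device_ids if d % instances == number]

devices = list(range(1, 5001))  # hypothetical device IDs
groups = [partition(devices, 5, n) for n in range(5)]
# Each group gets 1000 devices, and every device is polled exactly once.
assert sum(len(g) for g in groups) == len(devices)
```

Whatever the real split key is, the property that matters is that the groups are disjoint and cover all devices, so each cron entry polls its fifth of the fleet.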
/Markus
But why though? :D
The poller-wrapper's entire purpose is to do what you're doing here :D
adam.
The same reason you recommend keeping polling time as close to 300s as possible. Doing this lets you run the poller with far fewer threads. Instead of starting every 5 minutes with 128 threads, you can run 32 threads every minute, which means far fewer CPU context switches and much lower I/O spikes. The system stays under light load all the time instead of one huge spike followed by ~150s of idle :) /Markus
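As a back-of-the-envelope illustration of the trade-off (the 140s poll time is the measured figure from earlier in the thread; everything else is arithmetic):

```python
# One big run per 5-minute cycle vs. five staggered runs.
devices, cycle = 5000, 300          # device count, polling interval (s)

# Big run: 128 threads poll everything in ~140s, then the box idles.
big_threads, big_busy = 128, 140
big_idle = cycle - big_busy         # seconds per cycle with no polling load

# Staggered: 32 threads handle 1000 devices each minute, so the load
# is spread across the whole cycle instead of one spike.
per_group = devices // 5

print(f"big run: {big_threads} threads busy {big_busy}s, idle {big_idle}s")
print(f"staggered: 32 threads polling {per_group} devices every minute")
```

Total work per cycle is identical either way; only the concurrency and the shape of the load curve change.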
I do not think you are right, Herr Swe.
Adam.
I must agree with Adam here; you're still polling all devices within 5 minutes, one after another...
With your setup you're just sometimes leaving gaps between the high-load periods instead of running under load continuously and then dropping off, but the end result is pretty much the same :-)
But hey, whatever works! The number of devices is damn impressive :-)
Hello!
Markus, nice result! Didn't you consider a compressed RAM disk for the RRDs? A month ago I did another Observium installation with the RRDs in RAM, and it's super fast in both polling and the web UI, while also sparing the underlying SSDs from heavy I/O, helping them live longer :) That installation has about 28GB of raw RRDs, compressed roughly 6x in RAM. The only drawback is that the RAM disk takes about 8 minutes to populate on system startup, as it has to read all the RRDs from the physical drive.
Sent with [ProtonMail](https://protonmail.com) Secure Email.
Interesting, how did you set up a compressed RAM disk? And what compression algorithm is used? It would be pretty expensive for this setup though, as I have about 430GB of RRDs :) /Markus
https://en.m.wikipedia.org/wiki/Zram
I think we might have this in our documentation, I forget.
Adam.
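For reference, a minimal zram setup on Linux might look like this (device name, sizes and mountpoint are illustrative; requires root and the zram kernel module):

```shell
# Create an lz4-compressed zram block device and mount it for the RRDs.
modprobe zram num_devices=1
echo lz4 > /sys/block/zram0/comp_algorithm   # must be set before disksize
echo 40G > /sys/block/zram0/disksize         # uncompressed capacity
mkfs.ext4 -q /dev/zram0
mkdir -p /opt/observium/rrd
mount /dev/zram0 /opt/observium/rrd
```

Like any RAM-backed store, the contents are lost on reboot, so the RRDs need to be synced to persistent storage periodically and restored at boot.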
I'm doing compressed RAM in a weirder but still working way :) I create a regular tmpfs, a file inside it, and then a compressed ZFS pool on top of that file. The reason for this approach is that I wasn't able to get zram to work. The compression algorithm I use is LZ4; for 430GB of raw RRDs it would need no more than about 80GB of RAM (assuming a 5x compression ratio). 80GB of DDR4 RDIMM in turn costs about $1000, which sounds well worth it for a 5000-device setup ;)
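A sketch of that tmpfs-plus-ZFS approach (paths, pool name and sizes are illustrative, not the poster's exact commands; requires root and ZFS installed):

```shell
# Back a compressed ZFS pool with a file on tmpfs.
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=90G tmpfs /mnt/ramdisk
truncate -s 85G /mnt/ramdisk/zpool.img       # sparse backing file in RAM
zpool create -O compression=lz4 -O atime=off rrdram /mnt/ramdisk/zpool.img
zfs create -o mountpoint=/opt/observium/rrd rrdram/rrd
```

As with zram, the pool lives in volatile memory, so the RRDs have to be copied back to the physical SSDs on shutdown and the pool rebuilt from them at boot.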
Sent with [ProtonMail](https://protonmail.com) Secure Email.
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On July 21, 2018 12:16 AM, Adam Armstrong adama@memetic.org wrote:
https://en.m.wikipedia.org/wiki/Zram I think we might have this in our documentation, I forget. Adam. Sent from [BlueMail](http://www.bluemail.me/r?b=13187) On 20 Jul 2018, at 14:32, Markus Klock markus@best-practice.se wrote:
Interesting, how did you set up a compressed RAM disk? And what compression algorithm is used? It would be pretty expensive for this setup though, as I have about 430GB of RRDs :) /Markus
On 20 July 2018 at 21:43, "xomka686" xomka686@protonmail.com wrote:
Hello!
Markus, nice result! Didn't you consider a compressed RAM disk for the RRDs? A month ago I did another installation of Observium with RRDs in RAM, and it's super fast in both polling and the web UI while sparing the underlying SSDs the heavy IO, extending their lifespan :) That installation has about 28GB of raw RRDs, compressed about 6x in RAM. The only drawback is that the RAM disk takes about 8 minutes to come up at system startup, as it needs to read all the RRDs from the physical drive.
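The figures in that message are easy to sanity-check. A quick back-of-envelope calculation (all input numbers taken from the message; the implied disk read rate is my inference, not something the poster stated):

```python
# Back-of-envelope check of the compressed-RAM-disk numbers above.
raw_rrd_gb = 28           # raw RRD size reported in the message
compress_ratio = 6        # in-RAM compression reported in the message
startup_seconds = 8 * 60  # reported time to preload the RAM disk at boot

# RAM actually consumed by the compressed RRDs.
ram_needed_gb = raw_rrd_gb / compress_ratio

# Implied sequential read rate when loading RRDs from the physical drive.
read_rate_mb_s = raw_rrd_gb * 1024 / startup_seconds

print(f"RAM footprint: ~{ram_needed_gb:.1f} GB")
print(f"Implied preload read rate: ~{read_rate_mb_s:.0f} MB/s")
```

So roughly 4.7GB of RAM holds the 28GB of RRDs, and the 8-minute warm-up corresponds to reading at about 60 MB/s, plausible for a busy SATA SSD or spinning disk.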
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On July 19, 2018 4:28 PM, Tom Laermans tom.laermans@powersource.cx wrote:
I must agree with Adam here: you're still polling all devices within 5 minutes, one after the other...
Your setup just leaves gaps between the bursts of high load instead of one sustained burst that then drops off, but the end result is pretty much the same :-)
But hey, whatever works! The number of devices is damn impressive :-)
On 7/17/2018 11:24 PM, Adam Armstrong wrote:
I do not think you are right, Herr Swe.
Adam.
I see, interesting solution :) /Markus
2018-07-21 13:12 GMT+02:00 xomka686 xomka686@protonmail.com:
Markus,
Will you share the specs and resources of the single Ubuntu server running this site?
*Sam Jones*
Idaho State University
jonesamu@isu.edu
On Wed, Jul 11, 2018 at 7:56 AM, Markus Klock markus@best-practice.se wrote:
Hi guys, just wanted to show the new Observium I deployed for a customer this week :)
Everything polled from one single server without problems (polling time is about 140s for all devices). Big thanks to Mike, who helped me identify and fix a scaling issue when running a huge number of threads. (If you run an Observium install that runs poller-wrapper with 100+ threads, you should check out r9333.)
/Markus
participants (6)
- Adam Armstrong
- Balistreri, Thomas C - DOC
- Markus Klock
- Sam Jones
- Tom Laermans
- xomka686