speed up the poller with poller-wrapper.py
Hi all,
Are you not sleeping well because there are gaps in your graphs? Do you feel like your Observium poller isn't putting enough effort into its sole task of polling every device and completing that task in less than 5 minutes?
Fear not, a dirty script called "poller-wrapper.py" is here!
I wrote a small wrapper that has a slightly more modern approach to getting things done. An issue with the multiple poller instances approach[1] is that when a single instance has to poll a few slow devices, that single instance cannot finish its work in time, while the instances that already finished their portion of the work will not 'help' the instance with the slow devices.
This wrapper script will create a Queue with all the work that has to be done and launch a number of threads which will take work from the Queue until the queue is empty. This distributes the workload over the workers more evenly, which improves performance and makes it more likely that all the polling work finishes within 5 minutes.
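The queue-and-threads idea described above can be sketched roughly as follows. This is not the code of poller-wrapper.py itself, only a minimal illustration written for Python 3; the names `run_pool` and `fake_poll` are made up for this example, and `fake_poll` is a stand-in for whatever the real script does per device (presumably one poller run for a single device).

```python
import queue
import threading
import time


def run_pool(device_ids, poll_one, workers):
    """Poll every device id using `workers` threads fed from one shared queue."""
    work = queue.Queue()
    for device_id in device_ids:
        work.put(device_id)

    results = {}             # device id -> seconds spent polling it
    lock = threading.Lock()  # protects `results` across threads

    def worker():
        while True:
            try:
                device_id = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this thread is done
            started = time.time()
            poll_one(device_id)
            elapsed = time.time() - started
            with lock:
                results[device_id] = elapsed

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def fake_poll(device_id):
    """Hypothetical stand-in for polling one device; it only sleeps briefly."""
    time.sleep(0.01)


if __name__ == "__main__":
    timings = run_pool(range(1, 11), fake_poll, workers=4)
    print("polled %d devices" % len(timings))
```

Because every thread keeps pulling from the same queue, a thread that finishes a fast device immediately picks up the next one, so no worker sits idle while work remains.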
You can easily test what number of threads works well in your environment by running the script manually a few times. Here I run the script with 32 threads:
root@observium:/opt/observium# python poller-wrapper.py 32
worker Thread-23 finished device 75 in 9 seconds
worker Thread-25 finished device 84 in 2 seconds
worker Thread-21 finished device 81 in 5 seconds
<SNIP>
worker Thread-23 finished device 98 in 81 seconds
worker Thread-3 finished device 94 in 116 seconds
worker Thread-32 finished device 93 in 126 seconds
poller-wrapper polled 130 devices in 179 seconds with 32 workers
root@observium:/opt/observium#
Source:
-------
URL: http://noc.as5580.net/~job/observium-poller-wrapper.py.txt
Installation:
-------------
cd /opt/observium
wget http://noc.as5580.net/~job/observium-poller-wrapper.py.txt -O poller-wrapper.py
Now open poller-wrapper.py with a text editor to change the database username & password.
Open /etc/cron.d/observium and replace this line:
*/5 * * * * root /opt/observium/poller.php -h all >> /dev/null 2>&1
with something like this:
*/5 * * * * root python /opt/observium/poller-wrapper.py 16 >> /dev/null 2>&1
Kind regards,
Job Snijders
Atrato IP Networks
[1] http://www.observium.org/wiki/Performance_tuning#Multiple_poller_instances
If the code is simple enough, I might add this to the repository.
Thank you, cheesehead!
adam.
On 04/11/2012 08:40, Job Snijders wrote:
<SNIP>
Hi Adam,
On Nov 5, 2012, at 1:05 AM, Adam Armstrong adama@memetic.org wrote:
If the code is simple enough, I might add this to the repository.
I found some bugs (thanks falz @ #observium :-) and fixed them. I've uploaded the new version to
http://noc.as5580.net/~job/observium-poller-wrapper.py.txt
The script should now work on the majority of Python versions, on both Linux and FreeBSD. Adam, you should test the script yourself! :-)
As single-file distributions suck, I'll be maintaining a GitHub repo here: https://github.com/Atrato/observium-poller-wrapper
Thank you, cheesehead!
No problem, old man! Let me know what you think.
Kind regards,
Job
2012/11/5 Job Snijders job@instituut.net
<SNIP>
Hi! I would like to report that it works on Solaris 11 too, aside from a deprecation warning. The polling is much smoother; the delay that remains to be narrowed down must come from the hardware being monitored...
Warning:
--
/usr/lib/python2.6/vendor-packages/MySQLdb/__init__.py:34: DeprecationWarning: the sets module is deprecated
  from sets import ImmutableSet
--
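If the warning is only noise, Python's standard `warnings` module can silence it for the one operation that triggers it. The sketch below is an assumption about how one might do that, not something poller-wrapper.py is known to do; `legacy_import` is a hypothetical stand-in that emits the same kind of warning so the example runs without MySQLdb installed.

```python
import warnings


def legacy_import():
    """Hypothetical stand-in for `import MySQLdb` on an old Python 2.6 setup."""
    warnings.warn("the sets module is deprecated", DeprecationWarning)
    return "module loaded"


def quiet_call(func):
    """Run func() with DeprecationWarning suppressed, then restore the filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return func()


if __name__ == "__main__":
    print(quiet_call(legacy_import))
```

In a real script the `import MySQLdb` statement itself would sit inside the `with` block, since the warning is raised while the module is being imported.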
CPU usage dropped after switching to the wrapper:
---
11:06:40  %usr  %sys  %wio  %idle
13:05:00    43    57     0      0
13:06:00    45    55     0      1
13:07:00    44    56     0      0
13:08:00    44    56     0      0
13:09:00    43    57     0      0
13:10:01    43    57     0      0
13:11:01    42    58     0      0
13:12:00    44    56     0      0
13:13:00    45    55     0      0
13:14:00    44    56     0      0
13:15:08    43    57     0      0
13:16:00    43    57     0      0
13:17:00    44    56     0      0
13:18:00    43    57     0      0
13:19:01    43    57     0      0
13:20:03    42    58     0      0
13:21:01    44    56     0      0
13:22:00    44    56     0      0
13:23:00    43    57     0      0
13:24:00    42    58     0      0
14:03:00    46    45     0      9  <-- last execution without the poller
14:04:00    17     3     0     80
14:05:00    31     6     0     63
14:06:00    48    12     0     40
14:07:00    33     8     0     59
14:08:00    29     7     0     64
14:09:01    24     7     0     69
14:10:00    34     9     0     58
14:11:00    36    10     0     54
14:12:00    48    10     0     42
14:13:00    56    11     0     32
14:14:00    62    12     0     27
14:15:00    60    11     0     29
14:16:00    58    11     0     30
14:17:01    60    13     0     28
14:18:00    50    10     0     40
14:19:00    23     2     0     75
14:20:00    20     2     0     78
14:21:00    19     2     0     79
14:22:00    42    10     0     48
14:23:00    39     9     0     52
14:24:00    65    13     0     22
14:25:00    66    13     0     21
14:26:00    65    12     0     23
14:27:00    61    11     0     28
14:28:01    54    11     0     36
14:29:00    54    10     0     36
14:30:00    58    11     0     31
14:31:00    54    10     0     35
14:32:00    55    11     0     34
14:33:00    21     3     0     77
14:34:00    21     3     0     77
14:35:00    19     4     0     77
14:36:00    23     4     0     73
14:37:00    26     2     0     73
14:38:00    25     1     0     74
14:39:00    20     1     0     79
14:40:00    41     8     0     51
14:41:00    51    10     0     39
14:42:00    64    12     0     23
14:43:00    70    12     0     18
14:44:00    61    12     0     27
14:45:00    61    12     0     27
14:46:00    58    11     0     31
14:47:00    55    11     0     34
14:48:00    54    11     0     35
14:49:00    54    11     0     35
14:50:00    45     9     0     46
14:51:00    22     2     0     76
14:52:00    20     1     0     78
---
Increasing from 32 threads to 54 I got 3 more devices polled :P
---
INFO: poller-wrapper polled 696 devices in 806 seconds with 32 workers
WARNING: the process took more than 5 minutes to finish, you need faster hardware or more threads
INFO: in sequential style polling the elapsed time would have been: 25251 seconds
WARNING: device 25 is taking too long: 501 seconds
WARNING: device 427 is taking too long: 437 seconds
WARNING: device 509 is taking too long: 501 seconds
WARNING: device 605 is taking too long: 437 seconds
WARNING: device 624 is taking too long: 390 seconds
WARNING: device 625 is taking too long: 302 seconds
WARNING: device 653 is taking too long: 405 seconds
WARNING: device 697 is taking too long: 393 seconds
ERROR: Some devices are taking more than 300 seconds, the script cannot recommend you what to do.

real 13m27,87s
user 43m1,94s
sys 13m46,06s
---
---
INFO: poller-wrapper polled 697 devices in 650 seconds with 48 workers
WARNING: the process took more than 5 minutes to finish, you need faster hardware or more threads
INFO: in sequential style polling the elapsed time would have been: 30667 seconds
WARNING: device 509 is taking too long: 500 seconds
WARNING: device 605 is taking too long: 319 seconds
WARNING: device 625 is taking too long: 303 seconds
WARNING: device 697 is taking too long: 320 seconds
ERROR: Some devices are taking more than 300 seconds, the script cannot recommend you what to do.

real 10m50,97s
user 44m23,66s
sys 13m31,76s
---
---
INFO: poller-wrapper polled 699 devices in 658 seconds with 54 workers
WARNING: the process took more than 5 minutes to finish, you need faster hardware or more threads
INFO: in sequential style polling the elapsed time would have been: 33466 seconds
WARNING: device 29 is taking too long: 359 seconds
WARNING: device 349 is taking too long: 359 seconds
WARNING: device 350 is taking too long: 359 seconds
WARNING: device 369 is taking too long: 459 seconds
WARNING: device 375 is taking too long: 343 seconds
WARNING: device 456 is taking too long: 343 seconds
WARNING: device 544 is taking too long: 359 seconds
WARNING: device 605 is taking too long: 443 seconds
WARNING: device 623 is taking too long: 359 seconds
WARNING: device 624 is taking too long: 359 seconds
WARNING: device 625 is taking too long: 459 seconds
WARNING: device 646 is taking too long: 359 seconds
WARNING: device 652 is taking too long: 306 seconds
WARNING: device 658 is taking too long: 359 seconds
ERROR: Some devices are taking more than 300 seconds, the script cannot recommend you what to do.

real 10m58,41s
user 44m20,20s
sys 13m23,49s
---
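The three runs above can be compared with a little arithmetic. The sketch below uses only numbers taken from the log output; the function names are made up for this example, and the "floor" is a simple idealised bound (perfectly even distribution, no contention), not something the wrapper itself computes.

```python
def effective_parallelism(sequential_seconds, wall_seconds):
    """How many devices were, on average, being polled at the same time."""
    return sequential_seconds / wall_seconds


def wall_time_floor(sequential_seconds, slowest_device_seconds, workers):
    """Idealised lower bound on wall time: a run can never finish before its
    slowest device does, nor faster than the total work split evenly."""
    return max(slowest_device_seconds, sequential_seconds / workers)


# (workers, wall seconds, sequential seconds, slowest device seconds),
# all copied from the three log excerpts above.
runs = [
    (32, 806, 25251, 501),
    (48, 650, 30667, 500),
    (54, 658, 33466, 459),
]

if __name__ == "__main__":
    for workers, wall, sequential, slowest in runs:
        print("%d workers: parallelism %.2f, floor %.0f s" % (
            workers,
            effective_parallelism(sequential, wall),
            wall_time_floor(sequential, slowest, workers),
        ))
```

The effective parallelism comes out at roughly 31, 47 and 51, so the workers were kept busy in all three runs. The floor stays well above 300 seconds in every case, and a single device that needs around 500 seconds keeps a run above the 5 minute mark no matter how many threads are added.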
Thanks a lot for the code!
Regards,
participants (3): Adam Armstrong, Ciro Iriarte, Job Snijders