Hi Adam,
2013/2/13 Adam Armstrong adama@memetic.org
ZFS is quite possibly the least optimal filesystem for running RRDs from. You want a simple filesystem without much overhead on operations, like EXT with journalling and atime turned off.
noatime is the first item on my to-do list. I was expecting to leave it running for a while before touching anything (just to have a baseline).
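For reference, what I have in mind is roughly this (just a sketch; the pool/dataset name and the ext4 fstab line are placeholders, not my actual layout):

    # ZFS: stop atime updates on the dataset holding the RRD files
    zfs set atime=off rrdpool/observium

    # ext4 equivalent in /etc/fstab on a Linux box
    /dev/sdb1  /opt/observium/rrd  ext4  defaults,noatime  0  2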
Also, it's possible that RRD is poorly optimised on Solaris and uses slow system calls?
Hmm, how to tell?
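Would counting system calls from a running poller be the right approach? Something like this (a sketch, assuming <pid> is the PID of one of the poller.php processes):

    # Solaris: summarise syscall counts and time for a running process
    truss -c -f -p <pid>

    # or the DTrace equivalent
    dtrace -n 'syscall:::entry /pid == $target/ { @[probefunc] = count(); }' -p <pid>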
You say 41 devices in an hour, but your paste suggests 468 devices in 73 minutes.
Well, that line didn't make it into the mail. I was only taking one poller process into account. I thought an execution time of about an hour isn't nice when you intend to collect every 5 minutes... :)
What is your I/O subsystem? Disk? SSD? How many?
Right now the BL460c machine is connected with 2 x 4Gb interfaces to our SAN. The LUN is presented through a Hitachi USP-VM (hitting its cache) but sits on an AMS2500 disk group (about 45 disks if I'm not wrong, shared with other workloads). I also have 2 local SSDs available that I'm considering using as ZIL and/or L2ARC (if/when they're needed).
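If I end up using them, the ZFS side would be something along these lines (just a sketch; rrdpool and the device names are placeholders):

    # add one SSD as a separate intent log (ZIL/slog) and one as L2ARC read cache
    zpool add rrdpool log c0t4d0
    zpool add rrdpool cache c0t5d0

As far as I understand, the L2ARC should help the read-modify-write pattern of RRD updates more than the ZIL would.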
A spinning disk will not behave well with more than 4-8 concurrent processes; an SSD might behave better.
You should also look at the new poller wrapper script, which might help you squeeze a bit more out of it once you've sized it correctly.
I'll search the archives about that...
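If I understand correctly, it replaces the per-process poller.php cron entries with a single wrapper call, something like this (the thread count and install path are just examples, not what the docs mandate):

    */5 * * * * root /opt/observium/poller-wrapper.py 16 >> /dev/null 2>&1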
One must ask why, if you have a dedicated machine, are you not just running it on Ubuntu, like we recommend?
Hmm, well, I'm really a happy SLES user, and as long as I can run an app on it, I'll choose it every time :D. The migration to Solaris was based on the possibility of using ZFS caching to SSD disks in the same zpool; I was also looking forward to trying its new release. I would just like to have references from other large installations: maybe I'm hitting a PHP limitation, maybe I need more parallel processes, maybe there's a freaking DNS problem. I'm not sure right now, but I'm sure I shouldn't need more hardware :D
adam.
Regards, CI.-
On Wed, 13 Feb 2013 18:37:21 -0300, Ciro Iriarte cyruspy@gmail.com wrote:
Hi! I've been running Observium on SLES11 for almost a year. This was on an 8 vCPU + 2GB RAM VMware guest with a 30GB RRD footprint.
After adding about 320 new hosts (all of them network gear) the RRD database grew to 130GB and the little machine got peaks of 3k IOPS. The I/O wait was rather high and the machine would suddenly stop responding (hypervisor stats showed 0% CPU usage).
So, I've migrated the installation to a physical machine running Solaris 11 (I know, it's not supported) and without any tuning (MySQL/ZFS are candidates) I'm seeing polling of 41 devices take about an hour. Is this usual/expected?
--
./poller.php 0/24 February 13, 2013, 18:27 - 27 devices polled in 4048. secs
./poller.php 7/24 February 13, 2013, 18:27 - 27 devices polled in 4075. secs
./poller.php 0/24 February 13, 2013, 18:28 - 27 devices polled in 3803. secs
./poller.php 22/24 February 13, 2013, 18:28 - 27 devices polled in 4404. secs
./poller.php 3/24 February 13, 2013, 18:28 - 28 devices polled in 4417. secs
--
IOPS climbed to 5k. Comments?
Regards,