I have a rather old and crufty installation of Observium (now at 5668) with over 70 million syslog, eventlog, and timing records that are older than 7 days.
The first time I ran housekeeping -s to trim syslog messages it took about 30 minutes to return showing 42 million records. Just to be sure I was actually done deleting 42 million records, I re-ran the command. The same 42 million records were found. Meh.
I ran the housekeeping in debug mode and noticed that I didn't have posix support installed. So I installed the php-process package and tried again. Still no records deleted. I tried deleting fewer records by keeping 18 months worth and that didn't help either.
Here's the debug output from attempting to delete records older than 7 days: [root@xxx ~]# php /opt/observium/housekeeping.php -s -d DEBUG!
SQL[SELECT COUNT(*) FROM `syslog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1405322208;] 42123350 syslog entries from before 2014-07-14 15:16:48 will be deleted [y/N]: y
SQL[DELETE FROM `syslog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1405322208;] Syslog housekeeping: deleted entries
Deleting the eventlog records also fails to remove the records: [root@xxxx observium]# php housekeeping.php -e -d DEBUG!
SQL[SELECT COUNT(*) FROM `eventlog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1380059875;] 29289378 eventlog entries from before 2013-09-25 05:57:55 will be deleted [y/N]: y
SQL[DELETE FROM `eventlog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1380059875] Event log housekeeping: deleted entries
(As an aside, I'm not sure, but there may be a semi-colon missing on the delete statement for event log records.)
What I eventually found was that trying to delete this many records exhausts the lock table (?).
ERROR 1206 (HY000): The total number of locks exceeds the lock table size
Issuing the same command from the mysql CLI and adding a 10,000,000 limit allowed the deletion to work correctly
mysql> SELECT COUNT(*) FROM `eventlog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1380059875; +----------+ | COUNT(*) | +----------+ | 28177378 | +----------+ 1 row in set (51.76 sec)
mysql> DELETE FROM `eventlog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1380059875 LIMIT 10000000; Query OK, 10000000 rows affected (3 min 40.32 sec)
mysql> SELECT COUNT(*) FROM `eventlog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1380059875; +----------+ | COUNT(*) | +----------+ | 18177378 | +----------+ 1 row in set (37.01 sec)
Interesting to note that deleting 10 million records at a time with the LIMIT statement doesn't always work. Adjusting the LIMIT up and down eventually got me where I was going.
Now that I have it the system down to a reasonable number of records, a nightly/weekly cron job should do just fine.
Thanks for the effort you put into this Tom. For me, this was like a hardware upgrade.