I have a rather old and crufty installation of Observium (now at 5668) with over 70 million syslog, eventlog, and timing records that are older than 7 days. 

 

The first time I ran housekeeping –s to trim syslog messages it took about 30 minutes to return showing 42 million records.  Just to be sure I was actually done deleting 42 million records, I re-ran the command.  The same 42 million records were found.  Meh.

 

I ran the housekeeping in debug mode and noticed that I didn’t have posix support installed.  So I installed the php-process package and tried again.  Still no records deleted.  I tried deleting fewer records by keeping 18 months worth and that didn’t help either.

 

Here’s the debug output from attempting to delete records older than 7 days:

[root@xxx ~]# php /opt/observium/housekeeping.php -s -d

DEBUG!

 

SQL[SELECT COUNT(*) FROM `syslog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1405322208;]

42123350 syslog entries from before 2014-07-14 15:16:48 will be deleted [y/N]: y

 

SQL[DELETE FROM `syslog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1405322208;]

Syslog housekeeping: deleted  entries

 

Deleting the eventlog records also fails to remove the records:

[root@xxxx observium]# php housekeeping.php -e -d

DEBUG!

 

SQL[SELECT COUNT(*) FROM `eventlog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1380059875;]

29289378 eventlog entries from before 2013-09-25 05:57:55 will be deleted [y/N]: y

 

SQL[DELETE FROM `eventlog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1380059875]

Event log housekeeping: deleted  entries

 

(As an aside, I’m not sure, but there may be a semi-colon missing on the delete statement for event log records.)

 

What I eventually found was that trying to delete this many records exhausts the lock table (?).

 

ERROR 1206 (HY000): The total number of locks exceeds the lock table size

 

Issuing the same command from the mysql CLI and adding a 10,000,000 limit allowed the deletion to work correctly

 

mysql> SELECT COUNT(*) FROM `eventlog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1380059875;

+----------+

| COUNT(*) |

+----------+

| 28177378 |

+----------+

1 row in set (51.76 sec)

 

mysql> DELETE FROM `eventlog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1380059875 LIMIT 10000000;

Query OK, 10000000 rows affected (3 min 40.32 sec)

 

mysql> SELECT COUNT(*) FROM `eventlog` WHERE UNIX_TIMESTAMP(`timestamp`) < 1380059875;

+----------+

| COUNT(*) |

+----------+

| 18177378 |

+----------+

1 row in set (37.01 sec)

 

Interesting to note that deleting 10 million records at a time with the LIMIT statement doesn’t always work.  Adjusting the LIMIT up and down eventually got me where I was going.

 

Now that I have it the system down to a reasonable number of records, a nightly/weekly cron job should do just fine.

 

Thanks for the effort you put into this Tom.  For me, this was like a hardware upgrade.