Hi,
Having a good time today…! I’ve hit another oddity since upgrading the Observium server itself to 16.04, unfortunately :(
In summary, all servers running BIND (they are still on 14.04) have now got zero ‘cache content’ stats since the Observium server was upgraded on Tuesday.
However, all the other BIND stats are working OK:
[Attached image: image001.jpg]
Looking at the debug output, it is getting the data fine, and it’s writing out the values to RRDCacheD OK.
I tried disabling RRDCacheD – it started working straight away. So, it looks like RRDCacheD (version 1.5.5 under 16.04) is unhappy with writing the ‘Cache’ elements content output for BIND.
If I had to guess, I’d say it’s either:
1) something to do with the quantity of DS elements within that RRD (all 42 of them)
2) something relating to the use of an “!” in the name of the elements (that’s a proper random guess)
The write attempt is:
mupdate /opt/observium/rrd/x.y.net/app-bind-23-cache-default.rrd N:6:3:::392500:799:13060:27296:825744:29:17815:555:67263:160:8::9660:2238:49:382:23:14:::::::259181:22:2::45094::26347::::::6715:::::::
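A quick way to see which slots in an update string like the one above are blank is to split it on the colons. This is only a minimal sketch: the function name `empty_value_slots` is made up for illustration and is not part of Observium or RRDtool, and it assumes the first field is the timestamp ("N") with one value per DS after it.

```python
from typing import List


def empty_value_slots(update: str) -> List[int]:
    """Return the 1-based positions of empty value fields in an update string.

    Hypothetical helper for illustration only.
    """
    # The first colon-separated field is the timestamp ("N" means "now");
    # every field after it is assumed to be the value for one DS.
    _timestamp, *values = update.split(":")
    return [pos for pos, value in enumerate(values, start=1) if value == ""]


if __name__ == "__main__":
    # Shortened sample taken from the start of the update string above.
    print(empty_value_slots("N:6:3:::392500:799"))  # prints [3, 4]
```

Running it against the full string from the debug output would list every DS position that is being sent as an empty field.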
My old RRDCacheD config was:
DISABLE=0
OPTS="-w 1800 -z 1800 -f 3600 -s www-data -l unix:/var/run/rrdcached.sock -j /var/lib/rrdcached/journal/ -F -b /opt/observium/rrd -B"
MAXWAIT=30
ENABLE_COREFILES=0
The new config has different parameters, so I’ve set it up as follows:
DAEMON=/usr/bin/rrdcached
WRITE_TIMEOUT=1800
WRITE_JITTER=1800
BASE_PATH=/opt/observium/rrd/
JOURNAL_PATH=/var/lib/rrdcached/journal/
PIDFILE=/var/run/rrdcached.pid
SOCKFILE=/var/run/rrdcached.sock
SOCKGROUP=www-data
BASE_OPTIONS="-F -B"
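To check that nothing was lost when moving from the old OPTS line to the new variables, the two configs can be compared mechanically. The sketch below is an illustration only: the flag-to-variable mapping in `FLAG_TO_VAR` is an assumption inferred from the matching values in the two configs above, not something taken from the package documentation, and the helper names are hypothetical.

```python
import shlex

OLD_OPTS = ("-w 1800 -z 1800 -f 3600 -s www-data -l unix:/var/run/rrdcached.sock "
            "-j /var/lib/rrdcached/journal/ -F -b /opt/observium/rrd -B")

NEW_CONFIG = {
    "WRITE_TIMEOUT": "1800",
    "WRITE_JITTER": "1800",
    "BASE_PATH": "/opt/observium/rrd/",
    "JOURNAL_PATH": "/var/lib/rrdcached/journal/",
    "SOCKFILE": "/var/run/rrdcached.sock",
    "SOCKGROUP": "www-data",
    "BASE_OPTIONS": "-F -B",
}

# ASSUMPTION: which new variable replaces which old flag, guessed from the values.
FLAG_TO_VAR = {
    "-w": "WRITE_TIMEOUT",
    "-z": "WRITE_JITTER",
    "-s": "SOCKGROUP",
    "-l": "SOCKFILE",
    "-j": "JOURNAL_PATH",
    "-b": "BASE_PATH",
}


def parse_opts(opts):
    """Split an OPTS string into {flag: value}, with None for bare switches."""
    tokens = shlex.split(opts)
    parsed = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            parsed[flag] = tokens[i + 1]
            i += 2
        else:
            parsed[flag] = None
            i += 1
    return parsed


def unmapped_flags(old, new):
    """List old flags that have no counterpart in the new config."""
    base_switches = shlex.split(new.get("BASE_OPTIONS", ""))
    missing = []
    for flag, value in old.items():
        if value is None:
            # Bare switch: expect it to be carried over in BASE_OPTIONS.
            if flag not in base_switches:
                missing.append(flag)
        elif FLAG_TO_VAR.get(flag) not in new:
            missing.append(flag)
    return missing


if __name__ == "__main__":
    print(unmapped_flags(parse_opts(OLD_OPTS), NEW_CONFIG))  # prints ['-f']
```

With the values above, the only old flag without a counterpart is `-f 3600`. That shows a difference between the two setups; it does not by itself say anything about the cause of the failed writes.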
Does anyone know of anything which could cause this? Or a magic value in RRDCacheD like “maximum number of DS elements”…?
Cheers,
Robert Williams Custodian Data Centre Email: Robert@CustodianDC.com http://www.CustodianDC.com
Hi,
Just noticed this stayed broken, I had forgotten about it!
So, I’ve done some further digging and found a related error message in the journalctl output:
Aug 30 16:29:22 OBSERVIUS rrdcached[3392]: queue_thread_main: rrd_update_r (/opt/observium/rrd/xxx.yyy.zzz/app-bind-31-cache-default.rrd) failed with status -1. (/opt/observium/rrd/xxx.yyy.zzz/app-bind-31-cache-default.rrd) failed with status -1. (/opt/observium/rrd/xxx.yyy.zzz/app-bind-31-cache-default.rrd: Function update_pdp_prep, case DST_GAUGE - Cannot convert '' to float)
It appears that something in the data passed to the RRD file IS acceptable when you write it with RRDTOOL directly, but the same thing is NOT acceptable when you write it via RRDCACHED. It looks like a blank value that can’t be converted to a float. Maybe it wants a ‘0’ rather than a null or something?
Is it feasible that this can be resolved with a tweak on the Observium side? There is little I can do to modify the way RRDCACHED decides if it is valid or not :(
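For illustration, the kind of tweak meant here could be as small as filling the blank fields before the update is sent. RRDtool's update syntax uses "U" for an unknown value, so the sketch below substitutes that rather than 0. It is only a sketch: `fill_unknowns` is a hypothetical name, not an Observium function, and whether RRDCacheD 1.5.5 then accepts the update is untested here.

```python
def fill_unknowns(update: str, placeholder: str = "U") -> str:
    """Replace empty value fields in an update string with a placeholder.

    Hypothetical helper for illustration; "U" is RRDtool's marker for an
    unknown value.
    """
    # Keep the timestamp field ("N") untouched and only rewrite the values.
    timestamp, *values = update.split(":")
    filled = [value if value != "" else placeholder for value in values]
    return ":".join([timestamp] + filled)


if __name__ == "__main__":
    print(fill_unknowns("N:6:3:::392500"))  # prints N:6:3:U:U:392500
```

Using "U" instead of "0" keeps the graphs honest: a missing counter is stored as unknown rather than as a real zero reading.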
Would be nice to have all those stats back again - Cheers!
Robert Williams Custodian Data Centre Email: Robert@CustodianDC.com http://www.CustodianDC.com
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Robert Williams
Sent: 08 July 2016 18:12
To: Observium Network Observation System (observium@observium.org)
Subject: [Observium] bind_cache failing to write via RRDCached
Robert,
Can you send a full debug log?
Thanks, Tom
On 08/31/2016 10:09 AM, Robert Williams wrote:
observium mailing list observium@observium.org http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
Hi Tom,
Sure thing, here you go! Let me know if any additional info is useful.
Cheers,
Robert Williams Custodian Data Centre Email: Robert@CustodianDC.com http://www.CustodianDC.com
From: observium [mailto:observium-bounces@observium.org] On Behalf Of Tom Laermans
Sent: 31 August 2016 10:26
To: Observium Network Observation System observium@observium.org
Subject: Re: [Observium] bind_cache failing to write via RRDCached
participants (2)
- Robert Williams
- Tom Laermans