For some reason, this didn't make it into the mailing list archive, so I'm trying again. Any help is appreciated!



On Mar 25, 2014, at 8:48 PM, Ricardo M Meleschi wrote:

Hello everyone,

I've identified an issue with how values collected by the unix-agent are stored, specifically those from the bind sub-agent. Once req-in,QUERY exceeds 2,147,483,647 (2^31 - 1), the value that actually gets stored in app-bind-122-req-in.rrd stays pinned at that number:

# rrdtool info app-bind-122-req-in.rrd | grep query
ds[query].index = 0
ds[query].type = "DERIVE"
ds[query].minimal_heartbeat = 600
ds[query].min = 0.0000000000e+00
ds[query].max = 7.5000000000e+06
>>  ds[query].last_ds = "2147483647"  <<
ds[query].value = 0.0000000000e+00
ds[query].unknown_sec = 0

This is despite the current value actually being:

# ./bind  | grep -i req-in,Q
req-in,QUERY:2368417221

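I initially suspected a 32-bit vs. 64-bit counter problem in the RRD itself. The stored last_ds of 2147483647 is exactly the maximum of a signed 32-bit integer, which suggests the value is being clamped somewhere. As far as I understand it, though, ds[query].max (7.5e+06) only bounds the computed per-second rate of a DERIVE data source, not the raw counter, so it doesn't tell me where the clamping happens. I'm at a loss as to where the problem may be and am hoping that someone on the mailing list can point me in the right direction. I can't believe I'm the first one to run into this issue.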
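
To make the suspicion concrete, here is a minimal Python sketch of what I think may be happening. It assumes that some step between the agent output and the RRD update converts the counter to a signed 32-bit integer and saturates at the maximum. The helper name saturate_int32 is hypothetical and is not taken from any actual collector code.

```python
# Hypothetical illustration only: models a signed 32-bit saturating
# conversion somewhere between the bind agent output and the RRD update.
INT32_MAX = 2**31 - 1   # 2147483647, the value seen in last_ds
INT32_MIN = -(2**31)


def saturate_int32(value: int) -> int:
    """Clamp an integer into the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


reported = 2368417221          # value printed by ./bind for req-in,QUERY
stored = saturate_int32(reported)

print("reported by agent:", reported)
print("after clamping:   ", stored)
print("lost by clamping: ", reported - stored)
```

If this assumption is right, every reading above 2147483647 would be stored as the same number, which matches the last_ds shown above.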

Here's what happens to my graph when this issue arises:

<Screenshot 2014-03-25 20.36.20.png>

This happens to the other DNS graphs as well, but it happens to this one the quickest, since it hits that 'limit' sooner than the other counters do.

Additionally, I'm running 0.14.3.5117, if that helps. If I restart named, the graph begins graphing properly again, but I don't want to consider that a 'fix', especially since I'd be restarting all of my bind servers every week. Doable, but not ideal.
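
For reference, a rough sketch of why a pinned counter would flatten a DERIVE graph, and why restarting named would appear to help. A DERIVE data source stores the change between consecutive readings divided by the elapsed time, so once the stored reading stops moving the rate drops to zero. The 300-second step and the sample counter values below are assumptions chosen for illustration, not measurements from my servers.

```python
# Hypothetical illustration: per-second rates as a DERIVE data source
# would compute them, with and without a signed 32-bit clamp.
INT32_MAX = 2**31 - 1


def derive_rates(samples, step):
    """Per-second rate between consecutive counter readings."""
    return [(cur - prev) / step for prev, cur in zip(samples, samples[1:])]


step = 300  # seconds between polls (assumed)

# A counter growing by 600,000 per poll that crosses 2^31 - 1 mid-series.
true_counter = [INT32_MAX - 900_000 + i * 600_000 for i in range(5)]
clamped = [min(v, INT32_MAX) for v in true_counter]

print("rates from true counter:   ", derive_rates(true_counter, step))
print("rates from clamped counter:", derive_rates(clamped, step))
```

With the clamp in place the computed rate falls to zero as soon as the counter passes the limit. Restarting named resets the counter to zero, so the readings stay below the limit again until the counter climbs back up.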

Thanks for any assistance,
Ricardo