Hi,
While debugging why my memcached graphs weren't working, I discovered the following problem:
SQL[SELECT * FROM `applications` WHERE `device_id` = '41' AND `app_type` = 'memcached']
Including: applications/memcached.inc.php
memcached memcached(127.0.0.1:11211)
SQL[SELECT app_id FROM `applications` WHERE `device_id` = '41' AND `app_instance` = '127.0.0.1:11211']
RRD[cmd[update /opt/observium/rrd/v25.invalpool.nl/app-memcached-127.0.0.1_11211.rrd N:64:2.0.16-stable:0:0:5:1045:6202:8:0:0:14045:129043:6202:0:0:0:0:7184655:13093279]
stdout[ERROR: /opt/observium/rrd/v25.invalpool.nl/app-memcached-127.0.0.1_11211.rrd: conversion of '2.0.16-stable' to float not complete: tail '.16-stable']
stderr[]]
Clearly the data passed to the rrdtool update command is wrong: a version string shouldn't be in there. So I compared what the memcached agent script returns with how the poller's memcached module parses it. It turns out that the two disagree. The agent script returns a list of values in this order:
accepting_conns, auth_cmds, auth_errors, bytes, bytes_read, bytes_written, cas_badval, cas_hits, cas_misses, cmd_flush, cmd_get, cmd_set, cmd_touch, conn_yields, connection_structures, curr_connections, curr_items, decr_hits, decr_misses, delete_hits, delete_misses, evicted_unfetched, evictions, expired_unfetched, get_hits, get_misses, hash_bytes, hash_is_expanding, hash_power_level, incr_hits, incr_misses, libevent, limit_maxbytes, listen_disabled_num, pid, pointer_size, reclaimed, reserved_fds, rusage_system, rusage_user, threads, time, total_connections, total_items, touch_hits, touch_misses, uptime, version
The poller module parses them in this order:
accepting_conns, auth_cmds, auth_errors, bytes, bytes_read, bytes_written, cas_badval, cas_hits, cas_misses, cmd_flush, cmd_get, cmd_set, conn_yields, connection_structures, curr_connections, curr_items, decr_hits, decr_misses, delete_hits, delete_misses, evictions, get_hits, get_misses, incr_hits, incr_misses, limit_maxbytes, listen_disabled_num, pid, pointer_size, rusage_system, rusage_user, threads, time, total_connections, total_items, uptime, version
As you can see, this breaks at cmd_touch: the agent sends a variable the poller doesn't expect, and because the data is sent value-only, the poller has no way to tell that the input is in different positions than it expects.
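To make the shift concrete, here is a small Python sketch that compares the two orderings listed above. It is only an illustration, not code from the agent or the poller: the names AGENT_ORDER, POLLER_ORDER, first_mismatch and positional_view are made up for this example, and it assumes the poller simply assigns the Nth value to its Nth key.

```python
# Field order as sent by the agent script (copied from the list above).
AGENT_ORDER = [k.strip() for k in """
accepting_conns, auth_cmds, auth_errors, bytes, bytes_read, bytes_written,
cas_badval, cas_hits, cas_misses, cmd_flush, cmd_get, cmd_set, cmd_touch,
conn_yields, connection_structures, curr_connections, curr_items, decr_hits,
decr_misses, delete_hits, delete_misses, evicted_unfetched, evictions,
expired_unfetched, get_hits, get_misses, hash_bytes, hash_is_expanding,
hash_power_level, incr_hits, incr_misses, libevent, limit_maxbytes,
listen_disabled_num, pid, pointer_size, reclaimed, reserved_fds,
rusage_system, rusage_user, threads, time, total_connections, total_items,
touch_hits, touch_misses, uptime, version
""".split(",")]

# Field order as expected by the poller module (copied from the list above).
POLLER_ORDER = [k.strip() for k in """
accepting_conns, auth_cmds, auth_errors, bytes, bytes_read, bytes_written,
cas_badval, cas_hits, cas_misses, cmd_flush, cmd_get, cmd_set, conn_yields,
connection_structures, curr_connections, curr_items, decr_hits, decr_misses,
delete_hits, delete_misses, evictions, get_hits, get_misses, incr_hits,
incr_misses, limit_maxbytes, listen_disabled_num, pid, pointer_size,
rusage_system, rusage_user, threads, time, total_connections, total_items,
uptime, version
""".split(",")]


def first_mismatch(agent, poller):
    """Return (index, agent_key, poller_key) for the first position that differs."""
    for i, (a, p) in enumerate(zip(agent, poller)):
        if a != p:
            return i, a, p
    return None


def positional_view(agent, poller):
    """Map each poller key to the agent field it really receives by position."""
    return dict(zip(poller, agent))


if __name__ == "__main__":
    print(first_mismatch(AGENT_ORDER, POLLER_ORDER))
    print(positional_view(AGENT_ORDER, POLLER_ORDER)["threads"])
```

Under that assumption, everything from position 12 onwards is shifted, and the poller's threads slot lines up with the agent's libevent field, which would fit a libevent-style version string ending up in the rrdtool update.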
What I suggest is the following:
- update the agent so that key:value is sent instead of just value
- update the poller module so that it can parse key:value input
- make the poller module also accept the current value-only input as a fall-back
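As a rough sketch of the parsing side of this proposal, the logic could look something like the Python below. This is not the actual patch. It assumes one item per line, that keys never contain a colon, and that legacy value-only output never contains one either. The function name parse_agent_output and the legacy_order parameter are hypothetical.

```python
def parse_agent_output(text, legacy_order):
    """Parse agent output into a dict of stat name -> raw string value.

    Assumed formats (both hypothetical, one item per line):
      new:    "key:value"
      legacy: "value" only, in the fixed order given by legacy_order
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # New format: every line carries its own key, so order no longer matters.
    if lines and all(":" in ln for ln in lines):
        stats = {}
        for ln in lines:
            key, _, value = ln.partition(":")  # split on the first colon only
            stats[key.strip()] = value.strip()
        return stats

    # Fall-back: value-only input, matched to keys purely by position.
    return dict(zip(legacy_order, lines))


if __name__ == "__main__":
    new_style = "threads:4\nlibevent:2.0.16-stable\ncmd_touch:0\n"
    stats = parse_agent_output(new_style, legacy_order=[])
    # The poller can now pick only the keys it knows and ignore the rest.
    print(stats.get("threads"))
```

With keyed input the poller can look up just the stats it graphs, so extra fields from newer memcached versions (such as cmd_touch) are harmless instead of shifting everything after them.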
If we agree that this is the way forward I'll send in a patch.
Cheers, Sander
Sander Steffann