I'm not quite sure; I find RHEL to be unusable voodoo. :)

There's definitely an rrdcached process running. You should probably kill it and use the service scripts to restart it (and hope it actually restarts!).
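
A minimal sketch of that sequence, assuming a POSIX shell and the pidfile path shown in the quoted message below (/var/run/rrdcached/rrdcached.pid). stop_by_pidfile is a hypothetical helper, not something shipped with rrdtool or Observium; it sends SIGTERM to the PID named in the pidfile and waits for the process to exit before the service script is used again.

```shell
#!/bin/sh
# stop_by_pidfile is a hypothetical helper name (an assumption, not a real tool).
# Usage: stop_by_pidfile <pidfile> [timeout-seconds]
stop_by_pidfile() {
    pidfile=$1
    timeout=${2:-10}          # seconds to wait before giving up

    [ -r "$pidfile" ] || { echo "no pidfile at $pidfile" >&2; return 1; }
    pid=$(cat "$pidfile")

    # kill -0 sends no signal; it only tests whether the process exists.
    kill -0 "$pid" 2>/dev/null || { echo "pid $pid not running"; return 0; }

    kill -TERM "$pid"
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || { echo "pid $pid stopped"; return 0; }
        sleep 1
        timeout=$((timeout - 1))
    done

    echo "pid $pid still running after SIGTERM" >&2
    return 1
}

# On the host in this thread (as root), something like:
#   stop_by_pidfile /var/run/rrdcached/rrdcached.pid && service rrdcached start
```

If the function returns non-zero, the old process is still there and starting the service again is unlikely to help.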

adam.

On 2018-11-01 04:08:59, Gordon Cheng (gocheng) via observium <observium@observium.org> wrote:

Thanks Adam.

 

It's version 1.6.0-0.3:

 

sjc-observium-1:/root# yum install rrdtool

<...snipped...>

Package rrdtool-1.6.0-0.3.opennms.el7.centos.x86_64 already installed and latest version

Nothing to do

sjc-observium-1:/root#

 

I thought "service rrdcached stop" would stop the process, but it looks like that's NOT the case?

 

sjc-observium-1:/var/run/rrdcached# cat /var/run/rrdcached/rrdcached.pid

2453

sjc-observium-1:/var/run/rrdcached# ps aux | grep rrdcache

apache    2453  0.1  0.1 574224 73488 ?        Ssl  16:34   0:20 /usr/sbin/rrdcached -w 1800 -z 1800 -f 3600 -s apache -l unix:/var/run/rrdcached/rrdcached.sock -j /var/tmp -m 664 -F -b /opt/observium/rrd -B -p /var/run/rrdcached/rrdcached.pid -l /var/run/rrdcached/rrdcached.sock

root     14550  0.0  0.0 112652   956 pts/0    S+   20:55   0:00 grep --color=auto rrdcache

root     19867  0.0  0.0 119184  1932 pts/1    S+   16:51   0:00 man rrdcached

sjc-observium-1:/var/run/rrdcached# service rrdcached stop

Stopping rrdcached (via systemctl):                        [  OK  ]

sjc-observium-1:/var/run/rrdcached# ps aux | grep rrdcache

apache    2453  0.1  0.1 574224 73488 ?        Ssl  16:34   0:20 /usr/sbin/rrdcached -w 1800 -z 1800 -f 3600 -s apache -l unix:/var/run/rrdcached/rrdcached.sock -j /var/tmp -m 664 -F -b /opt/observium/rrd -B -p /var/run/rrdcached/rrdcached.pid -l /var/run/rrdcached/rrdcached.sock

root     14790  0.0  0.0 112652   960 pts/0    S+   20:55   0:00 grep --color=auto rrdcache

root     19867  0.0  0.0 119184  1932 pts/1    S+   16:51   0:00 man rrdcached

sjc-observium-1:/var/run/rrdcached# service rrdcached start

Starting rrdcached (via systemctl):  Job for rrdcached.service failed because a timeout was exceeded. See "systemctl status rrdcached.service" and "journalctl -xe" for details.

                                                           [FAILED]    

sjc-observium-1:/var/run/rrdcached#

 

Does this mean we already have a copy of rrdcached running properly, so another service start would fail?

 

And can we just un-comment the rrdcached entry in config.php, so that rrdcached then starts working?
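
One way to check this before touching config.php, sketched under the assumption of a Linux host with /proc mounted: confirm that the PID in the pidfile belongs to a live process named rrdcached and that the socket path exists as a UNIX socket. rrdcached_alive is a hypothetical helper name; the paths in the usage comment are the ones from the session above.

```shell
#!/bin/sh
# rrdcached_alive is a hypothetical helper name (an assumption, not a real tool).
# Usage: rrdcached_alive <pidfile> <socket> [process-name]
rrdcached_alive() {
    pidfile=$1
    sockfile=$2
    name=${3:-rrdcached}

    [ -r "$pidfile" ] || { echo "pidfile missing"; return 1; }
    pid=$(cat "$pidfile")

    # /proc/<pid>/comm holds the executable name on Linux.
    [ -r "/proc/$pid/comm" ] || { echo "pid $pid not running"; return 1; }
    [ "$(cat "/proc/$pid/comm")" = "$name" ] || { echo "pid $pid is not $name"; return 1; }

    # -S is true only for socket files.
    [ -S "$sockfile" ] || { echo "no socket at $sockfile"; return 1; }

    echo "$name (pid $pid) is up and $sockfile exists"
}

# On the host in this thread, something like:
#   rrdcached_alive /var/run/rrdcached/rrdcached.pid /var/run/rrdcached/rrdcached.sock
```

A zero return only shows that the daemon and its socket are present; it does not prove that the poller can write to the socket or that updates are being accepted.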

 

Thanks.

 

- Gordon

 

 

From: observium <observium-bounces@observium.org> on behalf of Adam Armstrong via observium <observium@observium.org>
Reply-To: Observium <observium@observium.org>
Date: Wednesday, October 31, 2018 at 5:41 PM
To: David Rossi via observium <observium@observium.org>
Cc: Adam Armstrong <adama@memetic.org>
Subject: Re: [Observium] 'rrdcached' Config Issue

 

Hi,

 

I'm not sure how rrdcached could possibly have a different timestamp than the rest of the system. oO

 

I wonder if this is because you started rrdcached during a poll or something. Or perhaps you have two copies of rrdcached running.

 

I just did a clean install via the install script on Ubuntu 18.04, enabled rrdcached, and it worked OK.

 

/etc/default/rrdcached:

 

####

# /etc/default file for RRD cache daemon

 

# Full path to daemon

DAEMON=/usr/bin/rrdcached

 

# Optional override flush interval, in seconds.

WRITE_TIMEOUT=300

 

# Optional override maximum write delay, in seconds.

WRITE_JITTER=0

 

# Optional override number of write_threads

WRITE_THREADS=4

 

# Where database files are placed.  If left unset, the default /tmp will

# be used.  NB: The daemon will reject a directory that has symlinks as

# components.  NB: You may want to have -B in BASE_OPTS.

BASE_PATH=/opt/observium/rrd/

 

# Where journal files are placed.  If left unset, journaling will

# be disabled.

JOURNAL_PATH=/var/lib/rrdcached/journal/

 

# FHS standard placement for process ID file.

PIDFILE=/var/run/rrdcached.pid

 

# FHS standard placement for local control socket.

SOCKFILE=/var/run/rrdcached.sock

 

# Optional override group that should own/access the local control

# socket

SOCKGROUP=www-data

 

# Optional override access mode of local control socket.

SOCKMODE=0660

 

# Optional unprivileged group to run under when daemon.  If unset

# retains invocation group privileges.

DAEMON_GROUP=observium

 

# Optional unprivileged user to run under when daemon.  If unset

# retains invocation user privileges.

DAEMON_USER=observium

 

# Network socket address requests.  Use in conjunction with SOCKFILE to

# also listen on INET domain sockets.  The option is a lower-case ell

# ASCII 108 = 0x6c, and should be repeated for each address.  The

# parameter is an optional IP address, followed by an optional port with

# a colon separating it from the address.  The empty string is

# interpreted as "open sockets on the default port on all available

# interfaces", but generally does not pass through init script functions

# so use -L with no parameters for that configuration.

#NETWORK_OPTIONS="-L"

 

# Any other options not specifically supported by the script (-P, -f,

# -F, -B).

BASE_OPTIONS="-B"

####

 

This is the command as it's running:

 

/usr/bin/rrdcached -B -w 300 -z 0 -t 4 -b /opt/observium/rrd/ -j /var/lib/rrdcached/journal/ -G observium -U observium -p /var/run/rrdcached.pid -s www-data -m 0660 -l unix:/var/run/rrdcached.sock
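
For comparison with other setups, the flags in that command line up one-to-one with the variables in the defaults file above. The sketch below rebuilds the command from those values. build_rrdcached_cmd is a hypothetical helper written for illustration, and the actual Ubuntu init script may assemble its arguments differently.

```shell
#!/bin/sh
# Values copied from the defaults file quoted above.
DAEMON=/usr/bin/rrdcached
WRITE_TIMEOUT=300
WRITE_JITTER=0
WRITE_THREADS=4
BASE_PATH=/opt/observium/rrd/
JOURNAL_PATH=/var/lib/rrdcached/journal/
PIDFILE=/var/run/rrdcached.pid
SOCKFILE=/var/run/rrdcached.sock
SOCKGROUP=www-data
SOCKMODE=0660
DAEMON_GROUP=observium
DAEMON_USER=observium
BASE_OPTIONS="-B"

# build_rrdcached_cmd is a hypothetical helper (an assumption, not the init script).
# It prints the command line; it does not start the daemon.
build_rrdcached_cmd() {
    echo "$DAEMON $BASE_OPTIONS -w $WRITE_TIMEOUT -z $WRITE_JITTER -t $WRITE_THREADS" \
         "-b $BASE_PATH -j $JOURNAL_PATH -G $DAEMON_GROUP -U $DAEMON_USER" \
         "-p $PIDFILE -s $SOCKGROUP -m $SOCKMODE -l unix:$SOCKFILE"
}

build_rrdcached_cmd
```

Note that here -s and -m come before the single -l, whereas the RHEL command quoted earlier in the thread passes -l twice with -m in between, which may be worth comparing.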

 

What version of rrdtool is this?

 

adam.

On 2018-11-01 00:02:04, Gordon Cheng (gocheng) via observium <observium@observium.org> wrote:

Hi all,

 

In an attempt to improve performance (see the previous email thread with the subject "Performance Issue - High CPU with 'mysqld'?"), I'm trying to use 'rrdcached' on a new setup (version 18.9.9428, 9th September 2018) with the following configuration:

 

===

 

sjc-observium-1:/etc/init.d# cat rrdcached

<...snipped...>

# Path to the apachectl script, server binary, and short-form for messages.

rrdcached=/usr/sbin/rrdcached

prog=rrdcached

pidfile=/var/run/rrdcached/rrdcached.pid

sockfile=/var/run/rrdcached/rrdcached.sock

lockfile=/var/lock/subsys/rrdcached

RETVAL=0

<...snipped...>

 

-

 

sjc-observium-1:/etc/sysconfig# cat rrdcached

<...snipped...>

RRDCACHED_USER="apache"

OPTIONS="-w 1800 -z 1800 -f 3600 -s apache -l unix:/var/run/rrdcached/rrdcached.sock -j /var/tmp -m 664 -F -b /opt/observium/rrd -B"

SOCKPERMS=0660

<...snipped...>

 

-

 

sjc-observium-1:/usr/lib/tmpfiles.d# cat rrdcached.conf

d       /run/rrdcached          -       apache  apache

sjc-observium-1:/usr/lib/tmpfiles.d#

 

-

 

sjc-observium-1:/opt/observium# grep rrd config.php

//rrdcached

#$config['rrdcached']    = "unix:/var/run/rrdcached/rrdcached.sock"; <<<====== will un-comment once rrdcached is configured and running

sjc-observium-1:/opt/observium#

 

-

 

sjc-observium-1:/root# service rrdcached stop

Stopping rrdcached (via systemctl):                        [  OK  ]

sjc-observium-1:/root# service rrdcached start

Starting rrdcached (via systemctl):

 

Job for rrdcached.service failed because a timeout was exceeded. See "systemctl status rrdcached.service" and "journalctl -xe" for details.

                                                           [FAILED]

sjc-observium-1:/root#

 

-

 

sjc-observium-1:/opt/observium# systemctl status rrdcached.service -l

● rrdcached.service - SYSV: rrdcached is a daemon that receives updates to existing rrd files, accumulates them, and writes updates to file

   Loaded: loaded (/etc/rc.d/init.d/rrdcached; bad; vendor preset: disabled)

   Active: failed (Result: timeout) since Wed 2018-10-31 16:39:48 PDT; 11min ago

     Docs: man:systemd-sysv-generator(8)

  Process: 2449 ExecStart=/etc/rc.d/init.d/rrdcached start (code=exited, status=0/SUCCESS)

   CGroup: /system.slice/rrdcached.service

           └─2453 /usr/sbin/rrdcached -w 1800 -z 1800 -f 3600 -s apache -l unix:/var/run/rrdcached/rrdcached.sock -j /var/tmp -m 664 -F -b /opt/observium/rrd -B -p /var/run/rrdcached/rrdcached.pid -l /var/run/rrdcached/rrdcached.sock

 

Oct 31 16:34:51 sjc-observium-1 rrdcached[2453]: queue_thread_main: rrd_update_r (/opt/observium/rrd/aan-oob01/perf-pollermodule-fdb-table.rrd) failed with status -1. (/opt/observium/rrd/aan-oob01/perf-pollermodule-fdb-table.rrd: illegal attempt to update using time 1541024231 when last update time is 1541028534 (minimum one second step))

Oct 31 16:34:51 sjc-observium-1 rrdcached[2453]: queue_thread_main: rrd_update_r (/opt/observium/rrd/aan-oob01/perf-pollermodule-entity-physical.rrd) failed with status -1. (/opt/observium/rrd/aan-oob01/perf-pollermodule-entity-physical.rrd: illegal attempt to update using time 1541024224 when last update time is 1541028525 (minimum one second step))

Oct 31 16:34:51 sjc-observium-1 rrdcached[2453]: queue_thread_main: rrd_update_r (/opt/observium/rrd/aan-oob01/perf-pollermodule-hr-mib.rrd) failed with status -1. (/opt/observium/rrd/aan-oob01/perf-pollermodule-hr-mib.rrd: illegal attempt to update using time 1541024165 when last update time is 1541028466 (minimum one second step))

Oct 31 16:34:51 sjc-observium-1 rrdcached[2453]: queue_thread_main: rrd_update_r (/opt/observium/rrd/aan-oob01/perf-pollermodule-ipSystemStats.rrd) failed with status -1. (/opt/observium/rrd/aan-oob01/perf-pollermodule-ipSystemStats.rrd: illegal attempt to update using time 1541024166 when last update time is 1541028467 (minimum one second step))

Oct 31 16:34:51 sjc-observium-1 rrdcached[2453]: queue_thread_main: rrd_update_r (/opt/observium/rrd/aan-oob01/perf-pollermodule-loadbalancer.rrd) failed with status -1. (/opt/observium/rrd/aan-oob01/perf-pollermodule-loadbalancer.rrd: illegal attempt to update using time 1541024223 when last update time is 1541028524 (minimum one second step))

Oct 31 16:34:51 sjc-observium-1 rrdcached[2453]: queue_thread_main: rrd_update_r (/opt/observium/rrd/aan-oob01/perf-pollermodule-mempools.rrd) failed with status -1. (/opt/observium/rrd/aan-oob01/perf-pollermodule-mempools.rrd: illegal attempt to update using time 1541024147 when last update time is 1541028719 (minimum one second step))

Oct 31 16:39:48 sjc-observium-1 systemd[1]: rrdcached.service start operation timed out. Terminating.

Oct 31 16:39:48 sjc-observium-1 systemd[1]: Failed to start SYSV: rrdcached is a daemon that receives updates to existing rrd files, accumulates them, and writes updates to file.

Oct 31 16:39:48 sjc-observium-1 systemd[1]: Unit rrdcached.service entered failed state.

Oct 31 16:39:48 sjc-observium-1 systemd[1]: rrdcached.service failed.

sjc-observium-1:/opt/observium#

 

===
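
The epoch values in the rejected updates above can be decoded to see how far apart the two timestamps are. This is a minimal sketch: epoch_gap and epoch_utc are hypothetical helper names, and the `date -d @N` form assumes GNU date.

```shell
#!/bin/sh
# epoch_gap and epoch_utc are hypothetical helper names (assumptions).

# Print how many seconds the attempted update lags the last stored update.
epoch_gap() {
    last=$1
    attempted=$2
    echo $((last - attempted))
}

# Render an epoch value as UTC; the -d @N form assumes GNU date.
epoch_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# First rejected update in the log (perf-pollermodule-fdb-table.rrd):
epoch_gap 1541028534 1541024231   # prints 4303 (about 72 minutes)
epoch_utc 1541024231              # time of the attempted update
epoch_utc 1541028534              # last update already stored in the RRD
```

For that first line the attempted update is 4303 seconds older than the data already in the file, so rrdcached appears to have been handed stale updates rather than running with a skewed clock. Whether they came from an old journal under /var/tmp or from a second writer is not something the log alone shows.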

 

Any suggestions or docs/links showing how it can be resolved?

 

Thanks.

 

- Gordon