Ahh, that makes sense. I thought it was really odd that it wasn’t able to ping a local RRD database or MySQL instance.
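As Tom explains in the quoted reply below, the status word and the debug tag are printed back to back on the same line. Here's a small sketch that splits them apart. The tag list comes from the debug output quoted below, and the `MySQL:` case is my guess at what the "UnpingableMySQL" lines look like.

```python
# Debug prefixes seen in the poller output quoted below. "MySQL:" is an
# assumption based on the final statistics line of that output.
DEBUG_TAGS = ("RRD[", "SQL[", "MySQL:")


def split_status(line: str) -> tuple:
    """Split 'UnpingableRRD[cmd[...' into ('Unpingable', 'RRD[cmd[...').

    The cut is made at the earliest debug tag that is not at the very start
    of the line; if no tag is found, the whole line is treated as the status.
    """
    positions = [line.find(tag) for tag in DEBUG_TAGS]
    positions = [pos for pos in positions if pos > 0]
    if not positions:
        return line, ""
    cut = min(positions)
    return line[:cut], line[cut:]
```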

As I’ve dug further, the problem seems to be isolated to pinging APC NMS2 cards; all of our other hosts (and older UPSes with NMS1 cards) work just fine.

I’ve set the following in config.php, but it doesn’t seem to take effect: the retry count is generous, yet when I brute-force it, the host only comes back after the 2nd or 3rd run of poller.php.

// PING Settings - Retries/Timeouts
$config['ping']['retries'] = 6;    // How many times to retry ping
$config['ping']['timeout'] = 1500;  // Timeout in milliseconds
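To check whether the NMS2 cards answer at all with these values, one option is to run fping by hand from the Observium box. This is a rough sketch under my own assumptions, not how Observium actually builds its fping call: it reuses the retries value as a packet count (fping's `-c`) and the timeout as `-t`, so the xmt/rcv numbers can be compared with the single packet shown in the ping log below.

```python
import shlex
import subprocess

# Values copied from the config.php snippet above.
# Assumption: the retries value is reused here as a packet count for a
# manual test; Observium may pass it to fping differently.
PING_RETRIES = 6        # $config['ping']['retries']
PING_TIMEOUT_MS = 1500  # $config['ping']['timeout']


def build_fping_test(host: str, count: int, timeout_ms: int) -> list:
    """Build a manual fping command that sends `count` echo requests.

    -q     quiet: print only the final per-target summary
    -c N   send N echo requests to the target
    -t MS  initial target timeout in milliseconds
    """
    return ["fping", "-q", "-c", str(count), "-t", str(timeout_ms), host]


def run_fping_test(host: str) -> str:
    """Run the test by hand and return fping's xmt/rcv/%loss summary."""
    cmd = build_fping_test(host, PING_RETRIES, PING_TIMEOUT_MS)
    print("running:", shlex.join(cmd))
    # With -q and -c, fping writes its summary to stderr and exits non-zero
    # when packets are lost, so the exit status is not treated as an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stderr.strip()
```

For example, `print(run_fping_test("10.1.13.6"))` tests itups03-02. If most of the six packets come back, the card itself is answering, which would point back at how the poller pings it.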

This is what I’m seeing in the ping log. I have no experience with fping, so I can’t tell if the ping settings are taking effect or not.

2014-05-28 07:10:34 | PING ERROR: itups04-01.net.internal (1) | FPING OUT: 10.1.4.4 : xmt/rcv/%loss = 1/0/100%
MTR OUT: HOST: observium                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.1.101.1                 0.0%     5    5.5   4.3   0.3   7.8   3.4
  2.|-- 10.1.4.4                   0.0%     5    1.3   1.1   1.0   1.3   0.1

2014-05-28 07:11:05 | PING ERROR: itups-mpoe-01.net.internal (1) | FPING OUT: 10.1.22.97 : xmt/rcv/%loss = 1/0/100%
MTR OUT: HOST: observium                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.1.101.1                 0.0%     5    8.2   5.7   0.3  10.5   5.0
  2.|-- 10.1.22.97                 0.0%     5    1.1   1.3   1.1   1.9   0.4

2014-05-28 07:15:08 | PING ERROR: itups03-01.net.internal (1) | FPING OUT: 10.1.3.4 : xmt/rcv/%loss = 1/0/100%
MTR OUT: HOST: observium                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.1.101.1                 0.0%     5    0.3   1.4   0.2   3.6   1.6
  2.|-- 10.1.3.4                   0.0%     5   67.4  14.5   1.0  67.4  29.6

2014-05-28 07:15:16 | PING ERROR: itups03-02.net.internal (1) | FPING OUT: 10.1.13.6 : xmt/rcv/%loss = 1/0/100%
MTR OUT: HOST: observium                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.1.101.1                 0.0%     5    8.2   1.8   0.2   8.2   3.5
  2.|-- 10.1.13.6                  0.0%     5    0.9  25.7   0.9  62.6  33.7

2014-05-28 07:15:24 | PING ERROR: itups04-02.net.internal (1) | FPING OUT: 10.1.14.4 : xmt/rcv/%loss = 1/0/100%
2014-05-28 07:15:27 | PING ERROR: itups-mpoe-02.net.internal (1) | FPING OUT: 10.1.22.98 : xmt/rcv/%loss = 1/0/100%
MTR OUT: HOST: observium                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.1.101.1                 0.0%     5    0.2   2.2   0.2   9.3   4.0
  2.|-- 10.1.14.4                  0.0%     5    1.1   1.1   1.0   1.3   0.1

MTR OUT: HOST: observium                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.1.101.1                 0.0%     5    2.6   2.3   0.2   4.8   2.1
  2.|-- 10.1.22.98                 0.0%     5    1.2   1.2   1.0   1.3   0.1

2014-05-28 07:15:34 | PING ERROR: itups04-01.net.internal (1) | FPING OUT: 10.1.4.4 : xmt/rcv/%loss = 1/0/100%
MTR OUT: HOST: observium                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.1.101.1                 0.0%     5    0.2   1.6   0.2   6.9   3.0
  2.|-- 10.1.4.4                   0.0%     5    1.1  29.1   1.1  71.9  37.5

2014-05-28 07:16:07 | PING ERROR: itups-mpoe-01.net.internal (1) | FPING OUT: 10.1.22.97 : xmt/rcv/%loss = 1/0/100%
MTR OUT: HOST: observium                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.1.101.1                 0.0%     5    3.9   2.2   0.3   6.3   2.7
  2.|-- 10.1.22.97                 0.0%     5    1.1   1.2   1.1   1.3   0.1
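Every failure above shows xmt/rcv = 1/0, meaning fping sent a single packet, while MTR reaches the same address with 0% loss over 5 packets. To check that pattern across the whole ping log, here's a rough Python sketch. The regex is written against the lines quoted above and is an assumption about the log format in general.

```python
import re
from collections import Counter

# Matches PING ERROR lines like the ones quoted above, e.g.
# "2014-05-28 07:10:34 | PING ERROR: itups04-01.net.internal (1) | FPING OUT: 10.1.4.4 : xmt/rcv/%loss = 1/0/100%"
PING_ERROR_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \| PING ERROR: (?P<host>\S+) \(\d+\) \| "
    r"FPING OUT: (?P<ip>\S+) : xmt/rcv/%loss = (?P<xmt>\d+)/(?P<rcv>\d+)/(?P<loss>\d+)%"
)


def parse_ping_errors(lines):
    """Yield (timestamp, host, ip, xmt, rcv, loss_pct) for each PING ERROR line.

    Other lines (MTR output, blank lines) are skipped.
    """
    for line in lines:
        m = PING_ERROR_RE.match(line.strip())
        if m:
            yield (m["ts"], m["host"], m["ip"],
                   int(m["xmt"]), int(m["rcv"]), int(m["loss"]))


def summarize(lines):
    """Count failures per host and collect the distinct packet counts fping sent."""
    failures = Counter()
    sent = set()
    for _ts, host, _ip, xmt, _rcv, _loss in parse_ping_errors(lines):
        failures[host] += 1
        sent.add(xmt)
    return failures, sent
```

Passing it an open log file, e.g. `summarize(open(path))` with `path` pointing at wherever the ping log lives, would show which UPSes fail most often. If `sent` only ever contains 1, the retry setting may not be reaching fping, though I can’t confirm how the poller builds that call.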

Is there anything else I’m missing, or any additional debugging output or logging I could turn on?

As far as I can tell, something about pinging changed in the .13 to .14 Observium update (most of this troubleshooting came from Mark Nellmeann’s earlier thread of March 27th: http://comments.gmane.org/gmane.network.observium.general/2064). I just don’t know enough about the back-end systems, and I don’t want to risk losing our historical data.

Thanks for the help, and for this excellent system!

Andrew Davis
IT Systems
J. David Gladstone Institutes

(415) 734-2549
andrew.davis@gladstone.ucsf.edu

On May 27, 2014, at 11:41 PM, Tom Laermans <tom.laermans@powersource.cx> wrote:

It's actually just "Unpingable."

The other output is debug output: RRD[blah] and SQL[blah].

Either way, your host is unpingable.

Tom

On 28/05/2014 05:38, Andrew Davis wrote:
All,

We’ve been running Observium on Ubuntu 12.04 for a little over a year now, and after applying the latest community update, we seem to have devices periodically reporting as down.

The error I’m getting is bouncing between UnpingableMySQL and UnpingableRRD.

With debug enabled, this is an example of what we’re seeing:

./poller.php -h itups03-02.net.internal
Observium v0.14.4.5229
Poller

Starting polling run:


SQL[SELECT `device_id` FROM `devices` WHERE `disabled` = 0 AND `hostname` LIKE 'itups03-02.net.internal' ORDER BY `device_id` ASC]

SQL[SELECT * FROM `devices` WHERE `device_id` = '40']

SQL[SELECT * FROM devices_attribs WHERE `device_id` = '40']
itups03-02.net.internal 40 apc
UnpingableRRD[cmd[update /opt/observium/rrd/itups03-02.net.internal/status.rrd N:0]
stdout[OK u:0.00 s:0.00 r:0.05]
stderr[]]
RRD[cmd[update /opt/observium/rrd/itups03-02.net.internal/ping.rrd N:U]
stdout[OK u:0.00 s:0.00 r:0.06]
stderr[]]
RRD[cmd[update /opt/observium/rrd/itups03-02.net.internal/ping_snmp.rrd N:U]
stdout[OK u:0.00 s:0.00 r:0.06]
stderr[]]

SQL[INSERT INTO `perf_times` (`type`,`doing`,`start`,`duration`,`devices`)  VALUES ('poll','itups03-02.net.internal','1401246542.664','0.066','1')]
./poller.php itups03-02.net.internal May 27, 2014, 20:09 - 1 devices polled in 0.066 secs
MySQL: Cell[0/0s] Row[1/0s] Rows[1/0s] Column[0/0s] Update[0/0s] Insert[1/0s] Delete[0/0s]


Any advice on where we can look? It seems the only way to get them back up is to keep rerunning the poller for the affected host until the ping goes through.

Thanks!

Andrew Davis
IT Systems
J. David Gladstone Institutes



_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
