On 23 Oct 2020, at 6:01 am, Eric W. Bates via observium <observium@observium.org> wrote:

Is it a VM? Clocks on VMs can be weird because a loaded hypervisor does not always provide regular time slots. I've seen it cause all sorts of errors in the ntp logs.

In the case of VMware, there is a feature, "take time from server" (or some such spelling), where the clock is controlled by an installed copy of VMware Tools on the VM. VMware Tools gets the time from the parent ESXi host. This actually works quite well, and the recommendation is to run ntp only on ESXi and not on the VM.

Sadly, I haven't seen a similar option in other hypervisors, but I've only worked with 2 or 3 others.
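If you want to confirm that the guest clock really is being stepped, one rough check is to sample the wall clock alongside the monotonic clock, which Python documents as unaffected by system clock updates. Below is a minimal sketch under that assumption; the `find_backward_steps` and `watch` helpers and the 0.5 s tolerance are hypothetical choices for illustration, not part of ntp, VMware Tools or Observium.

```python
import time


def find_backward_steps(samples, tolerance=0.5):
    """Return (index, seconds) for each sample where the wall clock advanced
    less than the monotonic clock by more than `tolerance` seconds.

    samples: list of (wall, mono) pairs taken from time.time() and time.monotonic().
    tolerance: drift in seconds to ignore (hypothetical default).
    """
    steps = []
    for i in range(1, len(samples)):
        wall_prev, mono_prev = samples[i - 1]
        wall_cur, mono_cur = samples[i]
        # Real elapsed time according to the monotonic clock
        expected = mono_cur - mono_prev
        # How far the wall clock actually moved in the same interval
        actual = wall_cur - wall_prev
        drift = actual - expected
        if drift < -tolerance:
            # Wall clock fell behind: it was stepped (or slewed hard) backwards
            steps.append((i, -drift))
    return steps


def watch(interval=1.0, count=60):
    """Sample both clocks `count` times, `interval` seconds apart, and print backward steps."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), time.monotonic()))
        time.sleep(interval)
    for i, seconds in find_backward_steps(samples):
        print(f"sample {i}: wall clock stepped back ~{seconds:.1f}s")
```

Calling `watch()` on the suspect VM for a while (longer than the default minute if the jumps are rare) should print something whenever the guest clock is pulled backwards.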

On 10/22/20 2:47 PM, Adam Armstrong via observium wrote:
The RRD files contain the historical data. If you delete them, the historical data will go away.
The only thing that would make the historical data go away is the RRDs being removed and then automatically recreated.
As for that log entry, it seems like your system’s clock is sometimes going backwards:
1603020715 when last update time is 1603021928
You were attempting to insert data with the timestamp:
Sunday, 18 October 2020, 11:31:55 AM GMT
But the previous data inserted was at time:
Sunday, 18 October 2020, 11:52:08 AM GMT
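The conversion above can be reproduced with Python's standard library. This is a minimal sketch using the two timestamps from the log entry; the `to_utc` helper name is only illustrative.

```python
from datetime import datetime, timezone

# Timestamps taken from the rrdcached error quoted above
attempted = 1603020715    # time of the rejected update
last_update = 1603021928  # last update time already stored in status.rrd


def to_utc(ts):
    """Render a Unix timestamp as a UTC date/time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


print("attempted:  ", to_utc(attempted))    # 2020-10-18 11:31:55 UTC
print("last update:", to_utc(last_update))  # 2020-10-18 11:52:08 UTC
print("behind by:  ", last_update - attempted, "seconds")  # 1213 (about 20 minutes)
```

Because rrdtool requires each update to be at least one second after the previous one, any update stamped earlier than the stored last update time is rejected, which is exactly what the log shows.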
There’s some super weird stuff going on with that system.
Adam.
*From:* Gordon Cheng (gocheng) <gocheng@cisco.com>
*Sent:* 22 October 2020 18:33
*To:* Observium <observium@observium.org>; Adam Armstrong <adama@observium.org>
*Subject:* Graphs keep reset randomly and losing previous history
Hi Adam and Observium team:
We recently started having an issue with our Observium (20.9.10749) where the graphs for various devices reset and all their previous history is gone:
[screenshot]
And we usually see entries like the following in /var/log/messages around that time:
Oct 18 05:07:13 sjc-observium-1 rrdcached[1495]: queue_thread_main: rrd_update_r (/opt/observium/rrd/atl-wan04/status.rrd) failed with status -1. (/opt/observium/rrd/atl-wan04/status.rrd: illegal attempt to update using time 1603020715 when last update time is 1603021928 (minimum one second step))
Oct 19 05:12:05 sjc-observium-1 rrdcached[14841]: queue_thread_main: rrd_update_r (/opt/observium/rrd/atl-wan04/status.rrd) failed with status -1. (/opt/observium/rrd/atl-wan04/status.rrd: illegal attempt to update using time 1603106221 when last update time is 1603108322 (minimum one second step))
Oct 20 05:32:47 sjc-observium-1 rrdcached[14471]: queue_thread_main: rrd_update_r (/opt/observium/rrd/atl-wan04/status.rrd) failed with status -1. (/opt/observium/rrd/atl-wan04/status.rrd: illegal attempt to update using time 1603194156 when last update time is 1603194737 (minimum one second step))
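To see how far back the clock jumped each time, the rejected updates can be pulled out of the log. This is a rough sketch assuming the log lines look exactly like the three quoted above; the regex, the hypothetical `backward_updates` and `report` helpers, and the `/var/log/messages` default path are illustrative assumptions, not part of Observium or rrdtool.

```python
import re

# Matches the rrdtool error text in the rrdcached log lines quoted above
PATTERN = re.compile(
    r"rrd_update_r \((?P<path>[^)]+)\) failed.*?"
    r"update using time (?P<attempted>\d+) when last update time is (?P<last>\d+)"
)


def backward_updates(lines):
    """Yield (rrd_path, attempted, last, seconds_behind) for each rejected update."""
    for line in lines:
        m = PATTERN.search(line)
        if m:
            attempted = int(m.group("attempted"))
            last = int(m.group("last"))
            yield m.group("path"), attempted, last, last - attempted


def report(log_path="/var/log/messages"):  # assumed log location
    """Print how far behind each rejected update was."""
    with open(log_path, errors="replace") as f:
        for path, attempted, last, behind in backward_updates(f):
            print(f"{path}: update at {attempted} was {behind}s behind {last}")
```

Fed the three lines above, this would report gaps of 1213, 2101 and 581 seconds, which suggests the clock is being pulled back by a varying amount rather than a fixed offset.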
We have tried restarting the rrdcached process and deleting RRD files that had not been updated for some time (with the command “find * -type f -mtime +5 -delete” run in the rrd directory), but neither helped much.
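Since deleting RRD files is itself what wipes the history, it may be worth listing candidates before removing anything. A minimal dry-run sketch in Python, assuming the hypothetical `stale_rrds` helper below; unlike the `find` command it only looks at `*.rrd` files, and its cutoff is a plain "older than N days" rather than `find`'s whole-day rounding for `-mtime +5`.

```python
import os
import time


def stale_rrds(root, days=5):
    """Return sorted paths of *.rrd files under `root` not modified in `days` days.

    Dry run only: nothing is deleted.
    """
    cutoff = time.time() - days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".rrd"):
                continue  # skip non-RRD files, which find -type f would also match
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)
```

For example, `for p in stale_rrds("/opt/observium/rrd"): print(p)` would show what would go, so you can check it does not include devices whose history you still want.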
Do you have any suggestions on how we can investigate and troubleshoot this further?
Thanks.
- Gordon
_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium
