We are doing something similar to this, although our goal is more to ensure we have a replica of the data, not to ensure that the service is highly available.
We use DRBD for block-level replication of the RRDs, and MySQL replication for the database side of things. However, we are not using Pacemaker/Corosync to do the failover. The Pacemaker/Corosync configuration is fairly well understood for MySQL, DRBD and VIP scenarios.
What you do need to figure out, though, are the cron jobs for polling. These also need to fail over; you don't want them running on both machines. There are a number of ways to achieve this, e.g. sentinel files to disable the polling, or installing/removing the cron files when the node is made active/inactive.
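As a rough sketch of the sentinel-file approach (this is not something Observium ships; the sentinel path and the wrapper itself are hypothetical), cron on both nodes could call a small guard that only runs the real polling command when the node is not marked as standby:

```python
#!/usr/bin/env python3
"""Run a cron command only when this node is not marked as standby."""
import os
import subprocess
import sys

# Hypothetical sentinel path: the failover procedure would create this file
# on the standby node and remove it on the active node.
SENTINEL = "/etc/observium/polling.disabled"


def polling_allowed(sentinel=SENTINEL):
    """Polling is allowed only when the sentinel file is absent."""
    return not os.path.exists(sentinel)


def run_if_active(command, sentinel=SENTINEL):
    """Run `command` (a list of arguments) and return its exit code.

    When the sentinel exists, the command is skipped and 0 is returned so
    that cron does not report an error on the standby node.
    """
    if not polling_allowed(sentinel):
        return 0
    return subprocess.run(command).returncode

# In a real wrapper script the last line would be something like:
#   sys.exit(run_if_active(sys.argv[1:]))
```

Each polling entry in the cron file would then be prefixed with the guard script, and the failover step only has to create or delete one file rather than edit cron itself.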
We are using physical hardware with a lot of cores (48 vCPU) and NVMe storage, but we have about 150GB of RRDs, which is about 500K RRDs and 1300 devices. So it depends upon the scale of what you are trying to monitor. While we have 96GB of RAM in the server, about 50GB is used and about 30GB is buffer cache. Most of the used memory goes to rrdcached. rrdcached is a good way to even out your I/O, but in your HA scenario it is going to cause data loss when you fail over unexpectedly, because of its deferred writes to disk. In a controlled failover, you need to ensure that rrdcached flushes out properly before the switchover. If you do not use rrdcached, you will need a lot more peak IOPS available to you.
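For the controlled case, one possible way to do the flush is to talk to the rrdcached control socket: send FLUSHALL, then poll STATS until QueueLength drops to zero. The sketch below assumes polling has already been stopped (otherwise the queue never drains) and that the daemon listens on a UNIX socket; the socket path and the helper names are my assumptions, not our actual setup. FLUSHALL returns immediately, so an empty queue should be treated as a good indication rather than a hard guarantee that the last file has hit the disk.

```python
import socket
import time

# Assumed socket path; use whatever your rrdcached "-l" option points at.
RRDCACHED_SOCKET = "/var/run/rrdcached.sock"


def rrdcached_command(command, sock_path=RRDCACHED_SOCKET):
    """Send one command to rrdcached and return the raw text response."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(command.encode("ascii") + b"\n")
        reader = sock.makefile("r", encoding="ascii")
        status_line = reader.readline()
        # A non-negative status code is the number of lines that follow.
        count = int(status_line.split(" ", 1)[0])
        lines = [status_line]
        lines += [reader.readline() for _ in range(max(count, 0))]
        return "".join(lines)


def parse_stats(response):
    """Turn a STATS response into a dict of integer counters."""
    stats = {}
    for line in response.splitlines()[1:]:  # skip the status line
        key, _, value = line.partition(":")
        if value.strip():
            stats[key.strip()] = int(value)
    return stats


def flush_and_wait(poll_seconds=2.0, sock_path=RRDCACHED_SOCKET):
    """Ask rrdcached to flush everything, then wait for the queue to empty."""
    rrdcached_command("FLUSHALL", sock_path)
    while parse_stats(rrdcached_command("STATS", sock_path))["QueueLength"] > 0:
        time.sleep(poll_seconds)
```

A switchover script would call flush_and_wait() after disabling the pollers and before demoting DRBD on the active node.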
Dear Observium Experts,
I am new to Observium. We have a requirement to set up Observium in HA using DRBD, Pacemaker/Heartbeat & Corosync.
# DRBD for block-level replication; in our case this will be the /opt/observium mount point.
# Pacemaker/Heartbeat & Corosync, mainly used for virtual IP switchover and automatic mounting of /opt/observium if the primary node fails.
VM Configurations
OS : Ubuntu 18.04 64bit
vCPU : 6
vRAM : 16GB
OS Disk : 40GB
DRBD disk : 350GB (for /opt/observium)
Is the HA setup recommended, or is a single VM enough?
Basically, we want to avoid a single point of failure when we implement Observium in production.
Suggestions from expert users are welcome, and thanks in advance for your valuable time.
Thanks & Regards
Mohammed Suhail | System Engineer
Tel : (415) 349-2100, Extn- 2139
PGP Fingerprint : 026B 6CEB CC08 152D 9372 0F51 DDFB 6758 6924 FBCC
_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium