Dear Milton,

Thank you for the quick response and the detailed email. You have listed some valuable points, and I will consider your input while planning the implementation.

@Other experts: please share any further points or suggestions.

Thanks & Regards

Mohammed Suhail | System Engineer

PGP Fingerprint : 026B 6CEB CC08 152D 9372 0F51 DDFB 6758 6924 FBCC




On Tue, Nov 5, 2019 at 3:35 AM Milton Ngan <milton@valvesoftware.com> wrote:
We are doing something similar to this, although our goal is more to ensure we have a replica of the data, not to ensure that the service is highly available.

We use DRBD for block-level replication of the RRDs, and MySQL replication for the database side of things. However, we are not using Pacemaker/Corosync to do the failover. The Pacemaker/Corosync configuration is fairly well understood for MySQL, DRBD and VIP scenarios.

What you do need to figure out, though, are the cron jobs for polling. These also need to fail over; you don’t want them running on both machines. There are a number of ways to achieve this, e.g. sentinel files to disable the polling, or installing/removing the cron files when the node is made active/inactive.
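As a rough illustration of the sentinel-file approach, here is a minimal Python sketch of a wrapper that cron could call instead of the poller itself. The sentinel path and the function names are hypothetical, not anything Observium ships. It also assumes the sentinel lives on local disk rather than on the replicated volume, since the standby node does not have that volume mounted.

```python
import os
import subprocess

# Hypothetical sentinel location on LOCAL disk (not on the DRBD volume).
# The failover tooling would create it on the standby node and remove it
# on the active node.
SENTINEL_PATH = "/etc/observium-polling-disabled"


def polling_enabled(sentinel_path):
    """Polling is allowed only while the sentinel file is absent."""
    return not os.path.exists(sentinel_path)


def run_if_active(command, sentinel_path):
    """Run `command` (a list of arguments) unless the sentinel exists.

    Returns the command's exit code, or None if the run was skipped.
    """
    if not polling_enabled(sentinel_path):
        return None
    return subprocess.run(command, check=False).returncode
```

Each cron entry would then call something like `run_if_active([...poller command...], SENTINEL_PATH)`, so the same crontab can stay installed on both nodes while only the active one actually polls.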

We are using physical hardware with a lot of cores (48 vCPU) and NVMe storage, but we have about 150GB of RRDs, which is about 500K RRDs and 1300 devices. So it depends on the scale of what you are trying to monitor. While we have 96GB in the server, 50GB is used and about 30GB is buffer cache. Most of the memory is used for RRDcached. RRDcached is a good way to even out your I/O, but in your HA scenario it will cause data loss when you fail over unexpectedly, because writes to disk are deferred. In a controlled failover, you need to ensure that RRDcached has flushed everything out before the switchover. If you do not use RRDcached, you will need a lot more peak IOPS available to you.
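For the controlled failover case, a minimal Python sketch of the flush step might look like the following. It talks to RRDcached over its Unix socket using the daemon's text protocol (`FLUSHALL` and `STATS`). The socket path, the timeout values and the function names are assumptions for illustration, and treating `QueueLength: 0` as "everything is on disk" is a simplification you should verify against your own setup before relying on it.

```python
import socket
import time


def parse_stats(lines):
    """Turn 'Key: value' lines from a STATS reply into a dict of ints."""
    stats = {}
    for line in lines:
        key, _, value = line.partition(":")
        value = value.strip()
        if value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def send_command(stream, command):
    """Send one command and return the reply's data lines.

    The first reply line is '<status> <message>'; a negative status is an
    error, otherwise it is the number of data lines that follow.
    """
    stream.write((command + "\n").encode("ascii"))
    stream.flush()
    status_line = stream.readline().decode("ascii").strip()
    status = int(status_line.split(" ", 1)[0])
    if status < 0:
        raise RuntimeError("rrdcached error: " + status_line)
    return [stream.readline().decode("ascii").strip() for _ in range(status)]


def flush_and_wait(socket_path, timeout=300.0, poll_interval=1.0):
    """Ask rrdcached to flush everything, then wait for the queue to drain.

    Returns True if the queue emptied within `timeout` seconds.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        stream = sock.makefile("rwb")
        send_command(stream, "FLUSHALL")
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            time.sleep(poll_interval)
            stats = parse_stats(send_command(stream, "STATS"))
            if stats.get("QueueLength", 1) == 0:
                return True
    return False
```

You would call `flush_and_wait("/var/run/rrdcached.sock")` (path assumed) after stopping the pollers and before demoting the DRBD primary, and abort the switchover if it returns False.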



On Nov 4, 2019, at 1:35 PM, Mohammed Kokani via observium <observium@observium.org> wrote:

  Dear Observium Experts,

I am new to Observium. We have a requirement to set up Observium in HA using DRBD, Pacemaker/Heartbeat and Corosync.

# DRBD for block-level replication; in our case this will be the /opt/observium mount point.
# Pacemaker/Heartbeat and Corosync, mainly used for virtual IP switchover and automatic mounting of /opt/observium if the primary node fails.

VM configuration:
OS: Ubuntu 18.04 64-bit
vCPU: 6
vRAM: 16 GB
OS disk: 40 GB
DRBD disk: 350 GB (for /opt/observium)

Is the HA setup recommended, or is a single VM enough?

Basically, we want to avoid a single point of failure when we implement Observium in production.

Suggestions from expert users are welcome. Thanks in advance for your valuable time.


Thanks & Regards

Mohammed Suhail | System Engineer

Tel : (415) 349-2100, Extn- 2139

PGP Fingerprint : 026B 6CEB CC08 152D 9372 0F51 DDFB 6758 6924 FBCC


_______________________________________________
observium mailing list
observium@observium.org
http://postman.memetic.org/cgi-bin/mailman/listinfo/observium