The article describes the creation of a hyperconverged high availability cluster, where VMmanager cluster nodes are also the Ceph cluster nodes.
This cluster scheme:
- reduces the number of required servers;
- ensures system fault tolerance.
Implementation requirements
To implement the cluster you will need:
- server for the VMmanager platform. Recommended requirements:
- CPU with 4 cores and x86_64 architecture;
- 8 GB RAM;
- SSD drive with at least 300 GB capacity;
- for other requirements, see the Server requirements article;
- three servers for VMmanager cluster nodes. Recommended requirements:
- x86_64 CPU with 12 cores, 24 threads, and hardware virtualization support (see the check below);
- 128 GB RAM;
- OS AlmaLinux 9;
- SSD drive with at least 500 GB capacity for the OS;
- SSD drive with at least 1000 GB capacity for the Ceph OSD;
- 1 Gbit/s network card — network interface for the OS and virtual machine traffic;
- 10 Gbit/s network card — network interface for Ceph traffic;
- for other requirements, see the Server requirements for the cluster article.
Operation of the cluster on servers with lower specifications is not guaranteed.
For stable cluster operation, its nodes must:
- be connected to the network via 10 Gbit/s interfaces;
- use the same CPU model;
- have the same disk configuration.
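You can check in advance that a server's CPU supports hardware virtualization. A quick check on Linux (a value greater than zero means the vmx/svm CPU flags are present):
grep -Ec 'vmx|svm' /proc/cpuinfo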
Configuration order
The cluster must be configured in the following sequence:
- If the VMmanager platform is not installed, install it according to the instructions in the Installation article.
- Prepare servers for cluster nodes — install AlmaLinux OS from the Minimal ISO image.
- Create a VMmanager cluster according to the instructions from the article Creating a cluster. At the storage configuration stage, select the type — File storage.
- Add nodes to the VMmanager cluster according to the instructions in the Managing nodes in the cluster article.
- Configure the Ceph cluster following the instructions in the Configuring a Ceph cluster section.
- Connect the Ceph network storage to the VMmanager cluster following the instructions in the Managing cluster storages article. When connecting, create a new pool and a Ceph user.
- Make the Ceph storage the primary storage and disconnect the file storage from the VMmanager cluster following the instructions in the Managing cluster storages article.
- Enable high availability in the VMmanager cluster following the instructions in the Configuring high availability article. When enabled:
- Create another Ceph user. Do not use the user that was created when you connected the storage.
- Specify an arbitrary directory name.
- Specify a verification IP address that is reachable with the ping utility from all nodes in the cluster (see the check below).
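For example, to make sure a candidate verification IP address is suitable, you can check it from every node (replace <verification_ip> with the address you plan to use):
ping -c 3 <verification_ip>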
Configuring a Ceph cluster
As an example, the instructions in this section are for servers with the following settings:
- first node — IP address: 192.168.100.1/24, hostname: node1;
- second node — IP address: 192.168.100.2/24, hostname: node2;
- third node — IP address: 192.168.100.3/24, hostname: node3;
- disk path on all servers is /dev/nvme1n1.
When executing the commands, substitute your actual IP addresses, network prefix, server hostnames, and disk path.
Network preparation
Allocate a separate IPv4 network for Ceph operation. To do this, assign an additional static IP address from this network to the 10 Gbit/s network interface on every node in the cluster with the following commands:
nmcli con add type ethernet con-name <interface> ipv4.addresses <network/prefix> ipv4.method manual
nmcli con up <interface>
Command details:
- <network/prefix> — IP address with a network prefix. For example, 192.168.100.1/24;
- <interface> — name of the network interface. For example, enp1s0.
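For illustration, this is how the commands might look on the first node using the example values from this article (interface enp1s0, address 192.168.100.1/24); the ifname option is added here explicitly, as some NetworkManager versions require it:
nmcli con add type ethernet con-name enp1s0 ifname enp1s0 ipv4.addresses 192.168.100.1/24 ipv4.method manual
nmcli con up enp1s0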
Ceph configuration
- Install Podman and LVM2 software on all nodes in the cluster:
dnf install podman lvm2 -y
- On the first cluster node:
- Download and install cephadm utility by commands:
CEPH_RELEASE=quincy
curl --silent --remote-name --location https://download.ceph.com/rpm-${CEPH_RELEASE}/el9/noarch/cephadm
chmod +x cephadm
mv cephadm /usr/local/sbin/
- Start the Ceph cluster creation: More about this command in the Ceph documentation.
cephadm bootstrap --mon-ip 192.168.100.1 --skip-monitoring-stack --allow-fqdn-hostname
- Copy the Ceph SSH keys to the remaining nodes:
ssh-copy-id -f -i /etc/ceph/ceph.pub 192.168.100.2
ssh-copy-id -f -i /etc/ceph/ceph.pub 192.168.100.3
- Open the shell of the cephadm utility:
cephadm shell
- Check the status of the Ceph cluster:
ceph -s
Example of output on successful cluster creation:
cluster:
  id:     46f8a2aa-9514-11f0-96a1-5254006b08ce
  health: HEALTH_WARN
          OSD count 0 < osd_pool_default_size 3
services:
  mon: 1 daemons, quorum flax-chloromelanite (age 96s)
  mgr: flax-chloromelanite.lzuwda (active, since 72s)
  osd: 0 osds: 0 up, 0 in
data:
  pools:   0 pools, 0 pgs
  objects: 0 objects, 0 B
  usage:   0 B used, 0 B / 0 B avail
  pgs:
- Add the remaining nodes to the Ceph cluster by commands: More about these commands in the Ceph documentation.
ceph config set mon public_network 192.168.100.0/24
ceph orch host add node2 192.168.100.2
ceph orch host add node3 192.168.100.3
- Add server disks to the storage by commands: More about these commands in the Ceph documentation.
ceph orch daemon add osd node1:/dev/nvme1n1
ceph orch daemon add osd node2:/dev/nvme1n1
ceph orch daemon add osd node3:/dev/nvme1n1
- Configure CephFS: The command will create a volume named vm6, create the necessary pools, and configure the file system. You can specify an arbitrary volume name instead of vm6.
ceph fs volume create vm6
- Wait for Ceph to synchronize across all nodes in the cluster. You can monitor the synchronization status with the command:
ceph -s
If the nodes are successfully synchronized, the output contains the line:
health: HEALTH_OK
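Optionally, you can also confirm that the CephFS volume was created and that all nodes joined the cluster. A quick check from the cephadm shell, assuming the example volume name vm6 and hostnames used in this article:
ceph fs ls
ceph orch host ls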
Replication settings
By default, Ceph creates pools with the parameter replicated size 3. This means that each piece of data (placement group, PG) is stored on three disks (OSDs). This setting protects data from loss if one or two disks fail.
You can reduce the number of replicas. This increases the amount of data that can be placed in the cluster, but also increases the risk of data loss if multiple disks fail at the same time. If you reduce the number of replicas to one, the data will be lost if even a single OSD fails.
To change the number of replicas, run the command:
ceph osd pool set <pool_name> size <replicas_number>
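For example, to list the existing pools, view the current replica count of a pool, and reduce it to two replicas (cephfs.vm6.data below is only an assumed pool name — use a name printed by the first command):
ceph osd pool ls
ceph osd pool get cephfs.vm6.data size
ceph osd pool set cephfs.vm6.data size 2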
Cluster monitoring
Cluster status
To check the cluster status, run the command:
ceph -s
cluster:
id: 46f8a2aa-9514-11f0-96a1-5254006b08ce
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 12h)
mgr: a(active, since 11h)
osd: 12 osds: 12 up, 12 in
data:
pools: 4 pools, 512 pgs
objects: 2.45M objects, 45 TiB
usage: 135 TiB used, 65 TiB avail
pgs: 512 active+clean
io:
client: 1.2 MiB/s rd, 3.4 MiB/s wr, 125 op/s rd, 450 op/s wr
Comments:
- health — cluster health status:
  - HEALTH_OK — the cluster is in working order, no problems detected;
  - HEALTH_WARN — there are problems in cluster operation; a description of the problem follows the status;
  - HEALTH_ERR — serious cluster errors have occurred; diagnose the problem urgently;
- services — Ceph services. Contains information about how many services should be running and how many are currently running;
- data — Ceph data:
  - number of pools and PGs;
  - status of pools and PGs;
  - usage — amount of space occupied. Must be no higher than 90%;
  - messages like XXX/YYY objects misplaced — rebalancing is in progress. Wait until the rebalancing is complete;
- io — data transfer speed for clients and recovery mechanisms:
  - messages like recovery: XX MiB/s, YY objects/s — rebalancing is in progress. Wait until the rebalancing is complete.
Read more about the command output in the Ceph documentation.
OSD filling
To see the status of each OSD, run the command:
ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE PGS STATUS TYPE NAME
-1 8.00000 - 80 TiB 36 TiB 44 TiB 45.00 - root default
-3 2.00000 - 20 TiB 9 TiB 11 TiB 45.00 - host node-1
0 ssd 1.00000 1.00000 10 TiB 4.6 TiB 5.4 TiB 46.00 89 up osd.0
2 ssd 1.00000 1.00000 10 TiB 4.4 TiB 5.6 TiB 44.00 86 up osd.2
-5 2.00000 - 20 TiB 9 TiB 11 TiB 45.00 - host node-2
5 ssd 1.00000 1.00000 10 TiB 4.6 TiB 5.4 TiB 46.00 90 up osd.5
-7 2.00000 - 20 TiB 9 TiB 11 TiB 45.00 - host node-3
9 ssd 1.00000 1.00000 10 TiB 4.7 TiB 5.3 TiB 47.00 90 up osd.9
11 ssd 1.00000 1.00000 10 TiB 4.3 TiB 5.7 TiB 43.00 85 up osd.11
When examining the output of the command, pay attention to the %USE column. If the value exceeds 90%, the cluster becomes inoperable. To recover the cluster, you will need to add disks as new OSDs. See the Ceph and Red Hat documentation for details.
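You can also check the thresholds at which Ceph raises warnings and then blocks writes as OSDs fill up. The ratios are shown by the following command (typical defaults are nearfull 0.85, backfillfull 0.90 and full 0.95):
ceph osd dump | grep ratio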
Metrics monitoring
You can export Ceph metrics in a format compatible with the Prometheus monitoring system (the same exposition format used by the Node Exporter tool). To do this:
- Run the command:
ceph mgr module enable prometheus
- Determine the ID of the Ceph cluster — the fsid parameter in the file /etc/ceph/ceph.conf:
grep fsid /etc/ceph/ceph.conf
Example output:
fsid = d46f1a09-1bc0-423a-9fdb-893b58511cb3
- Add the Ceph cluster nodes to Prometheus as targets. In the configuration, replace:
  - the value of the replacement parameter with the Ceph cluster ID;
  - the IP addresses in the targets parameter with the IP addresses of the Ceph cluster nodes.
Example configuration:
- job_name: ceph
  relabel_configs:
    - replacement: efdba66f-8760-42dc-a497-47e3378347a7
      source_labels:
        - __address__
      target_label: cluster
    - replacement: ceph_cluster
      source_labels:
        - instance
      target_label: instance
  scheme: http
  static_configs:
    - targets:
        - 192.168.100.1:9283
        - 192.168.100.2:9283
        - 192.168.100.3:9283
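After the module is enabled and Prometheus is configured, you can verify that metrics are being exported. For example, assuming the addresses and port 9283 from the example configuration above (the metrics are served by the active mgr node):
ceph mgr services
curl http://192.168.100.1:9283/metrics | head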
The collected metrics can be viewed in the Grafana monitoring system. To do this, import ready-to-use dashboards from the Ceph repository.
Changing the cluster size
Adding nodes
To add a node to the cluster:
- Prepare the server as a VMmanager cluster node.
- Add the server to the VMmanager cluster.
- Add the server to the Ceph cluster (see the example after this list).
- Wait for the rebalancing to complete.
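As an illustration of adding a server to the Ceph cluster, the commands below add a hypothetical fourth node (hostname node4, IP address 192.168.100.4) with the same disk path as in this article. Install Podman and LVM2 on the new node first, then run the commands from the first Ceph node:
# copy the Ceph SSH key to the new node
ssh-copy-id -f -i /etc/ceph/ceph.pub 192.168.100.4
# run the following commands inside the cephadm shell
cephadm shell
ceph orch host add node4 192.168.100.4
ceph orch daemon add osd node4:/dev/nvme1n1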
Deleting nodes
To remove a node from the cluster:
- Transfer VMs, their disks, and images from this node to other nodes.
- Stop and disable the OSD of this node following the instructions in the Ceph documentation.
- Wait for the rebalancing to complete. If you perform the following step before the rebalancing is complete, data in the cluster may be lost.
- Disconnect the node from the Ceph cluster following the instructions in the Ceph documentation.
- Remove the node from the VMmanager cluster.
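After the node has been disconnected from the Ceph cluster, you can confirm that it is no longer listed among the cluster hosts:
ceph orch host ls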