The article describes the creation of a hyperconverged high availability cluster, where VMmanager cluster nodes are also the Ceph cluster nodes.
This cluster scheme:
- reduces the number of required servers;
- ensures system fault tolerance.
Implementation requirements
To implement the cluster you will need:
- server for the VMmanager platform. Recommended requirements:
- CPU with 4 cores and x86_64 architecture;
- 8 GB RAM;
- SSD drive with at least 300 GB capacity;
- for other requirements, see the Server requirements article;
- three servers for VMmanager cluster nodes. Recommended requirements:
- x86_64 CPU with 12 cores, 24 threads, and hardware virtualization support (see the check below);
- 128 GB RAM;
- OS AlmaLinux 9;
- SSD drive with at least 500 GB capacity for the OS;
- SSD drive with at least 1000 GB capacity for the Ceph OSD;
- 1 Gbit/s network card — network interface for the OS and virtual machine traffic;
- 10 Gbit/s network card — network interface for Ceph traffic;
- for other requirements, see the Server requirements for the cluster article.
Operation of the cluster on servers with lower specifications is not guaranteed.
For stable cluster operation, its nodes must:
- be connected to the network via 10 Gbit/s interfaces;
- use the same CPU model;
- have the same disk configuration.
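You can check in advance that a server's CPU supports hardware virtualization. A quick check on Linux (a value greater than zero means the vmx/svm CPU flags are present):
grep -Ec 'vmx|svm' /proc/cpuinfo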
Configuration order
The cluster must be configured in the following sequence:
- If the VMmanager platform is not installed, install it according to the instructions in the Installation article.
- Prepare servers for cluster nodes — install AlmaLinux OS from the Minimal ISO image.
- Create a VMmanager cluster according to the instructions from the article Creating a cluster. At the storage configuration stage, select the type — File storage.
- Add nodes to the VMmanager cluster according to the instructions in the Managing nodes in the cluster article.
- Configure the Ceph cluster following the instructions in the Configuring a Ceph cluster section.
- Connect the Ceph network storage to the VMmanager cluster following the instructions in the Managing cluster storages article. When connecting, create a new pool and a Ceph user.
- Make the Ceph storage the primary storage and disconnect the file storage from the VMmanager cluster following the instructions in the Managing cluster storages article.
- Enable high availability in the VMmanager cluster following the instructions in the Configuring high availability article. When enabled:
- Create another Ceph user. Do not use the user that was created when you connected the storage.
- Specify an arbitrary directory name.
- Specify a verification IP address that is reachable with the ping utility from all nodes in the cluster (see the check below).
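For example, to make sure a candidate verification IP address is suitable, you can check it from every node (replace <verification_ip> with the address you plan to use):
ping -c 3 <verification_ip>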
Configuring a Ceph cluster
As an example, the instructions in this section are for servers with the following settings:
- first node — IP address: 192.168.100.1/24, hostname: node1;
- second node — IP address: 192.168.100.2/24, hostname: node2;
- third node — IP address: 192.168.100.3/24, hostname: node3;
- disk path on all servers is /dev/nvme1n1.
When executing the commands, substitute your actual IP addresses, network prefix, server hostnames, and disk path.
Network preparation
Allocate a separate IPv4 network for Ceph operation. To do this, assign an additional static IP address from this network to the 10 Gbit/s network interface on every node in the cluster with the following commands:
nmcli con add type ethernet con-name <interface> ipv4.addresses <network/prefix> ipv4.method manual
nmcli con up <interface>
Command details:
- <network/prefix> — IP address with a network prefix. For example, 192.168.100.1/24;
- <interface> — name of the network interface. For example, enp1s0.
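For illustration, this is how the commands might look on the first node using the example values from this article (interface enp1s0, address 192.168.100.1/24); the ifname option is added here explicitly, as some NetworkManager versions require it:
nmcli con add type ethernet con-name enp1s0 ifname enp1s0 ipv4.addresses 192.168.100.1/24 ipv4.method manual
nmcli con up enp1s0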
Ceph configuration
- Install Podman and LVM2 software on all nodes in the cluster:
dnf install podman lvm2 -y
- On the first cluster node:
- Download and install cephadm utility by commands:
CEPH_RELEASE=quincy
curl --silent --remote-name --location https://download.ceph.com/rpm-${CEPH_RELEASE}/el9/noarch/cephadm
chmod +x cephadm
mv cephadm /usr/local/sbin/
- Start the Ceph cluster creation: More about this command in the Ceph documentation.
cephadm bootstrap --mon-ip 192.168.100.1 --skip-monitoring-stack --allow-fqdn-hostname
- Copy the Ceph SSH keys to the remaining nodes:
ssh-copy-id -f -i /etc/ceph/ceph.pub 192.168.100.2
ssh-copy-id -f -i /etc/ceph/ceph.pub 192.168.100.3
- Open the shell of the cephadm utility:
cephadm shell
- Check the status of the Ceph cluster:
ceph -s
Example of output on successful cluster creation:
cluster:
  id:     46f8a2aa-9514-11f0-96a1-5254006b08ce
  health: HEALTH_WARN
          OSD count 0 < osd_pool_default_size 3
services:
  mon: 1 daemons, quorum flax-chloromelanite (age 96s)
  mgr: flax-chloromelanite.lzuwda (active, since 72s)
  osd: 0 osds: 0 up, 0 in
data:
  pools:   0 pools, 0 pgs
  objects: 0 objects, 0 B
  usage:   0 B used, 0 B / 0 B avail
  pgs:
- Add the remaining nodes to the Ceph cluster by commands: More about these commands in the Ceph documentation.
ceph config set mon public_network 192.168.100.0/24
ceph orch host add node2 192.168.100.2
ceph orch host add node3 192.168.100.3
- Add server disks to the storage by commands: More about these commands in the Ceph documentation.
ceph orch daemon add osd node1:/dev/nvme1n1
ceph orch daemon add osd node2:/dev/nvme1n1
ceph orch daemon add osd node3:/dev/nvme1n1
- Configure CephFS: The command will create a volume named vm6, create the necessary pools, and configure the file system. You can specify an arbitrary volume name instead of vm6.
ceph fs volume create vm6
- Wait for Ceph to synchronize across all nodes in the cluster. You can monitor the synchronization status with the command:
ceph -s
If the nodes are successfully synchronized, the output contains the line:
health: HEALTH_OK
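Optionally, you can also confirm that the CephFS volume was created and that all nodes joined the cluster. A quick check from the cephadm shell, assuming the example volume name vm6 and hostnames used in this article:
ceph fs ls
ceph orch host ls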
Replication settings
By default, Ceph creates pools with the parameter replicated size 3. This means that each piece of data (placement group, PG) is stored on three disks (OSDs). This setting protects data from loss if one or two disks fail.
You can reduce the number of replicas. This increases the amount of data that can be placed in the cluster, but also increases the risk of data loss if multiple disks fail at the same time. If you reduce the number of replicas to one, the data will be lost if even a single OSD fails.
To change the number of replicas, run the command:
ceph osd pool set <pool_name> size <replicas_number>
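For example, to list the existing pools, view the current replica count of a pool, and reduce it to two replicas (cephfs.vm6.data below is only an assumed pool name — use a name printed by the first command):
ceph osd pool ls
ceph osd pool get cephfs.vm6.data size
ceph osd pool set cephfs.vm6.data size 2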
Cluster monitoring
Cluster status
To check the cluster status, run the command:
ceph -s
cluster:
id: 46f8a2aa-9514-11f0-96a1-5254006b08ce
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 12h)
mgr: a(active, since 11h)
osd: 12 osds: 12 up, 12 in
data:
pools: 4 pools, 512 pgs
objects: 2.45M objects, 45 TiB
usage: 135 TiB used, 65 TiB avail
pgs: 512 active+clean
io:
client: 1.2 MiB/s rd, 3.4 MiB/s wr, 125 op/s rd, 450 op/s wr
Comments:
- health — cluster health status:
  - HEALTH_OK — the cluster is in working order, no problems detected;
  - HEALTH_WARN — there are problems in cluster operation; a description of the problem follows the status;
  - HEALTH_ERR — serious cluster errors have occurred; diagnose the problem urgently;
- services — Ceph services. Contains information about how many services should be running and how many are currently running;
- data — Ceph data:
  - number of pools and PGs;
  - status of pools and PGs;
  - usage — amount of space occupied. Must be no higher than 90%;
  - messages like XXX/YYY objects misplaced — rebalancing is in progress. Wait until the rebalancing is complete;
- io — data transfer speed for clients and recovery mechanisms:
  - messages like recovery: XX MiB/s, YY objects/s — rebalancing is in progress. Wait until the rebalancing is complete.
Read more about the command output in the Ceph documentation.
OSD filling
To see the status of each OSD, run the command:
ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE PGS STATUS TYPE NAME
-1 8.00000 - 80 TiB 36 TiB 44 TiB 45.00 - root default
-3 2.00000 - 20 TiB 9 TiB 11 TiB 45.00 - host node-1
0 ssd 1.00000 1.00000 10 TiB 4.6 TiB 5.4 TiB 46.00 89 up osd.0
2 ssd 1.00000 1.00000 10 TiB 4.4 TiB 5.6 TiB 44.00 86 up osd.2
-5 2.00000 - 20 TiB 9 TiB 11 TiB 45.00 - host node-2
5 ssd 1.00000 1.00000 10 TiB 4.6 TiB 5.4 TiB 46.00 90 up osd.5
-7 2.00000 - 20 TiB 9 TiB 11 TiB 45.00 - host node-3
9 ssd 1.00000 1.00000 10 TiB 4.7 TiB 5.3 TiB 47.00 90 up osd.9
11 ssd 1.00000 1.00000 10 TiB 4.3 TiB 5.7 TiB 43.00 85 up osd.11
When examining the output of the command, pay attention to the %USE column. If the value exceeds 90%, the cluster becomes inoperable. To recover the cluster, you will need to add disks as new OSDs. See the Ceph and Red Hat documentation for details.
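You can also check the thresholds at which Ceph raises warnings and then blocks writes as OSDs fill up. The ratios are shown by the following command (typical defaults are nearfull 0.85, backfillfull 0.90 and full 0.95):
ceph osd dump | grep ratio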
Metrics monitoring
You can export Ceph metrics in a format compatible with the Prometheus monitoring system (the same exposition format used by the Node Exporter tool). To do this:
- Run the command:
ceph mgr module enable prometheus
- Determine the ID of the Ceph cluster — the fsid parameter in the file /etc/ceph/ceph.conf:
grep fsid /etc/ceph/ceph.conf
Example output:
fsid = d46f1a09-1bc0-423a-9fdb-893b58511cb3
- Add the Ceph cluster nodes to Prometheus as targets. In the configuration, replace:
  - the value of the replacement parameter with the Ceph cluster ID;
  - the IP addresses in the targets parameter with the IP addresses of the Ceph cluster nodes.
Example configuration:
- job_name: ceph
  relabel_configs:
    - replacement: efdba66f-8760-42dc-a497-47e3378347a7
      source_labels:
        - __address__
      target_label: cluster
    - replacement: ceph_cluster
      source_labels:
        - instance
      target_label: instance
  scheme: http
  static_configs:
    - targets:
        - 192.168.100.1:9283
        - 192.168.100.2:9283
        - 192.168.100.3:9283
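After the module is enabled and Prometheus is configured, you can verify that metrics are being exported. For example, assuming the addresses and port 9283 from the example configuration above (the metrics are served by the active mgr node):
ceph mgr services
curl http://192.168.100.1:9283/metrics | head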
The collected metrics can be viewed in the Grafana monitoring system. To do this, import ready-to-use dashboards from the Ceph repository.
Changing the cluster size
Adding nodes
To add a node to the cluster:
- Prepare the server as a VMmanager cluster node.
- Add the server to the VMmanager cluster.
- Add the server to the Ceph cluster (see the example after this list).
- Wait for the rebalancing to complete.
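As an illustration of adding a server to the Ceph cluster, the commands below add a hypothetical fourth node (hostname node4, IP address 192.168.100.4) with the same disk path as in this article. Install Podman and LVM2 on the new node first, then run the commands from the first Ceph node:
# copy the Ceph SSH key to the new node
ssh-copy-id -f -i /etc/ceph/ceph.pub 192.168.100.4
# run the following commands inside the cephadm shell
cephadm shell
ceph orch host add node4 192.168.100.4
ceph orch daemon add osd node4:/dev/nvme1n1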
Deleting nodes
To remove a node from the cluster:
- Transfer VMs, their disks, and images from this node to other nodes.
- Stop and disable the OSD of this node following the instructions in the Ceph documentation.
- Wait for the rebalancing to complete. If you perform the following step before the rebalancing is complete, data in the cluster may be lost.
- Disconnect the node from the Ceph cluster following the instructions in the Ceph documentation.
- Remove the node from the VMmanager cluster.
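After the node has been disconnected from the Ceph cluster, you can confirm that it is no longer listed among the cluster hosts:
ceph orch host ls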