Balancer

Balancer is a service that enables automatic cluster node load balancing by redistributing virtual machines (VMs) among nodes. The VM allocation procedure is performed automatically at an interval specified in the balancer settings.

Restrictions

The balancer function is only available:

in the VMmanager Infrastructure version;
in clusters with KVM virtualization type;
in clusters with Switching and IP-fabric network configuration type.

Logic of operation

Balancer settings are applied at the cluster level. Different balancer settings can be specified for different clusters.

When the balancer is enabled, the interval at which it checks nodes for overload is set. By default, a node is considered overloaded if its CPU or RAM use is above 70%. If necessary, the CPU and RAM use thresholds can be changed via an API request.
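
The overload rule can be illustrated with a small shell helper. This is only a sketch of the rule as described above, not the balancer's own code: the function name `is_overloaded` and its argument order are hypothetical. It treats a node as overloaded when either value is strictly above its threshold, with 70 used when no threshold is passed.

```shell
#!/bin/sh
# is_overloaded CPU RAM [CPU_MAX] [RAM_MAX]
# Hypothetical helper: succeeds (exit code 0) when the average CPU or RAM
# use, in percent, is above its threshold. Thresholds default to 70.
is_overloaded() {
  awk -v cpu="$1" -v ram="$2" -v c="${3:-70}" -v r="${4:-70}" \
    'BEGIN { exit !(cpu > c || ram > r) }'
}
```

For example, `is_overloaded 71 10` succeeds, while `is_overloaded 70 70` fails because neither value is above 70.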

By default, the balancer covers all VMs in the cluster. You can select VMs to which the balancer actions should not be applied.

The balancer migrates VMs between nodes using a live migration mechanism. The conditions under which live migration cannot be performed are specified in the article VM migration. In addition, the following restrictions apply:

VMs cannot be migrated between clusters;
VMs cannot be migrated to nodes in maintenance mode;
the option to reassign network interfaces when they do not match is unavailable.

The operation of the balancer consists of iterations. In each iteration, the balancer:

Requests information from the statistics service on the average CPU and RAM use for the set balance check period. For example, if the administrator has set the check interval to 10 minutes, the balancer will request data for the last 10 minutes.
If the number of nodes in the cluster is large, it is possible that the statistics service will not have time to collect data from all nodes during the balance check period. In this case, the balancer will make decisions based on partial data. The list of nodes whose statistics were not taken into account is recorded in the service log.
Compiles a list of overloaded nodes based on statistics. The higher the average CPU and RAM use on a node, the more overloaded it is considered to be. If there are no overloaded nodes, no VM migration is performed in the current iteration.
Sorts the list in order from the most loaded node to the least loaded node.
Compiles a list of VMs that can be migrated from the first node in the list. Excluded from the candidates for migration are:
- VMs that are turned off;
- VMs excluded from balancer actions;
- VMs with a mounted ISO image;
- VMs that have snapshots;
- VMs that were created or moved to the node during the last five iterations.
Sorts the VM list from the least loaded VM to the most loaded VM. If no VM suitable for migration is found, the search is repeated on the next node in the list. If no suitable VM is found on any node, no migration is performed.
Selects the first VM in the list to migrate.
Determines the node to which the VM can be migrated. The selection of a node takes into account:
- whether it is allowed to create VMs on this node;
- the possibility of live migration of VMs;
- whether the node will become overloaded after the VM migration.
Performs VM migration to the selected node. If no suitable node is found, the next VM in the list is selected for migration. The node selection procedure is repeated for this VM. If no suitable node is found for any VM, the migration will not be performed.
Waits for the VM migration to complete and records its result in the history table.
Schedules the next iteration to run after a set balance check period.

In the current implementation, the balancer can migrate at most one VM per iteration.
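
The selection logic of one iteration can be sketched in shell. This is an illustration of the steps above under stated assumptions, not the balancer's actual implementation. The data layout, the sample values and all function names are hypothetical. Ranking nodes and VMs by the sum of CPU and RAM use, and choosing the least loaded suitable node as the target, are assumptions, since the article does not define these rules. VM load is expressed as a share of node resources, and the live migration checks are not modeled.

```shell
#!/bin/sh
# Sketch of one balancer iteration. Data layout, ranking rule and sample
# values are assumptions made for illustration.
CPU_MAX=70   # CPU threshold, %
RAM_MAX=70   # RAM threshold, %

# Columns: name cpu% ram% accepts_new_vms(yes|no)
NODES='node1 85 60 yes
node2 40 30 yes
node3 75 80 yes
node4 20 20 no'

# Columns: name node cpu% ram% state balanced iso snapshots iterations_on_node
VMS='vm1 node3 10 10 off yes no no 9
vm2 node3 5 5 on no no no 9
vm3 node3 5 5 on yes yes no 9
vm4 node3 5 5 on yes no yes 9
vm5 node3 5 5 on yes no no 2
vm6 node1 50 20 on yes no no 9
vm7 node1 10 10 on yes no no 9'

# Overloaded node names, most loaded first (rank = cpu + ram, an assumption).
overloaded_nodes() {
  printf '%s\n' "$NODES" |
    awk -v c="$CPU_MAX" -v r="$RAM_MAX" '$2 > c || $3 > r { print $2 + $3, $1 }' |
    sort -k1,1nr | awk '{ print $2 }'
}

# Migration candidates on node $1 as "name cpu ram", least loaded first.
# A VM must have spent more than five iterations on the node (assumed reading).
candidate_vms() {
  printf '%s\n' "$VMS" |
    awk -v n="$1" '$2 == n && $5 == "on" && $6 == "yes" && $7 == "no" &&
                   $8 == "no" && $9 > 5 { print $3 + $4, $1, $3, $4 }' |
    sort -k1,1n | awk '{ print $2, $3, $4 }'
}

# Least loaded node other than source $1 that accepts VMs and stays within
# both thresholds after receiving a VM with CPU $2 and RAM $3.
target_node() {
  printf '%s\n' "$NODES" |
    awk -v src="$1" -v vc="$2" -v vr="$3" -v c="$CPU_MAX" -v r="$RAM_MAX" \
      '$1 != src && $4 == "yes" && $2 + vc <= c && $3 + vr <= r { print $2 + $3, $1 }' |
    sort -k1,1n | awk 'NR == 1 { print $2 }'
}

# Print "vm source target" for the single migration of this iteration,
# or return 1 if nothing can be moved.
plan_migration() {
  for node in $(overloaded_nodes); do
    candidates=$(candidate_vms "$node")
    [ -n "$candidates" ] || continue   # nothing movable here: try the next node
    while read -r vm cpu ram; do
      target=$(target_node "$node" "$cpu" "$ram")
      if [ -n "$target" ]; then
        echo "$vm $node $target"
        return 0
      fi
    done <<EOF
$candidates
EOF
    return 1   # candidates exist, but no node can take any of them
  done
  return 1
}

plan_migration   # prints: vm7 node1 node2
```

In the sample data, node3 is the most loaded node, but every VM on it is excluded, so the search moves to node1. There vm7 is the least loaded candidate, and node2 is the only node that accepts VMs and stays within both thresholds after the move.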

Managing balancer

For the balancer to operate correctly, enable CPU and RAM data collection from nodes and VMs in the statistics collection settings. Read more in Managing statistics.

To manage the balancer in a cluster, go to Clusters → select a cluster → click Parameters → Balancer.

Enabling and disabling balancer

To enable the balancer:

Click Enable the balancer.
Specify the Balance check interval in minutes.

A cluster with the balancer enabled is marked with an icon in the cluster table.

To disable the balancer in the cluster, click Disable the balancer.

To disable the balancer for a specific VM, go to Virtual machines → select the VM → click Parameters → Balancer and turn off the Apply Balancer to this VM toggle switch. VMs with the balancer disabled are marked with an icon in the VM table.

Section interface

Editing settings

To edit the CPU and RAM thresholds above which a node is considered overloaded:

Connect to the server with the platform via SSH. For more information about connecting via SSH, see Workstation setup.
If curl and jq utilities are not installed on the server, install them:
```
dnf install curl jq || sudo apt install curl jq
```
Get the authorization token:
```
curl -k -X POST -H "accept: application/json" -H "Content-Type: application/json" 'https://domain.com/auth/v4/public/token' -d '{"email": "admin_email", "password": "admin_pass"}'
```
Comments to the command

domain.com — domain name or IP address of the server with the platform
admin_email — platform administrator's email
admin_pass — platform administrator's password

The response has the following form:
Example of response in JSON
```
{
  "confirmed": true,
  "expires_at": null,
  "id": "6",
  "token": "4-e9726dd9-61d9-2940-add3-914851d2cb8a"
}
```
Save the received token value.
Execute the API request:
```
curl -k -H "x-xsrf-token: <token>" -X POST "https://domain.com/vm/v3/cluster/<cluster_id>" -d '{"balancer_config": {"high_threshold_cpu": <cpu_value>, "high_threshold_mem": <ram_value>}}'
```
Comments to the command

<token> — authorization token
domain.com — domain name or IP address of the server with the platform
<cluster_id> — cluster id. To get the cluster id, open the Clusters section in the platform interface. The value is displayed in the id column
<cpu_value> — CPU use threshold, %
<ram_value> — RAM use threshold, %
Check that the settings are applied:
```
curl -k -H "x-xsrf-token: <token>" -X GET "https://domain.com/vm/v3/cluster/<cluster_id>" | jq '.balancer_config'
```
Comments to the command

<token> — authorization token
domain.com — domain name or IP address of the server with the platform
<cluster_id> — cluster id. To get the cluster id, open the Clusters section in the platform interface. The value is displayed in the id column
Example of response
```
{
  "high_threshold_cpu": 90,
  "high_threshold_mem": 90,
  "period_minutes": 60
}
```
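
When the thresholds are changed from a script, the request body can be built and checked before it is sent. The helper below is a minimal sketch: the function name `balancer_payload` is hypothetical, and the check that each value is a whole number from 1 to 100 is an assumption of this sketch, not a documented API rule. The commented usage lines show how the token might be saved in a shell variable with jq instead of being copied by hand.

```shell
#!/bin/sh
# balancer_payload CPU RAM
# Build the JSON body for the threshold update request.
# The 1-100 integer check is an assumption of this sketch, not an API rule.
balancer_payload() {
  for v in "$1" "$2"; do
    case $v in
      ''|0*|*[!0-9]*)
        echo "threshold must be a positive integer: '$v'" >&2
        return 1 ;;
    esac
    if [ "$v" -gt 100 ]; then
      echo "threshold must not exceed 100: $v" >&2
      return 1
    fi
  done
  printf '{"balancer_config": {"high_threshold_cpu": %s, "high_threshold_mem": %s}}\n' "$1" "$2"
}

# Usage (not executed here; host, credentials and cluster id are placeholders):
#   token=$(curl -k -s -X POST -H "accept: application/json" \
#     -H "Content-Type: application/json" \
#     'https://domain.com/auth/v4/public/token' \
#     -d '{"email": "admin_email", "password": "admin_pass"}' | jq -r '.token')
#   body=$(balancer_payload 80 85) &&
#     curl -k -H "x-xsrf-token: $token" -X POST \
#       "https://domain.com/vm/v3/cluster/<cluster_id>" -d "$body"
```

Because the request is sent only when `balancer_payload` succeeds, a mistyped value is caught before it reaches the platform.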

Monitoring

The Monitoring tab displays CPU and RAM use scales on the cluster nodes and a balance indicator. The closer the indicator is to the center of the scale, the better balanced the cluster is by CPU or RAM use.

If the indicator is:

in the left part of the scale, the cluster tends to be underloaded for this resource;
in the right part of the scale, the cluster tends to be overloaded for this resource.

How far the indicator deviates from the center depends on the number of overloaded nodes in the cluster. If the load on every node in the cluster is within the thresholds, the deviation is minimal.

Example of cluster overload

Viewing history

The History tab displays the balancer's actions to migrate VMs between cluster nodes.

Log files

The balancer actions are performed by the balancer service in the balancer container. The service log file is /var/log/balancer.log.

Logs of related services can be useful for identifying problems with the balancer:

Service	Location of logs	Purpose
vm_reader	/var/log/vm_reader.log file in the vm_box container*	obtaining configuration data on clusters, nodes and VMs
statistic	statistic container	getting statistical data
notifier	notifier container	receiving notifications of cluster configuration updates; sending notifications to other services
carbonapi	carbonapi container	intermediate service for obtaining node and VM statistics from the clickhouse service
clickhouse	clickhouse container	storage of collected statistics

* There can be more than one vm_reader.log file. In this case the directory will contain vm_reader.log.N files, e.g. vm_reader.log.1, vm_reader.log.2, etc.
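
The log files named above can be read from the platform server with commands like the following. This sketch assumes that the platform containers are managed by Docker and that the container names match the service names in the table. The article states neither, so adjust the commands to the actual environment. The function names are hypothetical.

```shell
#!/bin/sh
# Assumption: containers run under Docker and are named as in the table.

# Print the last N lines (default 100) of the balancer service log.
balancer_log_tail() {
  docker exec balancer tail -n "${1:-100}" /var/log/balancer.log
}

# List the vm_reader log files, including rotated vm_reader.log.N files.
vm_reader_logs() {
  docker exec vm_box sh -c 'ls -1 /var/log/vm_reader.log*'
}
```

For example, `balancer_log_tail 50` prints the last 50 lines of /var/log/balancer.log.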

Useful tips