Platform not working after upgrading to version 2023.09.1
Description
When upgrading to version 2023.09.1 with the command
vm update
or installing platform version 2023.09.1, Docker-related errors may occur. For example, recreating containers may fail with the following error:
ERROR: for vm_box Renaming a container with the same name as its current name
ERROR: for vm_box Renaming a container with the same name as its current name
Encountered errors while bringing up the project.
fail
Characteristics of the problem:
- the platform interface is partially or completely inaccessible;
- container operations fail partially or completely;
- the platform installation ends with an error:
Configuring Docker ... done
Checking docker version ... done
Pulling images ... fail
exit status 1
Analyzing the install.log installation log reveals problems with docker or a missing compose module.
Example of the Docker error:
2023/08/16 13:10:24 Running command 'systemctl restart docker'
Job for docker.service failed because the control process exited with error code.
Example of the compose error:
Traceback (most recent call last):
  File "/usr/local/bin/docker-compose", line 5, in <module>
    from compose.cli.main import main
ModuleNotFoundError: No module named 'compose'
This problem is related to Docker and is solved by installing docker-compose v2.
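To tell which of the two failures an install.log contains, you could search it for the two example messages quoted above. The following is a minimal sketch, not part of the platform; the function name classify_install_log is hypothetical, and you pass the path to install.log yourself because its location is not given here. If both messages are present, the compose error is reported first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an install.log by the two example messages
# quoted above. Assumption: the caller supplies the path to install.log.
classify_install_log() {
  local log="$1"
  if [ ! -r "$log" ]; then
    echo "unreadable"
    return 2
  fi
  # Traceback from the Python-based docker-compose v1 entry point
  if grep -qF "No module named 'compose'" "$log"; then
    echo "compose"
  # systemd could not restart the Docker daemon
  elif grep -qF "Job for docker.service failed" "$log"; then
    echo "docker"
  else
    echo "unknown"
  fi
}
```

Usage: call classify_install_log with the path to your install.log; it prints compose, docker, unknown, or unreadable.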
Solution
To solve the problem:
- Connect to the server with the platform via SSH.
- Download docker-compose v2 and make it executable:
curl -SL https://github.com/docker/compose/releases/download/v2.20.3/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
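After the download, a quick check can confirm that the binary at /usr/local/bin/docker-compose is executable and reports a 2.x version. This is a sketch under the assumption that the binary supports `version --short` (Compose v2 prints only the version number with it); the function names is_compose_v2 and check_compose_binary are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: verify that a docker-compose binary is v2.

# Succeeds if the version string (e.g. "2.20.3" or "v2.20.3") is a 2.x release.
is_compose_v2() {
  local ver="${1#v}"   # tolerate an optional leading "v"
  case "$ver" in
    2.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Checks the binary at the given path (default: the path used by curl above).
check_compose_binary() {
  local bin="${1:-/usr/local/bin/docker-compose}"
  if [ ! -x "$bin" ]; then
    echo "missing or not executable: $bin"
    return 1
  fi
  local ver
  # "version --short" prints only the version number
  if ! ver="$("$bin" version --short 2>/dev/null)"; then
    echo "failed to run: $bin"
    return 1
  fi
  if is_compose_v2 "$ver"; then
    echo "ok: docker-compose $ver"
  else
    echo "not v2: docker-compose $ver"
    return 1
  fi
}
```

Running check_compose_binary without arguments checks the default path; a non-zero exit status means the installation step should be repeated.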
Failed to install the gomon service on a new node
Description
The node connection is not working correctly. After connecting a new node, the following incorrect behavior is possible:
- false error messages, for example "Error #5344. Insufficient RAM on the node" even though the node's RAM meets the requirements;
- libvirt and qemu version information is not updated;
- VMs cannot be created on this node;
- VM migration to this node is unavailable;
- other problems on the new node.
This happens because the gomon service failed to install when the new node was connected. Gomon is a service that runs on the node and is responsible for statistics and monitoring.
To check the status of the gomon service, run the following command on the node:
systemctl status gomon
If the service is not installed, the output is as follows:
Unit gomon.service could not be found.
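Instead of reading the status text, a script on the node could use the exit code of `systemctl status gomon` to distinguish a missing unit from a stopped one. This sketch assumes systemctl follows the exit codes documented in systemctl(1) (0 for an active unit, 3 for an inactive unit, 4 for no such unit); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report the gomon state from the exit code of
# "systemctl status gomon". Assumption: 0 = active, 3 = not active,
# 4 = no such unit, as documented in systemctl(1).
gomon_state_from_code() {
  case "$1" in
    0) echo "running" ;;
    3) echo "installed-but-not-running" ;;
    4) echo "not-installed" ;;
    *) echo "unknown" ;;
  esac
}

gomon_state() {
  systemctl status gomon >/dev/null 2>&1
  gomon_state_from_code "$?"
}
```

On a node where gomon failed to install, gomon_state is expected to print not-installed.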
Solution
To solve the problem:
- Connect to the server with the platform via SSH.
- Restart the nodewatch container:
docker stop nodewatch
docker start nodewatch
After restarting, wait approximately 10 minutes. During this time, the gomon service will be installed on the node.
- Connect to the node server via SSH.
- Check the status of the gomon service:
systemctl status gomon
- Restart the gomon service:
systemctl restart gomon
If you check the node before 10 minutes have passed since nodewatch was restarted, the results may be inaccurate.
The new node will then be connected without errors.
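Rather than waiting a fixed 10 minutes, you could poll on the node until the gomon unit appears and then restart it. This is a sketch, not a platform tool: wait_until and wait_and_restart_gomon are hypothetical names, the 900-second timeout and 30-second interval are arbitrary choices that leave some margin over the suggested wait, and the check relies on `systemctl cat` failing while the unit file does not exist. Run it on the node after restarting nodewatch on the platform server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or the timeout expires.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS command [args...]
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0
  while ! "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Hypothetical helper: wait for gomon.service to be installed, then restart it.
# Timeout (default 900 s) and the 30 s polling interval are assumptions.
wait_and_restart_gomon() {
  local timeout="${1:-900}"
  if wait_until "$timeout" 30 systemctl cat gomon.service; then
    systemctl restart gomon && systemctl status gomon --no-pager
  else
    echo "gomon.service still not found after ${timeout}s" >&2
    return 1
  fi
}
```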
Error #5349 No connection to node
Description
This error usually occurs when there is no connection to a node. However, in version 2023.07.1 the false error message #5349 appears when a user performs any action on a VM that is not connected to a node, for example when changing the VM configuration or starting the VM.
Solution
To resolve the error:
- Connect to the server with the platform via SSH.
- Restart the monitor service with the command:
docker exec -it vm_box supervisorctl restart monitor
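To confirm that the restart took effect, you could read the supervisor state of monitor afterwards. This sketch assumes `supervisorctl status monitor` prints a line of the form "name STATE description"; the function names are hypothetical, and `-it` is omitted because a script usually has no terminal attached (with `-t`, docker exec expects one).

```shell
#!/usr/bin/env bash
# Hypothetical helpers: restart monitor in the vm_box container and report
# its supervisor state.

# Print the state column (RUNNING, STARTING, FATAL, ...) from the first line
# of "supervisorctl status <name>" output, for example:
#   monitor   RUNNING   pid 1234, uptime 0:00:05
supervisor_state() {
  awk 'NR == 1 { print $2 }'
}

restart_monitor() {
  local container="${1:-vm_box}"
  docker exec "$container" supervisorctl restart monitor || return 1
  local state
  state="$(docker exec "$container" supervisorctl status monitor | supervisor_state)"
  echo "monitor state: ${state:-unknown}"
  [ "$state" = "RUNNING" ]
}
```

restart_monitor exits with status 0 only if monitor reports RUNNING after the restart.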