This article lists commands that help identify the causes of incorrect platform operation, as well as commands for restarting containers and individual services to restore normal operation.
General diagnostics
This section lists commands to run as a first diagnostic step. They help rule out basic problems and reduce troubleshooting time.
Operating system (OS) version
If the OS of the master server or of a cluster node is not supported by the platform, the installation or the node connection will fail with an error. To determine the OS version, run the command:
cat /etc/*release
For the list of supported operating systems, see the VMmanager documentation:
- for the master server - in the Server requirements article;
- for the cluster node - in the Server requirements for the cluster article.
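If you only need the distribution ID and version, for example to compare them with the list of supported operating systems, you can read /etc/os-release, one of the files the command above prints. This is a minimal sketch, assuming the file exists (it does on current systemd-based distributions); the function name os_version is hypothetical:

```shell
# Hypothetical helper: print "<ID> <VERSION_ID>" from an os-release file
# (default: /etc/os-release). The file is sourced in a subshell so that
# its variables do not leak into the current shell.
os_version() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  ( . "$file"; echo "${ID:-unknown} ${VERSION_ID:-unknown}" )
}
```

On an AlmaLinux server, os_version prints something like almalinux 8.9.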
Server date and time
The platform checks the server date and time during periodic synchronization with the license server. If the date or time on the platform server is set incorrectly, the platform may be blocked or work incorrectly. To display the current date and time on the server, run the command:
date -R
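On systemd-based servers, timedatectl also reports whether the clock is synchronized over NTP. The sketch below checks that line in its output; it assumes the "System clock synchronized: yes" wording of recent systemd versions and also accepts the older "NTP synchronized: yes". The function name clock_synced is hypothetical:

```shell
# Hypothetical helper: return 0 if timedatectl output (read from stdin)
# reports a synchronized clock. Accepts both the newer
# "System clock synchronized" and the older "NTP synchronized" wording.
clock_synced() {
  grep -Eq '(System clock|NTP) synchronized: yes'
}

# Usage on a server:
#   timedatectl status | clock_synced && echo "clock is synced" || echo "clock is NOT synced"
```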
Disk space and RAM usage
For the platform to work correctly, free disk space and RAM must meet the requirements in the Server requirements article of the VMmanager documentation. If there is not enough free disk space or RAM, virtual machines and backups also cannot be created. To check disk space usage and file system types, run the command:
df -hT
To check RAM usage, run the command:
free -h
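The output of free -h is meant for reading. To check available memory in a script, you can parse free -m instead: in the procps-ng layout, the seventh field of the Mem: line ("available") estimates the memory available for new processes. A minimal sketch under that layout assumption; mem_available is a hypothetical name:

```shell
# Hypothetical helper: print available RAM in MiB and as a percentage of
# total, reading `free -m` output from stdin. Assumes the procps-ng layout:
#   Mem: total used free shared buff/cache available
mem_available() {
  awk '/^Mem:/ { printf "%d MiB available (%.1f%% of %d MiB)\n", $7, 100 * $7 / $2, $2 }'
}

# Usage: free -m | mem_available
```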
Inodes
An inode is a data structure that stores file metadata. The platform will not work correctly if the server runs out of inodes, even if there is free disk space. Typical symptoms of an inode shortage include reduced performance, failure to create files, and incorrect information in the platform interface. To check the number and percentage of inodes used in each file system, run the command:
df -i
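On servers with many file systems, it can help to list only those close to running out of inodes. A minimal sketch: it reads df -iP output (-P keeps each file system on one line) and prints mount points whose IUse% is at or above a threshold; the 90% default and the name inode_hotspots are assumptions, and mount points containing spaces are not handled:

```shell
# Hypothetical helper: list file systems whose inode usage (IUse%) is at or
# above a threshold in percent (default 90). Reads `df -iP` output from
# stdin; entries that report "-" instead of a percentage are skipped.
inode_hotspots() {
  local threshold="${1:-90}"
  awk -v t="$threshold" 'NR > 1 && $5 ~ /^[0-9]+%$/ {
    use = $5; sub(/%/, "", use)
    if (use + 0 >= t) print $6 " " $5
  }'
}

# Usage: df -iP | inode_hotspots 80
```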
Troubleshooting Linux systems
To troubleshoot a Linux system, examine its logs. Below are tools for finding errors in the server logs. For more information about the product logs, see the Platform Logs article in the VMmanager documentation.
Kernel ring buffer
One way to check whether the system is working incorrectly is to read the kernel log with the dmesg utility. While the system boots and runs, the kernel records events in a ring (circular) buffer. dmesg lets you examine these kernel messages and identify hardware-related problems. To search for problems, run the command:
dmesg | grep -i -E 'error|failed|critical|bug|panic'
The journalctl utility
The journalctl utility displays the logs of Linux system services and helps detect system problems. To search the logs for signs of abnormal behavior, run the command:
journalctl | grep -i -E 'error|failed|critical|bug|panic'
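Both grep commands above can return many lines. To see which keywords dominate before reading the full output, you can count the matches per keyword. A minimal sketch using the same keyword list as above; summarize_problems is a hypothetical name, and like the original grep it also counts substrings (for example, "bug" inside "debug"):

```shell
# Hypothetical helper: count occurrences of each problem keyword in log
# text read from stdin. Matching is case-insensitive; results are printed
# as "<count> <keyword>", most frequent first.
summarize_problems() {
  grep -ioE 'error|failed|critical|bug|panic' \
    | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}

# Usage:
#   dmesg | summarize_problems
#   journalctl -b | summarize_problems   # -b: current boot only
```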
CPU
CPU architecture
Reduced performance of a platform node may be caused by the characteristics of its CPU. Information about the CPU architecture is also useful when diagnosing problems with fine-tuning virtual machines. To display it, run the command:
lscpu
CPU count on nodes
VMmanager 6 licensing counts only physical cores. To specify the exact number of cores when ordering a license, count the cores on each node with the command:
dmidecode --type processor | grep -i "core count" | grep -Eo "[0-9]+"
The command prints one value per physical processor (socket); on multi-socket nodes, add the values up.
You also need the core count if platform management is blocked with the "CPU cores number on node exceeds license limit" error. This error occurs when the number of physical cores on a connected node exceeds the license limit. To check whether the limit is exceeded, compare the output of the dmidecode command with the VMmanager license settings.
System load
High system load reduces node performance. To assess the load, compare the Load Average values with the number of physical cores obtained with the CPU count command above. To display Load Average, run the command:
uptime
uptime prints three Load Average values, averaged over the last 1, 5, and 15 minutes. These values should be less than the number of physical cores obtained with the CPU count command.
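This comparison can also be scripted. A minimal sketch: it reads the 1-minute value from /proc/loadavg (on Linux, uptime shows the same figures) and compares it with a core count you pass in, for example the result of the dmidecode command above. The name load_check is hypothetical:

```shell
# Hypothetical helper: compare the 1-minute load average with the number of
# physical cores.
#   $1: number of physical cores
#   $2: optional path to a loadavg file (default: /proc/loadavg)
# Returns 1 if the load is not below the core count.
load_check() {
  local cores="$1" file="${2:-/proc/loadavg}"
  awk -v c="$cores" '{
    if ($1 + 0 < c + 0) { print "OK: load " $1 " < " c " cores"; exit 0 }
    print "HIGH: load " $1 " >= " c " cores"; exit 1
  }' "$file"
}

# Usage: load_check 16
```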
Virtual machines (VMs)
VM status
You can use the virsh utility to display the status of all virtual machines for troubleshooting.
To execute virsh commands, first connect to the node:
docker exec --tty --interactive vm_box ssh -i /opt/ispsystem/vm/etc/.ssh/vmmgr.1 <IP_address> -p 22
To display the status of all VMs, run the command:
virsh list --all
To display the status of a particular VM, run the command:
virsh list --all | grep <VM_name>
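To see at a glance which VMs on the node are not running, you can filter the virsh list --all output. A minimal sketch, assuming the usual layout (a header line, a separator line, then Id, Name and State columns) and VM names without spaces; the name vms_not_running is hypothetical:

```shell
# Hypothetical helper: print VMs whose state is not "running", reading
# `virsh list --all` output from stdin. The header and separator lines are
# skipped; states may contain spaces ("shut off"), so all fields after the
# name are joined.
vms_not_running() {
  awk 'NR > 2 && NF >= 3 {
    state = $3
    for (i = 4; i <= NF; i++) state = state " " $i
    if (state != "running") print $2 ": " state
  }'
}

# Usage (on the node): virsh list --all | vms_not_running
```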
The libvirt virtualization daemon
libvirt is a toolkit for managing virtualization. If the libvirt daemon (libvirtd) is not running, the platform will not work correctly. To check the status of the service, run the command:
systemctl status libvirtd
If the service is stopped or inactive, start it:
systemctl start libvirtd
If libvirt is not installed, the output of the systemctl status libvirtd command contains the message:
Unit libvirtd.service could not be found
In this case:
- Install libvirt manually with the command for your OS:
  - for RHEL-based operating systems (CentOS, AlmaLinux):
    yum install libvirt
  - for Debian-based operating systems (Ubuntu, Astra Linux):
    apt install libvirt-daemon-system
- Start the service:
  systemctl start libvirtd
- Add libvirtd to autostart:
  systemctl enable libvirtd
- Re-check the status of the service to make sure that it is running.
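The start, enable and re-check steps above can be combined into one function. A minimal sketch; the name ensure_service is hypothetical, and it does not install packages, so it fails (returns non-zero) when the unit does not exist. The same function works for the docker service described below:

```shell
# Hypothetical helper: make sure a systemd service is running and enabled
# at boot, then re-check its state. Returns non-zero if the service cannot
# be started or enabled (for example, when the unit is not installed).
ensure_service() {
  local svc="$1"
  if ! systemctl is-active --quiet "$svc"; then
    systemctl start "$svc" || return 1
  fi
  systemctl enable "$svc" >/dev/null 2>&1 || return 1
  systemctl is-active --quiet "$svc" && echo "$svc is active"
}

# Usage: ensure_service libvirtd
```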
Containers
The docker service
The Docker daemon is a service that manages containers and other Docker objects: networks, volumes, and images. If this service is not running, the platform will not work. To check the status of the service, run the command:
systemctl status docker
If the service is stopped, start it with the command:
systemctl start docker
To check the version of docker, run the command:
docker version
Restarting the docker service
If the docker service is not working properly, restarting it often fixes the problem. To restart the service, run the command:
systemctl restart docker.service
Restarting the service helps fix a number of errors that can occur when starting, restarting, or shutting down the platform:
- error while removing network: network <network_name> has active endpoints
  Error example: error while removing network: network vm_vm_box_net id 88888ggggg has active endpoints exit status 1
- ERROR: for <service_name> Cannot start service <service_name>: endpoint with name <container_name> already exists in network <network_name>
  Error example: ERROR: for auth_back Cannot start service auth_back: endpoint with name vm_auth_back_1 already exists in network vm_vm_box_net
  In this example, the vm_auth_back_1 container failed to start.
- ERROR: for input Cannot start service input: driver failed programming external connectivity on endpoint vm_input_1
To correct the above errors, restart the docker service with the command above.
If the problem could not be resolved, contact technical support through your client area under Support → Support tickets → Add.
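If you keep the output of a failed platform start or stop, you can check it for the Docker network errors listed above. A minimal sketch; the name docker_network_error and the log file name in the usage line are hypothetical, and the patterns are taken from the error texts above:

```shell
# Hypothetical helper: return 0 if the text on stdin contains one of the
# Docker network errors listed above (typically fixed by restarting the
# docker service).
docker_network_error() {
  grep -Eq 'has active endpoints|already exists in network|driver failed programming external connectivity'
}

# Usage (hypothetical log file):
#   docker_network_error < start.log && echo "try: systemctl restart docker.service"
```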
Status of containers
To diagnose possible problems, display a list of containers and their statuses. To display a list of all running containers, run the command:
docker ps
To get a list of all containers, including stopped ones, run the command:
docker ps -a
To check the status of a specific container, including a stopped one, run the command:
docker ps -a | grep <container_name>
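To list only the containers that are not up, you can ask docker ps for the name and status of each container with a Go-template --format string and filter the result. A minimal sketch; the "|" separator and the name containers_not_up are assumptions:

```shell
# Hypothetical helper: list containers whose status does not start with
# "Up" (exited, restarting, created, ...). Reads "<name>|<status>" lines
# from stdin, as produced by:
#   docker ps -a --format '{{.Names}}|{{.Status}}'
containers_not_up() {
  awk -F'|' '$2 !~ /^Up/ { print $1 ": " $2 }'
}

# Usage: docker ps -a --format '{{.Names}}|{{.Status}}' | containers_not_up
```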
Restarting the container
If the container does not work correctly, restarting it may help. To do this, run the command:
docker restart <container_name>
Restarting taskmgr
If the task manager does not work correctly (for example, tasks hang), restarting the taskmgr service in the vm_box container may help. To do this, run the command:
docker exec -it vm_box supervisorctl restart taskmgr
Restarting monitor
If no statistics are displayed for the nodes, you may need to restart the monitoring service. To do this, run the command:
docker exec -it vm_box supervisorctl restart monitor
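Before restarting, you can check which services supervisord reports as stopped or failed in vm_box. A minimal sketch that filters supervisorctl status output; it assumes the usual "<name> <STATE> ..." line layout, and the name not_running_programs is hypothetical. -it is omitted in the usage line because the output is piped rather than shown in a terminal:

```shell
# Hypothetical helper: print supervisord programs that are not RUNNING,
# reading `supervisorctl status` output from stdin. Lines look like:
#   taskmgr    RUNNING   pid 123, uptime 1:02:03
not_running_programs() {
  awk 'NF >= 2 && $2 != "RUNNING" { print $1 " " $2 }'
}

# Usage: docker exec vm_box supervisorctl status | not_running_programs
```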
Logging
To analyze a container's events, examine its log. To display the last 100 lines of the container log, run the command:
docker logs --tail 100 <container_name>
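To show only error-like lines from the log, you can filter it with grep. Note that docker logs writes the container's stderr to the host's stderr, so 2>&1 is needed for grep to see both streams. A minimal sketch; the name container_errors and the keyword list are assumptions:

```shell
# Hypothetical helper: show error-like lines from the last N lines of a
# container log.
#   $1: container name
#   $2: number of lines (default 100)
container_errors() {
  docker logs --tail "${2:-100}" "$1" 2>&1 | grep -iE 'error|fail|panic'
}

# Usage: container_errors vm_box 500
```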
Firewall
If the firewall has no rules for the docker service, the platform and the network may not work correctly. The necessary rules are created automatically when the docker service starts; we do not recommend modifying or deleting them manually.
Service status and configuration
Depending on the OS, the firewall is managed by nftables or firewalld. To check the status of the firewall service, run the corresponding command:
systemctl status nftables
systemctl status firewalld
To display the firewall configuration, run the corresponding command:
nft list ruleset
firewall-cmd --list-ports
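The checks can be scripted: find out which of the two services is active, and check whether the nftables ruleset contains Docker's chains. A minimal sketch; it assumes Docker's rules appear in chains whose names start with DOCKER (as they do when Docker programs iptables through the nft backend), which may differ between Docker versions. The function names are hypothetical:

```shell
# Hypothetical helper: print which firewall service is active.
active_firewall() {
  local svc
  for svc in nftables firewalld; do
    if systemctl is-active --quiet "$svc"; then echo "$svc"; return 0; fi
  done
  echo "none"
  return 1
}

# Hypothetical helper: return 0 if an nftables ruleset (read from stdin)
# contains a chain whose name starts with DOCKER (assumption, see above).
has_docker_chain() {
  grep -q 'chain DOCKER'
}

# Usage: nft list ruleset | has_docker_chain || echo "Docker rules are missing"
```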
Restarting the service
Restart the firewall service if it does not work correctly or if you need to restore the default rules. Run the command for the firewall service used on your server:
systemctl restart firewalld.service
systemctl restart nftables.service
To restore the default rules:
- Restart the firewall service with one of the commands above.
- Restart docker with the command:
  systemctl restart docker.service
- Restart the platform with the command from the Restarting the platform section of this article.
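The first two steps above can be combined into one function that restarts whichever firewall service is active and then restarts docker so that it recreates its rules. A minimal sketch; the name restore_default_rules is hypothetical, and restarting the platform (the last step) is not included:

```shell
# Hypothetical helper: restart the active firewall service (firewalld or
# nftables), then restart docker so that it recreates its firewall rules.
# If neither firewall service is active, only docker is restarted.
restore_default_rules() {
  local svc
  for svc in firewalld nftables; do
    if systemctl is-active --quiet "$svc"; then
      systemctl restart "$svc.service" || return 1
      break
    fi
  done
  systemctl restart docker.service
}
```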
Searching for information in the database
Changing the database manually carries risks: it can disrupt the correct operation of the platform. We do not recommend editing the database manually.
Any actions with the database should be performed only after backing up the platform.
Using queries to the database, you can see information about the state of VMs, nodes and other platform objects. Below is a list of queries for retrieving data from the database.
To run queries, connect to the MySQL container:
docker exec -it mysql bash -c "mysql isp -p\$MYSQL_ROOT_PASSWORD"
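For a single query, you can also run the mysql client non-interactively. A minimal sketch: the statement is passed on stdin (so no nested quoting is needed), -i keeps stdin attached to the container, and the single quotes leave $MYSQL_ROOT_PASSWORD to be expanded inside the container, as in the command above. The name isp_query is hypothetical:

```shell
# Hypothetical helper: run one SQL statement against the isp database in
# the mysql container and print the result (tab-separated in batch mode).
isp_query() {
  printf '%s\n' "$1" | docker exec -i mysql bash -c 'mysql isp -p"$MYSQL_ROOT_PASSWORD"'
}

# Usage: isp_query 'select id,name,ip_addr,ssh_port from vm_node;'
```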
VM info
The VM record contains all of its status parameters, its internal name, and the node it runs on.
To get information about the virtual machine, run the query:
select * from vm_host where id=<id_vm>\G
To display the node and the internal name of the VM, run the query:
select id,internal_name,node from vm_host where id=<id_vm>\G
Information about the node
The query below returns the ID, name, IP address, and SSH port of the node.
To check the information about the node, run the query:
select id,name,ip_addr,ssh_port from vm_node where id=<id_node>;
To check the network on the node, run the query:
select * from vm_node_interfaces where node=<id_node>\G
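The VM and node queries above can be combined into one. The sketch assumes that vm_host.node holds the ID of a row in vm_node (the column names are those used in the queries above). It builds the query in a shell function that accepts only a numeric VM ID; the name vm_location_sql is hypothetical:

```shell
# Hypothetical helper: print a query that returns a VM's internal name
# together with the name, IP address and SSH port of its node.
# Assumption: vm_host.node references vm_node.id.
vm_location_sql() {
  local id="$1"
  case "$id" in
    ''|*[!0-9]*) echo "VM id must be a number" >&2; return 1 ;;
  esac
  printf 'select h.id, h.internal_name, n.name, n.ip_addr, n.ssh_port from vm_host h join vm_node n on n.id = h.node where h.id = %s;\n' "$id"
}

# Usage:
#   vm_location_sql 42 | docker exec -i mysql bash -c 'mysql isp -p"$MYSQL_ROOT_PASSWORD"'
```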
VM virtual disks
Information about a virtual disk helps diagnose problems related to it, for example, the virt-resize: error. This error can occur if the expand_part value (the partition to expand) or the virtual disk size stored in the database is incorrect.
To view full information about the disk, run the query:
select * from vm_disk where name = 'example_name' \G
To check the actual disk size, run the command on the node, specifying the internal VM name and the disk target (vda in this example):
virsh domblkinfo --human 1111_example_name vda
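Without --human, virsh domblkinfo prints sizes in bytes, which is easier to compare in a script. A minimal sketch that converts the Capacity line to GiB and compares it with an expected size you pass in; which vm_disk column holds the size, and in what unit, is not covered here, so convert that value to GiB first. The function names are hypothetical:

```shell
# Hypothetical helper: print the Capacity from `virsh domblkinfo <domain>
# <target>` output on stdin (bytes, i.e. without --human) as GiB.
capacity_gib() {
  LC_ALL=C awk '/^Capacity:/ { printf "%.2f\n", $2 / 1024 / 1024 / 1024 }'
}

# Hypothetical helper: return 0 if the capacity on stdin matches the
# expected size in GiB passed as $1 (compared with two decimal places).
capacity_matches() {
  [ "$(capacity_gib)" = "$(LC_ALL=C printf '%.2f' "$1")" ]
}

# Usage (on the node):
#   virsh domblkinfo 1111_example_name vda | capacity_matches 20 || echo "size mismatch"
```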
Backups
By querying the database, you can check the backup schedule to identify possible problems. To get the information, run the query:
select * from vm_schedule;