02 April 2019

Dmitry Syrovatskiy

Back-end developer

VMmanager 6: how to get rid of the unnecessary and keep the useful

My name is Dmitry, and I am a developer at ISPsystem. Some time ago we launched a public beta of the new version of VMmanager, our control panel for managing virtual machines (VMs). I want to tell you how we decided what to keep from the old version and what to get rid of. I'll cover the points that mattered most to us: a library for working with libvirt, support for multiple operating systems at the panel installation stage, the move from a monolith to microservices, and VM deployment.
VMmanager is a panel for deploying, managing, and monitoring virtual servers based on KVM and OVZ (OpenVZ) virtualization. The previous generation was released in 2012. By now its interface had become outdated, and its centralized architecture was slowing down product development. It was time to create a new version.

Working with libvirt: evaluating options and choosing a library

In our panel, we use libvirt as the tool for managing KVM virtualization. In 2012 we chose a library written in C to work with libvirt, because it was convenient for the team that worked on VMmanager back then. The result was a large volume of C++ code calling a C library that worked with libvirt directly.
Before creating the new product, we decided to look back, evaluate the existing version, and decide whether each feature or technology should make it into the new one. It was important to pick out the good parts of the current product and identify the bad ideas so we could avoid them in the new one.
We gathered the VMmanager team for a retrospective on the current version of the product. It took tons of coffee, lots of patience, and hundreds of sticky notes of three kinds:
  1. What was good about the product? Which features did users endorse? Which functionality received no bug reports? What did we like ourselves?
  2. What was bad about the product? What caused problems all the time? What slowed down development and prevented us from using new technologies?
  3. What can we change? What do users ask for? What do we want to change ourselves?
The team included both people who had worked on the product for years and newcomers with a fresh view of it. Another batch of stickers was written based on users' feature requests and the product manager's ideas. All these stickers went up on a big whiteboard and helped us a lot later on.
Back to history. In the old code, C++98 coexisted with calls into the C library. There was no place for this code in the new product, but how could we replace it? We needed to keep the functionality for working with VMs while making the code more compact and convenient.
After studying the issue, we found that no matter which solution and programming language we chose, the answer came down to a wrapper around the C library. It is worth mentioning DigitalOcean's great library written in Go: it works with libvirt directly over libvirt's RPC protocol, although it has some drawbacks. In the end, we chose a Python library.
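To show how compact a thin wrapper can be, here is a minimal sketch. It assumes the libvirt-python binding, whose connection exposes `lookupByName()` and whose domain objects expose `state()`; the helper `domain_state` is hypothetical and not taken from VMmanager. The numeric codes mirror libvirt's `virDomainState` enum. The connection is passed in as a parameter, so the helper can be tried without a running hypervisor.

```python
from typing import Protocol, Sequence

# Numeric values of libvirt's virDomainState enum (VIR_DOMAIN_NOSTATE = 0 ... VIR_DOMAIN_PMSUSPENDED = 7).
DOMAIN_STATES = {
    0: "nostate",
    1: "running",
    2: "blocked",
    3: "paused",
    4: "shutdown",      # the guest is being shut down
    5: "shutoff",
    6: "crashed",
    7: "pmsuspended",
}


class Domain(Protocol):
    def state(self) -> Sequence[int]: ...


class Connection(Protocol):
    def lookupByName(self, name: str) -> Domain: ...


def domain_state(conn: Connection, name: str) -> str:
    """Hypothetical helper: return a readable state for the domain called `name`."""
    dom = conn.lookupByName(name)
    # libvirt-python's dom.state() returns [state, reason]; only the state code is used here.
    code = dom.state()[0]
    return DOMAIN_STATES.get(code, f"unknown({code})")
```

With the real binding, calling `domain_state(libvirt.open("qemu:///system"), "vm1")` returns a string such as `"running"`, which can be tried interactively from the console on a debugging server.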
As a result, we can write code quickly, and the code is simple to read and use. Let me explain in detail.
  1. Speed. Thanks to the new library, we can quickly prototype a given operation on a domain directly from the console on a debugging server, without rebuilding the main application.
  2. Simplicity. Instead of calling multiple C++ methods in a handler, we call a Python script and pass it the parameters we need.
  3. Debugging became much faster and less painful as well. In my opinion, this also opens up interesting possibilities for users. Imagine a sysadmin who is annoyed that his VMs wait for Shutdown before Destroy: he can simply reassign the script for the host_stop method.
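The host_stop idea can be made concrete with a short script. This is a minimal sketch, not VMmanager's actual host_stop script: it assumes the panel passes parameters on the command line (the `--domain-name` and `--timeout` flags are hypothetical) and that the script uses the libvirt-python calls `shutdown()`, `isActive()`, and `destroy()`. It asks the guest to shut down and forces it off only if it is still running when the timeout expires.

```python
import argparse
import time
from typing import Callable, Optional, Protocol, Sequence


class Domain(Protocol):
    def shutdown(self) -> int: ...
    def destroy(self) -> int: ...
    def isActive(self) -> int: ...


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # Hypothetical parameters; the article does not list what the panel actually passes.
    parser = argparse.ArgumentParser(description="Stop a VM (hypothetical host_stop script)")
    parser.add_argument("--domain-name", required=True)
    parser.add_argument("--timeout", type=float, default=60.0,
                        help="seconds to wait for a graceful shutdown")
    return parser.parse_args(argv)


def host_stop(dom: Domain, timeout: float, poll: float = 1.0,
              sleep: Callable[[float], None] = time.sleep) -> str:
    """Ask the guest to shut down; force it off if it is still running after `timeout`."""
    if not dom.isActive():
        return "already stopped"
    dom.shutdown()                  # polite request, the guest OS decides how to handle it
    waited = 0.0
    while dom.isActive():
        if waited >= timeout:
            dom.destroy()           # hard power-off, only called while the domain is still active
            return "destroyed"
        sleep(poll)
        waited += poll
    return "shut down"


def main(argv: Optional[Sequence[str]] = None) -> None:
    import libvirt  # libvirt-python, imported only when the script really runs
    args = parse_args(argv)
    conn = libvirt.open("qemu:///system")
    try:
        print(host_stop(conn.lookupByName(args.domain_name), args.timeout))
    finally:
        conn.close()


if __name__ == "__main__":
    main()
```

Passing `--timeout 0` sends the shutdown request and then forces the VM off right away, which is roughly what the impatient sysadmin wants. `sleep` is a parameter only so the waiting loop can be tested without actually waiting.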
In the end, we got a simple and handy tool for working with VMs at the server level.

Product distribution: getting rid of multiple packages and moving to Docker

VMmanager 5 was distributed as a set of Linux packages. It supported CentOS 6/7 and Debian 7 (no longer supported). What did that mean for us? More build servers, more work for QA engineers, and stricter requirements for code quality. For example, the official CentOS 7 repository ships qemu 1.5.3, while CentOS 6 ships qemu 0.12.1, and users may also add other repositories with newer versions of this package. This meant we had to support multiple API versions for working with VMs, especially for migration.
We also had to keep in mind differences in init systems (init, systemd) and in package and utility names. A utility that works on CentOS may not work on Debian, or the official repositories may ship very different versions of it. For every push we had to build packages for all supported versions and remember to test every one of them.
We did not want to repeat this in the new product. To stop maintaining multiple code paths, we decided to support only CentOS. Did that solve the problem? Not entirely. We also wanted to avoid checking the OS version, the availability of required utilities, and SELinux rules before installation, and we didn't want to reconfigure the firewall and repository lists. Ideally, the product and its environment should be deployed in a couple of clicks. To achieve that, we packaged the project into a Docker container.
All you need to do now is:

# docker pull vmmanager
# docker run -d vmmanager:latest

Voila, the panel is up and running.
Of course, I'm exaggerating a bit: the user has to install Docker first, we actually have several different containers, and VMmanager itself currently runs in swarm mode as a SaaS service. In fact, the challenges we faced after choosing Docker, and how we handled them, deserve a separate article.
Still, it is amazing how much we simplified development and deployment for a panel whose install.sh took 2097 lines of code just a few years ago.
In the end:
  1. A homogeneous installation environment simplifies writing code and reduces the resources required for building and QA.
  2. Docker makes the deployment procedure simpler and more predictable.

Architecture: moving from a monolith to microservices?

The fifth generation of VMmanager is a big monolithic system written in an outdated C++ standard. As a result, it is hard to adopt new technologies, the system cannot scale horizontally, and refactoring the legacy code is difficult. In VMmanager 6 we decided to use microservices to solve these issues.
Microservices are a modern trend with their own pros and cons. I'll describe my own view of the advantages of this architecture and explain how we solved the issues it introduced.

Advantages

A small service has many advantages: it is easy to write, test, and debug, and it lets you use new programming languages. If your project is a monolith, you are unlikely to rewrite part of it in another, more promising language. With microservices, that is easy. Besides languages, you can also try new technologies that meet business needs. For example, we saved a lot of time by writing some microservices in Golang.
Team scaling. When we worked on VMmanager 5, many developers committed to a single repository, trying to keep the monolith's structure intact. Now we can split them into several teams, each working on its own services. This also makes onboarding easier for new team members, because they work within a limited context. On the other hand, with this approach we lose the people who accumulate a huge amount of knowledge and can answer any question about the whole system. We may change our minds about this in the future.
Independent degradation. This is both an advantage and a disadvantage of microservices. The application won't work properly if the authorization service is down. On the other hand, in VMmanager 5 collecting statistics from several hundred VMs consumed a lot of server resources, so during such peak loads even a simple user request could take much longer than usual. A separate statistics service doesn't affect other services and can be scaled up by adding resources to its server or by adding more statistics collectors. We can even dedicate a separate server to Graphite and have the service store statistics there. That would be impossible with a monolith and a single database.
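Because the statistics service is separate, it can push its data to a dedicated Graphite server. Below is a minimal sketch of the sending side, assuming Carbon's plaintext protocol (one `<path> <value> <timestamp>` line per metric, default TCP port 2003). The metric prefix `vmmanager.vm` and both function names are hypothetical; the article does not describe VMmanager's metric schema.

```python
import socket
import time
from typing import Dict, Iterable, Iterator, Optional


def to_graphite_lines(vm_name: str, metrics: Dict[str, float],
                      timestamp: Optional[int] = None,
                      prefix: str = "vmmanager.vm") -> Iterator[str]:
    """Format per-VM metrics as Graphite plaintext lines: '<path> <value> <timestamp>'."""
    ts = int(time.time()) if timestamp is None else timestamp
    # Dots separate path components in Graphite, so replace them in the VM name.
    safe_name = vm_name.replace(".", "_")
    for key, value in sorted(metrics.items()):
        yield f"{prefix}.{safe_name}.{key} {value} {ts}\n"


def send_to_graphite(lines: Iterable[str], host: str, port: int = 2003) -> None:
    """Push already formatted lines to a Carbon plaintext listener in one TCP connection."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

Adding another collector then only means starting one more process that runs the same loop for its share of VMs; the panel's own request handling is not involved.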

Disadvantages

Request context. With the monolith, all my debugging came down to a couple of commands in the console:

# tail -n 460 var/vmmgr.log | grep ERR
# tail -n 460 var/vmmgr.log | grep thread_id_with_err

Done! I could trace the entire request, from the moment it started to the moment the error occurred.
But what should I do now, when a request may travel from one microservice to another, call other services, and leave records in different databases? To make debugging easier, we created Request Info, which includes an ID for every request and information about the user or service that made it. This lets us trace the complete chain of events. Even so, we are now considering a service for aggregating logs. Maybe we will use Elasticsearch; we'll decide soon.
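Request Info is only described in outline, so here is a minimal sketch of the idea under stated assumptions: every service reuses an incoming request ID (or creates one), stamps it on each log record, and forwards it on outgoing calls. The header name `X-Request-Id`, the `RequestInfo` fields, and the function names are hypothetical, not VMmanager's actual implementation.

```python
import contextvars
import logging
import uuid
from dataclasses import dataclass
from typing import Dict

# Hypothetical header used to pass the request ID between services.
REQUEST_ID_HEADER = "X-Request-Id"


@dataclass(frozen=True)
class RequestInfo:
    request_id: str
    initiator: str  # user or service that made the request


# Holds the RequestInfo of the request currently being handled (None outside a request).
_current = contextvars.ContextVar("request_info", default=None)


class RequestInfoFilter(logging.Filter):
    """Attach the current request ID to every log record passing through a handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        info = _current.get()
        record.request_id = info.request_id if info else "-"
        return True


def start_request(headers: Dict[str, str], initiator: str) -> RequestInfo:
    """Reuse the caller's request ID if present, otherwise generate a new one."""
    request_id = headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    info = RequestInfo(request_id, initiator)
    _current.set(info)
    return info


def outgoing_headers() -> Dict[str, str]:
    """Headers to add when this service calls another service."""
    info = _current.get()
    return {REQUEST_ID_HEADER: info.request_id} if info else {}
```

With a log format such as `%(request_id)s %(levelname)s %(message)s`, a single `grep <request id>` over the logs of all services recovers the chain of events, much like the old `grep thread_id_with_err` did for the monolith.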
Data inconsistency. In a microservice architecture, data is decentralized and there is no central database holding all of it. While working on this article, I thought about how our microservices interact and realized that, in our case, the monolith solves the inconsistency problem.
We built a monolith around the main database and placed most of the transactional operations inside it. This monolith is surrounded by microservices that don't affect the consistency of the main data. The only exception is the monolith combined with the authorization service. The problem is that the main database doesn't contain user data: roles and other user parameters are all kept in the authorization service.
A VMmanager 6 user works with VMs through the monolith, but if the user's rights are changed in the authorization service or the account is deleted, the system must react immediately. In this case, consistency is achieved by checking the user's parameters before performing any request.
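The check-before-every-request rule can be sketched as a small guard. This is a minimal illustration, assuming a hypothetical `AuthService` client with a `get_user()` method and role names stored as strings; the article does not describe the real authorization API.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Protocol, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class User:
    id: int
    active: bool
    roles: FrozenSet[str]


class AuthService(Protocol):
    # Hypothetical client for the authorization service.
    def get_user(self, user_id: int) -> Optional[User]: ...


class AccessDenied(Exception):
    pass


def run_checked(auth: AuthService, user_id: int, required_role: str,
                action: Callable[[], T]) -> T:
    """Re-read the user from the authorization service before performing `action`."""
    user = auth.get_user(user_id)   # no caching: a deleted or demoted user is noticed at once
    if user is None or not user.active:
        raise AccessDenied(f"user {user_id} does not exist or is disabled")
    if required_role not in user.roles:
        raise AccessDenied(f"user {user_id} lacks role {required_role!r}")
    return action()
```

The trade-off is one extra call to the authorization service per request, in exchange for never acting on stale rights stored in the monolith.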
As for the other microservices, a failed write to the statistics service won't affect normal VM operation, and even if a write fails, we can always retry it. OK, Mr. Statistics Collection Service, welcome aboard! The opposite case is a separate "define domain" service (creating a VM through libvirt): since the monolith sits at the core of our product, such a microservice makes no sense.
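Since a failed call to the statistics service is harmless, the caller can simply try again. A minimal sketch of a generic retry helper with exponential backoff; the function name and defaults are hypothetical, and `ConnectionError` stands in for whatever error the real client raises.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(op: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
          retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `op`, retrying on the listed exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise                           # out of attempts: let the caller decide
            sleep(base_delay * 2 ** attempt)    # delay doubles after each failure
    raise ValueError("attempts must be >= 1")
```

The same helper would be the wrong tool for a call that must be consistent with the main database, such as defining a domain, which is exactly why that operation stays in the monolith.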

VM deployment: installing from an image instead of network installation

In VMmanager 5, VM deployment takes quite a long time by today's standards, because the OS is installed over the network.
For CentOS, Fedora, and Red Hat we used the kickstart method:
  1. Create a kickstart file.
  2. Pass the link to the kickstart file in the Linux kernel parameters: inst.ks=<link to kickstart file>.
  3. Launch the kickstart installation.
A kickstart file is quite flexible and lets you describe every stage of installation, from the installation method and time zone to disk partitioning and network configuration. The url parameter in our templates means the installation is performed from a remote server.
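The three steps can be sketched as a tiny renderer. This is an illustration only: the template is deliberately small and is not one of VMmanager's real templates, and the mirror URL, time zone, and password hash used with it are hypothetical. It uses standard kickstart commands (`url --url=`, `timezone`, `rootpw --iscrypted`, `network`, `clearpart`, `autopart`) and produces the `inst.ks=` kernel parameter from step 2.

```python
from string import Template

# A deliberately small kickstart template; real templates describe much more.
KICKSTART = Template("""\
text
url --url=$mirror_url
lang en_US.UTF-8
keyboard us
timezone $timezone
rootpw --iscrypted $root_hash
network --bootproto=dhcp
clearpart --all --initlabel
autopart
reboot

%packages
@core
%end
""")


def render_kickstart(mirror_url: str, timezone: str, root_hash: str) -> str:
    """Fill in per-VM values; the url line makes the installer fetch packages from a remote mirror."""
    return KICKSTART.substitute(mirror_url=mirror_url, timezone=timezone, root_hash=root_hash)


def kernel_args(ks_url: str) -> str:
    """Kernel parameter telling the installer where to download the kickstart file."""
    return f"inst.ks={ks_url}"
```

`string.Template` only expands `$name` placeholders in the template itself, so a crypt hash such as `$6$...` is inserted literally.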
For Debian and Ubuntu we used the preseed method. It is similar to kickstart: it is also driven by a configuration file and its contents. Here, too, we configured network installation.
We used the same approach for installing FreeBSD, but instead of a kickstart file we ran our own shell script.

Advantages

This installation approach lets us use the same templates in several products: VMmanager and DCImanager (a platform for managing dedicated servers).
VM deployment is quite flexible: a panel administrator can simply copy our OS template and change its configuration file as desired.
Users always get the latest OS versions, as long as the remote server they install from is kept up to date.

Disadvantages

In our experience, most VMmanager users did not appreciate the installation flexibility. Specific kickstart configurations mattered only for dedicated servers. The long OS installation time, however, was a real problem. To install the latest OS version, part of the installer has to be downloaded from the Internet while the other part is kept locally in the initrd, and the versions of these two parts must match.
This problem can be solved by creating a pool of pre-deployed VMs and running our own OS repository. However, that would add extra costs.
How could we solve these issues without extra repositories and pools? We decided to use OS images. The installation process now looks like this:
  1. Copy the OS image to the VM's disk.
  2. Extend the main partition to the space left available after copying.
  3. Apply basic configuration (set the password, time zone, etc.).
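One possible toolchain for these three steps is sketched below; the article does not say which commands VMmanager 6 actually runs. It assumes qcow2 disks and the libguestfs tools: `qemu-img create` makes a disk of the target size, `virt-resize --expand` copies the template into it while expanding the given partition, and `virt-customize` applies the basic settings. The root partition `/dev/sda1` and all file names are hypothetical and depend on the image.

```python
import subprocess
from typing import List, Sequence


def deploy_commands(template: str, disk: str, size: str, root_partition: str,
                    password: str, timezone: str, hostname: str) -> List[List[str]]:
    """Build argv lists for the three deployment steps (nothing is executed here)."""
    return [
        # Step 1 (part one): create the VM disk with its final size.
        ["qemu-img", "create", "-f", "qcow2", disk, size],
        # Steps 1-2: copy the template into the new disk, expanding the root partition.
        ["virt-resize", "--expand", root_partition, template, disk],
        # Step 3: basic configuration inside the image, without booting it.
        ["virt-customize", "-a", disk,
         "--root-password", f"password:{password}",
         "--timezone", timezone,
         "--hostname", hostname],
    ]


def run_all(commands: Sequence[Sequence[str]]) -> None:
    for argv in commands:
        # Passing a list avoids shell quoting problems; check=True stops at the first failing step.
        subprocess.run(argv, check=True)
```

In production, virt-customize's `file:FILENAME` password selector would keep the password out of the process list; `password:` is used here only to keep the sketch short.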
They say that everything new is well-forgotten old: we already used OS images in VDSmanager (the 4th generation of our VPS management platform).
But wait, what about installation flexibility? Our experience shows that most users don't care about specific network configuration or VM disk partitioning. OK, but what about outdated software? We solve that by keeping up-to-date OS images in our repository and installing minor updates with an initial configuration script. So if you create a new VM and log in, you'll see yum update running.
As a result, we get a virtual machine whose deployment time depends only on copying the disk, extending it, and booting the OS. This approach also lets users create images and share them. A user can install LAMP or any other environment on a VM and create an image of that VM. To deploy another VM with the same environment, there is no need to install the necessary utilities again.
To configure and modify the images, we use the libguestfs utilities. For example, changing the password of a Linux VM in VMmanager 5 took 40 lines of code involving mount, chroot, and usermod. In the new panel, it takes one line:

command = ["/usr/bin/virt-customize", "--root-password", "password:{}".format(args.password), "--domain", args.domain_name]

As a result, we made VM deployment as fast as possible. Of course, network configuration and installing internal scripts add a bit to the total deployment time. To address this, we show the installation steps in the interface, filling the gap between VM creation and the moment the VM is ready.
Deployment also became more flexible in VMmanager 6, since users can create their own images with the environment they need.

What we managed to do

While working on VMmanager 6, we kept in mind the main drawback of the previous product, its complexity, and tried to solve it. We reduced the time the main actions take and let users keep working with the panel instead of forcing them to wait for those actions to complete. Containerization made installation simpler and more convenient. New technologies and different programming languages simplified product development and support. The microservice architecture lets us add new features quickly with few restrictions.
To conclude, a new product is always a great chance to try new technologies and different approaches to development. However, always keep in mind why you are adding a new feature or technology and what benefits it will bring to you and your customers. Good luck!
