12 August 2020

Analyzing the user activity graph

Denis Naumov and Danil Zakharov,

Product Analytics

There are hundreds of articles on the Internet about the benefits of customer behavior analysis, most of them about retail: from market basket analysis and ABC and XYZ analysis to retention marketing and personal offers. The methods have been used for decades, the algorithms have been refined, the code is written and debugged, so it seems nothing can stop you from simply picking one and using it. In our case, there was one fundamental problem: we at ISPsystem develop software, we do not do retail.

My name is Denis Naumov, and I am currently responsible for the backend of the analytical systems at ISPsystem. This is the story of how my colleague Danil Zakharov, who is responsible for data visualization, and I tried to look at our software products through the lens of that retail know-how. Let us begin with the background, as usual.

We decided: why not give it a try?

At that time, I was a developer in the R&D department. It all began when Danil read about Retentioneering, a tool for analyzing user navigation in applications. I was a little skeptical about using it for our products. The examples cited by the library's developers were applications with a clearly defined target action: placing an order or otherwise paying the application's owner. Our products are delivered on-premise: a user buys a license first and only then starts finding their way through the application. Yes, we have demo versions, so you can try the product rather than buy a pig in a poke.

Most of our products are aimed at the hosting market. The customers are large, and our Business Development department advises them on what the products can do. It also means that at the moment of purchase our clients already know which problems our software will help them solve. Their routes through the application should match the product's customer journey map (CJM), and the UX solutions should keep the client from getting lost along the way. Spoiler alert: this is not always the case. Our encounter with the library was postponed... but not for too long.

It all changed when we released our startup Cartbee, a platform for creating an online store from an Instagram account. In this application, the user got two weeks of free access to all functionality and afterwards had to decide whether to subscribe. This fit the "path to the target action" concept very well. So it was decided: we will try it!

First results or where to get ideas from

The development team and I connected the product to the event collection system in just one day. I will say right away that ISPsystem uses its own system for collecting page-visit events, but nothing prevents you from using Yandex.Metrica, which lets you export raw data free of charge, for the same purpose. We studied the examples of using the library, and after a week of data collection we had a transition graph.

The transitions graph. Basic functionality only, the rest of the transitions have been removed for clarity.

It turned out just like in the example: planar, clear, and attractive. From this graph, we were able to identify the most frequent routes and the transitions on which people spend the longest time. This helped us to understand the following:

Instead of the large CJM, which covers a dozen entities, only two entities are actively used. Users need additional guidance to the right places through UX solutions.
On some pages that the UX designers intended as pass-through pages, people spend far too much time. We need to find out which elements stop them on each such page and fix them.
After 10 transitions, 20% of people got tired and abandoned the session. And that is with a whole 5 pages of onboarding in the app! We need to identify the pages where users regularly abandon sessions and shorten the path to them. Better still, identify the regular routes and let users jump quickly from the source page to the destination page.
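
To make the idea of a transition graph concrete, here is a minimal sketch in plain Python. It is not the Retentioneering API; the event tuples, page names, and the `count_transitions` helper are assumptions made up for illustration. It orders each user's events by time and counts every directed page-to-page step.

```python
from collections import Counter

# Hypothetical event log: (user_id, timestamp, page). Purely illustrative data.
events = [
    ("u1", 1, "onboarding"), ("u1", 2, "catalog"), ("u1", 3, "orders"),
    ("u2", 1, "onboarding"), ("u2", 2, "catalog"),
    ("u3", 1, "catalog"), ("u3", 2, "orders"),
]

def count_transitions(events):
    """Count directed page-to-page transitions, per user, in time order."""
    paths = {}
    for user, _ts, page in sorted(events, key=lambda e: (e[0], e[1])):
        paths.setdefault(user, []).append(page)
    edges = Counter()
    for pages in paths.values():
        # Each consecutive pair of pages is one edge of the graph.
        edges.update(zip(pages, pages[1:]))
    return edges

transitions = count_transitions(events)
for (src, dst), n in transitions.most_common():
    print(f"{src} -> {dst}: {n}")
```

The most frequent routes are then simply the heaviest edges of this counter.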

Seems somewhat similar to ABC analysis and abandoned-basket analysis, doesn't it? At this point we revised our attitude to the applicability of the tool to on-premise products. We decided to analyze a product that is actively sold and used: VMmanager 6. It is much more complex, with about ten times as many entities. We were anxious to see what its transitions graph would look like.

About frustrations and excitements

Frustration No.1

It was the end of the working day, the end of the month, and the end of the year all at once: December 27. The data had been accumulated, the queries had been written. Only seconds remained before everything was processed and we could look at the results of our work to see where the next year would begin. The R&D department, the product manager, UX designers, team leads, and developers gathered in front of the monitor to see what the user paths in their product looked like, but... this was what we saw:

Transitions graph plotted by Retentioneering

Excitement No.1

The graph was strongly connected, with dozens of entities and non-obvious scenarios. All that was clear was that the new working year would begin not with analysis but with inventing a way to simplify working with this graph. Still, I could not shake the feeling that things were a lot simpler than they seemed. After fifteen minutes of studying the Retentioneering source code, we managed to export the plotted graph in DOT format. That let us load the graph into another tool, Gephi, which offers plenty of ways to analyze graphs: layouts, filters, statistics. All you need to do is set the right parameters in the interface. With that thought, we left for the New Year holidays.
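
For readers who have not seen the DOT format, the sketch below writes a weighted transition graph as DOT text by hand. It does not reproduce how the export was done inside Retentioneering; the `to_dot` helper, the page names, and the counts are hypothetical.

```python
def to_dot(edges, name="transitions"):
    """Render {(source, target): weight} as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for (src, dst), weight in sorted(edges.items()):
        # Quote node names so that slashes or spaces in page names are safe.
        lines.append(f'    "{src}" -> "{dst}" [weight={weight}, label="{weight}"];')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical transition counts, for illustration only.
edges = {("dashboard", "vm_list"): 12, ("vm_list", "vm_card"): 7}
dot_text = to_dot(edges)
print(dot_text)
```

Saving this text to a `.dot` file gives a graph that can be opened in a graph tool that reads the format.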

Frustration No.2

After getting back to work, we found that while everyone was resting, our clients had been exploring the product. So diligently, in fact, that events that had not existed before now appeared in the storage. This meant that the queries had to be updated.

A little context so the reader can appreciate how sad this was. We collect both the events we have marked up (e.g. clicks on certain buttons) and the URLs of the pages the user visits. For Cartbee, the "one action, one page" model worked. With VMmanager the situation was quite different: several modal windows could open on one page, and the user could solve different tasks in them. For example:
URL `/host/item/24/ip(modal:modal/host/item/ip/create)`
means that the user added an IP address on the "IP addresses" page. And here you can see two problems at once:

The URL contains a path parameter, the virtual machine ID. It needs to be excluded.
The URL contains the modal window ID. We need to somehow "unpack" these URLs.
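
Both problems can be sketched with two regular expressions. This is only an illustration of the idea, not the rules used in our service: the `normalize_url` helper and the `{id}` placeholder are assumptions, and the only URL shape it is written for is the one shown in the example.

```python
import re

# "<page>(modal:<modal path>)": split the page from the modal opened on it.
MODAL_RE = re.compile(r"^(?P<page>[^()]*)\(modal:(?P<modal>[^()]*)\)$")
# A purely numeric path segment is treated as an entity ID.
ID_RE = re.compile(r"/\d+(?=/|$)")

def normalize_url(url):
    """Return (page, modal): IDs are masked, the modal part is unpacked."""
    match = MODAL_RE.match(url)
    if match:
        page, modal = match.group("page"), match.group("modal")
    else:
        page, modal = url, None
    return ID_RE.sub("/{id}", page), modal

print(normalize_url("/host/item/24/ip(modal:modal/host/item/ip/create)"))
print(normalize_url("/host/item/7"))
```

After this step, visits to different virtual machines collapse into one page, and the modal window becomes a separate attribute of the event.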

Another problem was that the events we had marked up carried parameters of their own. For example, there were five different ways to get from the list to the page with information about a virtual machine. Accordingly, a single event was sent, with a parameter indicating which way the user had taken. There were many such events, each with different parameters. And all of our data extraction logic was written in the ClickHouse SQL dialect. Queries of 150-200 lines were beginning to seem normal. We were getting surrounded by problems.

Excitement No.2

One early morning, Danil, sadly scrolling through a query for the second minute running, suggested: "Why don't we write data processing pipelines?" We thought it over and decided that if we were going to do it, it should be something like ETL, so that it would both filter the data and pull the necessary data from other sources. This is how our first analytical service with a full-fledged backend was born. It implements five main stages of data processing:

Extraction: downloading events from the raw data storage and preparing them for processing.
Clarification: "unpacking" modal window identifiers, event parameters, and other details that clarify the event.
Enrichment: supplementing events with data from external sources. At that time, the only such source was our BILLmanager billing system.
Filtering: dismissing events that distort the analysis results (events from internal test environments, outliers, etc.).
Loading: uploading the resulting events to the storage that we called clean data.
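
The five stages above can be sketched as a chain of small functions. This is a toy model under stated assumptions, not the code of our service: events are plain dictionaries, the `via` parameter, the `plan` field, and the in-memory "storages" are all invented for the example.

```python
def extract(raw_storage):
    """Stage 1: copy events out of the raw storage."""
    return [dict(event) for event in raw_storage]

def clarify(events):
    """Stage 2: fold the event parameter into the event name."""
    for event in events:
        params = event.pop("params", {})
        if "via" in params:
            event["name"] = f'{event["name"]}:{params["via"]}'
    return events

def enrich(events, billing):
    """Stage 3: add data from an external source (here, a billing lookup)."""
    for event in events:
        event["plan"] = billing.get(event["user"], "unknown")
    return events

def drop_noise(events, internal_users):
    """Stage 4: discard events that would distort the analysis."""
    return [e for e in events if e["user"] not in internal_users]

def load(events, clean_storage):
    """Stage 5: append the processed events to the clean storage."""
    clean_storage.extend(events)
    return clean_storage

def run_pipeline(raw_storage, billing, internal_users, clean_storage):
    events = clarify(extract(raw_storage))
    events = drop_noise(enrich(events, billing), internal_users)
    return load(events, clean_storage)

raw = [
    {"user": "u1", "name": "vm_open", "params": {"via": "list_row"}},
    {"user": "qa", "name": "vm_open", "params": {"via": "menu"}},
    {"user": "u2", "name": "dashboard"},
]
clean = run_pipeline(raw, {"u1": "trial"}, {"qa"}, [])
print(clean)
```

Because each stage only takes and returns a list of events, a new rule means changing one function rather than rewriting a long query.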

After that, keeping the data up to date only required adding rules for processing an event or even a group of similar events. For example, we have not updated the URL unpacking since then, although several new URL variants have been added during this time. They match the rules already specified in the service and are processed correctly.

Frustration No.3

Once we started the analysis, we realized why the graph was so connected: almost every N-gram contained transitions that could not be made through the interface.

A small investigation began. I was confused by the fact that there were no impossible transitions within one entity, so it was not a bug in the event collection system or in our ETL service. It looked as if the user was working in several entities at the same time, without moving from one entity to another. How can this be achieved? By using different browser tabs.

When analyzing Cartbee, we were saved by its specific nature: the application was used from mobile devices, where working in several tabs is simply inconvenient. VMmanager, however, is a desktop application, and while a task is running in one entity, it is reasonable to want to use that time to configure or monitor something in another. And in order not to lose progress, you simply open another tab.

Excitement No.3

Colleagues from front-end development taught the event collection system to distinguish between tabs, so we could start the analysis. And we did. As expected, the CJM did not match the actual paths: users spent a lot of time on the catalog pages and abandoned sessions and tabs in the most unexpected places.
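
The effect of tabs on the graph is easy to reproduce on toy data. In the sketch below the tab identifier, the page names, and the `transitions` helper are assumptions for illustration: the same four events are chained once per user and once per user and tab.

```python
from collections import Counter

# Hypothetical events: (user_id, tab_id, timestamp, page).
events = [
    ("u1", "tab1", 1, "vm_list"),
    ("u1", "tab2", 2, "node_list"),
    ("u1", "tab1", 3, "vm_card"),
    ("u1", "tab2", 4, "node_card"),
]

def transitions(events, key):
    """Chain events that share the same key, in time order, into edges."""
    chains = {}
    for event in sorted(events, key=lambda e: e[2]):
        chains.setdefault(key(event), []).append(event[3])
    edges = Counter()
    for pages in chains.values():
        edges.update(zip(pages, pages[1:]))
    return edges

by_user = transitions(events, key=lambda e: e[0])
by_tab = transitions(events, key=lambda e: (e[0], e[1]))

print(sorted(by_user))  # includes vm_list -> node_list, impossible in the UI
print(sorted(by_tab))   # only transitions made inside a single tab
```

Chaining per user interleaves the two tabs and produces edges that no interface allows; chaining per tab leaves only real navigation.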

By analyzing the transitions, we were able to find problems in some Mozilla builds. Due to implementation peculiarities, navigation elements disappeared, or half-empty pages that should have been available only to the administrator were displayed: the page opened, but the content from the backend did not arrive. Counting the transitions made it possible to estimate which features were actually used. Chains of events showed how the user ended up with one error or another. The data made it possible to test based on user behavior. It was a success; the idea turned out not to be futile.

Automation of analytics

At one of the demonstrations of the results, we showed how Gephi is used for graph analysis. In this tool, you can display data on transitions in a table. And then our Head of UX shared a very important idea, which influenced the development of the whole direction of behavioral analytics in the company: "Let's do the same, but in Tableau and with filters, it's more convenient."

Then I thought: "Why not? Retentioneering stores all its data in a pandas.DataFrame anyway, and that is basically a table." That is how another service came into being: Data Provider. It not only turned the graph into a table, but also calculated how popular a page and its associated functionality are, how the page affects user retention, how long users stay on it, and which pages users leave most often. Meanwhile, visualization in Tableau reduced the cost of studying the graph so much that the iteration time of behavior analysis in the product was almost halved.
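
As a rough illustration of the kind of per-page metrics mentioned here, the sketch below computes popularity and average time on page from a list of visits. It uses plain Python rather than pandas, covers only two of the metrics, and the record layout and the `page_stats` helper are hypothetical rather than the Data Provider implementation.

```python
from collections import defaultdict

# Hypothetical visit records: (user_id, page, seconds_on_page).
visits = [
    ("u1", "dashboard", 5), ("u1", "vm_list", 40),
    ("u2", "dashboard", 7), ("u2", "vm_list", 60), ("u2", "vm_list", 20),
]

def page_stats(visits):
    """Aggregate visits into one row per page."""
    totals = defaultdict(lambda: {"visits": 0, "seconds": 0, "users": set()})
    for user, page, seconds in visits:
        entry = totals[page]
        entry["visits"] += 1
        entry["seconds"] += seconds
        entry["users"].add(user)
    return [
        {
            "page": page,
            "visits": t["visits"],
            "unique_users": len(t["users"]),
            "avg_seconds": t["seconds"] / t["visits"],
        }
        for page, t in sorted(totals.items())
    ]

stats = page_stats(visits)
for row in stats:
    print(row)
```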

Danil will tell you how this visualization is applied and what conclusions can be drawn.

Bring more tables to the god of tables!

In a simplified form, the task was formulated as follows: display the graph of transitions in Tableau, provide the ability to filter, make it as clear and convenient as possible.

We did not really feel like drawing a directed graph in Tableau, and even if we succeeded, the gain over Gephi was not obvious. We needed something much simpler and more accessible. A table! After all, a graph is easily represented as rows of a table, where each row is an edge of the "source-destination" type. Moreover, this table had already been carefully prepared by Retentioneering and Data Provider. All that remained was a small matter: output the table to Tableau and share the report.

However, here we faced another problem: what to do with the data source? Connecting a pandas.DataFrame directly was impossible, because Tableau has no such connector. Building a separate storage for the graph seemed too radical a solution with vague prospects. Local export options were not suitable either, because they required constant manual operations. We looked through the list of available connectors, and our eye fell on the Web Data Connector item, sitting somewhat lonely at the bottom.

Tableau has a wide range of connectors. We found the one that solved our problem

What kind of animal is it? A few new browser tabs later, it became clear that this connector lets you receive data by accessing a URL. The backend for calculating the data was almost ready; all that was left was to connect it to the WDC. For a few days, Denis studied the documentation and battled with the Tableau mechanisms, and then he sent me a link that I pasted into the connection window.

The connection form for our WDC. Denis developed his own front end for it and took care of security

After a couple of minutes of waiting (the data is calculated dynamically on request), a table appeared:

This is what a raw array of data in the Tableau interface looks like

As promised, each row of the table was a graph edge, i.e. a directed user transition. It also contained several additional attributes, for example the number of unique users, the total number of transitions, and others.
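
A minimal sketch of such an edge table, assuming per-user page paths as input. The column names, the sample paths, and the `edge_table` helper are illustrative and do not reproduce the actual columns of our report.

```python
from collections import defaultdict

# Hypothetical per-user page paths.
paths = {
    "u1": ["dashboard", "vm_list", "dashboard", "vm_list"],
    "u2": ["dashboard", "vm_list"],
    "u3": ["dashboard", "node_list"],
}

def edge_table(paths):
    """One row per directed edge, with total and unique-user counts."""
    totals = defaultdict(int)
    users = defaultdict(set)
    for user, pages in paths.items():
        for edge in zip(pages, pages[1:]):
            totals[edge] += 1
            users[edge].add(user)
    rows = [
        {"source": src, "target": dst,
         "unique_users": len(users[(src, dst)]), "transitions": count}
        for (src, dst), count in totals.items()
    ]
    # The most travelled edges come first.
    return sorted(rows, key=lambda row: row["transitions"], reverse=True)

table = edge_table(paths)
for row in table:
    print(row)
```

Keeping both counts matters: an edge walked three times by one user tells a different story from an edge walked once by three users.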

You could put this table into the report as is, sprinkle it generously with filters, and launch the tool. Sounds logical: what else can you do with a table? But that was not our way, because we were building not just a table but a tool for analysis and product decisions.

As a rule, when analyzing data a person wants to get answers to questions. Excellent. That is exactly where we will start.

What transitions are the most frequent?
When do users leave a particular page?
How long on average does a user spend before leaving the page?
How frequent is the transition from A to B?
At what pages does the session end?

Each of the reports, or a combination of them, should let users find answers to these questions on their own. The key strategy here is to provide a tool for self-service analytics. This both reduces the load on the analytics department and shortens decision-making time: you no longer need to go to YouTrack and create a task for an analyst, you just open the report.

So what did we get eventually?

# Where do users go to most often from the dashboard?

A fragment of our report. After the dashboard, users went either to the VM list or to the node list.

Let us take the general table of transitions and filter it by source page. Most often, users leave the dashboard for the list of virtual machines. Moreover, the Regularity column suggests that this is a repeated action.

# Where do users come to the cluster list from?

Filters in reports work both ways: you can find out where users left from, or you can find out where they went.

From the examples, you can see that even two simple filters plus ranking rows by value let you get information quickly.
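
Outside Tableau, the same two filters amount to a few lines of code. The sketch below assumes an edge table like the one in the report; the rows, the numbers, and the `filter_edges` helper are invented for illustration.

```python
# Hypothetical edge table: one row per directed transition.
edges = [
    {"source": "dashboard", "target": "vm_list", "transitions": 120},
    {"source": "dashboard", "target": "node_list", "transitions": 45},
    {"source": "vm_list", "target": "cluster_list", "transitions": 30},
    {"source": "node_list", "target": "cluster_list", "transitions": 50},
]

def filter_edges(edges, source=None, target=None):
    """Keep edges matching the given source and/or target, ranked by count."""
    rows = [
        edge for edge in edges
        if (source is None or edge["source"] == source)
        and (target is None or edge["target"] == target)
    ]
    return sorted(rows, key=lambda edge: edge["transitions"], reverse=True)

from_dashboard = filter_edges(edges, source="dashboard")
to_clusters = filter_edges(edges, target="cluster_list")

print([edge["target"] for edge in from_dashboard])
print([edge["source"] for edge in to_clusters])
```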

Let us ask a more difficult question.

# Where does the user abandon the session most often?

VMmanager users often work in separate tabs

For this purpose, we need a report whose data is aggregated by sources of transitions. As destinations, we took the so-called breakpoints - events that served as the end of the chain of transitions.

It is important to note here that this can be either the end of the session or the opening of a new tab. You can see from the example that the chain ends most often with a table listing virtual machines. The characteristic behavior is switching to another tab, which corresponds to the expected pattern.
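
The breakpoint idea can be sketched as counting the last page of every chain, where a chain is the sequence of pages within one tab. The chains below and the `breakpoint_pages` helper are hypothetical, and the sketch does not distinguish the end of a session from a switch to another tab.

```python
from collections import Counter

# Hypothetical chains of pages, keyed by (user_id, tab_id).
chains = {
    ("u1", "tab1"): ["dashboard", "vm_list"],
    ("u1", "tab2"): ["vm_list", "vm_card"],
    ("u2", "tab1"): ["dashboard", "node_list", "vm_list"],
}

def breakpoint_pages(chains):
    """Count how often each page is the last one in a chain."""
    return Counter(pages[-1] for pages in chains.values() if pages)

breakpoints = breakpoint_pages(chains)
print(breakpoints.most_common())
```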

We tested the usefulness of these reports on ourselves first, when we analyzed Vepp, another of our products. With tables and filters at hand, hypotheses were tested faster and our eyes got less tired.

In developing the reports, we did not forget about visual design, an important factor when working with tables of this size. For example, we used a muted color palette, an easy-to-read monospace font for numbers, and color highlighting of rows according to the numerical values of the metrics. Such details improve the user experience and increase the chances that the tool takes off within the company.

The table turned out to be quite large, but hopefully it has not stopped being readable

Separately, it is worth mentioning the training of our internal clients: product engineers and UX designers. Manuals with examples of analysis and hints when working with filters were specially prepared for them. We inserted the links to the manuals directly into the report pages.

We created the manual as a simple presentation in Google Docs. Tableau lets you display web pages right inside a workbook

In place of an afterword

So what does it all boil down to? We got a tool for everyday use relatively quickly and cheaply. It is definitely not a replacement for the graph itself, a click heat map, or Webvisor. Nevertheless, such reports significantly complement those tools and provide food for thought as well as new product and interface hypotheses.

This story was only the beginning of the development of analytics at ISPsystem. Over the next six months, seven more services appeared, including digital portraits of the user in the product and a service for building audience bases for look-alike targeting, but more about them in the articles to follow.