We are constantly surrounded by digital suggestions: every time we open a streaming platform or visit an e-commerce site, an algorithm based on artificial intelligence is working to curate the selection we see, be it digital content, products to purchase, and so on. This system, technically called a recommendation engine, relies on the analysis of big data and on complex machine learning algorithms designed to interpret our past behavior and anticipate our future desires. The purpose of these technologies is twofold: on the one hand they help us as users to find our way through endless catalogues, letting us discover films, songs or products that we would struggle to find on our own; on the other they are essential for companies to keep our engagement high and stimulate sales. It is no coincidence that the market for these systems is worth almost 7 billion dollars today, with this figure expected to triple within the next few years.
But how do algorithms decide what to recommend to us? In short, the process begins with the widespread collection of our data, both the data we provide voluntarily and the data inferred from our online activity. This information is stored in huge databases, analyzed to identify recurring patterns and finally filtered through three main methodologies: collaborative filtering, which compares us to other similar users; content-based filtering, which analyzes the intrinsic characteristics of what we have already liked; and hybrid systems. While these algorithms dramatically improve our user experience, they bring significant challenges with them, from the need to protect our privacy and comply with regulations, to the risk of biases learned from the data itself, to the technical complexity of delivering suggestions in real time. Let's delve deeper into the topic, bearing in mind that recommendation systems differ in their specifics and in how they operate.
How recommendation algorithms work and what benefits they bring
To fully understand how recommendation algorithms manage to decipher our tastes, we need to analyze the 5 operational phases that transform our interactions into accurate predictions, starting from the first phase, data collection, which is the main fuel of the entire process. Recommendation engines feed on two categories of traces that we leave online: explicit data, that is, our direct and conscious actions such as a "Like", a written review or a star rating; and implicit data, much more numerous and subtle, which include our browsing history, clicks, past purchases or even the time we spend looking at a product. Demographic and psychographic data, such as our age or lifestyle, are often added to these. All this data, once collected, passes to the second phase, storage. At this point the data is kept in complex storage structures known as data warehouses or data lakes. Once stored, the data passes to the third phase, analysis, where machine learning algorithms look for mathematical correlations to build predictive models.
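As a rough illustration of the collection phase, the sketch below folds explicit and implicit events into a single user-item score. The event names and the weights in `EVENT_WEIGHTS` are hypothetical values chosen for the example, not figures used by any real platform.

```python
from collections import defaultdict

# Hypothetical weights: an explicit "like" counts more than implicit signals.
EVENT_WEIGHTS = {
    "like": 1.0,           # explicit signal
    "purchase": 0.8,       # implicit signal
    "click": 0.2,          # implicit signal, per click
    "view_seconds": 0.01,  # implicit signal, per second spent on the item
}


def build_interactions(events):
    """Aggregate (user, item, event_type, value) events into user -> item -> score."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, item, event_type, value in events:
        scores[user][item] += EVENT_WEIGHTS[event_type] * value
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {user: dict(items) for user, items in scores.items()}


events = [
    ("anna", "film_a", "like", 1),
    ("anna", "film_a", "click", 3),
    ("anna", "film_b", "view_seconds", 40),
    ("bruno", "film_b", "purchase", 1),
]
interactions = build_interactions(events)
print(interactions)
```

The resulting nested dictionary is the kind of user-item table that the later analysis and filtering phases could start from.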
The fourth phase, one of the most important, is filtering, which determines the logic of the suggestion. Collaborative filtering, used massively by giants such as Amazon and Spotify, is based on the assumption that if we and another user have had similar preferences in the past, we are likely to keep having them; if we liked the same films as another user, the algorithm will also recommend the ones they have seen and we have not. This method can be memory-based, calculating the proximity between users, or model-based, exploiting deep learning neural networks to fill the gaps in our preferences. The main limit here is the so-called "cold start": if we are new users without a history, the system struggles to profile us.
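The memory-based variant can be sketched in a few lines. The example below is a minimal illustration, assuming cosine similarity as the proximity measure and a toy `ratings` dictionary; the user names, film names and the `recommend` helper are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(ratings, target, k=2):
    """Rank items the target has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], other_ratings)
        if sim <= 0:
            continue  # users with nothing in common contribute nothing
        for item, rating in other_ratings.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]


ratings = {
    "anna": {"film_a": 5, "film_b": 4},
    "bruno": {"film_a": 5, "film_b": 5, "film_c": 4},
    "carla": {"film_d": 5},
    "dario": {},  # new user with no history
}
print(recommend(ratings, "anna"))
print(recommend(ratings, "dario"))
```

Here "anna" is offered `film_c` because "bruno" rated the same films in a similar way, while "dario", who has no history, gets an empty list: a small-scale picture of the cold start.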
The alternative is content-based filtering, which instead of observing other users focuses on the characteristics of the items we liked. If we listened to a song with certain tags, genre and rhythm, the algorithm treats these items as vectors in a vector space and offers us other songs "close" to the ones we know. This approach addresses the "cold start" problem just mentioned, but risks locking us into a bubble where we are always offered things too similar to those we already know, limiting the discovery of the new.
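To make the idea of "closeness" concrete, here is a minimal sketch that assumes each song is described by three hand-made features and that the user's taste is the average of the songs they liked. The `SONGS` catalogue and its feature values are invented for the example.

```python
import math

# Hypothetical feature vectors: [rock, electronic, tempo on a 0-1 scale]
SONGS = {
    "song_a": [1.0, 0.0, 0.8],
    "song_b": [0.9, 0.1, 0.7],
    "song_c": [0.0, 1.0, 0.9],
    "song_d": [0.1, 0.9, 0.3],
}


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def profile(liked):
    """User profile: component-wise average of the liked songs' vectors."""
    vectors = [SONGS[s] for s in liked]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def recommend(liked, k=2):
    """Return the k songs not yet liked whose vectors are closest to the profile."""
    if not liked:
        return []
    user_vec = profile(liked)
    candidates = [s for s in SONGS if s not in liked]
    return sorted(candidates, key=lambda s: cosine(user_vec, SONGS[s]), reverse=True)[:k]


print(recommend(["song_a"], k=1))
```

With only `song_a` in the history, the nearest candidate is `song_b`, which has almost the same features. This also shows the bubble effect: the more similar the liked songs, the narrower the region of the vector space the suggestions come from.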
To overcome the flaws of both filtering methods, platforms such as Netflix adopt hybrid systems, which are very powerful but computationally expensive. The benefits for our experience are tangible: we save time by avoiding endless scrolling and we discover relevant content, so much so that 80% of Netflix viewing comes from these suggestions.
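One simple way to combine the two approaches is a weighted blend of their scores. The sketch below illustrates that idea only; it is not a description of how Netflix or any other platform builds its hybrid system, and the `alpha` weight and sample scores are assumptions.

```python
def hybrid_scores(collab, content, alpha=0.5):
    """Blend two score dicts (item -> score in [0, 1]).

    alpha is the weight of the collaborative score; (1 - alpha) goes to the
    content-based score. Items missing from one source count as 0 there.
    """
    items = set(collab) | set(content)
    return {
        item: alpha * collab.get(item, 0.0) + (1 - alpha) * content.get(item, 0.0)
        for item in items
    }


# Hypothetical scores produced by the two filtering methods.
collab = {"film_a": 0.9, "film_b": 0.2}
content = {"film_b": 0.8, "film_c": 0.6}

blended = hybrid_scores(collab, content, alpha=0.5)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)
```

Lowering `alpha` shifts the ranking toward content-based scores, which could be useful for users with little history; raising it favors what similar users liked.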
Critical issues related to recommendation systems
Regardless of the recommendation system in use, this technology has some inherent critical issues. In addition to the complexity of managing millions of simultaneous recommendations, there is a risk that algorithms learn and amplify social prejudices present in the training data, generating biased recommendations, not to mention the delicate privacy issue linked to the massive collection of our personal information. There is so much to say about the critical issues of these systems, and the topic is so vast, that it deserves a dedicated study.
