1 Related Work

In my opinion, research on algorithmic personalisation has to cope with two major obstacles, which I explore in more detail in this section.

Firstly, the research is situated in and influenced by the broader social and political debate from which it draws its legitimacy, and by which it is biased in the sense that it tends to uncritically support the assumption that algorithms in general pose great dangers to society. Instead, I suggest testing the personalisation of users and the segmentation of recommended content by the algorithm in more detail, in order to establish a common ground for interdisciplinary research and cross-cultural comparisons. From such findings, the research can advance towards conclusions and management insights.

Secondly, the research follows an understanding of personalisation inherited from mass communication in the age of broadcasting and radio, and therefore assumes a pattern of massive and uniform influence. As a result, it is built on the automated collection of vast amounts of data, which accumulates contingency rather than enabling powerful insights. To overcome this, I propose adapting newer philosophical and socio-critical concepts of human-algorithmic interaction, namely singularisation and artificial communication, together with a more nuanced view of culture as the interplay of content and context. To understand users' engagement with different types of content, smaller but heavily annotated data collections could provide a more realistic view. Curated data collection and annotation is a far more time-consuming and labour-intensive research design than large but qualitatively unannotated datasets. However, it is preferable when it comes to entertainment-driven algorithmic recommender systems such as YouTube.

1.1 Social Debate and Impact on/of the Research

Within the public sphere, YouTube is perceived as a political and educational actor. The platform was forced to take additional measures to censor radical and pornographic content. As a reaction to accusations of ideological manipulation during the Trump era, channels had to reveal their financial sources, a note on which appears not only in the basic information about the channel but also directly in the video window. A New York Times article pointed to the promotion of Fox News on the platform, which had moved to the top of recommendation lists due to adjustments of the algorithms [1]. Within these discussions, the dissemination of Russian propaganda through YouTube plays a crucial role. After YouTube was accused of promoting videos from Russia Today to users who were searching for information concerning Trump's relations with Russia, the platform declared that through the algorithm 'content from more authoritative sources is featured more prominently in search results and watch-next recommendations in certain contexts' [2]. In the case of Russia Today, which disseminates propaganda beyond Russian borders [3, 4], its designation as an 'authoritative source' seems particularly ridiculous. The situation escalated once more in February 2022, when Russian propaganda channels were shut down by the platform. Since then, on the other side of the globe in contemporary Russia, the prohibition of YouTube has also been constantly under discussion.

Hence, the pressure on YouTube is constantly growing. The attack on the Capitol boosted the debate even more: the US Congress wrote a letter to the executive officers of YouTube, claiming the radicalisation of users 'in a digital echo chamber that your company designed, built, and maintained' and requiring that the platform 'make additional permanent changes to its recommendation systems' and even that it disable auto-play by default [5]. The concept of echo chambers, referenced in the congressional letter, originates from Noelle-Neumann's seminal 1974 publication [6] and refers to the reinforcement of an individual's beliefs by the media environment, whereby he or she focuses on like-minded sources and excludes opposing ones. From the outset of the debate surrounding the echo chamber hypothesis, research has contested the issue because of its entanglement with the question of individual choice, which is challenging to document and analyse. Consequently, the issue of responsibility cannot be readily addressed amidst this uncertainty. Following the rise of algorithmic personalisation, the issue of echo chambers in digital media has become increasingly pressing, because recommender systems now predict a user's choices. The belief that recommendations narrow the scope of possible choices and push users towards like-minded channels and content has caused a renewed interest in the concept of echo chambers (recently summarised in [7, 8]), which remains a topic of debate in research [9]. Since a recent study revealed the absence of such bubbles with regard to certain topics [10], a study of YouTube recommendations cannot simply start from the assumption that echo chambers pre-exist. Instead, it must reflect on its own methodology as a potential producer of bubble-like effects.

However, in the case of YouTube, the hypothesis of echo chambers was perceived differently than for other social media. As Ledwich and Zaitsev pointed out, the lack of transparency regarding the actions of users led to a significant shift towards accusations aimed at the platform: 'While previous comments on the role that social media websites play in spreading radicalisation have focused on user contributions, the implications of the recommendation algorithm strictly implicate YouTube's programming as an offender' ([11], 2; for an interesting approach to filling these gaps in YouTube data with the help of Twitter metrics of views and shares, see [12]). Due to the impossibility of using metrics developed for other social networks to investigate echo chambers on YouTube (as, for example, [7] did in their recent investigation of echo chambers on Facebook, Twitter, Reddit and Gab), the disproportion in the research grows.

While completely comprehensible in terms of research pragmatism, the yet unproven claim of threats posed to society by the YouTube algorithm directs research towards high-risk hypotheses regarding polarisation and radicalisation. These hypotheses are extremely difficult to verify, mainly because researchers cannot estimate the efficiency of the recommendations. According to YouTube's CEO, Susan Wojcicki, 70% of video choices are triggered by recommendations [13], but can we believe this self-praise blindly? Here I would like to highlight the argument put forward by Florian Muhle, who points out that various criticisms of algorithmic personalisation tend to overestimate its effectiveness ([14], 153). If we begin to question this, do we not lose the legitimisation of the studies, for who would care about ineffective personalisation?

The imagination of critics is fuelled by the presupposition that too little is known about the functioning of the algorithm. They seem to ignore the fact that the basic features of the deep learning algorithm and its interactions with the database were made public in a publication by the YouTube team [15]. The paper explains that views and ratings, which played an important role in the past, were replaced by 'long clicks' and viewing time. The time window was adapted to avoid a focus on old videos and to allow new and engaging videos to appear in the recommendation lists more quickly. Qualitative characteristics such as video tags and descriptions help to identify relevant videos, while user history, searches, region, gender and other demographic features are involved in estimating the likelihood that a video will be engaging. Based on this estimation, the deep learning algorithm not only selects candidates from the vast archive, but also ranks this selection.
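
To make the described two-stage logic easier to follow, the sketch below separates candidate generation from ranking. It is a deliberately simplified, hypothetical illustration of the architecture reported in [15], not YouTube's actual code; all names, features and scoring functions are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Video:
    video_id: str
    tags: set             # qualitative characteristics used for matching
    upload_age_days: int  # the adapted time window favours newer videos

def generate_candidates(archive, history_tags, max_age_days=365, k=200):
    """Stage 1 (illustrative): select a pool of candidates from the vast
    archive, here approximated by tag overlap with the user's history."""
    scored = [(len(v.tags & history_tags), v)
              for v in archive if v.upload_age_days <= max_age_days]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [v for score, v in scored[:k] if score > 0]

def rank(candidates, predict_watch_time):
    """Stage 2 (illustrative): rank candidates by predicted expected watch
    time ('long clicks'), estimated from user history, searches, region,
    gender and other demographic features."""
    return sorted(candidates, key=predict_watch_time, reverse=True)
```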

This information is certainly not sufficient to reverse engineer the algorithm. That is in any case almost impossible on the basis of recommendation lists alone, without any knowledge of the candidates excluded from the selection and with no access to the database from which the algorithm operates. However, it is a far cry from the 'black box' picture that is commonplace in the research (see, for example, [16]). It is also enough to question the necessity of reverse engineering, which could in the end prove very unrewarding: 'What would the study of algorithms look like if we accepted that algorithms are inscrutable because there's actually nothing to see? What metaphors are most useful for making sense of this?' [17].

In the present situation, the controversy about the destructive potential of AI and the overestimation of the effectiveness of algorithms leads to frustration and an impression of exaggerated dangers, which fits the discussion of the relation between 'public relevance algorithms' and 'calculated publics' ([18], 168). Even if we consider that the risks of being captured in an echo chamber might be mitigated by using a variety of communication channels instead of exclusively using YouTube [9], the risks that recommendation systems pose for public space and individuals can hardly be assessed realistically when they are seen as a 'ghost in the machine' ([19], 257). The demand for 'algorithmic sovereignty' [16] should thus result in the development of practical tools and support for users instead of conjuring up worst-case scenarios.

1.2 A Bit of Philosophy Wouldn’t Hurt

The routine studies on polarisation, radicalisation and echo chamber effects of recommender systems could be avoided by looking at conceptualisations in contemporary philosophy and sociology. It might be advantageous to bear in mind the basic fact that YouTube does not act as a medium in the classical sense of creating and broadcasting the same range of content to a wide public (Footnote 1). YouTube solely redistributes content created by others, whereby recommendation algorithms present users with a selection from the vast archives. To understand the logic of such redistribution, Andreas Reckwitz's theory of personalisation and singularisation as culturalisation might be of use. Reckwitz claims that it is time to change the way we think about the algorithm, since 'we have been accustomed since early modernity to thinking about technology in terms of the industrial-mechanical paradigm of standardisation and thus also in terms of discipline and control' ([21], 166). Since new digital technologies focus on customisation, research must track minute, unstable and subject-driven traces of influence, which probably do not affect society on a large scale in the way that the propaganda of the early industrial age did.

The pitfalls of such a generalised positivist approach can be exemplified by the daily monitoring of data.algotransparency.org, which collects YouTube recommendations based on a list of more than 800 English-language channels and presents the results in a list of the most recommended videos. To obtain the recommendations, AlgoTransparency's monitoring uses the most recently uploaded videos from the list of channels, but no personalisation of users. The observations are limited to the US and generally English-speaking media and their public. The site also presents metrics on how often videos were recommended and by how many channels. Beyond the undisputedly valuable documentation produced by this project, the interpretation of the results proves especially difficult, as the leading question—'What does AI want you to see?'—is not specific enough and can be interpreted in innumerable ways. This example shows that big data collections might not work in cases of algorithmic personalisation, because they accumulate contingencies rather than provide significant evidence.

When searching for a needle in a haystack, there is no need to make the haystack even bigger—instead, research could experiment with smaller, more intelligent data in order to analyse the way modern propaganda reaches specific groups and users. The impressive report by Christopher Wylie demonstrates the extent to which the examination of cultural biases played into the creation and dissemination of propaganda around the globe [22]. To stay up to date with these developments, research has to incorporate the perspective of cultural studies, with its competence in the analysis of choices conditioned by such factors as cultural area, traditions and trends, or by ideological presuppositions. Furthermore, cultural mechanisms are crucial for the understanding of new digital practices, as they reflect the prerequisites of 'social culturalisation' which, according to Reckwitz, infiltrate the very core of digital technologies:

Why is it even necessary to talk about culture here? < … > The digital computer network is a culture machine, which means that its technologies are focused on the production, circulation, and reception of narrative, aesthetic, ludic, or design-based formats of culture. We are all familiar with the thesis that the computer and the internet have brought about an information or knowledge society and have led to a proliferation of information and data. This argument, however, is still too deeply rooted in industrial society’s tradition of thought and its culture of instrumental rationality. It fails to see the most influential aspect of the computer revolution, which is the fact that it has impelled the omnipresence of culture and affectivity. ([21], 169)

This points to an essential characteristic of YouTube's recommender. Its aim is to enhance a user's engagement, which means to entertain users and keep them on the platform as long as possible. According to Reckwitz, affectation is a basic performative cultural quality (see also [23]), extensively used by social media platforms and referred to in research as 'affective computing' ([16], 3). A sound example of such an investigation is offered in [24], which provides an analysis of sentiment patterns in videos from left- and right-wing YouTube news channels.

While creating lists of potentially engaging videos and ranking them, the algorithm makes pragmatic decisions based on metrics of which videos were able to affect users with a similar history and profile. Elena Esposito described the logic of algorithmic personalisation as the rationalisation of contingency, meaning the pragmatic situation of information overload for a user who looks for help and orientation ([25], 235). It is certainly true that 'behind a rhetoric of convenience, the notion of relieving users from "decision fatigue" masks a political project' ([26], 4). However, this political project in the face of digital transformation is not censorship-driven, as it was in the early industrial age, but is built upon a 'kind of parasitic use of intelligence':

< … > the machine recognises and uses the previous selections of the users who had good reasons for their selections, and followed certain meanings < … > this is a mechanism that uses popularity as a sign of affinity < … > this mechanism is always agile and flexible, because it is not bound by any argument or ontology ([25], 241, my translation).

Hence, the reduction of contingency by the algorithm is based on pragmatic rationalisation and the prediction of user behaviour rather than on ideological presuppositions. If so, then the problem involves the question of affinity between certain content based on similar user histories, and the ability of the YouTube recommender to reinforce such tendencies. Due to the lack of a selection ontology and formalised logic, the recommendation lists present a spectrum of possible choices, which varies not only between users but also between different viewings of the same video by the same user across time. This is why the rationalisation of contingency through the recommendation system results in an output which, in turn, holds the risks of contingency for the research.

1.3 The Role of Culture

To understand how cultural factors play into the creation of video recommendations, the understanding of culture has to be examined in more detail. Within studies of algorithmic systems, culture is mostly reduced to a kind of bias, for example of language, gender or origin. A recent example: a study of the algorithmic curation of 'cultural content' in recommender systems for movies, music and literature focuses exclusively on the notion of 'cultural citizenship', establishing a means for the analysis of diversity in recommendations. Despite the very sincere effort to 'enhance both social integration and pluralistic cultural experience across communities' ([20], 2), the categories used to analyse 'underrepresented content' in three massive datasets operationalise the notion of culture on a very simplistic level. In fact, the study reduces the whole complexity of cultural experience to the level of content, which is excluded from the analysis but automatically considered diverse if it fulfils criteria concerning the gender of its author, its genre, or its language of origin (Footnote 2).

However, the reduction of culture to the level of cultural bias would surprise scholars who have spent decades studying cultural phenomena. For them, culture has to do with the form and content of human-created artifacts, with complex historical developments and with the way people reflect their own being in the world. It cannot and should not be reduced to certain features constructed to support the statistical findings of research. Its semantics are ambiguous; there are paradoxes and levels of meaning to be revealed through research—a Glasperlenspiel, something of immanent value. It is hard to find a position between substantial research on content and observation of the way this content is presented through algorithm-driven systems. And yet, we must start the discussion in order to understand how cultural issues are being misused to spread propaganda on social media. To achieve this balance, Andreas Reckwitz's definition of the 'culture machine' cited above could provide a starting point, considering affectation and engagement as the basic cultural properties that recommender systems aim to achieve, from which reflections on personalisation factors (affectation of/engagement for whom?) can be launched. For cultural studies, this would certainly be a superficial definition of culture, excluding the issue of artistic excellence. However, it still permits exploring the concept of genre as a dynamic between the expectations of producers and consumers of cultural artefacts and narratives [27].

Two years after the escalation of the Russian-Ukrainian war in 2022, this issue became even more urgent than before. The occurrence of a war between historically and culturally related countries might not be a unique historical matter. But the extent to which different visions of the historical past and narratives of politically and culturally significant events and persons are being used to legitimise the war makes this event a vivid example of the existential role culture plays in people's lives. A recent study of worldwide support for the sanctions against Russia, based on a massive collection of Facebook posts, demonstrates how significantly opinions differ across the globe [28]. What caused pro-Russian sentiment to grow over the years is the worldwide dissemination of propaganda, an example of which I documented in the dataset on the representation of Krymnash ('Crimea is ours') on YouTube [29] and which I discuss in more detail in the present publication. The shutdown of eminent Russian propaganda channels on YouTube in February 2022 made this dataset historical: 90% of the documented videos are no longer available under the corresponding links (tested on 23 May 2023).

1.4 Contested Methodology

A valuable methodological innovation in the research is a more or less stable procedure of data collection, recently classified as a '"sock-puppet" audit approach' ([30], 1). However, this approach is frequently criticised for the interpretative decisions and manual curation required at every stage. Generally, this concerns the initial setting of the experiment with a primary video seed. Even when starting with a clean browser history and a new IP address, a researcher must begin with a video in a certain language, at a given time and location (the latter can be changed through a VPN but not ignored), in order to collect the recommendations linked to it. These recommendations never remain stable and identical, even for the same user: viewing the same video several times results in different recommendations.

This is also true for different users who watch the same video—they receive different recommendations. The chosen topic, its thematic and structural correlations, and whether it is trending inevitably affect the data collection. The algorithm is subject to change and performs within an environment that is itself constantly changing due to the uploading and deleting of videos by users. Every data collection will be biased, every observation limited to language, situation and time. Every effort to diminish such biases by obtaining more reliable information (for instance, within a brief timeframe or through consistent actions) will nonetheless remain biased due to the limitations imposed. Quite significantly, it will overlook the fundamental objective of algorithmic personalisation, which aims to connect with one particular user, as previously mentioned. Hence, studies on algorithmic personalisation should refrain from overemphasising this factor by collecting large datasets in order to claim objectivity, and should start using datasets in a creative way, involving the personalisation of users on many levels.

User personalisation is a crucial and double-edged parameter for data collection. If the personalisation of users—in Reckwitz's terms, singularisation—goes too deep, it can render the data disparate and the results meaningless. On the other hand, missing personalisation has already given cause for criticism in the past, as in the case of [11] (see the summary of the criticism of this approach in [31]). A special research design might be helpful in making observations on different personalisation parameters, as seen in [32], which confirms viewing history as the most influential personalisation parameter—a valuable insight, still unique in this field of research. Consequently, a moderate approach would be to personalise users based solely on user history, by having them view a list of videos sharing the same stance. This compromise was used in the present study. In the findings sections, I present my observation that six 'sock-puppets' are a good starting point; the outcome, however, depends on the duration of data collection.

There are also insufficient observations with regard to how the range of the list affects a user's choice. It is likely that a large portion of users do not know exactly where the auto-play button, which is turned on by default, is located. Among those who do know, there are certainly many people tired of constantly having to make choices—hauntingly present in our culture—who consciously do not bother switching auto-play off. For these users, YouTube can function as a broadcaster (see [33]). Taking this into account, the first place plays the most prominent role. However, what can be said for the first three, five or even ten recommendations? As explained in [30], videos 'at the top of the recommendation tree generally appear to be significantly more popular, diverse, and less semantically similar to recommendations at the bottom of the tree' (10)—an observation which is confirmed in my dataset, but which should be studied further due to prospective adjustments of the algorithm.

Further findings of the cited paper, aimed at saving research costs, coincide with my observations during the collection of the dataset: for example, the possibility of reusing 'sock-puppets' during data collection, the dominant impact of recently watched videos, and the lack of necessity for a video to be watched in full (stages two and three). However, the most cost-intensive element of research on YouTube recommendations is the annotation of the recommended videos. While automated annotation based on NLP can be used to estimate the proximity of certain content (Footnote 3), human assistance is still needed to grasp minute differences in position, irony and context. For the present project, this was the most work-intensive factor, because it involved annotation by two independent annotators, both of whom had to watch the videos in full length. As almost the whole dataset was in Russian and Ukrainian, language competencies were essential. This factor is crucial for projects with a cross-cultural background, as topics might be segmented differently in different cultural contexts.

Overall, concerning YouTube recommendations, it is hard to construct statistical evidence and to reduce variables while avoiding constructing the investigated filter bubbles through the restricted settings of data collection. However, finding myself in this situation, I feel comfortable as a scholar of culture, because the multiplicity of choices is the existential situation when it comes to understanding how content and context play out together. For my first data collection, the aim was to make a preliminary test of methodology and context. The research questions are as follows: RQ1: Are there filter bubbles in the user recommendations? How can they be described based on the content characteristics of the videos? RQ2: What parameters of the research design (number of users and watched videos, duration of the experiment) influence the outcome?

2 Experimental Setting

2.1 First Data Collection

2.1.1 The Context and the Selection of Initial Videos

For my data collection, I took the example of Crimea's annexation in 2014, which is highly polarising within Russian society. Since the events of the 'Russian Spring', a narrative of the unique qualities of Crimea and of its strategic role not only for the Russian military forces, but also for Russian identity, has been continuously developed in the official rhetoric. Stephen Hutchings and Joanna Szostek gave a precise description of how the close nexus between narratives of Russian nationhood and the legitimisation of the Crimea annexation developed in the course of the events [35]. After the annexation, the question 'Whose Crimea?' quickly advanced to the key issue for the internal identification of political actors. The celebration of the annexation in March 2021, for which many thousands of people gathered in a stadium regardless of the risk of coronavirus infection, demonstrates that the topic remains vital and is being actively instrumentalised to gain more support for the government at a time of falling popularity.

For the first data collection, fifteen initial videos were manually selected by the author using the YouTube search function. The videos had to be relatively short (under 20 min); seven of them represented a clearly positive attitude towards the annexation, seven evaluated the event negatively, and one video represented an ambivalent perspective. Because of the high level of polarisation, it was difficult to identify ambivalent videos. The dataset was collected in three phases during 2020, using snowball sampling (Footnote 4). I adapted the procedure from the DMI Winter School in Amsterdam [31], involving a group of users viewing the same list of videos. In the course of my experimentation, users viewed YouTube videos, and YouTube's recommendations were collected using the browser extension YTTREX (Tracking Exposed initiative; for basic features and a sample analysis see [36], for a description of performance and data gaps see [29]). As opposed to [10] and [32], I decided against taking many topics for comparison, because I intended to go into more detail with the interpretation of content than a broader cross-topic comparison would allow. Assuming that the algorithm would select and then rank content based on the estimation of its engaging qualities, I wanted to collect the maximum number of recommended videos on the topic and then estimate their potential to engage and persuade more users.

During the data collection, user variables were kept as constant as possible for the sake of uniformity of the research design. In order to anonymise users as much as possible, no log-in to pre-existing user accounts was performed, the history and cache of the browser were cleaned (prior to all stages and, additionally during the first data collection, after the first neutral video and after watching the pro-annexation list of videos), and the browser language was set to English (USA). During the first data collection, a neutral video was viewed, and then the browser's history and cache were cleaned. After this, users watched seven videos with positive evaluations of the annexation. After cleaning the browser history and cache again, users watched the last seven videos with a clearly negative attitude towards the annexation.
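
For readability, the watching protocol of the first data collection can be summarised in a short sketch. The actual recommendations were logged by the YTTREX extension while browsing; `watch` and `clean_profile` below are hypothetical placeholders for the manual steps, and the video identifiers are dummies, not the real seed.

```python
NEUTRAL = ["neutral_1"]                        # 1 neutral video (dummy ID)
PRO = [f"pro_{i}" for i in range(1, 8)]        # 7 pro-annexation videos
CONTRA = [f"contra_{i}" for i in range(1, 8)]  # 7 contra-annexation videos

def run_user(watch, clean_profile):
    """One sock-puppet session of the first data collection."""
    clean_profile()        # clean history/cache; English (USA); no log-in
    for video_id in NEUTRAL:
        watch(video_id)    # YTTREX logs the recommendations for each view
    clean_profile()        # cleaned again after the neutral video
    for video_id in PRO:
        watch(video_id)
    clean_profile()        # cleaned again before the contra-annexation list
    for video_id in CONTRA:
        watch(video_id)
```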

After the evaluation of the first dataset proved the existence of a filter bubble on the pro-annexation part of the spectrum, the decision was made to collect more data following RQ3: How big is this bubble? Is it possible to exhaust its limits? 17 videos for the second and 44 videos for the third collection were selected from the top relevant videos of the previous collection (see the overview of experiment stages in Table 1).

Table 1 Overview of the user properties

2.1.2 Annotation and Analysis of the First Data Collection

Data analysis was performed with Python libraries in the Jupyter Notebook environment and with Gephi for visualising data as graphs. In my analysis, I consistently oriented myself on the relation between watched and recommended videos instead of a user-video approach, because the properties of users were kept as constant as possible and the personalisation of users happened solely on the basis of watch history. The watched-recommended approach sheds light on exactly this personalisation factor, taking the sequences of watched videos as the trigger and the recommended videos as the outcome.
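
A minimal sketch of this watched-recommended approach: each logged pair becomes a directed edge that can be exported to Gephi. The CSV file and its column names are hypothetical stand-ins for the YTTREX export.

```python
import networkx as nx
import pandas as pd

# Hypothetical file and columns; the real YTTREX export differs in detail.
df = pd.read_csv("recommendations.csv")  # columns: watched_id, recommended_id

G = nx.DiGraph()
for row in df.itertuples():
    # watched video (Source) -> recommended video (Target)
    if G.has_edge(row.watched_id, row.recommended_id):
        G[row.watched_id][row.recommended_id]["weight"] += 1
    else:
        G.add_edge(row.watched_id, row.recommended_id, weight=1)

nx.write_gexf(G, "watched_recommended.gexf")  # open in Gephi for layout
```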

Testing the formal parameters of the collected data, the repetitiveness of recommendations was evaluated as a parameter indicating clusters or bubbles. After watching pro-annexation videos, users received more repetitive recommendations (the overall percentage of unique recommendations relative to the number of observations is 7% lower). To get a more differentiated view, the number of unique videos across takes was evaluated in relation to the order of the recommendations (within the top three and the first recommended).
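
This repetitiveness measure can be computed, for instance, as follows (a sketch with hypothetical column names; `position` is assumed to be the 1-based rank of a recommendation in the list):

```python
import pandas as pd

df = pd.read_csv("recommendations.csv")  # columns: take, recommended_id, position

# Percentage of unique recommendations relative to all observations, per take
per_take = df.groupby("take")["recommended_id"].agg(unique="nunique", total="count")
per_take["unique_pct"] = 100 * per_take["unique"] / per_take["total"]

# The same measure restricted to the top three recommendations per view
top3 = df[df["position"] <= 3]
top3_pct = (100 * top3.groupby("take")["recommended_id"].nunique()
            / top3.groupby("take")["recommended_id"].size())
```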

As the graph (Fig. 1) demonstrates, the variability among the top recommendations is bigger in takes 21–27, when users watched contra-annexation videos. Overall, the variability among recommendations is lower at the second stage than at the third, with the top recommendations being more repetitive except for video 13. Further analysis proved that this relation was not determined by recommendations from the same authors as the watched videos. In all likelihood, such videos appear below the top three recommended videos. The next hypothesis was that, as the formation of recommendations happens simultaneously with the start of video playback, the channel viewed may impact not the current but the next list of recommended videos. However, there were even fewer matches between the authors of recommended videos and the author of the preceding video than between the authors of recommended and watched videos. This did happen once, at take 24, but it is the only example in our data. As the users in our collections did not view videos from the same channel more than once, this cannot be verified here, but it remains a possibility for a different research design.

Fig. 1 Number of unique recommendations as a percentage of the total number of observations across stages of the first data collection: 1 (ambivalent), 11–17 (positive), 21–27 (critical attitude towards Crimea's annexation)

As the differences between the second and the third stage could not be explained on the basis of formal properties of the watched videos, an evaluation of the content was needed. Two independent annotators watched all recommended videos in full length and evaluated such parameters as relevance, evaluation of the annexation, language, topic and category, based on the codebook. In addition, the presence of topical issues, such as the growing pandemic and various anniversaries, among them the anniversary of the Crimea annexation, was marked in the dataset (Footnote 5). Although time-consuming, this annotation proved useful in facilitating the analysis of the subsequent data collections, because it helped to develop strategies for the automated annotation of the second and third datasets.
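
The chapter does not report an agreement statistic, but with two independent annotators the consistency of such categorical labels could be quantified, for example, with Cohen's kappa; a minimal sketch assuming scikit-learn and hypothetical labels from the codebook:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical evaluation labels for the same four videos
annotator_1 = ["positive", "negative", "ambivalent", "positive"]
annotator_2 = ["positive", "negative", "positive", "positive"]

kappa = cohen_kappa_score(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")  # agreement beyond chance
```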

Looking at the different takes (Fig. 2), it seems that at some stages of the watch history, more relevant videos were recommended. Interestingly, in both the pro- and contra-settings, the sixth video (takes 16 and 26) corresponded to the peaks of relevant recommended videos.

Fig. 2 Relevance across takes in the first dataset

In the second stage (11–17), positive evaluations clearly dominate among the relevant recommendations (Fig. 3), and the ambivalent view is also more frequent than in the third stage (21–27). Negative views of the annexation dominate in the third stage; after watching pro-annexation videos, however, positive attitudes towards the annexation are more frequent than negative ones.

Fig. 3 Evaluation across takes in the first dataset

These tendencies intensify if only the first recommended videos are evaluated (Fig. 4).

Fig. 4 Evaluation across first recommended videos in the first dataset

While the evaluation proves predictable (videos with a positive attitude are recommended after watching pro-annexation videos and vice versa), a glance at the different takes shows that videos with the opposite view appear in the first position as well, although they might be rarer. At certain takes (13, 27), only non-relevant videos were recommended. The tendency is there, but insofar as recommendations represent possibilities of choice, it is impossible to predict which decision a user will make. However, when looking at unique videos in relation to their view of the annexation, videos with a positive view prove more repetitive, appearing more than five times across recommendations and stages. This situation can be clearly observed in the graph of the first data collection (Fig. 5).

Fig. 5 Evaluation and engagement in the first data collection, directed graph (Gephi, watched (Source) to recommended (Target), layout ForceAtlas 2, noverlap, nodes ranked by in-degree between 10 and 50, watched videos as triangles of size 30, colour by evaluation and engagement; see legend)

While the recommendations of the contra-annexation videos on the upper right side of the graph (left panel) barely intersect, recommended videos with a positive attitude towards the annexation are closely interlinked and build a bubble on the left side of the graph. In the case of politically significant issues such as Crimea's annexation, the evaluation of the event goes along with a certain engagement (right panel). The dominance of blue nodes on the left and of red and pink nodes on the right demonstrates the link between pro- and contra-annexation videos and a certain ideological perspective. On the right side, another cluster appears, where contra-annexation videos are framed by recommendations of videos with critical attitudes towards the Russian government and Putin. The evaluation of the video categories indicates the domination of Russian TV productions on the left side. The most recommended videos in this segment are relatively long, professional TV documentaries. On the right side, professional or amateur blogs dominate, and videos from Ukrainian TV channels are recommended more often than videos from Russian TV.

The evaluation of the language annotation demonstrates that language clearly beats location. Although users started with a clean browser history and used a VPN to move their IP addresses to the USA or other countries, only 9 recommendations are in languages other than Russian and Ukrainian, the languages of those to whom the topic of the annexation matters most (featuring, among the relevant videos, only one in English and one in Spanish). Although recommendations in Russian prevail (2841), recommendations in Ukrainian (31) or in mixed languages (73) are still present, despite the fact that videos in Ukrainian were not on the watched list. A test of the co-occurrence of evaluation and language in the data demonstrates that 100% of the Ukrainian and mixed-language recommendations are negative towards the annexation. Among Russian-language recommendations, positive evaluations prevail, but negative and ambivalent attitudes towards the annexation are also present. The ambivalent view is exclusively present in Russian-language videos. Since Ukraine is a victim of the annexation, videos involving Ukrainian-language speakers evaluate the annexation negatively. After watching videos with positive evaluations of the annexation, users are unlikely to be recommended a video in Ukrainian or mixed languages. This demonstrates to what extent the language of the chosen videos impacts a user's experience of YouTube, which—in the case of physical migration—may lead to full isolation from video content in the language of the country the user lives in, and can help to maintain cultural and ideological ties with the country of origin.
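
Such a co-occurrence test can be expressed as a simple cross-tabulation of the annotated data (a sketch with hypothetical column names):

```python
import pandas as pd

df = pd.read_csv("annotated.csv")  # columns: language, evaluation

# Share of each evaluation per language, in percent
table = pd.crosstab(df["language"], df["evaluation"], normalize="index") * 100
print(table.round(1))  # Ukrainian/mixed rows should show 100% negative
```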

Different features of the annotated recommendations reveal the opaque nature of 'echo chambers' or 'filter bubbles': since the YouTube algorithm does not rely on an ontology for creating recommendations, the output contains a mixture of content which might not be relevant to the main topic of the watched videos, but which is somehow related in terms of type of media, language or political attitude. The detection of a filter bubble is therefore linked to a narrower or broader definition of such a bubble (for example alt-right, partisan, leftist etc.). Bubbles might appear or disappear in the data if we change the level of the definition. Generally, in the first dataset, bubbles are easier to observe across different topics from the perspective of the media setting and political engagement. In terms of language, the whole dataset can be classified as one large Russian-dominated bubble with Ukrainian inclusions on one side. However, the pro-annexation, relevant videos from Russian TV channels appear to be the most promoted across the whole dataset: they are frequently recommended at the top of all lists.

2.2 Second and Third Data Collections

The second data collection starts from a selection of videos taken from the first one, using the most relevant recommendations, each recommended more than 12 times. Although the second data collection did not start with the same list of videos to watch, and started after a considerable time interval, we still observe 40% identical results. This demonstrates that the fluidity of YouTube recommendations is relative: there are obviously certain links that remain sustainable over time. With the help of the annotation of the first dataset, I annotated 23% of the unique videos, 13% of which are relevant. For the automated estimation of relevance, the absence of common tags proved to be a good indicator of irrelevance. By contrast, the presence of shared tags does not clearly indicate relevant videos. In the second dataset, positive evaluations of the Crimean annexation prevail.
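
The tag-based heuristic amounts to a one-sided filter: missing overlap flags irrelevance, while overlap alone proves nothing. A sketch with hypothetical names:

```python
def shared_tags(video_tags, seed_tags):
    """Number of tags a candidate video shares with the seed corpus."""
    return len(set(video_tags) & set(seed_tags))

def likely_irrelevant(video_tags, seed_tags):
    """No common tags proved a good indicator of irrelevance; shared tags,
    by contrast, do not reliably indicate relevance."""
    return shared_tags(video_tags, seed_tags) == 0
```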

For the third data collection, the pragmatic decision was to explore the limits of this pro-annexation bubble by collecting recommendations from the topmost relevant recommendations (44 videos, including several watched videos from the second data collection, as they were frequently recommended again). In the third sample, my data collection via the snowball principle reached its limits: using the most recommended relevant videos to collect new data would most likely reproduce the list of watched videos from the previous collections.

2.3 Comparison Between Three Datasets

To follow up on RQ2 during the second and third stages, test samples were formed for the comparison of the three datasets. I separated the first test sample by selecting six users from the first dataset, and the second by selecting the first seventeen starting videos of the third dataset (see Table 2).

Table 2 Description of the dataset and the test samples

Comparing the percentage values in the last column, we see a range between 19 and 40%. The first dataset and its test sample show the largest values, presumably because of the longer period of data collection. In the case of the second and third datasets, collected within a shorter period, having six users watch the same list of videos yields the same ratio of unique recommendations to total recommendations.

The comparison between the first dataset and its test sample demonstrates that decreasing the number of users leads to more disparate recommendations. The evaluation of how the number of users affects the ratio of unique recommendations to total recommendations shows the same tendency across the three datasets (Fig. 6).

Fig. 6 Relation between the number of users and the percentage of unique recommendations relative to total recommendations

In all datasets, the ratio of unique recommended videos to total recommended videos remains stable. In the case of the second and third datasets, which were collected within a shorter period, the sets of recommended videos are more compact. However, even in the first dataset, with its longer period of data collection, a similar ratio appears at a higher percentage level. This demonstrates how data collections are affected by the number of users and the duration of collection. Six users seem to be a reasonable compromise for obtaining a basis for comparative analysis without accumulating too much noise (which is a cost factor for the annotation). A longer duration also seems preferable for reducing the possibility of influencing the workings of the algorithm through repetitive actions. However, this small test of the data collection methodology is exploratory; more tests on different topics and under different conditions are needed.

2.4 Mutual Relationships

The consolidation of the three datasets into one makes it possible to evaluate the most robust relations. After building and filtering the Gephi graph, we can easily recognise nodes with strong reciprocal correspondence (Fig. 7).

Fig. 7 Gephi graph of the consolidated dataset from three data collections, filtered by mutual degree, range 3–13. Nodes with negative or ambivalent views of the annexation are marked by green circles
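
Outside Gephi, the reciprocal core can be approximated with networkx (a sketch; the file name is hypothetical, and out-degree over the mutual subgraph is used to approximate Gephi's mutual-degree filter):

```python
import networkx as nx

G = nx.read_gexf("consolidated.gexf")  # consolidated watched-recommended graph

# Keep only reciprocal pairs: an edge u->v survives if v->u also exists
mutual_edges = [(u, v, d) for u, v, d in G.edges(data=True) if G.has_edge(v, u)]
mutual = nx.DiGraph(mutual_edges)

# Approximate the Gephi filter 'mutual degree, range 3-13'
keep = [n for n in mutual if 3 <= mutual.out_degree(n) <= 13]
nx.write_gexf(mutual.subgraph(keep), "mutual_core.gexf")
```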

Interestingly, not a single video watched at the first stage of the experiment is to be found in this graph: not one of these relatively short videos was recommended again frequently enough. On the contrary, the watched videos from the second dataset, which were among the most frequently recommended and which were selected as the seed, effectively referred back to each other. In this network of mutual relations, only five recommendations (marked with green circles) represent negative or ambivalent views of the annexation. They are linked to the dominant videos in the graph, which promote the positive view of the annexation. The visible nodes represent a selection of videos from the list of 76 watched (otherwise, the videos would not form mutual pairs). A look at the titles shows that long documentaries from Russian TV channels prevail.

Generally, the abundance of videos about Crimea's annexation on YouTube might only be an impression caused by recurrent recommendations. The evaluation of the three data collections shows the intrusiveness of Russian propaganda videos with prevalent positive attitudes towards the annexation. They effectively pass the ball to each other, while the videos of independent bloggers with negative evaluations of the topic do not build strong clusters of relevant videos. Another interesting factor is the dates of video upload. The plot below (Fig. 8) shows peaks of relevant video publications throughout 2014, 2015 and 2016 and, after this, on the annual anniversaries of the event in March, with the biggest peak at the five-year anniversary in 2019.

Fig. 8 Upload dates of relevant unique videos in the consolidated dataset of three data collections
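
A plot like Fig. 8 can be reproduced from the annotated data, for instance by quarter (a sketch with hypothetical column names):

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("annotated.csv", parse_dates=["upload_date"])
relevant = df[df["relevance"] == "relevant"].drop_duplicates("video_id")

# Count relevant unique videos per quarter of their upload date
counts = relevant["upload_date"].dt.to_period("Q").value_counts().sort_index()
counts.plot(kind="bar")
plt.ylabel("relevant unique videos")
plt.tight_layout()
plt.show()
```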

Corresponding to the peaks of interest on anniversaries, TV documentaries were uploaded after being broadcast on TV and fulfil a ritual function, adding little new information to the understanding of the event. The Russian government's investment in a series of similar TV propaganda productions unfolds its full potential on YouTube. After being uploaded, they became available to the global public, built chains and related to each other via recommendations. This way, they promoted Russian propaganda messages around the globe until the shutdown of the propaganda channels on YouTube in 2022. In May 2023, only 24% of the recommendations (10% of the unique recommended videos) were still available.

3 Interpretation of the Outcome

In my analysis, I identified a cluster of recurrent videos based on mutual references, with a strong dominance of pro-annexation videos. I started from a list of videos with different attitudes towards the annexation and included contra-annexation videos in the watched lists. For the subsequent data collections, the relevant recommended videos which appeared at the top of the list were selected, simulating the situation in which users look for information about the annexation and pick only relevant videos from the list of recommendations. In the course of the progressive data collection, recommendations of pro-annexation videos flooded the dataset, being the most recommended across the datasets.

To interpret these results, we should consider the interpretative choices made during the experiment. Every take started with a video selected by the criterion of relevance, whether from the search results or from the range of previous recommendations. This behaviour corresponds to the situation in which a user looks for information concerning a certain topic and decides to pick the most relevant result. The observed results are therefore influenced by this decision, creating a filter bubble of a certain kind, which would most likely not appear under different conditions—for example, if random videos were chosen, or if videos irrelevant to the topic but frequently recommended were included in the seed.

Although restricted to these specific experimental conditions, the results still contain remarkable findings. First, it was possible to identify numerous videos relevant to the topic, but predominantly with a positive evaluation of Crimea's annexation. Second, the number of these videos was still limited, while reciprocal recommendations took place. The time setting of the experiment, with longer pauses between the three stages, makes these reciprocal relations between watched and recommended videos quite interesting. There is obviously a kind of stable relation which makes videos with similar content relate to each other, although the algorithm does not evaluate the content on the basis of an ontology.

In the case of Crimea's annexation on YouTube, the most promoted videos across the collected dataset are long and professionally assembled documentaries representing a positive attitude towards the annexation, with only a few of them critical of the event. One fact to highlight is that these videos, produced by Russian broadcasting companies, were available, while the choice of relevant content on the opposite side was small. This disproportion can be explained by research findings demonstrating how Krymnash euphoria developed within Russian society and was used by political actors. Mikhail Suslov delivered a concise report on the media debates surrounding the event of the annexation and on the rising tendency to use the slogan 'Crimea-is-ours' in a sarcastic sense [37]. According to Suslov, the latter offers an example of how 'humour "ceases to struggle"', while sarcasm points at a 'perceived lack of agency' and at the 'tradition of staging one's powerlessness and alienation from politics' by bloggers with views opposing the mainstream ([37], 601–603). As demonstrated above, contra-narratives are not closely interlinked and might be regarded as less powerful, because they relate to the anti-Putinist agenda rather than to relevant contra-annexation videos. I would, however, not claim a lack of agency on the contra-annexation side. There are several productions with high view counts which are critical of the annexation and are recommended across the dataset. Obviously, in the end, the choice rests with the user.

A user with an affinity to the topic can easily identify the relevant videos, but mostly those with positive attitudes towards the annexation, because the algorithm evaluates them as engaging. To examine this result soberly, it helps to consider it within the framework of the concepts of singularisation and the pragmatic reduction of contingency discussed earlier. Affectation does not exclusively mean transmitting a simple emotional 'yes!' or 'no!' to the user; it is rather a complex process of presenting attractive form and content. The engaging qualities of the pro-annexation videos are reinforced by their high inner complexity—a key feature of singularities according to Reckwitz. Crimea's advancement to the status of a 'Russian national fetish', as outlined by Constantine Pleshakov in 'The Crimean Nexus' ([38], 93–111), provides a benefit to the creators of pro-annexation content. While the creators of contra-annexation content can hardly add anything to the simple fact of the violation of international law, the pro-annexation videos develop narrative lines from the alleged baptism of Prince Vladimir in 988 in Chersonesus, through the war with the Crimean Khanate, the Crimean War and the transfer of Crimea to Ukraine by Nikita Khrushchev, to the event of the annexation. From this point of view, the present conflict is regarded as a continuation of former wars in the region of Crimea, literally as the 'same war'. Even the criticism of Crimea's annexation is classified as 'traditional Russophobia', the continuation into the present of the spreading of negative images of Russia in the West—a broad topic which has become extremely popular on Russian TV in recent years (on the ironic usage of this striking term, see [39]).

Therefore, the example of the annexation of Crimea illustrates how sensitive algorithmic personalisation can be as a singularisation of users towards content which itself exhibits characteristics of singularisation. YouTube can be viewed as a perfect marketplace in which 'cultural' and 'automated' singularisation ([21], 176) converge and interact. The network of complex messages and associations around the simple—and unfortunately not unique—issue of military aggression results in a culturalised narrative with engaging qualities, which makes such productions a perfect fit for the medium of YouTube. On the platform, they promote the complexity of the question and create an information overload for the single user, who will most likely not receive suggestions with a different attitude unless they consciously search for them. An overabundance of redundant details makes these productions indistinguishable: after watching several videos of this kind, users are unlikely to remember the line of argumentation or the differentiating details, as the videos are similar in form and content. This way, the network of videos continuously performs the culturalisation of the issue and creates a strong hermeneutic-narrative framework in which the singularities of Crimea, the event of the annexation and Russian identity support each other.

The complexity and opacity upon which this foundation of meaning is based serve to obscure the essence of the case and make it highly meaningful to the portion of the Russian population which painfully experiences the lack of a national identity inside and beyond the Russian borders. Looking for information on Crimea's annexation on YouTube, such users might seek interpretations different from those current in their countries of residence. Viewing a few videos in Russian configures the virtual environment of users into a more than 90% Russian-language environment. This exposes them to Russian propaganda, against which they possess little resistance in comparison with users who live in Russia and experience not the virtual, glorifying image, but the reality of living in Russia.

4 Conclusion

The paradoxical coincidence between the development of a 'hyper-globalised media environment' and the emergence of 'recursive nationhood' in the post-Soviet space, observed by Stephen Hutchings ([40], 126), takes, inside the algorithm-driven space of YouTube, the form of a nexus between cultural and automated singularisation, which according to Reckwitz provides the prerequisites for the establishment of digital neo-communities ([21], 188–191). Examinations of the global dissemination of Russian propaganda could thus benefit from the involvement of cultural studies, with its expertise in analysing the interplay between content and context, as well as in examining migrant communities and their situation in the global cultural context (on the role of the language factor in establishing a Russian cyber empire within post-Soviet cyberspace, see [41]; on the nexus between the use of the Russian language and the construction of national identity in migrant communities, see the contributions in [42]).

However, this relationship might not be straightforward, and it can hardly be grasped in terms of the exposure of certain groups to certain content, as implied by the paradigm of mass-media communication from the broadcasting era. Based on recommendation lists, it is extremely hard to draw such conclusions, because the in- and output of AI-generated recommendations are always linked to a certain behaviour of a user and invoke a comparison with the choices of numerous other users behind the scenes. This complex relationship requires a shift in our understanding of communication between humans and algorithms. As Elena Esposito claims in her recent publication, AI is less a matter of 'intelligence' than of 'communication' ([43], 2). 'Artificial communication' changes the way we perceive communication, as it means not the transmission of a message from A to B, but only that B classifies the message as something meaningful ([43], 7). According to Esposito, artificial communication frequently takes the form of lists or rankings ([43], 19–43), as in the case of the YouTube recommender algorithm.

Related to the case of YouTube recommendations in a broader sense and to the case of 'Krymnash' in a narrower one, artificial communication takes place within a certain context. Expertise in cultural factors and language helps to configure the primary settings as well as to interpret the output. When starting to 'speak' with the algorithm, we must do so in a meaningful way in order to get meaningful results. However, these results mean something only within a limited setting, as they are singularised and unique in form and time. Having a range of six or more users with a similar configuration helps to reduce the contingency and to evaluate the more frequent occurrences within the range of possibilities. However, the choices of 'real' users outside a modelled situation still need to be examined in an empirical setting and might be a prospect for further research.