The Enemy Hates Best? Toxicity in League of Legends and Its Content Moderation Implications

In recent decades, many sectors of our society have been digitized, and much of our life has moved to cyberspace, especially in terms of entertainment. Users meet, relate, and cooperate in the new public space that is the internet and form digital communities. Video games play a leading role in the formation of such communities. However, these communities also present antisocial behaviors, ranging from disruptive actions to harassment and hate speech. Such behaviors, encompassed under the umbrella term toxicity, are a major concern for both users and those in charge of moderating these spaces. This article focuses on toxicity in today’s leading online video game League of Legends. Three hundred twenty-eight matches were reviewed using a system of two judges to study the prevalence of these problematic behaviors. We find that 70% of matches were affected by disruptive behavior. Nevertheless, only 10.9% of the analyzed matches were exclusively affected by downright harmful behavior. In our view, the results have relevant implications for content moderation policy that are also addressed in this paper.


Introduction
According to the International Telecommunications Union (ITU) (2021), the percentage of people using the Internet is already at 90% in countries that are considered developed by the United Nations; coverage is 87% in Europe and 81% in North America. Globally, 63% of the world's population has access to the Internet. The Internet has thus become part of our daily lives. While watching television is still the main leisure activity for Americans, the time spent on computers and video games is ever-increasing, especially among the younger population (Bureau of Labor Statistics, 2022). It has been estimated that, on average, North American teenagers spend between 4 and 6 h a day on digital media (web navigation, social media, gaming, texting). This figure has been steadily growing over the past two decades and has been accompanied by a decrease in the consumption of traditional media in the same age bracket (Twenge et al., 2019). This displacement has not only been observed for media consumption. Other areas like shopping have seen a shift from physical interaction to online transactions, thus reducing the time invested in traveling to make purchases (Le et al., 2021).
With digitization, new spaces for human interaction have emerged (Lupton, 2015). Given its degree of adoption, the Internet can be considered a new type of public space, which is in turn composed of different cyberplaces (Miró-Llinares & Johnson, 2018) where subjects meet and form communities. Some authors suggest that these communities can act as third places (Steinkuehler & Williams, 2006), neutral grounds where individuals are free to come and go as they please with no obligations and little involvement with other participants (Oldenburg, 1999). Indeed, the creation of these spaces, including those that are part of video games, may be protected under freedom of expression (Brown v. Entertainment Merchants Assn. et al., 564 U.S. 786 (2011)). It has been argued that players' speech may be safeguarded by this right as well (Balkin, 2004; Jørgensen & Mortensen, 2022).
However, the Internet also facilitates the dissemination of harmful content (ECHR, Delfi AS v. Estonia § 110). The Internet does not discriminate between users' expressions (ECHR, Engels v. Russia § 30) and can be used to offend and harm others, just like any other technological tool. Indeed, online harassment, revenge porn, and defamation have all been matters of concern and have triggered discussions about how to tackle malicious speech without hampering the benefits of online communication (Citron, 2014).
The perturbing history of "a rape in cyberspace" shows that such behaviors have existed in online game communities for a long time (Dibbell, 1994; Suzor, 2019). Other controversies, such as Gamergate, exemplify how video game communities can create systematic harassment toward individuals who attempt to make video games more inclusive for women (Salter, 2017). Professional players also exhibit problematic behaviors. A considerable number of renowned players in the Esports sphere have been suspended for toxic behavior (Tseng, 2020). Riot Games, the owner of League of Legends (LOL), has banned entire LOL teams for similar reasons (Plunkett, 2013). Such problematic behaviors in video game communities have received growing academic attention in recent years (see: Mora-Cantallops & Sicilia, 2018a; Shen et al., 2020; Beres et al., 2021; Canossa et al., 2021).
In this paper, we attempt to contribute to this body of knowledge by measuring the prevalence of toxic behavior in 328 competitive LOL matches. To this end, we offer a definition of toxic behavior, also known as disruptive behavior, creating a tailored taxonomy for LOL that discriminates between conducts that affect different values or interests. The results show that 70% of the games were affected by some kind of toxic behavior and that 30% of all players who participated in them committed some kind of toxic behavior. However, the most serious behaviors (hate speech, threats, etc.) are very rare, with most of the toxicity coming from insults or complaints about the performance of teammates. As addressed in the discussion, the prevalence and characteristics of the analyzed behaviors might have important implications regarding the moderation of these spaces by game developers.

Online Social Gaming: League of Legends
The video game industry has benefited the most from the digitalization of society. In fact, in 2020, this industry generated around 177.8 billion dollars (Wijman, 2021). Video games have also changed, despite being eminently digital. They have gravitated toward the Internet and have, in turn, become more social (Rimington et al., 2016). The gamer's experience has ceased to be marked by isolation and has acquired an eminently social character (Yee, 2014). This means that players experience various forms of in-game interaction (i.e., within the game itself): They are part of digital communities, travel through different platforms (Twitch, Youtube, specific forums, etc.), and engage in professional competitive environments (Esports). All video games past a certain threshold of popularity generate fan communities. However, games with high in-game interaction tend to generate especially rich communities and have drawn the interest of researchers for more than a decade. This is most notably true of massively multiplayer online role-playing games (MMORPG) (Cole & Griffiths, 2007; Johnson et al., 2009; Schiano et al., 2014; Shen et al., 2020; Yee, 2006).
In recent years, multiplayer online battle arena (MOBA) games have received a lot of attention, both among players and academics. This particular gaming genre is a subgenre of real-time strategy games in which two teams, usually formed by five players, face each other (Mora-Cantallops & Sicilia, 2018a). These games emphasize competitive gameplay and rank players according to match results, demanding a high level of strategic and mechanical skill (Johnson et al., 2015;Kou et al., 2018). These elements require players to get both in-game experience and acquire knowledge outside the game (e.g., forms of play, terminology). Players also agree on the most efficient tactics available, also known as the metagame or meta (Donaldson, 2017), resulting in a rich social exchange taking place both inside and outside the game.
League of Legends is the main exponent of contemporary MOBA games. Developed by Riot Games, the game exceeded 180 million active users in October 2021, according to developer data (Zaragoza, 2022). As stated above, this type of video game transcends its medium. In fact, League of Legends has been the most-watched video game on the streaming platform Twitch since 2019, both in terms of users and watch time (TwitchTracker, 2022). In addition, LOL has an intensely competitive landscape. The World Tournament achieved an average audience of around 30 million viewers in 2021, and its final match peaked at around 73 million viewers (Liber, 2021).

The Issue of Toxicity
As mentioned above, League of Legends is a team game where a matchmaking system assigns users to one of two teams (Mora-Cantallops & Sicilia, 2018b). Players are selected based on a numerical score, called MMR (i.e., matchmaking rating). The MMR considers previous game results in order to balance the gameplay experience for players. However, the algorithm on which the MMR is based is not publicly available, and players cannot consult their MMR score (Kou et al., 2018).
In addition, the game has a strong tactical component, so players must communicate with each other. This necessity increases as players' skill levels progress (Monge & O'Brien, 2022). There are two main tools to satisfy this need: during character selection, a waiting-room style chat is used to discuss strategic choices; during the actual match, a small chat window on the game interface allows players to give tactical commands. While these tools may allow for cooperation between team members and even rivals, they are also the vehicle for toxic behavior. Such behavior can range from complaints about other players' performance to insults and harassment (Neto & Becker, 2018).
League of Legends users frequently complain about other players' behavior, especially their teammates' (Grandprey-Shores et al., 2014; Neto & Becker, 2018). Complaints occur most often in a competitive game mode called ranked, where game results earn or lose player points (LP). These points determine their invisible ELO and their visible player rank (iron, bronze, silver, gold, platinum, diamond, master, grand master, and challenger) (Kou et al., 2018). In the community, a player's skill level is expressed publicly by their rank. Moreover, players who reach the challenger rank are considered hirable by semi-professional teams. In this competitive context, players are concerned about the toxic behavior of their peers, because it degrades the player experience and is connected to negative results (Grandprey-Shores et al., 2014). In addition, Riot Games has expressed its concern about these problematic behaviors on several occasions (Burrell, 2020; McWhertor, 2012).
Although complaints are numerous, there is no consensus on the definition of toxicity in the literature (Kordyaka et al., 2020; Kou, 2020). Toxicity is mainly used as an umbrella term, describing a wide range of negative behaviors, including harassment, griefing, and cheating (Adinolf & Turkay, 2018). These behaviors, in addition, highly depend on the immediate and cultural context (Beres et al., 2021). Toxicity has been defined as "the use of profane language by one player to insult or humiliate a different player in his own team" (Märtens et al., 2015) and "a denominator for aggressive and abusive interactions or relationships, both online and offline" (Deslauriers et al., 2020). As stated above, toxicity has been linked to the term griefing. A griefer can be described as a "player who derives their enjoyment from performing actions that detract from the enjoyment of the game by other players instead of just playing the game" (Mulligan et al., 2003). The Fair Play Alliance (henceforth FPA) has focused on analyzing "disruptive behavior" and abandoned the term toxicity. This term comprises a range of different conducts that mar a player's experience or a community's well-being (FPA, 2020). In addition, disruptive behavior "refers to conduct that does not align with the norms that a player and the community have set". Similarly, some scholars (Boudreau, 2019) support the use of the term transgressive gameplay to define acts against the norms of the community.
Toxic (or disruptive/transgressive) behavior is a broad term that encompasses disapproved actions affecting the gameplay. However, the definitions above show that in-game mechanics and norms shaped by developers and players are key to concretizing this issue (Deslauriers et al., 2020;Foo & Koivisto, 2004;Jørgensen & Mortensen, 2022;Kou & Nardi, 2014). While common expressions like insults or taunts are presumably disapproved in most communities as well as in the offline world, actions can also disrupt other players' experience in specific games (Busch et al., 2015). For instance, abandoning a match before it has ended affects the other users' enjoyment of MOBA, because teams are likely to underperform if a member is absent (Canossa et al., 2021). In comparison, the log-out of any given player in an MMORPG does not represent a considerable gameplay disruption, since players are free to interact with the game world alone (e.g., Fallout 76, World of Warcraft). Hence, a viable definition of toxicity for its detection in LOL must be tailored to this game specifically.
LOL's End User License Agreement (EULA) broadly defines conducts that are deemed unacceptable by Riot Games. The EULA punishes a wide spectrum of behaviors. Some of them could be categorized as "violent communication" (Miró-Llinares, 2016) and even be subject to criminal prosecution in most European criminal legislations. This is true, for instance, of the rules that sanction "harassing, stalking or threatening other players or Riot Games employees" and "transmitting or communicating any content which Riot finds offensive to players" (Riot Games, 2021).
While most rules echo norms of state criminal codes, some behaviors sanctioned in the EULA are tailored to the necessities of the game. Riot forbids any technical tool that enables players to cheat, for example by installing third-party mods or hacks. Furthermore, LOL players are punished for throwing (i.e., ostensibly giving up on the match), disconnecting, or leaving a game.
The EULA is detailed and expanded upon in the reporting system, which Riot establishes to help players notice deviant behaviors among users (Riot Games, 2013). Players can currently be reported for the following reasons:
• Insulting, harassing, or offensive language directed at other players
• Hate speech such as homophobia, sexism, racism, and ableism
• Intentionally ruining the game for other players with in-game actions such as griefing, feeding, or purposely playing in a way to make it harder for the rest of the team
• Leaving or going AFK at any point during the match being played
• Unnecessarily disruptive language or behavior that derails the match for other players
• Inappropriate summoner names
Besides, Riot clarifies that the following actions should not be considered deviant behavior:
• Playing poorly but still trying to win
• Strong language that does not insult or demean other people
• Choosing unusual champions, building unusual items, or experimenting with new ideas that do not match the current meta
Reporting is implemented inside the game through seven categories that are rather similar in their definition. These are displayed to players on the reporting screen with little to no context, leaving it up to them to interpret or research their definitions on the Riot Games support website. As Table 1 shows, three of these categories refer to in-game, non-expressive behaviors (i.e., leaving the game, intentional feeding, and cheating). Verbal abuse, hate speech, and offensive or inappropriate names, on the other hand, are behaviors based on verbal (i.e., written) expression. Finally, negative attitudes can be understood as a hybrid category that encompasses both in-game actions (e.g., griefing) and expressions related to the poor course of the game, demeaning a team's performance.
Considering the possible consequences of players' actions, we can distinguish "disruptive" from "harmful" behavior. Actions such as going "AFK," "intentional feeding," "negative attitude," and "cheating" are disruptive in the sense that they can negatively affect a player's gameplay, that is "the actions performed by the player when involved in a challenge" (Guardiola, 2019). Since gameplay is the result of the "emotionally-charged interaction between the player and the game components" (Guardiola, 2019), the highlighted behaviors can result in making the game too challenging for the affected player, breaking the balanced sense of struggle that games should pose to be enjoyable (Costikyan, 2002).
Harmful behavior, on the other hand, is defined as behavior that can cause "significant emotional, mental or even physical harm to players or other people in the player's life such as family and friends" (FPA & ADL, 2020). Some of these behaviors, like hate speech or insults, are already sanctioned by state criminal codes or tort law and are not protected by the principle of freedom of expression contemplated in the European Convention on Human Rights (ECHR, Féret v. Belgium, §§ 75-78). Even if their particular characterization as harmful can be discussed (Feinberg, 1985), the term, as introduced by the FPA, helps to highlight a range of conducts that surpass the degree of seriousness that disruptive behaviors entail. This term, therefore, encompasses speech or acts that (a) are likely to offend others (Feinberg, 1985), (b) constitute a direct or indirect incitement to violence, or (c) amount to "any behavior that may be considered offensive or demeaning to society even if it is not directed at a specific person" (Miró-Llinares, 2016).
The degree of prevalence of this wide range of conducts is unknown despite being problematic for Riot. According to a recent update by Riot, "only 5% of players are consistently disruptive." This means that this cluster of players is the most complained about and the remaining users only "get tilted every once in a while" (Timttamoster, 2022). This statement contradicts previous claims by Riot's lead designer, who had stated that most toxic behavior was attributable to average players instead of trolls (Maher, 2016). In addition, the relevant literature has not been able to paint a precise picture of toxicity in LOL. Based on datasets extracted from the game DOTA, a MOBA and LOL predecessor, Märtens et al. (2015) developed an automated system for the detection of toxicity in chats of multiplayer online games. The study focused mainly on the detection of toxic verbal expressions, excluding other antisocial behavior such as leaving the game or more complex behaviors. Despite this restriction, the authors found that at least one toxic expression was present in 63% of the analyzed games' chats.
A similar methodology was used by Kwak & Blackburn (2015) to approximate the characteristics and distribution of toxic expressions throughout matches, in this case, based on data from LOL. Their automatic detection system allowed them to appraise how certain toxic terms were more likely to appear as the game progressed. Moreover, they observed the importance of complaints and insults regarding the performance of other players on the team. This last conclusion coincides with the findings of Neto & Becker (2018), who used a similar database but applied their own topic modeling system. Their database, however, consisted of games in which a player had previously reported another user.
In 2019, ADL carried out a US national survey about the social interactions and experiences of video game players. The study found that "nearly three quarters (74%) of online multiplayer gamers have experienced some form of harassment in online multiplayer games." In the case of LOL, ADL reported that "three-quarters of League of Legends players had also experienced in-game harassment, with 36 percent experiencing frequent harassment" (ADL, 2019).

Content Moderation in Videogames: Riot Facing Toxicity
Online communities are governed by large platforms and intermediaries, which detect, assess, and intervene in users' speech in an effort to provide content moderation (Gillespie et al., 2020). Content moderation requires an infrastructure that coordinates human moderators and artificial intelligence systems to enforce platform rules. This complex infrastructure allows the members of these communities to steer away from harmful or illegal content. However, it has sparked a debate on the types of content that should be removed from platforms and the overall accountability of these measures (Busch et al., 2015;Gillespie, 2018;Keller, 2018;Suzor, 2019).
Content moderation is present in online multiplayer games. Following Balkin's early thoughts (2004), game developers control user speech in at least two ways: through game code and through an End User License Agreement (EULA). Firstly, in accordance with Lessig (1999), the architecture of virtual space is a factor that shapes users' behavior. This particularly informs verbal expressions in gaming. Game design can affect user choice by preventing players from performing malicious acts, like murdering NPC children (e.g., The Elder Scrolls: Skyrim, Red Dead Redemption II) (Jørgensen & Mortensen, 2022). Furthermore, when chats are deployed to enable users to text each other, filters can be introduced to prevent players from using certain words (e.g., LOL).
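The chat filters mentioned above can be illustrated with a minimal sketch. It assumes a simple blocklist approach in which blocked terms are masked before a message is displayed. The term list, the function name `mask_blocked_terms`, and the masking strategy are hypothetical and chosen for illustration only; the text does not describe how the filter used in LOL actually works.

```python
import re

# Hypothetical blocklist for illustration; not the list used by any real game.
BLOCKED_TERMS = {"noob", "trash"}

# One case-insensitive pattern that matches any blocked term as a whole word.
_BLOCKED_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in sorted(BLOCKED_TERMS)) + r")\b",
    flags=re.IGNORECASE,
)


def mask_blocked_terms(message: str) -> str:
    """Replace every blocked term with asterisks of the same length."""
    return _BLOCKED_PATTERN.sub(lambda match: "*" * len(match.group(0)), message)


if __name__ == "__main__":
    print(mask_blocked_terms("you are TRASH"))
    print(mask_blocked_terms("nice play"))
```

A word-level filter of this kind only captures explicit vocabulary. It cannot account for context, which is one reason why architecture-based control is complemented by the rule-based control described next.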
Secondly, games affect player attitude through a binding agreement that must be signed before entering, known as the End User License Agreement (EULA). EULAs often contain community guidelines that identify the community's values. They constitute legal grounds for account termination (Suzor, 2019). Moreover, the enforcement of the EULA protects developers from vicarious liability due to potential tort claims regarding content posted by their users (Fairfield, 2009). 5 In addition, keeping their spaces safe helps sustain and grow communities (ADL, 2019; Sparrow et al., 2021). In this sense, if players experience harassment or other variants of toxic online behavior, they may abandon the game, which goes against developer interests (ADL, 2019). That said, enforcement of content moderation policies also decreases the player base, at least affecting the players that the measures target for being toxic (Sparrow et al., 2021). Content moderation involves the application of a rule system and a series of sanctions for those who infringe on community rules. The best-known sanction is the suspension of a player's account, also dubbed banning. A LOL account may be suspended when the EULA is breached. However, these rules are not defined through an exhaustive list of examples of inappropriate behavior (Riot Games, 2021). While banning may be necessary in extreme cases, the industry has highlighted some problems associated with this measure, regarding disproportion and possible damages to freedom of speech (Balkin, 2004; Chelsea, 2017; Meehan, 2006). Consequently, there are alternatives to banning, such as temporary bans or cooldowns before being able to join a match again, which have proven to be remarkably effective in other contexts (Matias, 2019; Lewington & Committee, 2021).
Defining, establishing, and enforcing punitive measures is challenging, and once a game manages to build a substantial player base, the scale of necessary moderation activities increases significantly. This can pose an unmanageable workload to Trust and Safety teams. The industry often claims that it is impossible for human moderators to control chat feeds and other behaviors due to the quantity of content that needs reviewing (Sparrow et al., 2021). This was one of the reasons why Riot deleted the LOL Tribunal, as they considered it to be "slow and inefficient" (Draggles et al., 2018). Instead, Riot has focused on developing AI tools that can carry out most moderation tasks. Even though Riot's terms of service are not explicit on this matter, we can presume that AI systems conduct all moderation tasks, and human moderators focus on reviewing player reports and appeals. However, it is not likely that these tools will replace human intervention completely (Zachary, 2019). AI systems require human feedback to be improved and continuously trained (Lewington & Committee, 2021). In addition, ethical concerns are voiced from outside the industry, claiming that AI systems are not suitable to detect the more ambiguous offenses, which require a deep knowledge of the context (Duarte & Llansó, 2017; Shenkman et al., 2021).

Data and Method
During April and May 2022, ElmilloR and Riot Games organized the "SoloQChallenge," a League of Legends competition that involved Spanish streamers familiar with LOL. Seventy-four players across two categories partook in the competition. Categories were based on the players' previous experience in the video game. Two player categories were created: low Elo for streamers with relatively little experience in the videogame (max. platinum division) and high Elo for professional and ex-professional players and streamers specialized in League of Legends who play in the four highest divisions, that is, the 98th percentile of players in Europe (OP.GG, 2022). Riot provided each streamer with a new account, without an MMR or previous history, but at the necessary level to be able to play matches competitively. Participants had to use these accounts to play and stream ranked games to gain LP and move up through the divisions. The player with the highest score within their category was declared the winner of the competition.

5 According to Sect. 230 of the Communications Decency Act from the USA, game developers would be exempt from this type of liability in most cases. But according to the legal framework of the European Union, game developers can be held liable for user comments if they do not comply with the E-commerce
Participants in the competition did not play against each other, but against other players playing ranked games. Their level of experience can be assumed to be comparable due to the division system. The equilibrium of matchups makes the SoloQChallenge matches very similar to those that players encounter normally. In addition, the absence of MMR and account history in the competition eliminated matchup bias and fostered free player encounters. Given the availability of the matches in VOD format, the SoloQChallenge was a unique opportunity to measure the prevalence of toxic behaviors in LOL.
Our sample is composed of game evaluations from the low Elo category, based on the assessments of two judges. Judge number 1 reviewed the first 10 games of 31 participants and the first 9 games of 2 of the participants, totaling 328 games. Judge number 2 followed the same selection criteria and analyzed 198 games in total.
Both judges had previous experience with the video game and watched the matches independently. They annotated verbal behavior according to the coding scheme (Table 2) using the left-column descriptors as category tags. To perform this task, they were provided with a table of behavioral categories as a template and a list of videos to watch. They were also provided with a definition of each of the categories and were instructed on their application. The judges also had to collect the results of the game and information regarding users who had exhibited any categorizable behavior, as well as the minute of the video in which the behavior had occurred. This allowed the researchers to review and verify annotations. To determine inter-judge agreement for the 198 games analyzed by both judges, we calculated Cohen's Kappa index (Cohen, 1960) for both the presence of problematic behavior in a given match (K = 0.74, p-value < 0.001), and whether the game was affected only by harmful behavior, only disruptive behaviors, or both types of behavior (K = 0.64, p-value < 0.001). Both coefficients yielded substantial inter-judge agreement (Altman, 1999), in the first case close to excellent agreement (Fleiss et al., 2003). We posit that this high level of agreement speaks to the quality of the evaluations, even in cases where only assessments from judge 1 are available. To improve data reliability, one of the researchers acted as a third judge, deciding in case of disagreement between the judges. Twenty-two of the actions noted were recategorized by the third judge, generating the final dataset on which data analyses were executed.
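For readers unfamiliar with the agreement index, Cohen's Kappa compares the observed proportion of agreement between two judges, p_o, with the agreement expected by chance, p_e, as K = (p_o - p_e) / (1 - p_e). The following minimal sketch shows the calculation. The function name `cohens_kappa` and the toy labels are illustrative assumptions; they are not the study's data and do not reproduce the coefficients reported above.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("Both judges must rate the same non-empty set of items.")
    n = len(labels_a)

    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the judges' marginal proportions, per category.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both judges used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Invented toy annotations for six matches (not the study's data).
    judge_1 = ["none", "disruptive", "both", "harmful", "disruptive", "none"]
    judge_2 = ["none", "disruptive", "disruptive", "harmful", "disruptive", "none"]
    print(round(cohens_kappa(judge_1, judge_2), 2))  # 0.76
```

In the toy example the judges agree on five of six matches (p_o = 30/36) and chance agreement is p_e = 11/36, which gives K = 19/25 = 0.76.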
The categories were created by the researchers based on Riot Games' reporting categories and the work of Neto & Becker (2018), who identified the following main categories of negative behavior in the LOL game chat through topic modeling: complaints, arguments, insults, and taunts. Consequently, modifications were made to the company's report categories (Table 2, right column). Firstly, the category "inappropriate or offensive names" was not analyzed, as it was not possible to find a solid definition of what should be understood as such. Similarly, the category cheating was not contemplated as it was considered highly difficult to determine whether a player was cheating by mere visual inspection of the matches. For the same reason, the category "griefing" was not used for content evaluation. The category negative attitude was specified to refer only to complaints about other players' performances. The updated report categories are summarized in Table 2 (left column). We included a category labeled "other disruptive behaviors" so that judges could record obvious disruptive behavior that was not stipulated by the coding scheme.

Table 2 Annotated verbal behavior according to the coding scheme
a Hate speech is defined as "the use of one or more particular forms of expression -namely, the advocacy, promotion or incitement of the denigration, hatred or vilification of a person or group of persons, as well any harassment, insult, negative stereotyping, stigmatization or threat of such person or persons and any justification of all these forms of expression -that is based on a non-exhaustive list of personal characteristics or status that includes 'race', colour, language, religion or belief, nationality or national or ethnic origin, as well as descent, age, disability, sex, gender, gender identity and sexual orientation"
Secondly, relying on the taxonomy of violent and hateful communication developed by Miró-Llinares (2016), the report category "verbal abuse" was divided into two categories: "Insults" are defined as expressions that affect players' reputation and involve the use of swear words, while "wishes for death or serious harm" elevates the threshold of the seriousness of the conduct considering the reference to physical harm.
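The grouping of the coding scheme into the two macro-categories can be sketched as a small lookup. The tag names below are hypothetical shorthand for the categories described in the text, and the assignment of each tag to the disruptive or harmful group follows our reading of the definitions above rather than a verbatim copy of Table 2.

```python
# Hypothetical shorthand tags for the annotated categories.
DISRUPTIVE = {"complaints", "afk", "intentional_feeding", "other_disruptive"}
HARMFUL = {"insults", "death_wishes", "hate_speech"}


def classify_match(annotated_categories):
    """Return 'none', 'disruptive_only', 'harmful_only', or 'both' for one match."""
    categories = set(annotated_categories)
    unknown = categories - DISRUPTIVE - HARMFUL
    if unknown:
        raise ValueError(f"Unknown category tags: {sorted(unknown)}")

    has_disruptive = bool(categories & DISRUPTIVE)
    has_harmful = bool(categories & HARMFUL)
    if has_disruptive and has_harmful:
        return "both"
    if has_disruptive:
        return "disruptive_only"
    if has_harmful:
        return "harmful_only"
    return "none"


if __name__ == "__main__":
    print(classify_match(["complaints", "complaints", "insults"]))  # both
    print(classify_match(["afk"]))                                  # disruptive_only
    print(classify_match([]))                                       # none
```

Repeated tags within a match do not change its classification, because the match-level measure only records whether each type of behavior is present.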

Results
We detected some type of problematic behavior in 229 out of 328 matches (70%). No behaviors of interest were found in the remaining 30% of matches. The analysis of different types of behavior shows that 45.9% of matches are affected exclusively by disruptive behaviors, while 43.2% are affected by both disruptive and harmful behaviors. We find that only 10.9% of the analyzed matches are affected by harmful behaviors exclusively (Table 3, right column).
Zooming in on the individual players, we find that 29.6% of players (398 out of 1343) committed some kind of toxic behavior during the matches. Note that 72% of reported users lost the game and only 28% ended up winning. The analysis of different types of behavior shows that most of the reported players (59%) only engage in disruptive behavior. However, 25.1% engage only in harmful behavior and 15.8% engage in both types of behavior (Table 3, left column).
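The headline prevalence figures follow directly from the counts reported in the text. The short sketch below recomputes them, together with the sample size described in the method section; the helper name `share` is an illustrative assumption, and no figures beyond those stated in the text are used.

```python
def share(count: int, total: int) -> float:
    """Percentage of `count` in `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Sample size: first 10 games of 31 participants plus first 9 games of 2 participants.
sample_size = 31 * 10 + 2 * 9

# Matches with at least one problematic behavior, and players who exhibited one.
match_prevalence = share(229, sample_size)
player_prevalence = share(398, 1343)

if __name__ == "__main__":
    print(sample_size)                 # 328
    print(f"{match_prevalence:.1f}%")  # 69.8%
    print(f"{player_prevalence:.1f}%") # 29.6%
```

The match-level value of 69.8% corresponds to the rounded 70% reported above.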
Disaggregating these categories into specific behaviors according to our coding scheme (Table 2), we find that complaints about teammates' performance, as a form of mere disruptive behavior, are the most frequent problematic behavior, both across all matches (52.4%) and all players (18.3%) (Table 4). More serious harmful behaviors, such as hate speech or death wishes, have relatively low frequencies. They occur in 3.4% and 7% of all matches respectively and are uttered by 0.8% and 1.8% of all players respectively. Insults as a form of harmful behavior, on the other hand, have the second highest frequency overall. Insults occur in 34% of all matches and are uttered by 10.2% of all players analyzed.
Players can not only perform different types of toxic behavior within the same game but can also perform the same behavior on multiple occasions. AFKing a match is an exception to this because players can only leave the match once. Also, behaviors that have to be reiterated to be considered toxic cannot be covered by this analysis (see "Limitations"). The corpus contains a total of 805 individual instances of toxic behavior (complaints, insults). Complaints are the most frequent type of behavior (n = 463), followed by insults (n = 218). On average, reported players utter 2.05 toxic expressions, resulting in a low count of toxic behavior per player per game. The average number of insults uttered per reported player is 1.59, with only 10% of users exceeding 3 insults in a given match, indicating that the proportion of players who perform multiple toxic behaviors is small. Toxicity seems to be evenly distributed across games. Twenty-five percent of matches contain more than 5 toxic actions, while 3 toxic behaviors

Limitations
Although our study aims at the highest epistemic and methodological rigor, certain limitations in its design must be acknowledged. Firstly, Riot Games does not provide detailed definitions for reportable behavior in its community guidelines. The academic literature presents a similar problem, as it is not unanimous on clear-cut distinctions between different types of behavior. While we have tried to define our categories for analysis as solidly and transparently as possible, these terminological imprecisions negatively impact the internal validity of our findings, because we have had to establish our own boundaries following the definitional criteria explained above. Consequently, changes in conceptualizations or the coding scheme are likely to lead to slightly different research results. We hope to mitigate these shortcomings by grouping the different categories into two more coarsely defined, and thus more easily distinguishable, macro-categories.
Secondly, it should be acknowledged that the data obtained is highly dependent on the sensitivity and specificity of the judges regarding toxic behavior. While we introduced inter-judge agreement measures to control for stringency in the scores, some variables present more categorical problems regarding classification. For example, behaviors such as abusive pinging (in the category of complaints) have to occur repeatedly to count as such and be reportable. Some users complain throughout the matches in such a continuous and repeated manner that it becomes unviable for judges to annotate each isolated behavior and to define where one complaint ends and where another starts. In these cases, we, therefore, chose to primarily report the proportion of players reported and the proportion of games affected by such behavior instead of analyzing individual instances. Regarding the overall value added of this study, however, it should be noted that this bias mainly results in the underestimation of behaviors of lesser severity, such as repeated complaints. Our coding scheme is still viable for the collection and analysis of severe toxicity, namely, hate speech and wishes of death and serious harm, since the underlying legal definitions are clearer and utterances can be faithfully counted as discrete occurrences.
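To illustrate how agreement between two judges can be quantified, the sketch below computes Cohen's kappa, a common chance-corrected agreement statistic. This is an assumption made for illustration only: the text does not state which inter-judge agreement measure was used, and the labels in the example are hypothetical, not taken from the study's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(judge_a: Sequence[Hashable], judge_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two judges who labeled the same items."""
    if len(judge_a) != len(judge_b) or len(judge_a) == 0:
        raise ValueError("both judges must label the same non-empty set of items")
    n = len(judge_a)

    # Observed agreement: share of items on which the two labels coincide.
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n

    # Chance agreement: product of each judge's marginal label frequencies.
    freq_a, freq_b = Counter(judge_a), Counter(judge_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both judges used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight chat messages (illustrative only).
labels_a = ["toxic", "toxic", "none", "none", "toxic", "none", "none", "none"]
labels_b = ["toxic", "none", "none", "none", "toxic", "none", "toxic", "none"]
kappa = cohen_kappa(labels_a, labels_b)
print(round(kappa, 3))
```

Values near 1 indicate agreement well beyond chance, while values near 0 indicate agreement no better than chance. Categories with fuzzy boundaries, such as repeated complaints, would be expected to yield lower values than discrete categories such as death wishes.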
Finally, it should also be recognized that the external validity of our study is compromised by the streamers, that is, the participants in our study, being aware that they are being watched while playing. This may, consciously or unconsciously, lead to behavioral modifications, most notably a higher degree of compliance with the community guidelines. In defense of our data collection decisions, however, we argue that the high degree of habituation and immersion of players in the game and their environment, as well as the absence of constant reminders of the recording process, somewhat mitigate the risk of extreme behavioral modifications. This bias can, nevertheless, never be perfectly controlled.

Discussion and Conclusions
Riot Games claims that most disruptive behaviors are exhibited by a small percentage of players (5%). Using categories that are similar to reportable behavior in Riot Games' EULA, our judgment task shows that 24.3% of players engaged in toxic behavior during the analyzed competition matches. It is hard to assert which players exhibit these behaviors more frequently, and for whom they are exceptional events. Without the disclosure of Riot's data, we will hardly be able to verify this number.
Nevertheless, our research shows that a significant number of the reviewed matches are affected by disruptive behaviors (70%). Taking into account that our definition of disruptive behavior encompasses actions that are sanctioned by Riot's EULA, the evidence gathered suggests that those terms and conditions are consistently breached. Thus, and taking into account the particularities of the SoloQChallenge (see "Limitations"), we can safely affirm that toxicity (or disruptive behaviors in the context of LOL) is indeed a phenomenon that is likely to affect the majority of LOL matches.
Having said that, disruptive conduct is more prevalent than more serious behavior, which we have classified as harmful. Furthermore, among instances of harmful behavior, insults are the predominant category. In the context of the game and the possible scope of attention of such comments, it is not likely that they pose a significant threat to users' reputations and well-being. By contrast, hate speech and wishes for death or serious harm constitute an exception among the reported behaviors in our sample.
From the perspective of the game company's liability, this is positive. Intermediaries fulfill the function of keeping virtual spaces safe from harmful expressions as part of their services (Gillespie, 2018). Indeed, the European Union legal framework makes platforms responsible for the curation of these types of wrongdoings, a responsibility that is going to be expanded in the upcoming Digital Services Act. Thus, a significant prevalence of instances of harmful conduct over disruptive behavior would call for a stronger content moderation response. However, it appears that this is not the case in our sample, that is, the LOL chat. This does not mean that other disruptive behaviors are not concerning to players. The categories that we have utilized to measure phenomena of toxicity characterize behaviors that are sanctionable by criminal or tort law as more harmful to players. However, in a competitive landscape such as LOL, where insults and disrespectful complaints are likely to be normalized (Beres et al., 2021), players might be more concerned about behaviors that ruin matches for their team (e.g., going AFK or voluntary death). Further research is required to ascertain whether this is indeed the case, considering players' perceptions to strengthen the detection of disruptive behavior in the future based on situational requirements rather than legal standards alone.
According to the results of this research, the concern for toxicity should be understood primarily as a concern for disruptive behaviors, acts that negatively affect players' gameplay. Consequently, the actions of the managers of these spaces should be at least partially oriented towards establishing systems that serve the interests of their community (Busch et al., 2015), and not exclusively to comply with external legal frameworks. Legal frameworks are designed to address the behaviors that society as a whole is most concerned about, but they do not fully cover behaviors that negatively affect a community.
In consequence, the results of the study raise the issue of the enforcement of Riot's policy regarding disruptive, rather than harmful, conduct. Judging whether a teammate goes AFK or voluntarily kills their champion to ruin the game requires a review of matches by moderators who understand the game mechanics more deeply. The experience of this study shows that these are complex decisions in which numerous variables must be considered (the presumable intention of the player when commenting, the previous acts of their teammates, and the overall state of the game). If these are the more prevalent conducts, and the duration of a LOL game is typically 30 min, a considerable investment must be made by Riot to detect and act against this content. If human moderators are only tasked with the review of complaints, many more harmful conducts may remain unsanctioned. Automation of in-game moderation, on the other hand, does not seem to be a viable solution considering the risks that these systems pose in the face of complex judgments that could lead to wrong decisions and sanctions against honorable players. Such systems are not only unjust but could, in the long term, lead to frustration with the game and negatively affect the size of the population of players.
Bans and terminations of accounts on the grounds of disruptive behaviors should thus be carefully considered, particularly to avoid systematically unfair or possibly even discriminatory responses. Banning could be a disproportionate sanction in some situations, especially in games in which players purchase skins or other virtual goods which cannot be used once the account is terminated (Chelsea, 2017;Meehan, 2006). Suspending player accounts can hamper both economic interests and freedom of expression (Balkin, 2004). Moreover, its effectiveness has also been called into question because players can open other accounts or use tools to hide their digital identity. In this sense, the FPA has warned that "banning a player reduces their attachment to the game and sense of responsibility for their actions" and could lead players to act carelessly because they do not have a consistent account or identity to protect (FPA & ADL, 2020). Thus, considering more appropriate measures to deal with less worrying conduct is desirable. Temporary bans, or the establishment of cooldowns before being able to join a match again, may be appropriate measures. In addition, the effectiveness of nonpunitive actions like user warnings should not be overlooked. Giving users feedback about their actions after a match promotes awareness of community rules and decreases the likelihood of repeating offenses (Lewington & Committee, 2021). Studies outside the industry support these findings, pointing out that information about social norms positively influences newcomers' behavior, at least in some online communities (Matias, 2019). Soft approaches, like warning users about allegedly disruptive actions and the promotion of honest behavior through reward systems, seem more appropriate, with the added benefit of avoiding damage that a wrongful account suspension represents.
Finally, it should be noted that having found some kind of toxic behavior in 70% of the matches analyzed, it is safe to say that League of Legends has a problem with toxicity. Moreover, toxicity seems to be more or less evenly distributed across all the matches it affects, and there is no evidence that toxicity is the responsibility of a small group of players. In fact, 24% of the players in the games analyzed committed some kind of toxic behavior. However, most of this toxicity stems from disruptive behavior, mainly complaints about the performance of teammates. Harmful behavior is rare, although this does not mean that it is irrelevant. Consequently, although we can affirm that there is a problem with toxicity, we must also stress that it should not be understood primarily in legal terms or in terms of the need for regulation external to the video game. The issue is primarily about how toxicity harms members of the community and how it may be driving certain players out of the game or even excluding them.

Funding
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This publication was made possible by Grant FJC2020-042961-I funded by Ministerio de Ciencia e Innovación/Agencia Estatal de Investigación/10.13039/501100011033 and by the European Union "NextGenerationEU"/"PRTR," by the project #FakePenal funded by the Spanish Ministry of Science and Innovation (Knowledge Generation Projects), and the Grant PIF 2020 funded by the University of the Basque Country.