The correlation between positive reviews, playtime, design and game mechanics in souls-like role-playing video games

The “Souls-like” role-playing video game genre was inadvertently created due to the influence of the “Souls franchise”. However, each game has different twists, which can be new gameplay mechanics, graphical style, etc., while maintaining the core elements of the “Souls franchise”. The goal of this study is to understand which gameplay mechanics are more liked by comparing reviews of these games to each other. Thus, different game design elements and game mechanics are investigated in 21 “Souls-like” video games to see how the users reacted to them and whether they positively reviewed them. All 993,932 reviews of these games were scraped from the Steam webpage in the middle of April 2021 using the steam_reviews Python package. These reviews contain the playtime at review, whether a positive or negative rating is given, and a textual component, among others. Overall, 11 game design elements and game mechanics were set up for the investigation: the setting, graphical dimensions as well as style, level design, and whether there are difficulty settings, multiplayer features, upgradeable weapons/armor, equipment durability, in-game map, extra penalties upon death, and a classic level-up system. Based on data distributions, either the t-test or the Mann-Whitney-Wilcoxon test was used for the analysis. The syuzhet package, which uses Natural Language Processing methods, was used in the statistical program package R along with the NRC Emotion Lexicon to evaluate the textual parts. According to the results, a slight-to-moderate correlation exists between positive reviews and the users’ playtimes: a longer playtime means a larger chance of a positive review. Significant differences also exist in the percentages of positive reviews among these games: Hollow Knight is the most liked game.
Out of the investigated 11 factors, significant differences exist among all of them: drawn graphics (96.48%) and 2D style (95.61%) are the two most liked factors, while pixel graphics (87.11%) and a futuristic setting (86.74%) are the two least liked ones. Almost every factor can significantly affect all eight basic emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust). The exceptions are graphical dimensions, weapon/armor upgradability, in-game map, and extra penalties upon character death as no significant differences exist in case of trust (p = 0.85), anger (p = 0.24), sadness (p = 0.21) and disgust (p = 0.095), respectively when the average sentiments per review were examined. Future “Souls-like” game design and development can be influenced by the results as game developers can more easily choose from the factors they want to implement or whether they want them at all in their games.


Introduction
Video games have come a long way since their inception and nowadays several genres exist, such as first-person shooters, real-time strategies, and role-playing games. Throughout the years, video game genres have evolved as well by incorporating different design elements and game mechanics. This makes each gaming experience unique, which matters because good gameplay is important [13,14].
Developing a unique gaming experience is not an easy task, and guides were created to make the procedure simpler [1,8]. Designing role-playing games is even more complex because, according to Horsfall and Oikonomou, players prefer a good combat system, story, and interaction with nonplayer characters (NPCs) [22]. Daneels et al. also showed that narrative, interaction with characters, and narrative-impacting choices elicited eudaimonic moments [11]. Design elements such as realistic characters and tone-appropriate soundtrack scores can also enhance player experiences. The latter is also supported by Skalski and Whitbred; however, spatial dimensions do not significantly affect enjoyment [42]. In contrast to spatial dimensions, graphics and "juiciness" are important as well [26]. According to Bostan and Kaplancali, certain game mechanics such as melee combat, fixing broken equipment, gathering information, acquiring something, etc. can motivate players by satisfying their needs [5].
The results in the study of Salmon et al. also show that games that are challenging are in the top three types of games played by seniors [41].
The "Souls-like" video game genre was inadvertently created due to the influence of the "Souls franchise" as several other game developers implemented its "Souls formula" in their games. However, each "Souls-like" game is given new twists which can mean new gameplay mechanics, graphical style, etc. while maintaining the core elements such as the unforgiving difficulty, the so-called "bonfire checkpoint system", and the environmental/contextual storytelling. As there is a high emphasis on the environments, screenshots of Dark Souls, Dark Souls II -Scholar of The First Sin, and Dark Souls III are shown in Fig. 1.
Changes among "Souls-like" games can be small or large: for example, gunplay (ranged combat) is one of the main focuses of Remnant: From The Ashes instead of the (mostly) melee combat that can be found in the Dark Souls games; another example is Salt and Sanctuary, which is a 2D game instead of a 3D one. Most games do not provide an in-game map, but some do. Therefore, differences exist between the various "Souls-like" games, but what do players think about them? To find an answer to this question, the reviews on Steam have to be analyzed.

Analyzing reviews on Steam
Analyzing Steam reviews is not a foreign concept as several studies investigate their usefulness. Lin et al. studied the reviews found on early access games and concluded that players leave more positive reviews after the game leaves early access [30]. According to Busurkina et al., reviews usually contain seven topics: achievements, narrative, social interaction, social influence, visual/value, accessories, and general experience. Besides these, users are likely to report bugs in the reviews [7]. As Steam reviews have multiple components, not every part of them is necessary to identify whether they are useful or not. According to Kang et al. the total votes on a review by others is the most important in this regard, while the second most important factor is the user's recommendation (whether a review is positive or negative) for the game [25]. However, the study of Eberhard et al. shows that those reviews that are more voted on are more complex and express more negative sentiments as they are critical towards the game [12].
Then, what is the case with the "Souls franchise" and "Souls-like" games? Is their reception significantly different due to the various design choices? What are the sentiments of players?
To find an answer to these questions, several "Souls-like" games are investigated along with the "Souls franchise". These examined games are shown in Table 1.

This article is structured as follows: the core game mechanics of "Souls franchise" and "Souls-like" games are discussed in Section 2, the research questions are presented in Section 3, the used materials and methods are shown in Section 4, while the results and discussion can be seen in Sections 5 and 6, respectively. Lastly, conclusions are drawn in Section 7.

In this section, the game design elements and mechanics, which some call an aesthetic category [29], are presented in detail. As shown in Fig. 2, the appearance of "Souls-like" games varies mainly due to design elements, but sometimes due to their mechanics as well. Even though their appearance varies, their two core elements remain. These two core elements are the unforgiving difficulty and environmental/contextual storytelling, which are detailed in subsections 2.1 and 2.2, respectively.

The unforgiving difficulty
According to the Flow theory of Csikszentmihalyi, players should be put in a so-called "Flow channel" which exists between the stages of boredom and anxiety [36]. Players should occasionally step into both stages which can create the concept of flow and interesting gameplay. However, the time spent in either boredom or anxiety should not be long, otherwise the player could get demotivated or annoyed. It should be noted that staying in the anxiety stage can also increase the heart rate of the player [4].
However, these games are more about perseverance than about Flow because the stage of boredom is very small [33], and interaction with the vast environment can be done in multiple ways [37]. Also, the game's world is hostile and anything could kill the player character. At first glance, the player does not know which NPCs are friendly and which are hostile. Since the games usually do not restrict killing NPCs, friendly ones can also be killed. This can prevent completing certain quests or can set the player character on a new path. Thus, the challenge is usually a lot higher than the player's skill. According to Rogers, the stage of "pain and loss" can be reached due to the high difficulty, and the player's skills can also be improved, which is another stage [40]. This can create some goals and motivation [31]. This is one of the keys to the successful flow of gameplay found in these types of games [2].

Combat
Even the weakest enemy can kill the player character with three or four attacks. To add to the difficulty, sometimes enemies are in a certain position where the player cannot see them (e.g. behind a column). This means that the player has to know their position, attack pattern, and the area's layout to survive the encounter.
Since the focus is on melee combat in "Souls-like" games, the weapons and armor of the player character can be upgraded in most of them. As aggressive combat is preferred when playing these games, a "stamina system" is also used: when the player dodges, runs, or attacks, the stamina of the character decreases. When the stamina is empty, the player character cannot do anything besides walking. Therefore, resting is required to refill it, but the player character can be attacked during resting.
As the difficulty of these games is high, the players have multiplayer features in most of them. This means they can summon other players into their games for help, usually before fighting bosses. However, Remnant: From The Ashes is one game that offers a full campaign playthrough with two other players at most.

The bonfire checkpoint system
The so-called "bonfire checkpoint system" is the most important part of the "Souls-like" games. Depending on the games, the bonfires themselves can change graphically due to the changes in the story. For example, the bonfire is illustrated as a bench in Hollow Knight, as a shrine in Nioh: Complete Edition, or as a crystal in Remnant: From The Ashes. However, their functions remain the same in each game. This system is illustrated with a flowchart in Fig. 3.
In this simplified version of the "bonfire checkpoint system", the player character can do one of two things: interact with a bonfire or be killed. In the case of interaction, a new checkpoint is created near the bonfire. The player character becomes healed and all other enemies (except the bosses) respawn in the world of the game. The health potions (e.g. Estus Flasks in Dark Souls) are also refilled. In case of death, first, the currency is dropped on the ground. Then, the player character is returned fully healed, with refilled health potions, either to the last checkpoint or to the beginning of an area. The latter usually only happens at the beginning of the game. The enemies also respawn in the case of the player character dying. Now, the player character has a choice that is not explicitly illustrated in Fig. 3: the player character can either go back to pick up the previously dropped currency or can continue the game. It should be noted that if the player character is killed again before picking it up, then the previously dropped currency will be lost forever. However, new currency (which was gathered after the previous death) can be dropped in a new spot upon dying. Naturally, if the player character picks up the previous currency, it is added to the new currency in the character's inventory.
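The rules described above can be sketched as a small state model. This is only an illustrative sketch of the simplified flowchart, not the implementation of any actual game: the class and function names (Player, rest_at_bonfire, die, recover_currency) and all numeric values are hypothetical, and enemy respawning is mentioned in comments but not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Player:
    # All default values are hypothetical and chosen only for illustration.
    max_health: int = 100
    max_potions: int = 5
    health: int = 100
    potions: int = 5
    currency: int = 0                # currency carried by the character
    checkpoint: str = "area start"   # last bonfire, or the beginning of the area
    position: str = "area start"
    dropped: Optional[int] = None    # currency left at the place of the last death


def _restore(player: Player) -> None:
    """Heal the character fully and refill the health potions."""
    player.health = player.max_health
    player.potions = player.max_potions


def rest_at_bonfire(player: Player, bonfire: str) -> None:
    """Interaction: new checkpoint, full heal, refilled potions.
    Non-boss enemies would respawn at this point (not modeled here)."""
    player.checkpoint = bonfire
    player.position = bonfire
    _restore(player)


def die(player: Player) -> None:
    """Death: the carried currency is dropped and the character returns to the
    checkpoint. Any earlier drop that was not picked up is overwritten (lost)."""
    player.dropped = player.currency
    player.currency = 0
    player.position = player.checkpoint
    _restore(player)


def recover_currency(player: Player) -> None:
    """Picking up the drop adds it to the currency gathered since the death."""
    if player.dropped is not None:
        player.currency += player.dropped
        player.dropped = None


p = Player()
rest_at_bonfire(p, "first bonfire")
p.currency = 500
die(p)                 # 500 is dropped, the character restarts at the bonfire
p.currency = 120       # gathered after the death
recover_currency(p)
print(p.currency)      # 620
```

Overwriting the dropped amount in die is the design choice that encodes the "lost forever" rule: dying twice without recovering the first drop leaves only the second drop on the ground.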

Level design
Regarding level design, most of the "Souls-like" games have an interconnected and open-world map that can be perceived as giant [16], whereas in Nioh: Complete Edition, the player has to choose a level from the level selection screen. The 2D games share this interconnected open world, but they are called "Souls-like Metroidvania" games. Metroidvania is a portmanteau of Metroid and Castlevania as they were the pioneers of this genre: usually, these are 2D side-scroller role-playing games with an open world. The player has to navigate the levels and defeat enemies and bosses to proceed; however, there are environmental hazards as well, e.g. traps. An in-game map is present in these games, except in the case of Salt and Sanctuary. Similar maps can also be found in Code Vein or Remnant: From The Ashes, which are 3D "Souls-like" games. There is a map in Sekiro: Shadows Die Twice, but it does not tell the exact position of the player, thus it is not counted as an in-game map.
However, knowing the area is not easy, because most "Souls-like" games do not show the layout, do not point in a direction, and do not have an in-game map. For example in the first Dark Souls, it is only mentioned to "ring a bell inside the tallest tower". The player has to figure out how to get there as the game is not linear and no marks can be seen that could indicate the correct way. It is possible to go into different directions as well and begin a new quest.

The environmental/contextual storytelling
A recurring theme of death is featured by the "Souls franchise" and it is even built into their stories and gameplay [34,45]: the player character is some kind of undead on a quest (a hero's journey [9,43]) to find a cure and/or rekindle a fire to stop the end of "The Age of Fire" and even the end of the world [19]. However, it is up to the player to stop this because it is also possible to hasten the world's end in the games. It is the player's choice, usually depending on what the player understood of the story.
As can be expected, the story of the franchise is deep and evolved throughout the Dark Souls games [17]. However, the storytelling in these "Souls" games is quite unusual: besides environmental storytelling and cryptic dialogues of NPCs, the story of the games is told through item descriptions [3]. In other words, the storytelling can be contextual as well within the games.
A great example of environmental storytelling is in the original Dark Souls: the town of Oolacile is corrupted by the Abyss. The player starts at the top of the tallest building. First, the player has to go down to the surface level, and lastly, to the Abyss itself. As the player ventures downward, more black substances appear on the buildings, more buildings can be seen destroyed. Even the environment itself becomes darker later on, and the enemies appear more inhuman. An example of contextual storytelling is in the description of the "Witch's Ring" in Dark Souls III: "The Witch of Izalith and her daughters, scorched by the Flame of Chaos, taught humans the art of pyromancy and offered them this ring (…)". This sentence tells the player that the art of pyromancy (which is a form of magic in the "Souls franchise") is taught by The Witch of Izalith. This fact could not be known if the player did not read the item descriptions. The game will not tell the player during the playthrough. There are countless other similar examples.
It should be noted that the item descriptions are as cryptic as the dialogues of NPCs. It is possible that the players could not understand the story even after finishing the games if they did not search for crucial information that is hidden in the world and item descriptions. However, the community of this franchise is quite large, meaning that online forums and videos exist intending to piece the story of the games together [23].

Research questions
As could be seen in the introductory section, game design elements and mechanics make the games unique. The effects of narrative, graphics, character models, sounds, melee combat, fixing broken equipment, gathering information, and acquiring items were investigated in the literature. Also, according to previous results, analyzing whether Steam reviews are positive or negative can determine what players liked or did not like. As people write reviews, their sentiments are reflected in the text.
Since there are several "Souls-like" games available, and they have design elements and mechanics that vary, it would be interesting to see whether players have different opinions and feelings about them. Therefore, four research questions (RQs) were formulated to investigate whether these differences positively or negatively affect the reception of the games.
RQ1: Are there any significant differences in the percentage of positive reviews among the games?
RQ2: Does the playtime of the user correlate with the percentage of positive reviews the games received?
RQ3: Do the different game design elements and game mechanics influence the percentage of positive reviews?
RQ4: Do the different game design elements and game mechanics influence the emotions in the reviews?

Materials and methods
To answer the RQs, a considerable number of user reviews was needed. To scrape the Steam webpage, the freely available steam_reviews package was used [20]. This package was developed in Python under the MIT license and it uses the Steamworks API. With the use of this package, user reviews were scraped from the Steam website during the middle of April 2021. The reviews were downloaded in a .json format. Then, they were imported using the jsonlite package into R, in which they were evaluated [10]. The investigation consisted of three phases. The first phase was the scraping (subsection 4.1), the second was the creation of the factors that were investigated (subsection 4.2), and the third was the evaluation itself (subsection 4.3). The latter phase was made up of three parts: the first part focused on the correlation between playtime and the reviews, the second focused on the percentage of positive reviews grouped by games and certain factors, while the third part focused on the text component inside the reviews.

The scraping process
Before the scraping process began, the games themselves had to be selected. The selection was made somewhat easier by the tag system on Steam; however, the tags are placed on the games by users. This fact is important because it can make the system faulty. For example, at the time of writing this article, "VEGAS Pro 14 Edit Steam Edition" is tagged as "Souls-like" on Steam, despite it being video editing software. Therefore, the popular tags on these games were examined carefully. The selection criteria were the following: the game is tagged as "Souls-like" on Steam, has combat and a checkpoint system similar to those of the "Souls franchise", and is not an early access title. To ascertain whether the selected games could be considered "Souls-like", they were tried out and most of them were even played through. For example, Titan Souls was not included in the investigation: it was tagged as "Souls-like" on Steam, but it did not contain the "bonfire checkpoint system". It is more like a "Shadow of the Colossus"-type of role-playing game than a "Souls-like". This means that the game has an open world and, besides the player character, only bosses exist in it. The player has to find them and defeat them. If all are defeated, the game is completed.
After the games were selected, the Steam reviews subpage was examined. Each game on Steam has a subpage that contains the user reviews. The reviews are complex and each contains multiple fields of information [46], but only some of them were used: the playtime at review, whether the review is positive or negative, and the textual component. Overall, the number of scraped reviews was 993,932. This also means that, regarding these games, all reviews were scraped from Steam during the middle of April 2021. The number of these records can also be seen in Table 2 regarding each game. Based on the number of reviews alone, Dark Souls III is the most played game in this genre.

The investigated factors
After the scraping was completed, the factors that were critical to the comparison were set up. For the investigation, the design elements and game mechanics in the previously mentioned games were carefully examined and the differences between them were noted. Several categories were then created from these differences to serve as the basis of comparison. Each of the games was examined by trying/playing them, and the possible categories were noted by hand. The categories were designed to have multiple games in them. These categories can be seen in Table 3, where their possible states are also presented.
Afterward, the games were carefully examined by trying/playing them and these previously mentioned factors were assigned to them manually. These can be found in Table 4.

Evaluation of the data
On Steam, every review is either positive or negative. This is symbolized with a thumbs up or a thumbs down on each game's review page. Thumbs up also means that the game is recommended by the reviewer, while thumbs down naturally means that it is not recommended. For reading purposes, these are used as synonyms in this article; therefore, positive review = thumbs up = recommended game; while negative review = thumbs down = not recommended game.
As was mentioned at the beginning of this section, the evaluation was done in three parts. First, the correlation between the playtime and the reviews was investigated. As each review contained the playtime at reviewing, it was easy to calculate. It should be noted that the steam_reviews package scrapes the playtimes in minutes. These were converted to hours. The results of this part of the evaluation can be found in subsection 5.1.
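The preparation step described above (keeping the rating, the playtime at review, and the text, and converting minutes to hours) can be sketched as follows. This is a minimal sketch and not the authors' pipeline, which used R and jsonlite. The key names voted_up, review, and author.playtime_at_review are assumptions modeled on Steam's public review format; the file produced by the scraping package may be structured differently. The two sample reviews are invented.

```python
import json

# Hypothetical sample data. The key names are an assumption and may not match
# the exact structure of the scraped .json files.
raw = """
[
  {"review": "Hard but fair.", "voted_up": true,
   "author": {"playtime_at_review": 6244}},
  {"review": "Too punishing.", "voted_up": false,
   "author": {"playtime_at_review": 90}}
]
"""


def to_record(entry):
    """Keep only the three fields used in the analysis."""
    minutes = entry["author"]["playtime_at_review"]
    return {
        "text": entry["review"],
        "positive": 1 if entry["voted_up"] else 0,  # thumbs up = 1, thumbs down = 0
        "playtime_hours": minutes / 60,             # minutes -> hours
    }


records = [to_record(entry) for entry in json.loads(raw)]
share_positive = sum(r["positive"] for r in records) / len(records)

for r in records:
    print(r["positive"], round(r["playtime_hours"], 2), r["text"])
```

Coding the rating as 1 or 0 at this stage makes the later steps simple: the mean of the coded values is the share of positive reviews.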
The second part focused on the percentage of positive reviews regarding each game. After scraping 993,932 reviews and looking through the data, it was concluded that 904,005 of them are positive. This means that 89,927 reviews are negative. In other words, approximately 90.96% of the reviews are positive and 9.04% are negative. Naturally, these are only the overall numbers. In the next section, this is elaborated on regarding each game and each factor. First, the games themselves are compared. Afterward, they are grouped by one factor, then by all factors, and lastly, are compared to each other. This part of the evaluation can be seen in subsection 5.2.

Table 3 The investigated states of "Souls-like" game design elements and mechanics (columns: Category, Possible states)

However, the only problem with the Steam reviews was that they do not have a rating system on a scale of 1-10 (or 1-5). To get a better picture of the reviews, their textual parts were analyzed as well. Therefore, for the textual analysis, the "syuzhet" Natural Language Processing package was used in R [24,47]. With its help, it is possible to classify the emotions people felt while writing the reviews. Four sentiment lexicons are incorporated by this package. The NRC Emotion Lexicon was chosen as it is free for research purposes, and because it proved to be useful in the past [6,28]. Eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (positive and negative) exist in it. With the syuzhet package's customizable get_nrc_sentiment(char_v, cl = NULL, language = "english", lowercase = TRUE) function, the sentiment of each word or sentence can be easily assessed: they are compared to the words in the previously mentioned emotion lexicon. Then, a data frame is created in which each row corresponds to a sentence in the reviews, while the sentiments of every emotion are represented by columns [35]. The values in the negative column are converted to negative numbers. Afterward, they are added to the values in the positive column. The resulting matrix shows the number of sentiments per sentence. This was the third part of the evaluation and can be found in subsection 5.3.
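The lexicon-based scoring described above can be illustrated with a short sketch. It is not the syuzhet implementation and it does not use the NRC Emotion Lexicon: the four-word TOY_LEXICON below is a hypothetical stand-in invented for this example, and the function names are also hypothetical. Only the general idea is shown: every word of a sentence is looked up in the lexicon, the matches are counted per column, and the negative count is subtracted from the positive count.

```python
import re
from collections import Counter

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]
COLUMNS = EMOTIONS + ["negative", "positive"]

# Hypothetical toy lexicon for illustration only; these are NOT NRC entries.
TOY_LEXICON = {
    "love": {"joy", "trust", "positive"},
    "rewarding": {"joy", "positive"},
    "unfair": {"anger", "negative"},
    "frustrating": {"anger", "sadness", "negative"},
}


def score_sentence(sentence):
    """Return one row: the number of lexicon matches per emotion/sentiment."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", sentence.lower()):
        counts.update(TOY_LEXICON.get(token, ()))  # unknown words add nothing
    return {column: counts[column] for column in COLUMNS}


def valence(row):
    """Negative column times -1, added to the positive column."""
    return row["positive"] - row["negative"]


row = score_sentence("I love this rewarding game, but the last boss felt unfair.")
print(row)
print(valence(row))
```

With this toy lexicon, the example sentence matches "love", "rewarding", and "unfair", so joy is counted twice, trust and anger once, and the valence is 2 - 1 = 1.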

Results
Before further research commenced, the distributions of the playtime, reviews, and factors had to be investigated as it was imperative to know them. For this, the Kolmogorov-Smirnov test was used [44]: it can compare a sample with a reference probability distribution or two samples to each other. Regarding the playtimes, the hypothesis of normal distribution is rejected in the case of every game: p < 2.2 × 10^-16 in all of them. Next, the distribution of the percentage of positive reviews was investigated: if a review was positive, it was counted as 1, and when it was negative, it was counted as 0. When investigating the percentage of positive reviews, the hypothesis of normal distribution is not rejected (p = 0.8104). Regarding the factors per game, the hypothesis of normal distribution is also not rejected in the case of any factor. The distributions of the percentage of positive reviews per game are detailed in Fig. 4. As could be observed in this section, the distribution of some data is normal, while it is not normal in other cases. Due to this fact, parametric and nonparametric tests should be used, respectively, for further analyses. The t-test is such a parametric test [38], while the Mann-Whitney-Wilcoxon test is such a nonparametric one [32]. These two are used for further investigation.
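The decision rule described above (test for normality first, then pick a parametric or nonparametric test) can be sketched as follows. This is a rough sketch under several assumptions and not the authors' R code: the normal distribution is fitted with the sample mean and standard deviation, the p-value uses the asymptotic Kolmogorov series with a common small-sample correction, and the 0.05 threshold as well as the simulated data are hypothetical. Fitting the parameters from the same sample makes the p-value conservative, so the numbers are indicative only.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_normal(sample):
    """One-sample Kolmogorov-Smirnov statistic D and an approximate p-value
    against a normal distribution fitted to the sample (an assumption)."""
    xs = sorted(sample)
    n = len(xs)
    mu = statistics.fmean(xs)
    sigma = statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = normal_cdf(x, mu, sigma)
        # Largest gap between the empirical and the reference distribution
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:          # the series converges slowly here; p is practically 1
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def choose_test(sample, alpha=0.05):
    """Parametric test if normality is not rejected, nonparametric otherwise."""
    _, p = ks_normal(sample)
    return "t-test" if p >= alpha else "Mann-Whitney-Wilcoxon test"


random.seed(1)
# Simulated, strongly right-skewed "playtimes" and roughly symmetric "percentages"
skewed = [random.lognormvariate(3.0, 1.2) for _ in range(2000)]
symmetric = [random.gauss(90.0, 3.0) for _ in range(200)]

print(choose_test(skewed))
print(choose_test(symmetric))
```

The skewed sample mimics the shape one would expect from playtimes (many short sessions, few very long ones), which is the kind of data for which the nonparametric route is chosen.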

Investigating the correlation between the playtimes and positive reviews
First, the playtimes and then the number of recommendations were examined. The results of the examination can be seen in Table 5. The playtimes are given in hours in every case.
According to the scraped reviews, if the playtimes at review are investigated, then Dark Souls III is the game that has the largest average playtime (104.07 h), while Death's Gambit has the smallest average (12.99 h). If the playtimes are still counted even after reviewing, these two games would still have the largest and smallest average playtimes, respectively.

Hollow Knight is the most positively reviewed "Souls-like" game with an average of 0.9692, while Sekiro: Shadows Die Twice is the second most positively reviewed (0.9478) and Dark Souls III is a close third (0.9388). Lords of The Fallen has the smallest average of positive reviews with 0.6072. The next to examine was the correlation between the recommendations and playtime. As mentioned earlier, the distributions of playtimes are not normal. Therefore, a nonparametric correlation test, Spearman's rank correlation coefficient, was used to determine whether they are independent of each other [15]. The results of this investigation can also be seen in Table 5.
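Spearman's rank correlation coefficient between playtime and the 0/1 recommendation can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below shows only this calculation; the six playtime and recommendation values are invented for illustration and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1   # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-corrected) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: playtime at review (hours) and recommendation (1 = positive)
playtime_hours = [0.5, 2.0, 3.5, 12.0, 40.0, 150.0]
recommended = [0, 0, 1, 0, 1, 1]

rho = spearman(playtime_hours, recommended)
print(round(rho, 4))
```

Because the recommendation takes only two values, it contains many ties, which is why the average-rank handling matters here. A positive rho means that reviews with longer playtimes tend to be positive more often.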
According to the data, the correlation varies by game, while the significance is very strong in each case (p < 2.2 × 10^-16). Every correlation value is positive, which means that if the playtime increases, the player is more likely to recommend the game.

Also, three pairs of games are close to being insignificantly different: Ashen and The Surge (p = 0.02532); Blasphemous and Star Wars Jedi: Fallen Order (p = 0.04333); and lastly, Death's Gambit and Hellpoint (p = 0.04596). However, based on the previous p-values of the average percentage of positive reviews, it can be stated that these are the games that received similar reviews to each other.
Next, the percentage of positive reviews was assessed among the factors. The investigation resulted in the following:
• SET: there are strong significant differences in positive reviews among all settings (p < 2 × 10^-16 in all cases, except between medieval and Japanese settings (p = 0.0013)). The percentages of positive reviews are 91.82%, 91.57%, and 86.74% for medieval, Japanese, and futuristic settings, respectively.
• GD: "Souls-like" games with 2D graphics are significantly (p < 2 × 10^-16) more likely to receive a positive review than those with 3D graphics. The percentages of positive reviews are 95.61% and 89.94%, respectively.
• GS: there are strong significant differences in positive reviews among all graphical styles in all cases (p < 2 × 10^-16), except between cel-shaded and pixel graphics, where the significance is weak (p = 0.027). The percentages of positive reviews are 96.48%, 90.03%, 87.72%, and 87.11% for drawn, realistic, cel-shaded, and pixel graphics, respectively.
• LD: "Souls-like" games which feature an interconnected world are significantly (p < 2 × 10^-16) more likely to receive a positive review than those with level selection. The percentages of positive reviews are 91.44% and 87.18%, respectively.
• DIFF: "Souls-like" games which do not have difficulty settings are significantly (p < 2 × 10^-16) more likely to receive a positive review than those which have them. The percentages of positive reviews are 91.43% and 87.40%, respectively.
• MUL: "Souls-like" games which do not have multiplayer features are significantly (p < 2 × 10^-16) more likely to receive a positive review than those which have them. The percentages of positive reviews are 92.20% and 90.06%, respectively.
• UPG: "Souls-like" games which do not allow weapon and/or armor upgrades are significantly (p < 2 × 10^-16) more likely to receive a positive review than those which allow them. The percentages of positive reviews are 92.95% and 90.39%, respectively.
• DUR: "Souls-like" games which have an equipment durability mechanic are significantly (p < 2 × 10^-16) more likely to receive a positive review than those which do not. The percentages of positive reviews are 91.52% and 90.44%, respectively.
• MAP: "Souls-like" games which have an in-game map are significantly (p < 2 × 10^-16) more likely to receive a positive review than those which do not. The percentages of positive reviews are 92.56% and 90.25%, respectively.
• PEN: "Souls-like" games which have additional penalties upon character death are significantly (p < 2 × 10^-16) more likely to receive a positive review than those which do not. The percentages of positive reviews are 92.77% and 90.30%, respectively.
• LVL: When the level-up system is not the classic one (meaning it is not done at a "bonfire" or at specific characters near it), "Souls-like" games are significantly (p < 2 × 10^-16) more likely to receive a positive review than those which have the classic level-up system. The percentages of positive reviews are 94.04% and 90.18%, respectively.
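To illustrate why differences of a few percentage points come out as highly significant with this many reviews, the comparison of two groups of 0/1 coded reviews can be sketched with Welch's t statistic computed from counts. This is an illustrative sketch, not the authors' analysis: the source does not state which of the two tests was applied to which factor, the group sizes below are hypothetical (only the 95.61% and 89.94% shares for 2D and 3D graphics come from the text), and the p-value uses the normal approximation, which is an assumption that is reasonable only for large samples.

```python
import math


def welch_t_from_counts(pos_a, n_a, pos_b, n_b):
    """Welch's t statistic for two samples of 0/1 review labels."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    # Unbiased sample variance of a 0/1 sample with mean p: n / (n - 1) * p * (1 - p)
    var_a = p_a * (1.0 - p_a) * n_a / (n_a - 1)
    var_b = p_b * (1.0 - p_b) * n_b / (n_b - 1)
    standard_error = math.sqrt(var_a / n_a + var_b / n_b)
    return (p_a - p_b) / standard_error


def two_sided_p_normal(t):
    """Two-sided p-value from the normal approximation (large samples only)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Hypothetical group sizes; the positive shares match the GD factor in the text.
n_2d, pos_2d = 100_000, 95_610      # 95.61% positive
n_3d, pos_3d = 800_000, 719_520     # 89.94% positive

t_value = welch_t_from_counts(pos_2d, n_2d, pos_3d, n_3d)
p_value = two_sided_p_normal(t_value)
print(t_value > 0, p_value < 2e-16)
```

With group sizes in the hundreds of thousands, the standard error is so small that even a difference of about one percentage point yields a very large test statistic, which is consistent with the uniformly small p-values listed above.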
Afterward, all factors were investigated. According to these variables, 16 levels could be created from these 21 "Souls-like" games. The following insignificant differences were found among the positive reviews: Sanctuary. This means that the percentages of positive reviews of these two games do not differ significantly (p = 0.94784).
The differences between the positive reviews of all other "Souls-like" games were strongly significant. The six combinations of factors that are most likely to receive more positive or less positive reviews are shown in Table 6. According to Table 6, the best percentage of positive reviews (96.92%) belongs to a combination of factors that contains only one game, Hollow Knight. The second combination also contains only one game, Sekiro: Shadows Die Twice, with 94.78%. The third combination of factors consists of Dark Souls, Dark Souls -Remastered, and Dark Souls III. Their percentage of positive reviews is 92.60%. The lowest percentage of positive reviews (60.72%) belongs to Lords of The Fallen, which is the only game in its combination of factors. The next category contains only Ashen, with 70.67% positive reviews. In the remaining category, there is only The Surge with 74.56% positive reviews.

Analyzing the textual reviews
As was mentioned earlier, the textual reviews were analyzed as well. Since the previously mentioned emotion lexicon is multilingual, every textual review was analyzed. This means that the textual parts of all 993,932 reviews were examined. To do this, first, the textual components of all reviews were analyzed. Then, the investigation continued by examining one variable only. Afterward, all variables were analyzed. The results of the analysis regarding all reviews can be seen in Fig. 5.
As can be seen in Fig. 5, trust, anticipation, and joy were the three most frequently expressed emotions in the reviews, with 16.68%, 15.32%, and 14.15%, respectively. Surprise was fairly rare with 7.66%, and disgust had the lowest percentage with 6.95%. Anger (11.83%), fear (13.61%), and sadness (13.76%) are quite high as well, but can this phenomenon be considered normal? The hypothesis is that it can, because these are skill-based games. To examine this, the emotions are grouped by game in Table 7. Regarding the standard deviation, joy had the largest one with 2.24%, while surprise had the smallest with 0.78%.
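The percentages above read as shares of a whole: the eight reported values add up to 99.96%, that is, 100% up to rounding. A minimal sketch of such a computation follows. It assumes that each emotion's percentage is its number of lexicon matches divided by the matches of all eight emotions; the function name and the toy counts are hypothetical, and only the values in `reported` come from the text.

```python
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

def emotion_shares(counts):
    """Return each emotion's share (in %) of all emotion-word matches."""
    total = sum(counts[e] for e in EMOTIONS)
    if total == 0:
        return {e: 0.0 for e in EMOTIONS}
    return {e: 100.0 * counts[e] / total for e in EMOTIONS}

# Hypothetical match counts, for illustration only (not the study's data).
toy_counts = {"anger": 12, "anticipation": 15, "disgust": 7, "fear": 14,
              "joy": 14, "sadness": 14, "surprise": 8, "trust": 16}
shares = emotion_shares(toy_counts)

# Percentages reported in the text for all reviews (Fig. 5).
reported = {"anger": 11.83, "anticipation": 15.32, "disgust": 6.95, "fear": 13.61,
            "joy": 14.15, "sadness": 13.76, "surprise": 7.66, "trust": 16.68}
reported_total = sum(reported.values())
```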
As shown in Table 7, some games received similar values for each emotion (defined as within ±1% of the mean); this is denoted by *. This means that the previous hypothesis was accepted. Darksiders III was the game that made most people angry (14.88%) and was the most feared (15.67%). This is not surprising, as its predecessors were not "Souls-like": Darksiders I was a simple "hack'n'slash" game, while Darksiders II was an open-world role-playing game. Based on the scraped reviews, Star Wars Jedi: Fallen Order was the most anticipated game (19.97%), possibly because previous Star Wars games were multiplayer games. Dark Souls II - Scholar of The First Sin was the game that people felt most disgusted by (8.75%), as it included extra penalties upon death and unfair enemy placement. The players of Hollow Knight felt the most joy (20.09%), possibly due to the rewarding experience, while the most sadness was expressed about Dark Souls II - Scholar of The First Sin (16.56%). In contrast to Hollow Knight, Dark Souls II - Scholar of The First Sin was not very rewarding. The Surge
Table 6 The three best and three worst positively reviewed combinations of factors
Hollow Knight was the game that made people the least angry (8.82%), and it was also the least feared game (10.05%). The reasons were probably the same as above. Dark Souls II - Scholar of The First Sin was the least anticipated (13.18%) due to its base game's reception, although Dark Souls II itself was more anticipated (14.28%). Star Wars Jedi: Fallen Order was the game that people felt least disgusted by (4.69%), and it also made the players the least sad (8.91%). Reviewers of Lords of The Fallen felt the least joy (11.34%). Darksiders III was the least surprising (6.57%). Lastly, Blasphemous was the least trusted (13.92%).
Table 7 The distribution of sentiments found in the scraped game reviews, grouped by emotions
To analyze whether there are differences between the emotions, first, their data distributions were investigated, and none of them was normal. Then, using the Mann-Whitney-Wilcoxon test, the differences between every pair of games were investigated for every emotion. Most of them were strongly and significantly different from each other (p < 2 × 10⁻¹⁶); therefore, it is easier to list those pairs where only insignificant differences arose (meaning p > 0.05). The results of these comparisons can be seen in Fig. 7 in the appendix: 62 pairs of games had insignificant differences in at least one emotion. Out of these pairs, Ashen and Darksiders III had the most similar reviews based on sentiments alone: there were insignificant differences among 7 emotions. The second most similar pair was Death's Gambit and Hellpoint, with insignificant differences among 6 emotions. Four pairs had insignificant differences among 5 emotions: Ashen and The Surge 2; Code Vein as well as Salt and Sanctuary; Dark Souls III and The Surge 2; and lastly, Hellpoint and The Surge. Six pairs had insignificant differences among 4 emotions: Ashen and Blasphemous; Ashen and Lords of The Fallen; Blasphemous and Darksiders III; Blasphemous and The Surge 2; Dark Souls III as well as Salt and Sanctuary; and lastly, Death's Gambit and The Surge. The number of pairs is the same with 3 emotions. Twelve pairs of games had insignificant differences between 2 emotions, and lastly, thirty-two pairs of games had insignificant differences in 1 emotion. Every other pair was significantly different from each other.
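The pairwise comparison described above can be sketched as follows. This is a pure-Python illustration of a two-sided Mann-Whitney-Wilcoxon test using the tie-corrected normal approximation without continuity correction, so its p-values may differ slightly from those of statistical packages. It is not the study's R code; the function names and the two toy samples (per-review counts of one emotion for two games) are hypothetical.

```python
import math

def rank_with_ties(values):
    """Assign 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes

def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value) via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    tie_term = sum(t ** 3 - t for t in tie_sizes)
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all observations identical
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical per-review anger counts for two games (illustration only).
game_a = [0, 0, 1, 0, 2, 0, 1, 0]
game_b = [3, 2, 4, 1, 5, 3, 2, 4]
u_stat, p_value = mann_whitney_u(game_a, game_b)
different = p_value <= 0.05  # the text treats p > 0.05 as insignificant
```

Because review-level emotion counts contain many ties (most notably zeros), the tie correction in the variance matters in this setting.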

One-by-one analyses
During the one-by-one analyses, the emotions expressed about each factor were investigated. The results can be seen in Fig. 6. The columns are grouped by factors, while each shade represents an emotion. From the darkest to the brightest, they are the following: anger, anticipation, disgust, fear, joy, sadness, surprise, trust. Starting with the setting, futuristic games have the largest average sentiments per review, while those with a Japanese setting have the smallest ones. The increases from the smallest to the largest are 108.66%, 183.17%, 109.70%, 102.36%, 186.31%, 103.24%, 128.42%, and 163.62%. These increases correspond to anger, anticipation, disgust, fear, joy, sadness, surprise, and trust, respectively. The difference among every possible pair is significant (p < 2 × 10⁻¹⁶).
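An increase above 100% means that the larger average is more than twice the smaller one. The sketch below assumes that the increases are relative differences with the smaller average as the baseline; this reading is an assumption, and the function name and the two example averages are hypothetical values chosen to reproduce the 108.66% reported for anger.

```python
def percent_increase(smaller, larger):
    """Relative increase (in %) from the smaller to the larger average."""
    if smaller <= 0:
        raise ValueError("the baseline average must be positive")
    return 100.0 * (larger - smaller) / smaller

# Hypothetical averages per review (illustration only, not the study's values).
avg_anger_japanese = 0.5000
avg_anger_futuristic = 1.0433
increase = percent_increase(avg_anger_japanese, avg_anger_futuristic)
ratio = avg_anger_futuristic / avg_anger_japanese
```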
2D games received more positive emotions than 3D games: their anticipation, joy, and surprise are higher by 6.34%, 30.03%, and 12.47%, respectively. They also received more sadness by 0.09%. The 3D games received more negative emotions such as anger, disgust, and fear by 22.24%, 26.73%, and 30.68%, respectively. However, 3D games are more trusted by 3.01%. In this case, not every difference is significant: between 2D and 3D, trust is the only emotion that is insignificantly different (p = 0.85). In the case of sadness, even though the difference is small, it is significant: p = 4.9 × 10⁻⁹, while p = 5.8 × 10⁻⁸ in the case of anticipation. In the case of all other emotions, p < 2 × 10⁻¹⁶.
Fig. 6 The average sentiments per review and factor
Regarding graphical styles, games with pixel graphics received the largest average of emotions, while drawn and realistic games received the smallest. Each difference is significant, although p is not below 2 × 10⁻¹⁶ in every case: for anticipation, p = 0.00038, which means that the difference is only moderately significant. The remaining pairs are strongly significant. In the case of disgust and fear, the differences between cel-shaded and realistic graphics have p = 1.4 × 10⁻¹⁴ and p = 1.7 × 10⁻¹⁴, respectively. In the case of surprise and trust, the differences between drawn and realistic graphical styles are also strongly significant (p = 6.9 × 10⁻⁷ and p = 1.6 × 10⁻¹¹, respectively).
While games with level selection received more anger and fear (by 22.99% and 15.89%, respectively), they also received more positive emotions: these games are more anticipated (58.63%), and people felt more joy (53.32%), more surprise (22.10%), and more trust (47.51%). Games with interconnected levels received more disgust (1.32%) and sadness (11.05%) in their reviews. The difference is significant for every emotion: p < 2 × 10⁻¹⁶, except in the case of disgust, where p = 1.5 × 10⁻⁶, which is still strongly significant.
When there are difficulty settings in the games, the average sentiments per review are larger. The increases in average emotions from anger to trust (darkest to brightest) are the following: 62.68%, 97.76%, 31.74%, 55.14%, 90.76%, 15.64%, 55.44%, and 79.78%. Every difference among pairs is strongly significant (p < 2 × 10⁻¹⁶).
When weapons or armor can be upgraded, the emotions of disgust, fear, sadness, and surprise are increased by 22.88%, 6.13%, 36.20%, and 3.27%, respectively. When they cannot be upgraded, the reviews contain more anger (0.69%), anticipation (10.22%), joy (9.12%), and trust (0.65%). Every difference among pairs is significant, except in the case of anger (p = 0.24). The differences between the remaining pairs are strongly significant: p = 5.6 × 10⁻⁷ in the case of surprise, and p < 2 × 10⁻¹⁶ in the case of the remaining pairs.
In the case of games with equipment durability, players felt more disgust (5.32%) and sadness (4.54%) on average in the reviews. The games without a durability feature received more anger (15.93%), anticipation (30.56%), fear (8.93%), joy (41.16%), surprise (26.67%), and trust (22.18%) on average in the reviews. Every difference among pairs is strongly significant (p < 2 × 10⁻¹⁶), except in the case of fear, where it is only slightly significant (p = 0.029).
In those games where an in-game map is provided, the reviews contain more of every emotion on average, except disgust. Anger, anticipation, fear, joy, sadness, surprise, and trust are increased by 9.21%, 52.81%, 3.70%, 73.49%, 1.98%, 35.39%, and 36.63%, respectively. When an in-game map is not provided, the feeling of disgust is increased by 5.81%. Every difference among pairs is significant, except in the case of sadness (p = 0.21). For the remaining pairs, p = 0.0014 in the case of disgust, and p < 2 × 10⁻¹⁶ in the case of all other emotions.

Investigation using all factors
After the factors were investigated one by one, all factors were analyzed by emotions. Similarly to before, 16 groups of factors were made. Each possible pair was analyzed for every emotion. Most pairs had strongly significant differences. In the appendix, Fig. 8 shows those pairs which do not have significant differences. According to the comparison in Fig. 8, the reviews were most similar between two groups: (1) games with a futuristic setting, 3D and realistic graphics, interconnected levels, difficulty settings, no multiplayer features, upgradeable weapons/armor, no equipment durability system, no in-game map, no extra penalties upon character death and a classical bonfire system; (2) games with a medieval setting, 3D and cel-shaded graphics, interconnected levels, no difficulty settings, multiplayer features, upgradeable weapons/armor, no equipment durability system, no in-game map, no extra penalties upon character death and not a classical bonfire system. These two groups contain only one game each: Darksiders III and Ashen, respectively.
When all factors are considered, no pair contains 6 insignificant differences; however, three pairs of groups contain 5 of them. Each of these groups still contains only one game, so the pairs can be named by their games: Code Vein as well as Salt and Sanctuary; Death's Gambit and The Surge; and lastly, Ashen and The Surge 2. The latter pair is different from before: when comparing the games only, Dark Souls III and The Surge 2 had insignificant differences.
Three pairs of groups contain 4 insignificant differences between emotions: Ashen and Blasphemous; Ashen and Lords of The Fallen; and lastly, Blasphemous and Darksiders III. Three pairs of groups contain 3 insignificant differences, four pairs contain 2, and lastly, eighteen pairs contain only 1.

Discussion
To answer RQ1, all possible pairs of games (210) were created and the percentages of their positive reviews were compared. As shown in the beginning of subsection 5.2, there are significant differences between 199 pairs of games. The differences are insignificant in the case of 11 pairs. Out of these 11 pairs, 4 pairs contain games from the original creators of the "Souls franchise" (FromSoftware).
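The number of pairs follows from choosing 2 games out of 21. A short check is given below; the integer ids are placeholders for the game titles, and the two counts are those reported in the text.

```python
from itertools import combinations

N_GAMES = 21
game_ids = range(N_GAMES)  # placeholder ids standing in for the 21 titles

# Every unordered pair of distinct games is compared exactly once.
pairs = list(combinations(game_ids, 2))
n_pairs = len(pairs)

# Counts reported in the text for the percentage of positive reviews.
significant_pairs = 199
insignificant_pairs = 11
```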
Regarding RQ2, the three strongest correlations are in the cases of The Surge (approx. 0.3910), Lords of The Fallen (approx. 0.3860) and The Surge 2 (approx. 0.3500). This means that the users who have more playtime in these three games are more likely to leave a positive review (in other words, a "thumbs up"). It should also be noted that these three games are from the same developer studio; therefore, they may follow a gameplay pattern about which the players feel the same, or they may already have a dedicated fanbase. The three weakest correlations are in the case of Code Vein (approx. 0.1290), Hollow Knight (approx. 0.1400), and Dark Souls III (approx. 0.1530). However, according to the percentages of positive reviews in Table 5, these games are well-received. Also, in the case of every investigated game, the users' playtime significantly and positively correlates with a positive review.
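A correlation between a continuous variable (playtime) and a binary one (positive or negative review) can be illustrated with a minimal sketch. It assumes a Pearson coefficient on the 0/1-coded rating, which is the point-biserial correlation; this section does not restate which coefficient the study used, so the choice is an assumption. The function name and the toy data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation; with a 0/1 variable it is the point-biserial correlation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical reviews: playtime at review (hours) and rating (1 = positive).
playtime_hours = [0.4, 1.5, 3.0, 12.0, 25.0, 40.0, 80.0, 150.0]
is_positive = [0, 0, 1, 0, 1, 1, 1, 1]
r = pearson_r(playtime_hours, is_positive)
```

A positive `r` only states that longer playtimes and positive ratings tend to occur together; it does not say which one drives the other.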
Regarding RQ3, it can be seen from the one-by-one analyses that the factors significantly influence the percentage of positive reviews. It should be noted that when comparing different graphical styles, games with cel-shaded or pixel graphics received the lowest percentages of positive reviews. Also, these two percentages are not significantly different from each other. When comparing all factors to each other, the percentage of positive reviews was insignificantly different among three pairs.
Lastly, RQ4 needs to be answered. Eight emotions were investigated in the textual parts of the reviews: first, the games themselves were analyzed, then the factors were examined one by one, and lastly, all factors were investigated. When comparing emotions about these games, the reviews of Ashen and Darksiders III received similar sentiments, because 7 emotion pairs had insignificant differences between them. Games were considered similar to each other if at least 4 emotion pairs had insignificant differences among them. This means: Ashen and Darksiders III (7); Death's Gambit and Hellpoint (6); Ashen and The Surge 2 (5); Code Vein as well as Salt and Sanctuary (5); Dark Souls III and The Surge 2 (5); Hellpoint and The Surge (5); Ashen and Blasphemous (4); Ashen and Lords of The Fallen (4); Blasphemous and Darksiders III (4); Blasphemous and The Surge 2 (4); Dark Souls III as well as Salt and Sanctuary (4); and lastly, Death's Gambit and The Surge (4). Therefore, these 12 pairs of games were considered similar based on emotions alone.
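The similarity rule above (at least 4 of the 8 emotions without a significant difference) can be written down as a short sketch. The threshold of 4 and the significance level of 0.05 come from the text; the function names and the p-values of the example pair are hypothetical. The list of reported counts repeats the 12 pairs named above.

```python
ALPHA = 0.05              # p > 0.05 is treated as insignificant in the text
MIN_SHARED_EMOTIONS = 4   # similarity threshold stated in the text

def count_insignificant(p_values, alpha=ALPHA):
    """Count the emotions whose difference between two games is insignificant."""
    return sum(1 for p in p_values.values() if p > alpha)

def is_similar(p_values):
    """Apply the rule: similar if at least 4 emotions differ insignificantly."""
    return count_insignificant(p_values) >= MIN_SHARED_EMOTIONS

# Hypothetical p-values for one pair of games (illustration only).
example_pair = {"anger": 0.31, "anticipation": 0.002, "disgust": 0.44, "fear": 0.09,
                "joy": 0.0001, "sadness": 0.27, "surprise": 0.03, "trust": 0.61}

# Numbers of insignificant emotions for the 12 pairs listed in the text.
reported_counts = [7, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4]
n_similar_pairs = sum(1 for c in reported_counts if c >= MIN_SHARED_EMOTIONS)
```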
From the one-by-one analyses, it can be concluded that almost all emotions can be significantly influenced by these various factors. However, there are exceptions: there is no significant difference in trust regarding graphical dimensions (2D or 3D), in anger regarding upgradeable weapons/armor, in sadness regarding in-game maps, and lastly, in disgust regarding extra penalties upon character death. After analyzing all factors, it also became apparent that 928 pairs had significant differences between them, whereas 31 pairs did not.

The importance of the results
Designing a video game with a good rating can be difficult, as many factors, including game design elements and mechanics, should be considered. "Souls-like" games share several design elements and mechanics that are present in all of them, such as the "bonfire checkpoint system", environmental/contextual storytelling, and the unforgiving difficulty. As the "Souls franchise" inadvertently created this genre, new developers try to make their own "Souls-like" games and to put new elements into them. These can be new level-up systems, in-game maps, new story settings, et cetera.
By looking at only the average recommendations of the various "Souls-like" games, it can be safely stated that significant differences exist between them. This means that while the games have the same core mechanics, they are distinct enough from each other by having different stories, settings or various gameplay mechanics such as difficulty levels, in-game maps, multiplayer features, et cetera. Therefore, users can give different reviews based on their experiences that they gathered during their playthrough.
Every factor has a significant influence on the percentage of positive reviews. According to the results presented in this article, users liked the following group the most: games with a medieval setting, 2D graphics, a drawn graphical style, an interconnected world, no difficulty settings, no multiplayer features, weapon upgrades, no equipment durability, a map, additional penalties upon character death and a level-up system that is not the classic one.
The implications of these results are of great importance, as "Souls-like" games are still made to this day and their developers usually create extra elements to differentiate them from other titles. The results presented in this article show which of the mentioned factors are more liked; therefore, carefully implementing them can make a game more positively reviewed by the players. The developers of future "Souls-like" games can use these results when creating their next game by implementing the mentioned design elements and gameplay mechanics.
As mentioned, "Souls-like" games are still developed to this day: Mortal Shell (developed by Cold Symmetry) is one of the newest "Souls-like" games that was made in the last year and possibly more games in this genre are still (and will be) in development. It should be mentioned that this game also contains new elements: the player character can possess enemies, meaning that their bodies can be used in combat. However, the game was not yet available on Steam to include it in the investigation.

Limitations of the study
The goal of this study was to understand why positive reviews are given to the games and how players feel about them. Therefore, multiple parts of the reviews were analyzed: whether a positive review was given, the playtime at review, and the textual part. Since this study is already extensive, some questions were not investigated. First, it was not analyzed whether positive reviews and playtimes are in a causal relationship. Naturally, a possibility exists that players play a game longer if it has more positive reviews, or that a longer playtime causes more positive reviews. According to the results, a positive correlation exists between playtimes and positive reviews. This fact increases the possibility of a causal relationship, but this remains to be investigated.
It was also not investigated whether the results are influenced by the games' popularity. Normally, people leave more reviews if a game is more popular. It should be investigated in the future whether this fact influences the results presented in this study, or not.
A possibility also exists that playtime should be considered when assessing reviews, as people may leave a negative review after 10-30 min because, for example, the game is not working properly. While it was not considered, the following should be mentioned: out of 993,932 reviews, 14,290 were under 30 min. Out of them, 7662 were positive and 6628 were negative. Judging by the textual parts, people who leave positive reviews after such a short time are those who either bought the game on multiple platforms or played it elsewhere before buying it to support the developers. Since these negative and positive numbers are similar, it can be stated that these reviews with small playtimes almost cancel each other out.
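The weight of these short-playtime reviews can be checked with simple arithmetic on the counts given above. The variable names are illustrative; all numbers come from the text.

```python
TOTAL_REVIEWS = 993_932
SHORT_REVIEWS = 14_290   # reviews written with under 30 min of playtime
SHORT_POSITIVE = 7_662
SHORT_NEGATIVE = 6_628

# Share of all reviews that were written after under 30 min of playtime.
short_share = 100.0 * SHORT_REVIEWS / TOTAL_REVIEWS

# Split of the short-playtime reviews into positive and negative ones.
positive_share = 100.0 * SHORT_POSITIVE / SHORT_REVIEWS
negative_share = 100.0 * SHORT_NEGATIVE / SHORT_REVIEWS
```

The short-playtime reviews make up about 1.44% of all reviews, and roughly 53.6% of them are positive, which is far more balanced than the overall percentages of positive reviews reported for the games.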

Conclusions
The correlation between positive reviews, game design elements, and mechanics was investigated in "Souls-like" games by scraping the Steam webpage. For this, all (993,932) reviews were scraped during the middle of April 2021. The playtimes at review and the textual components were assessed as well. The percentages of positive reviews were also analyzed, and they were compared to those of other games in this genre.
The results show that there are significant differences between the percentages of positive reviews of the games. A slight-to-moderate correlation also exists between positive reviews and playtimes. Eleven categories were set up to analyze different design elements and game mechanics in "Souls-like" games. After comparing the percentage of positive reviews regarding each factor, it can be concluded that significant differences exist between all of them, although some games are reviewed similarly by players. For example, the reviews of Ashen and Darksiders III did not differ significantly in 7 emotions in two cases. The first case was when the games themselves were analyzed, and the second was when all factors were investigated. Also, 12 pairs of games were deemed similar to each other based on sentiment analysis.
According to the results, players are more likely to leave a positive review on those games which have one of the following factors: medieval setting, 2D graphical dimensions, drawn graphical style, interconnected world, no difficulty settings, no multiplayer features, no weapon/armor upgrades, having equipment durability features, an in-game map, extra penalties upon death and not a classic level-up system. This is summarized in Table 8.
However, when all factors are considered, the best percentages of positive reviews are received by those games which have a medieval setting, 2D graphics, a drawn graphical style, an interconnected world, no difficulty settings, no multiplayer features, weapon upgrades, no equipment durability, a map, additional penalties upon character death and a level-up system that is not the classic one. As can be seen, the factors of weapon/armor upgradeability and equipment durability are different from Table 8 when all factors are considered.
Naturally, designing and developing a game is not easy, but the results that are presented in the article can influence the creation of future "Souls-like" games by helping their developers to choose design elements and game mechanics. According to the results, each factor has its pros and cons. They can also evoke various emotions in the players.
In the end, these games are not small in numbers and usually are well-received, therefore more of them may see the light of day in the future.