1 Introduction

Video games have come a long way since their inception, and several genres now exist, such as first-person shooters, real-time strategy games, and role-playing games. Over the years, these genres have also evolved by incorporating different design elements and game mechanics. This makes each gaming experience unique, which matters because good gameplay is important [13, 14].

Developing a unique gaming experience is not an easy task, and guides were created to make the process simpler [1, 8]. Designing role-playing games is even more complex: according to Horsfall and Oikonomou, players prefer a good combat system, story, and interaction with non-player characters (NPCs) [22]. Daneels et al. also showed that narrative, interaction with characters, and narrative-impacting choices elicited eudaimonic moments [11]. Design elements such as realistic characters and tone-appropriate soundtrack scores can also enhance player experiences. The latter is also supported by Skalski and Whitbred; however, spatial dimensions do not significantly affect enjoyment [42]. Unlike spatial dimensions, graphics and “juiciness” are important as well [26]. According to Bostan and Kaplancali, certain game mechanics such as melee combat, fixing broken equipment, gathering information, and acquiring items can motivate players by satisfying their needs [5].

1.1 Inception of the “Souls franchise”

As years progressed, some developers began experimenting with various game mechanics, such as an unreliable narrator [39] or telling stories using game mechanics [21]. After such experimentation, the first “Souls game”, titled Demon’s Souls and developed by the Japanese game studio FromSoftware, was released on PlayStation 3 (PS3) in 2009 under the direction of Hidetaka Miyazaki. Initially, it received a mixed reception, but ultimately it became a commercial success. Due to its success, three sequels were made by the same team: Dark Souls I, II, and III. Unlike Demon’s Souls, the Dark Souls games were also released on PC. These games in the “Souls franchise” were praised for their high difficulty level, which can be either intimidating or interesting to players and even their viewers [18]. These games also have a vast, interconnected hostile world that can be explored (almost) freely, appealing to people who are of the “compete” and/or “explore” types of players [27]. The results in the study of Salmon et al. also show that games that are challenging are in the top three types of games played by seniors [41].

The “Souls-like” video game genre was inadvertently created due to the influence of the “Souls franchise” as several other game developers implemented its “Souls formula” in their games. However, each “Souls-like” game is given new twists which can mean new gameplay mechanics, graphical style, etc. while maintaining the core elements such as the unforgiving difficulty, the so-called “bonfire checkpoint system”, and the environmental/contextual storytelling. As there is a high emphasis on the environments, screenshots of Dark Souls, Dark Souls II – Scholar of The First Sin, and Dark Souls III are shown in Fig. 1.

Fig. 1

Screenshots of Dark Souls (a), Dark Souls II – Scholar of The First Sin (b), and Dark Souls III (c)

Changes among “Souls-like” games can either be small or large: for example, gunplay (ranged combat) is one of the main focuses of Remnant: From The Ashes instead of the (mostly) melee combat that can be found in the Dark Souls games; or another example is Salt and Sanctuary which is a 2D game instead of a 3D one. Most games do not provide an in-game map, however some do. Therefore, differences exist between the various “Souls-like” games, but what do players think about them? To find an answer to this question, the reviews on Steam have to be analyzed.

1.2 Analyzing reviews on Steam

Analyzing Steam reviews is not a foreign concept as several studies investigate their usefulness. Lin et al. studied the reviews found on early access games and concluded that players leave more positive reviews after the game leaves early access [30]. According to Busurkina et al., reviews usually contain seven topics: achievements, narrative, social interaction, social influence, visual/value, accessories, and general experience. Besides these, users are likely to report bugs in the reviews [7].

As Steam reviews have multiple components, not every part of them is necessary to identify whether they are useful or not. According to Kang et al. the total votes on a review by others is the most important in this regard, while the second most important factor is the user’s recommendation (whether a review is positive or negative) for the game [25]. However, the study of Eberhard et al. shows that those reviews that are more voted on are more complex and express more negative sentiments as they are critical towards the game [12].

Then, what is the case with the “Souls franchise” and “Souls-like” games? Is their reception significantly different due to the various design choices? What are the sentiments of players? To find an answer to these questions, several “Souls-like” games are investigated along with the “Souls franchise”. These examined games are shown in Table 1:

Table 1 The list of investigated games

For this investigation, the article is structured as follows: the core game mechanics of the “Souls franchise” and “Souls-like” games are discussed in Section 2, the research questions are presented in Section 3, the materials and methods used are shown in Section 4, and the results and discussion can be seen in Sections 5 and 6, respectively. Lastly, conclusions are drawn in Section 7.

2 What makes a game “Souls-like”?

In this section, the game design elements and mechanics – which some call an aesthetic category [29] – are presented in detail. As shown in Fig. 2, the appearance of “Souls-like” games varies among them mainly due to design elements, but sometimes due to their mechanics as well.

Fig. 2

Screenshots of Death’s Gambit (a), Sekiro: Shadows Die Twice (b), Hollow Knight (c), and The Surge 2 (d)

Even though their appearance varies, their two core elements remain. These two core elements are the unforgiving difficulty and environmental/contextual storytelling which are detailed in subsections 2.1 and 2.2, respectively.

2.1 The unforgiving difficulty

According to the Flow theory of Csikszentmihalyi, players should be put in a so-called “Flow channel” which exists between the stages of boredom and anxiety [36]. Players should occasionally step into both stages which can create the concept of flow and interesting gameplay. However, the time spent in either boredom or anxiety should not be long, otherwise the player could get demotivated or annoyed. It should be noted that staying in the anxiety stage can also increase the heart rate of the player [4].

However, these games are more about perseverance than about Flow because the stage of boredom is very small [33], and interaction with the vast environment can be done in multiple ways [37]. Also, the game’s world is hostile and anything could kill the player character. At first glance, the player does not know which NPCs are friendly and which are hostile. Since the games usually do not restrict killing NPCs, friendly ones can also be killed. This can prevent completing certain quests or can set the player character on a new path. Thus, the challenge is usually a lot higher than the player’s skill. According to Rogers, the stage of “pain and loss” can be reached due to the high difficulty, and the player’s skills can also be improved, which is another stage [40]. This can create some goals and motivation [31]. This is one of the keys to the successful flow of gameplay found in these types of games [2].

2.1.1 Combat

Even the weakest enemy can kill the player character with three or four attacks. To add to the difficulty, sometimes enemies are in a certain position where the player cannot see them (e.g. behind a column). This means that the player has to know their position, attack pattern, and the area’s layout to survive the encounter.

Since the focus is on melee combat in “Souls-like” games, the weapons and armor of the player character can be upgraded in most of them. As aggressive combat is preferred when playing these games, a “stamina system” is also used: when the player dodges, runs, or attacks, the stamina of the character decreases. When the stamina is empty, the player character cannot do anything besides walking. Therefore, resting is required to refill it, but the player character can be attacked during resting.

As the difficulty of these games is high, most of them offer multiplayer features. This means that players can summon other players into their games for help, usually before fighting bosses. Remnant: From The Ashes goes further, as it offers a full campaign playthrough with at most two other players.

2.1.2 The bonfire checkpoint system

The so-called “bonfire checkpoint system” is the most important part of the “Souls-like” games. Depending on the games, the bonfires themselves can change graphically due to the changes in the story. For example, the bonfire is illustrated as a bench in Hollow Knight, as a shrine in Nioh: Complete Edition, or as a crystal in Remnant: From The Ashes. However, their functions remain the same in each game. This system is illustrated with a flowchart in Fig. 3.

Fig. 3

Flowchart of the “bonfire checkpoint system”

In this simplified version of the “bonfire checkpoint system” the player character can do one of two things: interact with a bonfire or be killed. In the case of interaction, a new checkpoint is created near the bonfire. The player character becomes healed and all other enemies (except the bosses) respawn in the world of the game. The health potions (e.g. Estus Flasks in Dark Souls) are also refilled. In case of death, first, the currency is dropped on the ground. Then, the player character is returned, fully healed and with refilled health potions, either to the last checkpoint or to the beginning of an area. The latter usually only happens at the beginning of the game. The enemies also respawn in the case of the player character dying. Now, the player character has a choice that is not explicitly illustrated in Fig. 3: the player character can either go back to pick up the previously dropped currency or can continue the game. It should be noted that if the player character is killed again, then the previously dropped currency will be lost forever. However, new currency (which was gathered after the previous death) can be dropped in a new spot upon dying. Naturally, if the player character picks up the previous currency, it is added to the new currency in the character’s inventory.
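The simplified loop described above can be sketched as a small state model. This is only an illustration of the rules as stated in the text: the class, field, and function names below are hypothetical and are not taken from any of the games or from Fig. 3.

```python
from dataclasses import dataclass


@dataclass
class Player:
    """Hypothetical player state; all field names are illustrative."""
    health: int = 100
    max_health: int = 100
    flasks: int = 5
    max_flasks: int = 5
    currency: int = 0   # currency carried by the character
    dropped: int = 0    # currency lying where the character last died
    checkpoint: str = "beginning of area"


def _restore(player: Player) -> None:
    # Both resting and respawning refill health and health potions.
    player.health = player.max_health
    player.flasks = player.max_flasks


def rest_at_bonfire(player: Player, bonfire: str) -> str:
    """Create a new checkpoint, heal the character, respawn enemies."""
    player.checkpoint = bonfire
    _restore(player)
    return "non-boss enemies respawn"


def die(player: Player) -> str:
    """Drop the carried currency; an earlier, unrecovered drop is lost."""
    player.dropped = player.currency  # overwrites any previous drop
    player.currency = 0
    _restore(player)
    return f"respawn at {player.checkpoint}; non-boss enemies respawn"


def recover_currency(player: Player) -> None:
    """Add the dropped currency to whatever is carried now."""
    player.currency += player.dropped
    player.dropped = 0


player = Player(currency=300)
rest_at_bonfire(player, "bonfire A")
die(player)              # 300 is dropped
player.currency += 50    # gathered after the first death
die(player)              # the 300 is lost forever, 50 is dropped
recover_currency(player)
```

In this run the character ends up with only the 50 gathered after the first death, which is the penalty for dying twice without recovering the earlier drop.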

2.1.3 Level design

Regarding level design, most of the “Souls-like” games have an interconnected and open-world map that can be perceived as giant [16], whereas in Nioh: Complete Edition, the player has to choose a level from the level selection screen. The 2D games share this interconnected open-world, but they are called “Souls-like Metroidvania” games. Metroidvania is a portmanteau of Metroid and Castlevania as they were the pioneers of this genre: usually, these are 2D side-scroller role-playing games with an open world. The player has to navigate the levels, defeat enemies and bosses to proceed, however there are environmental hazards as well, e.g. traps. An in-game map is present in these games, except in the case of Salt and Sanctuary. Similar maps can also be found in Code Vein or Remnant: From The Ashes which are 3D “Souls-like” games. There is a map in Sekiro: Shadows Die Twice, but it does not tell the exact position of the player, thus it is not counted as an in-game map.

However, knowing the area is not easy, because most “Souls-like” games do not show the layout, do not point in a direction, and do not have an in-game map. For example in the first Dark Souls, it is only mentioned to “ring a bell inside the tallest tower”. The player has to figure out how to get there as the game is not linear and no marks can be seen that could indicate the correct way. It is possible to go into different directions as well and begin a new quest.

2.2 The environmental/contextual storytelling

A recurring theme of death is featured by the “Souls franchise” and it is even built into their stories and gameplay [34, 45]: the player character is some kind of undead on a quest – a hero’s journey [9, 43] – to find a cure and/or rekindle a fire to stop the end of “The Age of Fire” and even the end of the world [19]. However, it is up to the player to stop this because it is also possible to hasten the world’s end in the games. It is the player’s choice, usually depending on what the player understood of the story.

As can be expected, the story of the franchise is deep and evolved throughout the Dark Souls games [17]. However, the storytelling in these “Souls” games is quite unusual: besides environmental storytelling and cryptic dialogues of NPCs, the story of the games is told through item descriptions [3]. In other words, the storytelling can be contextual as well within the games.

A great example of environmental storytelling is in the original Dark Souls: the town of Oolacile is corrupted by the Abyss. The player starts at the top of the tallest building. First, the player has to go down to the surface level, and lastly, to the Abyss itself. As the player ventures downward, more black substances appear on the buildings, more buildings can be seen destroyed. Even the environment itself becomes darker later on, and the enemies appear more inhuman. An example of contextual storytelling is in the description of the “Witch’s Ring” in Dark Souls III: “The Witch of Izalith and her daughters, scorched by the Flame of Chaos, taught humans the art of pyromancy and offered them this ring (…)”. This sentence tells the player that the art of pyromancy (which is a form of magic in the “Souls franchise”) is taught by The Witch of Izalith. This fact could not be known if the player did not read the item descriptions. The game will not tell the player during the playthrough. There are countless other similar examples.

It should be noted that the item descriptions are as cryptic as the dialogues of NPCs. It is possible that the players could not understand the story even after finishing the games if they did not search for crucial information that is hidden in the world and item descriptions. However, the community of this franchise is quite large, meaning that online forums and videos exist intending to piece the story of the games together [23].

3 Research questions

As could be seen in the introductory section, game design elements and mechanics make the games unique. The effects of narrative, graphics, character models, sounds, melee combat, fixing broken equipment, gathering information, and acquiring items were investigated in the literature. Also, according to previous results, analyzing whether Steam reviews are positive or negative can determine what players liked or did not like. As people write reviews, their sentiments are reflected in the text.

Since there are several “Souls-like” games available, and they have design elements and mechanics that vary, it would be interesting to see whether players have different opinions and feelings about them. Therefore, four research questions (RQs) were formulated to investigate whether these differences positively or negatively affect the reception of the games.

  • RQ1: Are there any significant differences in the percentage of positive reviews among the games?

  • RQ2: Does the playtime of the user correlate with the percentage of positive reviews the games received?

  • RQ3: Do the different game design elements and game mechanics influence the percentage of positive reviews?

  • RQ4: Do the different game design elements and game mechanics influence the emotions in the reviews?

4 Materials and methods

To answer the RQs, a considerable amount of user reviews was needed. To scrape the Steam webpage, the freely available steam_reviews package was used [20]. This package was developed in Python under the MIT license and it uses the Steamworks API. With the use of this package, user reviews were scraped from the Steam website during the middle of April 2021. The reviews were downloaded in a .json format. Then, they were imported using the jsonlite package into R in which they were evaluated [10].

The investigation consisted of three phases. The first phase was the scraping (subsection 4.1), the second was the creation of the factors that were investigated (subsection 4.2), and the third was the evaluation itself (subsection 4.3). The latter phase was made up of three parts: the first part focused on the correlation between playtime and the reviews, the second focused on the percentage of positive reviews grouped by games and certain factors, while the third part focused on the text component inside the reviews.

4.1 The scraping process

Before the scraping process began, the games themselves had to be selected. The selection was made somewhat easier by the tag system on Steam; however, the tags are placed on the games by users. This fact is important because it can make the system faulty. For example, at the time of writing this article, “VEGAS Pro 14 Edit Steam Edition” is tagged as “Souls-like” on Steam, despite being video editing software. Therefore, the popular tags on these games were examined carefully. The selection criteria were the following: the game is tagged as “Souls-like” on Steam, has combat and a checkpoint system similar to those of the “Souls franchise”, and is not an early access title. To ascertain whether the selected games could be considered “Souls-like”, they were tried out and most of them were even played through. For example, Titan Souls was not included in the investigation: it was tagged as “Souls-like” on Steam, but it did not contain the “bonfire checkpoint system”. It is more like a “Shadow of the Colossus”-type of role-playing game than a “Souls-like”. This means that the game has an open world and, besides the player character, only bosses exist in it. The player has to find them and defeat them. If all are defeated, the game is completed.

After the games were selected, the Steam reviews subpage was examined. Each game on Steam has a subpage that contains the user reviews. The reviews are complex and each contains multiple fields of information [46], but only the following were scraped:

  • The game’s name.

  • Language of the review.

  • Textual part.

  • Whether the game is recommended in the review (thumbs up or thumbs down).

  • The reviewer’s total playtime with the reviewed game.

  • The reviewer’s playtime at the time of review with the reviewed game.

Overall, the number of scraped reviews was 993,932. This also means that – regarding these games – all reviews were scraped from Steam during the middle of April 2021. The number of these records can also be seen in Table 2 regarding each game. Based on the number of reviews alone, Dark Souls III is the most played game in this genre.
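To make the scraped fields concrete, the following sketch maps one raw review record onto the six fields listed above. The JSON key names (`language`, `review`, `voted_up`, `author.playtime_forever`, `author.playtime_at_review`) are an assumption about the layout of the downloaded files, the sample record is invented, and playtimes are assumed to arrive in minutes; none of this is confirmed by the text.

```python
import json
from dataclasses import dataclass


@dataclass
class Review:
    game: str
    language: str
    text: str
    recommended: bool            # thumbs up (True) or thumbs down (False)
    playtime_total_h: float      # total playtime, in hours
    playtime_at_review_h: float  # playtime when the review was written


def parse_review(game: str, raw: dict) -> Review:
    """Build a Review from one raw record; key names are assumed."""
    author = raw.get("author", {})
    return Review(
        game=game,
        language=raw.get("language", ""),
        text=raw.get("review", ""),
        recommended=bool(raw.get("voted_up", False)),
        # Assumed to be stored in minutes, hence the division by 60.
        playtime_total_h=author.get("playtime_forever", 0) / 60,
        playtime_at_review_h=author.get("playtime_at_review", 0) / 60,
    )


# Invented sample record, not taken from the scraped dataset.
sample = json.loads(
    '{"language": "english", "review": "Hard but fair.", "voted_up": true,'
    ' "author": {"playtime_forever": 7200, "playtime_at_review": 3600}}'
)
review = parse_review("Dark Souls III", sample)
```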

Table 2 The number of scraped reviews grouped by game in ascending order

4.2 The investigated factors

After the scraping was completed, the factors that were critical to the comparison were set up. For the investigation, the design elements and game mechanics in the previously mentioned games were carefully examined and the differences between them were noted. These differences and their abbreviations are the following:

  • The setting (SET): e.g. medieval setting, as in the Dark Souls games.

  • Graphical dimensions (GD): e.g. 2D games, such as Death’s Gambit.

  • Graphical style (GS): e.g. realistic style, as in The Surge games.

  • Level design (LD): e.g. level selection system, such as in Nioh: Complete Edition.

  • Whether there are difficulty settings (DIFF): e.g. there are no difficulty settings in the Dark Souls games.

  • Whether a multiplayer feature exists (MUL): e.g. Remnant: From The Ashes has cooperative play.

  • Whether the weapons or the armor of the player character can be upgraded (UPG): e.g. these are not upgradeable in Sekiro: Shadows Die Twice (only the skills are upgradeable).

  • Whether a durability variable exists for the equipment of the player character (DUR): e.g. the weapons/armor can break in the first Dark Souls game.

  • Whether an in-game map is provided to the player (MAP): e.g. there is a map in the upper right corner of the screen in Code Vein.

  • Whether there are extra penalties upon death (PEN): e.g. the player character loses something else besides his currency, as in Blasphemous.

  • Whether the player character can level-up near a bonfire or at a character beside it (LVL): e.g. as in the Dark Souls games.

After these factors were set up, several categories were created from these differences to serve as the basis of comparison. Each of the games was examined by trying/playing them, and the possible categories were noted by hand. The categories were designed to have multiple games in them. These categories can be seen in Table 3, where their possible states are also presented.

Table 3 The investigated states of “Souls-like” game design elements and mechanics

Afterward, the games were carefully examined by trying/playing them and these previously mentioned factors were assigned to them manually. These can be found in Table 4.

Table 4 The investigated factors and their respective values are assigned to each game

4.3 Evaluation of the data

On Steam, every review is either positive or negative. This is symbolized with a thumbs up or a thumbs down on each game’s review page. Thumbs up means that the game is recommended by the reviewer, while thumbs down means that it is not recommended. For readability, these are used as synonyms in this article; therefore, positive review = thumbs up = recommended game; while negative review = thumbs down = not recommended game.

As was mentioned at the beginning of this section, the evaluation was done in three parts. First, the correlation between the playtime and the reviews was investigated. As each review contained the playtime at reviewing, it was easy to calculate. It should be noted that the steam_reviews package scrapes the playtimes in minutes. These were converted to hours. The results of this part of the evaluation can be found in subsection 5.1.

The second part focused on the percentage of positive reviews regarding each game. After scraping 993,932 reviews and looking through the data, it was concluded that 904,005 of them are positive ones. This means that 89,927 reviews are negative. In other words, approximately 90.96% of the reviews are positive and 9.04% are negative. Naturally, this is only the overall number. In the next section, this is elaborated on regarding each game, and each factor. First, the games themselves are compared. Afterward, they are grouped by one factor, then by all factors, and lastly, are compared to each other. This part of the evaluation can be seen in subsection 5.2.

However, one problem with the Steam reviews was that they do not have a rating system on a scale of 1-10 (or 1-5). To get a better picture of the reviews, their textual parts were analyzed as well. For the textual analysis, the “syuzhet” Natural Language Processing package was used in R [24, 47]. With its help, it is possible to classify the emotions people felt while writing the reviews. Four sentiment lexicons are incorporated by this package. The NRC Emotion Lexicon was chosen as it is free for research purposes, and because it proved to be useful in the past [6, 28]. Eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (positive and negative) exist in it. With the syuzhet package’s customizable get_nrc_sentiment(char_v, cl = NULL, language = "english", lowercase = TRUE) function, the sentiment of each word or sentence can be easily assessed: they are compared to the words in the previously mentioned emotion lexicon. Then, a data frame is created in which each row corresponds to a sentence in the reviews, while the emotions and sentiments are represented by columns [35]. The values in the negative column are converted to negative numbers and then added to the values in the positive column. The resulting matrix shows the number of sentiments per sentence. This was the third part of the evaluation and can be found in subsection 5.3.
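The lexicon lookup described above can be illustrated with a minimal Python sketch. It is not the syuzhet implementation: the tiny lexicon below is invented for the example and is not taken from the NRC Emotion Lexicon, and the function names are hypothetical. It only shows the idea of counting lexicon hits per category and combining the negative and positive columns into one value per sentence.

```python
import re

CATEGORIES = [
    "anger", "anticipation", "disgust", "fear", "joy",
    "sadness", "surprise", "trust", "negative", "positive",
]

# Invented toy entries; NOT the real NRC Emotion Lexicon.
TOY_LEXICON = {
    "love": {"joy", "positive"},
    "reward": {"anticipation", "positive"},
    "dread": {"fear", "negative"},
    "unfair": {"anger", "negative"},
}


def emotion_counts(sentence: str) -> dict:
    """Count, per category, how many words of the sentence hit the lexicon."""
    counts = {category: 0 for category in CATEGORIES}
    for word in re.findall(r"[a-z']+", sentence.lower()):
        for category in TOY_LEXICON.get(word, ()):
            counts[category] += 1
    return counts


def valence(counts: dict) -> int:
    """Negate the negative count and add it to the positive count."""
    return -counts["negative"] + counts["positive"]


row = emotion_counts("I love the reward, but the dread is unfair.")
score = valence(row)
```

Each call to `emotion_counts` corresponds to one row of the data frame described in the text, with one column per emotion or sentiment.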

5 Results

Before further research commenced, the distributions of the playtime, reviews, and factors had to be investigated as it was imperative to know them. For this, the Kolmogorov-Smirnov test was used [44]: it can compare a sample with a reference probability distribution or two samples to each other. Regarding the playtimes, the hypotheses of normal distribution are rejected in the case of every game: \(p < 2.2 \times 10^{-16}\) in all of them. Next, the distribution of the percentage of positive reviews was investigated: if a review was positive, it was counted as 1, and when it was negative, it was counted as 0. When investigating the percentage of positive reviews, the hypothesis of normal distribution is accepted with \(p=0.8104\). Regarding the factors per game, the hypotheses of normal distribution are also accepted in the case of every factor. The distributions of the percentage of positive reviews per game are detailed in Fig. 4.
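For the one-sample case, the Kolmogorov-Smirnov statistic is the largest vertical distance between the empirical distribution function of the sample and the reference distribution function. The sketch below computes only that distance, not a p-value. It assumes that the reference normal distribution takes its mean and standard deviation from the sample itself, which the text does not specify; with estimated parameters, Lilliefors-corrected critical values would strictly be needed. The sample values are invented.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(sample):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    reference = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = reference.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Invented samples: one heavily right-skewed (playtime-like), one even.
skewed_hours = [1, 1, 2, 2, 3, 3, 4, 5, 8, 200]
even_hours = list(range(1, 11))

d_skewed = ks_statistic_normal(skewed_hours)
d_even = ks_statistic_normal(even_hours)
```

A single extreme value, such as one very long playtime, is enough to produce a large distance, which is consistent with normality being rejected for the playtimes.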

Fig. 4

The distributions of the percentage of positive reviews per game

As could be observed in this section, the distribution of some data is normal, while that of other data is not. Therefore, parametric tests should be used for the former and nonparametric tests for the latter in further analyses. One such parametric test is the t-test [38], while one such nonparametric test is the Mann-Whitney-Wilcoxon test [32]. These two are used for further investigation.

5.1 Investigating the correlation between the playtimes and positive reviews

First, the playtimes were examined, and afterward the number of recommendations. The results of the examination can be seen in Table 5. The playtimes are given in hours in every case.

Table 5 The correlation between the playtimes at review and positive reviews

According to the scraped reviews, if the playtimes at review are investigated, then Dark Souls III is the game that has the largest average playtime (104.07 h), while Death’s Gambit has the smallest average (12.99 h). If the playtimes accumulated after reviewing are also counted, these two games still have the largest and smallest average playtimes, with 182.66 h and 17.79 h, respectively. It should be mentioned, however, that in most cases the standard deviation of playtime is larger than the average value. Naturally, this means that the playtimes are widely spread. However, it also means that there were people who really liked the games: somebody played 27728.35 h (1155.348 days!) of Dark Souls III and 561.2 h (23.38 days!) of Death’s Gambit. These were the maximum playtimes in the dataset for these two games.

Hollow Knight is the most positively reviewed “Souls-like” game with an average of 0.9692, while Sekiro: Shadows Die Twice is the second most positively reviewed (0.9478) and Dark Souls III is a close third (0.9388). Lords of The Fallen has the smallest average of positive reviews with 0.6072.

Next, the correlation between the recommendations and playtime was examined. As mentioned earlier, the distributions of playtimes are not normal. Therefore, a nonparametric correlation measure, Spearman’s rank correlation coefficient, was used to determine whether they are independent of each other [15]. The results of this investigation can also be seen in Table 5.
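Spearman’s coefficient is the Pearson correlation computed on ranks, with tied values sharing the average of their ranks. The following plain-Python sketch illustrates this for playtime against a 0/1 recommendation; it is not the R routine used for the article’s analysis, the helper names are hypothetical, and the six data points are invented.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Invented data: playtime at review (hours) and recommendation (1 = yes).
playtime_h = [0.5, 2.0, 10.0, 40.0, 120.0, 300.0]
recommended = [0, 0, 1, 0, 1, 1]
rho = spearman(playtime_h, recommended)
```

Because the recommendation only takes two values, most of its ranks are tied, which is why the average-rank handling matters for this kind of data.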

According to the data, the correlation varies by game, while the significance is very strong in each case (\(p < 2.2 \times 10^{-16}\)). Every correlation value is positive, which means that if the playtime increases, the player is more likely to recommend the game. The three strongest correlations are in the cases of The Surge (approx. 0.3910), Lords of The Fallen (approx. 0.3860), and The Surge 2 (approx. 0.3500). It should be noted that these three are made by the same developer. The three weakest correlations are in the cases of Code Vein (approx. 0.1290), Hollow Knight (approx. 0.1400), and Dark Souls III (approx. 0.1530).

5.2 Analyzing the relation between the percentage of positive reviews, game design, and mechanics

Before investigating the various factors, significant differences among the percentages of positive reviews of the selected games were examined. Every possible pair of them was assessed. There are significant differences between 199 possible pairs. There are 11 cases in which the difference is insignificant:

  • Blasphemous and Code Vein (p = 0.0679).

  • Blasphemous and Dark Souls II (p = 1).

  • Code Vein and Dark Souls II (p = 0.34573).

  • Dark Souls and Salt and Sanctuary (p = 1).

  • Dark Souls – Remastered and Dark Souls II – Scholar of the First Sin (p = 1).

  • Darksiders III and Death’s Gambit (p = 1).

  • Darksiders III and The Surge (p = 0.86881).

  • Death’s Gambit and The Surge (p = 1).

  • Hellpoint and Nioh: Complete Edition (p = 0.34573).

  • Hellpoint and The Surge 2 (p = 1).

  • Nioh: Complete Edition and The Surge 2 (p = 1).

Also, three pairs are close to being insignificant: Ashen and The Surge (p = 0.02532); Blasphemous and Star Wars Jedi: Fallen Order (p = 0.04333); and lastly, Death’s Gambit and Hellpoint (p = 0.04596). However, based on the previous p-values of the average percentage of positive reviews, it can be stated that these are the games that received similar reviews to each other.
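The pair counts above can be checked with a short calculation: with the 21 investigated games there are 21 · 20 / 2 = 210 unordered pairs, which matches the 199 significant and 11 insignificant cases. The variable names in this sketch are illustrative.

```python
from itertools import combinations
from math import comb

n_games = 21  # number of investigated games stated in the article

# Every unordered pair of games is compared exactly once.
pairs = list(combinations(range(n_games), 2))
n_pairs = len(pairs)

significant, insignificant = 199, 11
total_reported = significant + insignificant
```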

Next, the percentage of positive reviews was assessed among the factors. The investigation resulted in the following:

  • SET: there are strong significant differences in positive reviews among all settings (p < 2 × 10−16 in all cases, except between medieval and Japanese settings (p = 0.0013)). The percentages of positive reviews are 91.82%, 91.57%, and 86.74% for medieval, Japanese, and futuristic settings, respectively.

  • GD: “Souls-like” games with 2D graphics are significantly (p < 2 × 10−16) more likely to receive a positive review than with 3D graphics. The percentages of positive reviews are 95.61% and 89.94%, respectively.

  • GS: there are strong significant differences in positive reviews among all graphical styles in all cases (p < 2 × 10−16), except among cel-shaded and pixel graphics where the significance is weak (p = 0.027). The percentages of positive reviews are 96.48%, 90.03%, 87.72%, and 87.11% for drawn, realistic, cel-shaded, and pixel graphics, respectively.

  • LD: “Souls-like” games which feature an interconnected world are significantly (p < 2 × 10−16) more likely to receive a positive review than those with level selection. The percentages of positive reviews are 91.44% and 87.18%, respectively.

  • DIFF: “Souls-like” games which do not have difficulty settings are significantly (p < 2 × 10−16) more likely to receive a positive review than those which have them. The percentages of positive reviews are 91.43% and 87.40%, respectively.

  • MUL: “Souls-like” games which do not have multiplayer features are significantly (p < 2 × 10−16) more likely to receive a positive review than those which have them. The percentages of positive reviews are 92.20% and 90.06%, respectively.

  • UPG: “Souls-like” games which do not allow weapon and/or armor upgrades are significantly (p < 2 × 10−16) more likely to receive a positive review than those which allow them. The percentages of positive reviews are 92.95% and 90.39%, respectively.

  • DUR: “Souls-like” games which have an equipment durability mechanic are significantly (p < 2 × 10−16) more likely to receive a positive review than those which do not. The percentages of positive reviews are 91.52% and 90.44%, respectively.

  • MAP: “Souls-like” games which have an in-game map are significantly (p < 2 × 10−16) more likely to receive a positive review than those which do not. The percentages of positive reviews are 92.56% and 90.25%, respectively.

  • PEN: “Souls-like” games which have additional penalties upon character death are significantly (p < 2 × 10−16) more likely to receive a positive review than those which do not. The percentages of positive reviews are 92.77% and 90.30%, respectively.

  • LVL: When the level-up system is not the classic one (meaning not done at a “bonfire” or specific characters near it), “Souls-like” games are significantly (p < 2 × 10−16) more likely to receive a positive review than those which have the classic level-up system. The percentages of positive reviews are 94.04% and 90.18%, respectively.
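The comparisons listed above each contrast the share of positive reviews in two groups. The test used for them is not restated in this section, so the following is only a minimal sketch of one common choice, a two-proportion z-test. The function name and the review counts are assumptions made for illustration; only the two percentages (95.61% and 89.94% for 2D and 3D graphics) come from the results above.

```python
import math


def two_proportion_z_test(pos_a, n_a, pos_b, n_b):
    """Return (z, two-sided p) for H0: both groups share one positive-review rate."""
    p_a = pos_a / n_a
    p_b = pos_b / n_b
    # Pooled rate under the null hypothesis of equal proportions.
    pooled = (pos_a + pos_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical review counts (NOT from the study), chosen so that the
# positive shares match the reported 95.61% (2D) and 89.94% (3D).
z, p = two_proportion_z_test(9561, 10000, 89940, 100000)
print(f"z = {z:.2f}, p = {p:.3g}")
```

With review sets as large as the ones scraped here, even differences of one or two percentage points yield very small p-values, which is consistent with the many p < 2 × 10−16 results reported above.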

Afterward, all factors were investigated together. According to these variables, 16 levels (unique combinations of factors) could be created from these 21 “Souls-like” games. The following insignificant differences were found among the percentages of positive reviews:

  • (F 3D R I ¬DIFF ¬MUL UPG ¬DUR ¬MAP ¬PEN LVL) & (F 3D R I DIFF ¬MUL UPG ¬DUR ¬MAP ¬PEN LVL).

    • There is only one game in each category: The Surge and Darksiders III. This means that the percentages of positive reviews of these two games are insignificantly different (p = 0.32581).

  • (F 3D R I ¬DIFF ¬MUL UPG ¬DUR ¬MAP ¬PEN LVL) & (M 2D P I ¬DIFF ¬MUL UPG ¬DUR MAP ¬PEN LVL).

    • There is only one game in each category: The Surge as well as Salt and Sanctuary. This means that the percentages of positive reviews of these two games are insignificantly different (p = 0.47739).

  • (F 3D R I DIFF ¬MUL UPG ¬DUR ¬MAP ¬PEN LVL) & (M 2D P I ¬DIFF ¬MUL UPG ¬DUR MAP ¬PEN LVL).

    • There is only one game in each category: Darksiders III as well as Salt and Sanctuary. This means that the percentages of positive reviews of these two games are insignificantly different (p = 0.94784).
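The notion of a level used above, that is, a unique combination of the 11 factors, can be sketched as a grouping of games by their factor tuples. The three real entries reproduce the combinations listed above; the reading of the abbreviations (F/M for futuristic/medieval setting, R/P for realistic/pixel style, I for interconnected world) and the fourth entry, “Hypothetical Game”, are assumptions added only to show how two games collapse into one level.

```python
from collections import defaultdict

# Factor order: SET, GD, GS, LD, DIFF, MUL, UPG, DUR, MAP, PEN, LVL.
# Booleans stand for the presence (True) or negation (False) of a factor.
GAMES = {
    "The Surge": ("F", "3D", "R", "I", False, False, True, False, False, False, True),
    "Darksiders III": ("F", "3D", "R", "I", True, False, True, False, False, False, True),
    "Salt and Sanctuary": ("M", "2D", "P", "I", False, False, True, False, True, False, True),
    # Hypothetical entry (not in the study) sharing The Surge's combination.
    "Hypothetical Game": ("F", "3D", "R", "I", False, False, True, False, False, False, True),
}


def group_into_levels(games):
    """Map each unique factor combination to the list of games that have it."""
    levels = defaultdict(list)
    for title, factors in games.items():
        levels[factors].append(title)
    return dict(levels)


levels = group_into_levels(GAMES)
for factors, titles in levels.items():
    print(len(titles), titles)
```

Applied to all 21 games, the same grouping would yield the 16 levels mentioned above, most of which contain a single game.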

The differences between the percentages of positive reviews of all other “Souls-like” games were strongly significant. The three combinations of factors with the highest and the three with the lowest percentages of positive reviews are shown in Table 6:

Table 6 The three best and three worst positively reviewed combinations of factors

According to Table 6, the combination with the highest percentage of positive reviews (96.92%) contains only one game, Hollow Knight. The second-best combination also contains a single game, Sekiro: Shadows Die Twice, with 94.78%. The third-best combination consists of Dark Souls, Dark Souls – Remastered, and Dark Souls III, with 92.60% positive reviews. At the other end, the combination with the lowest percentage of positive reviews (60.72%) contains only Lords of the Fallen. It is followed by the combination that contains only Ashen (70.67%) and the one that contains only The Surge (74.56%).

5.3 Analyzing the textual reviews

As was mentioned earlier, the textual reviews were analyzed as well. Since the previously mentioned emotion lexicon is multilingual, the textual parts of all 993,932 reviews could be examined. First, the textual components of all reviews were analyzed together. Then, the investigation continued by examining one variable at a time. Afterward, all variables were analyzed jointly. The results of the analysis regarding all reviews can be seen in Fig. 5.

Fig. 5 The percentage of the average valence of basic emotions and their standard deviations

As can be seen in Fig. 5, trust, anticipation, and joy were the three most strongly expressed emotions in the reviews, with 16.68%, 15.32%, and 14.15%, respectively. The percentage of surprise was quite small (7.66%), and disgust had the lowest percentage (6.95%). Anger (11.83%), fear (13.61%), and sadness (13.76%) are quite high as well, but can this phenomenon be considered normal? The hypothesis is that “yes, because they are skill-based games”. To answer this, these emotions are grouped by game in Table 7. Regarding the standard deviations, the largest one belonged to joy (2.24%), while the smallest belonged to surprise (0.78%).
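How a lexicon-based approach turns review text into emotion percentages can be illustrated with a minimal sketch. The tiny lexicon, the tokenization, and the function name below are invented for illustration and are far simpler than the multilingual emotion lexicon used in the study; only the eight emotion names are taken from the text.

```python
from collections import Counter

EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Toy lexicon, invented for illustration only (NOT the lexicon of the study).
TOY_LEXICON = {
    "rewarding": {"joy", "trust"},
    "unfair": {"anger", "disgust"},
    "boss": {"fear", "anticipation"},
    "died": {"sadness"},
    "unexpected": {"surprise"},
}


def emotion_shares(text, lexicon=TOY_LEXICON):
    """Return the percentage of lexicon hits that fall on each emotion."""
    counts = Counter({emotion: 0 for emotion in EMOTIONS})
    for word in text.lower().split():
        word = word.strip(".,!?;:\"'()")
        for emotion in lexicon.get(word, ()):
            counts[emotion] += 1
    total = sum(counts.values())
    if total == 0:
        # No lexicon word found: every share is zero.
        return {emotion: 0.0 for emotion in EMOTIONS}
    return {emotion: 100.0 * counts[emotion] / total for emotion in EMOTIONS}


shares = emotion_shares("Unfair boss, but rewarding. I died a lot!")
for emotion in EMOTIONS:
    print(f"{emotion:>12}: {shares[emotion]:.2f}%")
```

Averaging such per-review shares over all reviews of a game, or over all reviews in the dataset, gives percentages of the kind reported above.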

Table 7 The distribution of sentiments found in the scraped game reviews, grouped by emotions

As shown in Table 7, some games received similar values for each emotion (defined as within ±1% of the mean): this fact is denoted by *. This means that the previous hypothesis was accepted. Darksiders III was the one that made most people angry (14.88%) and was most feared (15.67%). This is not surprising as its predecessors were not “Souls-like”: Darksiders I was a simple “hack’n’slash” game, while Darksiders II was an open-world role-playing game. Based on the scraped reviews, Star Wars Jedi: Fallen Order was the most anticipated game (19.97%), possibly because previous Star Wars games were multiplayer games. Dark Souls II – Scholar of The First Sin was the one that people felt most disgusted by (8.75%), as it included extra penalties upon death and unfair enemy placement. The players of Hollow Knight felt the most joy (20.09%), possibly due to the rewarding experience, while the most sadness was expressed in the reviews of Dark Souls II – Scholar of The First Sin (16.56%). Contrary to Hollow Knight, Dark Souls II – Scholar of The First Sin was not very rewarding. The Surge 2 players felt the most surprise (9.95%), possibly because of a different narrative, while Star Wars Jedi: Fallen Order was the most trusted (19.01%).

Hollow Knight was the one that made people the least angry (8.82%), and it was also the least feared game (10.05%). The reasons were probably the same as above. Dark Souls II – Scholar of The First Sin was the least anticipated (13.18%) due to its base game’s reception, although the base game, Dark Souls II, was more anticipated (14.28%). Star Wars Jedi: Fallen Order was the game that people felt least disgusted by (4.69%) and that made the players the least sad (8.91%). Reviewers of Lords of The Fallen felt the least joy (11.34%). Darksiders III was the least surprising (6.57%). Lastly, Blasphemous was the least trusted (13.92%).

To analyze whether there are differences between the emotions, first, their data distributions were investigated, and none of them was normal. Then, using the Mann-Whitney-Wilcoxon test, the differences between the games were investigated for every emotion. Most of them were strongly significantly different from each other (p < 2 × 10−16), therefore it is easier to list those pairs where only insignificant differences arose (meaning p > 0.05). The results of these comparisons can be seen in Fig. 7 in the appendix: 62 pairs of games had insignificant differences in at least one emotion. Out of these pairs, Ashen and Darksiders III had the most similar reviews based on sentiments alone: there were insignificant differences in 7 emotions. The second pair which received similar reviews was Death’s Gambit and Hellpoint, with insignificant differences in 6 emotions. Four pairs of games had insignificant differences in 5 emotions: Ashen and The Surge 2; Code Vein as well as Salt and Sanctuary; Dark Souls III and The Surge 2; and lastly, Hellpoint and The Surge. Six pairs of games had insignificant differences in 4 emotions: Ashen and Blasphemous; Ashen and Lords of The Fallen; Blasphemous and Darksiders III; Blasphemous and The Surge 2; Dark Souls III as well as Salt and Sanctuary; and lastly, Death’s Gambit and The Surge. Six pairs also had insignificant differences in 3 emotions. Twelve pairs of games had insignificant differences in 2 emotions, and lastly, thirty-two pairs of games had an insignificant difference in 1 emotion. Every other pair was significantly different in every emotion.
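The pairwise comparison described above can be illustrated with a minimal pure-Python sketch of the Mann-Whitney-Wilcoxon test, using the normal approximation with a tie correction and no continuity correction. The software and options actually used in the study are not stated in this section, and the two samples of per-review joy scores below are invented; the function name is likewise an assumption.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering the group (0 = x, 1 = y) of each value.
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(combined)
    tie_term = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        # Tied values share the average of the ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        # All values identical: no evidence of any difference.
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))


# Invented per-review joy scores for two hypothetical games.
joy_game_a = [0, 1, 1, 2, 3, 3]
joy_game_b = [2, 3, 4, 4, 5, 6]
u, p = mann_whitney_u(joy_game_a, joy_game_b)
print(f"U = {u}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

The normal approximation is intended for large samples such as the scraped review sets; for samples as small as the toy data, an exact test would be preferable.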

5.3.1 One-by-one analyses

During one-by-one analyses the feelings about each factor were investigated. The results can be seen in Fig. 6. The columns are grouped by factors, while each shade represents a feeling. From the darkest to the brightest, they are the following: anger, anticipation, disgust, fear, joy, sadness, surprise, trust.

Fig. 6 The average sentiments per review and factor

Starting with the setting, futuristic games have the largest average sentiments per review, while those with a Japanese setting have the smallest ones. The increases between them are 108.66%, 183.17%, 109.70%, 102.36%, 186.31%, 103.24%, 128.42%, and 163.62% for anger, anticipation, disgust, fear, joy, sadness, surprise, and trust, respectively. The difference between every possible pair of settings is significant (p < 2 × 10−16).
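This section does not spell out how the reported increases are computed. One plausible reading, assumed here, is the relative difference between two group means, expressed as a percentage of the smaller one. The function name and the two mean values below are hypothetical.

```python
def relative_increase(larger_mean, baseline_mean):
    """Percentage by which larger_mean exceeds baseline_mean."""
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    return 100.0 * (larger_mean - baseline_mean) / baseline_mean


# Hypothetical average joy per review in two groups (NOT from the study).
joy_group_a = 1.5
joy_group_b = 1.0
print(f"{relative_increase(joy_group_a, joy_group_b):.2f}%")
```

Under this reading, an increase above 100% means that the larger average is more than twice the smaller one.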

2D games received more positive emotions than 3D games: their anticipation, joy, and surprise are higher by 6.34%, 30.03%, and 12.47%, respectively. They also received more sadness by 0.09%. The 3D games received more negative emotions such as anger, disgust, and fear by 22.24%, 26.73%, and 30.68%, respectively. However, 3D games are more trusted by 3.01%. In this case, not every difference is significant: between 2D and 3D, trust is the only emotion that is insignificantly different (p = 0.85). Regarding sadness, even though the difference is small, it is significant: \(p=4.9\times {10}^{-9}\), while \(p=5.8\times {10}^{-8}\) in the case of anticipation. For all other emotions, p < 2 × 10−16.

Regarding graphical styles, games with pixel graphics received the largest average of emotions, while drawn and realistic games received the smallest averages. Each difference is significant, although the value of p is not below \(2\times {10}^{-16}\) in every case: regarding anticipation, p = 0.00038. This means that the difference is only moderately significant. The remaining pairs are strongly significant. In the case of disgust and fear, the p-values of the differences between cel-shaded and realistic graphics are \(p=1.4\times {10}^{-14}\) and \(p=1.7\times {10}^{-14}\), respectively. Regarding surprise and trust, the differences between drawn and realistic graphical styles are also strongly significant (\(p=6.9\times {10}^{-7}\) and \(p=1.6\times {10}^{-11}\), respectively).

While games with level selection received more anger and fear by 22.99% and 15.89%, respectively, they also received more positive emotions: these games are more anticipated (58.63%), and people felt more joy (53.32%), more surprise (22.10%), and more trust (47.51%). Games with interconnected levels received more disgust (1.32%) and sadness (11.05%) in their reviews. The difference is significant for every emotion: p < 2 × 10−16, except for disgust. In that case, \(p=1.5\times {10}^{-6}\), although it is still strongly significant.

When there are difficulty settings in the games, more average sentiments can be found in the reviews. The increases in average emotions are the following from anger to trust (darkest to brightest): 62.68%, 97.76%, 31.74%, 55.14%, 90.76%, 15.64%, 55.44%, and 79.78%. Every difference among pairs is strongly significant (p < 2 × 10−16).

It can be seen that the existence of multiplayer features made people feel more disgust (16.74%), more fear (4.46%), and more sadness (9.48%). However, when multiplayer features are not present, people felt angrier (3.32%), had more anticipation (18.97%), joy (29.67%), surprise (14.01%), and trust (8.30%). Every difference among pairs is strongly significant (p < 2 × 10−16), except in case of anger where it is moderately significant (p = 0.0045).

When weapons or armor can be upgraded, the emotions of disgust, fear, sadness, and surprise are increased by 22.88%, 6.13%, 36.20%, and 3.27%, respectively. When they cannot be upgraded, the reviews contain more anger (0.69%), anticipation (10.22%), joy (9.12%), and trust (0.65%). Every difference among pairs is significant, except in the case of anger (p = 0.24). The differences between the remaining pairs are strongly significant: \(p=5.6\times {10}^{-7}\) for surprise, and p < 2 × 10−16 in the case of the remaining pairs.

In case of games with equipment durability, players felt more disgust (5.32%) and sadness (4.54%) on average in the reviews. The games without a durability feature received more anger (15.93%), anticipation (30.56%), fear (8.93%), joy (41.16%), surprise (26.67%), and trust (22.18%) on average in the reviews. Every difference among pairs is strongly significant (p < 2 × 10−16), except in the case of fear where it is only slightly significant (p = 0.029).

In those games where an in-game map is provided, the reviews contain a higher average of every emotion except disgust. Anger, anticipation, fear, joy, sadness, surprise, and trust are increased by 9.21%, 52.81%, 3.70%, 73.49%, 1.98%, 35.39%, and 36.63%, respectively. When an in-game map is not provided, the feeling of disgust is increased by 5.81%. Every difference among pairs is significant, except in the case of sadness (p = 0.21). The differences between the remaining pairs are significant: p = 0.0014 for disgust, and p < 2 × 10−16 in the case of the remaining pairs.

When there are extra penalties upon the player character’s death, more disgust (0.07%), joy (12.85%), sadness (12.68%), and surprise (4.93%) are felt. When there are no extra penalties, more anger (7.21%), anticipation (0.82%), fear (11.67%), and trust (2.36%) are felt. Regarding this factor, the significance levels of differences among pairs of emotions vary: anger \((p=1.6\times {10}^{-15})\), anticipation \((p=1.3\times {10}^{-8})\), disgust (p = 0.095), fear (p < 2 × 10−16), joy (p < 2 × 10−16), sadness (p < 2 × 10−16), surprise (p < 2 × 10−16), and trust (p = 0.018). As can be seen, the difference among the pair of disgust emotions is not significant.

If it is possible to level-up near a bonfire (or at a character near it), more anger (13.63%), disgust (19.97%), fear (16.26%), and sadness (2.71%) are felt. If the level-up system is different, then more anticipation (11.43%), joy (32.83%), surprise (15.81%), and trust (1.16%) can be found in the reviews. Every difference among pairs is significant (p < 2 × 10−16).

5.3.2 Investigation using all factors

After the factors were investigated one by one, all factors were analyzed by emotions. Similarly to before, 16 groups of factors were made. Each possible pair was analyzed in the case of every emotion. Most pairs had strongly significant differences. In the appendix, Fig. 8 shows those pairs which do not have significant differences. According to the comparison in Fig. 8, the reviews were most similar between two groups: (1) games with a futuristic setting, 3D and realistic graphics, interconnected levels, difficulty settings, no multiplayer features, upgradeable weapons/armor, no equipment durability system, no in-game map, no extra penalties upon character death, and a classical bonfire system; (2) games with a medieval setting, 3D and cel-shaded graphics, interconnected levels, no difficulty settings, multiplayer features, upgradeable weapons/armor, no equipment durability system, no in-game map, no extra penalties upon character death, and not a classical bonfire system. These two groups contain only one game each: Darksiders III and Ashen, respectively.

When talking about all factors, no pair contains 6 insignificant differences; however, three pairs of factor groups contain 5 of them. Each of these groups still contains only one game, so the pairs are the following: Code Vein as well as Salt and Sanctuary; Death’s Gambit and The Surge; and lastly, Ashen and The Surge 2. The latter pair is different from before: when comparing the games only, Dark Souls III and The Surge 2 had insignificant differences.

Three pairs of factors contain 4 insignificant differences between emotions: Ashen and Blasphemous, Ashen and Lords of The Fallen; and lastly, Blasphemous and Darksiders III. Three pairs of factors contain 3 insignificant differences, four pairs contain 2, and lastly, eighteen pairs contain only 1.

6 Discussion

To answer RQ1, all possible pairs of games (210) were created and comparisons were done between the percentage of their positive reviews. As shown in the beginning of subsection 5.2, there are significant differences between 199 pairs of games. The differences are insignificant in the case of 11 pairs. Out of these 11 pairs, 4 pairs contain games from the original creators of the “Souls franchise” (which is From Software).
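The number of pairwise comparisons mentioned above follows from counting unordered pairs. A short sketch confirms that 21 games yield 210 pairs and that the reported split into 199 significant and 11 insignificant pairs covers all of them; the variable names are illustrative only.

```python
from itertools import combinations
from math import comb

n_games = 21
# Enumerate every unordered pair of game indices.
game_pairs = list(combinations(range(n_games), 2))
n_pairs = len(game_pairs)

significant_pairs = 199    # reported in the text
insignificant_pairs = 11   # reported in the text

print(n_pairs, comb(n_games, 2))
print(significant_pairs + insignificant_pairs == n_pairs)
```

The same counting gives 120 pairs for the 16 groups of factors, each of which is then compared once per emotion.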

Regarding RQ2, the three strongest correlations are in the cases of The Surge (approx. 0.3910), Lords of The Fallen (approx. 0.3860), and The Surge 2 (approx. 0.3500). This means that the users who have more playtime in these three games are more likely to leave a positive review (in other words: leave a “thumbs up”). It should also be noted that these three are from the same developer studio, therefore their games may follow a gameplay pattern about which the players feel the same, or they already have a dedicated fanbase. The three weakest correlations are in the case of Code Vein (approx. 0.1290), Hollow Knight (approx. 0.1400), and Dark Souls III (approx. 0.1530). However, according to the percentage of positive reviews in Table 5, these games are well-received. Also, the users’ playtime in the case of every investigated game significantly and positively correlates with a positive review.
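A correlation between playtime and a binary thumbs-up can be sketched as a Pearson correlation in which the review outcome is coded as 0 or 1 (the point-biserial case). Which correlation coefficient the study used is not restated in this section, so this choice is an assumption, and the playtimes and outcomes below are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented data: playtime at review (hours) and review outcome (1 = positive).
playtime_hours = [0.3, 2.0, 5.0, 40.0, 120.0, 300.0]
is_positive = [0, 0, 1, 1, 1, 1]

r = pearson_r(playtime_hours, is_positive)
print(f"r = {r:.4f}")
```

A positive coefficient only indicates that longer playtimes and positive reviews tend to occur together; it does not say which one drives the other.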

When talking about RQ3, it can be seen from the one-by-one analyses that the factors significantly influence the percentage of positive reviews. It should be noted that when comparing different graphical styles, games with cel-shaded or pixel graphics received the lowest percentages of positive reviews. Also, the difference between these two percentages is only weakly significant (p = 0.027). When comparing all factors to each other, the percentage of positive reviews was insignificantly different among three pairs.

Lastly, RQ4 needs to be answered: eight emotions were investigated in the textual parts of the reviews: first, the games themselves were analyzed, then the factors were examined one-by-one, and lastly, all factors were investigated. When comparing emotions about these games, the reviews of Ashen and Darksiders III received similar sentiments. This is because 7 emotion pairs had insignificant differences between them. Games were considered similar to each other if at least 4 emotion pairs had insignificant differences among them. This means: Ashen and Darksiders III (7); Death’s Gambit and Hellpoint (6); Ashen and The Surge 2 (5); Code Vein as well as Salt and Sanctuary (5); Dark Souls III and The Surge 2 (5); Hellpoint and The Surge (5); Ashen and Blasphemous (4); Ashen and Lords of The Fallen (4); Blasphemous and Darksiders III (4); Blasphemous and The Surge 2 (4); Dark Souls III as well as Salt and Sanctuary (4); and lastly, Death’s Gambit and The Surge (4). Therefore, these 12 pairs of games were considered similar based on emotions alone.
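The similarity rule stated above, at least 4 emotions with insignificant differences, can be sketched as a simple filter over the pair counts. The twelve named pairs and their counts are taken from the text; the last entry is a hypothetical placeholder for one of the unnamed pairs with 3 insignificant differences and is included only to show a pair being excluded.

```python
SIMILARITY_THRESHOLD = 4  # minimum number of insignificantly different emotions

# Number of emotions with insignificant differences per pair of games.
INSIGNIFICANT_EMOTIONS = {
    ("Ashen", "Darksiders III"): 7,
    ("Death's Gambit", "Hellpoint"): 6,
    ("Ashen", "The Surge 2"): 5,
    ("Code Vein", "Salt and Sanctuary"): 5,
    ("Dark Souls III", "The Surge 2"): 5,
    ("Hellpoint", "The Surge"): 5,
    ("Ashen", "Blasphemous"): 4,
    ("Ashen", "Lords of The Fallen"): 4,
    ("Blasphemous", "Darksiders III"): 4,
    ("Blasphemous", "The Surge 2"): 4,
    ("Dark Souls III", "Salt and Sanctuary"): 4,
    ("Death's Gambit", "The Surge"): 4,
    # Hypothetical placeholder (pair names are not given in the text).
    ("Game X (hypothetical)", "Game Y (hypothetical)"): 3,
}

similar_pairs = [
    pair for pair, count in INSIGNIFICANT_EMOTIONS.items()
    if count >= SIMILARITY_THRESHOLD
]

print(len(similar_pairs))
for first, second in similar_pairs:
    print(f"{first} / {second}")
```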

From the one-by-one analyses, it can be concluded that almost all emotions can be significantly influenced by these various factors. However, there are exceptions: there is no significant difference in trust regarding graphical dimensions (2D or 3D), no significant difference in anger regarding upgradeable weapons/armor, no significant difference in sadness regarding in-game maps, and lastly, no significant difference in disgust regarding extra penalties upon character death. After analyzing all factors, it also became apparent that 928 pairs had significant differences between them; however, 31 pairs did not.

6.1 The importance of the results

Designing a video game with a good rating can be difficult as many factors, including game design elements and mechanics, should be considered. “Souls-like” games share several design elements and mechanics that are present in all of them, such as the “bonfire checkpoint system”, environmental/contextual storytelling, and the unforgiving difficulty. As the “Souls franchise” inadvertently created this genre, new developers try to make their own “Souls-like” games and put new elements into them. These can be new level-up systems, in-game maps, new story settings, et cetera.

By looking at only the average recommendations of the various “Souls-like” games, it can be safely stated that significant differences exist between them. This means that while the games have the same core mechanics, they are distinct enough from each other by having different stories, settings or various gameplay mechanics such as difficulty levels, in-game maps, multiplayer features, et cetera. Therefore, users can give different reviews based on their experiences that they gathered during their playthrough.

Every factor has a significant influence on the percentage of positive reviews. According to the results presented in this article, users liked the following group the most: games with a medieval setting, 2D graphics, a drawn graphical style, an interconnected world, no difficulty settings, no multiplayer features, weapon upgrades, no equipment durability, a map, additional penalties upon character death, and a non-classic level-up system.

The implications of these results are of great importance as “Souls-like” games are still made to this day and their developers usually create extra elements to differentiate them from other titles. The results that are shown in this article indicate which of the mentioned factors are more liked, therefore carefully implementing them can make the game more positively reviewed by the players. The developers of future “Souls-like” games can use the results presented in this article when creating their next game by implementing the mentioned design elements and gameplay mechanics.

As mentioned, “Souls-like” games are still developed to this day: Mortal Shell (developed by Cold Symmetry) is one of the newest “Souls-like” games that was made in the last year and possibly more games in this genre are still (and will be) in development. It should be mentioned that this game also contains new elements: the player character can possess enemies, meaning that their bodies can be used in combat. However, the game was not yet available on Steam to include it in the investigation.

6.2 Limitations of the study

Due to the nature of this study, the goal was to understand why positive reviews are given to the games and how players feel about them. Therefore, multiple parts of the reviews were analyzed: whether a positive review was given, timestamps at review, and their textual parts. Since this study is already extensive, the following were not investigated:

It was not analyzed whether positive reviews and playtimes are in a causal relationship. Naturally, a possibility exists that players might play the game longer if it has more positive reviews, or that longer playtime causes more positive reviews. According to the results, a positive correlation exists between playtimes and positive reviews. This fact increases the possibility of a causal relationship, but this is yet to be investigated.

It was also not investigated whether the results are influenced by the games’ popularity. Normally, people leave more reviews if a game is more popular. It should be investigated in the future whether this fact influences the results presented in this study, or not.

A possibility also exists that playtime should be considered when assessing reviews as people may leave a negative review after 10-30 min because the game is not working properly for example. While it was not considered, the following should be mentioned: out of 993,932 reviews, 14,290 were under 30 min. Out of them, 7662 were positive and 6628 were negative. Judging by the textual parts, people who tend to leave positive reviews after such a short time are those who either bought it on multiple platforms or played them elsewhere before buying them to support the developers. Since these negative and positive numbers are similar, it can be stated that these reviews with small playtimes almost cancel each other out.
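The share of short-playtime reviews can be checked with simple arithmetic on the counts reported above. The sketch below uses only those counts; the variable names are illustrative.

```python
total_reviews = 993_932       # all scraped reviews
short_reviews = 14_290        # reviews with under 30 min of playtime
short_positive = 7_662
short_negative = 6_628

# The positive and negative counts should add up to the short-playtime total.
counts_consistent = short_positive + short_negative == short_reviews

share_of_all = 100.0 * short_reviews / total_reviews
positive_share = 100.0 * short_positive / short_reviews

print(counts_consistent)
print(f"Short-playtime reviews: {share_of_all:.2f}% of all reviews")
print(f"Positive among them:    {positive_share:.2f}%")
```

These reviews make up only a small fraction of the dataset, so they can have at most a limited effect on the overall percentages.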

7 Conclusions

The correlation between positive reviews, game design elements, and mechanics was investigated in “Souls-like” games by scraping the Steam webpage. For this, all (993,932) reviews were scraped during the middle of April 2021. The playtimes at review and the textual components were assessed as well. The percentages of positive reviews were also analyzed, and they were compared to other games in this genre.

It is shown by the results that there are significant differences between the percentages of positive reviews regarding each game. A slight-to-moderate correlation also exists between positive reviews and playtimes. Eleven categories were set up to analyze different design elements and game mechanics in “Souls-like” games. After comparing the percentage of positive reviews regarding each factor, it can be concluded that significant differences exist between all of them, although some games are reviewed similarly by players. For example, Ashen and Darksiders III evoked a similar valence of 7 emotions in two cases. The first case was when the games themselves were analyzed, and the second was when the factors were investigated. Also, 12 pairs of games were deemed similar to each other based on sentiment analysis.

According to the results, players are more likely to leave a positive review on those games which have one of the following factors: medieval setting, 2D graphical dimensions, drawn graphical style, interconnected world, no difficulty settings, no multiplayer features, no weapon/armor upgrades, having equipment durability features, an in-game map, extra penalties upon death and not a classic level-up system. This is summarized in Table 8.

Table 8 Factors with the best percentages of positive reviews in each investigated category
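The per-category selection summarized in Table 8 can be reproduced from the percentages of positive reviews reported in subsection 5.2 by taking the best value within each of the 11 categories. The percentages below are those reported in the one-by-one analysis; the dictionary layout and the labels of the levels are illustrative choices.

```python
# Percentage of positive reviews per factor level (from subsection 5.2).
POSITIVE_SHARE = {
    "SET": {"medieval": 91.82, "Japanese": 91.57, "futuristic": 86.74},
    "GD": {"2D": 95.61, "3D": 89.94},
    "GS": {"drawn": 96.48, "realistic": 90.03, "cel-shaded": 87.72, "pixel": 87.11},
    "LD": {"interconnected world": 91.44, "level selection": 87.18},
    "DIFF": {"no difficulty settings": 91.43, "difficulty settings": 87.40},
    "MUL": {"no multiplayer": 92.20, "multiplayer": 90.06},
    "UPG": {"no upgrades": 92.95, "upgrades": 90.39},
    "DUR": {"durability": 91.52, "no durability": 90.44},
    "MAP": {"in-game map": 92.56, "no in-game map": 90.25},
    "PEN": {"extra penalties": 92.77, "no extra penalties": 90.30},
    "LVL": {"non-classic level-up": 94.04, "classic level-up": 90.18},
}

# For every category, pick the level with the highest positive share.
best_per_factor = {
    factor: max(levels, key=levels.get)
    for factor, levels in POSITIVE_SHARE.items()
}

for factor, level in best_per_factor.items():
    print(f"{factor}: {level} ({POSITIVE_SHARE[factor][level]:.2f}%)")
```

Because each category is evaluated separately, this selection need not coincide with the best-reviewed combination of all factors taken together.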

However, when all factors are considered together, the best percentages of positive reviews are received by games which have a medieval setting, 2D graphics, a drawn graphical style, an interconnected world, no difficulty settings, no multiplayer features, weapon upgrades, no equipment durability, an in-game map, additional penalties upon character death, and a non-classic level-up system. As can be seen, the factors of weapon/armor upgradeability and equipment durability differ from Table 8 when all factors are considered.

Naturally, designing and developing a game is not easy, but the results that are presented in the article can influence the creation of future “Souls-like” games by helping their developers to choose design elements and game mechanics. According to the results, each factor has its pros and cons. They can also evoke various emotions in the players.

In the end, these games are not small in numbers and usually are well-received, therefore more of them may see the light of day in the future.