1 Introduction

Massively Multiplayer Online Games (MMOG) are the most popular genre in online computer games [29]. They can be divided into three categories: MMORPG (Massively Multiplayer Online Role Playing Games), MMORTS (Massively Multiplayer Online Real Time Strategy) and MMOFPS (Massively Multiplayer Online First Person Shooter). Their execution requirements vary with the way of playing them [37]. While MMOFPSs consist of many isolated game services with a handful of players each, who are continuously interacting, MMORTSs and MMORPGs consist of a virtual world that never stops and is always up for any player who wants to satisfy their desire for gaming. Nowadays, of all these genres of on-line games, MMORPGs have become the most popular among network gamers and now attract millions of users. Thus, the QoS and scalability of the computational system to ensure MMORPG playability is a challenge. This paper focuses on these problems in MMORPG environments taking their characteristics into account.

MMORPGs are characterized by a Main Game, which is executed without interruption, and a number of instances, or Auxiliary Games, that run concurrently with the Main Game and are randomly created on demand by the players. The Main Game is the virtual world, where players can interact with each other and with the other components of the scenario (Non-Player Characters and map objects) in order to evolve their characters. The Auxiliary Games are executed outside the Main Game boundaries and involve different ways of playing compared with the Main Game. There are several kinds of Auxiliary Games with different timing, number of players and difficulty requirements. They are created dynamically: whenever players want to play a specific Auxiliary Game, they have to wait until there are enough players to start it, and they return to the Main Game when the Auxiliary Game is over or whenever they tire of it.

The most common way to provide service to MMORPGs is based on a centralized management provided by Client-Server structures, usually based on cluster platforms [10]. To overcome the scalability limits of these Client-Server systems, when the number of players increases dramatically, we propose in this paper the use of a hybrid environment that is composed of a centralized cluster system and a distributed P2P area. In addition, we develop a mapping mechanism to achieve load balancing and scalability of the MMORPGs in this hybrid platform, in which the Main Game is maintained in the central cluster of servers and Auxiliary Games are used as the indivisible entities, where the mapping is applied. This is based on the fact that the load in the Main Game has few fluctuations and so, it can be predicted, whereas Auxiliary Games are more dynamic, less predictable and also cause hot spots, which imply peak loads in the overall system. Additionally, players involved in an Auxiliary Game are continuously interacting. Thus, the movement of all players in an Auxiliary Game, as an indivisible entity, to the same server avoids communications between nodes. Then the mapping mechanism acts at two levels. Firstly, it balances Auxiliary Games among the servers in the central cluster in order to manage computation imbalances. Secondly, when the whole central cluster is overloaded, the mapping assigns Auxiliary Games to the P2P Area. Thus, for each Auxiliary Game to be distributed to the P2P Area, a new temporary server is chosen among the players waiting for this game.

In order to choose the temporary server, two different issues must be faced. On one hand, we have to keep the Distributed Area up to avoid disconnections or failures. Thus, a statistical model focused on the players’ sessions history is proposed in order to assign each player a fault likelihood or reliability value, which is used to select those players with lower disconnection probability. On the other hand, the latency response among players in the same Auxiliary Game must be maintained below an acceptable threshold. According to this, the latency among waiting players is calculated, with the player with the lowest latency being chosen as a temporary server for the Auxiliary Game.

The effectiveness of the proposed distribution mechanism over the hybrid architecture is evaluated by simulation. The effect of balancing Auxiliary Games, instead of players, among servers is evaluated by comparing our mechanism with a representative case of the classic load balancing approach, presented by Bezerra et al. [6]. Additionally, we show that our system is able to scale properly when the number of Auxiliary Games in the system increases due to the rise in players. This scalability is achieved while maintaining the QoS in terms of latency and fault tolerance.

The remainder of this paper is organized as follows. Section 2 reports the main contributions of the literature about distribution mechanisms of MMORPGs. Section 3 describes the hybrid system with the balancing mechanism. Section 4 analyses the viability of the P2P area for computing Auxiliary Games. Section 5 evaluates the performance of the global system. Finally, Section 6 outlines the main conclusions and future work.

2 Related work

The techniques that are reported in the literature to give service to MMORPGs vary according to the kind of system, centralized or distributed, used for executing the game. Centralized architectures are traditionally based on cluster systems. The distribution techniques for game computation in such kinds of system are mainly based on splitting the game world map into different subspaces, or cells, and distributing these among the nodes in the cluster [6, 23, 24, 26]. After the initial assignment, these cells will be dynamically reassigned among servers during the game, to respond to changes in the load caused by players’ movements into the game world [4, 11]. The frequency of these rebalancing operations is heavily affected by the size of the cells. Big cells can imply significant differences in the number of players assigned to each one, and so, in order to achieve load balancing, the initial distribution of cells among servers is more complex. However, players change cell less often, and this avoids some dynamic reassignments during the game. The option of small size cells (microcells) facilitates the initial load balancing distribution but causes greater number of cell movements by players during the game, which implies more dynamic redistribution operations and thus more overhead in the run time [12, 17].

An interesting work in the line of balancing microcells of players between servers is the paper of Bezerra et al. [6]. Bezerra proposes a balancing schema which considers the upload bandwidth of the server as the load to distribute. The distribution is done between servers with the aim of reducing the inter-server communication overhead by using a greedy graph partition growing algorithm. The mechanism starts when a server in the cluster of servers is overloaded. This server selects a number of other servers to become involved with the distribution. First, it chooses the least loaded server among its neighbors and sends a request for it to participate in the load balancing. The chosen server rejects the request if it is already involved in another balancing group; otherwise it responds with the load information of its own neighbors. If the selected neighbor server is unable to absorb all the extra workload of the initiating server, the selection is performed again among the neighbors not only of the overloaded server, but also of the already selected servers. The selection continues until the first server’s workload can be absorbed. This work will be used throughout this paper as a comparison case, given that it is a representative approach of player-based load balancing mechanisms in Client-Server systems applied to MMORPGs.

An additional aspect that is considered by some authors in the splitting process is the Area of Interest (AOI) of players [3]. The AOI is the physical area of the world map whose information about game state is relevant for a specific player. Thus, the assignment of players of the same AOI into the same cell will diminish the number of communications across the network.

In the case of MMORPGs, the focus of this paper, we propose splitting the world map based on the Auxiliary Games, instead of cells of a fixed size, for the two following reasons: (a) An Auxiliary Game constitutes a clear AOI where players communicate with each other and, (b) the computation associated with the game for the limited number of players that usually constitutes an Auxiliary Game can be executed in a single current server.

The game distribution techniques, mentioned above, usually reported for centralized systems, present a common problem of scalability when they are serving MMORPGs. This is due to the unpredictable behavior of players, which sometimes creates peak load situations that cannot be solved by system servers. With the goal of providing an unlimited scalability that is able to manage peak load situations dynamically, some authors propose the use of completely decentralized systems, such as Peer-to-Peer networks. In this kind of system, apart from the load distribution techniques, there are additional aspects to be faced [15] that have been studied by several authors: (a) Establishing an effective mechanism for propagating events in a high latency network. Different alternatives have been presented that are applied in the network layer [5, 8] or the application layer [9]. (b) Ensuring data persistence in the game world map [19]. (c) Management of the cheating problem among the peers [27, 31]. (d) Applying incentive mechanisms that promote the participation of peers and avoid freeriders [21, 34].

The load distribution techniques developed for MMORPGs in Peer-to-Peer systems are mainly focused on the decentralized management of the AOI of the players. Some authors have developed techniques for distributing the game load based on AOI [7, 13, 28]. However, in an MMORPG, apart from updating their AOI, players also frequently need to update their view of the virtual game world, causing a high communication overhead for such players. To alleviate this overhead in completely decentralized systems, some authors propose the inclusion of some additional high capacity server to manage the global game world, where most of the players are located [30, 32, 38].

Following this line, we propose a hybrid architecture that combines a centralized and a Peer-to-Peer system to give service to MMORPGs. This makes it possible to execute the tasks related to the global game world in the centralized server, while the tasks corresponding to the Auxiliary Games are executed in the Distributed Area of the system. In the following section, we present our proposed hybrid system and the load distribution techniques for MMORPGs.

3 The client-server/P2P hybrid system

This section describes the proposed hybrid system, discussing its characteristics and details of its mechanisms. The architecture of the system is explained in Section 3.1, while the distribution policy of Auxiliary Games over the hybrid architecture and mechanisms of establishment and maintenance of the Distributed Area are described in Section 3.2. Finally, Section 3.3 analyzes the communication cost caused by the application of our balancing algorithm.

3.1 The hybrid architecture

Figure 1 shows the architecture of the proposed system, which is composed of two areas: (a) one Central Area performing the Main Game and the Auxiliary Games and, (b) a Distributed Area that grows in a P2P like fashion, where those Auxiliary Games that cannot be served by the Central Area for overload reasons are executed.

Fig. 1 Hybrid system architecture

The components of the Central Area are the following:

  • Cluster of Servers (CS). This is composed of a set {S i ∣ 1 ≤ i ≤ N} of N Servers, which are the main servers in the system and act as the bootstrap point. Thus, each player, P j , requesting to enter the system, will attempt to connect to it. A Server S i is a physical computer that manages a part of the Main Game and some Auxiliary Games, providing service to a limited number of players according to its capabilities. The maximum capacity of a server is determined by the number of concurrent players that it is able to serve, named max_S_load, which is assumed to be the same for each server S i . Likewise, we consider the CS overloaded when at least one Server S i of the CS is overloaded.

  • Main Game (MG). This is the principal game, where players interact with the elements of the persistent virtual world. The Main Game is executed in the CS, where the map of the virtual world is distributed using a specific load balancing mechanism described and evaluated in the following sections.

  • Auxiliary Games (AG x ). These are the game instances that are executed independently from the Main Game. There are different kinds of AG x depending on the number of players in each, which is denoted by AG x .size. Therefore, each AG x is defined as a set {P j ∣ 1 ≤ j ≤ AG x .size} of players facing a specific mission. Our proposal in overload cases is focused on the efficient distribution of AG x over the non-overloaded servers, considering them as indivisible entities for allocation.

The Distributed Area (or Peer-to-Peer area) is the area where the extra players, belonging to the Auxiliary Games that cannot be served in the Central Area, are mapped. It is composed of players’ machines that are logically grouped into different kinds of AG x , each of a specific size and isolated from the rest of the AG x . Thus, the Distributed Area is made up of a set of Auxiliary Games, each of which forms an independent P2P subarea scattered across the network, without any kind of communication between P2P subareas. To manage the execution of each AG x , the corresponding P2P subarea will have the two following servers:

  • Auxiliary Game Server (AG x .S). This is the temporary server of the AG x . It is worth pointing out that any player of the AG x can be chosen as a temporary server. The experimentation carried out in Section 4 shows that a normal personal computer is able to serve, at least, up to 40 players without problems of computational power, memory or bandwidth usage. Note that 40 players is the maximum Auxiliary Game size accepted by the majority of MMORPGs [33]. In this way, players’ machines can be used to serve a single Auxiliary Game with a satisfactory QoS.

  • Auxiliary Game Replicated Server (AG x .RS). This is the current replicated server of the AG x . This role is used to replace the AG x .S in case of failure. For this reason, players in the AG x will play against the AG x .S and its AG x .RS, and the AG x .S will send the game state to both, players and the AG x .RS. Thus, the AG x .RS has the game state constantly updated, ready to replace the AG x .S if a critical situation requires it.

According to the system described, the next section introduces a new load balancing mechanism for distributing Auxiliary Games, as an indivisible entity, over the Central and Distributed Area to provide scalability to the system according to the demand.

3.2 Load balancing over the hybrid architecture

The balancing approach presented in this paper is established under three premises: (a) servers of the CS are considered homogeneous, (b) the load of each server S i is proportional to the number of players connected to it and, (c) the P2P Area is exploited when the CS is unable to deal with an overload situation.

The load balancing methodology acts whenever a server from the CS reaches its maximum load capacity (max_S_load). In this case, the balancing mechanism decides whether the number of players that causes the overload situation (extra_S_players) has to be: (a) balanced to another available server in the CS or, (b) distributed to the P2P Area. The methodology of the load balancing proposal is shown in Algorithm 1. Note that in our load balancing mechanism, the distribution is applied to extra_S_players plus an additional 10 % of the maximum server capacity, in order to minimize the number of distributions. Therefore, the total number of players to be balanced or distributed is (extra_S_players + 10 % ⋅ max_S_load).

figure d

Algorithm 1 is executed while the CS is overloaded. This means that at least one server in the CS is overloaded. Therefore, each server in the CS is checked for overloading. After this statement, the load balancing is able to: (a) balance the extra-load to another available server in the CS of the Central Area or, (b) distribute it to the P2P Area.

figure e

For the Central Area, the algorithm checks if there is any server S j ∈ CS able to accept players belonging to the extra_S_players. If so, a complete AG x of the overloaded server S i is balanced to this new server S j (S i .balances(S j , AG x )). Note that our proposal always distributes complete Auxiliary Games, so the inter-server communication is diminished significantly. Next, if there is no other Server S j ∈ CS able to take in more extra players (extra_S_players > 0), then these remaining players will be distributed to the P2P Area by means of the distributes function. Note that the cost of Algorithm 1 is θ(N²), where N is the number of servers of the Central Area. This cost can be considered negligible taking two different aspects into account: (a) the value of N of a typical MMORPG CS is usually below 10³ [25] and (b) this algorithm is only launched occasionally, whenever the CS is overloaded.
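The two-level decision described above can be illustrated with a minimal Python sketch. It is not the authors' Algorithm 1: the `Server` class, the `rebalance` function, the largest-game-first order and the first-fit choice of destination are assumptions made for illustration, and the value of `MAX_S_LOAD` is borrowed from the simulation setup of Section 5.

```python
from dataclasses import dataclass, field

MAX_S_LOAD = 5500                 # max_S_load (value taken from the Section 5 setup)
MARGIN_PLAYERS = MAX_S_LOAD // 10 # the extra 10 % of max_S_load


@dataclass
class Server:
    """Hypothetical model of a server S_i in the CS."""
    name: str
    main_game_players: int = 0
    aux_games: list = field(default_factory=list)  # AG_x.size of each hosted game

    @property
    def load(self) -> int:
        # Load is proportional to the number of connected players (premise b).
        return self.main_game_players + sum(self.aux_games)


def rebalance(cluster, p2p_area):
    """Move whole Auxiliary Games off every overloaded server."""
    for src in cluster:
        if src.load < MAX_S_LOAD:
            continue
        # extra_S_players plus the 10 % margin
        to_move = (src.load - MAX_S_LOAD) + MARGIN_PLAYERS
        # Assumption: largest games first; the text does not fix an order.
        for size in sorted(src.aux_games, reverse=True):
            if to_move <= 0:
                break
            # Assumption: first server in the CS that can host the whole game.
            dst = next((s for s in cluster
                        if s is not src and s.load + size < MAX_S_LOAD), None)
            src.aux_games.remove(size)
            if dst is not None:
                dst.aux_games.append(size)   # S_i.balances(S_j, AG_x)
            else:
                p2p_area.append(size)        # distributes(AG_x) to the P2P Area
            to_move -= size
```

Because only complete games are moved, the number of players actually transferred can slightly exceed the target of extra_S_players + 10 % ⋅ max_S_load.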

In Algorithm 2, the distributes function looks for a Server AG x .S and a Replicated Server, AG x .RS, among the players in the Auxiliary Game. This algorithm is able to vary the CRITERIA used to search for the optimal AG x .S and AG x .RS depending on the parameter of QoS to be optimized. The two following options were considered for use as CRITERIA:

  • Latency Lookup (LL): This checks the network latency of each player in relation to the rest of the players in the same AG x . Then, it selects as the AG x .S the player who has the lowest latency with respect to the others. The AG x .RS will be the player with the second lowest latency value. The additional cost of our policy is the number of comparisons to be carried out to find the servers with the minimum latency, which is (AG x .size) ⋅ (AG x .size − 1)/2. In the worst case, AG x .size = 40, this is equal to 780 comparisons, which, taking the power of a current CPU into account, can be considered negligible.

  • Probability of Disconnection (PD): According to the estimation of players’ uptime, the fault likelihood of each player is calculated. Then, this is checked player by player, choosing the players with the highest predicted uptime, i.e., the most reliable players, as the AG x .S and AG x .RS respectively. In order to do so, for each player P j ∈ AG x , the minimum and maximum predicted uptime (P j .Uptime) is calculated by applying the mean and standard deviation of the player's historical behavior, giving the following interval:

    $$ \left[P_{j}.Uptime_{min},P_{j}.Uptime_{max}\right] $$
    (1)

    With these values of players’ uptime, the PD criterion selects, for acting as the AG x .S, the player with the highest minimum uptime, such that:

    $$ AG_{x}.S.Uptime_{min} = \max\left\{P_{j}.Uptime_{min} \mid P_{j} \in AG_{x} \right\} $$
    (2)

    The replicated server AG x .RS is selected with the same criteria, excluding the player that acts as AG x .S from the selection set. Then:

    $$ AG_{x}.RS.Uptime_{min} = \max\left\{P_{j}.Uptime_{min} \mid P_{j} \in AG_{x} - \{AG_{x}.S\}\right\} $$
    (3)

    Thus, we are working with a pessimistic assumption. Likewise, it is worth remarking that this method takes the average plus its deviation into account, which is more reliable, in statistical terms, than taking only the average into account. In order to minimize the amount of history stored for each player, and taking previous studies [20] into account, we need to maintain a maximum of 50 records per player to obtain a sufficiently accurate estimate.

Section 5.2.1 analyzes the QoS of the players according to the chosen latency and reliability CRITERIA.
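As an illustration of the two CRITERIA, the following Python sketch selects a server and a replicated server under each of them. The function names and data layouts are hypothetical. Two modelling choices are assumptions rather than statements of the paper: the latency of a player "with respect to the others" is aggregated as a sum, and the uptime interval of Eq. 1 is taken as the mean minus and plus one standard deviation.

```python
from statistics import mean, stdev


def select_by_latency(latency):
    """LL criterion. latency[a][b] is the measured latency between a and b.

    Returns (AG_x.S, AG_x.RS): the players with the lowest and second
    lowest aggregate latency to the rest of the Auxiliary Game.
    """
    totals = {p: sum(row.values()) for p, row in latency.items()}
    ranked = sorted(totals, key=totals.get)
    return ranked[0], ranked[1]


def uptime_interval(sessions):
    """Predicted [Uptime_min, Uptime_max]; assumed to be mean -/+ one std dev."""
    m, s = mean(sessions), stdev(sessions)   # stdev needs at least two records
    return m - s, m + s


def select_by_reliability(history, max_records=50):
    """PD criterion (Eqs. 2 and 3): highest and second highest Uptime_min.

    history[p] is the list of past session lengths of player p; only the
    latest max_records entries are used, following the 50-record limit.
    """
    lower = {p: uptime_interval(h[-max_records:])[0] for p, h in history.items()}
    ranked = sorted(lower, key=lower.get, reverse=True)
    return ranked[0], ranked[1]
```

Ranking by the lower bound of the interval reflects the pessimistic assumption: a player with long but erratic sessions is ranked below a player with shorter but steadier ones.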

Once the AG x .S and AG x .RS have been found, the link function of Algorithm 2 creates the P2P subarea of AG x with an interconnection schema of players’ machines like that of Fig. 2, which corresponds to a bipartite graph G = {S ∪ P, E}, where:

  • S is the set of the two player machines acting as servers. S = {AG x .S, AG x .RS}.

  • P is the set of the remaining player machines in AG x . P = AG x − S.

  • E is the set of all possible edges (S i , P j ) such that S i ∈ S and P j ∈ P, plus the edge (AG x .S, AG x .RS).

The two servers of S will maintain the list of players making up their Auxiliary Game AG x , together with the main players’ characteristics (latency and reliability). Once the Auxiliary Game is moved to the P2P area, it continues its execution until it finishes. Then, its players can return to the CS to continue with the game.
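The interconnection schema defined by S, P and E can be sketched in a few lines of Python. The function name `build_subarea` and the representation of edges as tuples are assumptions made for illustration.

```python
def build_subarea(players, server, replica):
    """Return the edge set E of the P2P subarea of one Auxiliary Game."""
    servers = {server, replica}            # S = {AG_x.S, AG_x.RS}
    others = set(players) - servers        # P = AG_x - S
    # Every remaining player is linked to both servers.
    edges = {(s, p) for s in servers for p in others}
    # The extra edge over which AG_x.S sends the game state to AG_x.RS.
    edges.add((server, replica))
    return edges
```

Under this schema an Auxiliary Game of AG x .size players needs 2 ⋅ (AG x .size − 2) + 1 links, that is, 77 links for the largest game of 40 players.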

Fig. 2 An enlarged view of an Auxiliary Game (AG x ) structure in the Distributed Area

After the application of this distribution mechanism, the overloaded servers of the Central Area reduce their workload. This allows the system to continue accepting new players above its CS capacity, providing a system that is able to grow according to the demand, exploiting the peers’ capabilities by means of distributing them to a self-organized P2P game area.

3.2.1 Management of the distributed area

The management of the Distributed Area is based on the standard functionalities applicable to any P2P system, focused on the specific case of using the defined P2P subareas for running Auxiliary Games. The mechanisms for peer insertion and resource discovery are not needed in this context, because the players forming the AG x are known beforehand, and no new players will be added during the execution of the Auxiliary Game. Thus, the management policies to be applied are restricted to the maintenance of each P2P subarea and the exit of peers. These are described next.

The maintenance of each AG x subarea is performed by Algorithm 3 once every period of time T. Control messages are exchanged between both servers in S and the rest of the players of the Auxiliary Game AG x , i.e., those belonging to the set P. In this way, AG x .S and AG x .RS check the state of all the players managed by the AG x .S. Likewise, the Auxiliary Game Server AG x .S reports the state information of the AG x to the Cluster of Servers (CS) of the Central Area, keeping the global system state updated. In addition, each player P j replies to its Servers by sending information about its state, but only if P j has not already sent another message to AG x .S and AG x .RS during the same interval T. The cost of the algorithm is θ(AG x .size), where AG x .size is the number of peers in the Auxiliary Game. It is worth pointing out that the exact value of T depends on the nature of the MMORPG, although it should be lower than 1 s so that the Server of the Auxiliary Game can be replaced, if necessary, without players noticing [33].

figure f

By applying the previous maintenance algorithm, the Server AG x .S and the Replicated Server AG x .RS can detect that a player P exit has left the Auxiliary Game AG x , voluntarily or involuntarily. In such a case, the restructuring operation, described in Algorithm 4, is applied by the Server or the Replicated Server, depending on whether the outgoing player was the Replicated Server or the Server, respectively. The main aim of this algorithm is to reassign the role of the outgoing player, Server or Replicated Server, to another player of the Auxiliary Game. This means that if the AG x .S leaves the system, the AG x .RS becomes the new AG x .S and a new AG x .RS is searched for among the rest of the players of the AG x . This is done by applying the same selection described in Algorithm 2. Thus, after applying the restructuring operation, independently of the player who has left the system, the AG x is still alive for the rest of the distributed players in a transparent way. Likewise, it is worth pointing out that the Auxiliary Game will be alive as long as any player P j ∈ P can assume the role of Server. Note that the cost of this algorithm is also θ(AG x .size), where AG x .size is the number of peers of the AG x .
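The role replacement described above can be sketched as follows. This is an illustrative Python fragment, not the authors' Algorithm 4: the dictionary layout of `game` and the `choose_replica` callback, which stands in for the selection CRITERIA of Algorithm 2, are assumptions.

```python
def restructure(game, departed, choose_replica):
    """Reassign roles after `departed` leaves the Auxiliary Game.

    game: dict with keys 'server' (AG_x.S), 'replica' (AG_x.RS) and
    'players' (set of all players of AG_x, servers included).
    choose_replica: callable that picks one player from a non-empty set.
    """
    game["players"].discard(departed)
    if departed == game["server"]:
        # AG_x.RS already holds the game state, so it takes over as AG_x.S.
        game["server"] = game["replica"]
        game["replica"] = None
    elif departed == game["replica"]:
        game["replica"] = None
    if game["replica"] is None:
        # Look for a new AG_x.RS among the remaining ordinary players.
        candidates = game["players"] - {game["server"]}
        game["replica"] = choose_replica(candidates) if candidates else None
    return game
```

When an ordinary player leaves, neither role changes and only the player list is updated.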

figure g

3.3 Analyzing the communication cost

Previous works described in Section 2 show that one of the main problems caused by the application of load balancing mechanisms in MMORPG servers is the communication cost caused by the extra movement of players between servers. This section analyzes the communication overhead introduced by the application of the distribution mechanism of Algorithm 1. Given that the algorithm balances the load in a hybrid architecture, the communication overhead is calculated according to where it is applied, in the Central Area or in the Distributed Area.

3.3.1 Communication cost in the central area

In the case of distributing in the Central Area, we also calculate the overhead introduced by the classic method proposed by Bezerra et al. [6], described in Section 2, to be able to compare our proposal with a reference method of the literature based on distributing individual players among servers in a Client-Server system.

Table 1 shows the analytical evaluation of the communication overhead introduced by our architecture, named Hybrid, in relation to the method proposed by Bezerra et al., named Classic. This is calculated by the number of established communications between players and servers, excluding those caused by the game itself (for instance, the game state updates). For each method, we distinguish the communication overhead before applying the balancing method (Before row) and the overhead after balancing (After row). Finally, the last row in Table 1 indicates the total communication costs of both proposals.

Table 1 Communication overhead of both load balancing proposals when distributing AG x in the Central Area

Regarding the Classic approach, before applying the balancing method, the number of players to be moved (extra_S_players) is multiplied by 2, as they have to communicate with both servers, the original and the destination one. Once players have been moved to the destination server, the Classic method introduces a communication overhead of 2 ⋅ extra_S_players ⋅ Interaction ⋅ F, where Interaction is the percentage of players in the same AOI, but managed by different servers, that exchange messages. The other parameter (F) indicates the frequency of this message exchange among players.

In the case of the Hybrid approach, before applying the balancing method, there will also be two communications for each player to be moved; in this case, the number of players is extra_S_players plus 10 % of max_S_load, according to our proposal in Algorithm 1. Thus, there is extra communication compared with the Classic method. However, after applying the balancing method, the additional communication cost is negligible, given that the balanced players belong to the same AG x , which corresponds to their AOI and is executed in the same server.

Therefore, the Classic method introduces an overhead of θ(extra_S_players ⋅ F ⋅ Interaction), compared with the θ(extra_S_players) cost of the Hybrid method, which is much lower according to the typical values reached by the Interaction and F parameters in an MMORPG with low inter-player communication [33]. These results will be verified by simulation in Section 5.1.
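The two expressions can be compared numerically with a short Python sketch. The function names are hypothetical, the total is assumed to be the sum of the Before and After terms, and the parameter values in the usage note are illustrative only; they are not measurements from the paper.

```python
def classic_overhead(extra_s_players, interaction, f):
    """Classic (player-based) balancing: Before + After terms."""
    before = 2 * extra_s_players                    # old and new server
    after = 2 * extra_s_players * interaction * f   # cross-server AOI traffic
    return before + after


def hybrid_overhead(extra_s_players, max_s_load, margin=0.10):
    """Hybrid balancing: only the Before term; the After term is negligible."""
    return 2 * (extra_s_players + margin * max_s_load)
```

For instance, with the illustrative values extra_S_players = 300, max_S_load = 5,500, Interaction = 0.2 and F = 10, the Classic expression gives 600 + 1,200 = 1,800 communications and the Hybrid expression gives 2 ⋅ (300 + 550) = 1,700. The gap widens as Interaction ⋅ F grows, because only the Classic cost depends on it.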

3.3.2 Communication cost in the distributed area

In the case of balancing and managing the Auxiliary Games to the Distributed Area, the communication overhead is shown in Table 2. The cost related to the distribution of the Auxiliary Games to the P2P Area is the same as in the previous scenario (2⋅(extra_S_players + max_S_load ⋅ 10 %)), given that the number of players to be moved is the same. When players are already distributed, the communication between CS and players during the Auxiliary Game is only due to the application of the Maintenance Algorithm, which is negligible given that the communication is only produced by the Server of each AG x . When the Auxiliary Game is over, some of these Auxiliary Game players will return to the CS and, as a consequence, they will again communicate with the CS, obtaining (extra_S_players + max_S_load ⋅ 10 %) ⋅ Return RATE , where Return RATE is the percentage of players returning to the CS. Section 5.2 evaluates this communication cost in relation to its main parameters.
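The two non-negligible terms of this analysis can be written as a small Python helper. The function name and the returned pair are assumptions made for illustration, and the maintenance traffic is omitted because the text treats it as negligible.

```python
def p2p_overhead(extra_s_players, max_s_load, return_rate, margin=0.10):
    """Return (distribution cost, return cost) for the Distributed Area."""
    moved = extra_s_players + margin * max_s_load   # players sent to the P2P Area
    distribution = 2 * moved                        # two communications per player
    returning = moved * return_rate                 # players reconnecting to the CS
    return distribution, returning
```

With the illustrative values extra_S_players = 300, max_S_load = 5,500 and Return RATE = 0.5, this yields 1,700 communications for the distribution and 425 for the return to the CS.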

Table 2 Communication overhead of distributing AG x to the P2P Area

4 Performance of the auxiliary game server

In this Section, given that our approach is based on using players’ personal computers as the Server and Replicated Server for the Auxiliary Games in the Distributed Area, we analyze the viability in terms of CPU, memory and network consumption of current commercial desktops to fulfill server functionalities.

As our aim is to demonstrate that any current commercial desktop can be used for this purpose, this section is focused on analyzing the performance as a Server of a low cost desktop with the following features: Intel Core 2 Duo Processor at 2.2GHz with 2GB DDR2 SDRAM, 250GB SATA and an ADSL connection at 512Kbps.

In order to parameterize the game and to monitor the consumption of computational resources by the Server, we played the Urban Terror game [18], increasing the number of players from 5 to 40 during the game. Although this game belongs to the MMOFPS category, its computational requirements and behavior are very similar to those of an MMORPG, and the parameterization of the maximum number of concurrent players is easier. The open statistics [2] of PlaneShift [1] corroborate this assumption.

Figure 3 shows the percentage of CPU usage of the desktop computer while it is serving a game. The game was started with 5 players and, every 300 seconds, 5 new players were connected until the game reached 40 concurrent players. It can be seen that there is a linear relationship between the CPU consumption and the number of concurrent players. This corroborates the results of Ye and Cheng [37]. It is worth pointing out that, even in the worst case of 40 concurrent players, the CPU consumption always remains below 45 %.

Fig. 3 CPU consumption in relation to the number of players

Figure 4 shows the memory consumption under the same conditions described above. Amplified in the center of the Figure, we can see how the server reserves a significant amount of memory at the beginning to initialize the game, but once it has begun, the memory utilization remains constant. Thus, our results reveal that the memory consumption is independent of the number of players. Likewise, we can see that the memory consumption is always below 100MB. Thus, we can conclude that memory is not a critical parameter for choosing a peer from the Distributed Area to act as a server.

Fig. 4 Server memory consumption in relation to the number of players

The network requirements were also analyzed. Figures 5 and 6 depict the number of input and output packets to be transferred by the server when the number of concurrent players is increased from 5 up to 40 players over time. We can see the same trend as with the CPU case, where the number of input/output packets is roughly proportional to the number of players.

Fig. 5 Input packets in the server during the game

Fig. 6 Output packets in the server during the game

Finally, we analyzed the performance of a conventional ADSL connection of 512 kbps to serve a game party of 40 players. According to our empirical results, we can assume that:

  • Each output packet has an average size of 141 Bytes. Note that this value also fits with the empirical results shown in [16].

  • The server sends two packets per second to each player in the game; this means 2 ⋅ 141 bytes ⋅ 8 bits/byte = 2,256 bps per player.

  • Taking the size of the Auxiliary Game of 40 players into account, the average output bandwidth would be: 40 players ⋅ 2,256 bps/player = 90,240 bps.

Therefore, the output rate represents 17.2 % of the total capacity of the ADSL connection (taking 512 kbps as 512 ⋅ 1,024 bps). We can thus conclude that a current ADSL connection is sufficient to serve an Auxiliary Game.
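The bandwidth estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names are illustrative, and the link capacity is assumed to be 512 ⋅ 1,024 bps, which is the convention that yields the 17.2 % figure (with 512,000 bps the share would be about 17.6 %).

```python
# Sketch of the ADSL bandwidth estimate; names are illustrative.
PACKET_SIZE_BYTES = 141          # average output packet size (empirical)
PACKETS_PER_SECOND = 2           # packets sent to each player per second
PLAYERS = 40                     # largest Auxiliary Game considered
ADSL_CAPACITY_BPS = 512 * 1024   # assumption: 1 kbps taken as 1,024 bps


def per_player_bps(packet_size_bytes: int, packets_per_second: int) -> int:
    """Output rate needed for one player, in bits per second."""
    return packet_size_bytes * 8 * packets_per_second


def server_output_bps(players: int) -> int:
    """Aggregate output rate of the temporary server for one game."""
    return players * per_player_bps(PACKET_SIZE_BYTES, PACKETS_PER_SECOND)


def link_utilization(players: int, capacity_bps: int) -> float:
    """Fraction of the link capacity consumed by the game."""
    return server_output_bps(players) / capacity_bps


if __name__ == "__main__":
    print(per_player_bps(PACKET_SIZE_BYTES, PACKETS_PER_SECOND))   # bps per player
    print(server_output_bps(PLAYERS))                              # bps for 40 players
    print(f"{100 * link_utilization(PLAYERS, ADSL_CAPACITY_BPS):.1f} %")
```

Because the rate grows linearly with the number of players, the same functions can be used to check smaller Auxiliary Games or other link capacities.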

The results shown throughout this section reveal that a commercial desktop computer can serve a typical Auxiliary Game, which usually has between 5 and 40 players, without any problem from the point of view of computational capacity. Our experimentation has shown that the most sensitive resource is the CPU, although its utilization never exceeds 45 %.

5 Performance evaluation

In this section, we evaluate the effectiveness of applying the load balancing mechanism to players in Auxiliary Games, considering each game as an indivisible entity, as well as the scalability and viability of using the P2P area during peak loads that cannot be served by the cluster of servers. In addition, the QoS given to the Auxiliary Game players located in the P2P area is analyzed in terms of latency and reliability.

The simulation was performed using SimPy [22], a discrete-event simulation language based on standard Python. SimPy tools were used to implement the nodes of the platform, which can fulfill four distinct roles: player, AG x .S, AG x .RS and CS. The SimPy procedures allow the random behavior of a player during the simulation to be represented. We carried out a set of 1,000 simulations, each of which consisted of a game world map whose dimensions were 1,000 × 1,000 (1 million player positions). This map was managed by a CS of 10 servers with a homogeneous maximum workload of 5,500 players per server. Taking into account the information about the existing game instances implemented in World of Warcraft [36], we modeled the Auxiliary Games with five different numbers of players: AG x .size = 5, 10, 20, 25 and 40. Table 3 shows the percentage of each kind of AG x in relation to the total number of AG x . We considered that all the AG x were scattered across the map using a random distribution that could eventually cause hot-spots. Figure 7 shows a snapshot of the game load map, where single players and different hot-spots caused by the concentration of many AG x of different sizes are plotted in different colors. Each rectangular region was managed by a single server in the central CS.

Table 3 Percentages of AG x .size modeled in the simulation

Fig. 7 Game load map snapshot

Another important issue is the calculation of the players’ latency with respect to the CS. This was determined by a triangulated heuristic that bounds the 2-dimensional Euclidean space to (x = [−1,000, +1,000], y = [−1,000, +1,000]). This methodology is based on relative coordinates, as explained in [14].

In order to find the most reliable player to act as a server, understanding and modeling the behavior of MMORPG players is a key issue of our work. The modeling of players’ uptime is based on their behavior history. Thus, we are able to predict how reliable a player will be in order to select adequate ones to act as AG x .S and AG x .RS in terms of fault tolerance. From [35], we obtained mean and standard deviation values of players’ uptime of 2.8 and 1.8 hours respectively. These values were taken from the World of Warcraft statistics as representative player behavior for an MMORPG. They can be modeled by a gamma distribution (see Fig. 8) with shape (K) equal to 2.419754 and scale (Θ) equal to 1.157143, so that K⋅Θ = 2.8 and K⋅Θ² = 1.8²; the fit was successfully evaluated with the Chi-square test, indicating the goodness of our model in statistical terms. With this model, we can assign a fault likelihood to any player, taking the player’s uptime as a reference, which allows the server (AG x .S) and its Replicated Server (AG x .RS) to be selected adequately for each Auxiliary Game (AG x ). It is a fault-tolerance-based model: despite the existence of extreme player behavior, in terms of sudden disconnections or long uptimes, we prioritize the selection of those players with more stable connection settings. Thus, good reliability in the Distributed Area is ensured.
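The link between the reported uptime statistics and the gamma parameters can be checked with the method of moments, for which mean = K⋅Θ and variance = K⋅Θ². The following sketch is illustrative only (the function names are not part of the simulator); matching a mean of 2.8 h and a standard deviation of 1.8 h gives K ≈ 2.4198 and Θ ≈ 1.1571. The sampling step uses `random.gammavariate` from the Python standard library, whose two arguments are the shape and the scale.

```python
import math
import random


def gamma_from_moments(mean: float, sd: float) -> tuple[float, float]:
    """Method-of-moments fit: returns (shape K, scale Theta)."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def gamma_moments(shape: float, scale: float) -> tuple[float, float]:
    """Mean and standard deviation implied by (shape, scale)."""
    return shape * scale, math.sqrt(shape) * scale


def sample_uptimes(n: int, shape: float, scale: float, seed: int = 0) -> list[float]:
    """Draw n synthetic player uptimes (hours) from the fitted model."""
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(n)]


if __name__ == "__main__":
    k, theta = gamma_from_moments(2.8, 1.8)
    print(f"shape K = {k:.6f}, scale Theta = {theta:.6f}")
    uptimes = sample_uptimes(20_000, k, theta)
    print(f"sample mean = {sum(uptimes) / len(uptimes):.2f} h")
```

Synthetic uptimes drawn in this way could be used to drive player disconnections in a simulation run.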

Fig. 8 Histogram of players’ uptime following a Gamma distribution

In order to test the proposed system and the load balancing mechanism, we simulated the following two scenarios:

  • A: In order to decrease the average load of the servers progressively, the disconnection rate of players in the CS is 10 % higher than the connection rate. At the beginning of the simulation, the CS is loaded with its maximum allowed capacity of players: max_S_load ⋅ N, where N = 10 is the number of servers of the CS. Moreover, the internal movements of players between servers are simulated; these movements are caused when players move across the game map. In this way, hot-spots arise dynamically in some servers while other servers are underutilized. Among all the players, 10 % are uniformly distributed across the Main Game map and the rest are located in the Auxiliary Games. This scenario is used to show the differences between applying a load balancing mechanism to individual players, as has been usual in the literature, and taking the players’ AOI into account, that is to say, balancing players from the same Auxiliary Game together, as we propose in this work.

  • B: In order to test the scalability of the system, the players’ connection rate is 20 % higher than the disconnection rate. This means that the average load increases over the allowed maximum with time. In this scenario, the AG x s that cannot be served by the CS due to overloading are sent to the P2P area. In addition, we simulate the number of players returning to the MG from the AG x (Return RATE ), i.e., the percentage of players returning from the P2P area to the Central Area when the corresponding AG x s are over. In this scenario, we want to show that, thanks to the P2P area, it is possible to overcome the maximum load capacity of the Central Area, which is implemented as a Client-Server architecture.

5.1 Testing load balancing methods

To analyze the effectiveness of the load balancing mechanism, explained in Section 3.2, we compared it with the load balancing approach proposed by Bezerra et al. in [6] and described in Section 2.

We applied both load balancing methods in scenario A, and the results are labeled Classic for the Bezerra method and Hybrid for our method. Figure 9 shows, for a CS with 10 servers (S 1,..., S 10), the initial and final load of each server after applying both balancing methods. We can see that there were five overloaded servers at the beginning (S 4, S 7, S 8, S 9, S 10), whereas the final distribution of players is balanced in both cases, thus avoiding overloaded servers. However, the Classic method leaves more servers close to the stress situation (S i .load ≃ 5,500), because movements are applied taking individual players as the entity, instead of taking Auxiliary Games as in our Hybrid method.

Fig. 9 Load balancing server comparison between Classic and Hybrid proposals for Scenario A

It is worth pointing out that the Hybrid method moves a significantly greater number of players: 4,100 players versus 1,337 players in the Classic method. This can be explained by two reasons. On the one hand, movements affect all the players in each Auxiliary Game; on the other hand, load balancing is applied to the extra players (extra_S_players) plus 10 % of the maximum load of a server (max_S_load), so that the load balancing mechanism needs to be applied less frequently. It is also worth remarking that this extra movement of players in the Hybrid method does not increase the communication cost compared with the Classic method, because players in the same Auxiliary Game, who communicate intensively, are kept in the same server after being balanced.
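To illustrate why moving whole Auxiliary Games relocates more players than moving individuals, the following sketch computes the number of players to be moved from an overloaded server and then selects entire games until that target is covered. It is not the balancing algorithm of Section 3.2: the largest-first greedy order, the function names and the definition of extra_S_players as the load above max_S_load are assumptions made for illustration.

```python
# Illustrative sketch only; the selection order is an assumption.
MAX_S_LOAD = 5500        # maximum load of a server (players)
MARGIN_PERCENT = 10      # extra share of max_S_load moved on each balancing


def players_to_move(server_load: int) -> int:
    """extra_S_players plus 10 % of max_S_load, or 0 if not overloaded."""
    extra_s_players = server_load - MAX_S_LOAD   # assumed definition
    if extra_s_players <= 0:
        return 0
    return extra_s_players + MAX_S_LOAD * MARGIN_PERCENT // 100


def select_games_to_move(game_sizes: list[int], target: int) -> tuple[list[int], int]:
    """Pick whole Auxiliary Games (largest first) until target is reached."""
    moved, total = [], 0
    for size in sorted(game_sizes, reverse=True):
        if total >= target:
            break
        moved.append(size)   # the game moves as an indivisible entity
        total += size
    return moved, total


if __name__ == "__main__":
    target = players_to_move(5900)
    # Hypothetical mix of Auxiliary Games hosted by the overloaded server.
    games = [40] * 10 + [25] * 10 + [20] * 10 + [10] * 10 + [5] * 10
    moved, total = select_games_to_move(games, target)
    print(target, len(moved), total)
```

Since a game is never split, the number of players actually moved can exceed the target by up to the size of the last game selected.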

Figure 10 shows the communication cost described in Table 1 of Section 3.3, with the value of extra_S_players varying between 100 and 1,000. For the Classic proposal, the Interaction parameter was set at 10 % and two different values of F, 20 and 25 messages per second, were evaluated. Note that, according to the values shown in [33], the chosen Interaction and F values correspond to an MMORPG with low inter-player communication. These settings therefore favor the Classic method over the Hybrid one. In spite of these unfavorable conditions, the Hybrid method exchanges far fewer messages than the Classic one, which agrees with the analytical results obtained in Section 3.3.

Fig. 10 Communication costs comparison for Scenario A

5.2 Scalability of the system

Scenario B was used to test the ability of the proposed Hybrid system, together with the balancing method, to scale on demand. Note that in this scenario the load of each server of the CS exceeds the maximum allowed load; according to Algorithm 1, this is max_S_load − 10 %⋅max_S_load = 4,950 players, where max_S_load = 5,500 players. Under this scenario, for each server, Fig. 11 distinguishes the players located in that server from the players distributed from the server to the P2P area. From this figure, we can see that the Hybrid proposal is able to exploit the P2P area, distributing players and avoiding the overload situation. Thus, it is able to overcome the maximum number of players managed by a server, which is 4,950 players.
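The threshold used in Scenario B can be made explicit with a short sketch. It only reproduces the arithmetic of the allowed load and splits a hypothetical per-server demand into the part kept in the CS and the part sent to the P2P area; the function names are illustrative, and the split ignores the fact that players are actually moved as whole Auxiliary Games.

```python
# Sketch of the Scenario B threshold; names are illustrative.
def allowed_load(max_s_load: int = 5500, margin_percent: int = 10) -> int:
    """max_S_load minus 10 % of max_S_load."""
    return max_s_load - max_s_load * margin_percent // 100


def split_load(demand: int, max_s_load: int = 5500) -> tuple[int, int]:
    """Return (players kept in the server, players sent to the P2P area)."""
    limit = allowed_load(max_s_load)
    kept = min(demand, limit)
    return kept, demand - kept


if __name__ == "__main__":
    print(allowed_load())      # allowed players per server
    print(split_load(6000))    # hypothetical demand above the threshold
```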

Fig. 11 Hybrid Distributed Area (Scenario B)

It is worth remarking that in our system the players’ load of the Central Server would only increase in the hypothetical case that the players’ connection rate was higher than the rate of Auxiliary Game creation. According to the nature and behavior of MMORPG players [33], more than 50 % of the players of the game are playing in an Auxiliary Game and, as a consequence, the system would reach the saturation point only in the hypothetical case that the players’ connection rate was 50 % higher than the disconnection rate, which is a completely unusual situation.

As we discussed in Section 3.3, a key aspect is the additional communication cost caused by the balancing of players to the P2P area. Figure 12 depicts the evolution of the communication overhead shown in Table 2 for different values of the Return RATE . In general, we can observe that whenever the Return RATE parameter increases, the volume of communications increases too, because more players have to communicate with the CS. However, it is remarkable that even in the worst case (Return RATE = 100 % and extra_S_players = 1,000), the total amount of communications, assuming a message size equal to 141 Bytes [16], amounts to an extra bandwidth of 719 KB for the CS, which is only slightly higher than the capacity of a conventional ADSL connection.

Fig. 12 Communication costs comparison for Scenario B

Our evaluation demonstrates the viability of the Hybrid proposal for scaling the system on demand with a minimum additional communication cost for the cluster of servers. In order to guarantee playability for the players in the Auxiliary Games distributed to the P2P Area, Quality of Service (QoS) mechanisms are required. Accordingly, a QoS evaluation of the Distributed Area is presented in the next section.

5.2.1 QoS evaluation in the distributed area

The QoS of the proposed system indicates its ability to maintain latency values of the whole system under an acceptable threshold, while a huge number of players are managed on demand in the Distributed Area. The fault tolerance or reliability is also considered to be a parameter for measuring the QoS of the system. Thus, the system has to be able to minimize the sudden disconnections of players served in the Distributed Area.

It is worth pointing out that our system prioritizes latency or reliability depending on the value chosen for the CRITERIA parameter of the distributes function presented in Algorithm 2. Recall that the distributes function is used to find the optimal Server (AG x .S) and Replicated Server (AG x .RS) of each Auxiliary Game executed in the Distributed Area.

Table 4 shows the average (AVG) and standard deviation (SD) of the players’ latency when the latency (LL) CRITERIA is applied to distribute AG x s over the Hybrid architecture, compared with the traditional Client-Server architecture, in which players play directly against the CS instead of playing against the AG x .S, as we propose. Note that AG x s of 5 and 10 players have been chosen as representative cases for Auxiliary Games.

Table 4 Distributes function mechanisms performance under LL CRITERIA

Regarding the Hybrid architecture, the latency values for both AG x s are kept below 1 second, which is considered an acceptable threshold for MMORPGs [33]. In the case of the Client-Server architecture, we obtained an average latency of 952.6 ms for AG x .size = 5 and 861.6 ms for AG x .size = 10, which represent increases in latency of 15 % and 7 % respectively compared with the P2P area. Note that the better latency obtained in the P2P area is to be expected, because our method selects the players with the lowest latency in relation to the remaining players in the same AG x , which allows a local minimum to be obtained. Thus, the distribution based on AG x in the P2P area is a key issue for providing scalability without latency penalization.
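The idea behind the LL criterion can be sketched as follows. Each player is given a position in the bounded 2-dimensional coordinate space, the latency between two players is approximated by their Euclidean distance, and the player with the lowest mean latency to the rest of the game is chosen as server. This is a simplified illustration rather than the distributes function of Algorithm 2: the distance-to-latency factor `ms_per_unit`, the choice of the second-best player as Replicated Server and all the names are assumptions.

```python
import math

Coord = tuple[float, float]


def estimated_latency(a: Coord, b: Coord, ms_per_unit: float = 1.0) -> float:
    """Latency estimate (ms) from the distance between relative coordinates."""
    return ms_per_unit * math.hypot(a[0] - b[0], a[1] - b[1])


def mean_latency_to_others(candidate: str, players: dict[str, Coord]) -> float:
    """Average estimated latency from one candidate to the other players."""
    others = [name for name in players if name != candidate]
    total = sum(estimated_latency(players[candidate], players[o]) for o in others)
    return total / len(others)


def select_by_latency(players: dict[str, Coord]) -> tuple[str, str]:
    """Return (server, replicated server) as the two lowest-latency players."""
    ranked = sorted(players, key=lambda name: mean_latency_to_others(name, players))
    return ranked[0], ranked[1]


if __name__ == "__main__":
    # Hypothetical Auxiliary Game of five players.
    game = {"a": (0, 0), "b": (100, 0), "c": (200, 0), "d": (290, 0), "e": (400, 0)}
    server, replica = select_by_latency(game)
    print(server, replica)
```

Because the candidate set is restricted to the players of the same game, the result is a local minimum for that game rather than a global optimum over all peers.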

Regarding the appropriateness of the PD criteria for selecting the AG x .S and AG x .RS servers of the AG x , we studied its reliability. Figure 13 shows the predicted average uptime and the deviation of each player belonging to the AG x .size = 5 and AG x .size = 10, respectively. As a reference, the uptime of the chosen Server is depicted by the dotted line. From this reference line, we can see that, in all cases, AG x .S.Uptime min ≥ P j .Uptime min , ∀ P j ∈ AG x − {AG x .S}.

Fig. 13 AG x .S and AG x .RS selection process for AG x .size = 5 and 10 players

In order to calculate the probability of disconnection (PD) of a player P j , we search for the interval [ Uptime x , Uptime y ] of the Gamma histogram, such as the one given in Fig. 8, to which P j .Uptime min belongs. Once the interval has been located, the PD value is calculated as the probability that P j .Uptime min > Uptime y . For instance, a P j .Uptime min = 1.5 h is located in the second interval [1 h, 2 h] of the histogram of Fig. 8 and has a PD(Uptime min > 2 h) = 60.9 %. Accordingly, Table 5 shows the fault likelihood of the chosen AG x .S and AG x .RS. In addition, it shows the fault likelihood of the whole AG x obtained by applying Algorithm 4. Given that the AG x remains alive as long as two of its players are playing, the failure probability is calculated as the product of the fault likelihoods of AG x .S, AG x .RS and the rest of the players P j ∈ AG x , excluding the player with the worst reliability. As can be seen, in both cases, the fault likelihood of the AG x is under 0.9 %. This enhances the fault tolerance of the mapping mechanism. It is worth remarking that these percentages show the highest fault likelihood, that is, the worst-case performance, which reveals the goodness of the PD mechanism combined with the role of the AG x .RS.
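The two steps described above, locating the histogram interval of a player's uptime and multiplying the individual fault likelihoods, can be sketched in Python. The sketch is not Algorithm 4: it replaces the histogram of Fig. 8 with the fitted gamma model (parameters recomputed from the reported mean of 2.8 h and standard deviation of 1.8 h), assumes 1-hour intervals as in the [1 h, 2 h] example, and evaluates the tail probability by simple numerical integration. All function names and the sample uptimes are hypothetical.

```python
import math

MEAN_H, SD_H = 2.8, 1.8                 # reported uptime statistics (hours)
SHAPE = (MEAN_H / SD_H) ** 2            # gamma shape via method of moments
SCALE = SD_H ** 2 / MEAN_H              # gamma scale via method of moments


def gamma_pdf(x: float) -> float:
    """Density of the fitted gamma model at x > 0."""
    return x ** (SHAPE - 1) * math.exp(-x / SCALE) / (math.gamma(SHAPE) * SCALE ** SHAPE)


def tail_probability(t: float, steps: int = 5000) -> float:
    """P(Uptime > t), using the midpoint rule for the integral over [0, t]."""
    if t <= 0:
        return 1.0
    h = t / steps
    cdf = h * sum(gamma_pdf((i + 0.5) * h) for i in range(steps))
    return 1.0 - cdf


def pd_value(uptime_h: float, bin_width_h: float = 1.0) -> float:
    """PD of a player: tail probability at the upper edge of its interval."""
    upper_edge = (math.floor(uptime_h / bin_width_h) + 1) * bin_width_h
    return tail_probability(upper_edge)


def game_fault_likelihood(pd_values: list[float]) -> float:
    """Product of the PD values, excluding the least reliable player."""
    kept = sorted(pd_values)[:-1]       # drop the highest PD
    return math.prod(kept)


if __name__ == "__main__":
    print(f"PD for 1.5 h: {pd_value(1.5):.3f}")
    uptimes = [6.5, 5.5, 4.5, 3.5, 1.5]   # hypothetical 5-player game
    print(f"{game_fault_likelihood([pd_value(u) for u in uptimes]):.6f}")
```

For an uptime of 1.5 h the fitted model gives a value of about 0.61, close to the 60.9 % read from the histogram.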

Table 5 PD performance for AG x .size = 5,10

6 Conclusion and future work

This paper addresses the problem of supplying computational service to meet users’ increasing demand for playing MMORPGs. An MMORPG is characterized by a Main Game, which is executed without interruption, and a number of Auxiliary Games, which are created on demand by the players. In line with this, this paper proposes a hybrid architecture for playing MMORPGs. It is composed of a Central Area with a cluster of servers and a distributed P2P area that adds servers dynamically to the system according to the demand.

To distribute computation over this architecture, we defined a mapping mechanism that is based on moving Auxiliary Games, as an entity, among the servers in the central cluster and the P2P area. This avoids a significant amount of communication in comparison with moving individual players, because players of the same Auxiliary Game interact intensively during the game. By means of simulation, it has been demonstrated that the proposed mapping mechanism is able to provide well-balanced loads in the cluster system while the distributed platform scales on demand. Moreover, to maximize the performance in terms of latency and reliability, we proposed latency or reliability options to be used as CRITERIA for finding a new Server and Replicated Server for the Auxiliary Games in the Distributed Area. Likewise, we have shown that our proposal is able to achieve lower average latencies than the traditional Client-Server architecture. Concerning reliability, it has been demonstrated that the reliability mechanism in our method is able to achieve a failure probability of less than 0.9 % in the worst cases.

Future work will address the cheating problem in MMORPGs in order to ensure a good player experience in the overall game. In order to exploit the peers’ resources better, another route for improvement will be to manage the inherent heterogeneity of players. Finally, another interesting issue will be to merge our current mapping mechanisms with market criteria to reward the AG x .S and AG x .RS, given that they are sharing their resources altruistically.