1 Introduction

In recent years, large-scale Internet-based service systems have evolved rapidly, with growing user populations, increasingly diverse user requirements, and more open system services. As the number of Internet service users grows, a sharp influx of users within a short time often makes services unavailable: the sudden surge in user load overloads the system and can even paralyze it. For instance, demand on a ticket booking system is often seasonal, and the surge of users at peak times can paralyze the system.

Large-scale systems that process many concurrent users are usually affected when resources are expanded, because uncertain user behavior after the expansion of computing resources can still overload the system. To this end, software self-adaptation strategies have been put forward to cope with system overload during resource expansion and to address the growing complexity of Internet service systems. An Internet system cannot scale instantly to match user behavior. Therefore, special attention should be given to restructuring system behavior in response to changes in user behavior so as to balance the system load. Existing load balancing methods [1,2,3] are mainly based on resource allocation and task scheduling strategies, but they do not consider when and how to dynamically reconstruct system behavior to keep the load balanced in real time.

Two components must be considered for the adaptive refactoring of system behavior: first, classifying users according to their behavioral characteristics, and second, constructing a behavior flow for each user group to dynamically control the system load.

The remainder of the paper is organized as follows: Section 2 reviews related work, and Section 3 presents the proposed system behavior reconstruction model. Section 4 details the proposed Petri net model and the algorithm for implementing system behavior reconstruction. Section 5 presents the experiments and discusses the results, and Section 6 concludes the paper.

2 Related work

Recently, several research works have focused on dynamic system load balancing. Doukha [4] proposed a load balancing method that distributes beacons and transmits the system load fairly. Hwang [5] proposed using hardware indicators, CPU utilization, and the number of online connections as load evaluation criteria. Duan [1] used CPU utilization, disk utilization, the number of page faults, the number of requests, request response time, and other relevant indicators to calculate the real-time load of a server. Gang [2] proposed classifying user-requested services to allocate system resources and thereby achieve dynamic load balancing. Shailesh [3] used a fuzzy dynamic load balancing algorithm to balance load through task scheduling. Liu [6] proposed a distributed load balancing algorithm using a defined protocol sequence and developed a queuing model of a distributed asynchronous multi-server system. To achieve dynamic load balancing at the data-stream level, Wang [7] proposed a dynamic load balancing method for cloud data centers based on SDN (Software Defined Networking). However, these works give little attention to system behaviors and behavior time, both of which should be treated as essential criteria for achieving load balancing through behavior reconstruction in dynamic environments.

Adaptive reconstruction strategies have been the focus of a few research works. Slim [8] identified adaptation as a key requirement for many software systems: a system should be able to adapt its structure and behavior at runtime in response to changes in its operating environment and in user needs. Zhang [9] proposed a coordination method based on a reconfigurable network-event system. Pamela [10] proposed a distributed persistence management model for reconfigurable multiprocessor systems on dynamically reconfigurable circuits. Rui [11] proposed a dynamic adaptive wiping mechanism, and Yang [12] proposed a reconfigurable architecture model based on layered hypergraphs. Mohamed [13] proposed a reconfigurable and replaceable system for embedded control systems and modeled it using Petri nets. However, these works do not take user requirements into account during system reconstruction. When a large number of users gather within a short time, reconstruction that ignores user needs may not be efficient.

From the perspective of system behavior, Wang [14] pointed out that understanding user behavior in online services is vitally important and proposed an unsupervised system based on click traffic to identify user behavior patterns. Luo [15] used fuzzy Petri nets to represent fuzzy production rules and analyzed the state of power systems by iterative matrix computation. Kotevski [16] combined queuing networks and Fluid Stochastic Petri Nets to develop several performance models for analyzing the behavior of complex systems. Lu [17] used a hybrid model to explore the impacts and guidance of user behaviors on mobile banking services. Matthew [18] highlighted the importance of understanding user behavior and using it as a source of information, based on a study of a long query log. Jose [19] proposed a genetic algorithm for modeling and classifying user behavior from event sequences. In summary, the dynamic relationship between user behavior and system service is vitally important and should be treated as an essential criterion when improving overall system performance through load balancing.

To sum up, although a number of works have focused on adaptive dynamic balancing of system load, system reconstruction has hardly been considered in the state of the art. Reconstructing the system behavior flow according to user behavior characteristics to balance the system load dynamically can achieve effective load balancing in large-scale network service systems. In this paper, behavior reconstruction is exploited to balance the system load when a large number of users gather within a short time, with the aim of achieving real-time load balancing that maximizes the processing capacity of the system. Users are classified based on their behavioral characteristics, and corresponding behavior processes are constructed. The proposed reconstruction model is triggered when the system load exceeds the warning point at runtime, and it balances the system load by controlling the interaction time of each type of user.

3 Model of system behavior reconstruction based on user behavior classification

Under normal conditions, large-scale network service systems provide users with stable, good service. Sometimes, however, the rapid growth of the user population within a short time makes system behavior and user behavior incompatible, and the system becomes abnormal or even paralyzed. Many large-scale network service systems nevertheless continue to provide users with the same services as before. As a result, when the user population increases rapidly, the system load grows beyond the capacity of the system, the system becomes overloaded, and system resources become insufficient. To address this, this paper reduces the system load by dividing user behaviors into groups according to the features of their interaction sequences and by delaying the interaction time of each user group.

Definition 1

User behavior membership function \( {\mu}_{U_i}(j) \). It indicates the degree to which the user behavior Ui belongs to the j-th user group, whose standard is sj. \( {\mu}_{U_i}(j) \) is defined as follows:

$$ {\mu}_{U_i}(j)=\sqrt{\sum \limits_{k=1}^m{\left({u}_{ik}-{s}_j\right)}^2},\left(j=1,2,\dots, p\right), $$

where \( {U}_i=\left\{{u}_{i1},{u}_{i2},\dots, {u}_{im}\right\} \) is the interaction behavior sequence of user i, whose elements characterize the time of each of the user's m interaction behaviors, and the user behaviors are assumed to be divided into p user groups based on the length of interaction behavior time; S = {s1, s2, …, sp} (p ≥ 1) is the set of group standards, where sj is the standard of user group j.

Definition 2

User behavior subordination criterion d(ui, sj). It is the criterion for assigning a user behavior to a specific user group: \( d\left({u}_i,{s}_j\right)=\min_j\left({\mu}_{U_i}(j)\right) \), i.e., a user behavior belongs to the group for which its membership function value is minimal. Let Nd (Nd ∈ N+) be the number of groups j for which \( {\mu}_{U_i}(j)=d\left({u}_i,{s}_j\right) \). When Nd = 1, the behavior Ui belongs to user behavior group j; when Nd > 1, Ui is randomly assigned to one of the Nd tied groups.
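Definitions 1 and 2 can be illustrated with a minimal Python sketch. It assumes each \( {u}_{ik} \) is a numeric interaction time and each standard \( {s}_j \) is a scalar, as the formula in Definition 1 suggests; the function names and sample values are hypothetical.

```python
import math
import random
from typing import Optional, Sequence


def membership(u: Sequence[float], s_j: float) -> float:
    """Definition 1: mu_{U_i}(j) = sqrt(sum_k (u_ik - s_j)^2)."""
    return math.sqrt(sum((u_ik - s_j) ** 2 for u_ik in u))


def assign_group(u: Sequence[float], standards: Sequence[float],
                 rng: Optional[random.Random] = None) -> int:
    """Definition 2: return the 1-based group index j whose membership value
    equals d(u_i, s_j) = min_j mu_{U_i}(j); ties (N_d > 1) are broken at random."""
    rng = rng or random.Random()
    values = [membership(u, s_j) for s_j in standards]
    d = min(values)
    # Exact comparison keeps the sketch simple; a tolerance could be used instead.
    candidates = [j + 1 for j, v in enumerate(values) if v == d]
    return candidates[0] if len(candidates) == 1 else rng.choice(candidates)


# Hypothetical interaction times (seconds) and standards for fast, medium, slow users.
standards = [2.0, 5.0, 10.0]
print(assign_group([1.5, 2.5, 2.0], standards))   # fast user -> 1
print(assign_group([9.0, 11.0, 12.0], standards))  # slow user -> 3
```

Because \( {\mu}_{U_i}(j) \) is a distance, the group with the smallest value is the closest match, which is why Definition 2 takes the minimum.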

Definition 3

At time t, the total number of user behaviors Bt submitted in the system is equal to the number of users, that is, Bt = Ut, where Ut represents the number of users in the system.

Definition 4

The real-time load Lt at time t. It is the system load corresponding to the total number of behaviors submitted by users at time t: Lt = Bt × l, where l (l ≥ 1) is the load imposed on the system when a user submits one request behavior.

Definition 5

System good service status. It is the state in which the system provides services normally. When 0 ≤ Lt ≤ Lsafe, the system is in the good service state, where Lsafe is the safe load, i.e., the maximum load under which the system can still provide good service.

Definition 6

System unstable service status. It is the state in which the system can still provide services but abnormalities may occur. That is, when Lsafe < Lt ≤ Lmax, the system is in the unstable service state, where Lmax is the load corresponding to the maximum service capacity of the system in the unstable state, i.e., the maximum load the system can withstand.

Definition 7

System non-service status. It is the state in which the system cannot provide services because the load is too large to handle. That is, when Lt > Lmax, the system is in the non-service state, i.e., it is paralyzed.
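Definitions 3 to 7 reduce to computing Lt and comparing it with two thresholds. The Python sketch below assumes numeric loads and Lsafe ≤ Lmax; the enum and function names are hypothetical, not taken from the paper.

```python
from enum import Enum


class ServiceState(Enum):
    GOOD = "good service"          # Definition 5: 0 <= L_t <= L_safe
    UNSTABLE = "unstable service"  # Definition 6: L_safe < L_t <= L_max
    NON_SERVICE = "non-service"    # Definition 7: L_t > L_max


def real_time_load(num_users: int, l: float = 1.0) -> float:
    """Definitions 3 and 4: B_t = U_t, so L_t = U_t * l."""
    if l < 1:
        raise ValueError("Definition 4 requires l >= 1")
    return num_users * l


def service_state(load: float, l_safe: float, l_max: float) -> ServiceState:
    """Map a real-time load L_t to the service states of Definitions 5 to 7."""
    if not 0 <= l_safe <= l_max:
        raise ValueError("expected 0 <= L_safe <= L_max")
    if load < 0:
        raise ValueError("load must be non-negative")
    if load <= l_safe:
        return ServiceState.GOOD
    if load <= l_max:
        return ServiceState.UNSTABLE
    return ServiceState.NON_SERVICE
```

For example, with the hypothetical values Lsafe = 120000, Lmax = 150000 and l = 1, 130000 concurrent users place the system in the unstable service state.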

Definition 8

At time t, the system real-time load Lt is the sum of the load corresponding to the p class user behaviors, namely \( {L}_t=\sum \limits_{i=1}^p{L}_i \), where Li represents the system load corresponding to the user behaviors of the group i.

Definition 9

System processing capability LHC. It is the system load corresponding to the user behaviors that the system can process in unit time. If Lmax = LHC and Lt > LHC, the system enters the non-service state.

Definition 10

System load per unit time Lut. It is the system load corresponding to the number of behaviors But processed in unit time. When the real-time load at time t satisfies Lt ≥ Lsafe, the load exceeds what the system can process in unit time. Let Lut = Lt/tc with Lut < Lsafe, where tc is the time required to achieve load balancing in the system.
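Definition 10 requires Lt/tc < Lsafe, i.e., tc > Lt/Lsafe. Assuming integer-valued loads and whole time units (a simplification the definition does not state), the smallest admissible tc is \( \lfloor {L}_t/{L}_{safe}\rfloor +1 \), which the hypothetical helper below computes.

```python
def min_balancing_time(l_t: int, l_safe: int) -> int:
    """Smallest whole number of time units t_c such that L_t / t_c < L_safe.

    Assumes integer-valued loads (hypothetical simplification).
    """
    if l_safe <= 0:
        raise ValueError("L_safe must be positive")
    if l_t < 0:
        raise ValueError("L_t must be non-negative")
    # L_t / t_c < L_safe  <=>  t_c > L_t / L_safe, so the smallest integer
    # t_c is floor(L_t / L_safe) + 1 (integer division gives the floor).
    return l_t // l_safe + 1


print(min_balancing_time(180000, 120000))  # 2, since 180000 / 2 = 90000 < 120000
```

Note the strict inequality: a load of exactly 240000 with Lsafe = 120000 needs tc = 3, not 2.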

Definition 11

System reconstruction delay time \( \Delta {t}_d=\sum \limits_{i=1}^{j}{t}_i \), where each ti is defined as follows:

$$ {\displaystyle \begin{array}{l}{t}_1=\left\lceil \sum \limits_{i=1}^{k_1}{L}_i/{L}_{safe}\right\rceil, \quad 1\le {k}_1<p,\quad \mathrm{when}\ \sum \limits_{i=1}^{k_1}{L}_i\le {L}_{safe}\ \mathrm{and}\ \sum \limits_{i=1}^{k_1+1}{L}_i>{L}_{safe};\\ {t}_2=\left\lceil \sum \limits_{i={k}_1+1}^{k_2}{L}_i/{L}_{safe}\right\rceil, \quad {k}_1<{k}_2\le p,\quad \mathrm{when}\ \sum \limits_{i={k}_1+1}^{k_2}{L}_i\le {L}_{safe}\ \mathrm{and}\ \sum \limits_{i={k}_1+1}^{k_2+1}{L}_i>{L}_{safe};\\ \qquad \vdots \\ {t}_j=\left\lceil \sum \limits_{i={k}_{j-1}+1}^{{k}_j}{L}_i/{L}_{safe}\right\rceil, \quad {k}_{j-1}<{k}_j\le p,\quad \mathrm{when}\ \sum \limits_{i={k}_{j-1}+1}^{{k}_j}{L}_i\le {L}_{safe}\ \mathrm{and}\ \sum \limits_{i={k}_{j-1}+1}^{{k}_j+1}{L}_i>{L}_{safe}.\end{array}} $$

The user interaction behaviors are divided into p classes according to their time sequence characteristics, and L1, ..., Lp are the system loads of the p classes. Suppose that the system is in an unstable state, i.e., the instantaneous system load at time t is Lt > Lsafe. After reconstruction, the instantaneous system load satisfies Lt' ≤ Lsafe at any time t' within the period Δtd, and the total load processed during Δtd equals Lt.
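Definition 11 can be read as greedily packing consecutive groups into batches whose load stays within Lsafe, each batch contributing one ti. The minimal Python sketch below assumes the groups are released in index order 1, …, p and that no single group load exceeds Lsafe (a case Definition 11 does not cover); the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def reconstruction_delay(group_loads: Sequence[float],
                         l_safe: float) -> Tuple[int, List[List[int]]]:
    """Pack consecutive groups into batches with load <= L_safe.

    Each batch contributes t_i = ceil(batch_load / L_safe); the delay
    Delta t_d is the sum of the t_i. Also returns the 1-based group
    indices of each batch.
    """
    if l_safe <= 0:
        raise ValueError("L_safe must be positive")
    if any(load > l_safe for load in group_loads):
        raise ValueError("each group load must be <= L_safe")
    batches: List[List[int]] = []
    batch_loads: List[float] = []
    current: List[int] = []
    current_load = 0.0
    for idx, load in enumerate(group_loads, start=1):
        # Close the current batch when adding this group would exceed L_safe.
        if current and current_load + load > l_safe:
            batches.append(current)
            batch_loads.append(current_load)
            current, current_load = [], 0.0
        current.append(idx)
        current_load += load
    if current:
        batches.append(current)
        batch_loads.append(current_load)
    delay = sum(math.ceil(b / l_safe) for b in batch_loads)
    return delay, batches
```

With the hypothetical group loads 50000, 40000, 60000 and 30000 and Lsafe = 120000, groups 1 and 2 form the first batch and groups 3 and 4 the second, so Δtd = 2.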

Assumption 1

The large-scale network service system itself has a maximum system load Lmax.

Assumption 2

When the number of user behaviors at a certain time increases sharply and makes the system abnormal, i.e., Lt > Lsafe, users can be classified according to the time sequence of their interaction behaviors.

3.1 Theorem. Load balancing reconstruction

Under Assumptions 1 and 2, suppose that the system real-time load at time t1 is \( {L}_{t_1}>{L}_{safe} \). If the user behaviors are classified and the system behavior flows are reconstructed at the system interactions, then the instantaneous load satisfies \( {L}_{t_2}\le {L}_{safe} \) at any time t2 within the delay time Δtd.

Proof

By Definition 4, \( {L}_{t_1}>{L}_{safe} \) at time t1 is equivalent to \( {B}_{t_1}\times l>{L}_{safe} \). Since the users are divided into p classes, \( \sum \limits_{i=1}^p{B}_{s_i}\times l>{L}_{safe} \), where \( {B}_{s_i} \) denotes the number of behaviors submitted by the users of class i.

Let k (1 ≤ k < p) satisfy \( \sum \limits_{i=1}^k{B}_{s_i}\times l\le {L}_{safe} \) and \( \sum \limits_{i=1}^{k+1}{B}_{s_i}\times l>{L}_{safe} \).

If processing the load \( \sum \limits_{i=1}^k{B}_{s_i}\times l \) takes one unit of time, then the remaining load \( {L}_{t_1}-\sum \limits_{i=1}^k{B}_{s_i}\times l=\sum \limits_{j=k+1}^p{B}_{s_j}\times l \) takes \( \sum \limits_{j=k+1}^p\left\lceil {L}_j/{L}_{safe}\right\rceil \) units of time, where \( {L}_j={B}_{s_j}\times l \).

Hence, \( \Delta {t}_d=1+\sum \limits_{j=k+1}^p\left\lceil {L}_j/{L}_{safe}\right\rceil \).

By Definition 11, the instantaneous load satisfies \( {L}_{t_2}\le {L}_{safe} \) at any time t2 within the delay time Δtd. Therefore, the system load \( {L}_{t_1} \) is balanced within Δtd.
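The delay obtained in the proof can be evaluated directly. In this minimal sketch, `group_behaviors` holds hypothetical per-class behavior counts \( {B}_{s_j} \), `l` is the per-request load of Definition 4, and k is found as the longest prefix of classes whose load does not exceed Lsafe; inputs that violate the proof's conditions are rejected.

```python
import math
from typing import Sequence


def theorem_delay(group_behaviors: Sequence[int], l: float, l_safe: float) -> int:
    """Delta t_d = 1 + sum_{j=k+1}^{p} ceil(L_j / L_safe), with L_j = B_{s_j} * l."""
    loads = [b * l for b in group_behaviors]
    if sum(loads) <= l_safe:
        raise ValueError("the theorem assumes L_t1 > L_safe")
    # Largest k with sum_{i=1}^{k} L_i <= L_safe; k < p is guaranteed
    # because the total load exceeds L_safe.
    k, prefix = 0, 0.0
    while prefix + loads[k] <= l_safe:
        prefix += loads[k]
        k += 1
    if k == 0:
        raise ValueError("the proof assumes 1 <= k < p")
    return 1 + sum(math.ceil(load / l_safe) for load in loads[k:])
```

For counts 50000, 40000, 60000 and 30000 with l = 1 and Lsafe = 120000, k = 2 and Δtd = 1 + 1 + 1 = 3. This expression allots each remaining class its own time slot(s).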

4 Petri net model and algorithm for implementing system behavior reconstruction

The system behavior reconstruction model of Section 3 provides theoretical support for the adaptive reconstruction process for load balancing in large-scale network service systems. This section describes how the model is implemented in the actual system behavior reconstruction process based on user classification.

4.1 Random fuzzy Petri nets with time delay

In delay Petri nets [20], the firing of a transition takes a given number of time units to complete. Transitions can accordingly be divided into timed transitions and immediate transitions. Li [21] used stochastic Petri nets to model a social network system, and Milinkovic [22] proposed a fuzzy Petri net (FPN) model for estimating train delays. Building on this work, and in order to implement system behavior reconstruction, this paper presents a timed stochastic fuzzy Petri net.

Definition 12

A random fuzzy Petri net with delay (DSFPN) is a seven-tuple ∑ = (P, T; F, C, DI, τ, M), in which:

(1) P is a set of places, P = {p1, p2, …, pn} (n ≥ 0), and the number of tokens in a place represents the number of user actions. The number of users arriving at the system over a period of time follows a Poisson distribution;

(2) T is a set of transitions, T = Tt ∪ Ti, Tt ∩ Ti = ∅, where the timed transition set Tt = (T1, T2, …, Tk) contains the service behavior transitions, and the instantaneous transition set Ti = (Tk+1, Tk+2, …, Tk+i) (k ≥ 0, i ≥ 0) contains the service transitions that fire when the system load exceeds the warning point;

(3) C is the set of control service transitions, C = {c1, c2, …, cm} (m ≥ 0);

(4) DI is the time function on the transition set, DI: C → R0. For t ∈ C, DI(t) = a indicates that firing transition t takes a units of time to complete;

(5) F is a set of directed arcs, F = FT ∪ FC, where FT ⊆ (P × T) ∪ (T × P) and FC ⊆ (P × C) ∪ (C × P);

(6) τ is a function on the transition set that gives the firing threshold of each transition; its range is [0, ∞).

4.2 Four basic structures of DSFPN model based on user classification

A Petri-net-based system model is composed of four basic structures: sequence, parallel, selection, and loop [23]. The following therefore models these four basic structures of large-scale network service systems with the timed stochastic fuzzy Petri net. The number of user behavior groups depends on the membership function; for convenience of description, and without loss of generality, the user behaviors are divided into three groups, as shown in Fig. 1.

Fig. 1

Four basic structures of the timed stochastic fuzzy Petri Net

Fig. 1 shows three kinds of transitions. The first is the system behavior transition, represented by white rectangles. The second is the immediate judgment transition, represented by a black bar, which fires when the system satisfies the judgment condition; if an original behavior transition and a judgment transition are in conflict, the immediate transition fires first. The third is the control transition, represented by a shaded rectangle, which controls the delay of each behavior group according to the system load.
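The priority rule for conflicts between behavior and immediate transitions can be sketched in Python. This toy model covers only enabling and priority: timing, fuzzy membership and stochastic arrivals are omitted, and interpreting τ as a bound that the tokens in a transition's input places must exceed is an assumption. The names t2, t3 and p2 follow Fig. 1; the token counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Lower value = higher firing priority (assumed ordering).
PRIORITY = {"immediate": 0, "control": 1, "timed": 2}


@dataclass
class Transition:
    name: str
    kind: str                # "timed", "immediate" or "control"
    inputs: Dict[str, int]   # input place -> tokens consumed
    outputs: Dict[str, int]  # output place -> tokens produced
    tau: float = 0.0         # firing threshold (assumed: input tokens must exceed it)


def is_enabled(t: Transition, marking: Dict[str, int]) -> bool:
    has_tokens = all(marking.get(p, 0) >= n for p, n in t.inputs.items())
    input_tokens = sum(marking.get(p, 0) for p in t.inputs)
    return has_tokens and input_tokens > t.tau


def select_transition(transitions: List[Transition],
                      marking: Dict[str, int]) -> Optional[Transition]:
    """Pick an enabled transition of the highest-priority kind, so an enabled
    immediate transition wins a conflict with a timed behavior transition."""
    enabled = [t for t in transitions if is_enabled(t, marking)]
    if not enabled:
        return None
    return min(enabled, key=lambda t: PRIORITY[t.kind])


def fire(t: Transition, marking: Dict[str, int]) -> Dict[str, int]:
    """Return the marking after firing t (assumes t is enabled)."""
    new = dict(marking)
    for p, n in t.inputs.items():
        new[p] -= n
    for p, n in t.outputs.items():
        new[p] = new.get(p, 0) + n
    return new
```

With τ = 5 for t3, three tokens in p2 enable only the behavior transition t2, whereas eight tokens enable both and the immediate transition t3 is selected.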

We take the sequential structure as an example; the other three cases are analogous. The behavior of the large-scale network service system is modeled with a Petri net. If a key interactive behavior node lies in a sequential structure and the system load exceeds the warning point before the node is executed, the structure is reconstructed into a DSFPN model that classifies the behaviors.

In Fig. 1, places p1 to p10 represent states. When the user behavior load submitted to the system exceeds the warning point of the system load, the behavior transition t2 and the immediate transition t3 are in conflict. Because the immediate transition t3 has higher priority than the behavior transition t2, t3 fires. According to user interaction speed, user behaviors are divided into three groups: slow, medium, and fast. Immediate transitions t4, t6, and t8 respectively decide the group to which a behavior belongs, and immediate transitions t5, t7, and t9 respectively compare the load of the corresponding behavior group with the warning point. Control transitions c1, c2, and c3 control the delay of the three behavior groups.

4.3 DSFPN algorithm

In the Petri net model above, the reconstruction flow is activated when the number of tokens in the behavior places exceeds a threshold, i.e., when the system load exceeds the safe load. Following Definitions 1 to 11, the DSFPN obtains the system load, the load of each classified group, and the required delay. The corresponding system behavior reconstruction algorithm is described as follows.

[Algorithm: system behavior reconstruction based on DSFPN]
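The following Python sketch is not the authors' algorithm but a hypothetical illustration of how Definitions 1 to 11 could be chained: compute Lt, do nothing in the good service state, otherwise classify users by minimal membership value (Definitions 1 and 2), compute group loads (Definition 8), and assign consecutive groups to release slots so that no slot exceeds Lsafe (Definition 11). It assumes scalar group standards, one behavior per user, and a fixed random seed for tie-breaking.

```python
import math
import random
from typing import Dict, Sequence


def reconstruct(sequences: Sequence[Sequence[float]],
                standards: Sequence[float],
                l: float, l_safe: float, seed: int = 0) -> Dict[int, int]:
    """Map each user group j (1-based) to a 0-based release slot.
    An empty result means the system is in the good service state."""
    rng = random.Random(seed)
    load = len(sequences) * l  # Definitions 3-4: L_t = B_t * l
    if load <= l_safe:         # Definition 5: no reconstruction needed
        return {}
    # Definitions 1-2: assign each user to a group with minimal membership value.
    group_sizes = [0] * len(standards)
    for u in sequences:
        mu = [math.sqrt(sum((x - s) ** 2 for x in u)) for s in standards]
        best = min(mu)
        ties = [j for j, v in enumerate(mu) if v == best]
        group_sizes[rng.choice(ties)] += 1
    # Definition 8: per-group loads L_j.
    group_loads = [n * l for n in group_sizes]
    # Definition 11: release consecutive groups together while they fit in L_safe.
    slots: Dict[int, int] = {}
    slot, batch = 0, 0.0
    for j, lj in enumerate(group_loads, start=1):
        if lj > l_safe:
            raise ValueError(f"group {j} alone exceeds L_safe")
        if batch + lj > l_safe:
            slot, batch = slot + 1, 0.0
        slots[j] = slot
        batch += lj
    return slots
```

Each slot stands for one time unit, so under these assumptions the number of distinct slots equals the delay Δtd.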

5 Examples and experiments

5.1 The DSFPN model of a booking system

Online shopping systems and ticket booking systems, such as Taobao and the 12306 ticket system, are typical large-scale network service systems. Online booking systems usually experience a rapid expansion of user groups during seasonal periods such as holidays (Fig. 2). Such surges can overwhelm the system and may even paralyze it, so the service process must be modified to accommodate changes in the system load. The process flow of a ticket booking system is simulated (Fig. 3).

Fig. 2

Train ticket booking website flow chart

Fig. 3

A booking system flow chart

A Petri net model is constructed from the process flow of the ticket booking system, as shown in Fig. 4, and the behavior transitions in Fig. 4 are described in Table 1. The sequential user interaction behaviors in this system are login, query, booking, and payment. These interactive behaviors are reconstructed according to the DSFPN model defined above, as shown in Fig. 5, and the transitions t15 to t42 and c1 to c12 in Fig. 5 are described in Table 2.

Fig. 4

Petri net model of a booking system

Table 1 Transition description of t1~t14 in Fig. 4
Fig. 5

Random fuzzy Petri net model of the booking system with the delay

Table 2 Transition description of t15~t42, c1~c12 in Fig. 5

5.2 Experimental design

The experiment simulates the system process based on the booking system flow chart in Fig. 3. It monitors the number of user actions and records the time of each user interaction behavior. The tokens in the places represent the number of users. A data generator continuously increases the number of user actions, thereby increasing the system load. The simulation data are generated according to the flow chart of the 12306 train booking system in Fig. 2, so that the experiment replicates actual traffic characteristics: gradually changing user behavior, slightly changing user behavior, and suddenly changing user behavior.

Based on these traffic characteristics, user behavior is simulated under three load ranges corresponding to the service states: [0, 120000) as the good service state, [120000, 150000) as the unstable service state, and [150000, +∞) as the unavailable service state; user behaviors are assumed to fall into three categories. The simulated data are then fed into the reconstruction algorithm. The simulation system is implemented in C++, and the TeeChart charting tool is used to plot the experimental results.

5.3 Experimental results analysis

The first set of simulated data is fed to the load balancing algorithm of the system behavior reconstruction process, and the results are shown in Fig. 6, which plots the real-time load over time. The real-time load exceeds the warning point only slightly, i.e., the magnitude of the change is small. When the real-time load exceeds the warning point, the reconstruction model is triggered and executed: the user behaviors are divided into three categories and the system load is balanced. After balancing, the load never exceeds the warning point, and the system remains in the good service state.

Fig. 6

The first group of experimental results

The results of the second experiment are shown in Fig. 7. The real-time load exceeds the warning point significantly but stays below the maximum load that the system can withstand; that is, the magnitude of the change is significant. When the real-time load exceeds the warning point, the reconstruction model is triggered and the system load is balanced.

Fig. 7

The second group of experimental results

The results of the third experiment are shown in Fig. 8. The real-time load exceeds the maximum load; that is, the magnitude of the change is large. When the real-time load exceeds the warning point, the reconstruction model is executed and the system remains in the good service state at all times.

Fig. 8

The third group of experimental results

The three groups of experimental results show that when the real-time load exceeds the warning point, the system enters the unstable service state and the reconstruction model is triggered. User behaviors are then classified by the time features of their behavior sequences, and the system load is balanced by controlling the interaction time of each user group, so the system load never exceeds the warning point. These results indicate that the system behavior reconstruction model based on the time features of user behavior sequences can effectively balance the system load and keep the system in the good service state at all times.
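The observation that the load never exceeds the warning point can be illustrated with a hypothetical trace-level Python sketch; it is not the C++ simulator used in the experiments. At each step, whole user groups are admitted in arrival order while the step's load stays within Lsafe, and the remaining groups are delayed to later steps; first-in-first-out delaying is an assumption.

```python
from collections import deque
from typing import Deque, List, Sequence


def simulate(trace: Sequence[Sequence[float]], l_safe: float) -> List[float]:
    """trace[t] lists the loads of the user groups arriving at step t.
    Returns the load admitted at each step; groups that would push a step
    above L_safe are delayed, first in first out, to later steps."""
    pending: Deque[float] = deque()
    admitted: List[float] = []
    t = 0
    while t < len(trace) or pending:
        if t < len(trace):
            pending.extend(trace[t])
        step_load = 0.0
        while pending and step_load + pending[0] <= l_safe:
            step_load += pending.popleft()
        if pending and pending[0] > l_safe:
            # A single group larger than L_safe could never be admitted.
            raise ValueError("a single group load exceeds L_safe")
        admitted.append(step_load)
        t += 1
    return admitted
```

With Lsafe = 120000 and the hypothetical trace [[50000, 40000], [60000, 30000, 70000], [], []], the admitted loads are 90000, 90000, 70000 and 0: the 70000 group arriving at step 1 is delayed by one step, and no step exceeds Lsafe.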

6 Conclusions

This paper proposes a system behavior reconstruction model based on the time sequence characteristics of user interactions, with the aim of resolving the system overload caused by the rapid growth of user behaviors in large-scale network service systems by delaying user behavior time. Furthermore, a reconstruction algorithm has been developed based on a random fuzzy Petri net with delay.

User behaviors are classified using a membership function and a membership criterion, which provide the basis for the system behavior reconstruction model. In service systems, the behavior flow of each user group is constructed by the reconstruction model and implemented with an algorithm based on the random fuzzy Petri net with delay. In the experiments, the proposed model balances the system load and keeps the system in a good running state at all times. In future work, we plan to study the potential of adaptive refactoring systems for balancing the system load more effectively.