# Adaptive ARMA Based Prediction of CPU Consumption of Servers into Datacenters

## Abstract

Optimizing the energy consumed by data centers is a major concern. Several techniques have been proposed over the years without fully resolving the issue. Among them, predictive approaches are starting to emerge. They consist in predicting the resource requirements of the Datacenter’s servers in advance, in order to reserve the right quantities at the right time and thus avoid both the waste caused by over-provisioning and the performance problems caused by under-provisioning. In this article, we explore the performance of ARMA models for this type of prediction. It appears that, with a good selection of parameters, ARMA models produce reliable predictions that are also about 30% more accurate than those obtained with the naive method. These results could feed virtual machine management algorithms in Cloud Datacenters, particularly for placement or migration decisions aimed at rationalizing provisioned resources.

## Keywords

Datacenter, ARMA, Prediction, Energy consumption

## 1 Introduction

The march of the world toward the all-digital does not stop. All vital sectors, from the economy to sport and health, are being transformed day after day on the common base of computer technologies. With the emergence of the social Internet and mobile terminals, big data and the Internet of Things, there is almost no limit to people’s quest to improve their everyday lives with ever more sophisticated tools. To keep pace with this non-stop evolution, virtualization technologies and cloud computing have been improved to meet the need for speed and for handling large amounts of data. In contrast to traditional enterprise computing, in which many servers and storage arrays are deployed for each specific need of the business, cloud computing introduces novel ways of hosting information systems based on virtualization. Because the computing of many companies, from the least sensitive to the most demanding, is now concentrated in single datacenters, Cloud operators inherit the obligation to satisfy users’ quality of service requirements while minimizing the amount of resources deployed. Virtualization techniques, which are in perpetual evolution, bring part of the solution to this equation. In the face of increasing demand for cloud services, operators generally undertake to virtualize all or part of the deployed infrastructure, from switches to servers, including storage devices and firewalls. Hypervisors, Network Function Virtualization (NV/NFV) and Software Defined Network (SDN) have all significantly improved the design of hosting systems and enabled the success of cloud computing services. However, the legitimate requirement of users to operate their online services without the risk of disconnection leads operators to oversize the infrastructure and thus minimize the congestion of servers and storage spaces. Unfortunately, this engineering constraint is very expensive in terms of the energy spent powering unused or underutilized infrastructure.
Experts are increasingly concerned about an explosion in the energy demand of the cloud computing industry, which could surpass that of aeronautics by 2020 if suitable solutions are not provided [6]. New techniques have therefore been devised to optimize the quantities of resources deployed to satisfy users’ needs. Some methods involve monitoring underutilized resources and turning them off manually. Others automate this task by setting high and low resource occupancy thresholds at which an action is triggered to readjust resource assignments. It is also possible to implement more dynamic methods, which observe trends in the traffic arriving at servers and define corresponding mathematical models in order to determine the necessary reassignment plans in a timely manner. This last category of strategy, called the predictive approach, has already produced quite satisfactory results. The aim is to find, based on an analysis of trends in the demand on the servers’ hardware resources, the virtual machine distribution plans that best prevent both under-consumption, which generates energy losses, and over-consumption, which may lead to service unavailability and SLA violations. Several methods compete for this purpose. The choice among them depends strongly on the type of problem, the objectives pursued and the types of data available. The simplest is the naive method, which carries the previous value forward as the prediction for the immediate horizon. Although it produces acceptable results, we choose here to implement a more robust method based on ARMA models. It is then necessary to demonstrate the suitability of this method for our case study and to formalize its implementation.
This will enable us to develop a reliable technique for predicting future CPU consumption on virtual servers within Datacenters, to anticipate decisions to adjust their placement, and thereby to rationalize the consumption of the supplied electrical power. We will also demonstrate the reliability of the results of our model in comparison with those obtained with the naive method. The rest of this article is organized as follows: Sect. 2 presents work related to ours, Sect. 3 discusses the theoretical framework of ARMA models, Sect. 4 presents the implementation approach of our method, Sect. 5 reports simulations and results, and Sect. 6 concludes this work.

## 2 Related Works

Predictive models are used in many settings. In the field of cloud computing, they are increasingly sought after, in particular for predicting resource requests to servers in order to improve planning. With the goal of maximizing the use of server resources, [10] showed that it is possible to predict traffic using Holt-Winters and Box-Jenkins autoregressive models and to develop a system that assesses the number of servers to keep in operation during the night. A resource planning solution based on prediction was also developed by [5]: based on the prediction of resource demand, they build a dynamic algorithm for server consolidation and resource allocation that reduces the number of physical servers in operation. [2] also use ARIMA and Holt models to predict traffic and energy consumption in order to prevent peaks and improve supply.

To overcome the shortcomings of autoscaling, [1, 8] tested the impact of ARMA-type models on resource planning and established that, by this means, resource allocation and deallocation plans respect the QoS constraints. All these papers, while pursuing the same goal as ours, did not handle real Datacenter datasets but generally generated synthetic workloads to validate their assumptions. Our approach is to test the reliability of a popular statistical tool, ARMA, on Google cluster data. The interest is to perform these experiments on data coming from a real data center, so that the results can be generalized. We process approximately 29 days of data on five of the most requested servers to capture as many resource consumption trends as possible, with a view to our future work.

## 3 Theoretical Framework

### 3.1 AR Process

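In an autoregressive process, the value at time *t* is written as a linear combination of past observations plus some white noise [9]. The time series \(\lbrace X_{t}: t\ge 0 \rbrace \) is an autoregressive process *AR*(*p*) of order *p* with \((p> 0)\) if it can be written in the following form:

$$\begin{aligned} x_{t} = \sum _{k=1}^{p}{a_k}x_{t-k}+ \epsilon _{t}, \end{aligned}$$

where \(a_1, \dots , a_p\) are the coefficients of the *AR*(*p*) process and \(\lbrace \epsilon _t\rbrace \) is a white noise sequence. Using the forward shift operator *Z*, where \(x_{t+1}=Z x_{t}\), it can be written as

$$A(Z)x_t = \epsilon _t, \qquad A(Z) = 1-\sum _{k=1}^{p}{a_k}Z^{-k}.$$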
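
The autoregressive recursion can be illustrated with a minimal NumPy sketch. The function names `simulate_ar` and `ar_one_step`, the coefficient values and the zero initial conditions are assumptions made for illustration only; they do not come from the experiments reported here.

```python
import numpy as np

def simulate_ar(a, n, sigma=1.0, seed=0):
    """Simulate x_t = sum_k a[k-1] * x_{t-k} + eps_t with Gaussian white noise."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    p = len(a)
    eps = rng.normal(0.0, sigma, size=n)
    x = np.zeros(n)
    for t in range(n):
        # Only the past values that exist are used (zero initial conditions).
        past = x[max(0, t - p):t][::-1]          # x_{t-1}, x_{t-2}, ...
        x[t] = np.dot(a[:len(past)], past) + eps[t]
    return x, eps

def ar_one_step(a, history):
    """One-step-ahead AR prediction from the last len(a) observations."""
    a = np.asarray(a, dtype=float)
    recent = np.asarray(history, dtype=float)[-len(a):][::-1]
    return float(np.dot(a, recent))

a = np.array([0.6, -0.2])        # illustrative AR(2) coefficients (assumed values)
x, eps = simulate_ar(a, n=200)
pred = ar_one_step(a, x[:-1])    # prediction of x[-1] from the two previous values
```

When the true coefficients are known, the one-step prediction error `x[-1] - pred` is exactly the white noise term `eps[-1]`, which is the part of the signal that no linear predictor can anticipate.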

### 3.2 MA Process

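The time series \(\lbrace X_{t}: t\ge 0 \rbrace \) is a moving average process *MA*(*q*) of order *q* with \((q> 0)\) if it can be written in the following form:

$$\begin{aligned} x_{t} = \epsilon _{t} + \sum _{j=1}^{q}{b_j}\epsilon _{t-j}, \end{aligned}$$

where \(b_1, \dots , b_q\) are the coefficients of the *MA*(*q*) process and \(\lbrace \epsilon _t\rbrace \) is a white noise sequence.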
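
As a minimal sketch of the moving average form, the series can be generated by convolving a noise sequence with the kernel \([1, b_1, \dots , b_q]\). The function name `simulate_ma` and the coefficient values are assumptions chosen for illustration, and the noise is taken to be zero before the first sample.

```python
import numpy as np

def simulate_ma(b, eps):
    """Return x_t = eps_t + sum_j b[j-1] * eps_{t-j}, taking eps_t = 0 for t < 0."""
    kernel = np.concatenate(([1.0], np.asarray(b, dtype=float)))
    # Full convolution gives sum_j kernel[j] * eps[t-j]; keep the first len(eps) terms.
    return np.convolve(eps, kernel)[:len(eps)]

rng = np.random.default_rng(1)
eps = rng.normal(size=500)
b = [0.5, 0.25]                  # illustrative MA(2) coefficients (assumed values)
x = simulate_ma(b, eps)
```

Each value of `x` depends only on the current noise sample and the *q* previous ones, so the memory of an *MA*(*q*) process is limited to *q* steps.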

### 3.3 ARMA Models

- Definition: The combination of the autoregressive and the moving average processes is called *ARMA*(*p*, *q*), where *p* and *q* are the orders of the *AR* and *MA* components respectively. In terms of prediction, *ARMA* models are frequently used, in particular because of their adaptability to a wide range of data types [3, 4]. A process is called *ARMA*(*p*, *q*) if there exist real sequences \(\lbrace a_k \rbrace \) and \(\lbrace b_j \rbrace \) such that

  $$\begin{aligned} x_{t} = \sum _{k=1}^{p}{a_k}x_{t-k}+ \epsilon _{t} + \sum _{j=1}^{q}{b_j}\epsilon _{t-j}, \end{aligned} \tag{5}$$

  where \(\lbrace \epsilon _t\rbrace \sim N(0, \sigma _\epsilon ^2)\) is a white sequence. We can also use the polynomials \(A(Z) = 1-\sum _{k=1}^{p}{a_k}Z^{-k}\) and \(B(Z) = 1+\sum _{j=1}^{q}{b_j}Z^{-j}\) to rewrite this model in the form

  $$A(Z)x_t = B(Z)\epsilon _t.$$

- Model order: Determining the best values of *p* and *q* is called model order identification. Several error criteria are used for model identification; each aims at selecting the model that minimizes the criterion. The most frequently used are Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These criteria are based on the log-likelihood function and penalize more complex models having a large number of parameters. More precisely, let \(\log {L}\) denote the value of the maximized log-likelihood objective function for a model with \(k = p+q\) parameters fit to *n* data points; we have:

  $$AIC= -2\log {L}+2(p+q)$$

  $$BIC = -2\log {L}+\log {n}\,(p+q)$$

  AIC is used when the observation size is small relative to the model dimension, usually \(n/(p+q) < 40\). For the BIC criterion, the penalty is also a function of the sample size. The models providing the smallest values of the selected error criterion are chosen. These indicators will be used to analyze the optimum ratio (\(p+q,w\)), with *w* the size of the sliding horizon of past samples, for this type of application by using typical records.

- Parameters identification of an ARMA model: Let us assume the following quadratic cost over a horizon of *t* past samples:

  $$\begin{aligned} V_t=\frac{1}{2}\sum _{i=1}^t \epsilon ^2_i \end{aligned} \tag{6}$$

  where \(\epsilon _t\) is the prediction error given by

  $$\begin{aligned} \epsilon _t=x_t-p_t \end{aligned} \tag{7}$$

  and \(p_t\) is the prediction obtained using the *ARMA*(*p*, *q*) model with the parameter vector \(\theta _t=[ a_{1,t}, \dots , a_{p,t},b_{1,t}, \dots , b_{q,t}]^T\) that minimizes \(V_t\), where *T* denotes the transpose. The parameter vector \(\theta _t\) is unknown and is estimated using the well-known Recursive Prediction Error Method (RPEM). To this end, the Gauss-Newton recursive algorithm is applied to the cost function. The algorithm and its properties are given by the following theorem.

  **Theorem**: Consider the cost function \(V_t\) to be minimized, with respect to the parameter vector \(\theta _t\), by the following Gauss-Newton recursion:

  $$\begin{aligned} \epsilon _t = x_t-p_t \end{aligned} \tag{8}$$

  $$\begin{aligned} M_t = M_{t-1}-\frac{M_{t-1}\epsilon ^{'}_t \epsilon ^{'T}_{t} M_{t-1}}{1+\epsilon ^{'T}_{t}M_{t-1}\epsilon ^{'}_t } \end{aligned} \tag{9}$$

  $$\begin{aligned} \theta _t = \theta _{t-1}-M_{t}\epsilon ^{'}_t \epsilon _{t} \end{aligned} \tag{10}$$

  where *t* is the iteration step, \(M_{t}\) is a square matrix of dimension (\(p+ q\)), and \(\epsilon ^{'}_t\) is a column vector, the derivative of \(\epsilon _t\) with respect to the parameters in \(\theta _t\). Then the following holds: \(\theta _t\) converges as \(t\rightarrow \infty \) with probability 1 to one element of the set of minimizers

  $$\begin{aligned} \left\{ \theta |\sigma ^{'2}=0\right\} \end{aligned} \tag{11}$$

  where \(\sigma ^{'2}\) is the derivative of the prediction error variance with respect to \(\theta \).

  Proof: See L. Ljung, T. Söderström, Theory and Practice of Recursive Identification, MIT Press (1983).

  The initial values are as follows: \(t = 1\), \(M_0\) is the identity matrix and \(\epsilon ^{'}_t\) is a vector of zeros. \(\theta _0\) is obtained by a least squares estimation from the data. This recursive algorithm can be repeated several times over the observation window, the parameters obtained in the previous stage being used as initial values of the new stage. At convergence, the optimal parameter vector is obtained at each step *t*.

- Sliding window strategy: To improve the accuracy of the predictions, we propose to use an adaptive *ARMA*(*p*, *q*) model: at each time *t*, the parameters of the *ARMA*(*p*, *q*) model are computed on a sliding window of size *w*. More precisely, at time \(t-1\), the *ARMA*(*p*, *q*) model predicts the value of the time series at time *t* using the *w* last observations in the window \([t-w, t-1]\) to compute the parameter vector \(\theta \) of the *ARMA*(*p*, *q*) model. This principle is illustrated in Fig. 1. Then, the sliding window moves one step ahead, starting at time \(t-w+1\) and ending at time *t*, the parameters are computed using the observations in this window, and the prediction for time \(t + 1\) is given. Notice that the model order (*p*, *q*) is fixed, whereas the parameter vector \(\theta \) is recomputed at each time *t* using an iterative parameter estimation algorithm. That is why the *ARMA*(*p*, *q*) model is said to be adaptive.
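
The recursive estimation and the sliding window strategy can be sketched together as follows. This is a hedged illustration under stated assumptions, not the implementation used for the experiments: the names `rpem_arma` and `adaptive_arma_forecast` are hypothetical, \(\theta _0\) defaults to zeros rather than a least squares estimate, the gradient of the prediction is stored as \(\psi _t = -\epsilon ^{'}_t\) (so the update adds \(M_t\psi _t\epsilon _t\)), and no projection onto the stability region of \(B(Z)\) is performed.

```python
import numpy as np

def rpem_arma(x, p, q, theta0=None, passes=1):
    """Recursive Gauss-Newton fit of an ARMA(p, q) model.

    Returns (theta, eps) with theta = [a_1..a_p, b_1..b_q] and eps the
    one-step prediction errors computed along the recursion.
    """
    x = np.asarray(x, dtype=float)
    n, start = p + q, max(p, q)
    theta = np.zeros(n) if theta0 is None else np.array(theta0, dtype=float)
    for _ in range(passes):                # later passes restart from the last theta
        M = np.eye(n)                      # M_0 is the identity matrix
        eps = np.zeros(len(x))
        psi = np.zeros((len(x), n))        # psi_t = -(d eps_t / d theta)
        for t in range(start, len(x)):
            # Regressor: x_{t-1..t-p} followed by eps_{t-1..t-q}.
            phi = np.concatenate((x[t - p:t][::-1], eps[t - q:t][::-1]))
            eps[t] = x[t] - phi @ theta    # prediction error, Eq. (8)
            # Gradient of the prediction, filtered through 1/B(Z).
            psi[t] = phi - theta[p:] @ psi[t - q:t][::-1]
            Mpsi = M @ psi[t]
            M = M - np.outer(Mpsi, Mpsi) / (1.0 + psi[t] @ Mpsi)   # Eq. (9)
            theta = theta + M @ psi[t] * eps[t]                    # Gauss-Newton step
    return theta, eps

def adaptive_arma_forecast(x, p, q, w):
    """Predict x[t] for t = w..len(x)-1 using only the window x[t-w:t]."""
    x = np.asarray(x, dtype=float)
    preds = np.full(len(x), np.nan)        # no prediction for the first w samples
    for t in range(w, len(x)):
        window = x[t - w:t]
        theta, eps = rpem_arma(window, p, q)
        phi = np.concatenate((window[w - p:][::-1], eps[w - q:][::-1]))
        preds[t] = phi @ theta
    return preds

# Synthetic demonstration series (made-up data, not the Google traces).
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.1, size=200)
series = np.zeros(200)
for t in range(1, 200):
    series[t] = 0.7 * series[t - 1] + noise[t] + 0.3 * noise[t - 1]
demo_preds = adaptive_arma_forecast(series, p=1, q=1, w=50)
```

With \(q = 0\) the recursion reduces to recursive least squares, which gives a simple way to check the sketch against a batch solution. The window size must exceed \(\max (p, q)\); the order and window reported in Sect. 5.2 would correspond to `p=14, q=14, w=100`.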

## 4 Problem Statement

On these data, sampled per hour of use, we determine the corresponding ARMA model and then compute a prediction of consumption over the next hour. This result is then compared with that obtained on the same data by applying the “naive” method.
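
The naive baseline can be sketched in a few lines. The error metric is not specified in the text, so the root-mean-square error used below is an assumption, as are the function names and the made-up CPU values.

```python
import numpy as np

def naive_forecast(x):
    """Persistence forecast: the prediction for time t is the value observed at t-1."""
    x = np.asarray(x, dtype=float)
    preds = np.full(len(x), np.nan)        # no prediction for the first sample
    preds[1:] = x[:-1]
    return preds

def rmse(actual, predicted):
    """Root-mean-square error over the positions where a prediction exists."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = ~np.isnan(predicted)
    return float(np.sqrt(np.mean((actual[mask] - predicted[mask]) ** 2)))

cpu = np.array([0.30, 0.35, 0.33, 0.40, 0.38])   # made-up CPU fractions
naive_error = rmse(cpu, naive_forecast(cpu))
```

The same `rmse` function can be applied to the predictions of any other model on the same series, which puts both methods on an equal footing.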

## 5 Simulations and Results

### 5.1 Dataset Description

The data used in this work come from traces published by Google [11]. These data refer to traffic collected over 29 days in a Datacenter containing approximately 12,000 servers with various features. These traces are among the rare ones that provide traffic data from a real data center of significant size, which makes it possible to carry out relevant analyses and draw credible conclusions. The data can be divided into 3 categories, namely: (i) information and events on servers, including the creation, modification and deletion of virtual servers in the data center; (ii) data on requests and tasks performed by servers; and (iii) information on the resource consumption of all servers throughout the data collection period. It is on this last category of data that we focus our analysis. The folder task_usage contains measurements of the amounts of CPU (processor), RAM, disk space, etc. consumed by the various tasks performed on each server in the data center. Because of the number of servers involved, the duration of collection (every 5 min for 29 days) and the volume of traffic processed in the Datacenter, the folder task_usage is very large (160 gigabytes) and therefore requires considerable computing power for processing and analysis.

We are particularly interested in the processor quantities consumed on the most heavily used Datacenter servers, in order to provide a relevant and representative picture of the load on servers. We therefore perform some extraction and filtering on the data collection to obtain a subset corresponding to our needs. Some data have been deliberately elided or transformed by Google to ensure privacy. While these changes do not hinder our work, they mean that the traces do not provide the true CPU values consumed on servers. Nevertheless, they are sufficient to obtain resource consumption proportions that can be related to real processor values and used to draw conclusions.

### 5.2 Parameters Identification of Our Energy Consumption ARMA Model

Since our data file includes a large number of processor consumption values (more than 8000 per server), we choose to test the first 15 symmetric parameter pairs (\(p = q\)). We obtain the optimal values of both criteria (AIC and BIC) for the pair \(p = q = 14\). With these values, we apply our model to the dataset in a sliding window, obtaining each prediction from the \(w=100\) previous observations through a recursive algorithm. The data and the predictions obtained by applying the proposed adaptive *ARMA* prediction method to the five servers are shown in Fig. 2.
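
The order search over symmetric pairs can be sketched as below. The Gaussian form of the maximized log-likelihood, \(\log L = -\frac{n}{2}(\log (2\pi \hat{\sigma }^2)+1)\), is an assumption made for illustration, and `select_symmetric_order` is a hypothetical helper that expects the residuals of each candidate *ARMA*(*k*, *k*) fit to be supplied by the caller.

```python
import numpy as np

def gaussian_loglik(residuals):
    """Maximized Gaussian log-likelihood computed from the residual variance."""
    r = np.asarray(residuals, dtype=float)
    sigma2 = np.mean(r ** 2)
    return -0.5 * len(r) * (np.log(2.0 * np.pi * sigma2) + 1.0)

def aic(loglik, p, q):
    return -2.0 * loglik + 2.0 * (p + q)

def bic(loglik, p, q, n):
    return -2.0 * loglik + np.log(n) * (p + q)

def select_symmetric_order(residuals_by_order):
    """Pick the k (with p = q = k) that minimizes AIC and the k that minimizes BIC.

    residuals_by_order maps k to the residuals of a fitted ARMA(k, k) model.
    """
    scores = {}
    for k, r in residuals_by_order.items():
        ll = gaussian_loglik(r)
        scores[k] = (aic(ll, k, k), bic(ll, k, k, len(r)))
    best_aic = min(scores, key=lambda k: scores[k][0])
    best_bic = min(scores, key=lambda k: scores[k][1])
    return best_aic, best_bic, scores
```

Both criteria reward a smaller residual variance but charge for each extra parameter, so a higher order is retained only when it reduces the residual variance enough to offset the penalty.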

### 5.3 Accuracy of Resources Consumption Prediction

The prediction errors lie in the interval \([-0.2, 0.2]\) and are centered on 0, as shown in the histograms of the residuals \(\epsilon _t\) in Fig. 3. Thus, the estimates are unbiased and approximately Gaussian distributed. The residuals are also uncorrelated, as shown in Fig. 4, which means that no linear structure remains to be exploited and that the predictions are optimal in this sense.
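
A residual whiteness check of this kind can be sketched with the sample autocorrelation function. The function names and the \(\pm 1.96/\sqrt{n}\) band, a common approximate 95% bound for white noise, are assumptions made for illustration and are not taken from the analysis behind Fig. 4.

```python
import numpy as np

def sample_acf(residuals, max_lag):
    """Sample autocorrelation of the residuals for lags 0..max_lag."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    denom = np.dot(r, r)
    return np.array([np.dot(r[:len(r) - k], r[k:]) / denom
                     for k in range(max_lag + 1)])

def fraction_outside_band(residuals, max_lag):
    """Share of lags 1..max_lag whose autocorrelation leaves the white-noise band."""
    acf = sample_acf(residuals, max_lag)[1:]
    band = 1.96 / np.sqrt(len(residuals))
    return float(np.mean(np.abs(acf) > band))
```

For uncorrelated residuals, only a small share of lags is expected to fall outside the band; a large share suggests that the model order or the window size leaves structure unexplained.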

Prediction errors

| Server | Naive | ARMA |
|---|---|---|
| Server1 | 0.64 | 0.40 |
| Server2 | 0.59 | 0.37 |
| Server3 | 0.58 | 0.36 |
| Server4 | 0.69 | 0.43 |
| Server5 | 0.50 | 0.31 |
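
One way to read the table is as a relative reduction of the prediction error, \((e_{naive}-e_{ARMA})/e_{naive}\), per server. This reading is an assumption: the text does not state how the improvement figure was derived. The short sketch below applies it to the tabulated values.

```python
# Prediction errors copied from the table above: (naive, ARMA).
errors = {
    "Server1": (0.64, 0.40),
    "Server2": (0.59, 0.37),
    "Server3": (0.58, 0.36),
    "Server4": (0.69, 0.43),
    "Server5": (0.50, 0.31),
}

# Relative error reduction of ARMA with respect to the naive method.
reduction = {server: (naive - arma) / naive
             for server, (naive, arma) in errors.items()}

for server, r in reduction.items():
    print(f"{server}: {100 * r:.1f}% lower error than the naive method")
```

Under this reading, the reduction lies between 37% and 38% for each of the five servers.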

## 6 Conclusion

The search for solutions to optimize the use of resources within Datacenters deserves further study, given that future energy needs will not decrease and, above all, that energy is not an infinitely extensible resource. Several techniques compete to achieve this goal. However, it remains a challenge to establish models that can be applied to different scenarios and adapt to the real-time constraints and huge amounts of data that cloud computing imposes. In this article, we explored the possibility of adopting a predictive approach to track the resource requirements of a Datacenter over time, in order to develop on this basis the best supply plans for these resources. Our work consisted in developing an ARMA model adapted to the type of data processed within Datacenters, namely processor resource consumption, and in producing real-time predictions of future needs. To do this, we relied on a data collection from a Google Datacenter from which we extracted, for 5 servers, the averages of processor consumption over 5 min intervals across a total of 29 days. We then used sliding windows of 100 observations, each of which allows us to predict the processor resource needs over a horizon of one hour. Our results show that ARMA models, when the parameters are well determined, constitute a reliable means of predicting this kind of data, since the residuals are contained in the interval [−0.2, 0.2]. This indicates that most predictions are very close to the real consumption values. The other contribution of this work was to compare our predictions based on the ARMA model with those made using the naive method. The results confirm the performance of our ARMA model, with a predictive accuracy about 30% higher than that of the naive method. Overall, predictive methods could significantly improve the efficient use of physical resources within Datacenters and thus promote the optimization of energy resources.
Achieving these results with ARMA models on data from a real Datacenter is a good step towards this goal, since the approach could be integrated into virtual machine placement or migration algorithms to optimize the electric power supplied.

## References

- 1. Davis, I., Hemmati, H., Holt, R.C., Godfrey, M.W., Neuse, D., Mankovskii, S.: Storm prediction in a cloud. In: 2013 ICSE Workshop on Principles of Engineering Service-Oriented Systems (PESOS), pp. 37–40. IEEE (2013)
- 2. Hao, M.C. et al.: A visual analytics approach for peak-preserving prediction of large seasonal time series. In: Computer Graphics Forum, vol. 30, pp. 691–700. Wiley Online Library (2011)
- 3. Hibon, M., Makridakis, S.: ARMA models and the Box-Jenkins methodology (1997)
- 4. Hoff, J.C.: A Practical Guide to Box-Jenkins Forecasting. Lifetime Learning Publications, Belmont (1983)
- 5. Huang, Q., Shuang, K., Xu, P., Li, J., Liu, X., Su, S.: Prediction-based dynamic resource scheduling for virtualized cloud systems. J. Netw. **9**(2), 375–383 (2014)
- 6. Kellner, I.L.: Turn down the heat. J. Bus. Strategy **16**(6), 22–23 (1995)
- 7. Robinson, P.M.: The estimation of a nonlinear moving average model. Stoch. Process. Their Appl. **5**(1), 81–90 (1977)
- 8. Roy, N., Dubey, A., Gokhale, A.: Efficient autoscaling in the cloud using predictive models for workload forecasting. In: 2011 IEEE International Conference on Cloud Computing (CLOUD), pp. 500–507. IEEE (2011)
- 9. Yule, G.U.: On a method of investigating periodicities in disturbed series, with special reference to Wolfer’s sunspot numbers. Philos. Trans. R. Soc. Lond. Ser. A **226**, 267–298 (1927)
- 10. Vondra, T., Sedivy, J.: Maximizing utilization in private iaas clouds with heterogenous load through time series forecasting. Int. J. Adv. Syst. Meas. **6**(1–2), 149–165 (2013)
- 11. Wilkes, J.: More Google cluster data. Google research blog, November 2011