Abstract
Cloud computing offers the flexibility to dynamically size the infrastructure in response to changes in workload demand. While both horizontal scaling and vertical scaling of infrastructure are supported by major cloud providers, these scaling options differ significantly in terms of their cost, provisioning time, and their impact on workload performance. Importantly, the efficacy of horizontal and vertical scaling critically depends on the workload characteristics, such as the workload’s parallelizability and its core scalability. In today’s cloud systems, the scaling decision is left to the users, requiring them to fully understand the trade-offs associated with the different scaling options. In this paper, we present our solution for optimizing the resource scaling of cloud deployments via implementation in OpenStack. The key component of our solution is the modeling engine that characterizes the workload and then quantitatively evaluates different scaling options for that workload. Our modeling engine leverages Amdahl’s Law to model service time scaling in scale-up environments and queueing-theoretic concepts to model performance scaling in scale-out environments. We further employ Kalman filtering to account for inaccuracies in the model-based methodology and to dynamically track changes in the workload and cloud environment.
Notes
We acknowledge that the per-VM prices of various cloud providers depend on external factors such as profit margins and market fluctuations. As such, we will often use the amount of resources employed by the workload or application as a proxy for the “cost” and only use cloud providers’ listed prices for specific use cases.
Service time for a tier is defined as the total time taken to serve the workload request at that tier, assuming no resource contention. In other words, it is the minimum execution time at a tier.
References
Amazon Inc.: Amazon Auto Scaling. http://aws.amazon.com/autoscaling
SoftLayer Technologies, Inc.: http://www.softlayer.com
Tseng, J.H., Yu, H., Nagar, S., Dubey, N., Franke, H., Pattnaik, P., Inoue, H., Nakatani, T.: Performance studies of commercial workloads on a multi-core system. In: Proceedings of the 2007 IEEE International Symposium on Workload Characterization, Boston, MA, USA, pp. 57–65 (2007)
Inoue, H., Nakatani, T.: Performance of multi-process and multi-thread processing on multi-core SMT processors. In: Proceedings of the 2010 IEEE International Symposium on Workload Characterization, Atlanta, GA, USA, pp. 209–218 (2010)
Guerin, X., Tan, W., Liu, Y., Seelam, S., Dube, P.: Evaluation of multi-core scalability bottlenecks in enterprise Java workloads. In: Proceedings of the 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, Arlington, VA, USA, pp. 308–317 (2012)
Hill, M.D., Marty, M.R.: Amdahl’s law in the multicore era. Computer 41(7), 33–38 (2008)
Moreira, J.E., Michael, M.M., Da Silva, D., Shiloach, D., Dube, P., Zhang, L.: Scalability of the nutch search engine. In: Proceedings of the 21st Annual International Conference on Supercomputing, Seattle, WA, USA, pp. 3–12 (2007)
Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)
Openstack.org: OpenStack Open Source Cloud Computing Software. http://www.openstack.org
Opscode Inc.: Chef. http://www.opscode.com/chef
RUBiS: Rice University Bidding System. http://rubis.ow2.org
Intel Corp.: Intel Math Kernel Library - LINPACK 11.1 Update 2. https://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download
Bouchenak, S., Cox, A., Dropsho, S., Mittal, S., Zwaenepoel, W.: Caching dynamic web content: designing and analysing an aspect-oriented solution. In: Middleware 2006 (2006)
Urgaonkar, B., Pacifici, G., Shenoy, P., Spreitzer, M., Tantawi, A.: An analytical model for multi-tier internet services and its applications. In: Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Banff, Alberta, Canada, pp. 291–302 (2005)
Mosberger, D., Jin, T.: httperf—a tool for measuring web server performance. SIGMETRICS Perform. Eval. Rev. 26(3), 31–37 (1998)
Standard Performance Evaluation Corporation: SPECjbb2005. http://www.spec.org/jbb2005
Wikimedia Foundation: MediaWiki. http://www.mediawiki.org
libvirt virtualization API. http://libvirt.org
Amazon Web Services, Inc.: Amazon EC2 Pricing. http://aws.amazon.com/ec2/pricing
Rackspace, US Inc.: Cloud Servers Pricing—Rackspace Hosting. http://www.rackspace.com/cloud/servers/pricing
SoftLayer Technologies, Inc.: Build Your Own Cloud Server. http://www.softlayer.com/cloudlayer/build-your-own-cloud
Le Sueur, E., Heiser, G.: Dynamic voltage and frequency scaling: The laws of diminishing returns. In: Proceedings of the 2010 International Conference on Power Aware Computing and Systems, ser. HotPower’10, pp. 1–8 (2010)
VMware: VMware vCenter Server. http://www.vmware.com/products/vcenter-server
Why has CPU frequency ceased to grow? https://software.intel.com/en-us/blogs/2014/02/19/why-has-cpu-frequency-ceased-to-grow (2014)
Microsoft, Inc.: Pricing Calculator | Windows Azure. http://www.windowsazure.com/en-us/pricing/calculator/?scenario=virtual-machines
Walrand, J.: An Introduction to Queueing Networks. Prentice Hall, Upper Saddle River (1988)
Simon, D.: Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. Wiley, New York (2006)
Gandhi, A., Dube, P., Karve, A., Kochut, A., Zhang, L.: Adaptive, model-driven autoscaling for cloud applications. In: Proceedings of the 11th International Conference on Autonomic Computing, Philadelphia, PA, USA (2014)
Singhal, R.: Inside Intel Next Generation Nehalem Microarchitecture. Intel Developer Forum, San Francisco (2008)
Kongetira, P., Aingaran, K., Olukotun, K.: Niagara: a 32-way multithreaded Sparc processor. IEEE Micro 25(2), 21–29 (2005)
WAND Network Research Group: WITS: Waikato Internet Traffic Storage. http://www.wand.net.nz/wits/index.php
Google Cloud Platform: Auto Scaling on the Google Cloud Platform. http://cloud.google.com/resources/articles/auto-scaling-on-the-google-cloud-platform
WindowsAzure: How to Scale an Application. http://www.windowsazure.com/en-us/manage/services/cloud-services/how-to-scale-a-cloud-service
VMware, Inc.: VMware vFabric AppInsight. http://pubs.vmware.com/appinsight-5/index.jsp
Verma, A., Cherkasova, L., Campbell, R.H.: ARIA: automatic resource inference and allocation for mapreduce environments. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, ser. ICAC ’11, Karlsruhe, Germany, pp. 235–244 (2011)
Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F.B., Babu, S.: Starfish: a self-tuning system for big data analytics. In: Proceedings of the 5th Biennial Conference on Innovative Data Systems Research, ser. CIDR ’11, Asilomar, CA, USA, pp. 261–272 (2011)
Chen, K., Powers, J., Guo, S., Tian, F.: Cresp: towards optimal resource provisioning for mapreduce computing in public clouds. Parallel Distrib. Syst. IEEE Trans. 25(6), 1403–1412 (2014)
Ghit, B., Yigitbasi, N., Epema, D.: Resource management for dynamic mapreduce clusters in multicluster systems. In: Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, ser. SCC ’12. Washington, DC, USA. IEEE Computer Society, pp. 1252–1259 (2012)
Ghit, B., Yigitbasi, N., Iosup, A., Epema, D.: Balanced resource allocations across multiple dynamic mapreduce clusters. SIGMETRICS Perform. Eval. Rev. 42(1), 329–341 (2014)
Tan, J., Chin, A., Hu, Z.Z., Hu, Y., Meng, S., Meng, X., Zhang, L.: Dynmr: dynamic mapreduce with reducetask interleaving and maptask backfilling. In: Proceedings of the Ninth European Conference on Computer Systems, ser. EuroSys ’14. New York, NY, USA: ACM, pp. 2:1–2:14 (2014)
Gandhi, A., Harchol-Balter, M., Raghunathan, R., Kozuch, M.: AutoScale: dynamic, robust capacity management for multi-tier data centers. Trans. Comput. Syst. 30, 14 (2012)
Krioukov, A., Mohan, P., Alspaugh, S., Keys, L., Culler, D., Katz, R.: NapSAC: design and implementation of a power-proportional web cluster. In: Proceedings of the 1st ACM SIGCOMM Workshop on Green Networking, New Delhi, India, pp. 15–22 (2010)
ProfitBricks GmbH: Live Vertical Scaling. Technical Report, ProfitBricks IaaS (2012)
Kalyvianaki, E., Charalambous, T., Hand, S.: Self-adaptive and self-configured CPU resource provisioning for virtualized servers using Kalman filters. In: Proceedings of the 6th International Conference on Autonomic Computing, Barcelona, Spain, pp. 117–126 (2009)
Rowstron, A., Narayanan, D., Donnelly, A., O’Shea, G., Douglas, A.: Nobody ever got fired for using Hadoop on a cluster. In: Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing, Bern, Switzerland, pp. 2:1–2:5 (2012)
Gigaspaces Resource Center: Scale Up vs. Scale Out. http://www.gigaspaces.com/WhitePapers (2011)
Sevilla, M., Nassi, I., Ioannidou, K., Brandt, S., Maltzahn, C.: A framework for an in-depth comparison of scale-up and scale-out. In: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems, Denver, CO, USA, pp. 13–18 (2013)
Schwarzkopf, M., Murray, D.G., Hand, S.: The seven deadly sins of cloud computing research. In: Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, Boston, MA, USA (2012)
Iqbal, W., Dailey, M.N., Carrera, D.: SLA-driven dynamic resource management for multi-tier web applications in a cloud. In: Proceedings of the 10th International Symposium on Cluster, Cloud and Grid Computing, Melbourne, Victoria, Australia, pp. 832–837 (2010)
Sedaghat, M., Hernandez-Rodriguez, F., Elmroth, E.: A virtual machine re-packing approach to the horizontal vs. vertical elasticity trade-off for cloud autoscaling. In: Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference, Miami, FL, USA, pp. 6:1–6:10 (2013)
Bonvin, N., Papaioannou, T., Aberer, K.: Autonomic SLA-driven provisioning for cloud applications. In: Proceedings of the 11th International Symposium on Cluster, Cloud and Grid Computing, Newport Beach, CA, USA, pp. 434–443 (2011)
Vaquero, L.M., Rodero-Merino, L., Buyya, R.: Dynamically scaling applications in the cloud. SIGCOMM Comput. Commun. Rev. 41(1), 45–52 (2011)
Michael, M., Moreira, J., Shiloach, D., Wisniewski, R.: Scale-up x scale-out: a case study using Nutch/Lucene. In: Proceedings of the 2007 International Parallel and Distributed Processing Symposium, Long Beach, CA, USA, pp. 1–8 (2007)
Yu, H., Moreira, J., Dube, P., Chung, I.-H., Zhang, L.: Performance studies of a WebSphere application, trade, in scale-out and scale-up environments. In: Proceedings of the 2007 International Parallel and Distributed Processing Symposium, Long Beach, CA, USA, pp. 1–8 (2007)
Brebner, P., Gosper, J.: How scalable is J2EE technology? SIGSOFT Softw. Eng. Notes 28(3), 4–4 (2003)
Appuswamy, R., Gkantsidis, C., Narayanan, D., Hodson, O., Rowstron, A.: Scale-up vs scale-out for Hadoop: time to rethink? In: Proceedings of the 4th annual symposium on cloud computing, Santa Clara, CA, USA, pp. 20:1–20:13 (2013)
Cao, Z., Huang, W., Chang, J.M.: A study of Java virtual machine scalability issues on SMP systems. In: Proceedings of the 2005 IEEE International Symposium on Workload Characterization, Austin, TX, USA, pp. 119–128 (2005)
Ishizaki, K., Nakatani, T., Daijavad, S.: Analyzing and improving performance scalability of commercial server workloads on a chip multiprocessor. In: Proceedings of the 2009 IEEE International Symposium on Workload Characterization, Austin, TX, USA, pp. 217–226 (2009)
Ohara, M., Nagpurkar, P., Ueda, Y., Ishizaki, K.: The data-centricity of Web 2.0 workloads and its impact on server performance. In: Proceedings of the 2009 International Symposium on Performance Analysis of Systems and Software, Boston, MA, USA, pp. 133–142 (2009)
Iyer, R., Bhat, M., Zhao, L., Illikkal, R., Makineni, S., Jones, M., Shiv, K., Newell, D.: Exploring small-scale and large-scale CMP architectures for commercial Java servers. In: Proceedings of the 2006 IEEE International Symposium on Workload Characterization, San Jose, CA, USA, pp. 191–200 (2006)
Dube, P., Yu, H., Zhang, L., Moreira, J.: Performance evaluation of a commercial application, trade, in scale-out environments. In: Proceedings of the 15th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, Istanbul, Turkey, pp. 252–259 (2007)
Kumar, D., Tantawi, A., Zhang, L.: Estimating model parameters of adaptive software systems in real-time. In: Ardagna, D., Zhang, L. (eds.) Run-Time Models for Self-managing Systems and Applications, ser. Autonomic Systems. Springer Basel, pp. 45–71 (2010). doi:10.1007/978-3-0346-0433-8_3
Communicated by Dr. Kai Sachs and Catalina Llado.
Appendix
Figure 18 shows a queueing-network model of a general three-tier system with each tier representing a collection of homogeneous servers. We assume that the load at each tier is distributed uniformly across all the servers in that tier. The system is driven by a workload consisting of distinct request classes, where class i is characterized by its arrival rate, \(\lambda _i\), and end-to-end response time, \(R_i\). Let \(n_j\) be the number of servers at tier j. With homogeneous servers and perfect load balancing, the arrival rate of requests at any server in tier j is \({\lambda }_{ij} := \lambda _i/n_j\). Since servers at a tier are identical, for ease of analysis, we model each tier as a single representative server. With some abuse of terminology, we refer to the representative server at tier j as tier j. Let \(u_j \in [0,1)\) be the utilization of tier j. The background utilization of tier j is denoted by \(u_{0j}\) and models the resource utilization due to other jobs (not related to our workload) running on that tier. The end-to-end network latency for a class i request is denoted by \(d_i\). Let \(S_{ij} ({\ge }0)\) denote the average service time of a class i request at tier j. Assuming Poisson arrivals and a processor-sharing policy at each server, the stationary distribution of the queueing network is known to have a product form [26], for any general service-time distribution at the servers. Under the product-form assumption, we have the following analytical results from queueing theory:
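The analytical expressions themselves do not survive in this version of the text. A reconstruction consistent with the notation above and with how the equations are used later (per-tier utilization and per-class response time under product form) would read:

\[
u_j = u_{0j} + \sum _i \lambda _{ij} S_{ij}, \qquad
R_i = d_i + \sum _j \frac{S_{ij}}{1-u_j}.
\]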
While \(u_j\), \(R_i\), and \(\lambda _i\) \(\forall i,j\) can be monitored easily and are thus observable, the parameters \(S_{ij}\), \(u_{0j}\), and \(d_i\) are nontrivial to measure and are thus unobservable.
We employ a parameter estimation technique, Kalman filtering, to derive estimates for the unobservable parameters. Further, since the system parameters can dynamically change during runtime, we employ the Kalman filter as an online parameter estimator to continually adapt our parameter estimates. It is important to note that while the product form is shown to be a reasonable assumption for tiered web services [61], we only use it as an approximation for our complex system. By employing the Kalman filter to leverage the actual monitored values, we minimize our dependence on the approximation.
For a three-class, three-tier system (i.e., \(i=j=3\)), let \(\mathbf{z} := (u_1,u_2,u_3,R_1,R_2,R_3)^T = \mathbf{h (x)}\) and \(\mathbf{x} =\) \((u_{01},u_{02},u_{03},d_1,d_2,d_3,S_{11},S_{21},S_{31},S_{12},S_{22},S_{32},S_{13},S_{23},S_{33})^T\). Note that \(\mathbf{z}\) is a 6-dimensional vector, whereas \(\mathbf{x}\) is a 15-dimensional vector. The problem is to determine the unobservable parameters \(\mathbf{x}\) from measured values of \(\mathbf{z}\) and \(\mathbf{\lambda }=(\lambda _1,\lambda _2,\lambda _3)\).
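As an illustration (not the authors' code), the observation function \(\mathbf{h}\) can be sketched as follows. The sketch assumes the standard product-form expressions \(u_j = u_{0j} + \sum _i \lambda _{ij} S_{ij}\) and \(R_i = d_i + \sum _j S_{ij}/(1-u_j)\); the function name `h`, the `n` parameter for per-tier server counts, and the tier-major packing of \(S_{ij}\) into \(\mathbf{x}\) follow the vector layout given above:

```python
import numpy as np

def h(x, lam, n=(1, 1, 1)):
    """Observation function z = h(x) for the 3-class, 3-tier model.

    x   : 15-vector (u01,u02,u03, d1,d2,d3, S11,S21,S31, S12,S22,S32, S13,S23,S33)
    lam : per-class arrival rates (lambda_1, lambda_2, lambda_3)
    n   : number of servers at each tier (load split evenly across them)
    """
    u0 = x[0:3]                       # background utilizations u_{0j}
    d = x[3:6]                        # end-to-end network latencies d_i
    # S[i, j] = service time of class i at tier j; x packs S tier by tier,
    # so column-major ('F') reshape recovers the (class, tier) matrix.
    S = np.reshape(x[6:15], (3, 3), order='F')
    lam = np.asarray(lam, dtype=float)
    n = np.asarray(n, dtype=float)
    # Utilization of tier j: background + sum over classes of (lambda_i/n_j)*S_ij
    u = u0 + (lam @ S) / n
    # Response time of class i: network latency + sum over tiers of S_ij/(1-u_j)
    R = d + S @ (1.0 / (1.0 - u))
    return np.concatenate([u, R])     # z = (u1,u2,u3, R1,R2,R3)
```

With zero background utilization and latency, identical service times of 0.1 s, and unit arrival rates, each tier's utilization is 0.3 and each response time is 3 × 0.1/0.7.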
The dynamic evolution of system parameters can be described through the following Kalman filtering equations [27]:
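The equations themselves are not reproduced in this version; in standard state-space form, consistent with the description that follows, they are:

\[
\mathbf{x}(t) = \mathbf{F}(t)\,\mathbf{x}(t-1) + \mathbf{w}(t), \qquad
\mathbf{z}(t) = \mathbf{h}(\mathbf{x}(t)) + \mathbf{v}(t).
\]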
where \(\mathbf{F}(t)\) is the state transition model and \(\mathbf{H}(t)\) is the observation model mapping the true state space into the observed state space. In our case, \(\mathbf{F}(t), \forall t\), is the identity matrix. The variables \(\mathbf{w}(t)\sim \mathcal{N}(0,\mathcal{Q}(t))\) and \(\mathbf{v}(t)\sim \mathcal{N}(0,\mathcal{R}(t))\) are the process noise and measurement noise, assumed to follow zero-mean, multivariate normal distributions with covariance matrices \(\mathcal{Q}(t)\) and \(\mathcal{R}(t)\), respectively. The matrices \(\mathcal{Q}(t)\) and \(\mathcal{R}(t)\) are not directly measurable but can be tuned via best practices [62].
Since the measurement model \(\mathbf{z}\) is a nonlinear function of the system state \(\mathbf{x}\) [see Eqs. (3) and (4)], we use the extended Kalman filter [27] with \(\mathbf{H}(t) = \left[ \frac{\partial \mathbf{h}}{\partial \mathbf{x}}\right] _{\mathbf{x}(t)}\), which for our model is a \(6 \times 15\) matrix with \(\mathbf{H}(t)_{ij} = \left[ \frac{\partial \mathbf{h_{i}}}{\partial \mathbf{x_{j}}}\right] _{\mathbf{x}(t)}\). Since \(\mathbf{x}(t)\) is not known at time t, we estimate it by \(\hat{\mathbf{x}}(t|t-1)\), which is the a priori estimate of \(\mathbf{x}(t)\) given all the history up to time \(t-1\). The state of the filter is described by two variables, \(\hat{\mathbf{x}}(t|t)\) and \(\mathbf{P}(t|t)\), where \(\hat{\mathbf{x}}(t|t)\) is the a posteriori estimate of the state at time t and \(\mathbf{P}(t|t)\) is the a posteriori error covariance matrix, a measure of the accuracy of the state estimate.
The Kalman filter has two phases: predict and update. In the predict phase, a priori estimates of the state and error matrix are calculated; in the update phase, these estimates are refined using the current observation to obtain a posteriori estimates of the state and error matrix. The filter model for the predict and update phases for our 3-class, 3-tier model is given by:
Predict:
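The predict-phase equations are not reproduced in this version; in standard extended Kalman filter form (and recalling that \(\mathbf{F}(t)\) is the identity here), they are:

\[
\hat{\mathbf{x}}(t|t-1) = \mathbf{F}(t)\,\hat{\mathbf{x}}(t-1|t-1), \qquad
\mathbf{P}(t|t-1) = \mathbf{F}(t)\,\mathbf{P}(t-1|t-1)\,\mathbf{F}(t)^T + \mathcal{Q}(t).
\]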
Update:
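The update-phase equations are likewise missing; a standard reconstruction, writing \(\mathbf{K}(t)\) for the Kalman gain, is:

\[
\mathbf{K}(t) = \mathbf{P}(t|t-1)\,\mathbf{H}(t)^T \left( \mathbf{H}(t)\,\mathbf{P}(t|t-1)\,\mathbf{H}(t)^T + \mathcal{R}(t) \right) ^{-1},
\]
\[
\hat{\mathbf{x}}(t|t) = \hat{\mathbf{x}}(t|t-1) + \mathbf{K}(t)\left( \mathbf{z}(t) - \mathbf{h}(\hat{\mathbf{x}}(t|t-1)) \right), \qquad
\mathbf{P}(t|t) = \left( \mathbf{I} - \mathbf{K}(t)\,\mathbf{H}(t)\right) \mathbf{P}(t|t-1).
\]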
We employ the above filter model by seeding our initial estimates \(\hat{\mathbf{x}}(t|t-1)\) and \(\mathbf{P}(t|t-1)\) with random values, applying the update equations to the monitored \(\mathbf{z}(t)\) to obtain \(\hat{\mathbf{x}}(t|t)\) and \(\mathbf{P}(t|t)\), and then applying the predict equations to obtain the next \(\hat{\mathbf{x}}(t|t-1)\) and \(\mathbf{P}(t|t-1)\). We repeat this process at each 10-s monitoring interval to derive new estimates of the system state.
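The predict/update loop can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the names `ekf_step` and `numerical_jacobian` are ours, the Jacobian \(\mathbf{H}(t)\) is approximated by finite differences rather than derived analytically, and \(\mathbf{F}=\mathbf{I}\) is assumed (as stated above), so the predict step reduces to carrying the state forward and adding \(\mathcal{Q}\) to the error covariance.

```python
import numpy as np

def numerical_jacobian(h, x, lam, eps=1e-6):
    """Finite-difference approximation of H(t) = [dh/dx] at the current estimate."""
    z0 = h(x, lam)
    H = np.zeros((len(z0), len(x)))
    for j in range(len(x)):
        dx = np.zeros_like(x)
        dx[j] = eps
        H[:, j] = (h(x + dx, lam) - z0) / eps
    return H

def ekf_step(x_est, P, z_meas, lam, h, Q, R):
    """One predict/update iteration of the extended Kalman filter.

    With F(t) = I, the predict phase reduces to
    x(t|t-1) = x(t-1|t-1) and P(t|t-1) = P(t-1|t-1) + Q(t).
    """
    # Predict: a priori state and error-covariance estimates
    x_pred = x_est
    P_pred = P + Q
    # Update: refine the a priori estimates using the current observation z(t)
    H = numerical_jacobian(h, x_pred, lam)
    y = z_meas - h(x_pred, lam)                  # innovation
    S = H @ P_pred @ H.T + R                     # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x_pred + K @ y                       # a posteriori state estimate
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new
```

In deployment, `ekf_step` would be invoked once per monitoring interval with the freshly measured utilizations, response times, and arrival rates; the a posteriori estimates then seed the next interval's predict phase.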
Cite this article
Gandhi, A., Dube, P., Karve, A. et al. Model-driven optimal resource scaling in cloud. Softw Syst Model 17, 509–526 (2018). https://doi.org/10.1007/s10270-017-0584-y