
Model-driven optimal resource scaling in cloud

  • Theme Section Paper
  • Published in Software & Systems Modeling

Abstract

Cloud computing offers the flexibility to dynamically size the infrastructure in response to changes in workload demand. While both horizontal scaling and vertical scaling of infrastructure are supported by major cloud providers, these scaling options differ significantly in terms of their cost, provisioning time, and their impact on workload performance. Importantly, the efficacy of horizontal and vertical scaling critically depends on the workload characteristics, such as the workload’s parallelizability and its core scalability. In today’s cloud systems, the scaling decision is left to the users, requiring them to fully understand the trade-offs associated with the different scaling options. In this paper, we present our solution for optimizing the resource scaling of cloud deployments via implementation in OpenStack. The key component of our solution is the modeling engine that characterizes the workload and then quantitatively evaluates different scaling options for that workload. Our modeling engine leverages Amdahl’s Law to model service time scaling in scale-up environments and queueing-theoretic concepts to model performance scaling in scale-out environments. We further employ Kalman filtering to account for inaccuracies in the model-based methodology and to dynamically track changes in the workload and cloud environment.



Notes

  1. We acknowledge that the per-VM prices of various cloud providers depend on external factors such as profit margins and market fluctuations. As such, we will often use the amount of resources employed by the workload or application as a proxy for the “cost” and only use cloud providers’ listed prices for specific use cases.

  2. Service time for a tier is defined as the total time taken to serve the workload request at that tier, assuming no resource contention. In other words, it is the minimum execution time at a tier.

References

  1. Amazon Inc.: Amazon Auto Scaling. http://aws.amazon.com/autoscaling

  2. SoftLayer Technologies, Inc.: http://www.softlayer.com

  3. Tseng, J.H., Yu, H., Nagar, S., Dubey, N., Franke, H., Pattnaik, P., Inoue, H., Nakatani, T.: Performance studies of commercial workloads on a multi-core system. In: Proceedings of the 2007 IEEE International Symposium on Workload Characterization, Boston, MA, USA, pp. 57–65 (2007)

  4. Inoue, H., Nakatani, T.: Performance of multi-process and multi-thread processing on multi-core SMT processors. In: Proceedings of the 2010 IEEE International Symposium on Workload Characterization, Atlanta, GA, USA, pp. 209–218 (2010)

  5. Guerin, X., Tan, W., Liu, Y., Seelam, S., Dube, P.: Evaluation of multi-core scalability bottlenecks in enterprise Java workloads. In: Proceedings of the 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, Arlington, VA, USA, pp. 308–317 (2012)

  6. Hill, M.D., Marty, M.R.: Amdahl’s law in the multicore era. Computer 41(7), 33–38 (2008)

  7. Moreira, J.E., Michael, M.M., Da Silva, D., Shiloach, D., Dube, P., Zhang, L.: Scalability of the nutch search engine. In: Proceedings of the 21st Annual International Conference on Supercomputing, Seattle, WA, USA, pp. 3–12 (2007)

  8. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)

  9. Openstack.org: OpenStack Open Source Cloud Computing Software. http://www.openstack.org

  10. Opscode Inc.: Chef. http://www.opscode.com/chef

  11. RUBiS: Rice University Bidding System. http://rubis.ow2.org

  12. Intel Corp.: Intel Math Kernel Library - LINPACK 11.1 Update 2. https://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download

  13. Bouchenak, S., Cox, A., Dropsho, S., Mittal, S., Zwaenepoel, W.: Caching dynamic web content: designing and analysing an aspect-oriented solution. In: Middleware 2006 (2006)

  14. Urgaonkar, B., Pacifici, G., Shenoy, P., Spreitzer, M., Tantawi, A.: An analytical model for multi-tier internet services and its applications. In: Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Banff, Alberta, Canada, pp. 291–302 (2005)

  15. Mosberger, D., Jin, T.: httperf—a tool for measuring web server performance. SIGMETRICS Perform. Eval. Rev. 26(3), 31–37 (1998)

  16. Standard Performance Evaluation Corporation: SPECjbb2005. http://www.spec.org/jbb2005

  17. Wikimedia Foundation: MediaWiki. http://www.mediawiki.org

  18. libvirt virtualization API. http://libvirt.org

  19. Amazon Web Services, Inc.: Amazon EC2 Pricing. http://aws.amazon.com/ec2/pricing

  20. Rackspace, US Inc.: Cloud Servers Pricing—Rackspace Hosting. http://www.rackspace.com/cloud/servers/pricing

  21. SoftLayer Technologies, Inc.: Build Your Own Cloud Server. http://www.softlayer.com/cloudlayer/build-your-own-cloud

  22. Le Sueur, E., Heiser, G.: Dynamic voltage and frequency scaling: The laws of diminishing returns. In: Proceedings of the 2010 International Conference on Power Aware Computing and Systems, ser. HotPower’10, pp. 1–8 (2010)

  23. VMware: VMware vCenter Server. http://www.vmware.com/products/vcenter-server

  24. Why has CPU frequency ceased to grow? https://software.intel.com/en-us/blogs/2014/02/19/why-has-cpu-frequency-ceased-to-grow (2014)

  25. Microsoft, Inc.: Pricing Calculator | Windows Azure. http://www.windowsazure.com/en-us/pricing/calculator/?scenario=virtual-machines

  26. Walrand, J.: An Introduction to Queueing Networks. Prentice Hall, Upper Saddle River (1988)

  27. Simon, D.: Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. Wiley, New York (2006)

  28. Gandhi, A., Dube, P., Karve, A., Kochut, A., Zhang, L.: Adaptive, model-driven autoscaling for cloud applications. In: Proceedings of the 11th International Conference on Autonomic Computing, Philadelphia, PA, USA (2014)

  29. Singhal, R.: Inside Intel Next Generation Nehalem Microarchitecture. Intel Developer Forum, San Francisco (2008)

  30. Kongetira, P., Aingaran, K., Olukotun, K.: Niagara: a 32-way multithreaded Sparc processor. IEEE Micro 25(2), 21–29 (2005)

  31. WAND Network Research Group: WITS: Waikato Internet Traffic Storage. http://www.wand.net.nz/wits/index.php

  32. Google Cloud Platform: Auto Scaling on the Google Cloud Platform. http://cloud.google.com/resources/articles/auto-scaling-on-the-google-cloud-platform

  33. WindowsAzure: How to Scale an Application. http://www.windowsazure.com/en-us/manage/services/cloud-services/how-to-scale-a-cloud-service

  34. VMware, Inc.: VMware vFabric AppInsight. http://pubs.vmware.com/appinsight-5/index.jsp

  35. Verma, A., Cherkasova, L., Campbell, R.H.: ARIA: automatic resource inference and allocation for mapreduce environments. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, ser. ICAC ’11, Karlsruhe, Germany, pp. 235–244 (2011)

  36. Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F.B., Babu, S.: Starfish: a self-tuning system for big data analytics. In: Proceedings of the 5th Biennial Conference on Innovative Data Systems Research, ser. CIDR ’11, Asilomar, CA, USA, pp. 261–272 (2011)

  37. Chen, K., Powers, J., Guo, S., Tian, F.: CRESP: towards optimal resource provisioning for MapReduce computing in public clouds. IEEE Trans. Parallel Distrib. Syst. 25(6), 1403–1412 (2014)

  38. Ghit, B., Yigitbasi, N., Epema, D.: Resource management for dynamic mapreduce clusters in multicluster systems. In: Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, ser. SCC ’12. Washington, DC, USA. IEEE Computer Society, pp. 1252–1259 (2012)

  39. Ghit, B., Yigitbasi, N., Iosup, A., Epema, D.: Balanced resource allocations across multiple dynamic mapreduce clusters. SIGMETRICS Perform. Eval. Rev. 42(1), 329–341 (2014)

  40. Tan, J., Chin, A., Hu, Z.Z., Hu, Y., Meng, S., Meng, X., Zhang, L.: Dynmr: dynamic mapreduce with reducetask interleaving and maptask backfilling. In: Proceedings of the Ninth European Conference on Computer Systems, ser. EuroSys ’14. New York, NY, USA: ACM, pp. 2:1–2:14 (2014)

  41. Gandhi, A., Harchol-Balter, M., Raghunathan, R., Kozuch, M.: AutoScale: dynamic, robust capacity management for multi-tier data centers. Trans. Comput. Syst. 30, 14 (2012)

  42. Krioukov, A., Mohan, P., Alspaugh, S., Keys, L., Culler, D., Katz, R.: NapSAC: design and implementation of a power-proportional web cluster. In: Proceedings of the 1st ACM SIGCOMM Workshop on Green Networking, New Delhi, India, pp. 15–22 (2010)

  43. GmbH, ProfitBricks: Live Vertical Scaling. Technical Report, PROFITBRICKS IAAS (2012)

  44. Kalyvianaki, E., Charalambous, T., Hand, S.: Self-adaptive and self-configured CPU resource provisioning for virtualized servers using Kalman filters. In: Proceedings of the 6th International Conference on Autonomic Computing, Barcelona, Spain, pp. 117–126 (2009)

  45. Rowstron, A., Narayanan, D., Donnelly, A., O’Shea, G., Douglas, A.: Nobody ever got fired for using Hadoop on a cluster. In: Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing, Bern, Switzerland, pp. 2:1–2:5 (2012)

  46. Gigaspaces Resource Center: Scale Up vs. Scale Out. http://www.gigaspaces.com/WhitePapers (2011)

  47. Sevilla, M., Nassi, I., Ioannidou, K., Brandt, S., Maltzahn, C.: A framework for an in-depth comparison of scale-up and scale-out. In: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems, Denver, CO, USA, pp. 13–18 (2013)

  48. Schwarzkopf, M., Murray, D.G., Hand, S.: The seven deadly sins of cloud computing research. In: Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, Boston, MA, USA (2012)

  49. Iqbal, W., Dailey, M.N., Carrera, D.: SLA-driven dynamic resource management for multi-tier web applications in a cloud. In: Proceedings of the 10th International Symposium on Cluster, Cloud and Grid Computing, Melbourne, Victoria, Australia, pp. 832–837 (2010)

  50. Sedaghat, M., Hernandez-Rodriguez, F., Elmroth, E.: A virtual machine re-packing approach to the horizontal vs. vertical elasticity trade-off for cloud autoscaling. In: Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference, Miami, FL, USA, pp. 6:1–6:10 (2013)

  51. Bonvin, N., Papaioannou, T., Aberer, K.: Autonomic SLA-driven provisioning for cloud applications. In: Proceedings of the 11th International Symposium on Cluster, Cloud and Grid Computing, Newport Beach, CA, USA, pp. 434–443 (2011)

  52. Vaquero, L.M., Rodero-Merino, L., Buyya, R.: Dynamically scaling applications in the cloud. SIGCOMM Comput. Commun. Rev. 41(1), 45–52 (2011)

  53. Michael, M., Moreira, J., Shiloach, D., Wisniewski, R.: Scale-up x scale-out: a case study using Nutch/Lucene. In: Proceedings of the 2007 International Parallel and Distributed Processing Symposium, Long Beach, CA, USA, pp. 1–8 (2007)

  54. Yu, H., Moreira, J., Dube, P., Chung, I.-H., Zhang, L.: Performance studies of a WebSphere application, trade, in scale-out and scale-up environments. In: Proceedings of the 2007 International Parallel and Distributed Processing Symposium, Long Beach, CA, USA, pp. 1–8 (2007)

  55. Brebner, P., Gosper, J.: How scalable is J2EE technology? SIGSOFT Softw. Eng. Notes 28(3), 4–4 (2003)

  56. Appuswamy, R., Gkantsidis, C., Narayanan, D., Hodson, O., Rowstron, A.: Scale-up vs scale-out for Hadoop: time to rethink? In: Proceedings of the 4th annual symposium on cloud computing, Santa Clara, CA, USA, pp. 20:1–20:13 (2013)

  57. Cao, Z., Huang, W., Chang, J.M.: A study of Java virtual machine scalability issues on SMP systems. In: Proceedings of the 2005 IEEE International Symposium on Workload Characterization, Austin, TX, USA, pp. 119–128 (2005)

  58. Ishizaki, K., Nakatani, T., Daijavad, S.: Analyzing and improving performance scalability of commercial server workloads on a chip multiprocessor. In: Proceedings of the 2009 IEEE International Symposium on Workload Characterization, Austin, TX, USA, pp. 217–226 (2009)

  59. Ohara, M., Nagpurkar, P., Ueda, Y., Ishizaki, K.: The data-centricity of Web 2.0 workloads and its impact on server performance. In: Proceedings of the 2009 International Symposium on Performance Analysis of Systems and Software, Boston, MA, USA, pp. 133–142 (2009)

  60. Iyer, R., Bhat, M., Zhao, L., Illikkal, R., Makineni, S., Jones, M., Shiv, K., Newell, D.: Exploring small-scale and large-scale CMP architectures for commercial Java servers. In: Proceedings of the 2006 IEEE International Symposium on Workload Characterization, San Jose, CA, USA, pp. 191–200 (2006)

  61. Dube, P., Yu, H., Zhang, L., Moreira, J.: Performance evaluation of a commercial application, trade, in scale-out environments. In: Proceedings of the 15th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, Istanbul, Turkey, pp. 252–259 (2007)

  62. Kumar, D., Tantawi, A., Zhang, L.: Estimating model parameters of adaptive software systems in real-time. In: Ardagna, D., Zhang, L. (eds.) Run-Time Models for Self-managing Systems and Applications, ser. Autonomic Systems. Springer Basel, pp. 45–71 (2010). doi:10.1007/978-3-0346-0433-8_3

Corresponding author

Correspondence to Parijat Dube.

Additional information

Communicated by Dr. Kai Sachs and Catalina Llado.

Appendix

Figure 18 shows a queueing-network model of a general three-tier system, with each tier representing a collection of homogeneous servers. We assume that the load at each tier is distributed uniformly across all the servers in that tier. The system is driven by a workload consisting of i distinct request classes, each characterized by its arrival rate, \(\lambda _i\), and end-to-end response time, \(R_i\). Let \(n_j\) be the number of servers at tier j. With homogeneous servers and perfect load balancing, the arrival rate of requests at any server in tier j is \({\lambda }_{ij} := \lambda _i/n_j\). Since the servers at a tier are identical, for ease of analysis we model each tier as a single representative server. With some abuse of terminology, we refer to the representative server at tier j as tier j. Let \(u_j \in [0,1)\) be the utilization of tier j. The background utilization of tier j is denoted by \(u_{0j}\) and models the resource utilization due to other jobs (not related to our workload) running on that tier. The end-to-end network latency for a class i request is denoted by \(d_i\). Let \(S_{ij} ({\ge }0)\) denote the average service time of a class i request at tier j. Assuming Poisson arrivals and a processor-sharing policy at each server, the stationary distribution of the queueing network is known to have a product form [26] for any general service time distribution at the servers. Under the product-form assumption, we have the following analytical results from queueing theory:

$$\begin{aligned} u_j= & {} u_{0j}+\sum _{i}\lambda _{ij} S_{ij}, \quad \forall j \end{aligned}$$
(3)
$$\begin{aligned} R_i= & {} d_i + \sum _{j} \frac{S_{ij}}{1-u_j}, \quad \forall i \end{aligned}$$
(4)

While \(u_j\), \(R_i\), and \(\lambda _i\) \(\forall i,j\) can be monitored easily and are thus observable, the parameters \(S_{ij}\), \(u_{0j}\), and \(d_i\) are nontrivial to measure and are thus unobservable.
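Equations (3) and (4) can be evaluated directly once the unobservable parameters are fixed. The sketch below computes the observables \(u_j\) and \(R_i\) from assumed values of \(S_{ij}\), \(u_{0j}\), and \(d_i\); all numeric values are illustrative and are not taken from the paper's experiments.

```python
def forward_model(lam, n, u0, d, S):
    """Forward model of Eqs. (3) and (4).
    lam[i]: arrival rate of class i; n[j]: servers at tier j;
    u0[j]: background utilization of tier j; d[i]: network latency of class i;
    S[i][j]: service time of class i at tier j (no resource contention)."""
    I, J = len(lam), len(n)
    # Eq. (3): u_j = u0_j + sum_i (lam_i / n_j) * S_ij
    u = [u0[j] + sum(lam[i] / n[j] * S[i][j] for i in range(I))
         for j in range(J)]
    # Eq. (4): R_i = d_i + sum_j S_ij / (1 - u_j)
    R = [d[i] + sum(S[i][j] / (1.0 - u[j]) for j in range(J))
         for i in range(I)]
    return u, R

# Illustrative three-class, three-tier instance.
u, R = forward_model(
    lam=[20.0, 10.0, 5.0],        # requests/s per class
    n=[2, 1, 1],                  # servers per tier
    u0=[0.05, 0.05, 0.05],        # background utilizations
    d=[0.002, 0.002, 0.002],      # network latencies (s)
    S=[[0.004, 0.010, 0.002],     # S[i][j] in seconds
       [0.006, 0.015, 0.003],
       [0.008, 0.020, 0.004]],
)
```

In the filtering setting this mapping runs in the opposite direction: \(u_j\), \(R_i\), and \(\lambda _i\) are monitored, and the Kalman filter inverts the relationship to estimate the hidden parameters.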

Fig. 18 Queueing model for our system. The system parameters are: \(\lambda _{i}\), arrival rate of class i; \(R_i\), response time for class i; \(d_i\), network latency for class i; \(u_{0j}\), background utilization for tier j; \(S_{ij}\), service time of class i at tier j

We employ a parameter estimation technique, Kalman filtering, to derive estimates for the unobservable parameters. Further, since the system parameters can dynamically change during runtime, we employ the Kalman filter as an online parameter estimator to continually adapt our parameter estimates. It is important to note that while the product form is shown to be a reasonable assumption for tiered web services [61], we only use it as an approximation for our complex system. By employing the Kalman filter to leverage the actual monitored values, we minimize our dependence on the approximation.

For a three-class, three-tier system (i.e., \(i=j=3\)), let \(\mathbf{z} := (u_1,u_2,u_3,R_1,R_2,R_3)^T = \mathbf{h (x)}\) and \(\mathbf{x} =\) \((u_{01},u_{02},u_{03},d_1,d_2,d_3,S_{11},S_{21},S_{31},S_{12},S_{22},S_{32},S_{13},S_{23},S_{33})^T\). Note that \(\mathbf{z}\) is a 6-dimensional vector, whereas \(\mathbf{x}\) is a 15-dimensional vector. The problem is to determine the unobservable parameters \(\mathbf{x}\) from measured values of \(\mathbf{z}\) and \(\mathbf{\lambda }=(\lambda _1,\lambda _2,\lambda _3)\).

The dynamic evolution of system parameters can be described through the following Kalman filtering equations [27]:

$$\begin{aligned} \text {System state:}\quad&\mathbf{x}(t) = \mathbf{F}(t)\mathbf{x}(t-1)+\mathbf{w}(t),\\ \text {Measurement model:}\quad&\mathbf{z}(t) = \mathbf{H}(t)\mathbf{x}(t)+\mathbf{v}(t), \end{aligned}$$

where \(\mathbf{F}(t)\) is the state transition model and \(\mathbf{H}(t)\) is the observation model that maps the true state space into the observed state space. In our case, \(\mathbf{F}(t)\) is the identity matrix for all t. The variables \(\mathbf{w}(t)\sim \mathcal{N}(0,\mathcal{Q}(t))\) and \(\mathbf{v}(t)\sim \mathcal{N}(0,\mathcal{R}(t))\) are the process noise and measurement noise, assumed to be zero-mean multivariate normal with covariance matrices \(\mathcal{Q}(t)\) and \(\mathcal{R}(t)\), respectively. The matrices \(\mathcal{Q}(t)\) and \(\mathcal{R}(t)\) are not directly measurable but can be tuned via best practices [62].

Since the measurement model \(\mathbf{z}\) is a nonlinear function of the system state \(\mathbf{x}\) [see Eqs. (3) and (4)], we use the extended Kalman filter [27] with \(\mathbf{H}(t) = \left[ \frac{\partial \mathbf{h}}{\partial \mathbf{x}}\right] _{\mathbf{x}(t)}\), which for our model is a \(6 \times 15\) matrix with \(\mathbf{H}(t)_{ij} = \left[ \frac{\partial \mathbf{h_{i}}}{\partial \mathbf{x_{j}}}\right] _{\mathbf{x}(t)}\). Since \(\mathbf{x}(t)\) is not known at time t, we estimate it by \(\hat{\mathbf{x}}(t|t-1)\), the a priori estimate of \(\mathbf{x}(t)\) given all the history up to time \(t-1\). The state of the filter is described by two variables, \(\hat{\mathbf{x}}(t|t)\) and \(\mathbf{P}(t|t)\), where \(\hat{\mathbf{x}}(t|t)\) is the a posteriori estimate of the state at time t and \(\mathbf{P}(t|t)\) is the a posteriori error covariance matrix, a measure of the accuracy of the state estimate.

The Kalman filter has two phases: predict and update. In the predict phase a priori estimates of state and error matrix are calculated, and in the update phase, these estimates are refined using the current observation to get a posteriori estimates of state and error matrix. The filter model for the predict and update phase for our 3-class, 3-tier model is given by:

Predict:

$$\begin{aligned} \hat{\mathbf{x}}(t|t-1)= & {} \mathbf{F}(t) \hat{\mathbf{x}}(t-1|t-1) \\ \mathbf{P}(t|t-1)= & {} \mathbf{F}(t)\mathbf{P}(t-1|t-1)\mathbf{F}^T(t) + \mathcal{Q}(t) \end{aligned}$$

Update:

$$\begin{aligned} \mathbf{y}(t)= & {} \mathbf{z}(t) - \mathbf{h}(\hat{\mathbf{x}}(t|t-1))\\ \mathbf{H}(t)= & {} \left[ \frac{\partial \mathbf{h}}{\partial \mathbf{x}}\right] _{\hat{\mathbf{x}}(t|t-1)}\\ \mathbf{S}(t)= & {} \mathbf{H}(t)\mathbf{P}(t|t-1)\mathbf{H}^T(t) + \mathcal{R}(t)\\ \mathbf{K}(t)= & {} \mathbf{P}(t|t-1)\mathbf{H}^T(t)\mathbf{S}^{-1}(t)\\ \hat{\mathbf{x}}(t|t)= & {} \hat{\mathbf{x}}(t|t-1)+\mathbf{K}(t)\mathbf{y}(t)\\ \mathbf{P}(t|t)= & {} (\mathbf{I}-\mathbf{K}(t)\mathbf{H}(t))\mathbf{P}(t|t-1) \end{aligned}$$

We employ the above filter model by seeding our initial estimate of \(\hat{\mathbf{x}}(t|t-1)\) and \(\mathbf{P}(t|t-1)\) with random values, then applying the update equations by monitoring \(\mathbf{z}(t)\) to get \(\hat{\mathbf{x}}(t|t)\) and \(\mathbf{P}(t|t)\), and finally using the predict values to arrive at the estimated \(\hat{\mathbf{x}}(t|t-1)\) and \(\mathbf{P}(t|t-1)\). We continue this process iteratively at each 10-s monitoring interval to derive new estimates of the system state.
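The predict and update phases above can be exercised end to end. The following is an illustrative sketch, not the authors' implementation: the Jacobian \(\mathbf{H}(t)\) is approximated by finite differences rather than derived analytically, \(\mathbf{F}(t)\) is the identity as stated above, and the arrival rates, noise covariances \(\mathcal{Q}\) and \(\mathcal{R}\), and seed state are made-up values.

```python
import numpy as np

def h(x, lam, n):
    """Measurement function z = h(x) from Eqs. (3) and (4).
    x packs (u0_1..u0_3, d_1..d_3, S_11..S_33); z packs (u_1..u_3, R_1..R_3)."""
    u0, d = x[:3], x[3:6]
    S = x[6:].reshape(3, 3)                  # S[i, j]: class i, tier j
    u = u0 + (lam @ S) / n                   # Eq. (3), with lam_ij = lam_i / n_j
    R = d + (S / (1.0 - u)).sum(axis=1)      # Eq. (4), sum over tiers j
    return np.concatenate([u, R])

def jacobian(x, lam, n, eps=1e-7):
    """6x15 Jacobian H(t) = [dh/dx] at x, by finite differences."""
    base = h(x, lam, n)
    H = np.zeros((base.size, x.size))
    for k in range(x.size):
        xp = x.copy()
        xp[k] += eps
        H[:, k] = (h(xp, lam, n) - base) / eps
    return H

def ekf_step(x_est, P, z, lam, n, Q, Rcov):
    """One predict + update iteration of the extended Kalman filter (F = I)."""
    # Predict: a priori state and error covariance
    x_pred, P_pred = x_est, P + Q
    # Update: refine with the current observation z
    y = z - h(x_pred, lam, n)                        # innovation
    H = jacobian(x_pred, lam, n)
    S_mat = H @ P_pred @ H.T + Rcov                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S_mat)          # Kalman gain
    x_new = x_pred + K @ y                           # a posteriori state
    P_new = (np.eye(x_est.size) - K @ H) @ P_pred    # a posteriori covariance
    return x_new, P_new

# Synthetic check: generate measurements from a known "true" state and
# verify the filter drives the predicted measurements toward them.
lam = np.array([20.0, 10.0, 5.0])    # arrival rates per class (illustrative)
n = np.array([2.0, 1.0, 1.0])        # servers per tier
x0 = np.concatenate([[0.05] * 3, [0.002] * 3, [0.01] * 9])  # seed estimate
x_true = 1.2 * x0                    # hidden parameters to be recovered
z = h(x_true, lam, n)                # monitored utilizations, response times
x_est, P = x0.copy(), np.eye(15) * 1e-2
Q, Rcov = np.eye(15) * 1e-6, np.eye(6) * 1e-8
for _ in range(20):                  # one step per monitoring interval
    x_est, P = ekf_step(x_est, P, z, lam, n, Q, Rcov)
```

Here the hidden parameters are held fixed; in the live system, each 10-s monitoring interval supplies a fresh \(\mathbf{z}(t)\), so the same loop tracks parameters that drift over time.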

About this article


Cite this article

Gandhi, A., Dube, P., Karve, A. et al. Model-driven optimal resource scaling in cloud. Softw Syst Model 17, 509–526 (2018). https://doi.org/10.1007/s10270-017-0584-y

