Journal of Grid Computing, Volume 10, Issue 1, pp 85–108

Cloud Resource Usage—Heavy Tailed Distributions Invalidating Traditional Capacity Planning Models

Article

Abstract

For years, capacity planning professionals have known or suspected that various characteristics of computer usage are not normally distributed. At the same time, much of traditional workload modeling and forecasting is based on mathematical techniques that assume some sort of normality of the underlying distributions. If the actual and assumed distributions differ, the resulting capacity models are of lower quality, with possibly erroneous forecasts and confidence intervals much wider than expected. This paper analyzes the distribution of daily resource usage on three storage clusters over 478 days. For each day we consider the distribution of resource usage by customer account for five different resources: storage used, storage transactions executed, internal network transfer, egress transfer, and inter-data-center transfer. This gives 7170 sample distributions in total (3 clusters × 478 days × 5 resources). All distributions were highly imbalanced, and most sample distributions have tails heavier than log-normal, exponential, or normal distributions. These findings spell significant problems for most models that assume normality. Mathematically, the Central Limit Theorem does not apply to power-law distributions, so the ‘averaging’ effect cannot be counted on to help with modeling using the traditional approach. Operationally, the very high volatility found means that ‘capacity buffers’ need to be large, leading to wasted capacity. Other, administrative, means need to be applied to reduce that. Overall, the distributions of resource usage in cloud storage are so far from normal, even after the usual transformations, that the traditional approach to forecasting and capacity planning needs to be reconsidered. The distributions of log-returns of the time series describing resource usage are much more heavy-tailed than the corresponding distributions for stock indexes. Since no financial professional would use linear regression for stock market analysis and forecasting, it stands to reason that capacity planning should also move toward tools that account for heavy-tailed distributions.
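The abstract does not say which estimators the paper uses to judge tail heaviness, so the following is only a minimal sketch of one common approach: the Hill estimator of the power-law tail exponent, applied to synthetic Pareto data rather than to the paper's measurements. The function names `hill_tail_index` and `log_returns`, the cutoff `k`, and the exponent 1.5 used to generate the data are all assumptions made for illustration. An estimated exponent below 2 would indicate infinite variance, which is the regime where the classical Central Limit Theorem cannot be relied on.

```python
import math
import random


def hill_tail_index(samples, k):
    """Hill estimate of the tail exponent alpha from the k largest samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0 < k < len(samples):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    ordered = sorted(samples, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the Hill estimator needs positive tail values")
    # Sum of log-excesses of the k largest values over the threshold.
    log_excess = sum(math.log(x / threshold) for x in ordered[:k])
    if log_excess == 0:
        raise ValueError("tail values are all equal; estimate is undefined")
    return k / log_excess


def log_returns(series):
    """Log-returns ln(x[t+1] / x[t]) of a strictly positive time series."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]


def demo(seed=42, n=20000, alpha=1.5, k=2000):
    """Estimate the tail exponent of synthetic Pareto 'usage' values."""
    rng = random.Random(seed)
    # Assumption: per-account usage is modeled as Pareto(alpha), minimum 1.
    usage = [rng.paretovariate(alpha) for _ in range(n)]
    return hill_tail_index(usage, k)


if __name__ == "__main__":
    print(f"estimated tail exponent: {demo():.2f}")
```

On real usage data the choice of `k` matters a great deal: too small gives a noisy estimate, too large mixes in the non-power-law body of the distribution. `log_returns` would be applied to a daily usage series before examining the tails of the resulting changes.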

Keywords

Capacity planning · Resource usage · Power law · Probability distributions · Volatility



Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. Microsoft Corporation, Windows Azure, Redmond, USA
