Resource Usage Prediction in Distributed Key-Value Datastores

  • Francisco Cruz
  • Francisco Maia
  • Miguel Matos
  • Rui Oliveira
  • João Paulo
  • José Pereira
  • Ricardo Vilaça
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9687)


To deliver on the promises of the Cloud Computing paradigm, systems need to adapt transparently to changes in their environment. Such adaptation benefits from the ability to predict those changes so that they can be handled seamlessly. In this paper, we present a mechanism to accurately predict the resource usage of distributed key-value datastores. Our mechanism requires offline training but, in contrast with other approaches, it is sufficient to run the training only once per hardware configuration and subsequently use the result for online prediction of database performance under any circumstance. The mechanism estimates the database resource usage for any request distribution with an average accuracy of 94%, knowing only two parameters: (i) cache hit ratio; and (ii) incoming throughput. Both input values can be observed in real time or synthesized for request allocation decisions. This novel approach is simple and generic, and it is also suitable for other practical applications.
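
The two-input idea described above (train once offline, then predict online from cache hit ratio and incoming throughput) can be illustrated with a small sketch. The abstract does not state the model form, so everything below is an assumption made for illustration only: the linear least-squares fit, the feature choice (throughput and miss traffic), the function names, and the synthetic training samples, which are generated from a made-up formula and are not measurements from the paper.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (cache hit ratio, throughput, observed usage)


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: training samples are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def features(hit_ratio: float, throughput: float) -> List[float]:
    # Hypothetical feature choice: intercept, total requests, cache-miss requests.
    return [1.0, throughput, throughput * (1.0 - hit_ratio)]


def train(samples: Sequence[Sample]) -> List[float]:
    """Offline step: ordinary least squares via the normal equations."""
    rows = [features(h, t) for h, t, _ in samples]
    ys = [usage for _, _, usage in samples]
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve_linear_system(xtx, xty)


def predict(weights: Sequence[float], hit_ratio: float, throughput: float) -> float:
    """Online step: estimate resource usage from the two observed inputs."""
    return sum(w * f for w, f in zip(weights, features(hit_ratio, throughput)))


# Synthetic, noise-free samples from an invented formula (NOT data from the paper):
# usage = 2.0 + 0.004 * throughput + 0.01 * throughput * (1 - hit_ratio)
SYNTHETIC_SAMPLES: List[Sample] = [
    (h, t, 2.0 + 0.004 * t + 0.01 * t * (1.0 - h))
    for h in (0.2, 0.5, 0.9)
    for t in (500.0, 1000.0, 2000.0)
]

if __name__ == "__main__":
    weights = train(SYNTHETIC_SAMPLES)
    print(predict(weights, hit_ratio=0.7, throughput=1500.0))
```

Because the synthetic samples are noise-free, the fit recovers the invented coefficients up to rounding; with real measurements one would expect residual error and would likely need to validate the assumed model form per hardware configuration.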



This work is part-funded by: ERDF - European Regional Development Fund through the Operational Programme for Competitiveness and Internationalization - COMPETE 2020 Programme, and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project POCI-01-0145-FEDER-006961; and project LeanBigData (FP7-619606).



Copyright information

© IFIP International Federation for Information Processing 2016

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License, which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Francisco Cruz (1)
  • Francisco Maia (1)
  • Miguel Matos (1)
  • Rui Oliveira (1)
  • João Paulo (1)
  • José Pereira (1)
  • Ricardo Vilaça (1)

  1. INESC TEC and Minho University, Braga, Portugal
