Resource Distribution Estimation for Data-Intensive Workloads: Give Me My Share & No One Gets Hurt!

Khoshkbarforoushha, Alireza; Ranjan, Rajiv; Strazdins, Peter

doi:10.1007/978-3-319-33313-7_17

Part of the book series: Communications in Computer and Information Science (CCIS, volume 567)

Included in the following conference series:

European Conference on Service-Oriented and Cloud Computing

Abstract

Robust resource share estimation of data-intensive workloads is integral to efficient workload management in a (virtualized) cluster where multiple systems co-exist and share the same infrastructure. However, developing a reliable resource estimator is challenging due to (i) the heterogeneity of workloads (e.g., stream processing, batch processing, and transactional) in a multi-system shared cluster, (ii) limited (in batch processing) or complete (in stream processing) uncertainty about input data size or arrival rates, and (iii) configurations that change from run to run. To address these challenges, we propose an inclusive framework and related techniques for workload profiling, similar job identification, and resource distribution prediction in a cluster. Our analysis shows that the framework can successfully estimate the whole spectrum of resource usage as probability distribution functions for a wide range of data-intensive workloads.
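
As a rough illustration of why a distribution can be more useful than a single point estimate, the sketch below builds an empirical distribution from the resource usage of past runs that are assumed to be similar to a new job. It is not the authors' framework or prediction technique: the sample values, the variable `similar_job_memory_gb`, and the helper functions `empirical_cdf` and `quantile` are hypothetical and exist only for this example.

```python
from bisect import bisect_right
from math import ceil


def empirical_cdf(samples):
    """Return a function F where F(x) is the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def quantile(samples, q):
    """Nearest-rank quantile: smallest observed value v with F(v) >= q."""
    ordered = sorted(samples)
    rank = max(1, ceil(q * len(ordered)))
    return ordered[rank - 1]


# Hypothetical peak-memory readings (GB) from past runs assumed similar
# to the job being scheduled. The two large values mimic runs that
# received a bigger input or a different configuration.
similar_job_memory_gb = [3.1, 3.4, 2.9, 3.3, 6.8, 3.2, 3.0, 7.1, 3.5, 3.2]

cdf = empirical_cdf(similar_job_memory_gb)
mean_gb = sum(similar_job_memory_gb) / len(similar_job_memory_gb)
p90_gb = quantile(similar_job_memory_gb, 0.90)
prob_within_4gb = cdf(4.0)

print(f"mean estimate:       {mean_gb:.2f} GB")
print(f"90th percentile:     {p90_gb:.2f} GB")
print(f"P(usage <= 4.0 GB):  {prob_within_4gb:.2f}")
```

With these made-up samples, the mean sits between the two clusters of runs and describes neither of them well, whereas the distribution shows that a 4 GB allocation would cover only part of the observed runs. A scheduler could use such quantiles to choose an allocation that matches its tolerance for under-provisioning.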

Notes

1. http://www.microsoft.com/en-us/download/details.aspx?id=43376.
2. Due to the large number of configuration parameters, only a subset of settings which have substantial impacts on resource and performance measures need to be logged.
3. https://github.com/SWIMProjectUCB/SWIM/wiki.
4. http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-enterprise/cloudera-manager.html.

Author information

Authors and Affiliations

Australian National University, Canberra, Australia
Alireza Khoshkbarforoushha, Rajiv Ranjan & Peter Strazdins
CSIRO, Canberra, Australia
Alireza Khoshkbarforoushha & Rajiv Ranjan

Corresponding author

Correspondence to Alireza Khoshkbarforoushha.

Editor information

Editors and Affiliations

DICIEAMA, University of Messina, Messina, Italy
Antonio Celesti
Software Evolution and Architecture Lab, University of Zürich, Zürich, Switzerland
Philipp Leitner

Cite this paper

Khoshkbarforoushha, A., Ranjan, R., Strazdins, P. (2016). Resource Distribution Estimation for Data-Intensive Workloads: Give Me My Share & No One Gets Hurt!. In: Celesti, A., Leitner, P. (eds) Advances in Service-Oriented and Cloud Computing. ESOCC 2015. Communications in Computer and Information Science, vol 567. Springer, Cham. https://doi.org/10.1007/978-3-319-33313-7_17

DOI: https://doi.org/10.1007/978-3-319-33313-7_17
Published: 27 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33312-0
Online ISBN: 978-3-319-33313-7
eBook Packages: Computer Science, Computer Science (R0)