Orchestration Tools for Big Data

Garg, Saurabh; Wang, Siqi; Ranjan, Rajiv

doi:10.1007/978-3-319-77525-8_43

Reference work entry
First Online: 01 January 2019

Introduction

With the increasing digitization of data and the advancement of storage and communication technologies, we are generating quintillions of bytes of data every day (Goloboff 1999). According to recent studies, the data collected will reach as much as 40 yottabytes by 2020 (Chen et al. 2016). Every second, more than 500 million tweets are posted on Twitter-like social media sites. The Square Kilometer Array (SKA) radio telescopes can transmit a massive 155.7 terabytes per second. This data explosion is called big data, a term for a quantity of data so large that traditional data architectures cannot handle it efficiently. According to NIST (Grady et al. 2014), big data is characterized by four properties, namely, volume (size of the dataset), variety (data from multiple repositories, domains, or types), velocity (rate of data ingestion), and variability (change in data characteristics). The big data paradigm has brought both opportunities and challenges. Leaders from...

References

Ardagna D, Casale G, Ciavotta M, Pérez JF, Wang W (2014) Quality-of-service in cloud computing: modeling techniques and their applications. J Internet Serv Appl 5(1):11
Beygelzimer A, Riabov A, Sow DM, Turaga DS, Udrea O (2013) Big data exploration via automated orchestration of analytic workflows. In: ICAC, pp 153–158
Chen S, Zhao J (2014) The requirements, challenges, and technologies for 5G of terrestrial mobile telecommunication. IEEE Commun Mag 52(5):36–43
Chen X, Li J, Weng J, Ma J, Lou W (2016) Verifiable computation over large database with incremental updates. IEEE Trans Comput 65(10):3184–3195
Chodorow K (2013) MongoDB: the definitive guide: powerful and scalable data storage. O’Reilly Media, Inc., Beijing
Goloboff PA (1999) Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15(4):415–428
Grady NW, Underwood M, Roy A, Chang WL (2014) Big data: challenges, practices and technologies: NIST big data public working group workshop at IEEE big data 2014. In: 2014 IEEE international conference on Big Data (Big Data). IEEE, pp 11–15
Huai Y, Chauhan A, Gates A, Hagleitner G, Hanson EN, O’Malley O, Pandey J, Yuan Y, Lee R, Zhang X (2014) Major technical advancements in Apache Hive. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. ACM, pp 1235–1246
Josephsen D (2007) Building a monitoring infrastructure with Nagios. Prentice Hall PTR, Upper Saddle River
Kakadia D (2015) Apache Mesos essentials. Packt Publishing Ltd, Birmingham
Karun AK, Chitharanjan K (2013) A review on Hadoop—HDFS infrastructure extensions. In: 2013 IEEE conference on information & communication technologies (ICT). IEEE, pp 132–137
Kim J, Yu SY, Park J (2016) Performance evaluation of multithreaded computations for cpu bounded task. In: 2016 international conference on platform technology and service (PlatCon). IEEE, pp 1–5
Klein S (2017) Azure data factory. In: IoT solutions in Microsoft’s Azure IoT Suite. Springer, Apress, pp 105–122
Lama P, Zhou X (2012) Aroma: automated resource allocation and configuration of MapReduce environment in the cloud. In: Proceedings of the 9th international conference on autonomic computing. ACM, pp 63–72
Lee G, Katz RH (2011) Heterogeneity-aware resource allocation and scheduling in the cloud. In: HotCloud
Massie ML, Chun BN, Culler DE (2004) The ganglia distributed monitoring system: design, implementation, and experience. Parallel Comput 30(7):817–840
Ranjan R, Garg S, Khoskbar AR, Solaiman E, James P, Georgakopoulos D (2017) Orchestrating bigdata analysis workflows. IEEE Cloud Comput 4(3):20–28
Sharma B, Prabhakar R, Lim S-H, Kandemir MT, Das CR (2012) Mrorchestrator: a fine-grained resource orchestration framework for MapReduce clusters. In: 2012 IEEE 5th international conference on cloud computing (CLOUD). IEEE, pp 1–8
van der Veen JS, van der Waaij B, Lazovik E, Wijbrandi W, Meijer RJ (2015) Dynamically scaling Apache storm for the analysis of streaming data. In: 2015 IEEE first international conference on big data computing service and applications (BigDataService). IEEE, pp 154–161
Vavilapalli VK, Murthy AC, Douglas C, Agarwal S, Konar M, Evans R, Graves T, Lowe J, Shah H, Seth S et al (2013) Apache Hadoop YARN: Yet another resource negotiator. In: Proceedings of the 4th annual symposium on cloud computing. ACM, p 5
Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X, Rosen J, Venkataraman S, Franklin MJ et al (2016) Apache Spark: a unified engine for Big Data processing. Commun ACM 59(11):56–65

Author information

Authors and Affiliations

School of Engineering and ICT, University of Tasmania, Hobart, TAS, Australia
Saurabh Garg & Siqi Wang
School of Computing Science, Newcastle University, Newcastle upon Tyne, UK
Rajiv Ranjan

Corresponding authors

Correspondence to Saurabh Garg, Siqi Wang or Rajiv Ranjan.

Editor information

Editors and Affiliations

Institute of Computer Science, University of Tartu, Tartu, Estonia
Sherif Sakr
School of Information Technologies, Sydney University, Sydney, Australia
Albert Y. Zomaya

About this entry

Cite this entry

Garg, S., Wang, S., Ranjan, R. (2019). Orchestration Tools for Big Data. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_43

DOI: https://doi.org/10.1007/978-3-319-77525-8_43
Published: 20 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77524-1
Online ISBN: 978-3-319-77525-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering
