Introduction
With the increasing digitization of data and the advancement of storage and communication technologies, we are generating quintillions of bytes of data every day (Goloboff 1999). According to recent studies, the data collected will grow to as much as 40 yottabytes by 2020 (Chen et al. 2016). Every second, more than 500 million tweets are posted on Twitter-like social media sites. The Square Kilometer Array (SKA) radio telescopes can transmit a massive 155.7 terabytes per second. This data explosion is called big data, a term that refers to quantities of data so large that they cannot be handled efficiently by traditional data architectures. According to NIST (Grady et al. 2014), big data is characterized by four properties, namely, volume (size of the dataset), variety (data from multiple repositories, domains, or types), velocity (rate of data ingestion), and variability (change in data characteristics). The big data paradigm has brought both opportunities and challenges. Leaders from...
References
Ardagna D, Casale G, Ciavotta M, Pérez JF, Wang W (2014) Quality-of-service in cloud computing: modeling techniques and their applications. J Internet Serv Appl 5(1):11
Beygelzimer A, Riabov A, Sow DM, Turaga DS, Udrea O (2013) Big data exploration via automated orchestration of analytic workflows. In: ICAC, pp 153–158
Chen S, Zhao J (2014) The requirements, challenges, and technologies for 5G of terrestrial mobile telecommunication. IEEE Commun Mag 52(5):36–43
Chen X, Li J, Weng J, Ma J, Lou W (2016) Verifiable computation over large database with incremental updates. IEEE Trans Comput 65(10):3184–3195
Chodorow K (2013) MongoDB: the definitive guide: powerful and scalable data storage. O’Reilly Media, Inc., Beijing
Goloboff PA (1999) Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15(4):415–428
Grady NW, Underwood M, Roy A, Chang WL (2014) Big data: challenges, practices and technologies: NIST big data public working group workshop at IEEE big data 2014. In: 2014 IEEE international conference on Big Data (Big Data). IEEE, pp 11–15
Huai Y, Chauhan A, Gates A, Hagleitner G, Hanson EN, O’Malley O, Pandey J, Yuan Y, Lee R, Zhang X (2014) Major technical advancements in Apache Hive. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. ACM, pp 1235–1246
Josephsen D (2007) Building a monitoring infrastructure with Nagios. Prentice Hall PTR, Upper Saddle River
Kakadia D (2015) Apache Mesos essentials. Packt Publishing Ltd, Birmingham
Karun AK, Chitharanjan K (2013) A review on Hadoop—HDFS infrastructure extensions. In: 2013 IEEE conference on information & communication technologies (ICT). IEEE, pp 132–137
Kim J, Yu SY, Park J (2016) Performance evaluation of multithreaded computations for cpu bounded task. In: 2016 international conference on platform technology and service (PlatCon). IEEE, pp 1–5
Klein S (2017) Azure data factory. In: IoT solutions in Microsoft’s Azure IoT Suite. Springer, Apress, pp 105–122
Lama P, Zhou X (2012) Aroma: automated resource allocation and configuration of MapReduce environment in the cloud. In: Proceedings of the 9th international conference on autonomic computing. ACM, pp 63–72
Lee G, Katz RH (2011) Heterogeneity-aware resource allocation and scheduling in the cloud. In: HotCloud
Massie ML, Chun BN, Culler DE (2004) The ganglia distributed monitoring system: design, implementation, and experience. Parallel Comput 30(7):817–840
Ranjan R, Garg S, Khoskbar AR, Solaiman E, James P, Georgakopoulos D (2017) Orchestrating bigdata analysis workflows. IEEE Cloud Comput 4(3):20–28
Sharma B, Prabhakar R, Lim S-H, Kandemir MT, Das CR (2012) Mrorchestrator: a fine-grained resource orchestration framework for MapReduce clusters. In: 2012 IEEE 5th international conference on cloud computing (CLOUD). IEEE, pp 1–8
van der Veen JS, van der Waaij B, Lazovik E, Wijbrandi W, Meijer RJ (2015) Dynamically scaling Apache storm for the analysis of streaming data. In: 2015 IEEE first international conference on big data computing service and applications (BigDataService). IEEE, pp 154–161
Vavilapalli VK, Murthy AC, Douglas C, Agarwal S, Konar M, Evans R, Graves T, Lowe J, Shah H, Seth S et al (2013) Apache Hadoop YARN: Yet another resource negotiator. In: Proceedings of the 4th annual symposium on cloud computing. ACM, p 5
Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X, Rosen J, Venkataraman S, Franklin MJ et al (2016) Apache Spark: a unified engine for Big Data processing. Commun ACM 59(11):56–65
Copyright information
© 2019 Springer Nature Switzerland AG
About this entry
Cite this entry
Garg, S., Wang, S., Ranjan, R. (2019). Orchestration Tools for Big Data. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_43
DOI: https://doi.org/10.1007/978-3-319-77525-8_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77524-1
Online ISBN: 978-3-319-77525-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering