Orchestration Tools for Big Data

  • Reference work entry

Introduction

With the increasing digitization of data and advances in storage and communication technologies, we are generating quintillions of bytes of data every day (Goloboff 1999). According to recent studies, the data collected will amount to as much as 40 zettabytes by 2020 (Chen et al. 2016). More than 500 million tweets are posted on Twitter every day. The Square Kilometer Array (SKA) radio telescopes can transmit a massive 155.7 terabytes per second. This data explosion is known as big data, which refers to data in such large quantity that it cannot be handled efficiently by traditional data architectures. According to NIST (Grady et al. 2014), big data is characterized by four properties, namely, volume (size of the dataset), variety (data from multiple repositories, domains, or types), velocity (rate of data ingestion), and variability (change in data characteristics). The big data paradigm has brought both opportunities and challenges. Leaders from...

Notes

  1. https://aws.amazon.com/lambda/details/

  2. https://aws.amazon.com/cloudwatch/

  3. http://calculator.s3.amazonaws.com/, http://www.windowsazure.com/en-us/pricing/calculator

References

  • Ardagna D, Casale G, Ciavotta M, Pérez JF, Wang W (2014) Quality-of-service in cloud computing: modeling techniques and their applications. J Internet Serv Appl 5(1):11

  • Beygelzimer A, Riabov A, Sow DM, Turaga DS, Udrea O (2013) Big data exploration via automated orchestration of analytic workflows. In: ICAC, pp 153–158

  • Chen S, Zhao J (2014) The requirements, challenges, and technologies for 5G of terrestrial mobile telecommunication. IEEE Commun Mag 52(5):36–43

  • Chen X, Li J, Weng J, Ma J, Lou W (2016) Verifiable computation over large database with incremental updates. IEEE Trans Comput 65(10):3184–3195

  • Chodorow K (2013) MongoDB: the definitive guide: powerful and scalable data storage. O’Reilly Media, Inc., Beijing

  • Goloboff PA (1999) Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15(4):415–428

  • Grady NW, Underwood M, Roy A, Chang WL (2014) Big data: challenges, practices and technologies: NIST big data public working group workshop at IEEE big data 2014. In: 2014 IEEE international conference on Big Data (Big Data). IEEE, pp 11–15

  • Huai Y, Chauhan A, Gates A, Hagleitner G, Hanson EN, O’Malley O, Pandey J, Yuan Y, Lee R, Zhang X (2014) Major technical advancements in Apache Hive. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. ACM, pp 1235–1246

  • Josephsen D (2007) Building a monitoring infrastructure with Nagios. Prentice Hall PTR, Upper Saddle River

  • Kakadia D (2015) Apache Mesos essentials. Packt Publishing Ltd, Birmingham

  • Karun AK, Chitharanjan K (2013) A review on Hadoop—HDFS infrastructure extensions. In: 2013 IEEE conference on information & communication technologies (ICT). IEEE, pp 132–137

  • Kim J, Yu SY, Park J (2016) Performance evaluation of multithreaded computations for cpu bounded task. In: 2016 international conference on platform technology and service (PlatCon). IEEE, pp 1–5

  • Klein S (2017) Azure data factory. In: IoT solutions in Microsoft’s Azure IoT Suite. Springer, Apress, pp 105–122

  • Lama P, Zhou X (2012) Aroma: automated resource allocation and configuration of MapReduce environment in the cloud. In: Proceedings of the 9th international conference on autonomic computing. ACM, pp 63–72

  • Lee G, Katz RH (2011) Heterogeneity-aware resource allocation and scheduling in the cloud. In: HotCloud

  • Massie ML, Chun BN, Culler DE (2004) The ganglia distributed monitoring system: design, implementation, and experience. Parallel Comput 30(7):817–840

  • Ranjan R, Garg S, Khoskbar AR, Solaiman E, James P, Georgakopoulos D (2017) Orchestrating bigdata analysis workflows. IEEE Cloud Comput 4(3):20–28

  • Sharma B, Prabhakar R, Lim S-H, Kandemir MT, Das CR (2012) Mrorchestrator: a fine-grained resource orchestration framework for MapReduce clusters. In: 2012 IEEE 5th international conference on cloud computing (CLOUD). IEEE, pp 1–8

  • van der Veen JS, van der Waaij B, Lazovik E, Wijbrandi W, Meijer RJ (2015) Dynamically scaling Apache storm for the analysis of streaming data. In: 2015 IEEE first international conference on big data computing service and applications (BigDataService). IEEE, pp 154–161

  • Vavilapalli VK, Murthy AC, Douglas C, Agarwal S, Konar M, Evans R, Graves T, Lowe J, Shah H, Seth S et al (2013) Apache Hadoop YARN: yet another resource negotiator. In: Proceedings of the 4th annual symposium on cloud computing. ACM, p 5

  • Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X, Rosen J, Venkataraman S, Franklin MJ et al (2016) Apache Spark: a unified engine for Big Data processing. Commun ACM 59(11):56–65

Author information

Corresponding authors

Correspondence to Saurabh Garg, Siqi Wang, or Rajiv Ranjan.

Copyright information

© 2019 Springer Nature Switzerland AG

About this entry

Cite this entry

Garg, S., Wang, S., Ranjan, R. (2019). Orchestration Tools for Big Data. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_43
