Study of Big Data Analytics Tool: Apache Spark

Big Data Analytics in Cognitive Social Media and Literary Texts

Abstract

In this chapter, we discuss machine learning and Big Data, along with sample applications, the machine learning process, and commonly used techniques such as classification and clustering. These techniques are used to explore, evaluate, and leverage data. We also discuss tools and techniques for developing machine learning schemes that learn from data, including Big Data. In addition, we present in detail the role of distributed computing platforms such as Apache Spark in applying machine learning to Big Data. Apache Spark is an open-source, general-purpose cluster computing framework that works on the principle of distributed processing and is designed for fast computation. It can process data as soon as the data arrives, and it supports both batch processing of historical data and real-time processing. Machine learning is a subfield of Artificial Intelligence that focuses on models that learn from experience, which for a machine means data. For example, a machine learning model can learn to recognize an image of a dog by being shown a large number of dog images. We assume that the reader has a basic understanding of machine learning. After going through this chapter, readers will be able to:

  i. Discuss machine learning with Big Data, including its characteristics, sources, and applications.

  ii. Understand the comparative working of Apache Spark.

  iii. Analyze the various types of problems to identify suitable techniques.

  iv. Develop models using open-source tools like Skills Network Labs and IBM Cloud.

  v. Explore problems of Big Data using machine learning techniques with Apache Spark.
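
The principle of distributed processing mentioned in the abstract can be illustrated with a minimal sketch. It is plain Python, not Spark's API: the function names (`split_into_partitions`, `partial_stats`, `merge_stats`, `distributed_mean`) and the sample data are hypothetical and chosen only for illustration. The idea is that a dataset is split into partitions, each partition is processed independently (on a cluster this would happen on separate workers), and the small partial results are then merged.

```python
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def partial_stats(partition):
    """Work done independently on one partition: return (count, sum)."""
    return (len(partition), sum(partition))


def merge_stats(a, b):
    """Combine two partial results into one."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(records, num_partitions=4):
    """Compute a mean by merging per-partition partial results."""
    partitions = split_into_partitions(records, num_partitions)
    # On a real cluster each call below would run on a different worker;
    # here the calls run one after another in a single process.
    partials = [partial_stats(p) for p in partitions]
    count, total = reduce(merge_stats, partials, (0, 0))
    return total / count if count else 0.0


if __name__ == "__main__":
    readings = [3, 5, 8, 13, 21, 34]  # made-up sample values
    print(distributed_mean(readings, num_partitions=3))
```

Only the `(count, sum)` pairs travel between the partition step and the merge step, which is why this pattern scales to data that does not fit on one machine. A framework such as Apache Spark additionally handles scheduling, data placement, and fault tolerance, none of which this sketch attempts.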

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Prajapati, G.L., Raghuwanshi, R. (2021). Study of Big Data Analytics Tool: Apache Spark. In: Sharma, S., Rahaman, V., Sinha, G.R. (eds) Big Data Analytics in Cognitive Social Media and Literary Texts. Springer, Singapore. https://doi.org/10.1007/978-981-16-4729-1_4
