Study of Big Data Analytics Tool: Apache Spark

Prajapati, Gend Lal; Raghuwanshi, Rachana

doi:10.1007/978-981-16-4729-1_4

Abstract

In this chapter, we discuss machine learning and Big Data, along with sample applications, the machine learning process, and commonly used techniques such as classification and clustering. These techniques are used to explore, evaluate, and leverage data. We also discuss tools and techniques for developing machine learning schemes that learn from data (including Big Data), and we present in detail the role of distributed computing platforms such as Apache Spark in applying machine learning to Big Data. Apache Spark is a general-purpose, open-source cluster computing framework built on the principle of distributed processing and designed for fast computation. It can process data as soon as it is received, and it supports both batch processing of historical data and real-time processing of incoming data. Machine learning is a subfield of Artificial Intelligence that focuses on models that learn from experience (which, for machines, means data). For example, a machine learning model can learn to recognize an image of a dog by being shown a large number of images of dogs. We assume that the reader has a basic understanding of machine learning. After going through this chapter, readers will be able to:

i. Discuss machine learning with Big Data, including its characteristics, sources, and applications.
ii. Understand the comparative working of Apache Spark.
iii. Analyze the various types of problems to identify suitable techniques.
iv. Develop models using open-source tools like Skill Network Lab and IBM Cloud.
v. Explore problems of Big Data using machine learning techniques with Apache Spark.
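
The principle of distributed processing mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the chapter and does not use the Apache Spark API; it only assumes the general split, process, and combine pattern, and uses Python's standard library with threads standing in for cluster nodes. The function names (`split_into_partitions`, `partial_stats`, `merge_stats`, `distributed_mean`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def partial_stats(partition):
    """Map step: summarize one partition as (count, total)."""
    return (len(partition), sum(partition))


def merge_stats(a, b):
    """Reduce step: combine two partial summaries into one."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(records, num_partitions=4):
    """Compute a mean by summarizing partitions independently, then merging.

    Threads are an assumption made for illustration; in a real cluster
    each partition would be processed on a separate worker node.
    """
    partitions = split_into_partitions(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_stats, partitions))
    count, total = reduce(merge_stats, partials, (0, 0))
    return total / count if count else 0.0
```

Because each partition is summarized without reference to the others, the same pattern applies whether the data arrives as one historical batch or as a sequence of small batches.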

Author information

Authors and Affiliations

Computer Engineering, Devi Ahilya University, Khandwa Road, Indore, Madhya Pradesh, 452017, India
Gend Lal Prajapati
Computer Engineering, SAGE University, Bypass Road, Kailod, Indore, Madhya Pradesh, 452020, India
Rachana Raghuwanshi

Editor information

Editors and Affiliations

CSE&IT, Madhav Institute of Technology & Science, Gwalior, India
Sanjiv Sharma
Madhav Institute of Technology & Science, Gwalior, India
Valiur Rahaman
Electronics & Communications Engineering, Myanmar Institute of Information Technology, Mandalay, Myanmar
G. R. Sinha

About this chapter

Cite this chapter

Prajapati, G.L., Raghuwanshi, R. (2021). Study of Big Data Analytics Tool: Apache Spark. In: Sharma, S., Rahaman, V., Sinha, G.R. (eds) Big Data Analytics in Cognitive Social Media and Literary Texts. Springer, Singapore. https://doi.org/10.1007/978-981-16-4729-1_4

DOI: https://doi.org/10.1007/978-981-16-4729-1_4
Published: 11 October 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4728-4
Online ISBN: 978-981-16-4729-1
eBook Packages: Literature, Cultural and Media Studies; Literature, Cultural and Media Studies (R0)
