Abstract
This chapter discusses machine learning and Big Data, along with sample applications, the machine learning process, and commonly used techniques such as classification and clustering. These techniques are used to explore, evaluate, and leverage data. The chapter also covers tools and techniques for developing machine learning schemes that learn from data, including Big Data, and presents in detail the role of distributed computing platforms such as Apache Spark in applying machine learning to Big Data. Apache Spark is a general-purpose, open-source cluster computing framework built on the principle of distributed processing. It is designed for fast computation and can process data as soon as it arrives: it handles historical data through batch processing and incoming data through real-time processing. Machine learning is a subfield of Artificial Intelligence whose main focus is building models that learn from experience (which, for machines, means data). For example, a machine learning model can learn to recognize an image of a dog by being shown a large number of dog images. This chapter assumes that the reader has a basic understanding of machine learning. After going through this chapter, readers will be able to:
i. Discuss machine learning with Big Data, including its characteristics, sources, and applications.
ii. Understand the comparative working of Apache Spark.
iii. Analyze the various types of problems to identify suitable techniques.
iv. Develop models using open-source tools like Skill Network Lab and IBM Cloud.
v. Explore problems of Big Data using machine learning techniques with Apache Spark.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Prajapati, G.L., Raghuwanshi, R. (2021). Study of Big Data Analytics Tool: Apache Spark. In: Sharma, S., Rahaman, V., Sinha, G.R. (eds) Big Data Analytics in Cognitive Social Media and Literary Texts. Springer, Singapore. https://doi.org/10.1007/978-981-16-4729-1_4
DOI: https://doi.org/10.1007/978-981-16-4729-1_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4728-4
Online ISBN: 978-981-16-4729-1
eBook Packages: Literature, Cultural and Media Studies (R0)