Overview
- Covers the entire range of PySpark’s offerings, from streaming to graph analytics
- Shows how to build standardized workflows for pre-processing and how to build machine learning and deep learning models on big data sets
- Discusses how to schedule different Spark jobs using Airflow
About this book
You'll start by reviewing PySpark fundamentals, such as Spark’s core architecture, and see how to use PySpark for big data processing tasks such as data ingestion, cleaning, and transformation. This is followed by building workflows for analyzing streaming data using PySpark and a comparison of various streaming platforms.
You'll then see how to schedule different Spark jobs using Airflow with PySpark and examine how to tune machine learning and deep learning models for real-time predictions. The book concludes with a discussion of graph frames and of performing network analysis using graph algorithms in PySpark. All the code presented in the book will be available as Python scripts on GitHub.
What You'll Learn
- Develop pipelines for streaming data processing using PySpark
- Build machine learning and deep learning models using PySpark’s latest offerings
- Perform graph analytics using PySpark
- Create sequence embeddings from text data
Who This Book Is For
Data scientists and machine learning and deep learning engineers who want to learn and use PySpark for real-time analysis of streaming data.
Table of contents (8 chapters)
Bibliographic Information
Book Title: Learn PySpark
Book Subtitle: Build Python-based Machine Learning and Deep Learning Models
Authors: Pramod Singh
DOI: https://doi.org/10.1007/978-1-4842-4961-1
Publisher: Apress, Berkeley, CA
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)
Copyright Information: Pramod Singh 2019
Softcover ISBN: 978-1-4842-4960-4 (published 07 September 2019)
eBook ISBN: 978-1-4842-4961-1 (published 06 September 2019)
Edition Number: 1
Number of Pages: XVIII, 210
Number of Illustrations: 155 b/w illustrations, 32 illustrations in colour
Topics: Python, Big Data, Machine Learning, Open Source