Abstract
Apache Spark is the data engineer's Swiss Army knife. As a unified framework, it provides the libraries that engineers from different disciplines need to work together from a common data narrative. From ingestion and validation of raw data, through cleansing, transformation, and aggregation, to analytical exploration of trends and generation of insights, Spark connects the stages of any successful data operation. It also supports consistent (serializable) pipelines for feature engineering and robust machine learning.
Copyright information
© 2022 The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature
About this chapter
Cite this chapter
Haines, S. (2022). Getting Started with Apache Spark. In: Modern Data Engineering with Apache Spark. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-7452-1_2
DOI: https://doi.org/10.1007/978-1-4842-7452-1_2
Published:
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-7451-4
Online ISBN: 978-1-4842-7452-1
eBook Packages: Professional and Applied Computing; Apress Access Books; Professional and Applied Computing (R0)
