Abstract
This chapter covers Spark's oldest and most foundational abstraction: the resilient distributed dataset (RDD). To truly understand how Spark works, you must understand the essence of RDDs, because they provide the solid foundation on which Spark's higher-level abstractions are built. The ideas behind RDDs are fairly unique in the distributed data processing landscape; they were introduced to address the pressing need for complex iterative and interactive data processing to be both expressible and efficient. Starting with Spark 2.0, users have fewer reasons to interact with RDDs directly, but a strong mental model of how RDDs work remains essential. In a nutshell, Spark revolves around the concept of RDDs.
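The two core ideas the abstract alludes to — transformations that are recorded lazily as a lineage, and results that can be recomputed from that lineage (the source of the "resilient" in RDD) — can be sketched in a few lines of plain Python. The `MiniRDD` class below is hypothetical and is not Spark's actual API; it is only a conceptual illustration of the mechanism.

```python
# Conceptual sketch (NOT Spark's API) of the ideas behind RDDs:
# transformations are lazy and only extend a lineage graph; an
# action recomputes results by walking the lineage back to the source.

class MiniRDD:
    def __init__(self, source=None, parent=None, transform=None):
        self.source = source        # base data (only set on the root RDD)
        self.parent = parent        # lineage: pointer to the parent RDD
        self.transform = transform  # function applied to the parent's elements

    def map(self, fn):
        # Lazy: no computation happens here; we only record lineage.
        return MiniRDD(parent=self, transform=fn)

    def collect(self):
        # An action triggers recomputation from the lineage. Because the
        # lineage is retained, a lost result can always be rebuilt.
        if self.parent is None:
            return list(self.source)
        return [self.transform(x) for x in self.parent.collect()]

base = MiniRDD(range(5))
doubled = base.map(lambda x: x * 2)   # lazy: no work done yet
print(doubled.collect())              # [0, 2, 4, 6, 8]
```

In real Spark, the lineage additionally tracks partitioning so that only lost partitions are recomputed, but the principle is the same: fault tolerance comes from remembering how data was derived rather than replicating the data itself.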
Notes
1. "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing"
Copyright information
© 2018 Hien Luu
About this chapter
Cite this chapter
Luu, H. (2018). Resilient Distributed Datasets. In: Beginning Apache Spark 2. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-3579-9_3
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-3578-2
Online ISBN: 978-1-4842-3579-9