Abstract
Spark is a distributed framework for parallel processing. Parallel algorithms require both computation and communication between machines. During communication, machines send or exchange data, a step known as shuffling.
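To make the idea of shuffling concrete, here is a minimal pure-Python sketch. It is not Spark's implementation and uses no Spark API. The function names `shuffle_by_key` and `reduce_by_key`, the sample records, and the use of plain lists to stand in for machines are all assumptions made for illustration.

```python
from collections import defaultdict


def shuffle_by_key(partitions, num_partitions):
    """Redistribute (key, value) records so that equal keys share a partition.

    Hypothetical helper: each inner list stands in for the data on one machine.
    """
    shuffled = [[] for _ in range(num_partitions)]
    for partition in partitions:
        for key, value in partition:
            # Hash partitioning: the key alone decides the destination.
            target = hash(key) % num_partitions
            shuffled[target].append((key, value))
    return shuffled


def reduce_by_key(partition):
    """Sum the values for each key within a single partition."""
    totals = defaultdict(int)
    for key, value in partition:
        totals[key] += value
    return dict(totals)


# Two illustrative "machines", each holding part of the data.
partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]

# After the shuffle, every record for a given key sits in one partition,
# so each partition can be reduced independently.
shuffled = shuffle_by_key(partitions, num_partitions=2)
results = [reduce_by_key(p) for p in shuffled]
```

Both records with key `"a"` start on different machines and must be moved to the same place before they can be summed. That movement is the cost a shuffle adds on top of the computation itself.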
Copyright information
© 2018 Raju Kumar Mishra
About this chapter
Cite this chapter
Mishra, R.K. (2018). Optimizing PySpark and PySpark Streaming. In: PySpark Recipes. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-3141-8_7
DOI: https://doi.org/10.1007/978-1-4842-3141-8_7
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-3140-1
Online ISBN: 978-1-4842-3141-8
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)