Optimizing PySpark and PySpark Streaming

PySpark Recipes

Abstract

Spark is a distributed framework for parallel processing. Parallel algorithms require both computation and communication between machines. While communicating, machines send or exchange data; this exchange is known as shuffling.
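
To make the idea of shuffling concrete, the following is a minimal pure-Python sketch, not Spark's implementation and not part of the chapter. It assumes three hypothetical "machines", each holding local (key, value) pairs, and redistributes the pairs by a stable hash of the key so that all values for a key land in the same partition before they are summed. The names `partition_for`, `shuffle`, and `sum_by_key` are illustrative only.

```python
from collections import defaultdict
from zlib import crc32


def partition_for(key, num_partitions):
    # Stable hash, so a given key always maps to the same partition.
    return crc32(key.encode("utf-8")) % num_partitions


def shuffle(machine_records, num_partitions):
    """Redistribute (key, value) pairs so equal keys share one partition."""
    partitions = defaultdict(list)
    for records in machine_records:  # each inner list is one machine's local data
        for key, value in records:
            partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def sum_by_key(partitions):
    """Aggregate per key; after the shuffle each key lives in one partition."""
    totals = {}
    for records in partitions.values():
        for key, value in records:
            totals[key] = totals.get(key, 0) + value
    return totals


# Hypothetical input: three machines, each with its own local records.
machines = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("a", 6)],
]

shuffled = shuffle(machines, num_partitions=2)
totals = sum_by_key(shuffled)
print(totals)
```

Every record in this sketch is moved to its target partition, which is the data movement the abstract calls shuffling. In a real cluster that movement crosses the network, which is why it is a common target for optimization.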

Copyright information

© 2018 Raju Kumar Mishra

About this chapter

Cite this chapter

Mishra, R.K. (2018). Optimizing PySpark and PySpark Streaming. In: PySpark Recipes. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-3141-8_7
