Essential Knowledge: Parallel Programming



At the heart of any big data system is a multitude of processes and algorithms that run in parallel to crunch data and produce results that would have taken ages had they run sequentially. Parallel computing is what enables companies like Google to index the Internet and provide big data systems such as email and video streaming. Once workloads can be distributed effectively over multiple processes, scaling the processing horizontally becomes an easier task. In this chapter, we will explore how to parallelize work among concurrent processing units; these concepts apply for the most part whether said processing units are concurrent threads in the same process, or multiple processes running on the same machine or on multiple machines. If you haven’t already read about process management and scheduling in Sect. 3.4 of Chap. 3, now would be a good time to do so.
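To make the idea of distributing work among concurrent processing units concrete, here is a minimal sketch in Java (the class name and the chunk-splitting scheme are illustrative, not taken from the chapter): a fixed thread pool splits an array into independent chunks, sums each chunk on a separate worker, and combines the partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    public static void main(String[] args) throws Exception {
        // Workload: the integers 1..1000; the expected total is 500500.
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;

        int workers = 4;
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<Long>> partials = new ArrayList<>();
        int chunk = data.length / workers;

        // Each worker sums an independent, non-overlapping slice of the array,
        // so no locking is needed while the partial sums are computed.
        for (int w = 0; w < workers; w++) {
            final int start = w * chunk;
            final int end = (w == workers - 1) ? data.length : start + chunk;
            partials.add(pool.submit(() -> {
                long sum = 0;
                for (int i = start; i < end; i++) sum += data[i];
                return sum;
            }));
        }

        // Combine results; Future.get() blocks until that chunk is done.
        long total = 0;
        for (Future<Long> f : partials) total += f.get();
        pool.shutdown();

        System.out.println(total); // prints 500500
    }
}
```

The same split-compute-combine shape carries over when the "workers" are separate processes or separate machines; only the mechanism for shipping chunks and collecting partial results changes.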





Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Amazon, Menlo Park, USA
  2. Voicera, Santa Clara, USA
