Essential Knowledge: Parallel Programming
At the heart of any big data system is a multitude of processes and algorithms that run in parallel to crunch data and produce results that would take far too long to compute sequentially. Parallel computing is what enables companies like Google to index the Internet and to provide large-scale services such as email and video streaming. Once a workload can be distributed effectively over multiple processing units, scaling the computation horizontally becomes much easier. In this chapter, we explore how to parallelize work among concurrent processing units; for the most part, the same concepts apply whether those units are threads within a single process, multiple processes on one machine, or processes spread across many machines. If you haven't already read about process management and scheduling in Sect. 3.4 of Chap. 3, now would be a good time to do so.
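To make the idea of distributing a workload over concurrent processing units concrete, the following is a minimal sketch (not from the chapter itself) of splitting a summation across a fixed pool of threads in Java. The class name `ParallelSum` and the chunking strategy are illustrative assumptions; the point is that each thread independently processes a disjoint slice of the data, and the partial results are combined at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    // Illustrative sketch: split [0, data.length) into contiguous chunks
    // and sum each chunk on its own worker thread.
    static long parallelSum(long[] data, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            int chunk = (data.length + nThreads - 1) / nThreads; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                final int lo = t * chunk;
                final int hi = Math.min(data.length, lo + chunk);
                partials.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = lo; i < hi; i++) s += data[i];
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) total += f.get(); // blocks until each chunk is done
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(parallelSum(data, 4)); // sum of 1..1,000,000 = 500000500000
    }
}
```

Because the chunks are disjoint, the worker threads share no mutable state and need no locking; only the final combination step synchronizes, via `Future.get()`. The same decomposition pattern carries over to multiple processes or machines, with the thread pool replaced by whatever communication mechanism connects the workers.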