Towards Systematic Parallel Programming over MapReduce
MapReduce is a useful and popular programming model for data-intensive distributed parallel computing. Developing parallel programs with MapReduce systematically remains a challenge, however, because it is usually not easy to derive a proper divide-and-conquer algorithm that fits the MapReduce model. In this paper, we propose Screwdriver, a homomorphism-based framework for systematic parallel programming with MapReduce that makes use of the program calculation theory of list homomorphisms. Screwdriver is implemented as a Java library on top of Hadoop. For any problem that can be solved by two sequential functions satisfying the requirements of the third homomorphism theorem, Screwdriver can automatically derive a parallel algorithm as a list homomorphism and transform the initial sequential programs into an efficient MapReduce program. Users need neither concern themselves with parallelism nor have deep knowledge of MapReduce. Besides keeping the programming model simple, this calculational approach makes it possible to solve many problems that would be nontrivial to solve directly with MapReduce.
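To make the idea of deriving a list homomorphism from sequential functions more concrete, here is a minimal sketch in Python rather than in Screwdriver's Java API, which is not shown in this abstract. It assumes one common reading of the third homomorphism theorem: given a sequential function and a right inverse of it, a combining operator can be built as `combine(a, b) = fold(unfold(a) + unfold(b))`. The names `fold`, `unfold`, `combine`, `chunks`, and `parallel_style` are hypothetical, and the problem (maximum prefix sum paired with the total sum) is chosen only for illustration.

```python
from functools import reduce

def fold(xs):
    """Sequential specification: (maximum prefix sum, total sum).
    The empty prefix counts, so the maximum is never below 0."""
    best = total = 0
    for x in xs:
        total += x
        best = max(best, total)
    return (best, total)

def unfold(result):
    """Assumed right inverse of fold: returns a short list that
    fold maps back to `result`, for any result fold can produce."""
    m, s = result
    return [m, s - m]

def combine(a, b):
    """Combining operator obtained from fold and unfold."""
    return fold(unfold(a) + unfold(b))

def chunks(xs, size):
    """Split xs into consecutive pieces, standing in for input splits."""
    return [xs[i:i + size] for i in range(0, len(xs), size)]

def parallel_style(xs, size=3):
    """Map fold over each piece, then reduce the partial results.
    This only simulates the map and reduce phases in one process."""
    partials = map(fold, chunks(xs, size))
    return reduce(combine, partials, fold([]))

data = [3, -4, 5, 2, -6, 1, 4]
print(fold(data))            # sequential result
print(parallel_style(data))  # chunked result, expected to agree
```

The sketch relies on `combine` being associative and on chunks being reduced in their original order; both hold for this example but would need to be checked for any other pair of functions.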
Keywords: Virtual Machine, Parallel Programming, Sequential Program, Reduce Phase, MapReduce Framework