Abstract
MapReduce is a useful and popular programming model for data-intensive distributed parallel computing. But it is still a challenge to develop parallel programs with MapReduce systematically, since it is usually not easy to derive a proper divide-and-conquer algorithm that matches MapReduce. In this paper, we propose a homomorphism-based framework named Screwdriver for systematic parallel programming with MapReduce, making use of the program calculation theory of list homomorphisms. Screwdriver is implemented as a Java library on top of Hadoop. For any problem which can be resolved by two sequential functions that satisfy the requirements of the third homomorphism theorem, Screwdriver can automatically derive a parallel algorithm as a list homomorphism and transform the initial sequential programs to an efficient MapReduce program. Users need neither to care about parallelism nor to have deep knowledge of MapReduce. In addition to the simplicity of the programming model of our framework, such a calculational approach enables us to resolve many problems that it would be nontrivial to resolve directly with MapReduce.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Bird, R.S.: Introduction to Functional Programming using Haskell. Prentice-Hall, Englewood Cliffs (1998)
Bird, R.S.: An introduction to the theory of lists. In: Broy, M. (ed.) Logic of Programming and Calculi of Discrete Design. NATO ASI Series F, vol. 36, pp. 5–42. Springer, Heidelberg (1987)
Claessen, K., Hughes, J.: QuickCheck: a lightweight tool for random testing of Haskell programs. In: Odersky, M., Wadler, P. (eds.) ICFP 2000: Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, pp. 268–279. ACM Press, New York (2000)
Cole, M.: Parallel programming with list homomorphisms. Parallel Processing Letters 5(2), 191–203 (1995)
Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: 6th Symposium on Operating System Design and Implementation (OSDI 2004), pp. 137–150 (2004)
Gibbons, J.: The third homomorphism theorem. Journal of Functional Programming 6(4), 657–665 (1996)
Gorlatch, S.: Systematic extraction and implementation of divide-and-conquer parallelism. In: Kuchen, H., Swierstra, S.D. (eds.) PLILP 1996. LNCS, vol. 1140, pp. 274–288. Springer, Heidelberg (1996)
Hu, Z.: Calculational parallel programming. In: HLPP 2010: Proceedings of the Fourth International Workshop on High-level Parallel Programming and Applications, p. 1. ACM Press, New York (2010)
Hu, Z., Iwasaki, H., Takeichi, M.: Formal derivation of efficient parallel programs by construction of list homomorphisms. ACM Transactions on Programming Languages and Systems 19(3), 444–461 (1997)
Hu, Z., Takeichi, M., Chin, W.N.: Parallelization in calculational forms. In: POPL 1998, Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 316–328. ACM Press, New York (1998)
Lämmel, R.: Google’s MapReduce programming model — Revisited. Science of Computer Programming 70(1), 1–30 (2008)
Matsuzaki, K., Iwasaki, H., Emoto, K., Hu, Z.: A library of constructive skeletons for sequential style of parallel programming. In: InfoScale 2006: Proceedings of the 1st International Conference on Scalable Information Systems. ACM International Conference Proceeding Series, vol. 152. ACM Press, New York (2006)
Morihata, A., Matsuzaki, K., Hu, Z., Takeichi, M.: The third homomorphism theorem on trees: downward & upward lead to divide-and-conquer. In: Shao, Z., Pierce, B.C. (eds.) POPL 2009: Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 177–185. ACM Press, New York (2009)
Morita, K., Morihata, A., Matsuzaki, K., Hu, Z., Takeichi, M.: Automatic inversion generates divide-and-conquer parallel programs. In: ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI 2007), pp. 146–155. ACM Press, New York (2007)
Rabhi, F.A., Gorlatch, S. (eds.): Patterns and Skeletons for Parallel and Distributed Computing. Springer, Heidelberg (2002)
Steele Jr. G.L.: Parallel programming and parallel abstractions in fortress. In: Hagiya, M. (ed.) FLOPS 2006. LNCS, vol. 3945, pp. 1–1. Springer, Heidelberg (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, Y., Hu, Z., Matsuzaki, K. (2011). Towards Systematic Parallel Programming over MapReduce. In: Jeannot, E., Namyst, R., Roman, J. (eds) Euro-Par 2011 Parallel Processing. Euro-Par 2011. Lecture Notes in Computer Science, vol 6853. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23397-5_5
Download citation
DOI: https://doi.org/10.1007/978-3-642-23397-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23396-8
Online ISBN: 978-3-642-23397-5
eBook Packages: Computer ScienceComputer Science (R0)