Towards Systematic Parallel Programming over MapReduce

  • Yu Liu
  • Zhenjiang Hu
  • Kiminori Matsuzaki
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6853)

Abstract

MapReduce is a useful and popular programming model for data-intensive distributed parallel computing. However, developing MapReduce programs systematically remains a challenge, because it is usually not easy to derive a divide-and-conquer algorithm that fits the MapReduce model. In this paper, we propose Screwdriver, a homomorphism-based framework for systematic parallel programming with MapReduce that makes use of the program calculation theory of list homomorphisms. Screwdriver is implemented as a Java library on top of Hadoop. For any problem that can be solved by two sequential functions satisfying the requirements of the third homomorphism theorem (one computing the result from left to right, the other from right to left), Screwdriver can automatically derive a parallel algorithm in the form of a list homomorphism and transform the initial sequential programs into an efficient MapReduce program. Users need neither care about parallelism nor have deep knowledge of MapReduce. Besides the simplicity of its programming model, this calculational approach lets us solve many problems that would be nontrivial to solve directly with MapReduce.
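
The pipeline described in the abstract can be illustrated in miniature. What follows is a minimal Python sketch, not Screwdriver's Java API: it assumes the textbook maximum-prefix-sum problem (tupled with the list sum) as the running example, writes the leftward and rightward sequential functions that the third homomorphism theorem asks for, and then hand-writes the associative combining operator that, per the abstract, Screwdriver would derive automatically. A single-process simulation of split, map, shuffle, and reduce stands in for Hadoop. The names `h_leftward`, `h_rightward`, `combine`, and `map_reduce` are hypothetical.

```python
from functools import reduce

# Hypothetical running example: h(xs) = (mps(xs), sum(xs)), where mps is the
# maximum prefix sum and the empty prefix counts as 0. Tupling with sum is
# what lets h be computed in both directions.

def h_leftward(xs):
    """Sequential left-to-right version: h(xs ++ [a]) from h(xs) and a."""
    m, s = 0, 0
    for a in xs:
        s += a          # running sum of the prefix seen so far
        m = max(m, s)   # best prefix sum seen so far
    return (m, s)

def h_rightward(xs):
    """Sequential right-to-left version: h([a] ++ xs) from a and h(xs)."""
    m, s = 0, 0
    for a in reversed(xs):
        # Either the empty prefix, or a followed by the best prefix of the tail.
        m = max(0, a + m)
        s = a + s
    return (m, s)

# Because h has both a leftward and a rightward definition, the third
# homomorphism theorem guarantees an associative operator with
# h(xs ++ ys) == combine(h(xs), h(ys)). In this sketch the operator is
# written by hand rather than derived automatically.
def combine(left, right):
    m1, s1 = left
    m2, s2 = right
    return (max(m1, s1 + m2), s1 + s2)

IDENTITY = (0, 0)  # unit of combine, equal to h([])

def map_reduce(xs, num_chunks):
    """Simulate a MapReduce job in one process (no Hadoop involved)."""
    size = max(1, -(-len(xs) // num_chunks))  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    # Map phase: each chunk is processed independently by a sequential function.
    mapped = [(idx, h_leftward(chunk)) for idx, chunk in enumerate(chunks)]
    # Shuffle phase: sort by chunk index, because combine is not commutative.
    mapped.sort(key=lambda kv: kv[0])
    # Reduce phase: fold the partial results with the associative operator.
    return reduce(combine, (value for _, value in mapped), IDENTITY)

if __name__ == "__main__":
    data = [3, -4, 5, -1, 2, -6, 4]
    print(h_leftward(data), h_rightward(data), map_reduce(data, 3))
    # (5, 3) (5, 3) (5, 3)
```

Because `combine` is associative but not commutative, the reduce step must see the partial results in their original chunk order; the sketch keeps the chunk index as the key for that reason. Associativity is what allows the chunks to be sized and distributed arbitrarily without changing the result.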

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yu Liu, The Graduate University for Advanced Studies, Japan
  • Zhenjiang Hu, National Institute of Informatics, Japan
  • Kiminori Matsuzaki, School of Information, Kochi University of Technology, Japan