Generate, Test, and Aggregate

A Calculation-based Framework for Systematic Parallel Programming with MapReduce
  • Kento Emoto
  • Sebastian Fischer
  • Zhenjiang Hu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7211)

Abstract

MapReduce, being inspired by the map and reduce primitives available in many functional languages, is the de facto standard for large scale data-intensive parallel programming. Although it has succeeded in popularizing the use of the two primitives for hiding the details of parallel computation, little effort has been made to emphasize the programming methodology behind, which has been intensively studied in the functional programming and program calculation fields. We show that MapReduce can be equipped with a programming theory in calculational form. By integrating the generate-and-test programing paradigm and semirings for aggregation of results, we propose a novel parallel programming framework for MapReduce. The framework consists of two important calculation theorems: the shortcut fusion theorem of semiring homomorphisms bridges the gap between specifications and efficient implementations, and the filter-embedding theorem helps to develop parallel programs in a systematic and incremental way. We give nontrivial examples that demonstrate how to apply our framework.

Keywords

Parallel Algorithm Knapsack Problem Parallel Program Hide State Programming Theory 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. 1.
    Bird, R.: An introduction to the theory of lists. In: Broy, M. (ed.) Logic of Programming and Calculi of Discrete Design, pp. 5–42. Springer, Heidelberg (1987)Google Scholar
  2. 2.
    Bird, R.: Introduction to Functional Programming using Haskell. Prentice Hall (1998)Google Scholar
  3. 3.
    Bird, R., de Moor, O.: Algebras of Programming. Prentice Hall (1996)Google Scholar
  4. 4.
    Chin, W.N., Khoo, S.C., Hu, Z., Takeichi, M.: Deriving Parallel Codes via Invariants. In: SAS 2000. LNCS, vol. 1824, pp. 75–94. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  5. 5.
    Cole, M.: Parallel programming with list homomorphisms. Parallel Processing Letters 5(2), 191–203 (1995)CrossRefGoogle Scholar
  6. 6.
    Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Communications of the ACM 51, 107–113 (2008)CrossRefGoogle Scholar
  7. 7.
    Fisher, A.L., Ghuloum, A.M.: Parallelizing complex scans and reductions. In: Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation (PLDI 1994), pp. 135–146. ACM (1994)Google Scholar
  8. 8.
    Gill, A., Launchbury, J., Peyton Jones, S.L.: A short cut to deforestation. In: Conference on Functional Programming Languages and Computer Architecture, pp. 223–232 (1993)Google Scholar
  9. 9.
    Goodman, J.: Semiring parsing. Computational Linguistics 25, 573–605 (1999)MathSciNetGoogle Scholar
  10. 10.
    Gorlatch, S.: Systematic Efficient Parallelization of Scan and Other List Homomorphisms. In: Fraigniaud, P., Mignotte, A., Robert, Y., Bougé, L. (eds.) Euro-Par 1996. LNCS, vol. 1124, pp. 401–408. Springer, Heidelberg (1996)CrossRefGoogle Scholar
  11. 11.
    Grant-Duff, Z., Harrison, P.: Parallelism via homomorphism. Parallel Processing Letters 6(2), 279–295 (1996)MathSciNetCrossRefGoogle Scholar
  12. 12.
    He, Y.: Extended viterbi algorithm for second order hidden markov process. In: 9th International Conference on Pattern Recognition, vol. 2, pp. 718–720. IEEE Press (1988)Google Scholar
  13. 13.
    Ho, T.J., Chen, B.S.: Novel extended viterbi-based multiple-model algorithms for state estimation of discrete-time systems with markov jump parameters. IEEE Transactions on Signal Processing 54(2), 393–404 (2006)CrossRefGoogle Scholar
  14. 14.
    Hu, Z., Takeichi, M., Chin, W.N.: Parallelization in calculational forms. In: 25th ACM Symposium on Principles of Programming Languages (POPL 1998), pp. 316–328. ACM Press, San Diego (1998)CrossRefGoogle Scholar
  15. 15.
    Hu, Z., Yokoyama, T., Takeichi, M.: Program Optimizations and Transformations in Calculation Form. In: Lämmel, R., Saraiva, J., Visser, J. (eds.) GTTSE 2005. LNCS, vol. 4143, pp. 144–168. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  16. 16.
    Karloff, H., Suri, S., Vassilvitskii, S.: A model of computation for mapreduce. In: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, pp. 938–948. SIAM (2010)Google Scholar
  17. 17.
    Lin, J., Dyer, C.: Data-Intensive Text Processing with MapReduce. Morgan and Claypool Publishers (2010)Google Scholar
  18. 18.
  19. 19.
    Liu, Y., Hu, Z., Matsuzaki, K.: Towards Systematic Parallel Programming over MapReduce. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011, Part II. LNCS, vol. 6853, pp. 39–50. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  20. 20.
    Morita, K., Morihata, A., Matsuzaki, K., Hu, Z., Takeichi, M.: Automatic inversion generates divide-and-conquer parallel programs. In: ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI 2007), pp. 146–155. ACM Press (2007)Google Scholar
  21. 21.
    Sato, S., Iwasaki, H.: Automatic parallelization via matrix multiplication. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2011), pp. 470–479. ACM (2011)Google Scholar
  22. 22.
    Skillicorn, D.B.: The Bird-Meertens Formalism as a Parallel Model. In: NATO ARW “Software for Parallel Computation” (1992)Google Scholar
  23. 23.
    Takano, A., Meijer, E.: Shortcut deforestation in calculational form. In: Proc. Conference on Functional Programming Languages and Computer Architecture, pp. 306–313. La Jolla, California (1995)Google Scholar
  24. 24.
    Takano, A., Hu, Z., Takeichi, M.: Program transformation in calculational form. ACM Computing Surveys 30(3) (1998)Google Scholar
  25. 25.
    Thomas, W.: Automata on infinite objects. In: Handbook of Theoretical Computer Science, Formal Models and Sematics, vol. B, pp. 133–192. Elsevier and MIT Press (1990)Google Scholar
  26. 26.
    Wadler, P.: Theorems for free! In: Proceedings of the Fourth International Conference on Functional Programming Languages and Computer Architecture, FPCA 1989, pp. 347–359. ACM (1989)Google Scholar
  27. 27.
    White, T.: Hadoop: The Definitive Guide. O’Reilly Media (2009)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Kento Emoto
    • 1
  • Sebastian Fischer
    • 2
  • Zhenjiang Hu
    • 2
  1. 1.The University of TokyoJapan
  2. 2.National Institute of InformaticsTokyoJapan

Personalised recommendations