Skip to main content

Advertisement

SpringerLink
Log in
Menu
Find a journal Publish with us
Search
Cart
Book cover

European Symposium on Programming

ESOP 2012: Programming Languages and Systems pp 254–273Cite as

  1. Home
  2. Programming Languages and Systems
  3. Conference paper
Generate, Test, and Aggregate

Generate, Test, and Aggregate

A Calculation-based Framework for Systematic Parallel Programming with MapReduce

  • Kento Emoto17,
  • Sebastian Fischer18 &
  • Zhenjiang Hu18 
  • Conference paper
  • 1393 Accesses

  • 9 Citations

Part of the Lecture Notes in Computer Science book series (LNPSE,volume 7211)

Abstract

MapReduce, being inspired by the map and reduce primitives available in many functional languages, is the de facto standard for large scale data-intensive parallel programming. Although it has succeeded in popularizing the use of the two primitives for hiding the details of parallel computation, little effort has been made to emphasize the programming methodology behind, which has been intensively studied in the functional programming and program calculation fields. We show that MapReduce can be equipped with a programming theory in calculational form. By integrating the generate-and-test programing paradigm and semirings for aggregation of results, we propose a novel parallel programming framework for MapReduce. The framework consists of two important calculation theorems: the shortcut fusion theorem of semiring homomorphisms bridges the gap between specifications and efficient implementations, and the filter-embedding theorem helps to develop parallel programs in a systematic and incremental way. We give nontrivial examples that demonstrate how to apply our framework.

Keywords

  • Parallel Algorithm
  • Knapsack Problem
  • Parallel Program
  • Hide State
  • Programming Theory

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Download conference paper PDF

References

  1. Bird, R.: An introduction to the theory of lists. In: Broy, M. (ed.) Logic of Programming and Calculi of Discrete Design, pp. 5–42. Springer, Heidelberg (1987)

    Google Scholar 

  2. Bird, R.: Introduction to Functional Programming using Haskell. Prentice Hall (1998)

    Google Scholar 

  3. Bird, R., de Moor, O.: Algebras of Programming. Prentice Hall (1996)

    Google Scholar 

  4. Chin, W.N., Khoo, S.C., Hu, Z., Takeichi, M.: Deriving Parallel Codes via Invariants. In: SAS 2000. LNCS, vol. 1824, pp. 75–94. Springer, Heidelberg (2000)

    CrossRef  Google Scholar 

  5. Cole, M.: Parallel programming with list homomorphisms. Parallel Processing Letters 5(2), 191–203 (1995)

    CrossRef  Google Scholar 

  6. Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Communications of the ACM 51, 107–113 (2008)

    CrossRef  Google Scholar 

  7. Fisher, A.L., Ghuloum, A.M.: Parallelizing complex scans and reductions. In: Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation (PLDI 1994), pp. 135–146. ACM (1994)

    Google Scholar 

  8. Gill, A., Launchbury, J., Peyton Jones, S.L.: A short cut to deforestation. In: Conference on Functional Programming Languages and Computer Architecture, pp. 223–232 (1993)

    Google Scholar 

  9. Goodman, J.: Semiring parsing. Computational Linguistics 25, 573–605 (1999)

    MathSciNet  Google Scholar 

  10. Gorlatch, S.: Systematic Efficient Parallelization of Scan and Other List Homomorphisms. In: Fraigniaud, P., Mignotte, A., Robert, Y., Bougé, L. (eds.) Euro-Par 1996. LNCS, vol. 1124, pp. 401–408. Springer, Heidelberg (1996)

    CrossRef  Google Scholar 

  11. Grant-Duff, Z., Harrison, P.: Parallelism via homomorphism. Parallel Processing Letters 6(2), 279–295 (1996)

    CrossRef  MathSciNet  Google Scholar 

  12. He, Y.: Extended viterbi algorithm for second order hidden markov process. In: 9th International Conference on Pattern Recognition, vol. 2, pp. 718–720. IEEE Press (1988)

    Google Scholar 

  13. Ho, T.J., Chen, B.S.: Novel extended viterbi-based multiple-model algorithms for state estimation of discrete-time systems with markov jump parameters. IEEE Transactions on Signal Processing 54(2), 393–404 (2006)

    CrossRef  Google Scholar 

  14. Hu, Z., Takeichi, M., Chin, W.N.: Parallelization in calculational forms. In: 25th ACM Symposium on Principles of Programming Languages (POPL 1998), pp. 316–328. ACM Press, San Diego (1998)

    CrossRef  Google Scholar 

  15. Hu, Z., Yokoyama, T., Takeichi, M.: Program Optimizations and Transformations in Calculation Form. In: Lämmel, R., Saraiva, J., Visser, J. (eds.) GTTSE 2005. LNCS, vol. 4143, pp. 144–168. Springer, Heidelberg (2006)

    CrossRef  Google Scholar 

  16. Karloff, H., Suri, S., Vassilvitskii, S.: A model of computation for mapreduce. In: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, pp. 938–948. SIAM (2010)

    Google Scholar 

  17. Lin, J., Dyer, C.: Data-Intensive Text Processing with MapReduce. Morgan and Claypool Publishers (2010)

    Google Scholar 

  18. List, M.A.P. (2011), http://www.mendeley.com/groups/1058401/mapreduce-applications/papers/

  19. Liu, Y., Hu, Z., Matsuzaki, K.: Towards Systematic Parallel Programming over MapReduce. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011, Part II. LNCS, vol. 6853, pp. 39–50. Springer, Heidelberg (2011)

    CrossRef  Google Scholar 

  20. Morita, K., Morihata, A., Matsuzaki, K., Hu, Z., Takeichi, M.: Automatic inversion generates divide-and-conquer parallel programs. In: ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI 2007), pp. 146–155. ACM Press (2007)

    Google Scholar 

  21. Sato, S., Iwasaki, H.: Automatic parallelization via matrix multiplication. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2011), pp. 470–479. ACM (2011)

    Google Scholar 

  22. Skillicorn, D.B.: The Bird-Meertens Formalism as a Parallel Model. In: NATO ARW “Software for Parallel Computation” (1992)

    Google Scholar 

  23. Takano, A., Meijer, E.: Shortcut deforestation in calculational form. In: Proc. Conference on Functional Programming Languages and Computer Architecture, pp. 306–313. La Jolla, California (1995)

    Google Scholar 

  24. Takano, A., Hu, Z., Takeichi, M.: Program transformation in calculational form. ACM Computing Surveys 30(3) (1998)

    Google Scholar 

  25. Thomas, W.: Automata on infinite objects. In: Handbook of Theoretical Computer Science, Formal Models and Sematics, vol. B, pp. 133–192. Elsevier and MIT Press (1990)

    Google Scholar 

  26. Wadler, P.: Theorems for free! In: Proceedings of the Fourth International Conference on Functional Programming Languages and Computer Architecture, FPCA 1989, pp. 347–359. ACM (1989)

    Google Scholar 

  27. White, T.: Hadoop: The Definitive Guide. O’Reilly Media (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

  1. The University of Tokyo, Japan

    Kento Emoto

  2. National Institute of Informatics, Tokyo, Japan

    Sebastian Fischer & Zhenjiang Hu

Authors
  1. Kento Emoto
    View author publications

    You can also search for this author in PubMed Google Scholar

  2. Sebastian Fischer
    View author publications

    You can also search for this author in PubMed Google Scholar

  3. Zhenjiang Hu
    View author publications

    You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

  1. Technische Universität München, Boltzmannstrasse 3, 85748, Garching, Germany

    Helmut Seidl

Rights and permissions

Reprints and Permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Emoto, K., Fischer, S., Hu, Z. (2012). Generate, Test, and Aggregate. In: Seidl, H. (eds) Programming Languages and Systems. ESOP 2012. Lecture Notes in Computer Science, vol 7211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28869-2_13

Download citation

  • .RIS
  • .ENW
  • .BIB
  • DOI: https://doi.org/10.1007/978-3-642-28869-2_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28868-5

  • Online ISBN: 978-3-642-28869-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Share this paper

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Search

Navigation

  • Find a journal
  • Publish with us

Discover content

  • Journals A-Z
  • Books A-Z

Publish with us

  • Publish your research
  • Open access publishing

Products and services

  • Our products
  • Librarians
  • Societies
  • Partners and advertisers

Our imprints

  • Springer
  • Nature Portfolio
  • BMC
  • Palgrave Macmillan
  • Apress
  • Your US state privacy rights
  • Accessibility statement
  • Terms and conditions
  • Privacy policy
  • Help and support

167.114.118.210

Not affiliated

Springer Nature

© 2023 Springer Nature