Parallel Tree Accumulations on MapReduce

International Journal of Parallel Programming

Abstract

MapReduce is a remarkable parallel programming model as well as a parallel processing infrastructure for large-scale data processing. Since it is now widely available in cloud environments, developing methodologies and patterns for MapReduce programming is important. In particular, XML is the de facto standard for representing data, and many applications involve processing semi-structured data. The target computational patterns in this paper are tree accumulations: shape-preserving computations over a tree in which values are updated through flows over the tree. We develop BSP algorithms for two tree accumulations as extensions of the BSP algorithm for tree reduction by Kakehi et al. (Tech. Rep. METR 2006-64, Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 2006). We also implement the two-superstep algorithms with a single MapReduce execution. Experimental results on a 16-node PC cluster show good speedups by a factor of 10.9–12.7.
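
To make "shape-preserving" concrete, the following is a minimal sequential sketch in Python of two common tree accumulations on a rose tree. It is an illustration only: the abstract does not name the two accumulations, so upwards and downwards accumulation are assumed here, and the names `Tree`, `upwards_accumulate`, and `downwards_accumulate`, as well as the use of a single binary operator `op`, are hypothetical simplifications rather than the paper's definitions. The sketch is a plain recursive specification, not the BSP or MapReduce algorithm.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Tree:
    """A rose tree: a node value plus any number of child subtrees."""
    value: Any
    children: List["Tree"] = field(default_factory=list)


def upwards_accumulate(t: Tree, op: Callable[[Any, Any], Any]) -> Tree:
    """Replace each node's value by the fold of op over its whole subtree."""
    kids = [upwards_accumulate(c, op) for c in t.children]
    acc = t.value
    for k in kids:  # k.value already summarises that child's subtree
        acc = op(acc, k.value)
    return Tree(acc, kids)


def downwards_accumulate(t: Tree, op: Callable[[Any, Any], Any]) -> Tree:
    """Replace each node's value by the fold of op along the root-to-node path."""
    def go(node: Tree, acc: Any) -> Tree:
        # acc is the accumulated value for `node`; extend it for each child.
        return Tree(acc, [go(c, op(acc, c.value)) for c in node.children])
    return go(t, t.value)


# A small tree: 1 has children 2 and 3; 2 has a single child 4.
example = Tree(1, [Tree(2, [Tree(4)]), Tree(3)])
up = upwards_accumulate(example, lambda a, b: a + b)
down = downwards_accumulate(example, lambda a, b: a + b)
```

In both cases the result has exactly the same shape as the input; only the node values change. With addition, the upwards result holds subtree sums and the downwards result holds path sums from the root.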

Notes

  1. In Haskell, we need sectioning to handle operators as functions, e.g., \((\oplus )\). In this paper, we simply write operators without sectioning when used as function parameters.

  2. Because \(f_\mathrm{reduce}\) is allowed to output no result, its result type is specified as a maybe type.

  3. In the functional programming community, the tree of the type \(\textit{Tree}~\alpha \) is called a rose tree [20].

  4. If the parameter operators \(\oplus \) and \(\otimes \) satisfy additional properties, we may be able to extract more parallelism. Such properties are discussed in [15, 19]. Note that we only require the associativity of \(\otimes \) in this paper.

  5. This definition is slightly different from that in the original paper [15]. Our \(\textit{as}\) corresponds to their \(\textit{cs}\), and our \(\textit{ds}\) corresponds to two lists \(\textit{as}\) and \(\textit{bs}\) in their definition.

    Fig. 5. Local computation of tree reduction that returns hNF. Tree reduction is applied to the nodes in dashed lines
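
Note 2 can be illustrated with a small in-memory sketch in Python, where `None` plays the role of the "nothing" case of a maybe type. This is an assumption-laden toy, not the paper's formalization and not the Hadoop API: the driver `map_reduce` and the signatures of `f_map` and `f_reduce` are hypothetical and chosen only to show a reduce function that may emit no result for a key.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple


def map_reduce(
    f_map: Callable[[object], Iterable[Tuple[Hashable, object]]],
    f_reduce: Callable[[Hashable, List[object]], Optional[object]],
    records: Iterable[object],
) -> Dict[Hashable, object]:
    """Toy single-process MapReduce: map, group by key, then reduce."""
    groups: Dict[Hashable, List[object]] = defaultdict(list)
    for record in records:
        for key, value in f_map(record):
            groups[key].append(value)
    output: Dict[Hashable, object] = {}
    for key in sorted(groups):
        result = f_reduce(key, groups[key])
        if result is not None:  # None means "no output for this key"
            output[key] = result
    return output


def count_map(word: str) -> List[Tuple[str, int]]:
    """Emit one (word, 1) pair per input word."""
    return [(word, 1)]


def count_reduce(key: str, values: List[int]) -> Optional[int]:
    """Return the count, or None to suppress words seen fewer than twice."""
    total = sum(values)
    return total if total >= 2 else None


counts = map_reduce(count_map, count_reduce, ["a", "b", "a", "c", "a", "b"])
```

Here the key `"c"` occurs once, so `count_reduce` returns `None` and the key is absent from `counts`, which is the behaviour the maybe-typed result is meant to permit.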

References

  1. Abrahamson, K.R., Dadoun, N., Kirkpatrick, D.G., Przytycka, T.M.: A simple parallel tree contraction algorithm. J. Algorithms 10(2), 287–302 (1989)

  2. Bird, R.: Introduction to Functional Programming Using Haskell. Prentice-Hall, New York (1998)

  3. Blelloch, G.E.: Scans as primitive parallel operations. IEEE Trans. Comput. 38(11), 1526–1538 (1989)

  4. Choi, H., Lee, K.H., Kim, S.H., Lee, Y.J., Moon, B.: HadoopXML: a suite for parallel processing of massive XML data with multiple twig pattern queries. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’12), pp. 2737–2739. ACM (2012)

  5. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI2004), pp. 137–150 (2004)

  6. Dehne, F.K.H.A., Ferreira, A., Cáceres, E., Song, S.W., Roncato, A.: Efficient parallel graph algorithms for coarse-grained multicomputers and BSP. Algorithmica 33(2), 183–200 (2002)

  7. Emoto, K., Imachi, H.: Parallel tree reduction on MapReduce. In: Proceedings of the International Conference on Computational Science (ICCS 2012), Procedia Computer Science, vol. 9, pp. 1827–1836. Elsevier, Amsterdam (2012)

  8. Gazit, H., Miller, G.L., Teng, S.H.: Optimal tree contraction in EREW model. In: Proceedings of the Princeton Workshop on Algorithms, Architectures, and Technical Issues for Models of Concurrent Computation, pp. 139–156 (1987)

  9. Gibbons, J.: Algebras for tree algorithms. Ph.D. thesis, Programming Research Group, University of Oxford (1991)

  10. Gibbons, J.: Computing downwards accumulations on trees quickly. Theor. Comput. Sci. 169(1), 67–80 (1996)

  11. Gibbons, J.: Generic downwards accumulations. Sci. Comput. Progr. 37(1–3), 37–65 (2000)

  12. Gibbons, J., Cai, W., Skillicorn, D.B.: Efficient parallel algorithms for tree accumulations. Sci. Comput. Progr. 23(1), 1–18 (1994)

  13. Hu, Z., Iwasaki, H., Takeichi, M.: Calculating accumulations. New Gener. Comput. 17, 153–173 (1999)

  14. Kakehi, K., Matsuzaki, K., Emoto, K.: Efficient parallel tree reductions on distributed memory environments. In: Proceedings of the 7th International Conference on Computational Science (ICCS 2007), Part II, Lecture Notes in Computer Science, vol. 4488, pp. 601–608. Springer, Berlin (2007)

  15. Kakehi, K., Matsuzaki, K., Emoto, K., Hu, Z.: A practicable framework for tree reductions under distributed memory environments. Tech. Rep. METR 2006-64, Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo (2006)

  16. Lämmel, R.: Google’s MapReduce programming model—revisited. Sci. Comput. Progr. 70(1), 1–30 (2008)

  17. Liu, Y., Emoto, K., Matsuzaki, K., Hu, Z.: Accumulative computation on MapReduce. IPSJ Trans. Progr. 7(1), 18–27 (2014)

  18. Matsuzaki, K.: Efficient implementation of tree accumulations on distributed-memory parallel computers. In: Proceedings of the 7th International Conference on Computational Science (ICCS 2007), Part II, Lecture Notes in Computer Science, vol. 4488, pp. 609–616. Springer, Berlin (2007)

  19. Matsuzaki, K., Hu, Z., Takeichi, M.: Parallel skeletons for manipulating general trees. Parallel Comput. 32(7–8), 590–603 (2006)

  20. Meertens, L.: First Steps Towards the Theory of Rose Trees. CWI, Amsterdam; IFIP Working Group 2.1 Working Paper 592 ROM-25 (1988)

  21. Mignet, L., Barbosa, D., Veltri, P.: The XML web: A first study. In: Proceedings of the 12th International Conference on World Wide Web (WWW’03), pp. 500–510. ACM, New York (2003)

  22. Miller, G.L., Reif, J.H.: Parallel tree contraction and its application. In: 26th Annual Symposium on Foundations of Computer Science, pp. 478–489. IEEE Computer Society (1985)

  23. Nomura, Y., Emoto, K., Matsuzaki, K., Hu, Z., Takeichi, M.: Parallelization of XPath queries with tree skeletons. Comput. Softw. 24(3), 51–62 (2007). (In Japanese)

  24. Pardo, A.: Generic accumulations. In: Proceedings of the IFIP TC2/WG2.1 Working Conference on Generic Programming, pp. 49–78 (2003)

  25. Reif, J.H. (ed.): Synthesis of Parallel Algorithms. Morgan Kaufmann Publishers, Burlington, MA (1993)

  26. Sevilgen, F.E., Aluru, S., Futamura, N.: Parallel algorithms for tree accumulations. J. Parallel Distrib. Comput. 65(1), 85–93 (2005)

  27. Skillicorn, D.B.: Foundations of Parallel Programming. Cambridge University Press, Cambridge (1994)

  28. Skillicorn, D.B.: Parallel implementation of tree skeletons. J. Parallel Distrib. Comput. 39(2), 115–125 (1996)

  29. Skillicorn, D.B.: Structured parallel computation in structured documents. J. Univers. Comput. Sci. 3(1), 42–68 (1997)

  30. Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)

  31. White, T.: Hadoop: The Definitive Guide. O’Reilly Media / Yahoo Press, Sebastopol, CA (2012)

Acknowledgments

This work was conducted in the PaPDAS project supported by ANR (ANR-2010-INTB-0205-02) and JST (10102704).

Author information

Corresponding author

Correspondence to Kiminori Matsuzaki.

About this article

Cite this article

Matsuzaki, K., Miyazaki, R. Parallel Tree Accumulations on MapReduce. Int J Parallel Prog 44, 466–485 (2016). https://doi.org/10.1007/s10766-015-0355-8

  • DOI: https://doi.org/10.1007/s10766-015-0355-8
