Parallel Tree Accumulations on MapReduce

Matsuzaki, Kiminori; Miyazaki, Reina

doi:10.1007/s10766-015-0355-8

Published: 20 March 2015

Volume 44, pages 466–485 (2016)

International Journal of Parallel Programming

Kiminori Matsuzaki¹ &
Reina Miyazaki¹

Abstract

MapReduce is a remarkable parallel programming model as well as a parallel processing infrastructure for large-scale data processing. Since it is now widely available on cloud environments, developing methodology or patterns for MapReduce programming is important. In particular, XML is the de facto standard for representing data, and processing semi-structured data is involved in many applications. The target computational patterns in this paper are tree accumulations. Tree accumulations are shape-preserving computations over a tree in which values are updated through flows over the tree. We develop BSP algorithms for two tree accumulations as extensions of the BSP algorithm for tree reduction by Kakehi et al. (Tech. Rep. METR 2006-64, Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 2006). We also implemented the two-superstep algorithms with a single MapReduce execution. Experimental results on a 16-node PC cluster show good speedups of a factor of 10.9–12.7.
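To make the notion of a shape-preserving accumulation concrete, the following is a minimal sequential sketch in Python. It is not the BSP or MapReduce algorithm of the paper. The names `Tree`, `upward_accumulate`, and `downward_accumulate`, and the exact form of the combining functions, are assumptions chosen for illustration; they follow the common reading in which an upward accumulation stores at each node the reduction of its subtree, and a downward accumulation stores the fold of the values on the path from the root.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Tree:
    """A tree whose nodes may have any number of children (hypothetical type)."""
    value: Any
    children: List["Tree"] = field(default_factory=list)


def upward_accumulate(combine: Callable[[Any, List[Any]], Any], t: Tree) -> Tree:
    """Values flow from the leaves to the root.

    Each node's new value is combine(own value, accumulated child values).
    """
    kids = [upward_accumulate(combine, c) for c in t.children]
    return Tree(combine(t.value, [k.value for k in kids]), kids)


def downward_accumulate(op: Callable[[Any, Any], Any], t: Tree, acc: Any) -> Tree:
    """Values flow from the root to the leaves.

    Each node's new value is op folded over the path from the root to it,
    starting from the initial accumulator acc.
    """
    new_value = op(acc, t.value)
    return Tree(new_value, [downward_accumulate(op, c, new_value) for c in t.children])


# Example tree: 1 has children 2 and 3; 2 has child 4.
example = Tree(1, [Tree(2, [Tree(4)]), Tree(3)])

# Subtree sums: the shape is unchanged, only the values are updated.
subtree_sums = upward_accumulate(lambda v, kids: v + sum(kids), example)

# Root-to-node path sums.
path_sums = downward_accumulate(lambda a, v: a + v, example, 0)
```

In this example the root of `subtree_sums` holds 10 and the node that held 4 holds 7 in `path_sums`. Both results have exactly the shape of the input tree, which is the property the abstract refers to.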

Notes

In Haskell, we need sectioning to handle operators as functions, e.g., \((\oplus )\). In this paper, we simply write operators without sectioning when used as function parameters.
Since it is allowed for \(f_\mathrm{reduce}\) to output no result, the result type of \(f_\mathrm{reduce}\) is specified by a maybe type.
In the functional programming community, the tree of the type \(\textit{Tree}~\alpha \) is called a rose tree [20].
If some other properties hold on the parameter operators \(\oplus \) and \(\otimes \), we may be able to extract more parallelism. We can find discussions of such a property in [15, 19]. Note that we only require the associativity of \(\otimes \) in this paper.
This definition is slightly different from that in the original paper [15]. Our \(\textit{as}\) corresponds to their \(\textit{cs}\), and our \(\textit{ds}\) corresponds to two lists \(\textit{as}\) and \(\textit{bs}\) in their definition.
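The note on the maybe-typed result of \(f_\mathrm{reduce}\) can be illustrated with a small in-memory model of MapReduce, sketched here in Python. This is an assumption-laden illustration, not the paper's formal model: the function `map_reduce`, its argument order, and the example functions are hypothetical, and `Optional` (with `None`) stands in for the maybe type.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple


def map_reduce(
    f_map: Callable[[Hashable, int], Iterable[Tuple[Hashable, int]]],
    f_reduce: Callable[[Hashable, List[int]], Optional[int]],
    records: Iterable[Tuple[Hashable, int]],
) -> Dict[Hashable, int]:
    """Single-process model: map, group by intermediate key, then reduce.

    f_reduce may return None, meaning "no output for this key".
    """
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in f_map(k1, v1):
            groups[k2].append(v2)

    output: Dict[Hashable, int] = {}
    for k2, values in groups.items():
        result = f_reduce(k2, values)
        if result is not None:  # keys whose reducer yields nothing are dropped
            output[k2] = result
    return output


def identity_map(key, value):
    """Emit the input pair unchanged."""
    return [(key, value)]


def positive_sum(key, values):
    """Return the sum for the key, or None when the sum is not positive."""
    total = sum(values)
    return total if total > 0 else None


records = [("a", 1), ("b", -2), ("a", 3), ("b", 1)]
result = map_reduce(identity_map, positive_sum, records)
```

Here the key `"b"` sums to -1, so its reducer returns `None` and the key is absent from `result`, while `"a"` is kept with the value 4.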
Fig. 5 Local computation of tree reduction that returns hNF. Tree reduction is applied to the nodes in dashed lines.

References

1. Abrahamson, K.R., Dadoun, N., Kirkpatrick, D.G., Przytycka, T.M.: A simple parallel tree contraction algorithm. J. Algorithms 10(2), 287–302 (1989)
2. Bird, R.: Introduction to Functional Programming Using Haskell. Prentice-Hall, New York (1998)
3. Blelloch, G.E.: Scans as primitive parallel operations. IEEE Trans. Comput. 38(11), 1526–1538 (1989)
4. Choi, H., Lee, K.H., Kim, S.H., Lee, Y.J., Moon, B.: HadoopXML: a suite for parallel processing of massive XML data with multiple twig pattern queries. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’12), pp. 2737–2739. ACM (2012)
5. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI2004), pp. 137–150 (2004)
6. Dehne, F.K.H.A., Ferreira, A., Cáceres, E., Song, S.W., Roncato, A.: Efficient parallel graph algorithms for coarse-grained multicomputers and BSP. Algorithmica 33(2), 183–200 (2002)
7. Emoto, K., Imachi, H.: Parallel tree reduction on MapReduce. In: Proceedings of the International Conference on Computational Science (ICCS 2012), Procedia Computer Science, vol. 9, pp. 1827–1836. Elsevier, Amsterdam (2012)
8. Gazit, H., Miller, G.L., Teng, S.H.: Optimal tree contraction in EREW model. In: Proceedings of the Princeton Workshop on Algorithms, Architectures, and Technical Issues for Models of Concurrent Computation, pp. 139–156 (1987)
9. Gibbons, J.: Algebras for tree algorithms. Ph.D. thesis, Programming Research Group, University of Oxford (1991)
10. Gibbons, J.: Computing downwards accumulations on trees quickly. Theor. Comput. Sci. 169(1), 67–80 (1996)
11. Gibbons, J.: Generic downwards accumulations. Sci. Comput. Progr. 37(1–3), 37–65 (2000)
12. Gibbons, J., Cai, W., Skillicorn, D.B.: Efficient parallel algorithms for tree accumulations. Sci. Comput. Progr. 23(1), 1–18 (1994)
13. Hu, Z., Iwasaki, H., Takeichi, M.: Calculating accumulations. New Gener. Comput. 17, 153–173 (1999)
14. Kakehi, K., Matsuzaki, K., Emoto, K.: Efficient parallel tree reductions on distributed memory environments. In: Proceedings of the 7th International Conference on Computational Science (ICCS 2007), Part II, Lecture Notes in Computer Science, vol. 4488, pp. 601–608. Springer, Berlin (2007)
15. Kakehi, K., Matsuzaki, K., Emoto, K., Hu, Z.: A practicable framework for tree reductions under distributed memory environments. Tech. Rep. METR 2006-64, Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo (2006)
16. Lämmel, R.: Google’s MapReduce programming model—revisited. Sci. Comput. Progr. 70(1), 1–30 (2008)
17. Liu, Y., Emoto, K., Matsuzaki, K., Hu, Z.: Accumulative computation on MapReduce. IPSJ Trans. Progr. 7(1), 18–27 (2014)
18. Matsuzaki, K.: Efficient implementation of tree accumulations on distributed-memory parallel computers. In: Proceedings of the 7th International Conference on Computational Science (ICCS 2007), Part II, Lecture Notes in Computer Science, vol. 4488, pp. 609–616. Springer, Berlin (2007)
19. Matsuzaki, K., Hu, Z., Takeichi, M.: Parallel skeletons for manipulating general trees. Parallel Comput. 32(7–8), 590–603 (2006)
20. Meertens, L.: First Steps Towards the Theory of Rose Trees. CWI, Amsterdam; IFIP Working Group 2.1 Working Paper 592 ROM-25 (1988)
21. Mignet, L., Barbosa, D., Veltri, P.: The XML web: A first study. In: Proceedings of the 12th International Conference on World Wide Web (WWW’03), pp. 500–510. ACM, New York (2003)
22. Miller, G.L., Reif, J.H.: Parallel tree contraction and its application. In: 26th Annual Symposium on Foundations of Computer Science, pp. 478–489. IEEE Computer Society (1985)
23. Nomura, Y., Emoto, K., Matsuzaki, K., Hu, Z., Takeichi, M.: Parallelization of XPath queries with tree skeletons. Comput. Softw. 24(3), 51–62 (2007). (In Japanese)
24. Pardo, A.: Generic accumulations. In: Proceedings of the IFIP TC2/WG2.1 Working Conference on Generic Programming, pp. 49–78 (2003)
25. Reif, J.H. (ed.): Synthesis of Parallel Algorithms. Morgan Kaufmann Publishers, Burlington, MA (1993)
26. Sevilgen, F.E., Aluru, S., Futamura, N.: Parallel algorithms for tree accumulations. J. Parallel Distrib. Comput. 65(1), 85–93 (2005)
27. Skillicorn, D.B.: Foundations of Parallel Programming. Cambridge University Press, Cambridge (1994)
28. Skillicorn, D.B.: Parallel implementation of tree skeletons. J. Parallel Distrib. Comput. 39(2), 115–125 (1996)
29. Skillicorn, D.B.: Structured parallel computation in structured documents. J. Univers. Comput. Sci. 3(1), 42–68 (1997)
30. Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)
31. White, T.: Hadoop: The Definitive Guide. O’Reilly Media / Yahoo Press, Sebastopol, CA (2012)

Acknowledgments

This work was conducted in the PaPDAS project supported by ANR (ANR-2010-INTB-0205-02) and JST (10102704).

Author information

Authors and Affiliations

School of Information, Kochi University of Technology, 185 Tosayamadacho-Miyanokuchi, Kami, Kochi, Japan
Kiminori Matsuzaki & Reina Miyazaki

Corresponding author

Correspondence to Kiminori Matsuzaki.

About this article

Cite this article

Matsuzaki, K., Miyazaki, R. Parallel Tree Accumulations on MapReduce. Int J Parallel Prog 44, 466–485 (2016). https://doi.org/10.1007/s10766-015-0355-8

Received: 31 July 2014
Accepted: 07 March 2015
Published: 20 March 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10766-015-0355-8
