Abstract
MapReduce is a remarkable parallel programming model as well as a parallel processing infrastructure for large-scale data processing. Since it is now widely available in cloud environments, developing methodologies and patterns for MapReduce programming is important. In particular, XML is the de facto standard for representing data, and many applications involve processing semi-structured data. The target computational patterns in this paper are tree accumulations: shape-preserving computations over a tree in which values are updated through flows over the tree. We develop BSP algorithms for two tree accumulations as extensions of the BSP algorithm for tree reduction by Kakehi et al. (Tech. Rep. METR 2006-64, Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 2006). We also implemented the two-superstep algorithms with a single MapReduce execution. Experimental results on a 16-node PC cluster show good speedups, by a factor of 10.9–12.7.
Notes
In Haskell, we need sectioning to handle operators as functions, e.g., \((\oplus )\). In this paper, we simply write operators without sectioning when used as function parameters.
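As a small illustration of the sectioning this note refers to, the sketch below passes the standard operator `+` as a function parameter in standard Haskell syntax. The names `sumAll` and `incrementAll` are hypothetical and chosen only for this example.

```haskell
-- Enclosing an operator in parentheses turns it into an ordinary
-- function, so it can be passed as an argument.
sumAll :: [Int] -> Int
sumAll = foldr (+) 0

-- A section with one operand supplied: (+ 1) is the function \x -> x + 1.
incrementAll :: [Int] -> [Int]
incrementAll = map (+ 1)
```

In the paper's notation the parentheses are dropped, so an expression such as `foldr (+) 0` would be written with a bare operator.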
Since \(f_\mathrm{reduce}\) is allowed to output no result, its result type is specified as a Maybe type.
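To make the role of the Maybe result type concrete, here is a minimal sequential sketch of a MapReduce-style function in Haskell. It is an assumption written in the spirit of functional specifications of MapReduce, not the paper's own definition: the names `mapReduce`, `fMap`, `fReduce`, and `frequentWords` are hypothetical, and the grouping by sorted key only simulates the shuffle phase.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)
import Data.Maybe (mapMaybe)

-- Sequential model of one MapReduce execution (illustrative only).
-- fReduce returns Nothing when a key should produce no output.
mapReduce :: Ord k2
          => (k1 -> v1 -> [(k2, v2)])   -- map function
          -> (k2 -> [v2] -> Maybe v3)   -- reduce function
          -> [(k1, v1)]
          -> [(k2, v3)]
mapReduce fMap fReduce input = mapMaybe reduceGroup groups
  where
    -- Map phase: every input pair yields a list of intermediate pairs.
    pairs = concatMap (uncurry fMap) input
    -- Simulated shuffle: collect the values that share a key.
    groups =
      [ (k, v : map snd rest)
      | ((k, v) : rest) <- groupBy ((==) `on` fst) (sortOn fst pairs)
      ]
    -- Reduce phase: keys whose reduction is Nothing are dropped.
    reduceGroup (k, vs) = fmap (\r -> (k, r)) (fReduce k vs)

-- Hypothetical usage: keep only the words that occur at least twice.
frequentWords :: [(Int, String)] -> [(String, Int)]
frequentWords = mapReduce fMap fReduce
  where
    fMap _ line = [ (w, 1) | w <- words line ]
    fReduce _ ones
      | n >= 2    = Just n
      | otherwise = Nothing
      where n = sum ones
```

With the input `[(1, "a b a"), (2, "b c")]`, the word `c` occurs once, so its reduction is `Nothing` and it does not appear in the output.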
In the functional programming community, the tree of the type \(\textit{Tree}~\alpha \) is called a rose tree [20].
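The rose tree type named in this note, together with the two flows of values that the abstract describes, can be sketched in Haskell as follows. This is a simplified illustration under stated assumptions, not the paper's formal definitions: `upAccum`, `downAccum`, `root`, and the example tree are hypothetical names, and the accumulations in the paper may take different parameters.

```haskell
-- Rose tree: a node carries a value and a list of subtrees.
data Tree a = Node a [Tree a] deriving (Show, Eq)

-- Value stored at the root of a tree.
root :: Tree a -> a
root (Node a _) = a

-- Upward accumulation (illustrative): each node is replaced by a
-- combination of its own value and the results at its children.
upAccum :: (a -> [b] -> b) -> Tree a -> Tree b
upAccum f (Node a ts) = Node (f a (map root ts')) ts'
  where ts' = map (upAccum f) ts

-- Downward accumulation (illustrative): each node is replaced by a
-- value accumulated along the path from the root down to that node.
downAccum :: (c -> a -> c) -> c -> Tree a -> Tree c
downAccum g c (Node a ts) = Node c' (map (downAccum g c') ts)
  where c' = g c a

example :: Tree Int
example = Node 1 [Node 2 [], Node 3 [Node 4 []]]

-- Sum of every subtree: Node 10 [Node 2 [], Node 7 [Node 4 []]]
subtreeSums :: Tree Int
subtreeSums = upAccum (\a bs -> a + sum bs) example

-- Sum along every root-to-node path: Node 1 [Node 3 [], Node 4 [Node 8 []]]
pathSums :: Tree Int
pathSums = downAccum (+) 0 example
```

Both functions preserve the shape of the input tree and change only the values at the nodes, which is the sense in which accumulations are shape-preserving.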
This definition is slightly different from that in the original paper [15]. Our \(\textit{as}\) corresponds to their \(\textit{cs}\), and our \(\textit{ds}\) corresponds to two lists \(\textit{as}\) and \(\textit{bs}\) in their definition.
References
Abrahamson, K.R., Dadoun, N., Kirkpatrick, D.G., Przytycka, T.M.: A simple parallel tree contraction algorithm. J. Algorithms 10(2), 287–302 (1989)
Bird, R.: Introduction to Functional Programming Using Haskell. Prentice-Hall, New York (1998)
Blelloch, G.E.: Scans as primitive parallel operations. IEEE Trans. Comput. 38(11), 1526–1538 (1989)
Choi, H., Lee, K.H., Kim, S.H., Lee, Y.J., Moon, B.: HadoopXML: a suite for parallel processing of massive XML data with multiple twig pattern queries. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’12), pp. 2737–2739. ACM (2012)
Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI2004), pp. 137–150 (2004)
Dehne, F.K.H.A., Ferreira, A., Cáceres, E., Song, S.W., Roncato, A.: Efficient parallel graph algorithms for coarse-grained multicomputers and BSP. Algorithmica 33(2), 183–200 (2002)
Emoto, K., Imachi, H.: Parallel tree reduction on MapReduce. In: Proceedings of the International Conference on Computational Science (ICCS 2012), Procedia Computer Science, vol. 9, pp. 1827–1836. Elsevier, Amsterdam (2012)
Gazit, H., Miller, G.L., Teng, S.H.: Optimal tree contraction in EREW model. In: Proceedings of the Princeton Workshop on Algorithms, Architectures, and Technical Issues for Models of Concurrent Computation, pp. 139–156 (1987)
Gibbons, J.: Algebras for tree algorithms. Ph.D. thesis, Programming Research Group, University of Oxford (1991)
Gibbons, J.: Computing downwards accumulations on trees quickly. Theor. Comput. Sci. 169(1), 67–80 (1996)
Gibbons, J.: Generic downwards accumulations. Sci. Comput. Progr. 37(1–3), 37–65 (2000)
Gibbons, J., Cai, W., Skillicorn, D.B.: Efficient parallel algorithms for tree accumulations. Sci. Comput. Progr. 23(1), 1–18 (1994)
Hu, Z., Iwasaki, H., Takeichi, M.: Calculating accumulations. New Gener. Comput. 17, 153–173 (1999)
Kakehi, K., Matsuzaki, K., Emoto, K.: Efficient parallel tree reductions on distributed memory environments. In: Proceedings of the 7th International Conference on Computational Science (ICCS 2007), Part II, Lecture Notes in Computer Science, vol. 4488, pp. 601–608. Springer, Berlin (2007)
Kakehi, K., Matsuzaki, K., Emoto, K., Hu, Z.: A practicable framework for tree reductions under distributed memory environments. Tech. Rep. METR 2006-64, Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo (2006)
Lämmel, R.: Google’s MapReduce programming model—revisited. Sci. Comput. Progr. 70(1), 1–30 (2008)
Liu, Y., Emoto, K., Matsuzaki, K., Hu, Z.: Accumulative computation on MapReduce. IPSJ Trans. Progr. 7(1), 18–27 (2014)
Matsuzaki, K.: Efficient implementation of tree accumulations on distributed-memory parallel computers. In: Proceedings of the 7th International Conference on Computational Science (ICCS 2007), Part II, Lecture Notes in Computer Science, vol. 4488, pp. 609–616. Springer, Berlin (2007)
Matsuzaki, K., Hu, Z., Takeichi, M.: Parallel skeletons for manipulating general trees. Parallel Comput. 32(7–8), 590–603 (2006)
Meertens, L.: First Steps Towards the Theory of Rose Trees. CWI, Amsterdam; IFIP Working Group 2.1 Working Paper 592 ROM-25 (1988)
Mignet, L., Barbosa, D., Veltri, P.: The XML web: A first study. In: Proceedings of the 12th International Conference on World Wide Web (WWW’03), pp. 500–510. ACM, New York (2003)
Miller, G.L., Reif, J.H.: Parallel tree contraction and its application. In: 26th Annual Symposium on Foundations of Computer Science, pp. 478–489. IEEE Computer Society (1985)
Nomura, Y., Emoto, K., Matsuzaki, K., Hu, Z., Takeichi, M.: Parallelization of XPath queries with tree skeletons. Comput. Softw. 24(3), 51–62 (2007). (In Japanese)
Pardo, A.: Generic accumulations. In: Proceedings of the IFIP TC2/WG2.1 Working Conference on Generic Programming, pp. 49–78 (2003)
Reif, J.H. (ed.): Synthesis of Parallel Algorithms. Morgan Kaufmann Publishers, Burlington, MA (1993)
Sevilgen, F.E., Aluru, S., Futamura, N.: Parallel algorithms for tree accumulations. J. Parallel Distrib. Comput. 65(1), 85–93 (2005)
Skillicorn, D.B.: Foundations of Parallel Programming. Cambridge University Press, Cambridge (1994)
Skillicorn, D.B.: Parallel implementation of tree skeletons. J. Parallel Distrib. Comput. 39(2), 115–125 (1996)
Skillicorn, D.B.: Structured parallel computation in structured documents. J. Univers. Comput. Sci. 3(1), 42–68 (1997)
Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)
White, T.: Hadoop: The Definitive Guide. O’Reilly Media / Yahoo Press, Sebastopol, CA (2012)
Acknowledgments
This work was conducted in the PaPDAS project supported by ANR (ANR-2010-INTB-0205-02) and JST (10102704).
Cite this article
Matsuzaki, K., Miyazaki, R. Parallel Tree Accumulations on MapReduce. Int J Parallel Prog 44, 466–485 (2016). https://doi.org/10.1007/s10766-015-0355-8
DOI: https://doi.org/10.1007/s10766-015-0355-8