Parallel Tree Accumulations on MapReduce

Article

Abstract

MapReduce is a remarkable parallel programming model as well as a parallel processing infrastructure for large-scale data processing. Since it is now widely available on cloud environments, developing methodology or patterns for MapReduce programming is important. In particular, XML is the de facto standard for representing data, and processing semi-structured data is involved in many applications. The target computational patterns in this paper are tree accumulations. Tree accumulations are shape-preserving computations over a tree in which values are updated through flows over the tree. We develop BSP algorithms for two tree accumulations as extensions of the BSP algorithm for tree reduction by Kakehi et al. (Tech. Rep. METR 2006-64, Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 2006). We also implemented the two-superstep algorithms with a single MapReduce execution. Experimental results on a 16-node PC cluster show good speedups of a factor of 10.9–12.7.

Keywords

Tree accumulations MapReduce Hadoop Bulk synchronous parallel (BSP) model 

References

  1. 1.
    Abrahamson, K.R., Dadoun, N., Kirkpatrick, D.G., Przytycka, T.M.: A simple parallel tree contraction algorithm. J. Algorithms 10(2), 287–302 (1989)MathSciNetCrossRefMATHGoogle Scholar
  2. 2.
    Bird, R.: Introduction to Functional Programming Using Haskell. Prentice-Hall, New York (1998)Google Scholar
  3. 3.
    Blelloch, G.E.: Scans as primitive parallel operations. IEEE Trans. Comput. 38(11), 1526–1538 (1989)CrossRefGoogle Scholar
  4. 4.
    Choi, H., Lee, K.H., Kim, S.H., Lee, Y.J., Moon, B.: HadoopXML: a suite for parallel processing of massive XML data with multiple twig pattern queries. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’12), pp. 2737–2739. ACM (2012)Google Scholar
  5. 5.
    Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI2004), pp. 137–150 (2004)Google Scholar
  6. 6.
    Dehne, F.K.H.A., Ferreira, A., Cáceres, E., Song, S.W., Roncato, A.: Efficient parallel graph algorithms for coarse-grained multicomputers and BSP. Algorithmica 33(2), 183–200 (2002)MathSciNetCrossRefMATHGoogle Scholar
  7. 7.
    Emoto, K., Imachi, H.: Parallel tree reduction on MapReduce. In: Proceedings of the International Conference on Computational Science (ICCS 2012), Procedia Computer Science, vol. 9, pp. 1827–1836. Elsevier, Amsterdam (2012)Google Scholar
  8. 8.
    Gazit, H., Miller, G.L., Teng, S.H.: Optimal tree contraction in EREW model. In: Proceedings of the Princeton Workshop on Algorithms, Architectures, and Technical Issues for Models of Concurrent Computation, pp. 139–156 (1987)Google Scholar
  9. 9.
    Gibbons, J.: Algebras for tree algorithms. Ph.D. thesis, Programming Research Group, University of Oxford (1991)Google Scholar
  10. 10.
    Gibbons, J.: Computing downwards accumulations on trees quickly. Theor. Comput. Sci. 169(1), 67–80 (1996)MathSciNetCrossRefMATHGoogle Scholar
  11. 11.
    Gibbons, J.: Generic downwards accumulations. Sci. Comput. Progr. 37(1–3), 37–65 (2000)MathSciNetCrossRefMATHGoogle Scholar
  12. 12.
    Gibbons, J., Cai, W., Skillicorn, D.B.: Efficient parallel algorithms for tree accumulations. Sci. Comput. Progr. 23(1), 1–18 (1994)MathSciNetCrossRefMATHGoogle Scholar
  13. 13.
    Hu, Z., Iwasaki, H., Takeichi, M.: Calculating accumulations. New Gener. Comput. 17, 153–173 (1999)CrossRefGoogle Scholar
  14. 14.
    Kakehi, K., Matsuzaki, K., Emoto, K.: Efficient parallel tree reductions on distributed memory environments. In: Proceedings of the 7th International Conference on Computational Science (ICCS 2007), Part II, Lecture Notes in Computer Science, vol. 4488, pp. 601–608. Springer, Berlin (2007)Google Scholar
  15. 15.
    Kakehi, K., Matsuzaki, K., Emoto, K., Hu, Z.: A practicable framework for tree reductions under distributed memory environments. Tech. Rep. METR 2006-64, Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo (2006)Google Scholar
  16. 16.
    Lämmel, R.: Google’s MapReduce programming model—revisited. Sci. Comput. Progr. 70(1), 1–30 (2008)CrossRefMATHGoogle Scholar
  17. 17.
    Liu, Y., Emoto, K., Matsuzaki, K., Hu, Z.: Accumulative computation on MapReduce. IPSJ Trans. Progr. 7(1), 18–27 (2014)Google Scholar
  18. 18.
    Matsuzaki, K.: Efficient implementation of tree accumulations on distributed-memory parallel computers. In: Proceedings of the 7th International Conference on Computational Science (ICCS 2007), Part II, Lecture Notes in Computer Science, vol. 4488, pp. 609–616. Springer, Berlin (2007)Google Scholar
  19. 19.
    Matsuzaki, K., Hu, Z., Takeichi, M.: Parallel skeletons for manipulating general trees. Parallel Comput. 32(7–8), 590–603 (2006)CrossRefGoogle Scholar
  20. 20.
    Meertens, L.: First Steps Towards the Theory of Rose Trees. CWI, Amsterdam; IFIP Working Group 2.1 Working Paper 592 ROM-25 (1988)Google Scholar
  21. 21.
    Mignet, L., Barbosa, D., Veltri, P.: The XML web: A first study. In: Proceedings of the 12th International Conference on World Wide Web (WWW’03), pp. 500–510. ACM, New York (2003)Google Scholar
  22. 22.
    Miller, G.L., Reif, J.H.: Parallel tree contraction and its application. In: 26th Annual Symposium on Foundations of Computer Science, pp. 478–489. IEEE Computer Society (1985)Google Scholar
  23. 23.
    Nomura, Y., Emoto, K., Matsuzaki, K., Hu, Z., Takeichi, M.: Parallelization of XPath queries with tree skeletons. Comput. Softw. 24(3), 51–62 (2007). (In Japanese)Google Scholar
  24. 24.
    Pardo, A.: Generic accumulations. In: Proceedings of the IFIP TC2/WG2.1 Working Conference on Generic Programming, pp. 49–78 (2003)Google Scholar
  25. 25.
    Reif, J.H. (ed.): Synthesis of Parallel Algorithms. Morgan Kaufmann Publishers, Burlington, MA (1993)Google Scholar
  26. 26.
    Sevilgen, F.E., Aluru, S., Futamura, N.: Parallel algorithms for tree accumulations. J. Parallel Distrib. Comput. 65(1), 85–93 (2005)CrossRefMATHGoogle Scholar
  27. 27.
    Skillicorn, D.B.: Foundations of Parallel Programming. Cambridge University Press, Cambridge (1994)CrossRefMATHGoogle Scholar
  28. 28.
    Skillicorn, D.B.: Parallel implementation of tree skeletons. J. Parallel Distrib. Comput. 39(2), 115–125 (1996)CrossRefMATHGoogle Scholar
  29. 29.
    Skillicorn, D.B.: Structured parallel computation in structured documents. J. Univers. Comput. Sci. 3(1), 42–68 (1997)MATHGoogle Scholar
  30. 30.
    Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)CrossRefGoogle Scholar
  31. 31.
    White, T.: Hadoop: The Definitive Guide. O’Reilly Media / Yahoo Press, Sebastopol, CA (2012)Google Scholar

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. 1.School of InformationKochi University of TechnologyKamiJapan

Personalised recommendations