PGAS Approach to Implement Mapreduce Framework Based on UPC Language

  • Shomanov Aday
  • Akhmed-Zaki Darkhan
  • Mansurova Madina
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10421)

Abstract

Since its introduction, Mapreduce has proved to be a very effective parallel programming technique for processing large volumes of data. Among the most prevalent implementations of Mapreduce are the Hadoop framework and Google's proprietary Mapreduce system.

Other notable implementations include recent versions based on PGAS (partitioned global address space) languages, namely X10 and UPC (Unified Parallel C). These implementations offer a new viewpoint in which Mapreduce application developers can benefit from the global address space model when writing data-parallel tasks. In this paper we introduce a novel UPC implementation of Mapreduce that uses a shared hashmap data structure, implemented purely in UPC, as an intermediate key/value store. The shared hashmap is used to exchange key/value pairs between parallel UPC threads during the shuffle phase of the Mapreduce framework. The framework also allows data-parallel applications to be expressed using simple sequential code.
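To make the role of the shared hashmap concrete, the following is a minimal sequential sketch in Python, not the authors' UPC code. It assumes that each key is owned by exactly one thread, chosen by hashing the key, and it simulates the per-thread partitions of the shared hashmap with ordinary dictionaries. The names `THREADS`, `owner_of`, `map_words`, `reduce_counts`, and `run_wordcount` are hypothetical and introduced only for illustration.

```python
import zlib
from collections import defaultdict

THREADS = 4  # hypothetical thread count, standing in for UPC's THREADS


def owner_of(key: str) -> int:
    """Index of the (simulated) thread that owns the bucket for `key`.

    crc32 is used instead of hash() so the mapping is stable across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % THREADS


def map_words(chunk: str):
    """User-supplied sequential map function: emit (word, 1) per word."""
    for word in chunk.lower().split():
        yield word, 1


def reduce_counts(key: str, values: list) -> int:
    """User-supplied sequential reduce function: sum the counts of one key."""
    return sum(values)


def run_wordcount(chunks):
    # One dictionary per thread; together they stand in for the shared hashmap.
    shared_map = [defaultdict(list) for _ in range(THREADS)]

    # Map and shuffle: every emitted pair is written into the partition
    # of the thread that owns its key.
    for chunk in chunks:
        for key, value in map_words(chunk):
            shared_map[owner_of(key)][key].append(value)

    # Reduce: each thread only touches the keys stored in its own partition.
    result = {}
    for partition in shared_map:
        for key, values in partition.items():
            result[key] = reduce_counts(key, values)
    return result
```

In this sketch the user writes only `map_words` and `reduce_counts` as plain sequential functions, while the placement of key/value pairs is handled by the store; in a real PGAS setting the write into another thread's partition would be a remote memory access rather than a local dictionary update.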

Additionally, we present a heuristic approach based on a genetic algorithm that can efficiently perform load-balancing optimization: it distributes key/value pairs among threads so as to minimize data movement operations and evenly distribute the computational workload.
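As an illustration of how such a heuristic might be set up, the sketch below runs a small genetic algorithm over key-to-thread assignments. It is not the authors' formulation: the cost function (largest per-thread load plus a weighted volume of moved data), the parameters `move_weight`, `pop_size`, `generations`, and `mutation_rate`, and the operators (tournament selection, uniform crossover, elitism) are all assumptions chosen for brevity.

```python
import random


def cost(assign, sizes, home, n_threads, move_weight=0.5):
    """Assumed objective: largest thread load plus weighted moved volume.

    assign[k] is the thread that will reduce key k, sizes[k] is the number
    of values for key k, and home[k] is the thread where they currently sit.
    """
    loads = [0] * n_threads
    moved = 0
    for k, t in enumerate(assign):
        loads[t] += sizes[k]
        if t != home[k]:
            moved += sizes[k]
    return max(loads) + move_weight * moved


def genetic_balance(sizes, home, n_threads, pop_size=40, generations=200,
                    mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    n = len(sizes)

    def fitness(a):
        return cost(a, sizes, home, n_threads)

    # Start from the "move nothing" assignment plus random candidates.
    population = [list(home)] + [
        [rng.randrange(n_threads) for _ in range(n)]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness)
        next_gen = population[:2]  # elitism: keep the two best unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=fitness)
            p2 = min(rng.sample(population, 3), key=fitness)
            # Uniform crossover followed by per-gene mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [rng.randrange(n_threads) if rng.random() < mutation_rate
                     else g for g in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=fitness)
```

Because the initial population contains the current placement and the best candidates are carried over unchanged, the returned assignment is never worse than leaving all data where it is under the assumed cost; the trade-off between balance and movement is controlled by `move_weight`.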

We present evaluation results for the Mapreduce on UPC framework using the WordCount benchmark application and compare them with the Apache Hadoop implementation.

Keywords

UPC, PGAS, Mapreduce

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Shomanov Aday (1)
  • Akhmed-Zaki Darkhan (1)
  • Mansurova Madina (1)
  1. Al-Farabi Kazakh National University, Almaty, Kazakhstan
