HIP: Information Passing for Optimizing Join-Intensive Data Processing Workloads on Hadoop

  • Seokyong Hong
  • Kemafor Anyanwu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7447)

Abstract

Hadoop-based data processing platforms translate join-intensive queries into multiple “jobs” (MapReduce cycles). Such multi-job workflows move a significant amount of data through the disk, network, and memory fabric of a Hadoop cluster, which can negatively impact performance and scalability. Consequently, techniques that minimize the size of intermediate results are useful in this context. In this paper, we present an information passing technique (HIP) that can minimize the size of intermediate data on Hadoop-based data processing platforms.

Keywords

Query Plan; Summary Information; MapReduce Framework; Hadoop Cluster; Information Passing

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Seokyong Hong (1)
  • Kemafor Anyanwu (1)
  1. Department of Computer Science, North Carolina State University, Raleigh, USA