Comparing High Level MapReduce Query Languages

  • Robert J. Stewart
  • Phil W. Trinder
  • Hans-Wolfgang Loidl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6965)


The MapReduce parallel computational model is of increasing importance. A number of High Level Query Languages (HLQLs) have been constructed on top of the Hadoop MapReduce realization, primarily Pig, Hive, and JAQL. This paper makes a systematic performance comparison of these three HLQLs, focusing on scale-up, scale-out and runtime metrics. We further compare the HLQLs as languages, focusing on conciseness and computational power. The HLQL development communities are engaged in the study, which has revealed the technical bottlenecks and limitations described in this paper and is influencing their development.


Keywords: Message Passing Interface; Query Language; Word Count; Input Size; Runtime Performance
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Robert J. Stewart (1)
  • Phil W. Trinder (1)
  • Hans-Wolfgang Loidl (1)
  1. Mathematical and Computer Sciences, Heriot Watt University, UK
