Representing MapReduce Optimisations in the Nested Relational Calculus

  • Conference paper
Big Data (BNCOD 2013)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7968))

Abstract

The MapReduce programming model has recently attracted a lot of attention from both academic and business researchers. Systems based on this model hide communication and synchronization issues from the user and allow high volumes of data to be processed on thousands of commodity computers. In this paper we are interested in applying MapReduce (MR) to the processing of hierarchical data with nested collections, such as data stored in JSON or XML formats, but with restricted nesting depth, as is usual in the nested relational model. Current data analytics systems often propose ad hoc formalisms to represent query evaluation plans and to optimize their execution. We argue that the Nested Relational Calculus (NRC) provides a general, elegant and effective way to describe and investigate these optimizations. It allows both classical optimizations and MapReduce-specific optimizations to be described and combined. We demonstrate this by showing that MapReduce programs can be expressed and represented straightforwardly in NRC by adding syntactic shorthands. In addition, we show that optimizations in existing systems can be readily represented in this extended formalism.
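As a rough illustration of why a calculus over nested collections is a natural fit for MapReduce, the sketch below writes a MapReduce job as three steps over collections: a flattening comprehension (map), a grouping step that produces a nested collection (shuffle), and a second comprehension over the groups (reduce). This is not the notation or the shorthands defined in the paper; the names `map_reduce` and `word_count` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical helper: run mapper and reducer over in-memory collections."""
    # Map phase: a comprehension that flattens the (key, value) pairs
    # emitted by the mapper for every input record.
    pairs = [kv for record in records for kv in mapper(record)]

    # Shuffle phase: nest the values under their key, giving a collection
    # of (key, [values]) groups, i.e. a nested collection of depth two.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce phase: another flattening comprehension, now over the groups.
    return [out for key, values in groups.items() for out in reducer(key, values)]


def word_count(documents):
    """Classic word count expressed with the helper above."""
    return dict(
        map_reduce(
            documents,
            mapper=lambda doc: [(word, 1) for word in doc.split()],
            reducer=lambda word, ones: [(word, sum(ones))],
        )
    )
```

Calling `word_count(["a b a", "b c"])` counts each word across both documents. Writing the phases as comprehensions is a design choice made here for readability: it makes it visible that each phase is an ordinary expression over nested collections, which is the kind of expression a calculus such as NRC is meant to describe and rewrite.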

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grabowski, M., Hidders, J., Sroka, J. (2013). Representing MapReduce Optimisations in the Nested Relational Calculus. In: Gottlob, G., Grasso, G., Olteanu, D., Schallhart, C. (eds) Big Data. BNCOD 2013. Lecture Notes in Computer Science, vol 7968. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39467-6_17

  • DOI: https://doi.org/10.1007/978-3-642-39467-6_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39466-9

  • Online ISBN: 978-3-642-39467-6

  • eBook Packages: Computer Science (R0)
