A Practice of TPC-DS Multidimensional Implementation on NoSQL Database Systems

  • Conference paper
Performance Characterization and Benchmarking (TPCTC 2013)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8391)

Abstract

While NoSQL database systems are well established, it is not clear how to process multidimensional OLAP queries on current key-value stores. In this paper, we detail how to map the high-level cube model onto the low-level key-value stores built on NoSQL databases, and illustrate how to support OLAP queries efficiently by scaling out while retaining a MapReduce-like execution engine. For big data, the problems of storage and processing power are compounded; we balance them by splitting partial aggregation between batch processing and query runtime. Base cuboids are initially constructed for TPC-DS fact tables using multidimensional arrays, and cuboids holding aggregates at other granularities are derived from the base cuboids at runtime. The cube storage module converts dimension members into binary keys and leverages a novel distributed database to provide efficient storage for huge cuboids. The OLAP engine, built on lightweight concurrent actors, scales out seamlessly and provides highly concurrent distributed cuboid processing. Finally, we present experiments on the prototype implementation based on TPC-DS queries. The results show that multidimensional models for OLAP applications on NoSQL systems are feasible for future big data analytics.
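The abstract does not spell out the key layout or the aggregation procedure, so the following Python sketch is only one plausible reading of two ideas it names: packing dimension members into a binary key, and deriving a coarser cuboid from a base cuboid at query time. The dimension names, members, bit widths, and the functions `encode_key`, `decode_key`, and `roll_up` are all hypothetical and are not taken from the paper; a plain in-memory dictionary stands in for the distributed key-value store.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical dimensions (name, ordered members); not taken from the paper.
DIMENSIONS: List[Tuple[str, List[str]]] = [
    ("year", ["2001", "2002"]),
    ("store", ["S1", "S2", "S3"]),
    ("item", ["A", "B"]),
]


def bits_needed(n: int) -> int:
    """Bits required to index n members (at least one bit)."""
    return max(1, (n - 1).bit_length())


# Assumed layout: one fixed-width bit field per dimension, in declaration order.
WIDTHS = [bits_needed(len(members)) for _, members in DIMENSIONS]
KEY_BYTES = (sum(WIDTHS) + 7) // 8


def encode_key(members: Sequence[str]) -> bytes:
    """Pack one member per dimension into a fixed-width binary key."""
    packed = 0
    for (_, dim_members), width, member in zip(DIMENSIONS, WIDTHS, members):
        packed = (packed << width) | dim_members.index(member)
    return packed.to_bytes(KEY_BYTES, "big")


def decode_key(key: bytes) -> Tuple[str, ...]:
    """Recover the dimension members from a binary key."""
    packed = int.from_bytes(key, "big")
    out = []
    # Fields were appended left to right, so peel them off right to left.
    for (_, dim_members), width in zip(reversed(DIMENSIONS), reversed(WIDTHS)):
        out.append(dim_members[packed & ((1 << width) - 1)])
        packed >>= width
    return tuple(reversed(out))


def roll_up(base: Dict[bytes, float], keep: Sequence[str]) -> Dict[Tuple[str, ...], float]:
    """Derive a coarser cuboid by summing base cells over the dropped dimensions."""
    kept = [i for i, (name, _) in enumerate(DIMENSIONS) if name in keep]
    result: Dict[Tuple[str, ...], float] = defaultdict(float)
    for key, measure in base.items():
        members = decode_key(key)
        result[tuple(members[i] for i in kept)] += measure
    return dict(result)


# A tiny base cuboid: binary key -> measure (for example, a sales amount).
base_cuboid = {
    encode_key(("2001", "S1", "A")): 10.0,
    encode_key(("2001", "S2", "A")): 5.0,
    encode_key(("2001", "S1", "B")): 2.5,
    encode_key(("2002", "S1", "B")): 7.0,
}

by_year = roll_up(base_cuboid, ["year"])
```

In this sketch only the finest-grained cells are stored, and `roll_up` computes the per-year cuboid on demand, which mirrors the split between batch-time construction and runtime aggregation described above. A real system would distribute both the storage and the aggregation across nodes.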

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhao, H., Ye, X. (2014). A Practice of TPC-DS Multidimensional Implementation on NoSQL Database Systems. In: Nambiar, R., Poess, M. (eds) Performance Characterization and Benchmarking. TPCTC 2013. Lecture Notes in Computer Science, vol 8391. Springer, Cham. https://doi.org/10.1007/978-3-319-04936-6_7

  • DOI: https://doi.org/10.1007/978-3-319-04936-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04935-9

  • Online ISBN: 978-3-319-04936-6

  • eBook Packages: Computer Science, Computer Science (R0)
