A Practice of TPC-DS Multidimensional Implementation on NoSQL Database Systems

  • Hongwei Zhao
  • Xiaojun Ye
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8391)

Abstract

While NoSQL database systems are well established, it is not clear how to process multidimensional OLAP queries on current key-value stores. In this paper, we detail how to match the high-level cube model to the low-level key-value stores built on NoSQL databases, and illustrate how to support OLAP queries efficiently by scaling out while retaining a MapReduce-like execution engine. For big data, the problems of storage and processing power are compounded; we balance them with partial aggregation split between batch processing and query runtime. Base cuboids are initially constructed for the TPC-DS fact tables using multidimensional arrays, and cuboids at other aggregation granularities are derived from the base ones at runtime. The cube storage module converts dimension members into binary keys and leverages a novel distributed database to provide efficient storage for huge cuboids. The OLAP engine, built on lightweight concurrent actors, can scale out seamlessly and provide highly concurrent distributed cuboid processing. Finally, we present experiments on the prototype implementation based on TPC-DS queries. The results show that multidimensional models for OLAP applications on NoSQL systems are feasible for future big data analytics.
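The storage and aggregation ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dimension names, the fixed 4-byte big-endian key layout, the sample fact rows, and the use of an in-memory dict as a stand-in for the distributed key-value store are all assumptions made for illustration. The sketch packs dimension member ids into a binary key, builds a base cuboid in a batch step, and derives a coarser cuboid from it at query time.

```python
import struct
from collections import defaultdict

# Hypothetical dimension order for a tiny base cuboid (not from the paper).
DIMENSIONS = ("date_id", "item_id", "store_id")


def encode_key(members):
    """Pack one integer member id per dimension into a fixed-width binary key."""
    return struct.pack(">" + "I" * len(members), *members)


def decode_key(key):
    """Unpack a binary key back into its tuple of member ids."""
    return struct.unpack(">" + "I" * (len(key) // 4), key)


def build_base_cuboid(facts):
    """Batch step: aggregate fact rows at the finest granularity."""
    cuboid = defaultdict(float)
    for *members, measure in facts:
        cuboid[encode_key(members)] += measure
    return dict(cuboid)


def roll_up(base_cuboid, keep):
    """Query-time step: derive a coarser cuboid by summing over dropped dimensions."""
    positions = [DIMENSIONS.index(name) for name in keep]
    result = defaultdict(float)
    for key, measure in base_cuboid.items():
        members = decode_key(key)
        result[encode_key([members[p] for p in positions])] += measure
    return dict(result)


# Made-up fact rows: (date_id, item_id, store_id, sales measure).
facts = [
    (1, 10, 100, 5.0),
    (1, 10, 100, 2.5),
    (1, 11, 100, 4.0),
    (2, 10, 101, 1.0),
]
base = build_base_cuboid(facts)
by_date = roll_up(base, keep=("date_id",))
```

With fixed-width big-endian unsigned fields, the byte order of the keys matches the lexicographic order of the member tuples, which is one reason such a layout is convenient for range scans in an ordered key-value store.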

Keywords

Big Data; On Line Analysis Processing; Multidimensional Data Model; TPC-DS Benchmark

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Hongwei Zhao (1)
  • Xiaojun Ye (1)
  1. School of Software, Tsinghua University, Beijing, China
