A Practice of TPC-DS Multidimensional Implementation on NoSQL Database Systems
- Cite this paper as:
- Zhao H., Ye X. (2014) A Practice of TPC-DS Multidimensional Implementation on NoSQL Database Systems. In: Nambiar R., Poess M. (eds) Performance Characterization and Benchmarking. TPCTC 2013. Lecture Notes in Computer Science, vol 8391. Springer, Cham
While NoSQL database systems are well established, it is not clear how to process multidimensional OLAP queries on current key-value stores. In this paper, we detail how to match the high-level cube model with the low-level key-value stores built on NoSQL databases, and illustrate how to support efficiently OLAP queries by scale out while retaining a MapReduce-like execution engine. For big data the functional problem of storage and processing power is compounded, we balanced them with partial aggregation between batch processing and query runtime. Base cuboids are initially constructed for TPC-DS fact tables by using multidimensional array, and cuboids for various granularity aggregation data are derived at runtime with base ones. The cube storage module converts dimension members into binary keys and leverages a novel distributed database to provide efficient storage for huge cuboids. The OLAP engine built on lightweight concurrent actors can scale out seamlessly; provide highly concurrent distributed cuboid processing. Finally, we illustrate some experiments on the implementation prototype based on TPC-DS queries. The results show that multidimensional models for OLAP applications on NoSQL systems are possible for future big data analytics.
KeywordsBig Data On Line Analysis Processing Multidimensional Data Model TPC-DS Benchmark
Unable to display preview. Download preview PDF.