Chapter

Performance Characterization and Benchmarking

Volume 8391 of the series Lecture Notes in Computer Science pp 93-108

A Practice of TPC-DS Multidimensional Implementation on NoSQL Database Systems

  • Hongwei Zhao, School of Software, Tsinghua University
  • Xiaojun Ye, School of Software, Tsinghua University

Abstract

While NoSQL database systems are well established, it is not clear how to process multidimensional OLAP queries on current key-value stores. In this paper, we detail how to map the high-level cube model onto the low-level key-value stores built on NoSQL databases, and illustrate how to support OLAP queries efficiently by scaling out while retaining a MapReduce-like execution engine. For big data, the demands on storage and on processing power compound each other; we balance them by dividing partial aggregation between batch processing and query runtime. Base cuboids are initially constructed for the TPC-DS fact tables using multidimensional arrays, and cuboids holding aggregates at other granularities are derived from the base ones at runtime. The cube storage module converts dimension members into binary keys and leverages a novel distributed database to provide efficient storage for huge cuboids. The OLAP engine, built on lightweight concurrent actors, can scale out seamlessly and provides highly concurrent distributed cuboid processing. Finally, we present experiments on the prototype implementation based on TPC-DS queries. The results show that multidimensional models for OLAP applications on NoSQL systems are feasible for future big data analytics.
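
The two ideas the abstract names, encoding dimension members as binary keys and deriving coarser cuboids from a base cuboid at query time, can be illustrated with a minimal sketch. It is not the authors' implementation: the dimension dictionaries, the fixed-width 32-bit key layout, the in-memory dict standing in for the key-value store, and the names encode_key, decode_key and roll_up are all assumptions made for illustration.

```python
import struct
from collections import defaultdict

# Hypothetical dimension dictionaries: member name -> integer ID.
DIMS = {
    "date": {"2001-01": 0, "2001-02": 1},
    "store": {"S1": 0, "S2": 1},
    "item": {"books": 0, "music": 1},
}
DIM_ORDER = ("date", "store", "item")


def encode_key(members):
    """Pack one member ID per dimension into a fixed-width big-endian key.

    Assumed layout: three unsigned 32-bit integers in DIM_ORDER.
    """
    ids = [DIMS[d][members[d]] for d in DIM_ORDER]
    return struct.pack(">3I", *ids)


def decode_key(key):
    """Recover the tuple of member IDs from a binary key."""
    return struct.unpack(">3I", key)


def roll_up(base, keep):
    """Derive a coarser cuboid by summing the measure over every
    dimension that is not listed in `keep`."""
    idx = [DIM_ORDER.index(d) for d in keep]
    out = defaultdict(float)
    for key, measure in base.items():
        ids = decode_key(key)
        out[tuple(ids[i] for i in idx)] += measure
    return dict(out)


# A plain dict stands in for the key-value store holding the base cuboid.
base_cuboid = {
    encode_key({"date": "2001-01", "store": "S1", "item": "books"}): 10.0,
    encode_key({"date": "2001-01", "store": "S2", "item": "books"}): 5.0,
    encode_key({"date": "2001-02", "store": "S1", "item": "music"}): 7.0,
}

# Aggregate away store and item, keeping only the date dimension.
by_date = roll_up(base_cuboid, ("date",))
```

Big-endian fixed-width keys sort in dimension order, which is one reason such a layout suits range scans in ordered key-value stores; whether the paper uses this layout is not stated in the abstract.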

Keywords

Big Data, On Line Analysis Processing, Multidimensional Data Model, TPC-DS Benchmark