A Parallel Compressed Data Cube Based on Hadoop

  • Jingang Shi
  • Yan Zheng
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1117)


Targeting on-line analytical processing (OLAP), this paper proposes a parallel compressed data cube algorithm based on the Hadoop architecture. The algorithm divides a single data cube into several independent compressed sub-cubes and then uses Hadoop to construct and query the entire data cube in parallel. Experiments show that the algorithm benefits from the parallelism and high scalability of the Hadoop architecture and, at the same time, answers data cube queries faster by exploiting the self-indexing property of the compressed data cube. The algorithm therefore has both research value and practical significance.
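The partition-then-parallelize idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm. Fact rows are split into sub-cubes by hashing the value of the first dimension. A plain Python `dict` stands in for the compressed, self-indexed sub-cube. A thread pool stands in for Hadoop reducers. The measure is aggregated with SUM. All function names (`partition_of`, `map_phase`, `build_sub_cube`, `build_cube`, `query`) are hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import product

ALL = "*"  # wildcard: the cell is aggregated over this dimension


def partition_of(value, num_partitions):
    # Assumed partitioning key: deterministic hash of the first-dimension value.
    return zlib.crc32(str(value).encode("utf-8")) % num_partitions


def map_phase(rows, num_partitions):
    """Shuffle step (stand-in for Hadoop map): send every fact row to exactly one partition."""
    buckets = [[] for _ in range(num_partitions)]
    for dims, measure in rows:
        buckets[partition_of(dims[0], num_partitions)].append((dims, measure))
    return buckets


def build_sub_cube(rows):
    """Reduce step: compute SUM for all 2^n group-bys of one partition.

    A plain dict replaces the compressed, self-indexed structure used in the paper.
    """
    cube = defaultdict(int)
    for dims, measure in rows:
        for keep in product((True, False), repeat=len(dims)):
            cell = tuple(d if k else ALL for d, k in zip(dims, keep))
            cube[cell] += measure
    return dict(cube)


def build_cube(rows, num_partitions):
    """Build the independent sub-cubes in parallel (threads stand in for reducers)."""
    buckets = map_phase(rows, num_partitions)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(build_sub_cube, buckets))


def query(sub_cubes, cell):
    """Answer a point query against the partitioned cube."""
    num_partitions = len(sub_cubes)
    if cell[0] != ALL:
        # First dimension is fixed: only one sub-cube can hold matching rows.
        return sub_cubes[partition_of(cell[0], num_partitions)].get(cell, 0)
    # First dimension aggregated: SUM is distributive, so add the partial sums.
    return sum(sc.get(cell, 0) for sc in sub_cubes)


if __name__ == "__main__":
    rows = [(("a", "x"), 1), (("a", "y"), 2), (("b", "x"), 3)]
    cubes = build_cube(rows, num_partitions=4)
    print(query(cubes, ("a", ALL)))  # 3
    print(query(cubes, (ALL, "x")))  # 4
```

Because every row lands in exactly one partition, a query that fixes the partitioning dimension touches a single sub-cube. A query that aggregates over that dimension combines partial results from all sub-cubes. This works only because SUM is distributive. Holistic measures such as MEDIAN would need a different merge step.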


Keywords: Data cube, Hadoop, Parallel



This work was supported by the National Natural Science Foundation of China (No. 61702345).



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. School of Information and Control Engineering, Shenyang Jianzhu University, Shenyang, China
  2. Shenyang DONFON Titanium Industry Co., Ltd., Shenyang, China
