CUBE File: A File Structure for Hierarchically Clustered OLAP Cubes

  • Nikos Karayannidis
  • Timos Sellis
  • Yannis Kouvaras
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2992)

Abstract

Hierarchical clustering has proven to be an effective means of physically organizing large fact tables, since it significantly reduces the I/O cost of ad hoc OLAP query evaluation. In this paper, we propose the CUBE File, a novel multidimensional file structure for organizing the most detailed data of a cube. The CUBE File achieves hierarchical clustering of the data, enabling fast access via hierarchical restrictions. Moreover, it imposes a low storage cost and adapts perfectly to the extensive sparseness of the data space, achieving a high compression rate. Our results show that the CUBE File outperforms the most effective method proposed to date for hierarchically clustering the cube, requiring 7 to 9 times fewer I/Os on average across all workloads tested, which indicates a higher degree of hierarchical clustering. In addition, the CUBE File imposes a 2 to 3 times lower storage cost.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Nikos Karayannidis (1)
  • Timos Sellis (1)
  • Yannis Kouvaras (1)
  1. Institute of Communication and Computer Systems and School of Electrical and Computer Engineering, National Technical University of Athens, Zographou, Athens, Hellas
