Cache-Conscious Data Cube Computation on a Modern Processor

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Data cube computation is an important problem in data warehousing and OLAP (online analytical processing). Although it has been studied extensively, most existing cubing algorithms were designed without regard to CPU and cache behavior. In this paper, we first propose a cache-conscious cubing approach, CC-Cubing, that computes data cubes efficiently on a modern processor by improving CPU and cache performance. CC-Cubing adopts an integrated depth-first and breadth-first partitioning order and partitions multiple dimensions simultaneously; this partitioning scheme improves spatial locality and increases cache-line utilization. Software prefetching is then applied in the sorting phase to hide the expensive cache misses caused by data scans. In addition, CC-Cubing uses a cache-aware method to switch sort algorithms dynamically. Our performance study shows that CC-Cubing outperforms BUC, Star-Cubing and MM-Cubing in most cases. Then, to fully utilize an SMT (simultaneous multithreading) processor, we present a thread-based method, CC-Cubing-SMT, which improves on the single-threaded CC-Cubing algorithm by up to 27%.
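The abstract does not include CC-Cubing's code. As a rough illustration of two ideas it names, partitioning on several dimensions at once and switching sort algorithms according to cache capacity, here is a minimal Python sketch. The threshold `CACHE_BUDGET_COUNTERS`, the mixed-radix composite key, and every function name are hypothetical assumptions, not the authors' method. Python cannot show real cache effects or software prefetching, so only the decision logic is modeled.

```python
"""Sketch (assumptions, not CC-Cubing itself): partition rows on several
dimensions at once, choosing counting sort when its count array is assumed
small enough to stay cache-resident, else falling back to a comparison sort."""
from itertools import groupby

# Hypothetical budget: the largest count array we assume fits in cache.
CACHE_BUDGET_COUNTERS = 16 * 1024

Row = tuple[int, ...]


def composite_key(row: Row, dims: list[int], cards: list[int]) -> int:
    """Encode several dimension values (each in [0, card)) as one mixed-radix int."""
    key = 0
    for d, c in zip(dims, cards):
        key = key * c + row[d]
    return key


def counting_sort(rows: list[Row], keys: list[int], n_keys: int) -> list[Row]:
    """Stable counting sort of rows by precomputed keys in [0, n_keys)."""
    start = [0] * (n_keys + 1)
    for k in keys:
        start[k + 1] += 1
    for k in range(n_keys):
        start[k + 1] += start[k]  # prefix sums: first output slot for key k
    out: list[Row] = [()] * len(rows)
    for row, k in zip(rows, keys):
        out[start[k]] = row
        start[k] += 1
    return out


def partition(rows: list[Row], dims: list[int],
              cards: list[int]) -> tuple[list[list[Row]], str]:
    """Sort rows on all of `dims` together, then split into equal-key groups."""
    keys = [composite_key(r, dims, cards) for r in rows]
    n_keys = 1
    for c in cards:
        n_keys *= c
    if n_keys <= CACHE_BUDGET_COUNTERS:
        ordered = counting_sort(rows, keys, n_keys)
        method = "counting"
    else:
        # Stable comparison sort on the same composite key.
        ordered = [r for _, r in sorted(zip(keys, rows), key=lambda p: p[0])]
        method = "comparison"
    groups = [list(g) for _, g in
              groupby(ordered, key=lambda r: composite_key(r, dims, cards))]
    return groups, method
```

For example, `partition([(1, 0, 5), (0, 1, 3), (1, 0, 2), (0, 0, 7)], [0, 1], [2, 2])` groups the rows by their first two columns with a counting sort, because only four composite keys are possible. With cardinalities of 200 each, the 40,000-entry count array exceeds the assumed budget, so the same grouping is produced by the comparison sort instead.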

References

  1. Gray J, Chaudhuri S, Bosworth A et al. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1997, 1(1): 29–53.

  2. Agarwal S, Agrawal R, Deshpande P et al. On the computation of multidimensional aggregates. In Proc. VLDB Conference, Mumbai, India, September 3–6, 1996, pp.506–521.

  3. Ross K A, Srivastava D. Fast computation of sparse data cubes. In Proc. VLDB Conference, Athens, Greece, August 25–29, 1997, pp.116–125.

  4. Zhao Y, Deshpande P, Naughton J F. An array-based algorithm for simultaneous multidimensional aggregates. In Proc. ACM SIGMOD, Tucson, USA, May 13–15, 1997, pp.159–170.

  5. Beyer K, Ramakrishnan R. Bottom-up computation of sparse and iceberg CUBEs. In Proc. ACM SIGMOD, Philadelphia, USA, June 1–3, 1999, pp.359–370.

  6. Xin D, Han J, Li X, Wah B W. Star-Cubing: Computing iceberg cubes by top-down and bottom-up integration. In Proc. VLDB Conference, Berlin, Germany, Sept. 9–12, 2003, pp.476–487.

  7. Shao Z, Han J, Xin D. MM-Cubing: Computing iceberg cubes by factorizing the lattice space. In Proc. SSDBM, Santorini Island, Greece, June 21–23, 2004, pp.213–222.

  8. Sismanis Y, Deligiannakis A, Roussopoulos N, Kotidis Y. Dwarf: Shrinking the petacube. In Proc. ACM SIGMOD, Madison, Wisconsin, USA, June 3–6, 2002, pp.464–475.

  9. Lakshmanan L V S, Pei J, Zhao Y. QC-trees: An efficient summary structure for semantic OLAP. In Proc. ACM SIGMOD, San Diego, USA, June 9–12, 2003, pp.64–75.

  10. Li C, Wang S. Efficient incremental maintenance for distributive and non-distributive aggregate functions. Journal of Computer Science and Technology, 2006, 21(1): 52–65.

  11. Shanmugasundaram J, Fayyad U M, Bradley P S. Compressed data cubes for OLAP aggregate query approximation on continuous dimensions. In Proc. ACM SIGKDD, San Diego, California, USA, August 15–18, 1999, pp.223–232.

  12. Vitter J S, Wang M, Iyer B R. Data cube approximation and histograms via wavelets. In Proc. CIKM, Bethesda, Maryland, USA, November 3–7, 1998, pp.96–104.

  13. TimesTen Team. High-performance and scalability through application tier, in-memory data management. In Proc. VLDB Conference, Cairo, Egypt, Sept. 10–14, 2000, pp.677–680.

  14. Altibase. http://www.altibase.com/.

  15. Hennessy J L, Patterson D A. Computer architecture: A quantitative approach. Morgan Kaufmann Publishers Inc., 2002.

  16. Rao J, Ross K A. Cache conscious indexing for decision-support in main memory. In Proc. VLDB Conference, Edinburgh, UK, September 7–10, 1999, pp.78–89.

  17. Boncz P A, Manegold S, Kersten M L. Database architecture optimized for the new bottleneck: Memory access. In Proc. VLDB Conference, Edinburgh, UK, Sept. 7–10, 1999, pp.54–65.

  18. Rao J, Ross K A. Making B+-trees cache conscious in main memory. In Proc. ACM SIGMOD, Dallas, Texas, USA, May 16–18, 2000, pp.475–486.

  19. Shatdal A, Kant C, Naughton J F. Cache conscious algorithms for relational query processing. In Proc. VLDB Conference, Santiago de Chile, Chile, September 12–15, 1994, pp.510–521.

  20. Chen S, Gibbons P B, Mowry T C. Improving index performance through prefetching. In Proc. ACM SIGMOD, Santa Barbara, California, USA, May 21–24, 2001, pp.235–246.

  21. Nyberg C, Barclay T, Cvetanovic Z, Gray J, Lomet D. AlphaSort: A RISC machine sort. In Proc. ACM SIGMOD, Minneapolis, USA, May 24–27, 1994, pp.233–242.

  22. Chen S, Ailamaki A, Gibbons P B, Mowry T C. Improving hash join performance through prefetching. In Proc. ICDE, Boston, USA, March 30–April 2, 2004, pp.116–127.

  23. Luan H, Du X Y, Wang S. Prefetching J+ tree: A cache-optimized main memory database index structure. Journal of Computer Science and Technology, 2009, 24(4): 687–707.

  24. Ailamaki A, DeWitt D J, Hill M D, Wood D A. DBMSs on a modern processor: Where does time go? In Proc. VLDB Conference, Edinburgh, UK, Sept. 7–10, 1999, pp.266–277.

  25. Lo J L, Barroso L A, Eggers S J, Gharachorloo K, Levy H M, Parekh S S. An analysis of database workload performance on simultaneous multithreaded processors. In Proc. ISCA, Barcelona, Spain, June 27–July 1, 1998, pp.39–50.

  26. Zhou J, Cieslewicz J, Ross K A, Shah M. Improving database performance on simultaneous multithreading processors. In Proc. VLDB Conference, Trondheim, Norway, August 30–September 2, 2005, pp.49–60.

  27. Ghoting A, Buehrer G, Parthasarathy S, Kim D, Nguyen A, Chen Y, Dubey P. Cache-conscious frequent pattern mining on modern and emerging processors. The VLDB Journal, 2007, 16(1): 77–96.

  28. Liu L, Li E, Zhang Y, Tang Z. Optimization of frequent itemset mining on multiple-core processor. In Proc. VLDB Conference, Vienna, Austria, Sept. 23–27, 2007, pp.1275–1285.

  29. PerfSuite. http://perfsuite.ncsa.uiuc.edu/.

  30. Calibrator. http://monetdb.cwi.nl/Calibrator/.

  31. Intel. IA-32 Intel architecture optimization reference manual. http://developer.intel.com.

  32. Tullsen D M, Eggers S J, Levy H M. Simultaneous multithreading: Maximizing on-chip parallelism. In Proc. ISCA, Santa Margherita Ligure, Italy, June 22–24, 1995, pp.392–403.

  33. Marr D T, Binns F, Hill D L, Hinton G, Koufaty D A, Miller J A, Upton M. Hyper-threading technology architecture and microarchitecture. Intel Technology Journal, 2002, 6(1): 4–15.

  34. Hahn C, Warren S. Extended edited synoptic cloud reports from ships and land stations over the globe, 1952–1996. http://cdiac.ornl.gov/ftp/ndp026c.

Author information

Correspondence to Hua Luan.

Additional information

This work is supported in part by a grant from HP Labs China, the National Natural Science Foundation of China under Grant No. 60496325, and the Main Memory OLAP Servers Project.

Electronic Supplementary Material

Electronic supplementary material (PDF, 90 kB) is available with the online version of this article.

About this article

Cite this article

Luan, H., Du, XY. & Wang, S. Cache-Conscious Data Cube Computation on a Modern Processor. J. Comput. Sci. Technol. 24, 708–722 (2009). https://doi.org/10.1007/s11390-009-9253-0

  • DOI: https://doi.org/10.1007/s11390-009-9253-0
