
Efficient Online Aggregates in Dense-Region-Based Data Cube Representations

  • Kais Haddadin
  • Tobias Lauer
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6380)

Abstract

In-memory OLAP systems require a space-efficient representation of sparse data cubes in order to accommodate large data sets. On the other hand, many efficient online aggregation techniques, such as prefix sums, are built on dense array-based representations. These are often not applicable to real-world data because of the size of the arrays, which usually cannot be compressed well, since pre-processing removes most of the sparsity. A possible solution is to identify dense regions in a sparse cube and represent only those as arrays, while storing the sparse data separately, e.g., in a spatial index structure. Previous dense-region-based approaches have concentrated mainly on the effectiveness of dense-region detection (i.e., on the space-efficiency of the result). However, especially in higher-dimensional cubes, data is usually more cluttered, resulting in a potentially large number of small dense regions, which degrades query performance on such a structure. In this article, we focus not only on space-efficiency but also on time-efficiency, both for the initial dense-region extraction and for queries on the resulting hybrid data structure. After describing a pre-aggregation method for representing dense sub-cubes that supports efficient online aggregate queries as well as cell updates, we outline our sub-cube extraction approach in detail. In addition, optimizations in our approach significantly reduce the time needed to build the initial data structure compared to previous systems. We also provide two methods for trading available memory for higher aggregate query performance, and we present a straightforward adaptation of our approach to multi-core or multi-processor architectures, which can further enhance query performance. Experiments with different real-world data sets show how various parameter settings can be used to adjust the efficiency and effectiveness of our algorithms.
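The general idea described in the abstract, pre-aggregated arrays for dense regions plus a separate store for the remaining sparse cells, can be illustrated with a small two-dimensional sketch. It is not the authors' method: it assumes plain prefix sums for the dense regions (which make cell updates expensive, unlike the pre-aggregation described in the chapter), a flat list in place of a spatial index such as an R-tree, and hypothetical names (`DenseRegion`, `HybridCube`, `range_sum`) chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Cell = Tuple[int, int]
Box = Tuple[int, int, int, int]  # inclusive (row_lo, col_lo, row_hi, col_hi)


@dataclass
class DenseRegion:
    """Hypothetical dense 2-D sub-cube stored as a prefix-sum array."""
    row0: int
    col0: int
    ps: List[List[float]]  # ps[i][j] = sum of the region's first i rows and j columns

    @classmethod
    def from_values(cls, row0: int, col0: int, values: List[List[float]]) -> "DenseRegion":
        rows, cols = len(values), len(values[0])
        ps = [[0.0] * (cols + 1) for _ in range(rows + 1)]
        for i in range(rows):
            for j in range(cols):
                ps[i + 1][j + 1] = values[i][j] + ps[i][j + 1] + ps[i + 1][j] - ps[i][j]
        return cls(row0, col0, ps)

    def range_sum(self, box: Box) -> float:
        """Sum over the part of `box` that overlaps this region (four lookups)."""
        rows, cols = len(self.ps) - 1, len(self.ps[0]) - 1
        # Clip the query box to the region and convert to local coordinates.
        r1 = max(box[0] - self.row0, 0)
        c1 = max(box[1] - self.col0, 0)
        r2 = min(box[2] - self.row0, rows - 1)
        c2 = min(box[3] - self.col0, cols - 1)
        if r1 > r2 or c1 > c2:
            return 0.0  # no overlap
        p = self.ps
        return p[r2 + 1][c2 + 1] - p[r1][c2 + 1] - p[r2 + 1][c1] + p[r1][c1]


@dataclass
class HybridCube:
    """Hypothetical hybrid structure: dense regions plus leftover sparse cells."""
    dense: List[DenseRegion] = field(default_factory=list)
    # Assumption: a plain list stands in for a spatial index such as an R-tree.
    sparse: List[Tuple[Cell, float]] = field(default_factory=list)

    def range_sum(self, box: Box) -> float:
        total = sum(region.range_sum(box) for region in self.dense)
        for (r, c), v in self.sparse:
            if box[0] <= r <= box[2] and box[1] <= c <= box[3]:
                total += v
        return total


cube = HybridCube(
    dense=[DenseRegion.from_values(0, 0, [[1, 2], [3, 4]])],
    sparse=[((10, 10), 5.0), ((3, 7), 2.0)],
)
print(cube.range_sum((0, 0, 10, 10)))  # 17.0
```

In this sketch every query visits each dense region once, so its cost grows with the number of regions; this mirrors the abstract's point that many small dense regions hurt query performance even when they save space.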

Keywords

Range Query, Query Time, Data Cube, Query Performance, Global Density
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Kais Haddadin, Jedox AG, Freiburg, Germany
  • Tobias Lauer, Institute of Computer Science, University of Freiburg, Germany
