A Probabilistic Approach for Computing Approximate Iceberg Cubes

  • Alfredo Cuzzocrea
  • Filippo Furfaro
  • Giuseppe M. Mazzeo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5181)

Abstract

An iceberg cube is a refinement of a data cube that contains only the cells whose measure is larger than a given threshold (the iceberg condition). Iceberg cubes are well-established tools for fast data analysis, since they filter the information in a classical data cube down to its most relevant parts. Although the problem of efficiently computing iceberg cubes has been widely investigated, the task remains intrinsically expensive because of the large amount of data that must usually be processed. Indeed, in several application scenarios efficiency is so crucial that users would benefit from the fast computation of even an incomplete iceberg cube. Such a cube could support preliminary data analysis by letting users focus their exploration quickly and effectively, thus saving large amounts of computational resources. In this paper, we propose a technique for efficiently computing iceberg cubes that can trade completeness of the result for computational efficiency. Specifically, we devise an algorithm that employs a probabilistic framework to avoid computing cells that are probably irrelevant (i.e., unlikely to satisfy the iceberg condition). The output of our algorithm is an incomplete iceberg cube that is computed efficiently and can be refined: the user can choose to compute the cells that were estimated to be irrelevant in previous invocations of the algorithm.
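
To make the iceberg condition concrete, the following minimal Python sketch computes an exact iceberg cube with a COUNT measure, and then an incomplete one that skips cells judged unlikely to qualify. It is an illustration only and not the algorithm of the paper: the pruning step uses a plain random-sample estimate as a stand-in for the probabilistic framework, and the names cell_counts, iceberg_cube, approximate_iceberg_cube, sample_rate and slack are all hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

ALL = "*"  # placeholder for a dimension that has been aggregated away


def cell_counts(rows, keep=None):
    """Count tuples per cell over every group-by; optionally only for cells in `keep`."""
    counts = Counter()
    if not rows:
        return counts
    n_dims = len(rows[0])
    group_bys = [dims for k in range(n_dims + 1)
                 for dims in combinations(range(n_dims), k)]
    for row in rows:
        for dims in group_bys:
            cell = tuple(row[d] if d in dims else ALL for d in range(n_dims))
            if keep is None or cell in keep:
                counts[cell] += 1
    return counts


def iceberg_cube(rows, min_count):
    """Exact iceberg cube: keep cells whose count is larger than min_count."""
    return {c: n for c, n in cell_counts(rows).items() if n > min_count}


def approximate_iceberg_cube(rows, min_count, sample_rate=0.2, slack=0.5, seed=0):
    """Incomplete iceberg cube (assumed sample-based pruning, not the paper's method).

    Cells whose scaled-up sample count does not exceed slack * min_count are
    treated as probably irrelevant and are never counted on the full data.
    """
    rng = random.Random(seed)
    sample = [r for r in rows if rng.random() < sample_rate]
    estimates = cell_counts(sample)
    candidates = {c for c, n in estimates.items()
                  if n / sample_rate > slack * min_count}
    exact = cell_counts(rows, keep=candidates)
    return {c: n for c, n in exact.items() if n > min_count}


if __name__ == "__main__":
    rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]
    print(iceberg_cube(rows, min_count=1))
    print(approximate_iceberg_cube(rows, min_count=1, sample_rate=0.5))
```

Every cell the approximate variant returns carries its exact count, so its output is always a subset of the exact iceberg cube. Calling it again with a larger sample_rate or a smaller slack is one simple way to mimic the refinement step described above, since cells skipped earlier then become candidates.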

Keywords

Probability Threshold; Probabilistic Framework; Data Cube; Zipf Distribution; Count Threshold
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Alfredo Cuzzocrea (1)
  • Filippo Furfaro (1)
  • Giuseppe M. Mazzeo (1)
  1. ICAR-CNR, I-87036, Cosenza, Italy, and University of Calabria, Cosenza, Italy
