OLAP over Continuous Domains via Density-Based Hierarchical Clustering

  • Michelangelo Ceci
  • Alfredo Cuzzocrea
  • Donato Malerba
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6882)

Abstract

In traditional OLAP systems, roll-up and drill-down operations over data cubes navigate fixed hierarchies defined on the discrete attributes that play the role of dimensions. In recent years, however, a trend has emerged toward treating continuous attributes as dimensions too, so that hierarchy members become continuous as well. This trend is driven mostly by novel application scenarios such as sensor and data stream management tools. A clear advantage of this approach is that it avoids having to define an ad hoc discretization hierarchy along each OLAP dimension in advance. Following this trend, in this paper we propose a novel method for effectively and efficiently supporting roll-up and drill-down operations over OLAP data cubes with continuous dimensions via a density-based hierarchical clustering algorithm. The algorithm hierarchically clusters dimension instances while also taking fact-table measures into account, so that the resulting clusters better reflect the analysis to be performed. Experiments on two well-known multidimensional datasets clearly show the advantages of the proposed solution.
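The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea, not the authors' method. It clusters the values of one continuous dimension together with a measure using plain DBSCAN (a standard density-based algorithm, swapped in here as an assumption), runs it at several neighbourhood radii to obtain finer and coarser levels, and rolls up a SUM measure per cluster. The names `Fact`, `features`, `build_hierarchy`, `roll_up`, the min-max normalisation, and the measure weight `alpha` are all hypothetical.

```python
# Hypothetical sketch: plain DBSCAN swapped in; not the algorithm of the paper.
from dataclasses import dataclass
from math import hypot

NOISE = -1  # label for points that belong to no dense region


@dataclass
class Fact:
    x: float        # value of the continuous dimension attribute
    measure: float  # fact-table measure aggregated on roll-up (SUM here)


def features(facts, alpha):
    """Min-max normalise x and the measure; weight the measure by alpha.

    alpha is a hypothetical knob: 0 clusters on x alone, larger values
    let differences in the measure separate nearby dimension values.
    """
    def norm(vals):
        lo, hi = min(vals), max(vals)
        span = (hi - lo) or 1.0
        return [(v - lo) / span for v in vals]

    xs = norm([f.x for f in facts])
    ms = norm([f.measure for f in facts])
    return [(x, alpha * m) for x, m in zip(xs, ms)]


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN on 2-D points; returns one label per point."""
    n = len(points)

    def neighbours(i):
        px, py = points[i]
        return [j for j in range(n)
                if hypot(px - points[j][0], py - points[j][1]) <= eps]

    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster            # i is a core point: start a cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # reachable noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:     # only core points expand the cluster
                queue.extend(nb)
        cluster += 1
    return labels


def build_hierarchy(facts, eps_levels, min_pts=2, alpha=0.5):
    """One labelling per radius, finest (smallest eps) level first."""
    pts = features(facts, alpha)
    return [dbscan(pts, eps, min_pts) for eps in sorted(eps_levels)]


def roll_up(facts, labels):
    """Per cluster: (min x, max x, SUM(measure)), sorted by x; noise skipped."""
    groups = {}
    for f, lab in zip(facts, labels):
        if lab == NOISE:
            continue
        lo, hi, total = groups.get(lab, (f.x, f.x, 0.0))
        groups[lab] = (min(lo, f.x), max(hi, f.x), total + f.measure)
    return sorted(groups.values())


facts = [Fact(1.0, 10.0), Fact(1.2, 11.0), Fact(1.4, 10.0),
         Fact(3.0, 12.0), Fact(3.2, 11.0), Fact(3.4, 12.0),
         Fact(9.0, 50.0), Fact(9.1, 52.0), Fact(9.3, 51.0)]
levels = build_hierarchy(facts, eps_levels=[0.05, 0.3])
print(roll_up(facts, levels[0]))  # [(1.0, 1.4, 31.0), (3.0, 3.4, 35.0), (9.0, 9.3, 153.0)]
print(roll_up(facts, levels[1]))  # [(1.0, 3.4, 66.0), (9.0, 9.3, 153.0)]
```

Moving from `levels[0]` to `levels[1]` acts as a roll-up (the ranges [1.0, 1.4] and [3.0, 3.4] merge), and moving back is a drill-down, without any discretization hierarchy fixed in advance. With a fixed `min_pts`, a core point at a smaller radius stays core at a larger one, so finer clusters fall inside coarser ones, apart from border points that DBSCAN may attach to either of two adjacent clusters.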

Keywords

Cluster Algorithm · Association Rule · Range Query · Data Cube · Continuous Domain
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Michelangelo Ceci (1)
  • Alfredo Cuzzocrea (2)
  • Donato Malerba (1)

  1. Dipartimento di Informatica, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
  2. ICAR-CNR and University of Calabria, Rende, Cosenza, Italy

Personalised recommendations