
Journal of Systems Integration, Volume 10, Issue 1, pp 5–22

Smoothing over Summary Information in Data Cubes

  • Sam Sung
  • Stephen Huang
  • Arthur Ramer

Abstract

Decision support usually requires drawing from a huge data warehouse some statistical information that is interesting and useful to its users. A typical data model that supports the data warehouse is the multidimensional database, also known as a data cube. A data cube contains cells, each of which is associated with some summary information, or aggregate, that the decisions are to be based on. However, in real-life databases, due to the nature of their contents, data distribution tends to be clustered and sparse. The sparsity situation gets worse, in general, as the number of cells increases. For those cells that have support levels below a certain threshold, combining with adjacent cells is necessary to acquire sufficient support. Otherwise, incomplete or biased results could be derived due to lack of sufficient support.
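The idea of combining under-supported cells with adjacent ones can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's procedure: the function name `merge_low_support`, the `(count, sum)` cell layout, and the greedy left-to-right strategy along a single ordered dimension are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical cell layout: (support count, sum of the measure).
Cell = Tuple[int, float]
# A merged group: (first_index, last_index, count, total).
Group = Tuple[int, int, int, float]


def merge_low_support(cells: List[Cell], min_support: int) -> List[Group]:
    """Greedily merge adjacent cells until each group reaches min_support."""
    groups: List[Group] = []
    start, count, total = 0, 0, 0.0
    for i, (c, s) in enumerate(cells):
        count += c
        total += s
        if count >= min_support:
            # Enough support accumulated: close the current group.
            groups.append((start, i, count, total))
            start, count, total = i + 1, 0, 0.0
    # Fold an under-supported tail into the previous group, if there is one.
    if start < len(cells):
        if groups:
            first, _, prev_count, prev_total = groups.pop()
            groups.append((first, len(cells) - 1, prev_count + count, prev_total + total))
        else:
            groups.append((start, len(cells) - 1, count, total))
    return groups


cells = [(5, 50.0), (1, 8.0), (0, 0.0), (2, 30.0), (6, 66.0), (1, 9.0)]
groups = merge_low_support(cells, min_support=3)
```

With a threshold of 3, cells 1 to 3 are pooled into one group and the last, weakly supported cell is absorbed by its left neighbour. The aggregate of a group (for example its mean, `total / count`) then stands in for every cell it covers.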

Our main focus in this paper is to find approximations for the missing or biased aggregates of those cells that have missing or low support. We call this approximation process smoothing in this paper. We propose a smooth function that can smooth nicely on a quantitative attribute while still being preserved locally. Our method is also adaptive to sudden changes of data distribution, called discontinuities, that inevitably occur in real-life data.
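The abstract does not define the smooth function itself, so the following is a stand-in sketch rather than the authors' method. It assumes a one-dimensional row of cells along a quantitative attribute, keeps well-supported cells unchanged (local preservation), interpolates linearly across low-support cells, and refuses to interpolate across a large jump between trusted neighbours (a crude discontinuity test). The names `smooth_cells`, `min_support`, and `jump` are hypothetical.

```python
from typing import List, Optional


def smooth_cells(means: List[Optional[float]], support: List[int],
                 min_support: int, jump: float) -> List[Optional[float]]:
    """Fill low-support cells from the nearest well-supported neighbours."""
    n = len(means)
    trusted = [support[i] >= min_support and means[i] is not None for i in range(n)]
    out = list(means)
    for i in range(n):
        if trusted[i]:
            continue  # well-supported cells keep their own aggregate
        # Nearest trusted cell on each side, if any.
        left = next((j for j in range(i - 1, -1, -1) if trusted[j]), None)
        right = next((j for j in range(i + 1, n) if trusted[j]), None)
        if left is None and right is None:
            continue  # nothing to borrow from; leave the cell as it is
        if left is None:
            out[i] = means[right]
        elif right is None:
            out[i] = means[left]
        elif abs(means[right] - means[left]) > jump:
            # Treated as a discontinuity: copy the nearer side (ties go left)
            # instead of blending across the jump.
            out[i] = means[left] if i - left <= right - i else means[right]
        else:
            # Smooth region: linear interpolation by distance.
            w = (i - left) / (right - left)
            out[i] = (1 - w) * means[left] + w * means[right]
    return out


smoothed = smooth_cells([10.0, None, 12.0, None, 50.0], [8, 0, 9, 1, 7],
                        min_support=3, jump=5.0)
```

Here the empty cell between 10.0 and 12.0 is interpolated, while the low-support cell between 12.0 and 50.0 is not averaged across the jump and instead takes the value of the nearer side.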

Keywords: data warehouse, data cube, OLAP, smoothing



Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Sam Sung (1)
  • Stephen Huang (2)
  • Arthur Ramer (3)
  1. School of Computing, National University of Singapore, Singapore
  2. Dept. of Computer Science, University of Houston, Houston
  3. Dept. of Computer Science & Engineering, University of New South Wales, Australia
