Data Mining and Knowledge Discovery, Volume 1, Issue 1, pp 29–53

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

  • Jim Gray
  • Surajit Chaudhuri
  • Adam Bosworth
  • Andrew Layman
  • Don Reichart
  • Murali Venkatrao
  • Frank Pellow
  • Hamid Pirahesh

Abstract

Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.
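
The idea of treating each aggregation attribute as a dimension, and of computing super-aggregates by collapsing dimensions, can be illustrated with a small sketch. The Python below is not the paper's algorithm or its SQL syntax; it is a naive illustration that sums a measure over every subset of the grouping attributes. The function name `cube`, the `ALL` placeholder string, and the sample sales rows are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder marking a dimension that has been aggregated away
# (an assumption for this sketch).
ALL = "ALL"

def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`.

    Returns a dict mapping a key tuple (one entry per dimension, with
    ALL standing in for each collapsed dimension) to the summed measure.
    """
    totals = defaultdict(int)
    n = len(dims)
    # Enumerate all 2^N subsets of dimensions to keep.
    for kept_count in range(n + 1):
        for kept in combinations(range(n), kept_count):
            for row in rows:
                key = tuple(
                    row[d] if i in kept else ALL
                    for i, d in enumerate(dims)
                )
                totals[key] += row[measure]
    return dict(totals)

# Hypothetical sales data, invented for illustration.
sales = [
    {"model": "Chevy", "year": 1994, "color": "black", "units": 50},
    {"model": "Chevy", "year": 1994, "color": "white", "units": 40},
    {"model": "Chevy", "year": 1995, "color": "black", "units": 85},
    {"model": "Ford", "year": 1994, "color": "black", "units": 50},
]

result = cube(sales, ["model", "year", "color"], "units")

# The fully collapsed point is the grand total (zero-dimensional aggregate).
print(result[(ALL, ALL, ALL)])
# Collapsing two dimensions gives a one-dimensional aggregate per model.
print(result[("Chevy", ALL, ALL)])
```

Because the result is a flat mapping from key tuples to values, it can be read as a relation with one column per dimension plus the aggregate, which mirrors the abstract's point that cubes are relations. The sketch rescans the input once per subset of dimensions, so it says nothing about the efficient computation techniques the paper discusses.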

data cube, data mining, aggregation, summarization, database, analysis, query

Copyright information

© Kluwer Academic Publishers 1997

Authors and Affiliations

  • Jim Gray (1)
  • Surajit Chaudhuri (1)
  • Adam Bosworth (1)
  • Andrew Layman (1)
  • Don Reichart (1)
  • Murali Venkatrao (1)
  • Frank Pellow (2)
  • Hamid Pirahesh (2)

  1. Microsoft Research, Advanced Technology Division, Microsoft Corporation, Redmond
  2. IBM Research, San Jose
