The Star Schema Benchmark and Augmented Fact Table Indexing

  • Patrick O’Neil
  • Elizabeth O’Neil
  • Xuedong Chen
  • Stephen Revilak
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5895)


We provide a benchmark measuring star schema queries retrieving data from a fact table with Where clause column restrictions on dimension tables. Clustering is crucial to performance with modern disk technology, since retrievals with filter factors down to 0.0005 are now performed most efficiently by sequential table search rather than by indexed access. DB2’s Multi-Dimensional Clustering (MDC) provides methods to "dice" the fact table along a number of orthogonal "dimensions", but only when these dimensions are columns in the fact table. The diced cells cluster fact rows on several of these "dimensions" at once so queries restricting several such columns can access crucially localized data, with much faster query response. Unfortunately, columns of dimension tables of a star schema are not usually represented in the fact table. In this paper, we show a simple way to adjoin physical copies of dimension columns to the fact table, dicing data to effectively cluster query retrieval, and explain how such dicing can be achieved on database products other than DB2. We provide benchmark measurements to show successful use of this methodology on three commercial database products.


Benchmark Star Schema Data Warehousing Clustering  Multi-Dimensional Clustering DB2 Oracle Vertica 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Bhattacharjee, B., et al.: Efficient Query Processing for Multi-Dimensional Clustered Tables in DB2Google Scholar
  2. 2.
    Chen, X., O’Neil, P., O’Neil, E.: Adjoined Dimension Column Clustering to Improve Data Warehouse Query Performance. Poster presentation at ICDE 2008 (2008),
  3. 3.
    Cranston, L.: MDC Performance: Customer Examples and Experiences,
  4. 4.
  5. 5.
    IBM Research: DB2’s Multi-Dimensional Clustering,
  6. 6.
    Kennedy, J.: Introduction to Multidimensional Clustering with DB2 UDB LUW. In: IBM DB2 Information Management Technical Conference, Orlando, FL (September 2005)Google Scholar
  7. 7.
    Kimball, R., Ross, M.: The Data Warehouse Toolkit, 2nd edn. Wiley, Chichester (2002)Google Scholar
  8. 8.
    Lightstone, S., Teorey, T., Nadeau, T.: Physical Database Design. Morgan Kaufman, San Francisco (2007)Google Scholar
  9. 9.
    O’Neil, P.: The Set Query Benchmark. In: Gray, J. (ed.) The Benchmark Handbook for Database and Transaction Processing Systems, pp. 209–245. Morgan Kauffmann, San Francisco (1993), Google Scholar
  10. 10.
    O’Neil, P., O’Neil, E.: Database Principles, Programming, and Performance, 2nd edn. Morgan Kaufmann, San Francisco (2001)Google Scholar
  11. 11.
    O’Neil, P., O’Neil, E., Chen, X.: The Star Schema Benchmark,
  12. 12.
    Oracle: Partitioning in Oracle Database 10g Release 2 (May 2005),
  13. 13.
    Padmanabhan, S., et al.: Multi-Dimensional Clustering: A New Data Layout Scheme in DB2. In: Proceedings of the ACM SIGMOD Conference (2003)Google Scholar
  14. 14.
    Selinger, P., et al.: Access Path Selection in a Relational Database Management System. In: Proceedings of the ACM SIGMOD Conference, pp. 23–34 (1979)Google Scholar
  15. 15.
    Stonebraker, M., et al.: One Size Fits All? Part 2: Benchmarking Results, Keynote address. In: Proceedings of CIDR (2007),
  16. 16.
    TPC: TPC-DS, TPC Decision Support, under development,
  17. 17.
    Nambiar, R., Poess, M.: The Making of TPC-DS. In: VLDB Proceedings (2006)Google Scholar
  18. 18.
    TPC: TPC-H Version 2.4.0,

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Patrick O’Neil
    • 1
  • Elizabeth O’Neil
    • 1
  • Xuedong Chen
    • 1
  • Stephen Revilak
    • 1
  1. 1.University of Massachusetts at BostonBostonUSA

Personalised recommendations