Skip to main content
Log in

Data mining-based materialized view and index selection in data warehouses

  • Published:
Journal of Intelligent Information Systems Aims and scope Submit manuscript

    We’re sorry, something doesn't seem to be working properly.

    Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.

Abstract

Materialized views and indexes are physical structures for accelerating data access that are casually used in data warehouses. However, these data structures generate some maintenance overhead. They also share the same storage space. Most existing studies about materialized view and index selection consider these structures separately. In this paper, we adopt the opposite stance and couple materialized view and index selection to take view–index interactions into account and achieve efficient storage space sharing. Candidate materialized views and indexes are selected through a data mining process. We also exploit cost models that evaluate the respective benefit of indexing and view materialization, and help select a relevant configuration of indexes and materialized views among the candidates. Experimental results show that our strategy performs better than an independent selection of materialized views and indexes.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Notes

  1. An itemset I is closed with respect to the Galois connection (f, g) iff g ∘ f (I) = I (Pasquier et al. 1999).

  2. http://eric.univ-lyon2.fr/~kaouiche/adbis.pdf.

References

  • Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In 20th international conference on very large data bases (VLDB 1994) (pp. 487–499).

  • Agrawal, S., Chaudhuri, S., Kollár, L., Marathe, A., Narasayya, V., & Syamala, M. (2004). Database tuning advisor for Microsoft SQL server 2005. In 30th international conference on very large data bases (VLDB 2004) (pp. 1110–1121). Toronto, Canada.

  • Agrawal, S., Chaudhuri, S., & Narasayya, V. R. (2000). Automated selection of materialized views and indexes in SQL databases. In 26th international conference on very large data bases (VLDB 2000) (pp. 496–505). Cairo, Egypt.

  • Agrawal, S., Chaudhuri, S., & Narasayya, V. (2001). Materialized view and index selection tool for Microsoft SQL server 2000. In ACM SIGMOD international conference on management of data (SIGMOD 2001) (p. 608). Santa Barbara, USA.

  • Aouiche, K., Darmont, J., Boussaïd, O., & Bentayeb, F. (2005). Automatic selection of bitmap join indexes in data warehouses. In 7th international conference on data warehousing and knowledge discovery (DaWaK 2005), LNCS (Vol. 3589, pp. 64–73). Copenhagen, Denmark.

  • Aouiche, K., Darmont, J., & Gruenwald, L. (2003). Frequent itemsets mining for database auto-administration. In 7th international database engineering and application symposium (IDEAS 2003) (pp. 98–103). Hong Kong, China.

  • Aouiche, K., Jouve, P., & Darmont, J. (2006). Clustering-based materialized view selection in data warehouses. In 10th East-European conference on advances in databases and information systems (ADBIS 2006), LNCS (Vol. 4152, pp. 81–95). Thessaloniki, Greece.

  • Baralis, E., Paraboschi, S., & Teniente, E. (1997). Materialized views selection in a multidimensional database. In 23rd international conference on very large data bases (VLDB 1997) (pp. 156–165). Athens, Greece.

  • Baril, X., & Bellahsene, Z. (2003). Selection of materialized views: A cost-based approach. In 15th international conference on advanced information systems engineering (CAiSE 2003) (pp. 665–680). Klagenfurt, Austria.

  • Bellatreche, L., Boukhalfa, K., & Mohania, M. (2005). An evolutionary approach to schema partitioning selection in a data warehouse environment. In 7th international conference on data warehousing and knowledge discovery (DaWaK 2005), LNCS (Vol. 3589, pp. 115–125). Copenhagen, Denmark.

  • Bellatreche, L., Karlapalem, K., & Schneider, M. (2000). On efficient storage space distribution among materialized views and indices in data warehousing environments. In 9th international conference on information and knowledge management (CIKM 2000) (pp. 397–404). McLean, USA.

  • Bruno, N., & Chaudhuri, S. (2006). Physical design refinement: The “merge-reduce” approach. In 10th international conference on extending database technology (EDBT 2006), LNCS (Vol. 3896, pp. 386–404). Munich, Germany.

  • Cardenas, A. F. (1975). Analysis and performance of inverted data base structures. Communication of the ACM, 18(5), 253–263.

    Article  MATH  MathSciNet  Google Scholar 

  • Chan, G. K. Y., Li, Q., & Feng, L. (1999). Design and selection of materialized views in a data warehousing environment: A case study. In 2nd ACM international workshop on data warehousing and OLAP (DOLAP 1999) (pp. 42–47). Kansas City, USA.

  • Chaudhuri, S., Datar, M., & Narasayya, V. (2004). Index selection for databases: A hardness study and a principled heuristic solution. IEEE Transactions on Knowledge and Data Engineering, 16(11), 1313–1323.

    Article  Google Scholar 

  • Chaudhuri, S., Gupta, A., & Narasayya, V. (2002). Compressing SQL workloads. In 2002 ACM SIGMOD international conference on management of data (SIGMOD 2002) (pp. 488–499). Madison, Wisconsin.

  • Chaudhuri, S., & Narasayya, V. (1997). An efficient cost-driven index selection tool for Microsoft SQL server. In 23rd international conference on very large data bases (VLDB 1994) (pp. 146–155). Santiago de Chile, Chile.

  • Chaudhuri, S., & Motwani, R. (1999). On sampling and relational operators. IEEE Data Engineering Bulletin, 22(4), 41–46.

    Google Scholar 

  • Choenni, S., Blanken, H. M., & Chang, T. (1993a). Index selection in relational databases. In 5th international conference on computing and information (ICCI 1993) (pp. 491–496). Ontario, Canada.

  • Choenni, S., Blanken, H. M., & Chang, T. (1993b). On the selection of secondary indices in relational databases. Data Knowledge Engineering, 11(3), 207–238.

    Article  MATH  Google Scholar 

  • Comer, D. (1978). The difficulty of optimum index selection. ACM Transactions on Database Systems, 3(4), 440–445.

    Article  Google Scholar 

  • Feldman, Y. A., & Reouven, J. (2003). A knowledge-based approach for index selection in relational databases. Expert System with Applications, 25(1), 15–37.

    Article  Google Scholar 

  • Finkelstein, S. J., Schkolnick, M., & Tiberio, P. (1988). Physical database design for relational databases. ACM Transactions on Database Systems, 13(1), 91–128.

    Article  Google Scholar 

  • Frank, M. R., Omiecinski, E., & Navathe, S. B. (1992). Adaptive and automated index selection in RDBMS. In 3rd international conference on extending database technology (EDBT 1992), LNCS (Vol. 580, pp. 277–292). Vienna, Austria.

  • Goldstein, J., & Åke Larson, P. (2001). Optimizing queries using materialized views: A practical, scalable solution. In ACM SIGMOD international conference on management of data (SIGMOD 2001) (pp. 331–342). Santa Barbara, USA.

  • Golfarelli, M., Rizzi, S., & Saltarelli, E. (2002). Index selection for data warehousing. In 4th international workshop on design and management of data warehouses (DMDW 2002) (pp. 33–42). Toronto, Canada.

  • Golfarelli, M., & Rizzi, S. (1998). A methodological framework for data warehouse design. In 1st ACM international workshop on data warehousing and OLAP (DOLAP 1998) (pp. 3–9). New York, USA.

  • Gundem, T. I. (1999). Near optimal multiple choice index selection for relational databases. Computers & Mathematics with Applications, 37(2), 111–120.

    Article  MathSciNet  Google Scholar 

  • Gupta, H. (1999). Selection and maintenance of views in a data warehouse. PhD thesis, Stanford University.

  • Gupta, H., Harinarayan, V., Rajaraman, A., & Ullman, J. D. (1997). Index selection for OLAP. In 13th international conference on data engineering (ICDE 1997) (pp. 208–219). Birmingham, UK.

  • Gupta, H., & Mumick, I. S. (2005). Selection of views to materialize in a data warehouse. IEEE Transactions on Knowledge and Data Engineering, 17(1), 24–43.

    Article  Google Scholar 

  • Harinarayan, V., Rajaraman, A., & Ullman, J. D. (1996). Implementing data cubes efficiently. In ACM SIGMOD international conference on management of data (SIGMOD 1996) (pp. 205–216). Montreal, Canada.

  • Ip, M. Y. L., Saxton, L. V., & Raghavan, V. V. (1983). On the selection of an optimal set of indexes. IEEE Transactions on Software Engineering, 9(2), 135–143.

    Article  Google Scholar 

  • Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264–323.

    Article  Google Scholar 

  • Jouve, P., & Nicoloyannis, N. (2003a). KEROUAC: An algorithm for clustering categorical data sets with practical advantages. In international workshop on data mining for actionable knowledge (DMAK/PAKDD 2003). Seoul, Korea.

  • Jouve, P., & Nicoloyannis, N. (2003b). A new method for combining partitions, applications for distributed clustering. In international workshop on paralell and distributed machine learning and data mining (ECML/PKDD 2003) (pp. 35–46). Cavtat-Dubrovnik, Croatia.

  • Kotidis, Y., & Roussopoulos, N. (1999). Dynamat: A dynamic view management system for data warehouses. In ACM SIGMOD international conference on management of data (SIGMOD 1999) (pp. 371–382). Philadelphia, USA.

  • Kratica, J., Ljubić, I., & Tošić, D. (2003). A genetic algorithm for the index selection problem. In Applications of evolutionary computing, EvoWorkshops 2003: EvoBIO, EvoCOP , EvoIASP, EvoMUSART , EvoROB, EvoSTIM. LNCS (Vol. 2611, pp. 281–291).

  • Kyu-Young, W. (1987). Index selection in relational databases. In Foundation of data organization (pp. 497–500). New York: Plenum.

    Google Scholar 

  • Labio, W., Quass, D., & Adelberg, B. (1997). Physical database design for data warehouses. In 13th international conference on data engineering (ICDE 1997) (pp. 277–288). Birmingham, UK.

  • Leung, C., Khan, Q., & Hoque, T. (2005). CanTree: A tree structure for efficient incremental mining of frequent patterns. In 5th IEEE international conference on data mining (ICDM 2005) (pp. 274–281). Houston, USA.

  • Mahboubi, H., Aouiche, K., & Darmont, J. (2006). Materialized view selection by query clustering in XML data warehouses. In 4th international multiconference on computer science and information technology (CSIT 2006) (Vol. 2, pp. 68–77). Amman, Jordan.

  • Nadeau, T. P., & Teorey, T. J. (2001). A pareto model for OLAP view size estimation. In 4th conference of the centre for advanced studies on collaborative research (CASCON 2001) (p. 13). Toronto, Canada.

  • Nadeau, T. P., & Teorey, T. J. (2002). Achieving scalability in OLAP materialized view selection. In 5th ACM international workshop on data warehousing and OLAP (DOLAP 2002) (pp. 28–34). McLean, USA.

  • O’Neil, P., & Graefe, G. (1995). Multi-table joins through bitmapped join indices. SIGMOD Record, 24(3), 8–11.

    Article  Google Scholar 

  • O’Neil, P., & Quass, D. (1997). Improved query performance with variant indexes. In ACM SIGMOD international conference on management of data (SIGMOD 1997) (pp. 38–49). Tucson, USA.

  • Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. In 7th international conference on database theory (ICDT 1999), LNCS (Vol. 1540, pp. 398–416). Jerusalem, Israel.

  • Rizzi, S., & Saltarelli, E. (2003). View materialization vs. indexing: Balancing space constraints in data warehouse design. In 15th international conference on advanced information systems engineering (CAiSE 2003) (pp. 502–519). Klagenfurt, Austria.

  • Sarawagi, S. (1997). Indexing OLAP data. Data Engineering Bulletin, 20(1), 36–43.

    Google Scholar 

  • Shah, B., Ramachandran, K., & Raghavan, V. (2006). A hybrid approach for data warehouse view selection. International Journal of Data Warehousing and Mining, 2(2), 1–37.

    MATH  Google Scholar 

  • Shukla, A., Deshpande, P., & Naughton, J. F. (2000). Materialized view selection for multi-cube data models. In 7th international conference on extending database technology (EDBT 2000) (pp. 269–284). Konstanz, Germany.

  • Shukla, A., Deshpande, P. M., Naughton, J. F., & Ramasamy, K. (1996). Storage estimation for multidimensional aggregates in the presence of hierarchies. In 22nd international conference on very large data bases (VLDB 1996) (pp. 522–531). Bombay, India.

  • Sismanis, Y., Deligiannakis, A., Roussopoulos, N., & Kotidis, Y. (2002). Dwarf: Shrinking the petacube. In ACM SIGMOD international conference on management of data (SIGMOD 2002) (pp. 464–475). Madison, USA.

  • Smith, J. R., Li, C.-S., & Jhingran, A. (2004). A wavelet framework for adapting data cube views for OLAP. IEEE Transactions on Knowledge and Data Engineering, 16(5), 552–565.

    Article  Google Scholar 

  • TPC (2005). TPC benchark H standard specification revision 2.3.0. Transaction Processing Performance Council.

  • TPC (2007). TPC benchmark DS standard specification, draft Version 52. Transaction Processing Performance Council.

  • Uchiyama, H., Runapongsa, K., & Teorey, T. J. (1999). A progressive view materialization algorithm. In 2nd ACM international workshop on data warehousing and OLAP (DOLAP 1999) (pp. 36–41). Kansas City, USA.

  • Valentin, G., Zuliani, M., Zilio, D., Lohman, G., & Skelley, A. (2000). DB2 advisor: An optimizer smart enough to recommend its own indexes. In 16th international conference on data engineering, (ICDE 2000) (pp. 101–110). California, USA.

  • Valluri, S. R., Vadapalli, S., & Karlapalem, K. (2002). View relevance driven materialized view selection in data warehousing environment. In 13th Australasian database technologies (ADC 2002) (pp. 187–196). Melbourne, Australia.

  • Whang, K. (1985). Index selection in relational databases. In international conference on foundations of data organization (FODO 1985) (pp. 487–500). Kyoto, Japan.

  • Wu, M. (1999). Query optimization for selections using bitmaps. In ACM SIGMOD international conference on management of data (SIGMOD 1999) (pp. 227–238). Philadelphia, USA.

  • Wu, M., & Buchmann, A. (1998). Encoded bitmap indexing for data warehouses. In 14th international conference on data engineering (ICDE 1998) (pp. 220–230). Orlando, USA.

  • Yao, Q., Huang, J., & An, A. (2005). Machine learning approach to identify database sessions using unlabeled data. In 7th international conference on data warehousing and knowledge discovery (DaWaK 2005), LNCS (Vol. 3589, pp. 254–255). Copenhagen, Denmark.

  • Yao, S. B. (1977). Approximating block accesses in database organizations. Communications of the ACM, 20(4), 260–261.

    Article  MATH  Google Scholar 

  • Zaman, M., Surabattula, J., & Gruenwald, L. (2004). An auto-indexing technique for databases based on clustering. In 15th international workshop on database and expert systems applications (DEXA Workshops 2004) (pp. 776–780). Zaragoza, Spain.

  • Zilio, D., Rao, J., Lightstone, S., Lohman, G., Storm, A., Garcia-Arellano, C., et al. (2004). DB2 design advisor: Integrated automatic physical database design. In 30th international conference on very large data bases (VLDB 2004) (pp. 1087–1097). Toronto, Canada.

Download references

Acknowledgements

The authors would like to thank the anonymous reviewers for their detailed and constructive feedback, as well as the JIIS editors, who greatly helped improve this paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jérôme Darmont.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Aouiche, K., Darmont, J. Data mining-based materialized view and index selection in data warehouses. J Intell Inf Syst 33, 65–93 (2009). https://doi.org/10.1007/s10844-009-0080-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10844-009-0080-0

Keywords

Navigation