Advertisement

On Index Structures for Star Query Processing in Data Warehouses

Chapter
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 172)

Abstract

One of the important research and technological issues in data warehouse performance is the optimization of analytical queries. Most of the research have been focusing on optimizing such queries by means of materialized views, data and index partitioning, as well as various index structures including: join indexes, bitmap join indexes, multidimensional indexes or index-based multidimensional clusters. These structures neither well support navigation along dimension hierarchies nor optimize joins with the Time dimension, which in practice is used in the majority of analytical queries. In this chapter we overview the basic index structures, namely: a bitmap index, a join index, and a bitmap join index. Based on these indexes, we show how to build another index, called Time-HOBI, for optimizing queries that address the Time dimension and compute aggregates along dimension hierarchies. We further discuss the extension of the index with additional data structure for storing aggregate values along the hierarchical structure of the index. The aggregates are used for speeding up aggregate queries along dimension hierarchies. Furthermore, we show how the index is used for answering queries in an example data warehouse. Finally, we discuss its performance-related characteristics, based on experiments.

Keywords

Data warehouse Query optimization Star query Hierarchical index Bitmap index Join index Bitmap join index Time-HOBI 

Notes

Acknowledgment

This work was supported from the Polish National Science Center (NCN), grant No. 2011/01/B/ST6/05169. The authors express their gratitude to the anonymous Reviewers whose very thorough comments greatly improved the quality of this paper.

References

  1. 1.
    Netezza underground: zone maps and data power. www.ibm.com/developerworks/community/blogs/Netezza/ entry/zone_maps_and_data_power20?lang=en. Accessed 27 June 2013
  2. 2.
    OLAP council APB-1 OLAP benchmark release 2. www.olapcouncil.org/research/APB1R2_spec.pdf. Accessed 20 Dec 2012
  3. 3.
    Transaction processing performance council. www.tpc.org/information/benchmarks.asp
  4. 4.
    Antoshenkov, G., Ziauddin, M.: Query processing and optimization in Oracle RDB. Int. J. Very Larg. Data Bases 5(4), 229–237 (1996)CrossRefGoogle Scholar
  5. 5.
    Aouiche, K., Darmont, J., Boussaïd, O., Bentayeb, F.: Automatic selection of bitmap join indexes in data warehouses. In: Tjoa, A.M., Trujillo, J. (eds.) DaWaK 2005. LNCS, vol. 3589, pp. 64–73. Springer, Heidelberg (2005)Google Scholar
  6. 6.
    Bellatreche, L., Missaoui, R., Necir, H., Drias, H.: A data mining approach for selecting bitmap join indices. J. Comput. Sci. Eng. 1(2), 177–194 (2007)CrossRefGoogle Scholar
  7. 7.
    Böhm, C., Berchtold, S., Kriegel, H., Urs, M.: Multidimensional index structures in relational databases. J. Intell. Inf. Syst. 15(1), 51–70 (2000)CrossRefGoogle Scholar
  8. 8.
    Brito, J.J., Siqueira, T.L.L., Times, V.C., Ciferri, R.R., de Ciferri, C.D.: Efficient processing of drill-across queries over geographic data warehouses. In: Cuzzocrea, A., Dayal, U. (eds.) DaWaK 2011. LNCS, vol. 6862, pp. 152–166. Springer, Heidelberg (2011)Google Scholar
  9. 9.
    Bryla, B., Loney, K.: Oracle Database 11g DBA Handbook. McGraw-Hill Osborne Media, New York (2007). ISBN 0071496637Google Scholar
  10. 10.
    Chan, C., Ioannidis, Y.: Bitmap index design and evaluation. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 355–366 (1998)Google Scholar
  11. 11.
    Chaudhuri, S., Dayal, U.: An overview of data warehousing and OLAP technology. SIGMOD Rec. 26(1), 65–74 (1997)CrossRefGoogle Scholar
  12. 12.
    Chmiel, J., Morzy, T., Wrembel, R.: Time-HOBI: indexing dimension hierarchies by means of hierarchically organized bitmaps. In: Proceedings of ACM International Workshop on Data Warehousing and OLAP (DOLAP), pp. 69–76 (2010)Google Scholar
  13. 13.
    Davis, K.C., Gupta, A.: Indexing in data warehouses: bitmaps and beyond. In: Wrembel, R., Koncilia, C. (eds.) Data Warehouses and OLAP: Concepts, Architectures and Solutions, pp. 179–202. Idea Group Inc., London (2007). ISBN 1-59904-364-5Google Scholar
  14. 14.
    Deliège, F., Pedersen, T.B.: Position list word aligned hybrid: optimizing space and performance for compressed bitmaps. In: Proceedings of International Conference on Extending Database Technology (EDBT), pp. 228–239. ACM (2010)Google Scholar
  15. 15.
    Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval on composite keys. Acta Informatica 4, 1–9 (1974)CrossRefGoogle Scholar
  16. 16.
    Fonseca, M.J., Jorge, J.A.: Indexing high-dimensional data for content-based retrieval in large databases. In: Proceedings of International Conference on Database Systems for Advanced Applications (DASFAA) (2003)Google Scholar
  17. 17.
    Furtado, P.: Workload-based placement and join processing in node-partitioned data warehouses. In: Kambayashi, Y., Mohania, M., Wöß, W. (eds.) DaWaK 2004. LNCS, vol. 3181, pp. 38–47. Springer, Heidelberg (2004)Google Scholar
  18. 18.
    Gaede, V., Günther, O.: Multidimensional access methods. ACM Comput. Surv. 30(2), 170–231 (1998)CrossRefGoogle Scholar
  19. 19.
    Graefe, G.: Fast loads and fast queries. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds.) DaWaK 2009. LNCS, vol. 5691, pp. 111–124. Springer, Heidelberg (2009)Google Scholar
  20. 20.
    Gupta, A., Mumick, I.S. (eds.): Materialized Views: Techniques, Implementations, and Applications. MIT Press, Cambridge (1999)Google Scholar
  21. 21.
    Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 47–57. ACM (1984)Google Scholar
  22. 22.
    Gyssens, M., Lakshmanan, L.V.S.: A foundation for multi-dimensional databases. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 106–115 (1997)Google Scholar
  23. 23.
    Han, J., Xie, Z., Fu, Y.: Join index hierarchy: an indexing structure for efficient navigation in object-oriented databases. IEEE Trans. Knowl. Data Eng. 11(2), 321–337 (1999)CrossRefGoogle Scholar
  24. 24.
    Jarke, M., Lenzerini, M., Vassiliou, Y., Vassiliadis, P.: Fundamentals of Data Warehouses. Springer, Heidelberg (2003). ISBN 3-540-42089-4CrossRefGoogle Scholar
  25. 25.
    Jensen, C.S., Pedersen, T.B., Tomsen, C.: Multidimensional Databases and Data Warehousing. Morgan & Claypool Publishers, San Rafael (2010). ISBN 978-1-60845-537-9Google Scholar
  26. 26.
    Johnson, T., Sasha, D.: The performance of current B-tree algorithms. ACM Trans. Database Syst. (TODS) 18(1), 51–101 (1993)CrossRefGoogle Scholar
  27. 27.
    Karayannidis, N., Tsois, A., Sellis, T.: Advanced ad hoc star query processing. In: Wrembel, R., Koncilia, C. (eds.) Data Warehouses and OLAP: Concepts, Architectures and Solutions, pp. 136–156. Idea Group Inc., London (2007). ISBN 1-59904-364-5Google Scholar
  28. 28.
    Kemper, A., Moerkotte, G.: Access support in object bases. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 364–374 (1989)Google Scholar
  29. 29.
    Koudas, N.: Space efficient bitmap indexing. In: Proceedings of ACM Conference on Information and Knowledge Management (CIKM), pp. 194–201 (2000)Google Scholar
  30. 30.
    Li, C., Tang, C., Yu, Z., Liu, Y., Zhang, T., Liu, Q., Zhu, M., Jiang, Y.: Mining multi-dimensional frequent patterns without data cube construction. In: Yang, Q., Webb, G. (eds.) PRICAI 2006. LNCS (LNAI), vol. 4099, pp. 251–260. Springer, Heidelberg (2006)Google Scholar
  31. 31.
    Li, K.A., Ross, Z.: Fast joins using join indices. Int. J. Very Larg. Data Bases 8(1), 1–24 (1999)CrossRefGoogle Scholar
  32. 32.
    Malinowski, E., Zimányi, E.: Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications. Springer, Heidelberg (2008). ISBN 9783540744047Google Scholar
  33. 33.
    Moerkotte, G.: Small materialized aggregates: a light weight index structure for data warehousing. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 476–487 (1998)Google Scholar
  34. 34.
    Morzy, M.: Advanced database structure for efficient association rule mining. Ph.D. thesis, Poznan University of Technology, Institute of Computing Science (2004)Google Scholar
  35. 35.
    Morzy, M., Morzy, T., Nanopoulos, A., Manolopoulos, Y.: Hierarchical bitmap index: an efficient and scalable indexing technique for set-valued attributes. In: Kalinichenko, L.A., Manthey, R., Thalheim, B., Wloka, U. (eds.) ADBIS 2003. LNCS, vol. 2798, pp. 236–252. Springer, Heidelberg (2003)Google Scholar
  36. 36.
    Morzy, T., Wrembel, R., Chmiel, J., Wojciechowski, A.: Time-HOBI: index for optimizing star queries. Inf. Syst. 37(5), 412–429 (2012)CrossRefGoogle Scholar
  37. 37.
    Nourani, M., Tehranipour, M.H.: RL-Huffman encoding for test compression and power reduction in scan applications. ACM Trans. Des. Autom. Electron. Syst. (TODAES) 10(1), 91–115 (2005)CrossRefGoogle Scholar
  38. 38.
    O’Neil, P.E.: Model 204 architecture and performance. HPTS 1989. LNCS, vol. 359, pp. 39–59. Springer, Heidelberg (1989)Google Scholar
  39. 39.
    O’Neil, P., Graefe, G.: Multi-table joins through bitmapped join indices. SIGMOD Rec. 24(3), 8–11 (1995)CrossRefGoogle Scholar
  40. 40.
    O’Neil, P., O’Neil, E., Chen, X., Revilak, S.: The star schema benchmark and augmented fact table indexing. In: Nambiar, R., Poess, M. (eds.) TPCTC 2009. LNCS, vol. 5895, pp. 237–252. Springer, Heidelberg (2009)Google Scholar
  41. 41.
    O’Neil, P., Quass, D.: Improved query performance with variant indexes. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 38–49 (1997)Google Scholar
  42. 42.
    Padmanabhan, S., Bhattacharjee, B., Malkemus, T., Cranston, L., Huras, M.: Multi-dimensional clustering: a new data layout scheme in DB2. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 637–641 (2003)Google Scholar
  43. 43.
    Papadias, D., Kalnis, P., Zhang, J., Tao, Y.: Efficient OLAP operations in spatial data warehouses. In: Jensen, C.S., Schneider, M., Seeger, B., Tsotras, V.J. (eds.) SSTD 2001. LNCS, vol. 2121, pp. 443–459. Springer, Heidelberg (2001)Google Scholar
  44. 44.
    Papadias, D., Tao, Y., Kalnis, P., Zhang, J.: Indexing spatio-temporal data warehouses. In: Proceedings of International Conference on Data Engineering (ICDE), pp. 166–175. IEEE Computer Society (2002)Google Scholar
  45. 45.
    Rao, J., Zhang, C., Megiddo, N., Lohman, G.: Automating physical database design in a parallel database. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 558–569 (2002)Google Scholar
  46. 46.
    Robinson, J.T.: The K-D-B-tree: a search structure for large multidimensional dynamic indexes. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 10–18. ACM (1981)Google Scholar
  47. 47.
    Rotem, D., Stockinger, K., Wu, K.: Optimizing candidate check costs for bitmap indices. In: Proceedings of ACM Conference on Information and Knowledge Management (CIKM), pp. 648–655 (2005)Google Scholar
  48. 48.
    Scientific Data Management Research Group: FastBit: an efficient compressed bitmap index technology. http://sdm.lbl.gov/fastbit/. Accessed 10 Nov 2006
  49. 49.
    Sinha, R.R., Mitra, S., Winslett, M.: Bitmap indexes for large scientific data sets: a case study. In: Proceedings of Parallel and Distributed Processing Symposium. IEEE (2006)Google Scholar
  50. 50.
    Sinha, R.R., Winslett, M.: Multi-resolution bitmap indexes for scientific data. ACM Trans. Database Syst. (TODS) 32(3), 1–38 (2007)CrossRefGoogle Scholar
  51. 51.
    Siqueira, T.L., Ciferri, C.D.D., Times, V.C., Ciferri, R.R.: The sb-index and the hsb-index: efficient indices for spatial data warehouses. Geoinformatica 16(1), 165–205 (2012)CrossRefGoogle Scholar
  52. 52.
    Stabno, M., Wrembel, R.: RLH: bitmap compression technique based on run-length and Huffman encoding. Inf. Syst. 34(4–5), 400–414 (2009)CrossRefGoogle Scholar
  53. 53.
    Stockinger, K., Wu, K.: Bitmap indices for data warehouses. In: Wrembel, R., Koncilia, C. (eds.) Data Warehouses and OLAP: Concepts, Architectures and Solutions, pp. 157–178. Idea Group Inc., London (2007). ISBN 1-59904-364-5Google Scholar
  54. 54.
    Stockinger, K., Wu, K., Shoshani, A.: Evaluation strategies for bitmap indices with binning. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds.) DEXA 2004. LNCS, vol. 3180, pp. 120–129. Springer, Heidelberg (2004)Google Scholar
  55. 55.
    Stöhr, T., Rahm, E.: Warlock: a data allocation tool for parallel warehouses. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 721–722 (2001)Google Scholar
  56. 56.
    Technical Documentation. Oracle Database Data Cartridge Developer’s Guide 11g Release 1(11.1)Google Scholar
  57. 57.
    Valduriez, P.: Join indices. ACM Trans. Database Syst. (TODS) 12(2), 218–246 (1987)CrossRefGoogle Scholar
  58. 58.
    Wrembel, R.: Data warehouse performance: selected techniques and data structures. In: Aufaure, M.-A., Zimányi, E. (eds.) eBISS 2011. LNBIP, vol. 96, pp. 27–62. Springer, Heidelberg (2012)Google Scholar
  59. 59.
    Wu, K., Otoo, E.J., Shoshani, A.: An efficient compression scheme for bitmap indices. Research report, Lawrence Berkeley National Laboratory (2004)Google Scholar
  60. 60.
    Wu, K., Otoo, E.J., Shoshani, A.: On the performance of bitmap indices for high cardinality attributes. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 24–35 (2004)Google Scholar
  61. 61.
    Wu, K., Otoo, E.J., Shoshani, A.: Optimizing bitmap indices with efficient compression. ACM Trans. Database Syst. (TODS) 31(1), 1–38 (2006)CrossRefGoogle Scholar
  62. 62.
    Wu, K., Yu, P.: Range-based bitmap indexing for high cardinality attributes with skew. In: International Computer Software and Applications Conference (COMPSAC), pp. 61–67 (1998)Google Scholar
  63. 63.
    Wu, M., Buchmann, A.: Encoded bitmap indexing for data warehouses. In: Proceedings of International Conference on Data Engineering (ICDE), pp. 220–230 (1998)Google Scholar
  64. 64.
    Xu, X., Han, J., Lu, W.: RT-tree: an improved R-tree index structure for spatiotemporal databases. In: International Symposium on Spatial Data Handling, pp. 1040–1049 (1990)Google Scholar
  65. 65.
    Yiu, M.L., Mamoulis, N.: Efficient processing of top-k dominating queries on multi-dimensional data. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 483–494 (2007)Google Scholar
  66. 66.
    Yu, T.C., Meng, W.: Principles of Database Query Processing for Advanced Applications. Morgan Kaufmann, San Francisco (1998). ISBN 1-55860-434-0Google Scholar
  67. 67.
    Zhuang, Y., Zhuang, Y., Li, Q., Chen, L., Yu, Y.: Indexing high-dimensional data in dual distance spaces: a symmetrical encoding approach. In: Proceedings of International Conference on Extending Database Technology (EDBT), pp. 241–251 (2008)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. 1.Institute of Computing SciencePoznan University of TechnologyPoznańPoland

Personalised recommendations