Multidimensional Index Structures in Relational Databases

  • Christian Böhm
  • Stefan Berchtold
  • Hans-Peter Kriegel
  • Urs Michel

Abstract

Efficient query processing is a basic requirement of data mining algorithms. Clustering algorithms, association rule mining algorithms, and OLAP tools all rely on efficient query processors that can deal with high-dimensional data. Inside such a query processor, multidimensional index structures serve as a basic technique. Because implementing such an index structure is a difficult and time-consuming task, we propose a new approach: implementing the index structure on top of a commercial relational database system. In particular, we map the index structure to a relational database design and simulate its behavior using triggers and stored procedures. This can be done easily for a very large class of multidimensional index structures. To demonstrate feasibility and efficiency, we implemented an X-tree on top of Oracle8. We ran several experiments on large databases and recorded a performance improvement of up to a factor of 11.5 compared to a sequential scan of the database.

Keywords: multidimensional index, relational database, similarity search, range query
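
The mapping described in the abstract can be illustrated with a small sketch. It is not the authors' X-tree implementation and does not use Oracle8: it assumes a flat, single-level directory of leaf pages, two dimensions, a naive sort-based bulk load, and SQLite (via Python's standard `sqlite3` module) as a stand-in relational engine. The table names `directory` and `data`, the trigger `grow_mbr`, and the functions `build_index` and `range_query` are all hypothetical names chosen for illustration. The sketch shows only the general idea: index pages become rows, a trigger keeps each page's bounding rectangle up to date on insert, and a range query becomes a join that first filters pages by rectangle intersection.

```python
import sqlite3


def build_index(points, page_size=4):
    """Load 2-D points into a relational 'leaf directory + data' layout.

    Assumption: pages are formed by sorting the points and cutting the
    sorted list into runs of `page_size`; a real index would use its own
    insertion and split algorithms.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE directory (      -- one row per simulated leaf page
            page_id INTEGER PRIMARY KEY,
            x_lo REAL, x_hi REAL, y_lo REAL, y_hi REAL
        );
        CREATE TABLE data (           -- one row per data point
            point_id INTEGER PRIMARY KEY,
            page_id  INTEGER REFERENCES directory(page_id),
            x REAL, y REAL
        );
        CREATE INDEX data_page ON data(page_id);

        -- Trigger: enlarge the page's bounding rectangle on every insert.
        CREATE TRIGGER grow_mbr AFTER INSERT ON data
        BEGIN
            UPDATE directory SET
                x_lo = min(x_lo, NEW.x), x_hi = max(x_hi, NEW.x),
                y_lo = min(y_lo, NEW.y), y_hi = max(y_hi, NEW.y)
            WHERE page_id = NEW.page_id;
        END;
    """)
    for i, (x, y) in enumerate(sorted(points)):
        page_id = i // page_size
        if i % page_size == 0:
            # Open a new page whose rectangle starts as the first point.
            con.execute(
                "INSERT INTO directory VALUES (?, ?, ?, ?, ?)",
                (page_id, x, x, y, y),
            )
        con.execute(
            "INSERT INTO data (page_id, x, y) VALUES (?, ?, ?)",
            (page_id, x, y),
        )
    return con


def range_query(con, x_lo, x_hi, y_lo, y_hi):
    """Return all points inside the query window, sorted by (x, y)."""
    rows = con.execute(
        """
        SELECT d.x, d.y
        FROM directory AS p
        JOIN data AS d ON d.page_id = p.page_id
        WHERE p.x_lo <= ? AND p.x_hi >= ?      -- page overlaps window in x
          AND p.y_lo <= ? AND p.y_hi >= ?      -- page overlaps window in y
          AND d.x BETWEEN ? AND ?              -- point lies in window
          AND d.y BETWEEN ? AND ?
        ORDER BY d.x, d.y
        """,
        (x_hi, x_lo, y_hi, y_lo, x_lo, x_hi, y_lo, y_hi),
    )
    return rows.fetchall()


if __name__ == "__main__":
    grid = [(float(i), float(j)) for i in range(5) for j in range(5)]
    index = build_index(grid)
    print(range_query(index, 1, 2, 1, 2))
```

Whether the database actually skips non-overlapping pages depends on its query optimizer, so any speedup from such a layout would have to be measured on the target system rather than assumed.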

Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Christian Böhm (1)
  • Stefan Berchtold (2)
  • Hans-Peter Kriegel (3)
  • Urs Michel (4)

  1. University of Munich, Munich, Germany
  2. stb gmbh software technologie beratung, Augsburg, Germany
  3. University of Munich, Munich, Germany
  4. University of Munich, Munich, Germany