Implementation of Multidimensional Index Structures for Knowledge Discovery in Relational Databases

  • Stefan Berchtold
  • Christian Böhm
  • Hans-Peter Kriegel
  • Urs Michel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1676)

Abstract

Efficient query processing is one of the basic needs of data mining algorithms. Clustering algorithms, association rule mining algorithms and OLAP tools all rely on efficient query processors that can deal with high-dimensional data. Inside such a query processor, multidimensional index structures are used as a basic technique. As the implementation of such an index structure is a difficult and time-consuming task, we propose a new approach: implementing the index structure on top of a commercial relational database system. In particular, we map the index structure to a relational database design and simulate the behavior of the index structure using triggers and stored procedures. This can easily be done for a very large class of multidimensional index structures. To demonstrate the feasibility and efficiency of the approach, we implemented an X-tree on top of Oracle 8. We ran several experiments on large databases and recorded a performance improvement of up to a factor of 11.5 compared to a sequential scan of the database.
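
The general idea of storing index nodes as relations and maintaining them with triggers can be illustrated with a small sketch. This is not the authors' schema or their X-tree implementation: it assumes SQLite (through Python's `sqlite3` module) instead of Oracle 8, two-dimensional points, a single directory level, and a hypothetical leaf-assignment rule (`x // 10`) in place of a real insertion algorithm. The table names `leaf_mbr` and `data`, the trigger `maintain_mbr`, and the helper functions are all invented for illustration.

```python
import sqlite3


def build_index_db():
    """Create an in-memory database holding a one-level, relational 'index'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Directory level: one minimum bounding rectangle (MBR) per leaf.
        CREATE TABLE leaf_mbr (
            leaf INTEGER PRIMARY KEY,
            xlo REAL, xhi REAL, ylo REAL, yhi REAL
        );
        -- Data level: each point records the leaf it belongs to.
        CREATE TABLE data (
            id   INTEGER PRIMARY KEY,
            leaf INTEGER NOT NULL,
            x    REAL NOT NULL,
            y    REAL NOT NULL
        );
        CREATE INDEX data_by_leaf ON data(leaf);

        -- The trigger keeps the directory consistent with the data table:
        -- it creates the MBR on first use and enlarges it afterwards.
        CREATE TRIGGER maintain_mbr AFTER INSERT ON data
        BEGIN
            INSERT OR IGNORE INTO leaf_mbr
                VALUES (NEW.leaf, NEW.x, NEW.x, NEW.y, NEW.y);
            UPDATE leaf_mbr
               SET xlo = MIN(xlo, NEW.x), xhi = MAX(xhi, NEW.x),
                   ylo = MIN(ylo, NEW.y), yhi = MAX(yhi, NEW.y)
             WHERE leaf = NEW.leaf;
        END;
    """)
    return conn


def insert_point(conn, x, y):
    """Insert a point; the leaf choice is a placeholder assumption."""
    leaf = int(x // 10)  # hypothetical rule: vertical strips of width 10
    conn.execute("INSERT INTO data (leaf, x, y) VALUES (?, ?, ?)", (leaf, x, y))


def range_query(conn, xlo, xhi, ylo, yhi):
    """Return points inside the query box, visiting only intersecting leaves."""
    rows = conn.execute("""
        SELECT d.id, d.x, d.y
          FROM leaf_mbr AS m JOIN data AS d ON d.leaf = m.leaf
         WHERE m.xlo <= ? AND m.xhi >= ?      -- MBR overlaps box in x
           AND m.ylo <= ? AND m.yhi >= ?      -- MBR overlaps box in y
           AND d.x BETWEEN ? AND ?            -- exact test on the points
           AND d.y BETWEEN ? AND ?
         ORDER BY d.id
    """, (xhi, xlo, yhi, ylo, xlo, xhi, ylo, yhi))
    return rows.fetchall()
```

A caller would create the database with `build_index_db()`, add points with `insert_point(conn, x, y)`, and retrieve them with `range_query(conn, xlo, xhi, ylo, yhi)`. The join against `leaf_mbr` plays the role of the directory traversal: leaves whose bounding rectangle misses the query box contribute no rows, which is the pruning effect an index aims for over a sequential scan.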

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Stefan Berchtold (1)
  • Christian Böhm (1, 2)
  • Hans-Peter Kriegel (2)
  • Urs Michel (2)

  1. stb gmbh, Augsburg, Germany
  2. University of Munich, Munich, Germany
