Dynamically Optimizing High-Dimensional Index Structures

  • Christian Böhm
  • Hans-Peter Kriegel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1777)

Abstract

In high-dimensional query processing, the optimization of the logical page size of index structures is an important research issue. Even very simple query processing techniques such as the sequential scan are able to outperform indexes which are not suitably optimized. Page-size optimization based on a cost model faces the problem that the optimum depends not only on static schema information such as the dimension of the data space but also on dynamically changing parameters such as the number of objects stored in the database and the degree of clustering and correlation in the current data set. Therefore, we propose a method for adapting the page size of an index dynamically during insert processing. Our solution, called the DABS-tree, uses a flat directory whose entries consist of an MBR, a pointer to the data page, and the size of the data page. Before a page is split during an insert operation, a cost model is consulted to estimate whether the split is beneficial. If it is not, the split is avoided and the logical page size is adapted instead. A similar rule applies to merging when performing delete operations. We present an algorithm for the management of data pages with varying page sizes in an index and show that all restructuring operations are locally restricted. We show in our experimental evaluation that the DABS-tree outperforms the X-tree by a factor of up to 4.6 and the sequential scan by a factor of up to 6.6.
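
The split rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the cost model, so the two cost functions are passed in as hypothetical placeholders, and the names `DirectoryEntry`, `handle_overflow`, `cost_if_split`, and `cost_if_grown` are assumptions made for illustration. Counting the page size in disk blocks and growing it by one block are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DirectoryEntry:
    """One entry of the flat directory: an MBR, a page pointer, a page size."""
    mbr_low: Tuple[float, ...]   # lower corner of the minimum bounding rectangle
    mbr_high: Tuple[float, ...]  # upper corner of the minimum bounding rectangle
    page_id: int                 # stands in for the pointer to the data page
    page_size: int               # logical page size (assumed unit: disk blocks)


# Hypothetical signature: a cost model maps a directory entry to an estimated cost.
CostModel = Callable[[DirectoryEntry], float]


def handle_overflow(entry: DirectoryEntry,
                    cost_if_split: CostModel,
                    cost_if_grown: CostModel) -> str:
    """Decide what to do when an insert overflows the page behind `entry`.

    Returns "split" if the (placeholder) cost model estimates that splitting
    is cheaper; otherwise enlarges the logical page size and returns "grow".
    """
    if cost_if_split(entry) < cost_if_grown(entry):
        return "split"       # the caller would perform the actual page split
    entry.page_size += 1     # assumed growth step of one block
    return "grow"
```

A caller would invoke `handle_overflow` only when a data page is full. A symmetric check for merging pages on deletion would follow the same pattern.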

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Christian Böhm (1)
  • Hans-Peter Kriegel (1)
  1. University of Munich, München, Germany
