On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces

  • Stefan Berchtold
  • Christian Böhm
  • Daniel Keim
  • Florian Krebs
  • Hans-Peter Kriegel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1973)

Abstract

Nearest-neighbor queries in high-dimensional space are important in many applications, especially content-based indexing of multimedia data. Optimizing query processing requires accurate models for estimating query processing costs. In this paper, we propose a new cost model for nearest-neighbor queries in high-dimensional space and apply it to enhance the performance of high-dimensional index structures. The model is based on new insights into effects occurring in high-dimensional space and provides a closed formula for the processing cost of nearest-neighbor queries as a function of the dimensionality, the block size, and the database size. From the wide range of possible applications of our model, we select two examples: first, we use the model to prove the known linear complexity of the nearest-neighbor search problem in high-dimensional space; second, we provide a technique for optimizing the block size. For data of medium dimensionality, the optimized block size yields significant speed-ups in query processing time compared with traditional block sizes and with the linear scan.
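The linear scan that the abstract uses as a baseline can be sketched as follows. This is only an illustrative sketch, not the paper's cost model or index structure: the function names, the Euclidean metric, and the 4-byte coordinate size are assumptions made for the example. The helper `blocks_read_by_scan` shows why the scan cost grows linearly with the database size for a fixed dimensionality and block size.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]


def linear_scan_nn(points: Sequence[Point], query: Point) -> Tuple[int, float]:
    """Return (index, distance) of the point closest to `query`.

    Assumes the Euclidean metric; every point is inspected once,
    so the work is proportional to len(points) * dimensionality.
    """
    if not points:
        raise ValueError("points must be non-empty")
    best_index = -1
    best_sq = math.inf
    for i, p in enumerate(points):
        # Squared distance avoids a square root per candidate.
        sq = sum((a - b) ** 2 for a, b in zip(p, query))
        if sq < best_sq:
            best_index, best_sq = i, sq
    return best_index, math.sqrt(best_sq)


def blocks_read_by_scan(n_points: int, dim: int, block_bytes: int,
                        bytes_per_coord: int = 4) -> int:
    """Number of disk blocks a sequential scan must read.

    Hypothetical storage layout: points are packed whole into blocks,
    each coordinate occupying `bytes_per_coord` bytes (an assumption).
    """
    points_per_block = block_bytes // (dim * bytes_per_coord)
    if points_per_block < 1:
        raise ValueError("block too small to hold a single point")
    return math.ceil(n_points / points_per_block)


if __name__ == "__main__":
    data = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
    print(linear_scan_nn(data, (1.0, 2.0)))
    print(blocks_read_by_scan(1000, 16, 4096))
```

Any index structure tuned with an optimized block size would have to beat this baseline to be worthwhile, which is the comparison the abstract refers to.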

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Stefan Berchtold (1)
  • Christian Böhm (2)
  • Daniel Keim (3)
  • Florian Krebs (2)
  • Hans-Peter Kriegel (2)

  1. stb gmbh, Augsburg, Germany
  2. University of Munich, Munich, Germany
  3. University of Halle-Wittenberg, Halle (Saale), Germany
