Skip to main content

Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order

  • Conference paper
MICAI 2005: Advances in Artificial Intelligence (MICAI 2005)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3789))

Included in the following conference series:

Abstract

Kernel based methods (such as k-nearest neighbors classifiers) for AI tasks translate the classification problem into a proximity search problem, in a space that is usually very high dimensional. Unfortunately, no proximity search algorithm does well in high dimensions. An alternative to overcome this problem is the use of approximate and probabilistic algorithms, which trade time for accuracy.

In this paper we present a new probabilistic proximity search algorithm. Its main idea is to order a set of samples based on their distance to each element. It turns out that the closeness between the order produced by an element and that produced by the query is an excellent predictor of the relevance of the element to answer the query.

The performance of our method is unparalleled. For example, for a full 128-dimensional dataset, it is enough to review 10% of the database to obtain 90% of the answers, and to review less than 1% to get 80% of the correct answers. The result is more impressive if we realize that a full 128-dimensional dataset may span thousands of dimensions of clustered data. Furthermore, the concept of proximity preserving order opens a totally new approach for both exact and approximated proximity searching.

Supported by CYTED VII.19 RIBIDI Project (all authors), CONACyT grant 36911A (first and second author) and Millennium Nucleus Center for Web Research, Grant P04-067-F, Mideplan, Chile (second and third author).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Arya, S., Mount, D., Netanyahu, N., Silverman, R., Wu, A.: An optimal algorithm for approximate nearest neighbor searching in fixed dimension. In: Proc. 5th ACM-SIAM Symposium on Discrete Algorithms (SODA 1994), pp. 573–583 (1994)

    Google Scholar 

  2. Brin, S.: Near neighbor search in large metric spaces. In: Proc. 21st Conference on Very Large Databases (VLDB 1995), pp. 574–584 (1995)

    Google Scholar 

  3. Bustos, B., Navarro, G.: Probabilistic proximity search algorithms based on compact partitions. Journal of Discrete Algorithms (JDA) 2(1), 115–134 (2003)

    Article  MathSciNet  Google Scholar 

  4. Chávez, E., Figueroa, K.: Faster proximity searching in metric data. In: Proc. of the Mexican International Conference in Artificial Intelligence (MICAI). LNCS, pp. 222–231. Springer, Heidelberg (2004)

    Google Scholar 

  5. Chávez, E., Marroquin, J.L., Navarro, G.: Fixed queries array: A fast and economical data structure for proximity searching. Multimedia Tools and Applications (MTAP) 14(2), 113–135 (2001)

    Article  MATH  Google Scholar 

  6. Chávez, E., Navarro, G.: Probabilistic proximity search: Fighting the curse of dimensionality in metric spaces. Inf. Process. Lett. 85(1), 39–46 (2003)

    Article  MATH  Google Scholar 

  7. Chávez, E., Navarro, G.: A compact space decomposition for effective metric indexing. Pattern Recognition Letters 26(9), 1363–1376 (2005)

    Article  Google Scholar 

  8. Chávez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.L.: Proximity searching in metric spaces. ACM Computing Surveys 33(3), 273–321 (2001)

    Article  Google Scholar 

  9. Ciaccia, P., Patella, M.: Pac nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces. In: Proc. 16th Intl. Conf. on Data Engineering (ICDE 2000), pp. 244–255. IEEE Computer Society Press, Los Alamitos (2000)

    Google Scholar 

  10. Ciaccia, P., Patella, M., Zezula, P.: M-tree: an efficient access method for similarity search in metric spaces. In: Proc. of the 23rd Conference on Very Large Databases (VLDB 1997), pp. 426–435 (1997)

    Google Scholar 

  11. Clarkson, K.: Nearest neighbor queries in metric spaces. Discrete Computational Geometry 22(1), 63–93 (1999)

    Article  MATH  MathSciNet  Google Scholar 

  12. Fagin, R., Kumar, R., Sivakumar, D.: Comparing top k lists. SIAM J. Discrete Math. 17(1), 134–160 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  13. Hjaltason, G., Samet, H.: Index-driven similarity search in metric spaces. ACM Trans. Database Syst. 28(4), 517–580 (2003)

    Article  Google Scholar 

  14. Navarro, G.: Searching in metric spaces by spatial approximation. The Very Large Databases Journal (VLDBJ) 11(1), 28–46 (2002)

    Article  Google Scholar 

  15. Vidal, E.: An algorithm for finding nearest neighbors in (approximately) constant average time. Pattern Recognition Letters 4, 145–157 (1986)

    Article  Google Scholar 

  16. White, D., Jain, R.: Algorithms and strategies for similarity retrieval. Technical Report VCL-96-101, Visual Computing Laboratory, University of California, La Jolla, California (July 1996)

    Google Scholar 

  17. Yianilos, P.: Excluded middle vantage point forests for nearest neighbor search. In: ALENEX (1999)

    Google Scholar 

  18. Yianilos, P.N.: Locally lifting the curse of dimensionality for nearest neighbor search. Technical report, NEC Research Institute, Princeton, NJ (June 1999)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chávez, E., Figueroa, K., Navarro, G. (2005). Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order. In: Gelbukh, A., de Albornoz, Á., Terashima-Marín, H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. Lecture Notes in Computer Science(), vol 3789. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11579427_41

Download citation

  • DOI: https://doi.org/10.1007/11579427_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29896-0

  • Online ISBN: 978-3-540-31653-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics