Multimedia Tools and Applications

Volume 41, Issue 2, pp 215–233

Improving the space cost of k-NN search in metric spaces by using distance estimators

Abstract

Similarity search in metric spaces has many applications in fields such as multimedia databases, text retrieval, computational biology, and pattern recognition. In this context, one of the most important similarity queries is the k nearest neighbor (k-NN) search. The standard best-first k-NN algorithm uses a lower bound on the distance to prune objects during the search. Although this method is optimal in several respects, its drawback is that the priority queue that stores unprocessed clusters can require space linear in the database size. Most of the optimizations used in spatial access methods (for example, pruning using MinMaxDist) cannot be applied in metric spaces, because these spaces lack geometric properties. We propose a new k-NN algorithm that uses distance estimators to reduce the storage requirements of the search. The method remains optimal, yet it can prune the priority queue significantly without altering the output of the query. Experimental results with synthetic and real datasets confirm the reduction in storage space, showing savings of up to 80% of the original space requirement.
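The abstract does not spell out the estimator itself, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a flat, disjoint clustering in which every cluster has a center and a covering radius. The lower bound `d(q, c) - r` drives the best-first order, and an assumed upper-bound estimator `d(q, c) + r`, combined with cluster sizes, gives a distance within which at least k objects must lie. Queue entries beyond that distance are never stored. The names `Cluster`, `knn_best_first`, and `use_estimator` are hypothetical.

```python
import heapq
import itertools
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical cluster: every member lies within `radius` of `center`."""
    center: tuple
    radius: float
    members: list


def knn_best_first(clusters, q, k, dist=math.dist, use_estimator=True):
    """Return (list of k nearest (distance, object) pairs, peak queue length)."""
    # Lower and upper bounds on the distance from q to any member of a cluster.
    bounds = []
    for c in clusters:
        d = dist(q, c.center)
        bounds.append((max(0.0, d - c.radius), d + c.radius, c))

    # Estimator (assumption): clusters are disjoint, so scanning them by
    # increasing upper bound until they hold k objects yields a distance
    # within which at least k objects are guaranteed to lie.
    limit = math.inf
    if use_estimator:
        covered = 0
        for _, upper, c in sorted(bounds, key=lambda b: b[1]):
            covered += len(c.members)
            if covered >= k:
                limit = upper
                break

    tie = itertools.count()  # tie-breaker so the heap never compares payloads
    queue = [(lower, next(tie), c) for lower, _, c in bounds if lower <= limit]
    heapq.heapify(queue)
    peak = len(queue)

    result = []
    while queue and len(result) < k:
        key, _, item = heapq.heappop(queue)
        if isinstance(item, Cluster):
            # Expand the cluster: enqueue members with their exact distances,
            # skipping any that cannot be among the k nearest.
            for obj in item.members:
                d = dist(q, obj)
                if d <= limit:
                    heapq.heappush(queue, (d, next(tie), obj))
            peak = max(peak, len(queue))
        else:
            # An object at the head of the queue is the next nearest neighbor.
            result.append((key, item))
    return result, peak
```

Calling the function with `use_estimator=False` gives the plain best-first search, so the two peak queue lengths can be compared on the same data. Because only entries whose lower bound exceeds a guaranteed upper bound on the k-th neighbor distance are discarded, the reported distances are the same in both modes.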

Keywords

Similarity search, Metric spaces, k-NN search

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Center for Web Research, Department of Computer Science, University of Chile, Santiago, Chile
