Polyphasic Metric Index: Reaching the Practical Limits of Proximity Searching

  • Conference paper
Similarity Search and Applications (SISAP 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7404)

Abstract

Some metric indexes, like the pivot-based family, can natively trade space for query time. Other indexes have a small memory footprint and still outperform the pivot-based approach, but they cannot use additional memory to reduce query time. In this paper we propose a new metric indexing technique with an algorithmic mechanism to lift the performance of otherwise rigid metric indexes.

We selected the well-known List of Clusters (LC) as the base data structure, obtaining an index that is orders of magnitude faster to build, has memory usage adaptable to the intrinsic dimension of the data, and is faster at query time than the original LC. We also present a nearest neighbor algorithm, of independent interest, which is optimal in the sense that it requires the same number of distance computations as a range query with the radius of the nearest neighbor.
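
For readers unfamiliar with the base structure, the following is a minimal sketch of a plain List of Clusters with fixed-size buckets and its range query. It is not the index proposed in this paper, and the abstract gives no implementation details. The function names, the `bucket_size` parameter, the arbitrary choice of centers, and the random 2D test data are all illustrative assumptions.

```python
import math
import random


def build_lc(points, dist, bucket_size):
    """Build a plain List of Clusters (illustrative sketch only).

    Each cluster is (center, covering_radius, bucket), where the bucket
    holds the bucket_size remaining elements closest to the center.
    """
    remaining = list(points)
    clusters = []
    while remaining:
        center = remaining.pop()  # assumption: centers are picked arbitrarily
        scored = sorted(((dist(center, x), x) for x in remaining),
                        key=lambda pair: pair[0])
        bucket = scored[:bucket_size]
        covering_radius = bucket[-1][0] if bucket else 0.0
        clusters.append((center, covering_radius, [x for _, x in bucket]))
        remaining = [x for _, x in scored[bucket_size:]]
    return clusters


def range_search(clusters, dist, query, radius):
    """Return every indexed element within `radius` of `query`."""
    result = []
    for center, covering_radius, bucket in clusters:
        d = dist(query, center)
        if d <= radius:
            result.append(center)
        # The query ball intersects the cluster ball: scan the bucket.
        if d <= covering_radius + radius:
            result.extend(x for x in bucket if dist(query, x) <= radius)
        # The query ball lies strictly inside the cluster ball, so no
        # element stored in a later cluster can be an answer.
        if d + radius < covering_radius:
            break
    return result


# Usage on hypothetical data: 200 random points in the unit square.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]
index = build_lc(points, math.dist, bucket_size=8)
query = (0.5, 0.5)
hits = range_search(index, math.dist, query, 0.2)
```

The early exit in `range_search` is what keeps the structure compact yet effective: once the query ball is fully contained in one cluster, the rest of the list is skipped. The structure has a single tunable parameter (the bucket size) and no built-in way to spend extra memory for faster queries, which is the rigidity the abstract refers to.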

We present exhaustive experimental evidence supporting our claims, for both synthetic and real-world datasets.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tellez, E.S., Chavez, E., Figueroa, K. (2012). Polyphasic Metric Index: Reaching the Practical Limits of Proximity Searching. In: Navarro, G., Pestov, V. (eds) Similarity Search and Applications. SISAP 2012. Lecture Notes in Computer Science, vol 7404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32153-5_5

  • DOI: https://doi.org/10.1007/978-3-642-32153-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32152-8

  • Online ISBN: 978-3-642-32153-5

  • eBook Packages: Computer Science, Computer Science (R0)
