Abstract
Some metric indexes, such as the pivot-based family, can natively trade space for query time. Other indexes have a small memory footprint and still outperform the pivot-based approach, but they cannot use additional memory to reduce query time. In this paper we propose a new metric indexing technique with an algorithmic mechanism to lift the performance of otherwise rigid metric indexes.
We selected the well-known List of Clusters (LC) as the base data structure, obtaining an index that is orders of magnitude faster to build, has memory usage adaptable to the intrinsic dimension of the data, and is faster at query time than the original LC. We also present a nearest neighbor algorithm, of independent interest, which is optimal in the sense that it requires the same number of distance computations as a range query with the radius of the nearest neighbor.
We present exhaustive experimental evidence supporting our claims, for both synthetic and real-world datasets.
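For readers unfamiliar with the base structure, the following is a minimal sketch of a classic List of Clusters with fixed-size buckets and its range search. It illustrates the general LC idea only, not the index proposed in this paper; the names `build_lc`, `range_search`, and `bucket_size`, and the random choice of centers, are assumptions made for illustration.

```python
import math
import random


def build_lc(points, dist, bucket_size, seed=0):
    """Build a List of Clusters: a list of (center, cover_radius, bucket).

    Assumption: centers are picked at random; other selection
    heuristics are possible and are not shown here.
    """
    rng = random.Random(seed)
    remaining = list(points)
    clusters = []
    while remaining:
        # Pick a center and remove it from the pool.
        center = remaining.pop(rng.randrange(len(remaining)))
        # The bucket holds the bucket_size closest remaining elements.
        remaining.sort(key=lambda x: dist(center, x))
        bucket = remaining[:bucket_size]
        remaining = remaining[bucket_size:]
        # Covering radius: distance from the center to its farthest bucket element.
        cover = dist(center, bucket[-1]) if bucket else 0.0
        clusters.append((center, cover, bucket))
    return clusters


def range_search(clusters, dist, q, r):
    """Return every indexed element within distance r of the query q."""
    out = []
    for center, cover, bucket in clusters:
        dqc = dist(q, center)
        if dqc <= r:
            out.append(center)
        # The query ball intersects the cluster ball: scan the bucket.
        if dqc - r <= cover:
            out.extend(x for x in bucket if dist(q, x) <= r)
        # The query ball lies strictly inside the cluster ball, so no
        # later cluster can hold a relevant element: stop early.
        if dqc + r < cover:
            break
    return out
```

With Euclidean points, `build_lc(points, math.dist, 8)` builds the list and `range_search(index, math.dist, q, 0.2)` returns the same set as a linear scan, usually with fewer distance computations when the data has low intrinsic dimension.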
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tellez, E.S., Chavez, E., Figueroa, K. (2012). Polyphasic Metric Index: Reaching the Practical Limits of Proximity Searching. In: Navarro, G., Pestov, V. (eds) Similarity Search and Applications. SISAP 2012. Lecture Notes in Computer Science, vol 7404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32153-5_5
DOI: https://doi.org/10.1007/978-3-642-32153-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32152-8
Online ISBN: 978-3-642-32153-5
eBook Packages: Computer Science, Computer Science (R0)