The List of Clusters Revisited

Sadit Tellez, Eric; Chávez, Edgar

doi:10.1007/978-3-642-31149-9_19

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7329)

Included in the following conference series:

Mexican Conference on Pattern Recognition

Abstract

One of the most efficient indexes for similarity search is the so-called list of clusters; to fix ideas, think of speeding up k-nn searches in a very large database. This data structure is a counterintuitive construction that can be seen as extremely unbalanced, in contrast to the balanced data structures used for exact searching. In practical terms, there is no better alternative for exact indexing, where every search returns all the results that satisfy the query, as opposed to approximate similarity search. The major drawback of the list of clusters is its quadratic construction time.

In this paper we revisit the list of clusters, aiming to speed up its construction without sacrificing search efficiency. We obtain similar search times while saving a significant amount of time in the construction phase.
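
To make the structure concrete, the following is a minimal Python sketch of the classical list of clusters as it is usually described in the metric-indexing literature, not the faster construction proposed in this paper. The function names, the fixed `bucket_size` parameter, and the choice of the first remaining element as center are assumptions made for illustration. Each round compares the new center against every remaining element, which is where the quadratic construction cost comes from.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
# One cluster: (center, covering radius, bucket of the center's nearest elements).
Cluster = Tuple[Point, float, List[Point]]


def euclidean(a: Point, b: Point) -> float:
    """Plain Euclidean distance; any metric can be used instead."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def build_list_of_clusters(
    points: Sequence[Point],
    dist: Callable[[Point, Point], float],
    bucket_size: int,
) -> List[Cluster]:
    """Build the list by repeatedly carving out a center and its nearest elements."""
    remaining = list(points)
    clusters: List[Cluster] = []
    while remaining:
        # Assumption: the first remaining element becomes the center.
        center = remaining.pop(0)
        # One distance per remaining element in every round: O(n^2 / bucket_size).
        remaining.sort(key=lambda p: dist(center, p))
        bucket = remaining[:bucket_size]
        remaining = remaining[bucket_size:]
        radius = dist(center, bucket[-1]) if bucket else 0.0
        clusters.append((center, radius, bucket))
    return clusters


def range_search(
    clusters: Sequence[Cluster],
    dist: Callable[[Point, Point], float],
    query: Point,
    r: float,
) -> List[Point]:
    """Exact range query: return every indexed element within distance r of query."""
    result: List[Point] = []
    for center, radius, bucket in clusters:
        d = dist(query, center)
        if d <= r:
            result.append(center)
        # The query ball intersects the cluster ball, so the bucket must be scanned.
        if d - r <= radius:
            result.extend(p for p in bucket if dist(query, p) <= r)
        # The query ball lies strictly inside the cluster ball; every later
        # element is at least `radius` away from this center, so stop here.
        if d + r < radius:
            break
    return result
```

The early exit in `range_search` is what makes the unbalanced layout pay off: elements stored in later clusters lie outside every earlier covering ball, so a query ball fully contained in one cluster cannot reach them. A k-nn search can be layered on the same traversal by shrinking `r` as candidates are found.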

Author information

Authors and Affiliations

Universidad Michoacana de San Nicolás de Hidalgo, México
Eric Sadit Tellez & Edgar Chávez

Editor information

Editors and Affiliations

Computer Science Department, National Institute for Astrophysics, Optics and Electronics (INAOE), Luis Enrique Erro No. 1, Sta. Maria Tonantzintla, 72840, Puebla, Mexico
Jesús Ariel Carrasco-Ochoa
Computer Science Department, National Institute for Astrophysics, Optics and Electronics (INAOE), Luis Enrique Erro No. 1, Sta. Maria Tonantzintla, 72840, Puebla, Mexico
José Francisco Martínez-Trinidad
Faculty of Computer Sciences, Autonomous University of Puebla, Av. San Claudio y 14 Sur, Ciudad Universitaria, C.P. 7257, Puebla, Mexico
José Arturo Olvera López
Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, 110 Eighth Street, 12180, Troy, NY, USA
Kim L. Boyer

About this paper

Cite this paper

Sadit Tellez, E., Chávez, E. (2012). The List of Clusters Revisited. In: Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F., Olvera López, J.A., Boyer, K.L. (eds) Pattern Recognition. MCPR 2012. Lecture Notes in Computer Science, vol 7329. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31149-9_19

DOI: https://doi.org/10.1007/978-3-642-31149-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31148-2
Online ISBN: 978-3-642-31149-9
eBook Packages: Computer Science, Computer Science (R0)
