Abstract
A new automatic method, based on an intra-cluster criterion, is proposed for obtaining a similarity threshold that generates a well-defined clustering (or one close to it) for large data sets. The method uses the connected component criterion, and it neither calculates nor stores the similarity matrix of the objects in main memory. It is designed for the unsupervised Logical Combinatorial Pattern Recognition approach. Experiments with the new method on large data sets are also presented.
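To illustrate the connected component criterion mentioned above, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not describe how the threshold is chosen, so the threshold `beta0` is simply taken as an input. The `similarity` function (fraction of matching attributes) and the use of a union-find structure are assumptions made for illustration. The sketch computes each pairwise similarity on the fly and discards it, so it never holds an n x n similarity matrix in memory.

```python
from itertools import combinations


def similarity(a, b):
    """Hypothetical similarity in [0, 1]: fraction of attributes that match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def connected_components(objects, beta0, sim=similarity):
    """Group object indices into beta0-connected components.

    Two objects are linked when their similarity is at least beta0; a
    cluster is a connected component of the resulting graph. Only a
    parent array of length n is stored, never the similarity matrix.
    """
    parent = list(range(len(objects)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(objects)), 2):
        # The similarity is computed here and discarded after the test.
        if sim(objects[i], objects[j]) >= beta0:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    clusters = {}
    for i in range(len(objects)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Illustrative data only (not from the paper).
objects = [("a", "x", 1), ("a", "x", 2), ("b", "y", 3), ("b", "y", 3), ("c", "z", 9)]
print(connected_components(objects, beta0=0.6))
```

Raising `beta0` splits components and lowering it merges them, which is why the choice of threshold determines the clustering. This sketch still evaluates every pair of objects, so it reduces memory use but not the quadratic number of similarity evaluations.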
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sánchez-Díaz, G., Martínez-Trinidad, J.F. (2003). Determination of Similarity Threshold in Clustering Problems for Large Data Sets. In: Sanfeliu, A., Ruiz-Shulcloper, J. (eds) Progress in Pattern Recognition, Speech and Image Analysis. CIARP 2003. Lecture Notes in Computer Science, vol 2905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24586-5_75
DOI: https://doi.org/10.1007/978-3-540-24586-5_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20590-6
Online ISBN: 978-3-540-24586-5
eBook Packages: Springer Book Archive