Determination of Similarity Threshold in Clustering Problems for Large Data Sets

  • Guillermo Sánchez-Díaz
  • José F. Martínez-Trinidad
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2905)


A new automatic method, based on an intra-cluster criterion, is proposed for obtaining a similarity threshold that yields a well-defined clustering (or one close to it) for large data sets. The method uses the connected component criterion, and it neither calculates nor stores the similarity matrix of the objects in main memory. It is developed within the unsupervised Logical Combinatorial Pattern Recognition approach. Experiments with the new method on large data sets are also presented.
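The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of clustering by connected components under a similarity threshold without holding a similarity matrix in memory. It is not the authors' method: the function names, the `beta` parameter, and the simple matching similarity are all assumptions made for illustration, and the threshold is supplied by hand rather than determined automatically.

```python
from itertools import combinations


def similarity(a, b):
    """Hypothetical similarity: fraction of features on which two objects agree."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def connected_components(objects, beta, sim=similarity):
    """Group object indices into components linked by similarity >= beta.

    Each pairwise similarity is computed on demand and then discarded,
    so no n x n similarity matrix is ever stored. Only a parent array
    of length n (union-find) is kept.
    """
    parent = list(range(len(objects)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(objects)), 2):
        if sim(objects[i], objects[j]) >= beta:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri  # merge the two components

    clusters = {}
    for i in range(len(objects)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy data (assumed): five objects described by three mixed features.
data = [("a", "x", 1), ("a", "x", 2), ("b", "y", 3), ("b", "y", 4), ("c", "z", 5)]
result = connected_components(data, beta=0.6)
```

In this sketch the pairwise loop still takes quadratic time; only the memory footprint is linear in the number of objects. The choice of `beta` controls how coarse the clustering is, which is the quantity the proposed method aims to determine automatically.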


Keywords: Similarity Matrix, Cluster Problem, Similarity Threshold, Unsupervised Classification, Cluster Criterion



Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Guillermo Sánchez-Díaz (1)
  • José F. Martínez-Trinidad (2)
  1. Center of Technologies Research on Information and Systems, The Autonomous University of the Hidalgo State, Pachuca, Mexico
  2. National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico
