Partitional vs Hierarchical Clustering Using a Minimum Grammar Complexity Approach

  • Ana L. N. Fred
  • José M. N. Leitão
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1876)

Abstract

This paper addresses the problem of structural clustering of string patterns. Adopting the grammar formalism for representing both individual sequences and sets of patterns, a partitional clustering algorithm is proposed. The performance of the new algorithm is analyzed in terms of computational complexity and data-partitioning results, with the corresponding hierarchical version as reference. A theoretical analysis shows that the new algorithm substantially improves computational efficiency. Unlike those of the hierarchical approach, however, its clustering results depend on the order in which patterns are presented, which may degrade performance. This effect is overcome by adopting a resampling technique. The methods are evaluated empirically on application examples by matching clusters between pairs of partitions and computing an index of cluster agreement.
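The abstract does not give the grammar inference method, the complexity measure, or the exact agreement index. The following Python sketch therefore fills these in with stated assumptions and should not be read as the authors' algorithm. It assumes that each cluster is described by a bigram grammar whose productions are the symbol transitions seen in its strings (with hypothetical `^`/`$` boundary markers), and that the cost of adding a string to a cluster is the fraction of its productions the cluster grammar lacks. Strings are assigned sequentially, which reproduces the order dependence mentioned above. Resampling is sketched as clustering several random presentation orders and keeping the partition with the highest mean agreement with the others, where agreement greedily matches clusters one-to-one by overlap. All function names (`productions`, `dissimilarity`, `partitional_cluster`, `agreement`, `resampled_cluster`) are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

Rule = Tuple[str, str]       # one production of a bigram grammar: symbol -> next symbol
Partition = List[List[int]]  # each cluster holds indices into the input strings


def productions(s: str) -> Set[Rule]:
    """Bigram-grammar productions of one string, with '^' and '$' as start/end
    markers (ASSUMPTION: neither marker occurs in the alphabet)."""
    padded = "^" + s + "$"
    return {(padded[i], padded[i + 1]) for i in range(len(padded) - 1)}


def dissimilarity(cluster_rules: Set[Rule], s: str) -> float:
    """ASSUMPTION, not the paper's measure: fraction of s's productions missing from
    the cluster grammar (0 = grammar does not grow, 1 = nothing shared)."""
    own = productions(s)
    return len(own - cluster_rules) / len(own)


def partitional_cluster(strings: Sequence[str], order: Sequence[int],
                        threshold: float) -> Partition:
    """Sequential partitional clustering: visit strings in presentation order; each one
    joins the least dissimilar existing cluster if within threshold, else starts a new
    one. Cluster grammars grow as strings arrive, so the result depends on the order."""
    clusters: Partition = []
    grammars: List[Set[Rule]] = []
    for idx in order:
        best: Optional[int] = None
        best_d = float("inf")
        for c, rules in enumerate(grammars):
            d = dissimilarity(rules, strings[idx])
            if d < best_d:  # strict '<' keeps the earliest cluster on ties
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            clusters[best].append(idx)
            grammars[best] |= productions(strings[idx])  # grow the cluster grammar
        else:
            clusters.append([idx])
            grammars.append(productions(strings[idx]))
    return clusters


def agreement(p: Partition, q: Partition, n: int) -> float:
    """One plausible cluster-agreement index: match clusters of p and q one-to-one,
    greedily by largest overlap, and return the fraction of the n patterns that fall
    in matched pairs (1.0 = identical partitions up to relabelling)."""
    if n == 0:
        return 1.0
    overlaps = sorted(
        ((len(set(a) & set(b)), i, j) for i, a in enumerate(p) for j, b in enumerate(q)),
        reverse=True,
    )
    used_p: Set[int] = set()
    used_q: Set[int] = set()
    total = 0
    for ov, i, j in overlaps:
        if ov > 0 and i not in used_p and j not in used_q:
            used_p.add(i)
            used_q.add(j)
            total += ov
    return total / n


def resampled_cluster(strings: Sequence[str], threshold: float,
                      runs: int = 10, seed: int = 0) -> Partition:
    """Cluster several random presentation orders and keep the partition with the
    highest mean agreement with the other runs (hypothetical resampling scheme)."""
    rng = random.Random(seed)
    n = len(strings)
    parts: List[Partition] = []
    for _ in range(runs):
        order = list(range(n))
        rng.shuffle(order)
        parts.append(partitional_cluster(strings, order, threshold))

    def mean_agreement(k: int) -> float:
        others = [agreement(parts[k], parts[m], n) for m in range(runs) if m != k]
        return sum(others) / len(others) if others else 1.0

    return parts[max(range(runs), key=mean_agreement)]
```

With `strings = ["aab", "abb", "bbb"]` and a threshold of 0.4, the order `[0, 1, 2]` yields `[[0, 1, 2]]`, while `[0, 2, 1]` yields `[[0, 1], [2]]`: `"bbb"` arrives before `"abb"` has added the production `(b, b)` to the first cluster's grammar, so it is too dissimilar to join. The agreement between these two partitions under the greedy index is 2/3. Each string is compared only with the current clusters rather than with every other string, which is the kind of saving a partitional scheme offers over hierarchical merging.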

Keywords

Cluster Algorithm · String Match · Partitional Algorithm · Grammar Formalism · Structural Resemblance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Ana L. N. Fred (1, 2)
  • José M. N. Leitão (1, 2)
  1. Instituto de Telecomunicações / Instituto Superior Técnico, Lisboa, Portugal
  2. IST-Torre Norte, Lisboa, Portugal
