Automation and Remote Control, Volume 75, Issue 7, pp 1309–1315

Hierarchical clustering of text documents

  • L. S. Lomakina
  • V. B. Rodionov
  • A. S. Surkova
Control Systems and Information Technologies


We consider the possibility of using compression algorithms to compute similarity distances in order to solve the clustering problem. We propose an actual hierarchical clustering machine that constructs a binary tree of object dependencies, similar to a taxonomy.
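To make the idea concrete, here is a minimal sketch of compression-based hierarchical clustering. It is not the authors' machine: it assumes the normalized compression distance (NCD) of Cilibrasi and Vitanyi as the similarity distance, zlib as the compressor, and plain average-linkage agglomeration to build the binary tree. The names `compressed_size`, `ncd`, `build_tree`, and `leaves` are hypothetical helpers introduced only for this illustration.

```python
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def build_tree(docs):
    """Merge documents bottom-up into a binary tree of nested 2-tuples.

    docs maps a document name to its content as bytes and must be non-empty.
    Average linkage is an assumption of this sketch, not taken from the paper.
    """
    # Pairwise distances are computed once and reused by the linkage function.
    dist = {frozenset(p): ncd(docs[p[0]], docs[p[1]])
            for p in combinations(docs, 2)}
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in docs]

    def linkage(a, b):
        total = sum(dist[frozenset((p, q))] for p in a for q in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the two clusters with the smallest average pairwise distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]][1], clusters[ij[1]][1]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


def leaves(tree):
    """List the document names at the leaves of a nested-tuple tree."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


if __name__ == "__main__":
    sample = {
        "fox_1": b"the quick brown fox jumps over the lazy dog " * 20,
        "fox_2": b"the quick brown fox leaps over the lazy dog " * 20,
        "num_1": b"3.14159 2.71828 1.41421 1.73205 " * 20,
        "num_2": b"3.14159 2.71828 1.61803 1.73205 " * 20,
    }
    print(build_tree(sample))
```

Every internal node of the result has exactly two children, which gives the taxonomy-like binary tree the abstract describes. A general-purpose compressor only approximates the underlying information distance, so results on very short texts should be treated with caution.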


Keywords: Remote Control, Text Document, Compression Algorithm, Cluster Problem, Similarity Distance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Pleiades Publishing, Ltd. 2014

Authors and Affiliations

  • L. S. Lomakina (1)
  • V. B. Rodionov (1)
  • A. S. Surkova (1)
  1. Alexeev Nizhni Novgorod State Technical University, Nizhni Novgorod, Russia
