We summarize recent developments in a general theory of information distance and its applications in whole genome phylogeny, document comparison, internet query-answer systems, and many other data mining tasks. We also solve an open problem regarding the universality of the normalized information distance.


Keywords: Light Bulb, Information Distance, Kolmogorov Complexity, Data Mining Task, Computable Distance





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ming Li, School of Computer Science, University of Waterloo, Waterloo, Canada
