Classifying Stem Cell Differentiation Images by Information Distance

  • Xianglilan Zhang
  • Hongnan Wang
  • Tony J. Collins
  • Zhigang Luo
  • Ming Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7523)


The ability of stem cells to differentiate holds great potential for drug discovery and cell replacement therapy. Realizing this potential requires effective high content screening for drug candidates. Analysis of high content screening images typically relies on DNA staining to identify cell nuclei for cell segmentation, which precedes feature extraction and classification. However, DNA staining has negative effects on cell growth, and segmentation algorithms make errors when compound treatments cause nuclear or cell swelling or shrinkage. In this paper, we introduce a novel Information Distance Classification (IDC) method that requires no segmentation or feature extraction, and hence no DNA staining. In classifying 480 candidate compounds that may stimulate stem cell differentiation, the proposed IDC method achieved an F1 score 3% higher than conventional analysis. To our knowledge, this is the first work to apply information distance to high content screening.
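Information distance (Bennett et al.) is defined through Kolmogorov complexity, which is uncomputable, so in practice it is approximated with a real compressor. A common approximation is the normalized compression distance of Cilibrasi and Vitányi, NCD(x, y) = (C(xy) − min{C(x), C(y)}) / max{C(x), C(y)}, where C is the compressed length. The sketch below is not the authors' IDC implementation; it assumes zlib as the compressor, treats each image as a raw byte string, and gives a query image the label of its nearest training image (1-nearest-neighbour). The names `compressed_size`, `ncd`, `classify_1nn` and the toy byte strings are hypothetical.

```python
import random
import zlib
from typing import Sequence, Tuple


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes; zlib stands in for the uncomputable K(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both inputs
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify_1nn(query: bytes, training: Sequence[Tuple[bytes, str]]) -> str:
    """Hypothetical classifier: label of the training item closest to query under NCD."""
    _, label = min(training, key=lambda item: ncd(query, item[0]))
    return label


if __name__ == "__main__":
    # Toy stand-ins for raw image bytes (not real microscopy data).
    rng = random.Random(0)
    img_a = bytes(rng.randrange(256) for _ in range(2000))
    img_b = bytes(rng.randrange(256) for _ in range(2000))
    query = bytearray(img_a)
    query[100] ^= 0xFF  # perturb a few "pixels" of image A
    query[1500] ^= 0xFF
    print(classify_1nn(bytes(query), [(img_a, "A"), (img_b, "B")]))
```

One design note: zlib's DEFLATE only finds repeats within a 32 KB window, so shared content between x and y is detected only when the concatenation fits in that window. For full-size microscopy images, a compressor with a larger window (for example Python's `lzma` module) may be a better fit.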


Keywords: information distance; stem cell differentiation; image classification; compound classification


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Xianglilan Zhang (1, 2, 3)
  • Hongnan Wang (1, 2, 3)
  • Tony J. Collins (1, 2, 3)
  • Zhigang Luo (1, 2, 3)
  • Ming Li (1, 2, 3)
  1. School of Computer, National University of Defense Technology, Changsha, China
  2. David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
  3. Stem Cell and Cancer Research Institute, McMaster University, Hamilton, Canada
