Measuring Distances Between Variables by Mutual Information

Abstract

Information-theoretic concepts, such as the mutual information, provide a general framework for detecting and evaluating dependencies between variables. In this work, we describe and review several aspects of the mutual information as a measure of ‘distance’ between variables. After a brief overview of the mathematical background, including its recent generalization in the sense of Tsallis, our emphasis is on the numerical estimation of these quantities from finite datasets. The described concepts are exemplified using large-scale gene expression data, and the results are compared to those obtained from other measures, such as the Pearson correlation.
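
For orientation, the (Shannon) mutual information between two variables X and Y is I(X;Y) = Σ_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ], which vanishes if and only if X and Y are statistically independent. The sketch below shows one simple way to estimate it from a finite dataset, using equal-width binning, and contrasts it with the Pearson correlation on a nonlinearly dependent pair of variables. It is a minimal illustration in Python, not the estimator used in this work: the function name, the choice of 10 bins, and the toy data are assumptions of the example, and the finite-sample corrections, kernel density estimators, and Tsallis generalization discussed in the text are not reproduced here.

    import numpy as np

    def mutual_information(x, y, bins=10):
        # Histogram-based estimate of I(X;Y) in nats, using equal-width bins.
        # A deliberately simple estimator; it is biased upward for small samples.
        joint, _, _ = np.histogram2d(x, y, bins=bins)
        p_xy = joint / joint.sum()               # joint probabilities p(x, y)
        p_x = p_xy.sum(axis=1, keepdims=True)    # marginal p(x), column vector
        p_y = p_xy.sum(axis=0, keepdims=True)    # marginal p(y), row vector
        nz = p_xy > 0                            # restrict the sum to occupied bins
        return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))

    # Toy example: y depends on x nonlinearly, so the Pearson correlation is
    # close to zero while the mutual information is clearly positive.
    rng = np.random.default_rng(0)
    x = rng.normal(size=2000)
    y = x**2 + 0.1 * rng.normal(size=2000)
    print("Pearson r       :", np.corrcoef(x, y)[0, 1])
    print("Estimated I(X;Y):", mutual_information(x, y))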


References

  • BRAZMA, A. and VILO, J. (2000): Gene expression data analysis. FEBS Letters, 480, 17–24.
  • BUTTE, A.J. and KOHANE, I.S. (2000): Mutual information relevance networks: Functional genomic clustering using pairwise entropy measurements. Pacific Symposium on Biocomputing, 5, 415–426.
  • COVER, T.M. and THOMAS, J.A. (1991): Elements of Information Theory. John Wiley, New York.
  • CURADO, E.M.F. and TSALLIS, C. (1991): Generalized statistical mechanics: Connection with thermodynamics. J. Phys. A, 24, L69.
  • D’HAESELEER, P., LIANG, S., and SOMOGYI, R. (2000): Genetic network inference: From co-expression clustering to reverse engineering. Bioinformatics, 16(8), 707–726.
  • EISEN, M.B., SPELLMAN, P.T., BROWN, P.O., and BOTSTEIN, D. (1998): Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95, 14863–14868.
  • FRASER, A.M. and SWINNEY, H.L. (1986): Independent coordinates for strange attractors from mutual information. Phys. Rev. A, 33(2), 1134–1140.
  • GROSSE, I., HERZEL, H., BULDYREV, S.V., and STANLEY, H.E. (2000): Species independence of mutual information in coding and noncoding DNA. Phys. Rev. E, 61(5), 5624–5629.
  • HERWIG, R., POUSTKA, A.J., MUELLER, C., BULL, C., LEHRACH, H., and O’BRIEN, J. (1999): Large-scale clustering of cDNA-fingerprinting data. Genome Research, 9(11), 1093–1105.
  • HERZEL, H. and GROSSE, I. (1995): Measuring correlations in symbol sequences. Physica A, 216, 518–542.
  • HERZEL, H. and GROSSE, I. (1997): Correlations in DNA sequences: The role of protein coding segments. Phys. Rev. E, 55(1), 800–810.
  • HERZEL, H., SCHMITT, A.O., and EBELING, W. (1994): Finite sample effects in sequence analysis. Chaos, Solitons & Fractals, 4(1), 97–113.
  • HUGHES, T.R. et al. (2000): Functional discovery via a compendium of expression profiles. Cell, 102, 109–126.
  • LIANG, S., FUHRMAN, S., and SOMOGYI, R. (1998): REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pacific Symposium on Biocomputing, 3, 18–29.
  • MICHAELS, G.S., CARR, D.B., ASKENAZI, M., FUHRMAN, S., WEN, X., and SOMOGYI, R. (1998): Cluster analysis and data visualization of large-scale gene expression data. Pacific Symposium on Biocomputing, 3, 42–53.
  • MOON, Y., RAJAGOPALAN, B., and LALL, U. (1995): Estimation of mutual information using kernel density estimators. Phys. Rev. E, 52(3), 2318–2321.
  • PRESS, W.H., TEUKOLSKY, S.A., VETTERLING, W.T., and FLANNERY, B.P. (1992): Numerical Recipes in C. Second edition, Cambridge University Press, Cambridge.
  • SCHENA, M., SHALON, D., DAVIS, R.W., and BROWN, P.O. (1995): Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270, 467–470.
  • SHANNON, C.E. (1948): A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423, ibid. 623–656.
  • SILVERMAN, B.W. (1986): Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
  • SOMOGYI, R., FUHRMAN, S., and WEN, X. (2001): Genetic network inference in computational models and applications to large-scale gene expression data. In: J.M. Bower and H. Bolouri (Eds.): Computational Modeling of Genetic and Biochemical Networks. MIT Press, Cambridge, 129–157.
  • STEUER, R., KURTHS, J., DAUB, C.O., WEISE, J., and SELBIG, J. (2002): The mutual information: Detecting and evaluating dependencies between variables. Bioinformatics, 18(Suppl. 2), S231–S240.
  • TSALLIS, C. (1998): Generalized entropy-based criterion for consistent testing. Phys. Rev. E, 58(2), 1442–1445.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

Cite this paper

Steuer, R., Daub, C.O., Selbig, J., Kurths, J. (2005). Measuring Distances Between Variables by Mutual Information. In: Baier, D., Wernecke, KD. (eds) Innovations in Classification, Data Science, and Information Systems. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-26981-9_11
