Abstract
Information-theoretic concepts, such as the mutual information, provide a general framework for detecting and evaluating dependencies between variables. In this work, we describe and review several aspects of the mutual information as a measure of ‘distance’ between variables. After a brief overview of the mathematical background, including its recent generalization in the sense of Tsallis, we place our emphasis on the numerical estimation of these quantities from finite datasets. The concepts described are exemplified using large-scale gene expression data and compared with the results obtained from other measures, such as the Pearson correlation.
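The estimation procedure alluded to in the abstract can be illustrated with a minimal sketch: a histogram-based (equidistant-binning) estimate of the mutual information, contrasted with the Pearson correlation on a nonlinearly dependent pair of variables. This is a generic illustration, not the authors' implementation; the function name, bin count, and synthetic data are illustrative assumptions.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimate (in nats) of I(X;Y) = sum_{ij} p_ij log(p_ij / (p_i q_j)).

    Equidistant bins; finite-sample effects bias the estimate upward,
    so even independent variables yield a small positive value.
    """
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()              # joint probabilities p_ij
    px = pxy.sum(axis=1, keepdims=True)      # marginal p_i
    py = pxy.sum(axis=0, keepdims=True)      # marginal q_j
    nz = pxy > 0                             # empty bins contribute nothing
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y_indep = rng.normal(size=5000)                 # independent of x
y_nonlin = x**2 + 0.1 * rng.normal(size=5000)   # nonlinear dependence on x

mi_indep = mutual_information(x, y_indep)
mi_nonlin = mutual_information(x, y_nonlin)
r_nonlin = np.corrcoef(x, y_nonlin)[0, 1]
```

For the symmetric nonlinear relation `y = x**2`, the Pearson correlation stays near zero while the mutual information is clearly nonzero, which is the point of using an information-theoretic dependency measure.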
© 2005 Springer-Verlag Berlin · Heidelberg
Steuer, R., Daub, C.O., Selbig, J., Kurths, J. (2005). Measuring Distances Between Variables by Mutual Information. In: Baier, D., Wernecke, KD. (eds) Innovations in Classification, Data Science, and Information Systems. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-26981-9_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23221-6
Online ISBN: 978-3-540-26981-6
eBook Packages: Mathematics and Statistics