Abstract
Categorical variables take on non-numeric values; a discretized numeric variable can also be interpreted as a categorical variable. Many association measures exist for quantifying the statistical dependence between categorical variables (e.g., the Pearson chi-square statistic, the likelihood ratio test statistic, Fisher’s exact test, and mutual information). Association measures between two discretized numeric vectors have been used to measure nonlinear dependencies between them. We describe several approaches for defining weighted networks among categorical variables. In particular, mutual information networks are often used for constructing gene networks. The close relationship between mutual information and a likelihood ratio test statistic allows us to define a conditional measure of mutual information, which accounts for additional covariates. Estimating the mutual information between numeric variables is challenging and involves parameter choices. We argue that in many applications the mutual information measure can be approximated by a correlation-based measure. We review the ARACNE approach for constructing an unweighted mutual information network and generalize it to correlation networks and other association networks.
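The relationship between mutual information and the likelihood ratio test statistic mentioned above can be illustrated with a minimal sketch. It assumes the plug-in (empirical frequency) estimate of mutual information with natural logarithms, under which the likelihood ratio statistic for independence in a contingency table equals 2N times the mutual information, where N is the number of observations. The function names `mutual_information` and `likelihood_ratio_statistic` are hypothetical and are not taken from the chapter.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in nats) of two categorical sequences.

    Assumption: probabilities are estimated by empirical frequencies.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    joint = Counter(zip(x, y))  # observed counts per (x, y) cell
    count_x = Counter(x)        # marginal counts of x
    count_y = Counter(y)        # marginal counts of y
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p_ab * log(p_ab / (p_a * p_b)), written in terms of counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_x[a] * count_y[b]))
    return mi


def likelihood_ratio_statistic(x, y):
    """G statistic for independence: 2 * sum(observed * log(observed / expected))."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x = Counter(x)
    count_y = Counter(y)
    g = 0.0
    for (a, b), observed in joint.items():
        expected = count_x[a] * count_y[b] / n  # expected count under independence
        g += 2.0 * observed * math.log(observed / expected)
    return g


# Illustrative toy data (not from the chapter): y is fully determined by x.
x = ["a", "a", "b", "b"]
y = ["u", "u", "v", "v"]
mi = mutual_information(x, y)
g = likelihood_ratio_statistic(x, y)
print(mi, g, 2 * len(x) * mi)
```

In this sketch the last two printed values agree up to floating-point error, which is the identity that makes it possible to treat mutual information as a rescaled likelihood ratio test statistic.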
Keywords
- Mutual Information
- Adjacency Matrix
- Correlation Network
- Likelihood Ratio Test Statistic
- Association Measure
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Horvath, S. (2011). Networks Between Categorical or Discretized Numeric Variables. In: Weighted Network Analysis. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-8819-5_14
DOI: https://doi.org/10.1007/978-1-4419-8819-5_14
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-8818-8
Online ISBN: 978-1-4419-8819-5
eBook Packages: Biomedical and Life Sciences (R0)