Abstract
Regulatory networks inferred from microarray data sets provide an estimated blueprint of the functional interactions taking place under the assayed experimental conditions. In each of these experiments, the gene expression pathway exerts a finely tuned control simultaneously over all genes relevant to the cellular state. This renders most pairs of those genes significantly correlated; the challenge faced by every method that aims to infer a molecular regulatory network from microarray data therefore lies in distinguishing direct from indirect interactions. A straightforward solution to this problem would be to move directly from bivariate to multivariate statistical approaches. However, the daunting dimension of typical microarray data sets, with a number of genes p several orders of magnitude larger than the number of samples n, precludes the application of standard multivariate techniques and confronts the biologist with sophisticated procedures that address this situation. We have introduced a new way to approach this problem in an intuitive manner, based on limited-order partial correlations, and in this chapter we illustrate this method through the R package qpgraph, which forms part of the Bioconductor project and is available at its Web site (1).
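The distinction between direct and indirect interactions drawn above can be illustrated with a minimal sketch. It is not the qpgraph API and not the authors' procedure; it only computes a textbook first-order partial correlation on simulated data, using the Python standard library. The variable names (`regulator`, `target_a`, `target_b`) and the toy model, in which one regulator drives two targets that do not interact with each other, are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical toy data: one regulator drives two targets;
# the targets have no direct interaction with each other.
rng = random.Random(0)
n = 2000
regulator = [rng.gauss(0, 1) for _ in range(n)]
target_a = [r + rng.gauss(0, 0.5) for r in regulator]
target_b = [r + rng.gauss(0, 0.5) for r in regulator]

marginal = pearson(target_a, target_b)
conditional = partial_corr(target_a, target_b, regulator)
print(f"marginal correlation of the targets: {marginal:.2f}")
print(f"partial correlation given the regulator: {conditional:.2f}")
```

In this toy setting the two targets are strongly correlated marginally, yet the association largely vanishes once the shared regulator is conditioned on, which is the sense in which a bivariate measure alone cannot separate direct from indirect interactions.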
References
Butte AJ, Tamayo P, Slonim D et al (2000) Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc Natl Acad Sci U S A 97:12182–12186.
Basso K, Margolin AA, Stolovitzky G et al (2005) Reverse engineering of regulatory networks in human B cells. Nat Genet 37:382–390.
Faith JJ, Hayete B, Thaden JT et al (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5:e8.
Edwards D (2000) Introduction to graphical modelling. Springer, New York.
Dykstra RL (1970) Establishing positive definiteness of sample covariance matrix. Ann Math Statist 41:2153–2154.
Barabasi A-L, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5:101–113.
Dobra A, Hans C, Jones B et al (2004) Sparse graphical models for exploring gene expression data. J Multivar Anal 90:196–212.
Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9:432–441.
Yuan M, Lin Y (2007) Model selection and estimation in the Gaussian graphical model. Biometrika 94:19–35.
Schäfer J, Strimmer K (2005) A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol 4:1–32.
de la Fuente A, Bing N, Hoeschele I et al (2004) Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 20:3565–3574.
Wille A, Bühlmann P (2006) Low-order conditional independence graphs for inferring genetic networks. Stat Appl Genet Mol Biol 5:1.
Castelo R, Roverato A (2006) A robust procedure for Gaussian graphical model search from microarray data with p larger than n. J Mach Learn Res 7:2621–2650.
Castelo R, Roverato A (2009) Reverse engineering molecular regulatory networks from microarray data with qp-graphs. J Comput Biol 16:213–227.
Falcon S, Gentleman R (2007) Using GOstats to test gene lists for GO term association. Bioinformatics 23:257–258.
Covert MW, Knight EM, Reed JL et al (2004) Integrating high-throughput and computational data elucidates bacterial networks. Nature 429:92–96.
Gama-Castro S, Jimenez-Jacinto V, Peralta-Gil M et al (2008) RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation. Nucleic Acids Res 36:D120–124.
Gentleman RC, Carey VJ, Bates DM et al (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5:R80.
Schmidberger M, Morgan M, Eddelbuettel D et al (2009) State-of-the-art in parallel computing with R. J Stat Softw 31:i01.
Wasserman WW, Sandelin A (2004) Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 5:276–287.
Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27:861–874.
Cho B-K, Knight EM, Palsson BO (2006) Transcriptional regulation of the fad regulon genes of Escherichia coli by arcA. Microbiology 152:2207–2219.
Acknowledgments
This work is supported by the Spanish Ministerio de Ciencia e Innovación (MICINN) [TIN2008-00556/TIN] and the ISCIII COMBIOMED Network [RD07/0067/0001]. R.C. is a research fellow of the “Ramon y Cajal” program from the Spanish MICINN [RYC-2006-000932]. A.R. acknowledges support from the Ministero dell’Università e della Ricerca [PRIN-2007AYHZWC].
Copyright information
© 2012 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Castelo, R., Roverato, A. (2012). Inference of Regulatory Networks from Microarray Data with R and the Bioconductor Package qpgraph. In: Wang, J., Tan, A., Tian, T. (eds) Next Generation Microarray Bioinformatics. Methods in Molecular Biology, vol 802. Humana Press. https://doi.org/10.1007/978-1-61779-400-1_14
DOI: https://doi.org/10.1007/978-1-61779-400-1_14
Publisher Name: Humana Press
Print ISBN: 978-1-61779-399-8
Online ISBN: 978-1-61779-400-1
eBook Packages: Springer Protocols