Abstract
R is a powerful language and a widely used software environment for data analysis and visualization. Its core capabilities can be extended through add-on packages, several of which offer a broad range of facilities for analyzing the statistical properties of graphs. This chapter provides a practical tutorial on using R methods for graphs and networks to examine biological data and analyze their topological and statistical properties.
Acknowledgments
Funding for this research was provided by grants from La Ligue Contre Le Cancer (RAB08008NSA; project R08010NS). We also gratefully acknowledge the Institut national de la santé et de la recherche médicale (INSERM) for supporting this project. Funding for RG was provided by NIH grant P41 HG004059.
The authors would like to thank Wolfgang Huber, Tony Chiang, Shailesh Date and Denise Scholtens for helpful comments and for providing a stimulating research environment that has led to the many tools used here and to the ideas that underlie this research.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Le Meur, N., Gentleman, R. (2012). Analyzing Biological Data Using R: Methods for Graphs and Networks. In: van Helden, J., Toussaint, A., Thieffry, D. (eds) Bacterial Molecular Networks. Methods in Molecular Biology, vol 804. Springer, New York, NY. https://doi.org/10.1007/978-1-61779-361-5_19
DOI: https://doi.org/10.1007/978-1-61779-361-5_19
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-61779-360-8
Online ISBN: 978-1-61779-361-5
eBook Packages: Springer Protocols