Processing Large-Scale, High-Dimension Genetic and Gene Expression Data

Molony, Cliona; Sieberts, Solveig K.; Schadt, Eric E.

doi:10.1007/978-3-540-69264-5_11

Chapter
First Online: 01 January 2009

Abstract

The now routine generation of large-scale, high-throughput data in multiple dimensions (genotype, gene expression, and so on) poses a significant challenge to researchers who wish to integrate data across these dimensions in the hope of painting a more comprehensive picture of complex system behavior. This type of integration promises to elucidate the networks that drive traits associated with common human diseases such as obesity, diabetes, and atherosclerosis. However, carrying out this type of research effectively requires not only the generation of large-scale genotype and molecular profiling data, but also the development and application of methods and software, together with a computing infrastructure capable of processing the resulting large-scale data sets. Mastery of these methods and tools, and access to an appropriate computing environment, will be critical to maintaining a competitive advantage, given that future successes in biomedical research will likely demand a more comprehensive view of the complex array of interactions in biological systems and of how those interactions are influenced by genetic background, infection, environmental states, lifestyle choices, and, more generally, social structures. In this chapter, we detail the methodological and computing issues associated with carrying out large-scale genome-wide association studies on tens of thousands of phenotypes, where the aim is to identify the phenotypes that are intermediate between DNA variations and disease phenotypes. This type of analysis can provide insights into the molecular networks that are perturbed by DNA and environmental variations and that, as a result, induce changes in disease-associated traits, providing a path to interpreting genome-wide association study data and to uncovering networks that drive disease processes.
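The core computation behind a genome-wide association study on tens of thousands of phenotypes is testing every DNA variant against every trait. The sketch below illustrates that scan under simplifying assumptions and is not the authors' pipeline: genotypes are assumed to be coded as 0/1/2 allele counts with no missing values, each SNP-trait pair is tested with Pearson correlation (equivalent to a single-SNP additive linear regression with no covariates or population-structure correction), and the p-value uses a normal approximation to the t distribution that is only reasonable for large sample sizes. The function name `association_scan` and its input layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One result row: (snp_id, trait_id, r, t statistic, approximate two-sided p).
Result = Tuple[str, str, float, float, float]


def _pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns NaN if either vector has zero variance."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def association_scan(
    genotypes: Dict[str, List[int]],
    traits: Dict[str, List[float]],
) -> List[Result]:
    """Hypothetical all-pairs scan: every SNP against every trait.

    Assumes 0/1/2 allele-count genotypes, identical sample order across
    all vectors, and no missing data. No covariates are modeled.
    """
    results: List[Result] = []
    for snp, g in genotypes.items():
        x = [float(v) for v in g]
        n = len(x)
        if n < 3:
            raise ValueError("need at least 3 samples for n - 2 degrees of freedom")
        for trait, y in traits.items():
            if len(y) != n:
                raise ValueError(f"sample size mismatch for {snp}/{trait}")
            r = _pearson(x, y)
            if math.isnan(r):
                continue  # monomorphic SNP or constant trait: untestable
            # t statistic for H0: slope = 0 in y = a + b*x, with n - 2 df.
            denom = 1.0 - r * r
            if denom <= 0:
                t = math.copysign(math.inf, r)  # perfect correlation
            else:
                t = r * math.sqrt((n - 2) / denom)
            # Normal approximation to the two-sided t p-value (large n only).
            p = math.erfc(abs(t) / math.sqrt(2.0))
            results.append((snp, trait, r, t, p))
    return results
```

Each SNP-trait test is independent of every other, which is what makes scans of this kind straightforward to partition across the nodes of a compute cluster. With hundreds of thousands of SNPs and tens of thousands of traits the number of tests runs into the billions, so a production scan would also need vectorized linear algebra and a multiple-testing correction such as false discovery rate control.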

Author information

Authors and Affiliations

Rosetta Inpharmatics, LLC, a wholly owned subsidiary of Merck & Co., Inc., Seattle, WA, 98109, USA
Cliona Molony, Solveig K. Sieberts & Eric E. Schadt

Corresponding author

Correspondence to Cliona Molony.

About this chapter

Cite this chapter

Molony, C., Sieberts, S.K., Schadt, E.E. (2009). Processing Large-Scale, High-Dimension Genetic and Gene Expression Data. In: Handbook on Analyzing Human Genetic Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69264-5_11

DOI: https://doi.org/10.1007/978-3-540-69264-5_11
Published: 20 August 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69263-8
Online ISBN: 978-3-540-69264-5
eBook Packages: Biomedical and Life Sciences; Biomedical and Life Sciences (R0)