Bioinformatics Analysis of Microarray Data

  • Yunyu Zhang
  • Joseph Szustakowski
  • Martina Schinke
Part of the Methods in Molecular Biology™ book series (MIMB, volume 573)


Gene expression profiling provides unprecedented opportunities to study patterns of gene expression regulation, for example, in diseases or developmental processes. Bioinformatics analysis plays an important part in processing the information embedded in large-scale expression profiling studies and in laying the foundation for biological interpretation.

Over the past several years, numerous tools have emerged for microarray data analysis. One of the most popular platforms is Bioconductor, an open-source, open-development software project for the analysis and comprehension of genomic data, based on the R programming language.

In this chapter, we use Bioconductor analysis packages on a heart development dataset to demonstrate the microarray data analysis workflow, from annotation, normalization, expression index calculation, and diagnostic plots to pathway analysis, leading to meaningful visualization and interpretation of the data.
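The normalization and expression index steps can be illustrated with a minimal NumPy sketch of two procedures used by RMA, one widely used Bioconductor summarization method: quantile normalization across arrays and Tukey median polish within a single probe set. This is an illustration under stated assumptions, not the chapter's code: the chapter works in R with Bioconductor packages and may use a different summarization method, RMA's background-correction step is omitted, and the function names `quantile_normalize`, `median_polish`, and `expression_index` are hypothetical.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns (arrays) of a probes x arrays matrix.

    Every column receives the same set of values, the mean of the sorted
    columns, placed back according to that column's original ranks.
    Ties are broken by order of appearance (a simplification).
    """
    order = np.argsort(x, axis=0, kind="stable")   # rank order within each array
    reference = np.sort(x, axis=0).mean(axis=1)    # mean sorted profile across arrays
    out = np.empty(x.shape, dtype=float)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference
    return out


def median_polish(y: np.ndarray, max_iter: int = 10, eps: float = 0.01):
    """Tukey median polish of one probe set (probes x arrays, log2 scale).

    Fits y[i, j] = overall + row[i] + col[j] + resid[i, j] by alternating
    row and column median sweeps; defaults mirror R's stats::medpolish().
    """
    resid = np.asarray(y, dtype=float).copy()
    overall = 0.0
    row = np.zeros(resid.shape[0])   # probe effects
    col = np.zeros(resid.shape[1])   # array effects
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row medians into the probe effects.
        row_delta = np.median(resid, axis=1)
        resid -= row_delta[:, None]
        row += row_delta
        delta = np.median(col)
        col -= delta
        overall += delta
        # Sweep column medians into the array effects.
        col_delta = np.median(resid, axis=0)
        resid -= col_delta[None, :]
        col += col_delta
        delta = np.median(row)
        row -= delta
        overall += delta
        # Stop when the sum of absolute residuals stabilizes.
        new_sum = np.abs(resid).sum()
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row, col, resid


def expression_index(y: np.ndarray) -> np.ndarray:
    """RMA-style log2 expression index of one probe set on each array."""
    overall, _, col, _ = median_polish(y)
    return overall + col
```

In an RMA-style workflow, one would quantile-normalize the full probes × arrays matrix of background-corrected intensities, take log2, and then apply `expression_index` to the rows belonging to each probe set. Median polish is used because its medians make the per-array summary robust to a few outlying probes.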

Key words

Annotation, normalization, gene filtering, moderated F-test, GSEA, pathway analysis, Affymetrix GeneChip™, sigPathway
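The gene filtering, moderated F-test, and gene-set steps named in the key words can be sketched the same way. The sketch assumes log2 expression values with genes in rows and arrays in columns, and a one-way design. The moderated F-statistic shrinks each gene's residual variance s² toward a prior value using the form of limma's empirical Bayes method, s̃² = (d0·s0² + d·s²)/(d0 + d). Unlike limma, which estimates d0 and s0² from all genes, this sketch takes them as inputs. The expression-filter thresholds and the gene-set test (a simple competitive test that permutes gene labels) are hypothetical illustrations; neither is the chapter's procedure nor sigPathway's implementation.

```python
import numpy as np


def filter_expressed(expr, min_log2=6.0, min_arrays=2):
    """Boolean mask of genes with log2 expression >= min_log2 on at least
    min_arrays arrays. Both thresholds are hypothetical, for illustration."""
    return (expr >= min_log2).sum(axis=1) >= min_arrays


def moderated_f(expr, groups, d0, s0_sq):
    """One-way moderated F-statistic per gene (rows = genes, cols = arrays).

    The residual variance s^2 (d = n - k df) is shrunk toward the prior
    s0_sq with d0 prior df; F = between-group mean square / shrunk variance,
    referred to an F(k - 1, d + d0) distribution. d0 and s0_sq are inputs
    here, not empirical Bayes estimates as in limma.
    """
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, k = expr.shape[1], len(labels)
    grand = expr.mean(axis=1, keepdims=True)
    ss_between = np.zeros(expr.shape[0])
    ss_within = np.zeros(expr.shape[0])
    for g in labels:
        sub = expr[:, groups == g]
        m = sub.mean(axis=1, keepdims=True)
        ss_between += sub.shape[1] * ((m - grand) ** 2).ravel()
        ss_within += ((sub - m) ** 2).sum(axis=1)
    d = n - k
    s2_post = (d0 * s0_sq + ss_within) / (d0 + d)   # ss_within equals d * s^2
    return (ss_between / (k - 1)) / s2_post


def gene_set_pvalue(stats, in_set, n_perm=1000, seed=0):
    """Competitive gene-set test (hypothetical helper): is the mean statistic
    of the set larger than that of random gene sets of the same size?"""
    rng = np.random.default_rng(seed)
    stats = np.asarray(stats, dtype=float)
    m = int(np.sum(in_set))
    observed = stats[in_set].mean()
    null = np.array([rng.choice(stats, size=m, replace=False).mean()
                     for _ in range(n_perm)])
    # Add-one correction keeps the permutation p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (n_perm + 1)
```

With d0 = 0, `moderated_f` reduces to the ordinary one-way ANOVA F-statistic; larger d0 pulls each gene's variance toward s0_sq, which stabilizes the statistic when only a few arrays per group are available.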


Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Yunyu Zhang (1)
  • Joseph Szustakowski (1)
  • Martina Schinke (1)
  1. Novartis Institutes for BioMedical Research, Cambridge, USA
