Microarray Data Analysis for Transcriptome Profiling

  • Ming-an Sun
  • Xiaojian Shao
  • Yejun Wang
Part of the Methods in Molecular Biology book series (MIMB, volume 1751)


Microarray data have accumulated rapidly over the past two decades. Because microarray techniques are high-throughput, they have shifted biological studies from individual genes to the transcriptome level and have advanced many fields of biology. While microarrays offer great advantages for expression profiling, they also pose many challenges for computational analysis. In this chapter, we demonstrate how to perform a standard analysis, including data preprocessing, quality assessment, differential expression analysis, and general downstream analyses.

Key words

Microarray, Normalization, Clustering, Differential expression, Bioconductor, Limma, GeneFilter



This work was supported by a Natural Science Funding of Shenzhen (JCYJ201607115221141) and a Shenzhen Peacock Plan fund (827-000116) to YW. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.



Copyright information

© Springer Science+Business Media, LLC 2018

Authors and Affiliations

  • Ming-an Sun (1)
  • Xiaojian Shao (2, 3)
  • Yejun Wang (4)
  1. Epigenomics and Computational Biology Lab, Biocomplexity Institute of Virginia Tech, Blacksburg, USA
  2. Department of Human Genetics, McGill University, Montréal, Canada
  3. The McGill University and Génome Québec Innovation Centre, Montréal, Canada
  4. Department of Cell Biology and Genetics, School of Basic Medicine, Shenzhen University Health Science Center, Shenzhen, China
