Abstract
Microarray experiments can appear daunting because their analysis draws on several fields of research. To understand the data that microarrays generate, some knowledge of classical statistics and recent complexity theory is useful, while emerging computational techniques such as XML-directed workflows could aid in managing the data. These considerations arise because, as experimental tools, microarrays (arrays) exemplify the recent trend in biological research towards high-dimensionality datasets. Until recently, observations were made on only a few variables at a time and were used to support or refute hypotheses; high-dimensionality datasets, by contrast, are generated by observing a very large number of variables (e.g. gene expression measurements) at the same time. The number of expression measurements made on arrays is not only high in absolute terms, but also high relative to the size of a typical sample population. This combination of high dimensionality and asymmetry leads to large datasets and to fundamental problems when standard approaches are used to interpret the data. An end-to-end approach provides a general framework in which to place some useful considerations when planning an analysis. The framework described here explores the origins of signal and several sources of variance, approaches to representing high-throughput data, the statistical considerations when modeling array data, and the software tools that can aid in carrying out the analysis.
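One of the problems that arises when many more variables than samples are tested can be illustrated with a small simulation. The sketch below is not taken from the chapter: it assumes that no gene is truly differentially expressed, so that each per-gene p-value is uniformly distributed, and it uses the Benjamini-Hochberg step-up procedure as one common (assumed) way to control the false discovery rate. The function names `simulate_null_p_values` and `benjamini_hochberg` are hypothetical.

```python
import random


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Rank hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])


def simulate_null_p_values(n_genes, seed=0):
    """Simulate p-values for genes with no true effect (uniform on [0, 1))."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_genes)]


if __name__ == "__main__":
    alpha = 0.05
    p_values = simulate_null_p_values(10_000)
    # Testing each gene separately at alpha flags many genes by chance alone.
    naive_hits = sum(1 for p in p_values if p <= alpha)
    adjusted_hits = len(benjamini_hochberg(p_values, alpha))
    print("flagged without correction:", naive_hits)
    print("flagged after Benjamini-Hochberg:", adjusted_hits)
```

With 10,000 null genes, the uncorrected threshold is expected to flag roughly 5% of them even though none carries real signal, whereas the corrected count can never exceed the uncorrected one. Real array data are correlated across genes, so this independent-uniform model is only a simplification.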
Notes
1. This is the one-tailed alternate hypothesis; a two-tailed alternate hypothesis could have multiple conditions, such as H1 and H2 for over- and under-abundance relative to the null condition.
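The distinction drawn in this note can be sketched numerically. The example below is an illustration only: it assumes a test statistic `z` that follows a standard normal distribution under the null condition, which the chapter does not specify, and the function name `tail_p_values` is hypothetical.

```python
from statistics import NormalDist


def tail_p_values(z):
    """Return one- and two-tailed p-values for a z statistic under a standard normal null."""
    cdf = NormalDist().cdf(z)
    p_over = 1.0 - cdf   # H1: over-abundance relative to the null condition
    p_under = cdf        # H2: under-abundance relative to the null condition
    # The two-tailed test considers a departure in either direction.
    p_two_tailed = 2.0 * min(p_over, p_under)
    return {"over": p_over, "under": p_under, "two_tailed": p_two_tailed}


if __name__ == "__main__":
    print(tail_p_values(1.96))
```

For the same statistic, the two-tailed p-value is twice the smaller one-tailed value, so a one-tailed test is less conservative but only detects a change in the direction chosen in advance.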
Suggested Reading: Background to Microarray Technologies
Zhang W, Shmulevich I, Astola J. Microarray quality control. Hoboken, N.J.: Wiley-Liss; 2004.
Speed TP. Statistical analysis of gene expression microarray data. Boca Raton, FL: Chapman & Hall/CRC; 2003.
Lee ML, Whitmore GA. Power and sample size for DNA microarray studies. Statistics in medicine 2002;21(23):3543–3570.
Do K-A, Müller P, Vannucci M. Bayesian inference for gene expression and proteomics. Cambridge; New York: Cambridge University Press; 2006.
Bentley DR. Whole-genome re-sequencing. Current opinion in genetics & development 2006;16(6):545–552.
Heng HH, Stevens JB, Liu G, et al. Stochastic cancer progression driven by non-clonal chromosome aberrations. Journal of cellular physiology 2006;208(2):461–472.
Martins RP, Krawetz SA. Decondensing the protamine domain for transcription. Proceedings of the National Academy of Sciences of the United States of America 2007;104(20):8340–8345.
Martin S, Pombo A. Transcription factories: quantitative studies of nanostructures in the mammalian nucleus. Chromosome Res 2003;11(5):461–470.
Martins RP, Ostermeier GC, Krawetz SA. Nuclear matrix interactions at the human protamine domain: a working model of potentiation. The Journal of biological chemistry 2004;279(50):51862–51868.
Wilusz CJ, Wilusz J. Bringing the role of mRNA decay in the control of gene expression into focus. Trends Genet 2004;20(10):491–497.
Moore MJ. From birth to death: the complex lives of eukaryotic mRNAs. Science (New York, NY) 2005;309(5740):1514–1518.
Wang Y, Liu CL, Storey JD, Tibshirani RJ, Herschlag D, Brown PO. Precision and functional specificity in mRNA decay. Proceedings of the National Academy of Sciences of the United States of America 2002;99(9):5860–5865.
Meizel S. The sperm, a neuron with a tail: ‘neuronal’ receptors in mammalian sperm. Biological reviews of the Cambridge Philosophical Society 2004;79(4):713–732.
Hargrove JL, Schmidt FH. The role of mRNA and protein stability in gene expression. Faseb J. 1989;3(12):2360–2370.
Schwartz DR, Moin K, Yao B, et al. Hu/Mu ProtIn oligonucleotide microarray: dual-species array for profiling protease and protease inhibitor gene expression in tumors and their microenvironment. Mol Cancer Res 2007;5(5):443–454.
Dallas PB, Gottardo NG, Firth MJ, et al. Gene expression levels assessed by oligonucleotide microarray analysis and quantitative real-time RT-PCR — how well do they correlate? BMC genomics 2005;6(1):59.
Signal Analysis and Modeling
Qiu W, Lee ML. SPCalc: A web-based calculator for sample size and power calculations in micro-array studies. Bioinformation 2006;1(7):251–252.
Seo J, Gordish-Dressman H, Hoffman EP. An interactive power analysis tool for microarray hypothesis testing and generation. Bioinformatics (Oxford, England) 2006;22(7):808–814.
ENCODE. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007;447(7146):799–816.
Draghici S. Data analysis tools for DNA microarrays. Boca Raton: Chapman & Hall/CRC; 2003.
Choe SE, Boutros M, Michelson AM, Church GM, Halfon MS. Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome biology 2005;6(2):R16.
Hoffmann R, Seidl T, Dugas M. Profound effect of normalization on detection of differentially expressed genes in oligonucleotide microarray data analysis. Genome biology 2002;3(7):RESEARCH0033.
Dabney AR, Storey JD. A new approach to intensity-dependent normalization of two-channel microarrays. Biostatistics (Oxford, England) 2007;8(1):128–139.
Irizarry RA, Wu Z, Jaffee HA. Comparison of Affymetrix GeneChip expression measures. Bioinformatics (Oxford, England) 2006;22(7):789–794.
Online document: http://www.ambion.com/techlib/tn/111/8.html.
Williamson SH, Hubisz MJ, Clark AG, Payseur BA, Bustamante CD, Nielsen R. Localizing Recent Adaptive Evolution in the Human Genome. PLoS Genet 2007;3(6):e90.
Ptitsyn AA, Zvonic S, Gimble JM. Digital Signal Processing Reveals Circadian Baseline Oscillation in Majority of Mammalian Genes. PLoS Comput Biol 2007;3(6):e120.
Tomita H, Vawter MP, Walsh DM, et al. Effect of agonal and postmortem factors on gene expression profile: quality control in microarray analyses of postmortem human brain. Biological psychiatry 2004;55(4):346–352.
Statistical Approaches
Wu B. Differential gene expression detection and sample classification using penalized linear regression models. Bioinformatics (Oxford, England) 2006;22(4):472–476.
Robson B. Clinical and pharmacogenomic data mining: 3. Zeta theory as a general tactic for clinical bioinformatics. Journal of proteome research 2005;4(2):445–455.
Baldi P, Brunak S. Bioinformatics: the machine learning approach. 2nd ed. Cambridge, Mass: MIT Press; 2001.
Carlin BP, Louis TA. Bayes and Empirical Bayes methods for data analysis. 2nd ed. Boca Raton: Chapman & Hall/CRC; 2000.
Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics 2001;29(4):1165–1188.
Westfall PH, Young SS. Resampling-based multiple testing: examples and methods for P-value adjustment. New York: Wiley; 1993.
Irizarry RA, Warren D, Spencer F, et al. Multiple-laboratory comparison of microarray platforms. Nature methods 2005;2(5):345–350.
Jain AK, Murty MN, Flynn PJ. Data clustering: a review. ACM Computing Surveys 1999;31(3):264–323.
Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America 1998;95(25):14863–14868.
Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 2005;102(43):15545–15550.
Tibshirani RJ, Efron B. On testing the significance of sets of genes. The Annals of Applied Statistics 2007;1(1):107–129.
Copyright information
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Platts, A.E., Krawetz, S.A. (2009). Tools and Approaches for an End-to-End Expression Array Analysis. In: Krawetz, S. (eds) Bioinformatics for Systems Biology. Humana Press. https://doi.org/10.1007/978-1-59745-440-7_13
DOI: https://doi.org/10.1007/978-1-59745-440-7_13
Publisher Name: Humana Press
Print ISBN: 978-1-934115-02-2
Online ISBN: 978-1-59745-440-7
eBook Packages: Biomedical and Life Sciences (R0)