A mixture model approach to detecting differentially expressed genes with microarray data

Pan, Wei; Lin, Jizhen; Le, Chap T.

doi:10.1007/s10142-003-0085-7

Original Paper
Published: 01 July 2003

Functional & Integrative Genomics, volume 3, pages 117–124 (2003)

Wei Pan¹,
Jizhen Lin² &
Chap T. Le¹

Abstract

An exciting biological advancement over the past few years is the use of microarray technologies to measure the expression levels of thousands of genes simultaneously. The bottleneck now is how to extract useful information from the resulting large amounts of data. An important and common task in analyzing microarray data is to identify genes whose expression differs between two experimental conditions. We propose a nonparametric statistical approach, called the mixture model method (MMM), to handle this problem when there are only a small number of replicates under each experimental condition. Specifically, we propose estimating the distributions of a t-type test statistic and its null statistic using finite normal mixture models. A comparison of these two distributions by means of a likelihood ratio test, or simply using the tail distribution of the null statistic, can identify genes with significantly changed expression. Several methods are proposed to effectively control the false positives. The methodology is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle ear infection.
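
The idea of fitting finite normal mixtures to a test statistic and to its null statistic, then comparing the two fitted densities, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the two statistics are constructed, so the scores below are synthetic. The function names (`fit_normal_mixture`, `mixture_pdf`), the quantile-based initialization, and the numbers of components (one for the null, two for the observed scores) are assumptions made only for illustration.

```python
import numpy as np

def mixture_pdf(z, weights, means, sds):
    """Density of a univariate normal mixture evaluated at each point of z."""
    z = np.asarray(z, dtype=float)[:, None]
    comp = np.exp(-0.5 * ((z - means) / sds) ** 2) / (sds * np.sqrt(2.0 * np.pi))
    return comp @ weights

def fit_normal_mixture(x, k, n_iter=500, tol=1e-8):
    """Fit a k-component univariate normal mixture by EM (illustrative only)."""
    x = np.asarray(x, dtype=float)
    # Assumed initialization: means at evenly spaced quantiles, common spread.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    sds = np.full(k, x.std())
    weights = np.full(k, 1.0 / k)
    prev_loglik = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        dens = weights * np.exp(-0.5 * ((x[:, None] - means) / sds) ** 2) \
            / (sds * np.sqrt(2.0 * np.pi))
        total = np.maximum(dens.sum(axis=1), 1e-300)
        resp = dens / total[:, None]
        # M-step: update mixing proportions, means, and standard deviations.
        nk = resp.sum(axis=0)
        safe_nk = np.maximum(nk, 1e-12)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / safe_nk
        sds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / safe_nk)
        sds = np.maximum(sds, 1e-3)  # guard against a collapsing component
        loglik = np.log(total).sum()
        if abs(loglik - prev_loglik) < tol:
            break
        prev_loglik = loglik
    return weights, means, sds

# Synthetic stand-ins for the null and observed statistics (assumed data).
rng = np.random.default_rng(0)
null_scores = rng.normal(0.0, 1.0, size=2000)
obs_scores = np.concatenate([rng.normal(0.0, 1.0, size=1800),
                             rng.normal(4.0, 1.0, size=200)])

f0 = fit_normal_mixture(null_scores, k=1)  # fitted null density
f = fit_normal_mixture(obs_scores, k=2)    # fitted density of observed scores

# Ratio of the two fitted densities; large values point away from the null.
grid = np.array([0.0, 2.0, 4.0])
lr = mixture_pdf(grid, *f) / mixture_pdf(grid, *f0)
print(lr)
```

In this toy setting the ratio is expected to be larger in the right tail than near zero, because only the observed scores contain a shifted component. In a real analysis the number of components would need to be chosen from the data, and the cut-off on the ratio would have to be set with false positives in mind.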

Acknowledgements

W.P. was supported by an NIH grant (R01-HL65462) and a Minnesota Medical Foundation grant. The authors are grateful to two referees for many helpful comments and suggestions.

Author information

Authors and Affiliations

Division of Biostatistics, School of Public Health, University of Minnesota, A460 Mayo, MMC 303, 420 Delaware Street SE, Minneapolis, MN 55455-0378, USA
Wei Pan & Chap T. Le
Department of Otolaryngology, School of Medicine, University of Minnesota, Minneapolis, MN 55455-0378, USA
Jizhen Lin

Corresponding author

Correspondence to Wei Pan.

About this article

Cite this article

Pan, W., Lin, J. & Le, C.T. A mixture model approach to detecting differentially expressed genes with microarray data. Funct Integr Genomics 3, 117–124 (2003). https://doi.org/10.1007/s10142-003-0085-7

Received: 25 November 2002
Accepted: 16 April 2003
Published: 01 July 2003
Issue Date: July 2003
DOI: https://doi.org/10.1007/s10142-003-0085-7
