Abstract
Identification of differentially expressed (DE) genes across two conditions is a common task with microarray data. Most existing approaches accomplish this goal by examining each gene separately based on a model and then controlling the false discovery rate over all genes. We take a different approach that employs a uniform platform to simultaneously depict the dynamics of the expression trajectories for all genes and select differentially expressed genes. A new Functional Principal Component (FPC) approach is developed for time-course microarray data to borrow strength across genes. The approach is flexible because the temporal trajectory of gene expression is modeled nonparametrically through a set of orthogonal basis functions, and often fewer basis functions are needed to capture the shape of the expression trajectory than with existing nonparametric methods. These basis functions are estimated from the data and reflect the major modes of variation in the data. The correlation structure of gene expression over time is also incorporated without any parametric assumptions and is estimated from all genes, so that information from other genes can be shared when making inference on an individual gene. Estimation of the parameters is carried out by an efficient hybrid EM algorithm. In simulations across different scenarios, the proposed method compared favorably with two-way mixed-effects ANOVA and the EDGE method using B-spline basis functions. Application to real data on C. elegans developmental stages also suggested that FPC analysis combined with the hybrid EM algorithm provides a computationally fast and efficient method for identifying DE genes based on time-course microarray data.
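To illustrate what it means for the basis functions to be estimated from the data as major modes of variation, the following is a minimal sketch of functional principal component analysis on densely observed curves. It is not the hybrid EM algorithm described above: it assumes every gene is measured on one common time grid without missing values, and it simply eigen-decomposes the sample covariance pooled over all genes. The function name `fpca_dense`, the synthetic data, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def fpca_dense(Y, n_components=2):
    """Plain FPCA for curves observed on a common, dense time grid.

    Y has shape (n_genes, n_timepoints). This is an illustrative
    simplification, not the paper's hybrid EM estimator.
    """
    mean_curve = Y.mean(axis=0)                 # pooled mean trajectory
    centered = Y - mean_curve
    # Sample covariance across time points, pooled over all genes.
    cov = centered.T @ centered / (Y.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = eigvals[order]
    basis = eigvecs[:, order]                   # orthonormal columns
    scores = centered @ basis                   # per-gene FPC scores
    return mean_curve, basis, eigvals, scores


# Synthetic example (assumed data): 500 genes, 12 time points,
# two underlying modes of variation plus small measurement noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 12)
true_scores = rng.normal(size=(500, 2)) * np.array([2.0, 1.0])
Y = (
    true_scores[:, [0]] * np.sin(2 * np.pi * t)
    + true_scores[:, [1]] * np.cos(2 * np.pi * t)
    + rng.normal(scale=0.1, size=(500, 12))
)

mean_curve, basis, eigvals, scores = fpca_dense(Y, n_components=2)

# Fraction of total variance captured by the leading components.
explained = eigvals.sum() / np.trace(np.cov(Y, rowvar=False))
```

In this sketch each gene's trajectory is summarized by a short vector of scores on basis functions learned from all genes, which is the sense in which strength is borrowed across genes; any test for differential expression built on those scores would be a further step not shown here.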
Additional information
Research of Jane-Ling Wang is supported by NSF Grant DMS09-06813.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Chen, K., Wang, JL. Identifying Differentially Expressed Genes for Time-course Microarray Data through Functional Data Analysis. Stat Biosci 2, 95–119 (2010). https://doi.org/10.1007/s12561-010-9024-z
DOI: https://doi.org/10.1007/s12561-010-9024-z