Statistics and Computing, Volume 22, Issue 2, pp 579–595

On-line changepoint detection and parameter estimation with application to genomic data

  • François Caron
  • Arnaud Doucet
  • Raphael Gottardo

Abstract

An efficient on-line changepoint detection algorithm for an important class of Bayesian product partition models was recently proposed by Fearnhead and Liu (J. R. Stat. Soc. B 69, 589–605, 2007). However, a severe limitation of this algorithm is that it requires knowledge of the static parameters of the model to infer the number of changepoints and their locations. We propose an extension of this algorithm that estimates these static parameters jointly and on-line, using a recursive maximum likelihood estimation strategy. This particle-filter-type algorithm has a computational complexity that scales linearly in both the number of data points and the number of particles. We demonstrate our methodology on one synthetic dataset and two real-world datasets from RNA transcript analysis. On simulated data, our approach outperforms standard techniques used in this context and hence has the potential to detect novel RNA transcripts.
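
To make the filtering idea concrete, the following is a minimal sketch of an exact on-line recursion over the position of the most recent changepoint, for the setting in which the static parameters are known. It is not the authors' algorithm. The toy Gaussian model (segment means drawn from N(0, prior_var), observation variance sigma2), the constant changepoint probability `hazard`, and all function names are assumptions made for illustration. The sketch omits the particle approximation, so the number of hypotheses grows by one per observation, and it omits the recursive maximum likelihood update of the static parameters.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive(y, n, s, sigma2, prior_var):
    """Predictive density of y given n earlier points (with sum s) in the same segment.

    Assumed toy model: segment mean ~ N(0, prior_var), y | mean ~ N(mean, sigma2).
    """
    precision = 1.0 / prior_var + n / sigma2
    post_mean = (s / sigma2) / precision
    return normal_pdf(y, post_mean, sigma2 + 1.0 / precision)


def online_changepoint_filter(ys, hazard=0.05, sigma2=1.0, prior_var=25.0):
    """Return, for each time t, a dict {segment start index: posterior probability}.

    All static parameters (hazard, sigma2, prior_var) are treated as known here.
    """
    hyps = {}  # start index of current segment -> (count, sum, weight)
    history = []
    for t, y in enumerate(ys):
        new_hyps = {}
        if not hyps:
            # The first observation always opens the first segment.
            new_hyps[0] = (1, y, 1.0)
        else:
            total = 0.0
            for start, (n, s, w) in hyps.items():
                # No changepoint: the current segment absorbs y.
                grow = w * (1.0 - hazard) * predictive(y, n, s, sigma2, prior_var)
                new_hyps[start] = (n + 1, s + y, grow)
                total += w
            # Changepoint just before t: y starts a fresh segment.
            fresh = total * hazard * predictive(y, 0, 0.0, sigma2, prior_var)
            new_hyps[t] = (1, y, fresh)
        norm = sum(w for _, _, w in new_hyps.values())
        hyps = {k: (n, s, w / norm) for k, (n, s, w) in new_hyps.items()}
        history.append({k: w for k, (_, _, w) in hyps.items()})
    return history


if __name__ == "__main__":
    data = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, 5.1, 4.9, 5.2, 5.0, 4.8, 5.05]
    for t, post in enumerate(online_changepoint_filter(data)):
        best = max(post, key=post.get)
        print(t, best, round(post[best], 3))
```

Because every past time remains a candidate segment start, the cost of this sketch grows quadratically with the number of observations. Keeping only a bounded set of weighted hypotheses is what would bring the cost back to linear in the number of data points and particles, as the abstract describes.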

Keywords

Sequential Monte Carlo, Particle filtering, Changepoint models, Product partition models, Recursive parameter estimation, Tiling arrays

References

  1. Andrieu, C., Doucet, A., Tadic, V.: On-line parameter estimation in general state-space models. In: Proc. 44th IEEE Conference on Decision and Control (2005)
  2. Azzalini, A., Capitanio, A.: Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. B 65, 367–389 (2003)
  3. Barry, D., Hartigan, J.: Product partition models for change point problems. Ann. Stat. 20, 260–279 (1992)
  4. Benveniste, A., Metivier, M., Priouret, P.: Adaptive Algorithms and Stochastic Approximations. Springer, Berlin (1990)
  5. Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., Rinn, J.L., Tongprasit, W., Samanta, M., Weissman, S., Gerstein, M.B., Snyder, M.: Global identification of human transcribed sequences with genome tiling arrays. Science 306(5705), 2242–2246 (2004)
  6. Bolstad, B.M., Irizarry, R.A., Astrand, M., Speed, T.P.: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185–193 (2003)
  7. Carlin, B., Gelfand, A., Smith, A.: Hierarchical Bayesian analysis of changepoint problems. Appl. Stat. 41, 389–405 (1992)
  8. Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H., Helt, G., Sementchenko, V., Piccolboni, A., Bekiranov, S., Bailey, D.K., Ganesh, M., Ghosh, S., Bell, I., Gerhard, D.S., Gingeras, T.R.: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308(5725), 1149–1154 (2005)
  9. Chib, S.: Estimation and comparison of multiple change-point models. J. Econom. 86, 221–241 (1998)
  10. Chopin, N.: Dynamic detection of change points in long time series. Ann. Inst. Stat. Math. 59, 349–366 (2007)
  11. Colella, S., Yau, C., Taylor, J., Mirza, G., Butler, H., Clouston, P., Bassett, A., Seller, A., Holmes, C., Ragoussis, J.: QuantiSNP: an objective Bayes hidden-Markov model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 35, 2013–2025 (2007)
  12. David, L., Huber, W., Granovskaia, M., Toedling, J., Palm, C.J., Bofkin, L., Jones, T., Davis, R.W., Steinmetz, L.M.: A high-resolution map of transcription in the yeast genome. Proc. Natl. Acad. Sci. USA 103(14), 5320–5325 (2006)
  13. De Iorio, M., de Silva, E., Stumpf, M.: Recombination hotspots as a point process. Philos. Trans. R. Soc. B 360, 1597–1603 (2005)
  14. Do, K., Muller, P., Tang, F.: A Bayesian mixture model for differential gene expression. J. R. Stat. Soc. C 54, 627–644 (2005)
  15. Efron, B.: Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J. Am. Stat. Assoc. 99(465), 96–104 (2004)
  16. Fearnhead, P.: MCMC, sufficient statistics and particle filters. J. Comput. Graph. Stat. 11, 848–862 (2002)
  17. Fearnhead, P.: Exact Bayesian curve fitting and signal segmentation. IEEE Trans. Signal Process. 53, 2160–2166 (2005)
  18. Fearnhead, P.: Exact and efficient Bayesian inference for multiple changepoint problems. Stat. Comput. 16, 203–213 (2006)
  19. Fearnhead, P., Liu, Z.: On-line inference for multiple changepoint problems. J. R. Stat. Soc. B 69, 589–605 (2007)
  20. Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R.A., Leisch, F., Li, C., Maechler, M., Rossini, A.J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J.Y.H., Zhang, J.: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5(10), R80 (2004)
  21. Gottardo, R., Pannucci, J.A., Kuske, C.R., Brettin, T.S.: Statistical analysis of microarray data: a Bayesian approach. Biostatistics 4(4), 597–620 (2003)
  22. Gottardo, R., Raftery, A.E., Yeung, K.Y., Bumgarner, R.E.: Bayesian robust inference for differential gene expression in microarrays with multiple samples. Biometrics 62(1), 10–18 (2006)
  23. Huber, W., Toedling, J., Steinmetz, L.M.: Transcript mapping with high-density oligonucleotide tiling arrays. Bioinformatics 22(16), 1963–1970 (2006)
  24. Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., Speed, T.P.: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4(2), 249–264 (2003)
  25. Johnson, T., Elashoff, R., Harkema, S.: A Bayesian changepoint analysis of electromyographic data: detecting muscle activation patterns and associated applications. Biostatistics 4, 143–164 (2003)
  26. Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P.A., Gingeras, T.R.: Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916–919 (2002)
  27. Kass, R., Raftery, A.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)
  28. Kendziorski, C.M., Newton, M.A., Lan, H., Gould, M.N.: On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med. 22(24), 3899–3914 (2003)
  29. Lockhart, D.J., Dong, H., Byrne, M.C., Follettie, M.T., Gallo, M.V., Chee, M.S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., Brown, E.L.: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14(13), 1675–1680 (1996)
  30. Newton, M.A., Kendziorski, C.M., Richmond, C.S., Blattner, F.R., Tsui, K.W.: On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol. 8(1), 37–52 (2001)
  31. Poyiadjis, G., Doucet, A., Singh, S.S.: Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika 98(1), 65–80 (2011)
  32. Stephens, D.: Bayesian retrospective multiple-changepoint identification. Appl. Stat. 43, 159–178 (1994)
  33. Xuan, X., Murphy, K.: Modeling changing dependency structure in multivariate time series. In: International Conference on Machine Learning (2007)

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • François Caron (1, corresponding author)
  • Arnaud Doucet (2)
  • Raphael Gottardo (3)
  1. INRIA Bordeaux Sud-Ouest and Institut de Mathématiques de Bordeaux, Université de Bordeaux, Talence, France
  2. Departments of Statistics & Computer Science, University of British Columbia, Vancouver, Canada
  3. Vaccine and Infectious Disease and Public Health Sciences Divisions, Fred Hutchinson Cancer Research Center, Seattle, USA
