On-line changepoint detection and parameter estimation with application to genomic data

Abstract

Fearnhead and Liu (J. R. Stat. Soc. B 69, 589–605, 2007) recently proposed an efficient on-line changepoint detection algorithm for an important class of Bayesian product partition models. However, a severe limitation of this algorithm is that it requires knowledge of the static parameters of the model in order to infer the number of changepoints and their locations. We propose an extension of this algorithm that jointly estimates these static parameters on-line using a recursive maximum likelihood estimation strategy. This particle-filter-type algorithm has a computational complexity that scales linearly in both the number of observations and the number of particles. We demonstrate our methodology on one synthetic dataset and two real-world datasets from RNA transcript analysis. On simulated data, our approach outperforms standard techniques used in this context, and hence has the potential to detect novel RNA transcripts.
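To make the setting concrete, the following is a minimal sketch of an exact on-line filtering recursion over the time since the last changepoint. It is not the authors' algorithm. It is a generic illustration under assumptions that do not come from the article: Gaussian observations with known variance `obs_var`, a conjugate Gaussian prior (`prior_mean`, `prior_var`) on each segment mean, and a constant per-step changepoint probability `hazard`. All function and parameter names are hypothetical. The sketch treats these static parameters as known, which is exactly the limitation described above, and it keeps every run-length hypothesis rather than a fixed number of particles, so its cost grows quadratically with the number of observations.

```python
import numpy as np


def online_changepoint_filter(y, hazard=0.05, obs_var=1.0,
                              prior_mean=0.0, prior_var=10.0):
    """Return R with R[t, r] = P(current segment holds r points | y[:t]).

    Illustrative assumptions: Gaussian data, known obs_var, Gaussian prior
    on each segment mean, constant changepoint probability `hazard`.
    """
    n = len(y)
    R = np.zeros((n + 1, n + 1))
    R[0, 0] = 1.0  # before any data, the current segment is empty

    # Posterior mean/variance of the segment mean, one entry per run length.
    mean = np.array([prior_mean])
    var = np.array([prior_var])

    for t, obs in enumerate(y, start=1):
        # Predictive density of the new point under each run-length hypothesis.
        pred_var = var + obs_var
        pred = np.exp(-0.5 * (obs - mean) ** 2 / pred_var)
        pred /= np.sqrt(2.0 * np.pi * pred_var)

        weighted = R[t - 1, :t] * pred
        R[t, 1:t + 1] = weighted * (1.0 - hazard)  # segment continues
        R[t, 0] = weighted.sum() * hazard          # changepoint right after obs
        R[t] /= R[t].sum()

        # Conjugate Gaussian update of each hypothesis with the new point.
        post_var = 1.0 / (1.0 / var + 1.0 / obs_var)
        post_mean = post_var * (mean / var + obs / obs_var)
        mean = np.concatenate(([prior_mean], post_mean))
        var = np.concatenate(([prior_var], post_var))

    return R


# Synthetic example: the mean shifts from 0 to 5 after 100 observations.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(5.0, 1.0, 100)])
R = online_changepoint_filter(y)
print("most probable current segment length:", int(np.argmax(R[-1])))
```

In this sketch `hazard`, `obs_var`, `prior_mean` and `prior_var` play the role of the static parameters. Estimating them on-line alongside the filter, and bounding the number of hypotheses carried forward, is the part that the article addresses and that is not shown here.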

References

  1. Andrieu, C., Doucet, A., Tadic, V.: On-line parameter estimation in general state-space models. In: Proc. 44th IEEE Conference on Decision and Control (2005)

  2. Azzalini, A., Capitanio, A.: Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. B 65, 367–389 (2003)

  3. Barry, D., Hartigan, J.: Product partition models for change point problems. Ann. Stat. 20, 260–279 (1992)

  4. Benveniste, A., Metivier, M., Priouret, P.: Adaptive Algorithms and Stochastic Approximations. Springer, Berlin (1990)

  5. Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., Rinn, J.L., Tongprasit, W., Samanta, M., Weissman, S., Gerstein, M.B., Snyder, M.: Global identification of human transcribed sequences with genome tiling arrays. Science 306(5705), 2242–2246 (2004)

  6. Bolstad, B.M., Irizarry, R.A., Astrand, M., Speed, T.P.: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185–193 (2003)

  7. Carlin, B., Gelfand, A., Smith, A.: Hierarchical Bayesian analysis of changepoint problems. Appl. Stat. 41, 389–405 (1992)

  8. Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H., Helt, G., Sementchenko, V., Piccolboni, A., Bekiranov, S., Bailey, D.K., Ganesh, M., Ghosh, S., Bell, I., Gerhard, D.S., Gingeras, T.R.: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308(5725), 1149–1154 (2005)

  9. Chib, S.: Estimation and comparison of multiple change-point models. J. Econom. 86, 221–241 (1998)

  10. Chopin, N.: Dynamic detection of change points in long time series. Ann. Inst. Stat. Math. 59, 349–366 (2007)

  11. Colella, S., Yau, C., Taylor, J., Mirza, G., Butler, H., Clouston, P., Bassett, A., Seller, A., Holmes, C., Ragoussis, J.: QuantiSNP: an objective Bayes hidden-Markov model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 35, 2013–2025 (2007)

  12. David, L., Huber, W., Granovskaia, M., Toedling, J., Palm, C.J., Bofkin, L., Jones, T., Davis, R.W., Steinmetz, L.M.: A high-resolution map of transcription in the yeast genome. Proc. Natl. Acad. Sci. USA 103(14), 5320–5325 (2006)

  13. De Iorio, M., de Silva, E., Stumpf, M.: Recombination hotspots as a point process. Philos. Trans. R. Soc. B 360, 1597–1603 (2005)

  14. Do, K., Muller, P., Tang, F.: A Bayesian mixture model for differential gene expression. J. R. Stat. Soc. C 54, 627–644 (2005)

  15. Efron, B.: Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J. Am. Stat. Assoc. 99(465), 96–104 (2004)

  16. Fearnhead, P.: MCMC, sufficient statistics and particle filter. J. Comput. Graph. Stat. 11, 848–862 (2002)

  17. Fearnhead, P.: Exact Bayesian curve fitting and signal segmentation. IEEE Trans. Signal Process. 53, 2160–2166 (2005)

  18. Fearnhead, P.: Exact and efficient Bayesian inference for multiple changepoint problems. Stat. Comput. 16, 203–213 (2006)

  19. Fearnhead, P., Liu, Z.: On-line inference for multiple change points problems. J. R. Stat. Soc. B 69, 589–605 (2007)

  20. Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R.A., Leisch, F., Li, C., Maechler, M., Rossini, A.J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J.Y.H., Zhang, J.: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5(10), R80 (2004)

  21. Gottardo, R., Pannucci, J.A., Kuske, C.R., Brettin, T.S.: Statistical analysis of microarray data: a Bayesian approach. Biostatistics 4(4), 597–620 (2003)

  22. Gottardo, R., Raftery, A.E., Yeung, K.Y., Bumgarner, R.E.: Bayesian robust inference for differential gene expression in microarrays with multiple samples. Biometrics 62(1), 10–18 (2006)

  23. Huber, W., Toedling, J., Steinmetz, L.M.: Transcript mapping with high-density oligonucleotide tiling arrays. Bioinformatics 22(16), 1963–1970 (2006)

  24. Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., Speed, T.P.: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4(2), 249–264 (2003)

  25. Johnson, T., Elashoff, R., Harkema, S.: A Bayesian changepoint analysis of electromyographic data: detecting muscle activation patterns and associated applications. Biostatistics 4, 143–164 (2003)

  26. Kapranov, P., Cawley, S.E., Drenkow, J., Bekiranov, S., Strausberg, R.L., Fodor, S.P.A., Gingeras, T.R.: Large-scale transcriptional activity in chromosomes 21 and 22. Science 296(5569), 916–919 (2002)

  27. Kass, R., Raftery, A.: Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995)

  28. Kendziorski, C.M., Newton, M.A., Lan, H., Gould, M.N.: On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med. 22(24), 3899–3914 (2003)

  29. Lockhart, D.J., Dong, H., Byrne, M.C., Follettie, M.T., Gallo, M.V., Chee, M.S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., Brown, E.L.: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14(13), 1675–1680 (1996)

  30. Newton, M.A., Kendziorski, C.M., Richmond, C.S., Blattner, F.R., Tsui, K.W.: On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol. 8(1), 37–52 (2001)

  31. Poyiadjis, G., Doucet, A., Singh, S.S.: Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika 98(1), 65–80 (2011)

  32. Stephens, D.: Bayesian retrospective multiple-changepoint identification. Appl. Stat. 43, 159–178 (1994)

  33. Xuan, X., Murphy, K.: Modeling changing dependency structure in multivariate time series. In: International Conference on Machine Learning (2007)

Author information

Corresponding author

Correspondence to François Caron.

About this article

Cite this article

Caron, F., Doucet, A. & Gottardo, R. On-line changepoint detection and parameter estimation with application to genomic data. Stat Comput 22, 579–595 (2012). https://doi.org/10.1007/s11222-011-9248-x

Keywords

  • Sequential Monte Carlo
  • Particle filtering
  • Changepoint models
  • Product partition models
  • Recursive parameter estimation
  • Tiling arrays