
Mixtures of regressions with changepoints

Abstract

We introduce an extension of the mixture of linear regressions model that allows for changepoints. This model offers greater flexibility than a standard changepoint regression model when the data are believed not only to contain changepoints but also to belong to two or more unobservable categories. It can also provide additional insight into data that are already modeled using mixtures of regressions but for which the presence of changepoints has not yet been investigated. After discussing the mixture of regressions with changepoints model, we develop an Expectation/Conditional Maximization (ECM) algorithm for maximum likelihood estimation. Two simulation studies illustrate the performance of the ECM algorithm, and we analyze a real dataset.
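To make the kind of data described in the abstract concrete, the following is a minimal simulation sketch, not the article's own specification. It assumes two latent components, each with a continuous broken-line mean and a single changepoint, and Gaussian errors. The function names (`piecewise_mean`, `simulate`), the parameterization, and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def piecewise_mean(x, intercept, slope, changepoint, slope_change):
    """Continuous broken-line mean (assumed form): the slope shifts by
    slope_change once x passes the changepoint."""
    return intercept + slope * x + slope_change * max(0.0, x - changepoint)


def simulate(n, components, weights, seed=0):
    """Draw n points (x, y, z) where z is the unobserved component label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        # Latent category: picked with the given mixing proportions.
        z = rng.choices(range(len(components)), weights=weights)[0]
        c = components[z]
        mu = piecewise_mean(x, c["intercept"], c["slope"],
                            c["changepoint"], c["slope_change"])
        y = rng.gauss(mu, c["sd"])
        data.append((x, y, z))
    return data


# Hypothetical parameter values, for illustration only.
components = [
    {"intercept": 0.0, "slope": 1.0, "changepoint": 4.0,
     "slope_change": -2.0, "sd": 0.5},
    {"intercept": 8.0, "slope": -0.5, "changepoint": 6.0,
     "slope_change": 1.5, "sd": 0.5},
]
data = simulate(200, components, weights=[0.4, 0.6])
```

In an estimation setting the labels `z` would be hidden, and both the component memberships and the changepoint locations would have to be inferred from the `(x, y)` pairs alone, which is the situation the abstract's ECM algorithm is meant to address.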



Acknowledgements

We are grateful to two anonymous referees and an Associate Editor for numerous helpful comments during the preparation of this article.

Author information

Correspondence to Derek S. Young.

Additional information

Disclaimer: This report is released to inform interested parties of research and to encourage discussion. The views expressed are those of the author and not necessarily those of the U.S. Census Bureau.


About this article

Cite this article

Young, D.S. Mixtures of regressions with changepoints. Stat Comput 24, 265–281 (2014). https://doi.org/10.1007/s11222-012-9369-x


Keywords

  • Breakpoints
  • ECM algorithm
  • Finite mixture models
  • Identifiability
  • Maximum likelihood estimation
  • Piecewise linear regression