Abstract
This paper tackles the problem of detecting abrupt changes in the mean of a heteroscedastic signal by model selection, without any knowledge of how the noise level varies. A new family of change-point detection procedures is proposed, showing that cross-validation methods can succeed in the heteroscedastic framework, whereas most existing procedures are not robust to heteroscedasticity. The robustness of the proposed procedures to heteroscedasticity is supported by an extensive simulation study, together with recent partial theoretical results. An application to Comparative Genomic Hybridization (CGH) data is provided, showing that robustness to heteroscedasticity can indeed be required for their analysis.
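To make the general idea concrete, here is a minimal Python/NumPy sketch of one way to combine least-squares segmentation with V-fold cross-validation to choose the number of segments D. It is an illustration under stated assumptions, not the authors' exact procedure. The following are all illustrative choices: exact least-squares segmentation by dynamic programming, interleaved folds (fold v holds the positions t with t mod V == v), the rule that assigns a held-out point to the segment of the nearest training point on its left, and the function names `best_segmentations`, `predict_from_segmentation` and `vfold_choose_D`.

```python
import numpy as np


def _sse(cs, cs2, i, j):
    """Residual sum of squares of y[i:j] around its mean, via prefix sums."""
    s = cs[j] - cs[i]
    return (cs2[j] - cs2[i]) - s * s / (j - i)


def best_segmentations(y, D_max):
    """Least-squares segmentations of y into D = 1..D_max contiguous segments,
    computed by dynamic programming. Returns {D: list of segment start indices}."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    cs = np.concatenate(([0.0], np.cumsum(y)))
    cs2 = np.concatenate(([0.0], np.cumsum(y * y)))
    cost = np.full((D_max + 1, n + 1), np.inf)      # cost[d, j]: best RSS of y[:j] in d segments
    back = np.zeros((D_max + 1, n + 1), dtype=int)  # start index of the last of those d segments
    cost[0, 0] = 0.0
    for d in range(1, D_max + 1):
        for j in range(d, n + 1):
            for i in range(d - 1, j):
                c = cost[d - 1, i] + _sse(cs, cs2, i, j)
                if c < cost[d, j]:
                    cost[d, j], back[d, j] = c, i
    result = {}
    for D in range(1, D_max + 1):
        starts, j = [], n
        for d in range(D, 0, -1):  # follow back-pointers from the end of the signal
            j = back[d, j]
            starts.append(j)
        result[D] = starts[::-1]
    return result


def predict_from_segmentation(x_train, y_train, starts, x_new):
    """Piecewise-constant prediction at positions x_new (x_train must be sorted).
    Assumed rule: a new point takes the segment of the nearest training point on its
    left; points before the first training point take the first segment."""
    ends = list(starts[1:]) + [len(y_train)]
    means = np.array([y_train[s:e].mean() for s, e in zip(starts, ends)])
    left_edges = x_train[starts]  # position of the first training point of each segment
    idx = np.searchsorted(left_edges, x_new, side="right") - 1
    return means[np.clip(idx, 0, len(starts) - 1)]


def vfold_choose_D(y, D_max, V=5):
    """Choose the number of segments by V-fold cross-validation (hypothetical helper).
    Assumed folds: interleaved, fold v = {t : t % V == v}."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    risks = np.zeros(D_max)
    for v in range(V):
        test = (t % V) == v
        train = ~test
        segs = best_segmentations(y[train], D_max)  # fit on the training part only
        for D in range(1, D_max + 1):
            pred = predict_from_segmentation(t[train], y[train], segs[D], t[test])
            risks[D - 1] += np.mean((y[test] - pred) ** 2)  # held-out squared error
    risks /= V
    return int(np.argmin(risks)) + 1, risks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.repeat([0.0, 2.0, -1.0], 40)    # true mean: 3 segments
    sigma = np.repeat([0.2, 1.0, 0.5], 40)  # noise level varies along the signal
    y = mu + sigma * rng.standard_normal(mu.size)
    D_hat, risks = vfold_choose_D(y, D_max=8, V=5)
    print("selected number of segments:", D_hat)
```

The held-out risk is an average of squared prediction errors, so it does not require an estimate of the noise variance. This is the intuition for why cross-validation can remain usable when the noise level changes along the signal, whereas a penalty calibrated with a single variance estimate may not. Interleaved folds keep every held-out point next to training points from the same region, although points lying exactly at a change point can still be assigned to the wrong side.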
Cite this article
Arlot, S., Celisse, A. Segmentation of the mean of heteroscedastic data via cross-validation. Stat Comput 21, 613–632 (2011). https://doi.org/10.1007/s11222-010-9196-x