
Segmentation of the mean of heteroscedastic data via cross-validation

Statistics and Computing

Abstract

This paper tackles the problem of detecting abrupt changes in the mean of a heteroscedastic signal by model selection, without knowledge of how the noise varies. A new family of change-point detection procedures is proposed, showing that cross-validation methods can be successful in the heteroscedastic framework, whereas most existing procedures are not robust to heteroscedasticity. This robustness is supported by an extensive simulation study, together with recent partial theoretical results. An application to Comparative Genomic Hybridization (CGH) data is provided, showing that robustness to heteroscedasticity can indeed be required for the analysis of such data.
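To make the idea in the abstract concrete, here is a minimal sketch of choosing the number of mean segments by cross-validation. It is an illustration under simplifying assumptions, not the procedure proposed in the paper: the function names (`segment_cost`, `best_segmentations`, `cv_choose_segments`), the interleaved fold layout, and the plain least-squares dynamic programme are all choices made here for the example. The point it illustrates is that a held-out squared error can be compared across candidate segmentations without plugging in any estimate of the noise variance.

```python
def segment_cost(prefix, prefix_sq, i, j):
    """Sum of squared deviations from the mean for y[i:j] (requires j > i)."""
    s = prefix[j] - prefix[i]
    return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)


def best_segmentations(y, max_segments):
    """Least-squares segmentations of y into 1..max_segments pieces.

    Returns {D: list of segment start indices, sorted, beginning with 0}.
    Hypothetical helper for this sketch (O(D * n^2) dynamic programming).
    """
    n = len(y)
    prefix, prefix_sq = [0.0], [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    inf = float("inf")
    # cost[d][j]: minimal cost of splitting y[:j] into d non-empty segments
    cost = [[inf] * (n + 1) for _ in range(max_segments + 1)]
    back = [[0] * (n + 1) for _ in range(max_segments + 1)]
    cost[0][0] = 0.0
    for d in range(1, max_segments + 1):
        for j in range(d, n + 1):
            for i in range(d - 1, j):
                c = cost[d - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if c < cost[d][j]:
                    cost[d][j], back[d][j] = c, i
    result = {}
    for d in range(1, min(max_segments, n) + 1):
        starts, j = [], n
        for k in range(d, 0, -1):  # walk back from the last segment
            j = back[k][j]
            starts.append(j)
        result[d] = starts[::-1]
    return result


def cv_choose_segments(y, max_segments, n_folds=5):
    """Pick the number of segments minimising V-fold CV squared error.

    Assumption of this sketch: folds are interleaved (every n_folds-th
    point is held out), so each fold covers the whole signal.
    """
    n = len(y)
    errors = {d: 0.0 for d in range(1, max_segments + 1)}
    for fold in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != fold]
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_y = [y[i] for i in train_idx]
        for d, starts in best_segmentations(train_y, max_segments).items():
            bounds = starts + [len(train_y)]
            means = [sum(train_y[a:b]) / (b - a)
                     for a, b in zip(bounds, bounds[1:])]
            # Original position of the first training point of each segment.
            origin = [train_idx[a] for a in starts]
            for i in test_idx:
                # Last segment starting at or before i (first one otherwise).
                k = max((s for s in range(len(origin)) if origin[s] <= i),
                        default=0)
                errors[d] += (y[i] - means[k]) ** 2
    return min(errors, key=errors.get), errors
```

Calling `cv_choose_segments(y, max_segments=5)` on a list of floats returns the selected number of segments together with the cross-validated error of every candidate. The held-out error is used directly as the selection criterion, which is what distinguishes this kind of approach from penalised criteria calibrated with a single noise variance.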




Corresponding author

Correspondence to Alain Celisse.



About this article

Cite this article

Arlot, S., Celisse, A. Segmentation of the mean of heteroscedastic data via cross-validation. Stat Comput 21, 613–632 (2011). https://doi.org/10.1007/s11222-010-9196-x


  • DOI: https://doi.org/10.1007/s11222-010-9196-x
