Model Selection for Variable Length Markov Chains and Tuning the Context Algorithm

Annals of the Institute of Statistical Mathematics

Abstract

We consider the model selection problem in the class of stationary variable length Markov chains (VLMC) on a finite space. The processes in this class are still Markovian of high order, but with memory of variable length. Various aims in selecting a VLMC can be formalized with different non-equivalent risks, such as final prediction error or expected Kullback-Leibler information. We consider the asymptotic behavior of different risk functions and show how they can be generally estimated with the same resampling strategy. Such estimated risks then yield new model selection criteria. In particular, we obtain a data-driven tuning of Rissanen's tree structured context algorithm which is a computationally feasible procedure for selection and estimation of a VLMC.
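To make "memory of variable length" concrete, here is a minimal Python sketch, not taken from the article. It assumes a hand-picked, hypothetical context set over the alphabet {0, 1} and estimates next-symbol probabilities from the longest suffix of the past that matches a context. The function names (`longest_context`, `fit_counts`, `transition_probs`) are illustrative only. The sketch does not implement the context algorithm itself, which selects the contexts from data by growing and pruning a tree.

```python
from collections import Counter, defaultdict

def longest_context(history, contexts):
    """Return the longest suffix of `history` found in `contexts` ('' if none)."""
    max_len = max((len(c) for c in contexts), default=0)
    for k in range(min(max_len, len(history)), 0, -1):
        suffix = history[-k:]
        if suffix in contexts:
            return suffix
    return ""  # empty context: no relevant past is available

def fit_counts(seq, contexts):
    """Count which symbol follows each context along the sequence."""
    counts = defaultdict(Counter)
    for t in range(len(seq)):
        ctx = longest_context(seq[:t], contexts)
        counts[ctx][seq[t]] += 1
    return counts

def transition_probs(counts):
    """Turn the counts into relative-frequency estimates per context."""
    return {
        ctx: {sym: n / sum(c.values()) for sym, n in c.items()}
        for ctx, c in counts.items()
    }

# Hypothetical context set (assumption): after a "0" one symbol of memory
# suffices, after a "1" two symbols are needed. Contexts are written oldest
# symbol first.
contexts = {"0", "01", "11"}
counts = fit_counts("0101", contexts)
probs = transition_probs(counts)
```

With this context set the chain has order at most two but only three states instead of four, which is the kind of parsimony the VLMC class is meant to capture.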

References

  • Akaike, H. (1969). Fitting autoregressive models for prediction, Ann. Inst. Statist. Math., 21, 243–247.

  • Akaike, H. (1970). Statistical predictor identification, Ann. Inst. Statist. Math., 22, 202–217.

  • Akaike, H. (1973). Information theory and the maximum likelihood principle, 2nd International Symposium on Information Theory (eds. B. N. Petrov and F. Csáki), 267–281, Akadémiai Kiadó, Budapest.

  • Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees, Wadsworth, Belmont, CA.

  • Bühlmann, P. (1999). Efficient and adaptive post-model-selection estimators, J. Statist. Plann. Inference, 79, 1–9.

  • Bühlmann, P. and Wyner, A. J. (1999). Variable length Markov chains, Ann. Statist., 27, 480–513.

  • Bunton, S. (1997). A percolating state selector for suffix-tree context models, Proc. of the 1997 Data Compression Conference, Snowbird, Utah (eds. J. A. Storer and M. Cohn), 32–41, IEEE Computer Society Press, Los Alamitos, CA.

  • Cavanaugh, J. and Shumway, R. (1997). A bootstrap variant of AIC for state-space model selection, Statist. Sinica, 7, 473–496.

  • Doukhan, P. (1994). Mixing. Properties and Examples, Lecture Notes in Statist., No. 85, Springer, New York.

  • Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation, J. Amer. Statist. Assoc., 78, 316–331.

  • Efron, B. (1986). How biased is the apparent error rate of a prediction rule, J. Amer. Statist. Assoc., 81, 461–470.

  • Merhav, N., Gutman, M. and Ziv, J. (1989). On the estimation of the order of a Markov chain and universal data compression, IEEE Trans. Inform. Theory, IT-35, 1014–1019.

  • Rissanen, J. (1983). A universal data compression system, IEEE Trans. Inform. Theory, IT-29, 656–664.

  • Rissanen, J. (1986). Complexity of strings in the class of Markov sources, IEEE Trans. Inform. Theory, IT-32, 526–532.

  • Rissanen, J. (1994). Noise separation and MDL modeling of chaotic processes, From Statistical Physics to Statistical Inference and Back (eds. P. Grassberger and J.-P. Nadal), 317–330, Kluwer, Dordrecht.

  • Shibata, R. (1989). Statistical aspects of model selection, From Data to Model (ed. J. C. Willems), 215–240, Springer, New York.

  • Shibata, R. (1997). Bootstrap estimate of Kullback-Leibler information for model selection, Statist. Sinica, 7, 375–394.

  • Takeuchi, K. (1976). Distribution of informational statistics and a criterion of model fitting, Suri-Kagaku (Mathematical Sciences), 153, 12–18 (in Japanese).

  • Tong, H. (1975). Determination of the order of a Markov chain by Akaike's information criterion, J. Appl. Probab., 12, 488–497.

  • Weinberger, M. J. and Feder, M. (1994). Predictive stochastic complexity and model estimation for finite-state processes, J. Statist. Plann. Inference, 39, 353–372.

  • Weinberger, M. J., Lempel, A. and Ziv, J. (1992). A sequential algorithm for the universal coding of finite memory sources, IEEE Trans. Inform. Theory, IT-38, 1002–1014.

  • Weinberger, M. J., Rissanen, J. and Feder, M. (1995). A universal finite memory source, IEEE Trans. Inform. Theory, IT-41, 643–652.

  • Weinberger, M. J., Rissanen, J. and Arps, R. B. (1996). Applications of universal context modeling to lossless compression of gray-scale images, IEEE Trans. Image Processing, IP-5, 575–586.

Cite this article

Bühlmann, P. Model Selection for Variable Length Markov Chains and Tuning the Context Algorithm. Annals of the Institute of Statistical Mathematics 52, 287–315 (2000). https://doi.org/10.1023/A:1004165822461
