Classification error in multiclass discrimination from Markov data

Statistical Inference for Stochastic Processes

Abstract

As a model for an on-line classification setting, we consider a stochastic process \((X_{-n},Y_{-n})_{n}\), with the present time point denoted by 0 and observables \(\ldots ,X_{-n},X_{-n+1}, \ldots , X_{-1}, X_0\) from which the pattern \(Y_0\) is to be inferred. In this setting, a number \(l\) of preceding observations may be used for classification in addition to the present observation \(X_0\), thus taking into account a possible dependence structure such as occurs, e.g., in the ongoing classification of handwritten characters. We treat the question of how much the performance of classifiers is improved by using such additional information. For our analysis, a hidden Markov model is used. Letting \(R_l\) denote the minimal risk of misclassification using \(l\) preceding observations, we show that the difference \(\sup _k |R_l - R_{l+k}|\) decreases exponentially fast as \(l\) increases. This suggests that a small \(l\) might already lead to a noticeable improvement. To pursue this point, we look at the use of past observations for kernel classification rules. Our practical findings in simulated hidden Markov models and in the classification of handwritten characters indicate that using \(l=1\), i.e. just the last preceding observation in addition to \(X_0\), can lead to a substantial reduction of the risk of misclassification. So, in the presence of stochastic dependencies, we advocate using \(X_{-1},X_0\) to find the pattern \(Y_0\), instead of only \(X_0\) as one would in the independent situation.
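
The comparison between \(l=0\) and \(l=1\) described in the abstract can be illustrated with a small simulation. The following is only a rough sketch under assumed settings, not the authors' experimental setup: the three-state chain, the stay probability `stay`, the Gaussian emissions, the Gaussian kernel and the bandwidth `h` are all hypothetical choices made for illustration, as are the function names. It simulates a hidden Markov model, builds feature vectors \((X_{-l},\ldots ,X_0)\), and applies a kernel classification rule (weighted majority vote) for both values of \(l\).

```python
import numpy as np

def simulate_hmm(n, rng, stay=0.9, means=(0.0, 1.0, 2.0), sigma=1.0):
    """Simulate hidden labels Y (Markov chain) and observations X = mean[Y] + noise.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    k = len(means)
    # Transition matrix: stay with probability `stay`, otherwise move uniformly.
    P = np.full((k, k), (1.0 - stay) / (k - 1))
    np.fill_diagonal(P, stay)
    y = np.empty(n, dtype=int)
    y[0] = rng.integers(k)
    for t in range(1, n):
        y[t] = rng.choice(k, p=P[y[t - 1]])
    x = np.asarray(means)[y] + sigma * rng.standard_normal(n)
    return x, y

def lagged_features(x, y, l):
    """Row i holds (X_{t-l}, ..., X_t) with t = i + l; its label is Y_t."""
    n = len(x)
    feats = np.column_stack([x[j : n - l + j] for j in range(l + 1)])
    return feats, y[l:]

def kernel_classify(train_z, train_y, test_z, h, k):
    """Kernel rule: pick the class with the largest sum of Gaussian kernel weights."""
    # Squared Euclidean distances between every test and training point.
    d2 = (
        (test_z**2).sum(axis=1)[:, None]
        + (train_z**2).sum(axis=1)[None, :]
        - 2.0 * test_z @ train_z.T
    )
    w = np.exp(-np.maximum(d2, 0.0) / (2.0 * h**2))
    votes = w @ np.eye(k)[train_y]  # per-class kernel weight totals
    return votes.argmax(axis=1)

def error_rate(l, x_tr, y_tr, x_te, y_te, h=0.5, k=3):
    """Empirical misclassification rate when using l preceding observations."""
    z_tr, lab_tr = lagged_features(x_tr, y_tr, l)
    z_te, lab_te = lagged_features(x_te, y_te, l)
    pred = kernel_classify(z_tr, lab_tr, z_te, h, k)
    return float(np.mean(pred != lab_te))

rng = np.random.default_rng(0)
x_tr, y_tr = simulate_hmm(2000, rng)  # training path
x_te, y_te = simulate_hmm(2000, rng)  # independent test path

err0 = error_rate(0, x_tr, y_tr, x_te, y_te)  # uses X_0 only
err1 = error_rate(1, x_tr, y_tr, x_te, y_te)  # uses (X_{-1}, X_0)
print(f"l=0 error: {err0:.3f}, l=1 error: {err1:.3f}")
```

With a strongly persistent chain such as this one, the preceding observation carries information about the current label, so the empirical error for \(l=1\) would be expected to fall below that for \(l=0\); how large the gap is depends on the assumed transition probabilities, noise level and bandwidth.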

Author information

Correspondence to Sören Christensen.

About this article

Cite this article

Christensen, S., Irle, A. & Willert, L. Classification error in multiclass discrimination from Markov data. Stat Inference Stoch Process 19, 321–336 (2016). https://doi.org/10.1007/s11203-015-9129-6

  • DOI: https://doi.org/10.1007/s11203-015-9129-6
