Abstract
As a model for an on-line classification setting we consider a stochastic process \((X_{-n},Y_{-n})_{n}\), the present time point being denoted by 0, with observables \(\ldots, X_{-n}, X_{-n+1}, \ldots, X_{-1}, X_0\) from which the pattern \(Y_0\) is to be inferred. Thus, in this classification setting, a number \(l\) of preceding observations may be used for classification in addition to the present observation \(X_0\). This takes into account a possible dependence structure of the kind that occurs, e.g., in the ongoing classification of handwritten characters. We address the question of how the performance of classifiers improves when such additional information is used. For our analysis we use a hidden Markov model. Letting \(R_l\) denote the minimal risk of misclassification using \(l\) preceding observations, we show that the difference \(\sup_k |R_l - R_{l+k}|\) decreases exponentially fast as \(l\) increases. This suggests that a small \(l\) may already lead to a noticeable improvement. To pursue this point, we look at the use of past observations in kernel classification rules. Our practical findings in simulated hidden Markov models and in the classification of handwritten characters indicate that using \(l=1\), i.e. just the last preceding observation in addition to \(X_0\), can lead to a substantial reduction of the risk of misclassification. So, in the presence of stochastic dependencies, we advocate using \(X_{-1}, X_0\) for finding the pattern \(Y_0\), instead of only \(X_0\) as one would in the independent situation.
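To make \(R_l\) concrete, the following is a minimal sketch (not the authors' code) that computes the minimal risk exactly for a small toy hidden Markov model. It assumes a finite, irreducible and aperiodic hidden chain started in its stationary distribution and a finite observation alphabet; the paper's setting may allow more general observations. The Bayes rule predicts \(\arg\max_y P(Y_0 = y \mid X_{-l}, \ldots, X_0)\), so \(R_l = 1 - \sum_{x_{-l},\ldots,x_0} \max_y P(Y_0 = y, X_{-l} = x_{-l}, \ldots, X_0 = x_0)\), and the joint probabilities are obtained from the forward recursion. The function names `stationary` and `bayes_risk` and the example matrices `P` and `B` are hypothetical.

```python
from itertools import product


def stationary(P, iters=1000):
    """Stationary distribution of a finite irreducible, aperiodic chain (power iteration)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def bayes_risk(P, B, l):
    """Minimal misclassification risk R_l for predicting Y_0 from X_{-l}, ..., X_0.

    P[i][j] = P(Y_{t+1} = j | Y_t = i)   (hidden chain, started in stationarity)
    B[y][x] = P(X_t = x | Y_t = y)       (finite observation alphabet)
    """
    n, m = len(P), len(B[0])
    pi = stationary(P)
    prob_correct = 0.0
    # Enumerate every observation window (x_{-l}, ..., x_0).
    for xs in product(range(m), repeat=l + 1):
        # Forward recursion: alpha[y] = P(Y_t = y, observations up to time t).
        alpha = [pi[y] * B[y][xs[0]] for y in range(n)]
        for x in xs[1:]:
            alpha = [sum(alpha[i] * P[i][j] for i in range(n)) * B[j][x]
                     for j in range(n)]
        # The Bayes rule picks the argmax of alpha and is correct with
        # probability max_y P(Y_0 = y, window) on this window.
        prob_correct += max(alpha)
    return 1.0 - prob_correct


# Toy example (hypothetical numbers): a sticky two-state chain whose
# observation symbol 2 carries no information about the current state.
P = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.5, 0.1, 0.4], [0.1, 0.5, 0.4]]
for l in range(4):
    print(l, round(bayes_risk(P, B, l), 4))
```

For this toy model the sketch gives \(R_0 = 0.3\) and \(R_1 = 0.236\): when \(X_0\) is the uninformative symbol, the previous observation resolves much of the ambiguity. Exhaustive enumeration costs \(m^{l+1}\) windows, so it is only meant for small \(l\).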
Cite this article
Christensen, S., Irle, A. & Willert, L. Classification error in multiclass discrimination from Markov data. Stat Inference Stoch Process 19, 321–336 (2016). https://doi.org/10.1007/s11203-015-9129-6