Non-Parametric Change-Point Estimation using String Matching Algorithms
Given the output of a data source taking values in a finite alphabet, we wish to estimate change-points, that is, times when the statistical properties of the source change. Motivated by ideas of match lengths in information theory, we introduce a novel non-parametric estimator which we call CRECHE (CRossings Enumeration CHange Estimator). We present simulation evidence that this estimator performs well, both for simulated sources and for real data formed by concatenating text sources. For example, we show that we can accurately estimate the point at which a source changes from a Markov chain to an IID source with the same stationary distribution. Our estimator requires no assumptions about the form of the source distribution, and avoids the need to estimate its probabilities. Further, we analyse the estimator theoretically, establishing a fluid limit and using martingale arguments.
Keywords: Change-point estimation, Entropy, Non-parametric, String matching
AMS 2000 Subject Classifications: Primary 62L10; Secondary 62M09, 68W32
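To make the Markov-to-IID example from the abstract concrete, the following is a minimal sketch of how such a test sequence might be generated. It is not the authors' simulation code and does not implement CRECHE itself; the transition matrix `P`, the sequence length, the change-point location, and the helper names `stationary_distribution` and `markov_then_iid` are all illustrative assumptions.

```python
import random


def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution of the transition matrix P
    by power iteration (assumes the chain is ergodic)."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi


def markov_then_iid(P, n, change_point, seed=0):
    """Return n symbols: a Markov chain with matrix P for the first
    change_point symbols (change_point >= 1), then IID draws from the
    chain's stationary distribution."""
    rng = random.Random(seed)
    symbols = list(range(len(P)))
    pi = stationary_distribution(P)

    # Start the chain in stationarity so both segments share the same marginals.
    state = rng.choices(symbols, weights=pi)[0]
    out = [state]
    for _ in range(1, change_point):
        state = rng.choices(symbols, weights=P[state])[0]
        out.append(state)

    # After the change-point, symbols are independent with the same marginals.
    out.extend(rng.choices(symbols, weights=pi, k=n - change_point))
    return out


# Hypothetical two-state example; these values are not taken from the paper.
P = [[0.9, 0.1], [0.2, 0.8]]
pi = stationary_distribution(P)
x = markov_then_iid(P, n=2000, change_point=1000)
```

Because both segments have the same single-symbol frequencies, a method that only compares symbol counts before and after a candidate point has nothing to detect here; the change is visible only in the dependence between successive symbols, which is why this case is a demanding test for a change-point estimator.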