# Tracking drifting concepts by minimizing disagreements

## Abstract

In this paper we consider the problem of tracking a subset of a domain (called the *target*) which changes gradually over time. A single (unknown) probability distribution over the domain is used both to generate random examples for the learning algorithm and to measure the speed at which the target changes. Clearly, the more rapidly the target moves, the harder it is for the algorithm to maintain a good approximation of the target. Therefore we evaluate algorithms based on how much movement of the target can be tolerated between examples while predicting with accuracy ε. Furthermore, the complexity of the class *H* of possible targets, as measured by *d*, its Vapnik-Chervonenkis (VC) dimension, also affects the difficulty of tracking the target concept. We show that if the problem of minimizing the number of disagreements with a sample from among concepts in a class *H* can be approximated to within a factor *k*, then there is a simple tracking algorithm for *H* which can achieve a probability ε of making a mistake if the target movement rate is at most a constant times ε^{2}/(*k*(*d* + *k*) ln(1/ε)). Also, we show that if *H* is properly PAC-learnable, then there is an efficient (randomized) algorithm that with high probability approximately minimizes disagreements to within a factor of 7*d* + 1, yielding an efficient tracking algorithm for *H* which tolerates drift rates up to a constant times ε^{2}/(*d*^{2} ln(1/ε)). In addition, we prove complementary results for the classes of halfspaces and axis-aligned hyperrectangles showing that the maximum rate of drift that any algorithm (even with unlimited computational power) can tolerate is a constant times ε^{2}/*d*.
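The tracking idea described above can be illustrated with a small sketch. The abstract does not spell out the algorithm, so everything here is an assumption made for illustration: the tracker keeps a sliding window of the most recent examples and predicts with a hypothesis that minimizes disagreements on that window, the class *H* is taken to be thresholds on the real line (VC dimension 1, where disagreements can be minimized exactly, so *k* = 1), and the names `drift_bound`, `minimize_disagreements`, `track`, and the constant `c` are hypothetical.

```python
import math
from collections import deque


def drift_bound(eps, d, k, c=1.0):
    """Drift rate c * eps^2 / (k (d + k) ln(1/eps)) from the abstract.

    The constant c is not given in the abstract; c=1.0 is a placeholder.
    """
    return c * eps ** 2 / (k * (d + k) * math.log(1.0 / eps))


def count_disagreements(threshold, sample):
    """Number of (x, label) pairs that the rule 'x >= threshold' gets wrong."""
    return sum((x >= threshold) != label for x, label in sample)


def minimize_disagreements(sample):
    """Exact disagreement minimizer (k = 1) over thresholds on the real line."""
    # Only thresholds at sample points, plus one above all of them,
    # produce distinct labelings of the sample.
    candidates = sorted({x for x, _ in sample}) + [math.inf]
    return min(candidates, key=lambda t: count_disagreements(t, sample))


def track(stream, window):
    """Predict each label from the last `window` examples, then observe it.

    Returns the number of prediction mistakes made on the stream.
    """
    recent = deque(maxlen=window)  # old examples fall out automatically
    mistakes = 0
    for x, label in stream:
        threshold = minimize_disagreements(recent) if recent else math.inf
        if (x >= threshold) != label:
            mistakes += 1
        recent.append((x, label))
    return mistakes
```

Discarding old examples is what lets such a tracker follow a moving target: a short window adapts quickly but gives a noisier estimate, while a long window gives a better estimate of a target that may already have moved.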

## Keywords

Computational learning theory, concept drift, concept learning