Abstract
In this paper we consider the problem of tracking a subset of a domain (called the target) which changes gradually over time. A single (unknown) probability distribution over the domain is used both to generate random examples for the learning algorithm and to measure the speed at which the target changes. Clearly, the more rapidly the target moves, the harder it is for the algorithm to maintain a good approximation of the target. Therefore we evaluate algorithms based on how much movement of the target can be tolerated between examples while still predicting with accuracy \(\epsilon\). Furthermore, the complexity of the class \(\mathcal{H}\) of possible targets, as measured by \(d\), its VC-dimension, also affects the difficulty of tracking the target concept. We show that if the problem of minimizing the number of disagreements with a sample from among concepts in a class \(\mathcal{H}\) can be approximated to within a factor \(k\), then there is a simple tracking algorithm for \(\mathcal{H}\) which can achieve a probability \(\epsilon\) of making a mistake if the target movement rate is at most a constant times \(\epsilon^2/(k(d+k)\ln\frac{1}{\epsilon})\), where \(d\) is the Vapnik-Chervonenkis dimension of \(\mathcal{H}\). Also, we show that if \(\mathcal{H}\) is properly PAC-learnable, then there is an efficient (randomized) algorithm that, with high probability, approximately minimizes disagreements to within a factor of \(7d+1\), yielding an efficient tracking algorithm for \(\mathcal{H}\) which tolerates drift rates up to a constant times \(\epsilon^2/(d^2\ln\frac{1}{\epsilon})\). In addition, we prove complementary results for the classes of halfspaces and axis-aligned hyperrectangles showing that the maximum rate of drift that any algorithm (even with unlimited computational power) can tolerate is a constant times \(\epsilon^2/d\).
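To make the tracking scheme concrete, the following is a minimal sketch of a windowed disagreement-minimizing tracker, not the paper's algorithm or parameters: the hypothesis class of thresholds on [0, 1] (VC dimension 1), the window length M, the drift model, and the helper min_disagreement_threshold are all illustrative assumptions chosen so the example is self-contained.

```python
import random

# Sketch: track a drifting 1-D threshold concept h_t(x) = [x >= t] by
# predicting with the hypothesis that minimizes disagreements on a
# window of the most recent labeled examples. All parameters below are
# illustrative assumptions, not values from the paper.

M = 50  # window of most recent labeled examples (assumed)

def min_disagreement_threshold(window):
    """Return a threshold t in [0, 1] minimizing disagreements with the
    window, by brute force over the O(|window|) candidate thresholds
    (an optimum always lies at a data point or an endpoint)."""
    candidates = [0.0] + [x for x, _ in window] + [1.0]
    def errors(t):
        return sum((x >= t) != y for x, y in window)
    return min(candidates, key=errors)

def track(steps=2000, drift=0.0005, seed=0):
    rng = random.Random(seed)
    target = 0.5   # true threshold, drifting slowly between examples
    t_hat = 0.5    # current hypothesis
    window = []    # most recent M examples as (x, label) pairs
    mistakes = 0
    for _ in range(steps):
        x = rng.random()                    # example from the fixed distribution
        if window:                          # predict with the current hypothesis
            mistakes += (x >= t_hat) != (x >= target)
        window.append((x, x >= target))     # then receive the true label
        window = window[-M:]
        t_hat = min_disagreement_threshold(window)
        # gradual target movement between examples
        target = min(1.0, max(0.0, target + rng.uniform(-drift, drift)))
    return mistakes / steps

if __name__ == "__main__":
    print(f"empirical mistake rate: {track():.3f}")
```

Shrinking the drift rate or enlarging the window trades off exactly as the abstract's bounds suggest: a slower-moving target lets a longer window drive the mistake rate down, while faster drift makes old examples misleading.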
Cite this article
Helmbold, D.P., Long, P.M. Tracking Drifting Concepts By Minimizing Disagreements. Machine Learning 14, 27–45 (1994). https://doi.org/10.1023/A:1022694620923