New Analysis and Algorithm for Learning with Drifting Distributions

  • Mehryar Mohri
  • Andres Muñoz Medina
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7568)


We present a new analysis of the problem of learning with drifting distributions in the batch setting using the notion of discrepancy. We prove learning bounds based on the Rademacher complexity of the hypothesis set and the discrepancy of distributions both for a drifting PAC scenario and a tracking scenario. Our bounds are always tighter and in some cases substantially improve upon previous ones based on the L1 distance. We also present a generalization of the standard on-line to batch conversion to the drifting scenario in terms of the discrepancy and arbitrary convex combinations of hypotheses. We introduce a new algorithm exploiting these learning guarantees, which we show can be formulated as a simple QP. Finally, we report the results of preliminary experiments demonstrating the benefits of this algorithm.
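The abstract does not define discrepancy, so the following is only an illustrative sketch of why a discrepancy-based quantity can be smaller than the L1 distance. It assumes one common pairwise form from the domain adaptation literature, the largest gap over pairs of hypotheses between their average disagreement on two samples, under the 0-1 loss; the paper's own definition may differ. The threshold hypotheses, the sample values, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def threshold(t):
    """Hypothetical hypothesis: predicts 1 when x >= t, else 0."""
    return lambda x: int(x >= t)


def empirical_discrepancy(sample_p, sample_q, hypotheses):
    """Largest gap, over pairs (h, h'), between the average 0-1
    disagreement of h and h' on sample_p and on sample_q.
    Assumed pairwise form; not necessarily the paper's definition."""
    worst = 0.0
    for h, h_prime in combinations(hypotheses, 2):
        loss_p = sum(h(x) != h_prime(x) for x in sample_p) / len(sample_p)
        loss_q = sum(h(x) != h_prime(x) for x in sample_q) / len(sample_q)
        worst = max(worst, abs(loss_p - loss_q))
    return worst


def empirical_l1(sample_p, sample_q):
    """L1 distance between the two empirical distributions."""
    count_p, count_q = Counter(sample_p), Counter(sample_q)
    support = set(count_p) | set(count_q)
    return sum(
        abs(count_p[x] / len(sample_p) - count_q[x] / len(sample_q))
        for x in support
    )


hypotheses = [threshold(t) for t in (1, 3, 5)]

# Partially overlapping samples.
disc_overlap = empirical_discrepancy([0, 1, 2, 3], [2, 3, 4, 5], hypotheses)
l1_overlap = empirical_l1([0, 1, 2, 3], [2, 3, 4, 5])

# Disjoint samples that no hypothesis in the set can tell apart.
disc_disjoint = empirical_discrepancy([0.1], [0.2], hypotheses)
l1_disjoint = empirical_l1([0.1], [0.2])

print(disc_overlap, l1_overlap)
print(disc_disjoint, l1_disjoint)
```

In the second pair of samples the supports are disjoint, so the L1 distance is at its maximum, yet every hypothesis in this small set behaves identically on both samples and the discrepancy is zero. This is the kind of gap that makes a bound stated in terms of discrepancy potentially much tighter than one stated in terms of the L1 distance.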


Keywords: drifting environment, generalization bound, domain adaptation





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Mehryar Mohri (1, 2)
  • Andres Muñoz Medina (1)
  1. Courant Institute of Mathematical Sciences, New York, USA
  2. Google Research, New York, USA
