Re-adapting the Regularization of Weights for Non-stationary Regression

  • Nina Vaits
  • Koby Crammer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6925)


The goal of a learner in standard online learning is to keep its cumulative loss not much larger than that of the best-performing prediction function from some fixed class. Numerous algorithms have been shown to make this gap arbitrarily close to zero relative to the best function chosen off-line. Nevertheless, many real-world applications (such as adaptive filtering) are non-stationary in nature, and the best prediction function may not be fixed but may drift over time. We introduce a new algorithm for regression that uses a per-feature learning rate, and we provide a regret bound with respect to the best sequence of functions with drift. We show that as long as the cumulative drift is sub-linear in the length of the sequence, our algorithm suffers a regret that is sub-linear as well. We also sketch an algorithm that achieves the best of the two worlds: in the stationary setting it has log(T) regret, while in the non-stationary setting it has sub-linear regret. Simulations demonstrate the usefulness of our algorithm compared with other state-of-the-art approaches.
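To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes squared loss, linear predictors, and cumulative drift measured as the sum of Euclidean distances between consecutive comparator vectors; the abstract does not fix any of these. The learner is a generic stand-in with per-feature step sizes scaled by accumulated squared gradients (an AdaGrad-style diagonal rule), and every function name and parameter (`eta`, `eps`, `step`, `seed`) is hypothetical.

```python
import math
import random


def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def cumulative_drift(comparators):
    """Sum of Euclidean distances between consecutive comparator vectors.

    Assumption: this is one possible drift measure; the source does not define it.
    """
    return sum(math.dist(a, b) for a, b in zip(comparators, comparators[1:]))


def comparator_loss(comparators, xs, ys):
    """Cumulative squared loss of the drifting comparator sequence."""
    return sum((dot(u, x) - y) ** 2 for u, x, y in zip(comparators, xs, ys))


def per_feature_learner_loss(xs, ys, eta=0.5, eps=1e-8):
    """Cumulative squared loss of a stand-in online learner.

    Each coordinate gets its own step size, eta / sqrt(sum of its squared
    gradients). This is an illustrative rule, not the method from the paper.
    """
    d = len(xs[0])
    w = [0.0] * d
    grad_sq = [0.0] * d
    total = 0.0
    for x, y in zip(xs, ys):
        err = dot(w, x) - y          # predict first, then observe the loss
        total += err * err
        for i in range(d):
            g = 2.0 * err * x[i]     # gradient of the squared loss in coordinate i
            grad_sq[i] += g * g
            w[i] -= eta * g / (math.sqrt(grad_sq[i]) + eps)
    return total


def make_drifting_problem(T=200, step=0.01, seed=0):
    """Noise-free targets generated by a slowly drifting weight vector."""
    rng = random.Random(seed)
    u = [1.0, -1.0]
    comparators, xs, ys = [], [], []
    for _ in range(T):
        u = [u[0] + step, u[1]]      # drift in the first coordinate only
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        comparators.append(u)
        xs.append(x)
        ys.append(dot(u, x))
    return comparators, xs, ys


comparators, xs, ys = make_drifting_problem()
regret = per_feature_learner_loss(xs, ys) - comparator_loss(comparators, xs, ys)
print("cumulative drift:", cumulative_drift(comparators))
print("regret vs. drifting comparators:", regret)
```

Because the targets here are generated without noise, the comparator sequence has zero loss and the regret equals the learner's own cumulative loss. Varying `step` or `T` in this toy setup gives a feel for how regret measured against a moving comparator depends on the total drift.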







Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Nina Vaits
    • 1
  • Koby Crammer
    • 1
  1. Department of Electrical Engineering, The Technion, Haifa, Israel
