Abstract
In standard online learning, the learner's goal is to suffer cumulative loss not much larger than that of the best-performing prediction function from some fixed class. Numerous algorithms have been shown to drive this gap arbitrarily close to zero relative to the best function chosen offline. Nevertheless, many real-world applications, such as adaptive filtering, are non-stationary in nature, and the best prediction function may not be fixed but rather drift over time. We introduce a new algorithm for regression that uses a per-feature learning rate, and we provide a regret bound with respect to the best sequence of functions with drift. We show that as long as the cumulative drift is sub-linear in the length of the sequence, our algorithm suffers regret that is sub-linear as well. We also sketch an algorithm that achieves the best of both worlds: logarithmic (log T) regret in the stationary setting and sub-linear regret in the non-stationary setting. Simulations demonstrate the usefulness of our algorithm compared with other state-of-the-art approaches.
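The abstract does not spell out the update rule, so the sketch below is only a rough illustration of the kind of algorithm described: an online linear regressor with a per-feature learning rate that shrinks as evidence accumulates, plus a periodic "re-adaptation" (reset) of those rates so the learner can keep tracking a drifting target. Here regret is measured against the best comparator sequence u_1, ..., u_T whose cumulative drift Σ_t ||u_{t+1} − u_t|| is sub-linear in T. The class name, the regularizer r, and the reset rule are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

class PerFeatureOnlineRegressor:
    """Hypothetical sketch of a per-feature learning-rate regressor for
    drifting targets: a diagonal second-order (RLS-style) update with a
    periodic reset. NOT the exact algorithm of Vaits & Crammer (2011)."""

    def __init__(self, dim, r=1.0, reset_threshold=1e-3):
        self.w = np.zeros(dim)      # weight vector
        self.sigma = np.ones(dim)   # per-feature step sizes ("uncertainty")
        self.r = r                  # regularization constant (assumed)
        self.reset_threshold = reset_threshold  # re-inflate sigma below this

    def predict(self, x):
        return float(self.w @ x)

    def update(self, x, y):
        err = y - self.predict(x)
        # r + x^T diag(sigma) x, shared by both updates below.
        denom = self.r + np.sum(self.sigma * x * x)
        # Per-feature step: features with small remaining sigma move less.
        self.w += (self.sigma * x) * err / denom
        # Shrink sigma where the current example was informative
        # (diagonal restriction of a full covariance update).
        self.sigma -= (self.sigma * x) ** 2 / denom
        # Re-adapt: once the step sizes collapse, re-inflate them so the
        # model can keep following a drifting best predictor.
        if self.sigma.min() < self.reset_threshold:
            self.sigma = np.ones_like(self.sigma)
```

A toy run against a slowly drifting target illustrates the intended usage; without the reset, the per-feature step sizes would shrink toward zero and the model would stop adapting:

```python
rng = np.random.default_rng(0)
model = PerFeatureOnlineRegressor(dim=5)
w_star = rng.normal(size=5)
for t in range(1000):
    x = rng.normal(size=5)
    model.update(x, float(w_star @ x))
    # Decaying drift: cumulative drift grows like log T (sub-linear).
    w_star += 0.01 * rng.normal(size=5) / (t + 1)
```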
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vaits, N., Crammer, K. (2011). Re-adapting the Regularization of Weights for Non-stationary Regression. In: Kivinen, J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds.) Algorithmic Learning Theory. ALT 2011. Lecture Notes in Computer Science, vol. 6925. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24412-4_12
DOI: https://doi.org/10.1007/978-3-642-24412-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24411-7
Online ISBN: 978-3-642-24412-4