Efficient Algorithms for Online Decision Problems
In an online decision problem, one makes a sequence of decisions without knowledge of the future. Tools from learning such as Weighted Majority and its many variants [4, 13, 18] demonstrate that online algorithms can perform nearly as well as the best single decision chosen in hindsight, even when there are exponentially many possible decisions. However, the naive application of these algorithms is inefficient for such large problems. For some problems with nice structure, specialized efficient solutions have been developed [3, 6, 10, 16, 17].
We show that a very simple idea, used in Hannan’s seminal 1957 paper [9], gives efficient solutions to all of these problems. Essentially, in each period, one chooses the decision that worked best in the past. To guarantee low regret, it is necessary to add randomness. Surprisingly, this simple approach gives additive ε regret per period, efficiently. We present a simple general analysis and several extensions, including a (1+ε)-competitive algorithm as well as a lazy one that rarely switches between decisions.
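The idea of choosing the decision that worked best in the past, with added randomness, can be illustrated with a minimal sketch. The abstract does not specify the perturbation, so everything concrete here is an assumption made for illustration: a finite set of decisions with a per-round cost vector, a fresh uniform perturbation on [0, 1/ε] added to each decision's cumulative cost, and the hypothetical names `follow_the_perturbed_leader`, `cost_rounds`, and `seed`.

```python
import random


def follow_the_perturbed_leader(cost_rounds, epsilon, seed=0):
    """Illustrative sketch: each round, play the decision whose perturbed
    cumulative past cost is smallest.

    cost_rounds: list of per-round cost lists, one cost per decision.
    epsilon: controls the noise scale (assumed uniform on [0, 1/epsilon]).
    Returns (list of chosen decision indices, total cost paid).
    """
    rng = random.Random(seed)
    n = len(cost_rounds[0])
    cumulative = [0.0] * n  # total past cost of each decision
    choices = []
    total = 0.0
    for costs in cost_rounds:
        # Draw a fresh random perturbation for every decision (assumption).
        perturbed = [
            cumulative[i] + rng.uniform(0.0, 1.0 / epsilon) for i in range(n)
        ]
        # Pick the perturbed leader before seeing this round's costs.
        pick = min(range(n), key=perturbed.__getitem__)
        choices.append(pick)
        total += costs[pick]
        # Only now reveal the costs and update the history.
        for i in range(n):
            cumulative[i] += costs[i]
    return choices, total


if __name__ == "__main__":
    # Decision 0 is always free, decision 1 always costs 1.
    rounds = [[0.0, 1.0]] * 200
    choices, total = follow_the_perturbed_leader(rounds, epsilon=0.5)
    print(total, choices[:10])
```

In this toy run the noise is at most 1/ε = 2, so once the worse decision has accumulated a cost above 2 it can no longer be selected. With exponentially many decisions one would not enumerate them as done here; the sketch only shows the choice rule, not an efficient implementation for structured problems.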
- 1. Blum, A.: On-line algorithms in machine learning. Technical Report CMU-CS-97-163, Carnegie Mellon University (1997)
- 3. Blum, A., Chawla, S., Kalai, A.: Static Optimality and Dynamic Search Optimality in Lists and Trees. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2002) (2002)
- 6. Freund, Y., Schapire, R., Singer, Y., Warmuth, M.: Using and combining predictors that specialize. In: Proceedings of the 29th Annual ACM Symposium on the Theory of Computing, pp. 334–343 (1997)
- 9. Hannan, J.: Approximation to Bayes risk in repeated plays. In: Dresher, M., Tucker, A., Wolfe, P. (eds.) Contributions to the Theory of Games, vol. 3, pp. 97–139. Princeton University Press, Princeton (1957)
- 11. Kalai, A., Vempala, S.: Geometric algorithms for online optimization. MIT Technical report MIT-LCS-TR-861 (2002)
- 16. Takimoto, E., Warmuth, M.: Path Kernels and Multiplicative Updates. In: Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, pp. 74–89 (2002)
- 18. Vovk, V.: Aggregating strategies. In: Proc. 3rd Ann. Workshop on Computational Learning Theory, pp. 371–383 (1990)
- 19. Zinkevich, M.: Online Convex Programming and Generalized Infinitesimal Gradient Ascent. CMU Technical Report CMU-CS-03-110 (2003)