Abstract
This paper shows how universal learning can be achieved with expert advice. To this end, we specify an experts algorithm with the following characteristics: (a) it uses only feedback from the actions actually chosen (bandit setup), (b) it can be applied to countably infinite expert classes, and (c) it copes with losses that may grow over time, provided they grow appropriately slowly. We prove loss bounds against an adaptive adversary. From this, we obtain a master algorithm for “reactive” experts problems, in which the master’s actions may influence the behavior of the adversary. Our algorithm can significantly outperform standard experts algorithms on such problems. Finally, we combine it with a universal expert class. The resulting universal learner performs, in a certain sense, almost as well as any computable strategy, for any online decision problem. We also specify the (worst-case) convergence speed, which is very slow.
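As a rough illustration of characteristic (a), the Python sketch below runs a follow-the-perturbed-leader master that sees only the loss of the expert it actually chose. It is not the algorithm specified in the paper: the function name `bandit_fpl`, the fixed rates `gamma` and `eta`, the uniform exploration, and the finite expert class are all assumptions made for brevity, whereas the paper handles countably infinite classes and slowly growing losses.

```python
import random

def bandit_fpl(loss_fn, n_experts, n_rounds, gamma=0.1, eta=0.1, seed=0):
    """Illustrative bandit follow-the-perturbed-leader (hypothetical sketch).

    loss_fn(t, i) returns the loss of expert i in round t; the master only
    ever queries it for the expert it selected in that round.
    """
    rng = random.Random(seed)
    est = [0.0] * n_experts   # estimated cumulative loss per expert
    total = 0.0               # loss actually incurred by the master
    picks = []
    for t in range(n_rounds):
        explore = rng.random() < gamma
        if explore:
            # Exploration round: sample an expert uniformly at random.
            i = rng.randrange(n_experts)
        else:
            # Exploitation round: follow the leader after subtracting
            # independent exponential perturbations scaled by 1/eta.
            i = min(range(n_experts),
                    key=lambda k: est[k] - rng.expovariate(1.0) / eta)
        loss = loss_fn(t, i)  # bandit feedback: one observed loss per round
        total += loss
        if explore:
            # Expert i is observed in an exploration round with probability
            # gamma / n_experts, so dividing by it keeps the estimate unbiased.
            est[i] += loss * n_experts / gamma
        picks.append(i)
    return total, picks, est

# Toy problem (assumed for illustration): expert 2 never incurs loss.
total, picks, est = bandit_fpl(lambda t, i: 0.0 if i == 2 else 1.0,
                               n_experts=4, n_rounds=2000)
```

In this toy run the estimated loss of the loss-free expert stays at zero, so the master should settle on it in most non-exploration rounds. Updating estimates only in exploration rounds is a design choice of this sketch: the sampling probability is known exactly there, which is not the case for the perturbed-leader choice.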
Keywords
- Neural Information Processing System
- Expert Advice
- Repeated Game
- Bandit Problem
- Universal Turing Machine
These keywords were added by machine and not by the authors.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Poland, J., Hutter, M. (2005). Defensive Universal Learning with Experts. In: Jain, S., Simon, H.U., Tomita, E. (eds) Algorithmic Learning Theory. ALT 2005. Lecture Notes in Computer Science, vol 3734. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564089_28
DOI: https://doi.org/10.1007/11564089_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29242-5
Online ISBN: 978-3-540-31696-1
eBook Packages: Computer Science, Computer Science (R0)