Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring

Allenberg, Chamy; Auer, Peter; Györfi, László; Ottucsák, György

doi:10.1007/11894841_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4264)

Included in the following conference series:

International Conference on Algorithmic Learning Theory

Abstract

This paper considers sequential prediction with expert advice when the losses are unbounded and feedback is available only under partial monitoring. We treat a wide class of partial monitoring problems: the combination of the label efficient and multi-armed bandit problems, in which the algorithm is informed about the performance of the chosen expert only with probability ε ≤ 1. For bounded losses we give an algorithm whose expected regret scales with the square root of the loss of the best expert. For unbounded losses we prove that Hannan consistency can be achieved, depending on the growth rate of the average squared losses of the experts.
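
To make the feedback model concrete, the following is a minimal sketch of a generic Exp3-style forecaster in which the loss of the chosen expert is revealed only with probability ε. It is not the algorithm analysed in the paper: the function name `label_efficient_exp3`, the fixed learning rate `eta`, the seeded random generator and the bounded synthetic losses are all assumptions made for illustration. Hannan consistency would require the average regret printed at the end to vanish as the number of rounds grows; the sketch only shows how that quantity is measured.

```python
import math
import random


def label_efficient_exp3(losses, eps, eta, seed=0):
    """Exp3-style forecaster that sees the chosen expert's loss only w.p. eps.

    losses: list of rounds, each a list of per-expert losses (hypothetical input).
    eps:    probability that the chosen expert's loss is revealed (0 < eps <= 1).
    eta:    learning rate, kept fixed here (an assumption of this sketch).
    Returns (forecaster's cumulative loss, best expert's cumulative loss).
    """
    rng = random.Random(seed)
    n_experts = len(losses[0])
    est = [0.0] * n_experts  # cumulative importance-weighted loss estimates
    total = 0.0              # forecaster's true cumulative loss
    for round_losses in losses:
        # Exponential weights on the estimated losses, shifted by the
        # minimum estimate for numerical stability.
        low = min(est)
        weights = [math.exp(-eta * (e - low)) for e in est]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        # Draw the expert to follow in this round.
        chosen = rng.choices(range(n_experts), weights=probs)[0]
        total += round_losses[chosen]
        # With probability eps the loss of the chosen expert is revealed.
        if rng.random() < eps:
            # Dividing by probs[chosen] * eps keeps the estimate unbiased.
            est[chosen] += round_losses[chosen] / (probs[chosen] * eps)
    best = min(sum(r[i] for r in losses) for i in range(n_experts))
    return total, best


# Usage on synthetic bounded losses (illustrative data only).
data_rng = random.Random(1)
rounds = 2000
data = [[data_rng.random(), 0.5 * data_rng.random(), data_rng.random()]
        for _ in range(rounds)]
total, best = label_efficient_exp3(data, eps=0.5, eta=0.05)
print("average regret:", (total - best) / rounds)
```

Setting `eps=1.0` reduces the sketch to the plain multi-armed bandit feedback model, where the loss of the chosen expert is always observed.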

We would like to thank Gilles Stoltz and András György for useful comments.

This research was supported in part by the Hungarian Inter-University Center for Telecommunications and Informatics (ETIK).

Author information

Authors and Affiliations

School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
Chamy Allenberg
Chair for Information Technology, University of Leoben, Leoben, A-8700, Austria
Peter Auer
Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Magyar Tudósok körútja 2., Budapest, H-1117, Hungary
László Györfi & György Ottucsák

Editor information

Editors and Affiliations

Departament de Llenguatges i Sistemes Informàtics Laboratori d’Algorísmica Relacional, Complexitat i Aprenentatge, Universitat Politècnica de Catalunya, Barcelona,
José L. Balcázar
Google, 1600 Amphitheatre Parkway, 94043, Mountain View, CA, USA
Philip M. Long
Department of Computer Science and Department of Mathematics, National University of Singapore, 117543, Singapore, Republic of Singapore
Frank Stephan

About this paper

Cite this paper

Allenberg, C., Auer, P., Györfi, L., Ottucsák, G. (2006). Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring. In: Balcázar, J.L., Long, P.M., Stephan, F. (eds) Algorithmic Learning Theory. ALT 2006. Lecture Notes in Computer Science, vol 4264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11894841_20

DOI: https://doi.org/10.1007/11894841_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46649-9
Online ISBN: 978-3-540-46650-5
eBook Packages: Computer Science, Computer Science (R0)
