Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring

  • Conference paper
Algorithmic Learning Theory (ALT 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4264)

Abstract

This paper considers sequential prediction with expert advice when the losses are unbounded and feedback is available only under partial monitoring. We treat a wide class of partial monitoring problems: the combination of the label efficient and multi-armed bandit problems, in which the algorithm is informed about the performance of the chosen expert only with probability ε ≤ 1. For bounded losses, we give an algorithm whose expected regret scales with the square root of the loss of the best expert. For unbounded losses, we prove that Hannan consistency can be achieved, depending on the growth rate of the average squared losses of the experts.
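
The feedback model in the abstract can be illustrated with a small simulation. The sketch below is not the algorithm of the paper; it is a generic exponentially weighted forecaster, assuming bounded losses in [0, 1], in which the loss of the chosen expert is revealed only with probability ε and is then rescaled by an importance weight so that the loss estimate stays unbiased. The function name `run_forecaster`, the fixed learning rate `eta`, and the toy loss sequence are assumptions made for illustration only.

```python
import math
import random


def run_forecaster(losses, eps, eta, rng):
    """Exponentially weighted forecaster under label efficient bandit feedback.

    losses: list of rounds, each a list of per-expert losses in [0, 1]
    eps:    probability that the chosen expert's loss is revealed
    eta:    learning rate (hypothetical fixed value, not the paper's tuning)
    rng:    a random.Random instance
    Returns the forecaster's cumulative loss and the estimated cumulative
    losses of the experts.
    """
    n_experts = len(losses[0])
    est_cum = [0.0] * n_experts  # cumulative estimated loss per expert
    total_loss = 0.0
    for round_losses in losses:
        # Weights proportional to exp(-eta * estimated cumulative loss);
        # subtracting the minimum keeps the exponentials from underflowing.
        base = min(est_cum)
        weights = [math.exp(-eta * (c - base)) for c in est_cum]
        z = sum(weights)
        probs = [w / z for w in weights]

        # Draw one expert according to the current distribution.
        chosen = rng.choices(range(n_experts), weights=probs)[0]
        total_loss += round_losses[chosen]

        # Feedback arrives only with probability eps, and only for the
        # chosen expert. Dividing by probs[chosen] * eps makes the
        # estimate unbiased for every expert.
        if rng.random() < eps:
            est_cum[chosen] += round_losses[chosen] / (probs[chosen] * eps)
    return total_loss, est_cum


# Toy sequence (an assumption): expert 0 never errs, expert 1 always does.
toy_losses = [[0.0, 1.0] for _ in range(2000)]
total, estimates = run_forecaster(toy_losses, eps=0.5, eta=0.05,
                                  rng=random.Random(0))
```

On this toy sequence the forecaster concentrates on expert 0 after a short exploration phase, so its cumulative loss stays far below that of always following expert 1. The unbounded-loss setting studied in the paper needs additional care that this sketch does not attempt.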

We would like to thank Gilles Stoltz and András György for useful comments.

This research was supported in part by the Hungarian Inter-University Center for Telecommunications and Informatics (ETIK).

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Allenberg, C., Auer, P., Györfi, L., Ottucsák, G. (2006). Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring. In: Balcázar, J.L., Long, P.M., Stephan, F. (eds) Algorithmic Learning Theory. ALT 2006. Lecture Notes in Computer Science, vol 4264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11894841_20

  • DOI: https://doi.org/10.1007/11894841_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46649-9

  • Online ISBN: 978-3-540-46650-5

  • eBook Packages: Computer Science, Computer Science (R0)
