On Learning the Optimal Waiting Time

Lattimore, Tor; György, András; Szepesvári, Csaba

doi:10.1007/978-3-319-11662-4_15


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8776)

Included in the following conference series:

International Conference on Algorithmic Learning Theory


Abstract

Consider the problem of learning how long to wait for a bus before walking, experimenting each day and assuming that the bus arrival times are independent and identically distributed random variables with an unknown distribution. Similar uncertain optimal stopping problems arise when devising power-saving strategies, e.g., learning the optimal disk spin-down time for mobile computers, or when speeding up certain types of satisficing search procedures by switching from a potentially fast but unreliable search method to one that is reliable but slower. Formally, the problem can be described as a repeated game. In each round of the game an agent waits for an event to occur. If the event occurs while the agent is waiting, the agent suffers a loss that is the sum of the event's "arrival time" and some fixed loss. If the agent decides to give up waiting before the event occurs, it suffers a loss that is the sum of the waiting time and some other fixed loss. It is assumed that the arrival times are independent random quantities with the same distribution, which is unknown, while the agent knows the loss associated with each outcome. Two versions of the game are considered. In the full information case the agent observes the arrival times regardless of its actions, while in the partial information case the arrival time is observed only if it does not exceed the waiting time. After some general structural observations about the problem, we present a number of algorithms for both cases that learn the optimal waiting time with nearly matching minimax upper and lower bounds on their regret.
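
The per-round loss and the two feedback models described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `arrival_cost` and `giveup_cost` for the two fixed losses are hypothetical, the event is taken to arrive "in time" when its arrival time does not exceed the waiting time, and `empirical_best_wait` is a naive baseline for the full information case that minimizes the average loss on past arrivals. It is not one of the algorithms analyzed in the paper.

```python
def loss(wait, arrival, arrival_cost, giveup_cost):
    """Loss for one round of the waiting game."""
    if arrival <= wait:
        # The event occurred while the agent was waiting.
        return arrival + arrival_cost
    # The agent gave up before the event occurred.
    return wait + giveup_cost


def feedback(wait, arrival, full_information):
    """Return the observed arrival time, or None when it is censored."""
    if full_information or arrival <= wait:
        return arrival
    # Partial information: the arrival exceeded the waiting time, so it is unseen.
    return None


def empirical_best_wait(arrivals, candidates, arrival_cost, giveup_cost):
    """Pick the candidate waiting time with the smallest average loss
    on fully observed past arrivals (assumed baseline, full information only)."""
    def average_loss(wait):
        total = sum(loss(wait, x, arrival_cost, giveup_cost) for x in arrivals)
        return total / len(arrivals)
    return min(candidates, key=average_loss)


# Bus example with made-up numbers: walking costs 10 extra minutes,
# catching the bus costs nothing beyond the time spent waiting.
past_arrivals = [1.0, 1.0, 1.0]
best = empirical_best_wait(past_arrivals, candidates=[0.0, 2.0],
                           arrival_cost=0.0, giveup_cost=10.0)
```

In the partial information case `feedback` returns `None` whenever the agent gives up too early, so a baseline of this kind cannot be applied directly to the observed data; that censoring is what separates the two versions of the game.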



Author information

Authors and Affiliations

Department of Computing Science, University of Alberta, Canada
Tor Lattimore, András György & Csaba Szepesvári
Microsoft Research, Redmond, USA
Csaba Szepesvári


Editor information

Editors and Affiliations

Montanuniversitaet Leoben, 8700, Leoben, Austria
Peter Auer
Department of Philosophy, King’s College, WC2R 2LS, London, UK
Alexander Clark
Division of Computer Science, Hokkaido University, N-14, W-9, 060-0814, Sapporo, Japan
Thomas Zeugmann
Department of Computer Science, University of Regina, S4S 0A2, Regina, SK, Canada
Sandra Zilles


About this paper

Cite this paper

Lattimore, T., György, A., Szepesvári, C. (2014). On Learning the Optimal Waiting Time. In: Auer, P., Clark, A., Zeugmann, T., Zilles, S. (eds) Algorithmic Learning Theory. ALT 2014. Lecture Notes in Computer Science, vol 8776. Springer, Cham. https://doi.org/10.1007/978-3-319-11662-4_15


DOI: https://doi.org/10.1007/978-3-319-11662-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11661-7
Online ISBN: 978-3-319-11662-4
eBook Packages: Computer Science, Computer Science (R0)
