Adaptive Policies for Sequential Sampling under Incomplete Information and a Cost Constraint

Part of the book series: Springer Optimization and Its Applications (SOIA, volume 71)

Abstract

We consider the problem of sequential sampling from a finite number of independent statistical populations to maximize the expected infinite-horizon average outcome per period, under a constraint that the expected average sampling cost does not exceed an upper bound. The outcome distributions are not known. We construct a class of consistent adaptive policies, under which the average outcome converges with probability 1 to the true value under complete information for all distributions with finite means. We also compare the rate of convergence for various policies in this class using simulation.
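
To make the complete-information benchmark mentioned in the abstract more concrete, the following is a minimal sketch under assumptions that the abstract itself does not state: each population has a known mean outcome and a known constant cost per sample, and a policy is summarized by the long-run fraction of periods spent on each population. Under those assumptions the benchmark reduces to a small linear program with two constraints, so an optimal solution can be found among single populations and two-population mixtures. The function name `best_mix` and all numbers are hypothetical and chosen only for illustration; this is not presented as the chapter's own formulation or algorithm.

```python
from itertools import combinations


def best_mix(means, costs, budget):
    """Maximize sum(means[i] * x[i]) subject to
    sum(costs[i] * x[i]) <= budget, sum(x) == 1, x >= 0.

    Assumed setup (not taken from the abstract): known means, known
    constant per-sample costs, x[i] = long-run sampling fraction.
    Returns (value, x), or (None, None) if no population is affordable.
    """
    k = len(means)
    best_value, best_x = None, None

    def consider(value, x):
        nonlocal best_value, best_x
        if best_value is None or value > best_value:
            best_value, best_x = value, x

    # Vertices of the feasible set, type 1: one affordable population.
    for i in range(k):
        if costs[i] <= budget:
            x = [0.0] * k
            x[i] = 1.0
            consider(means[i], x)

    # Type 2: mix a cheap and an expensive population so that the
    # average cost equals the budget exactly.
    for i, j in combinations(range(k), 2):
        lo, hi = (i, j) if costs[i] <= costs[j] else (j, i)
        if costs[lo] <= budget < costs[hi]:
            w = (budget - costs[lo]) / (costs[hi] - costs[lo])  # weight on hi
            x = [0.0] * k
            x[lo] = 1.0 - w
            x[hi] = w
            consider((1.0 - w) * means[lo] + w * means[hi], x)

    return best_value, best_x


if __name__ == "__main__":
    # Hypothetical example: three populations, average cost budget 2.5.
    value, x = best_mix([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], 2.5)
    print(value, x)
```

An adaptive policy of the kind the abstract describes would not know `means`; one natural reading is that it would replace them with running estimates while still sampling every population often enough for those estimates to be consistent, but that step is not sketched here.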

AMS Subject Classification: Primary 93E35, Stochastic learning and adaptive control; Secondary 62L05, Sequential designs

Acknowledgments

This research was supported by the Greek Secretariat of Research and Technology under a Greece/Turkey bilateral research collaboration program. The authors thank Nickos Papadatos and George Afendras for useful discussions on the problem of consistent estimation in a random sequence of random variables.

Author information

Corresponding author

Correspondence to Apostolos Burnetas.

Copyright information

© 2012 Springer Science+Business Media New York

About this chapter

Cite this chapter

Burnetas, A., Kanavetas, O. (2012). Adaptive Policies for Sequential Sampling under Incomplete Information and a Cost Constraint. In: Daras, N. (ed.) Applications of Mathematics and Informatics in Military Science. Springer Optimization and Its Applications, vol 71. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4109-0_8
