IFIP International Conference on Artificial Intelligence Applications and Innovations

Artificial Intelligence Applications and Innovations pp 307-317

Thompson Sampling Guided Stochastic Searching on the Line for Adversarial Learning

  • Sondre Glimsdal
  • Ole-Christoffer Granmo
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 458)

Abstract

The multi-armed bandit problem has been studied for decades. In brief, a gambler repeatedly pulls one out of N slot machine arms, randomly receiving a reward or a penalty from each pull. The aim of the gambler is to maximize the expected number of rewards received when the reward probabilities are unknown. Thus, the gambler must identify, as quickly as possible, the arm with the largest probability of producing rewards; the problem thereby compactly captures the exploration-exploitation dilemma in reinforcement learning. In this paper we introduce a particularly challenging variant of the multi-armed bandit problem, inspired by the so-called N-Door Puzzle. In this variant, the gambler is only told whether the optimal arm lies to the “left” or to the “right” of the one pulled, and this feedback is erroneous with probability 1 − p. Our novel scheme for this problem is based on a Bayesian representation of the solution space, and combines this representation with Thompson sampling to balance exploration against exploitation. Furthermore, we introduce the possibility of traitorous environments that lie about the direction of the optimal arm (an adversarial learning problem). Empirical results show that our scheme handles both traitorous and non-traitorous environments, significantly outperforming competing algorithms.
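The directional feedback described in the abstract lends itself to a simple Bayesian treatment. The Python sketch below is a minimal illustration under stated assumptions, not the authors' scheme: it keeps a discrete posterior over which of the N arms is optimal, starts from a flat prior, assumes the feedback reliability p is known, and assumes (hypothetically) that pulling the optimal arm itself yields “left” or “right” with equal probability. Thompson sampling then amounts to drawing a candidate optimal arm from the posterior and pulling it. The names update_posterior, thompson_pick, environment and simulate are illustrative only.

```python
import random
from typing import List


def update_posterior(posterior: List[float], pulled: int, feedback: str, p: float) -> List[float]:
    """Bayes update of P(arm i is optimal) after directional feedback.

    feedback is "left" or "right"; it points the correct way with probability p
    (assumed known) and the wrong way with probability 1 - p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    weights = []
    for i, prior in enumerate(posterior):
        if i == pulled:
            # Assumption: at the optimal arm the feedback is a fair coin flip.
            likelihood = 0.5
        else:
            points_left = feedback == "left"
            consistent = (i < pulled) == points_left
            likelihood = p if consistent else 1.0 - p
        weights.append(prior * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def thompson_pick(posterior: List[float], rng: random.Random) -> int:
    """Thompson sampling: draw a candidate optimal arm from the posterior."""
    return rng.choices(range(len(posterior)), weights=posterior, k=1)[0]


def environment(arm: int, optimal: int, p: float, rng: random.Random) -> str:
    """Hypothetical simulator of the noisy (or, for p < 0.5, traitorous) teacher."""
    if arm == optimal:
        return rng.choice(["left", "right"])
    truth = "left" if optimal < arm else "right"
    lie = "right" if truth == "left" else "left"
    return truth if rng.random() < p else lie


def simulate(n_arms: int, optimal: int, p: float, steps: int, seed: int = 0) -> List[float]:
    """Run Thompson sampling for a fixed number of pulls; return the final posterior."""
    rng = random.Random(seed)
    posterior = [1.0 / n_arms] * n_arms  # flat prior over the optimal arm
    for _ in range(steps):
        arm = thompson_pick(posterior, rng)
        feedback = environment(arm, optimal, p, rng)
        posterior = update_posterior(posterior, arm, feedback, p)
    return posterior


if __name__ == "__main__":
    for p in (0.8, 0.2):  # p = 0.2 models a traitorous environment
        post = simulate(n_arms=10, optimal=7, p=p, steps=500, seed=1)
        best = max(range(len(post)), key=post.__getitem__)
        print(f"p={p}: most probable arm {best}, posterior {post[best]:.3f}")
```

Setting p below 0.5 models a traitorous environment in this sketch; because the update uses the known p, each answer is effectively read as evidence for the opposite direction and the posterior still concentrates on the optimal arm. If p is unknown, this sketch does not apply directly and would need additional machinery.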

Keywords

N-Door Puzzle, Multi-armed Bandit Problem, Adversarial Learning, Bayesian Learning, Thompson Sampling



Copyright information

© IFIP International Federation for Information Processing 2015

Authors and Affiliations

  • Sondre Glimsdal (1)
  • Ole-Christoffer Granmo (1)
  1. Department of ICT, University of Agder, Grimstad, Norway
