Performance and Convergence of Multi-user Online Learning

  • Cem Tekin
  • Mingyan Liu
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 75)

Abstract

We study the problem of allocating multiple users to a set of wireless channels in a decentralized manner when the channel qualities are time-varying and unknown to the users, and when simultaneous access to a channel by multiple users reduces its quality through interference. In such a setting the users need to learn not only the inherent channel quality but also, at the same time, the best allocation of users to channels so as to maximize the social welfare. Assuming that the users adopt a certain online learning algorithm, we investigate under what conditions the socially optimal allocation is achievable. In particular, we examine the effect of the different levels of knowledge the users may have and of the amount of communication and cooperation among them. The general conclusion is that as cooperation among users decreases and uncertainty about channel payoffs increases, it becomes harder to achieve the socially optimal allocation.
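To make the notion of a socially optimal allocation concrete, the following is a minimal sketch of the centralized benchmark that the decentralized learners would ideally reach. It is not the algorithm studied in the paper: it assumes the mean channel qualities are already known, and it assumes a purely illustrative interference model in which users on the same channel share its quality equally. The names `social_welfare`, `best_allocation`, and `mean_quality`, as well as the numbers, are hypothetical.

```python
from itertools import product


def social_welfare(allocation, mean_quality):
    """Total reward over all users for a given allocation.

    allocation[i] is the channel index chosen by user i.
    Assumption (illustrative only): users on the same channel
    split its mean quality equally.
    """
    counts = {}
    for ch in allocation:
        counts[ch] = counts.get(ch, 0) + 1
    return sum(mean_quality[ch] / counts[ch] for ch in allocation)


def best_allocation(num_users, mean_quality):
    """Brute-force search over all user-to-channel assignments."""
    channels = range(len(mean_quality))
    return max(
        product(channels, repeat=num_users),
        key=lambda a: social_welfare(a, mean_quality),
    )


# Hypothetical mean qualities for three channels, two users.
mean_quality = [0.9, 0.5, 0.2]
alloc = best_allocation(2, mean_quality)
welfare = social_welfare(alloc, mean_quality)
print(alloc, welfare)
```

Under this assumed sharing model, placing both users on the best channel yields less total reward than spreading them over the two best channels, which is the kind of trade-off the users have to discover without knowing `mean_quality` in advance.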

Keywords

multi-user learning, multi-armed bandits, spectrum sharing, congestion games

Copyright information

© ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering 2012

Authors and Affiliations

  • Cem Tekin (1)
  • Mingyan Liu (1)
  1. Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA
