Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings

Bonnefoi, Rémi; Besson, Lilian; Moy, Christophe; Kaufmann, Emilie; Palicot, Jacques

doi:10.1007/978-3-319-76207-4_15

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 228)

Included in the following conference series:

International Conference on Cognitive Radio Oriented Wireless Networks

Abstract

Setting up the future Internet of Things (IoT) networks will require supporting more and more communicating devices. We prove that intelligent devices in unlicensed bands can use Multi-Armed Bandit (MAB) learning algorithms to improve resource exploitation. We evaluate the performance of two classical MAB learning algorithms, \(\mathrm {UCB}_1\) and Thompson Sampling, in handling the decentralized decision-making of Spectrum Access in IoT networks, and we study how learning performance evolves with a growing number of intelligent end-devices. We show that using learning algorithms does help to fit more devices in such networks, even when all end-devices are intelligent and are dynamically changing channel. In the studied scenario, stochastic MAB learning provides a gain of up to \(16\%\) in terms of successful transmission probabilities, and has near-optimal performance even in non-stationary and non-i.i.d. settings with a majority of intelligent devices.
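To make the two algorithms named in the abstract concrete, here is a minimal Python sketch of how a single device could pick a channel with \(\mathrm {UCB}_1\) or Thompson Sampling from Ack/no-Ack feedback. It is an illustration under stated assumptions, not the authors' simulation code: the function names, the fixed per-channel Ack probabilities in `ack_probs`, and the single-device stationary setting are all hypothetical. The textbook \(\mathrm {UCB}_1\) index and a Beta posterior with a uniform prior are assumed; the paper may use different parameters. In particular, the sketch does not model many devices changing channel at once, which is what makes the studied setting non-stationary.

```python
import math
import random


def ucb1_choose(counts, successes, t):
    """Pick a channel with the textbook UCB1 index: mean + sqrt(2 ln t / n)."""
    # Try every channel once before trusting the index.
    for k, n in enumerate(counts):
        if n == 0:
            return k
    indices = [
        successes[k] / counts[k] + math.sqrt(2.0 * math.log(t) / counts[k])
        for k in range(len(counts))
    ]
    return max(range(len(counts)), key=indices.__getitem__)


def thompson_choose(counts, successes, rng):
    """Pick a channel by sampling each Beta posterior (uniform Beta(1, 1) prior)."""
    samples = [
        rng.betavariate(successes[k] + 1, counts[k] - successes[k] + 1)
        for k in range(len(counts))
    ]
    return max(range(len(counts)), key=samples.__getitem__)


def simulate(policy, ack_probs, horizon, seed=0):
    """Run one device against fixed, hypothetical per-channel Ack probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(ack_probs)      # transmissions attempted per channel
    successes = [0] * len(ack_probs)   # Acks received per channel
    for t in range(1, horizon + 1):
        if policy == "ucb1":
            k = ucb1_choose(counts, successes, t)
        else:
            k = thompson_choose(counts, successes, rng)
        ack = 1 if rng.random() < ack_probs[k] else 0  # Bernoulli feedback
        counts[k] += 1
        successes[k] += ack
    return counts, sum(successes) / horizon


if __name__ == "__main__":
    for policy in ("ucb1", "thompson"):
        counts, rate = simulate(policy, [0.1, 0.9, 0.5], horizon=2000)
        print(policy, counts, round(rate, 3))
```

In this toy setting both policies concentrate their transmissions on the channel with the highest Ack probability; how well that carries over when other learning devices also move between channels is the question the paper studies.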

Notes

1. In the experiments below, p is about \(10^{-3}\), because in a crowded network p should be smaller than \(N_c / (S + D)\) for all devices to communicate successfully (on average).
2. This optimal policy needs an oracle that sees the entire system and assigns all the dynamic devices to channels once and for all, in order to avoid any signaling overhead.
3. We tried similar experiments with other values of \(N_c\) and of this repartition vector, and the results were similar for non-homogeneous repartitions. Clearly, the problem is less interesting for a homogeneous repartition: all channels appear the same to dynamic devices, so even with D small in comparison to S, the system behaves as in Fig. 2d, where the performances of the five approaches are very close.
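
The bound in Note 1 can be sketched as follows, assuming (for illustration only) that each of the \(S + D\) devices transmits independently with probability \(p\) in a time slot and that each of the \(N_c\) channels carries at most one successful transmission per slot. The expected number of transmissions per slot must then not exceed the number of channels:

\[ p\,(S + D) \le N_c \quad \Longleftrightarrow \quad p \le \frac{N_c}{S + D}. \]

With the purely illustrative values \(N_c = 10\) and \(S + D = 10^{4}\), the bound evaluates to \(10^{-3}\), the order of magnitude quoted in Note 1.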

Acknowledgements

This work is supported by the French National Research Agency (ANR), under the projects SOGREEN (grant code: ANR-14-CE28-0025-02) and BADASS (grant code: ANR-16-CE40-0002), by Région Bretagne, France, by the French Ministry of Higher Education and Research (MENESR), and by ENS Paris-Saclay.

Author information

Authors and Affiliations

CentraleSupélec (campus of Rennes), IETR, SCEE Team, Avenue de la Boulaie - CS 47601, 35576, Cesson-Sévigné, France
Rémi Bonnefoi, Lilian Besson, Christophe Moy & Jacques Palicot
Univ. Lille 1, CNRS, Inria, SequeL Team, UMR 9189 - CRIStAL, 59000, Lille, France
Lilian Besson & Emilie Kaufmann

Corresponding author

Correspondence to Rémi Bonnefoi.

Editor information

Editors and Affiliations

Instituto de Telecomunicações, Lisbon, Portugal
Paulo Marques
Instituto de Telecomunicações, Lisbon, Portugal
Ayman Radwan
Instituto de Telecomunicações, Lisbon, Portugal
Shahid Mumtaz
CEA-LETI, Grenoble, France
Dominique Noguet
Instituto de Telecomunicações, Lisbon, Portugal
Jonathan Rodriguez
NOKIA, Munich, Germany
Michael Gundlach

About this paper

Cite this paper

Bonnefoi, R., Besson, L., Moy, C., Kaufmann, E., Palicot, J. (2018). Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings. In: Marques, P., Radwan, A., Mumtaz, S., Noguet, D., Rodriguez, J., Gundlach, M. (eds) Cognitive Radio Oriented Wireless Networks. CrownCom 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 228. Springer, Cham. https://doi.org/10.1007/978-3-319-76207-4_15

DOI: https://doi.org/10.1007/978-3-319-76207-4_15
Published: 17 February 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76206-7
Online ISBN: 978-3-319-76207-4
eBook Packages: Computer Science, Computer Science (R0)
