Abstract
Setting up future Internet of Things (IoT) networks will require supporting more and more communicating devices. We prove that intelligent devices in unlicensed bands can use Multi-Armed Bandit (MAB) learning algorithms to improve resource exploitation. We evaluate the performance of two classical MAB learning algorithms, \(\mathrm {UCB}_1\) and Thompson Sampling, for handling decentralized decision-making in Spectrum Access applied to IoT networks, as well as their learning performance with a growing number of intelligent end-devices. We show that using learning algorithms does help to fit more devices in such networks, even when all end-devices are intelligent and dynamically change channels. In the studied scenario, stochastic MAB learning provides up to a \(16\%\) gain in terms of successful transmission probabilities, and has near-optimal performance even in non-stationary and non-i.i.d. settings with a majority of intelligent devices.
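To make the two algorithms named in the abstract concrete, the following is a minimal sketch of \(\mathrm {UCB}_1\) and Thompson Sampling used by a single device to pick a channel. It is an illustration under simplifying assumptions, not the simulation used in the paper: each channel is modelled as a stationary Bernoulli arm whose (hypothetical) success probability stands for the chance that a transmission is acknowledged, and there are no other learning devices. The class names, the `simulate` helper, and the exploration term \(\sqrt{2 \ln t / N_k}\) (the textbook form of the index) are choices made for this example.

```python
import math
import random


class UCB1:
    """UCB1 index policy for rewards in {0, 1} (textbook exploration term)."""

    def __init__(self, n_channels):
        self.pulls = [0] * n_channels      # number of transmissions per channel
        self.successes = [0] * n_channels  # number of acknowledged transmissions
        self.t = 0                         # total number of decisions so far

    def choose(self):
        self.t += 1
        # Use every channel once before relying on the index.
        for k, n in enumerate(self.pulls):
            if n == 0:
                return k

        def index(k):
            mean = self.successes[k] / self.pulls[k]
            bonus = math.sqrt(2.0 * math.log(self.t) / self.pulls[k])
            return mean + bonus

        return max(range(len(self.pulls)), key=index)

    def update(self, k, reward):
        self.pulls[k] += 1
        self.successes[k] += reward


class ThompsonSampling:
    """Beta-Bernoulli Thompson Sampling with a uniform Beta(1, 1) prior."""

    def __init__(self, n_channels, rng=None):
        self.a = [1] * n_channels  # 1 + observed successes
        self.b = [1] * n_channels  # 1 + observed failures
        self.rng = rng or random.Random()

    def choose(self):
        # Draw one sample per channel from its posterior, pick the largest.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.a, self.b)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, k, reward):
        self.a[k] += reward
        self.b[k] += 1 - reward


def simulate(policy, success_probs, horizon, rng):
    """Run one device for `horizon` transmissions; return its success rate."""
    total = 0
    for _ in range(horizon):
        k = policy.choose()
        reward = 1 if rng.random() < success_probs[k] else 0
        policy.update(k, reward)
        total += reward
    return total / horizon


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.9]  # hypothetical per-channel success probabilities
    print("UCB1:", simulate(UCB1(len(probs)), probs, 5000, random.Random(0)))
    print("TS:  ", simulate(ThompsonSampling(len(probs), random.Random(1)),
                            probs, 5000, random.Random(2)))
```

Both policies only need the binary feedback of each transmission, which is why they fit a decentralized setting: a device keeps its own counters and never exchanges information with other devices.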
Notes
1. In the experiments below, p is about \(10^{-3}\), because in a crowded network p should be smaller than \(N_c / (S + D)\) for all devices to communicate successfully (on average).
2. This optimal policy needs an oracle that sees the entire system and assigns all the dynamic devices to channels, once and for all, in order to avoid any signaling overhead.
3. We tried similar experiments with other values for \(N_c\) and this repartition vector, and results were similar for non-homogeneous repartitions. Clearly, the problem is less interesting for a homogeneous repartition, as all channels appear the same to dynamic devices, and so even with D small in comparison to S, the system behaves like in Fig. 2d, where the performances of the five approaches are very close.
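The bound in note 1 can be read as a capacity argument: if each of the \(S + D\) devices transmits with probability p in a time slot, the expected number of transmissions per slot is \(p (S + D)\), and this should not exceed the \(N_c\) available channels. The short sketch below checks this for illustrative values; the function name and the values of \(N_c\), S and D are assumptions made for the example, not figures quoted from the paper.

```python
def max_emission_probability(n_channels, n_static, n_dynamic):
    """Upper bound N_c / (S + D) on the per-slot emission probability p."""
    return n_channels / (n_static + n_dynamic)


# Hypothetical network: 10 channels, 1000 static and 1000 dynamic devices.
bound = max_emission_probability(10, 1000, 1000)
p = 1e-3  # order of magnitude stated in note 1
print(bound, p < bound)
```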
Acknowledgements
This work is supported by the French National Research Agency (ANR), under the projects SOGREEN (grant No. ANR-14-CE28-0025-02) and BADASS (grant No. ANR-16-CE40-0002), by Région Bretagne, France, by the French Ministry of Higher Education and Research (MENESR), and by ENS Paris-Saclay.
Copyright information
© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Bonnefoi, R., Besson, L., Moy, C., Kaufmann, E., Palicot, J. (2018). Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings. In: Marques, P., Radwan, A., Mumtaz, S., Noguet, D., Rodriguez, J., Gundlach, M. (eds) Cognitive Radio Oriented Wireless Networks. CrownCom 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 228. Springer, Cham. https://doi.org/10.1007/978-3-319-76207-4_15
DOI: https://doi.org/10.1007/978-3-319-76207-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76206-7
Online ISBN: 978-3-319-76207-4
eBook Packages: Computer Science (R0)