Reinforcement Learning in Education: A Multi-armed Bandit Approach

Combrink, Herkulaas MvE; Marivate, Vukosi; Rosman, Benjamin

doi:10.1007/978-3-031-35883-8_1

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 503)

Included in the following conference series:

International Conference on Emerging Technologies for Developing Countries

Abstract

Advances in reinforcement learning research have demonstrated how different agent-based models can learn to perform a task optimally within a given environment. Reinforcement learning addresses unsupervised problems in which agents move through a state-action-reward loop to maximise their overall reward, which in turn optimises the solution to a specific problem in that environment. However, these algorithms are designed around our understanding of the actions that should be taken in a real-world environment to solve a specific problem. One such problem is the ability to identify, recommend and execute an action within a system where the users are the subject, such as in education. In recent years, the use of blended learning approaches, which integrate face-to-face learning with online learning, has increased in the education context. Additionally, online platforms used for education require the automation of certain functions, such as the identification, recommendation or execution of actions that can benefit the user, in this case the student or learner. As promising as these scientific advances are, research is still needed in a variety of areas to ensure the successful deployment of these agents within education systems. Therefore, the aim of this study was to contextualise and simulate the cumulative reward within an environment for an intervention recommendation problem in the education context.
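
To make the idea of simulating cumulative reward concrete, the following is a minimal sketch of a multi-armed bandit simulation, not the authors' implementation. It assumes that each arm stands for a hypothetical intervention with a fixed, unknown probability of benefiting the learner (a Bernoulli reward), and that the agent follows an epsilon-greedy policy. The function name `simulate_bandit`, the success probabilities, and the value of `epsilon` are all illustrative assumptions that do not come from the paper.

```python
import random


def simulate_bandit(success_probs, steps=1000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    success_probs: assumed probability that each arm (hypothetical
    intervention) yields a reward of 1. Illustrative values only.
    Returns (cumulative reward per step, value estimates, pull counts).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms    # number of times each arm was pulled
    values = [0.0] * n_arms  # running mean reward observed per arm
    cumulative = []          # cumulative reward after each step
    total = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate
            # (ties resolve to the lowest index).
            arm = max(range(n_arms), key=lambda a: values[a])

        # Observe a Bernoulli reward from the simulated environment.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0

        # Incremental update of the arm's mean reward estimate.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

        total += reward
        cumulative.append(total)

    return cumulative, values, counts


if __name__ == "__main__":
    # Three hypothetical interventions with assumed success rates.
    cum, values, counts = simulate_bandit([0.2, 0.5, 0.8], steps=1000)
    print("final cumulative reward:", cum[-1])
    print("estimated arm values:", [round(v, 2) for v in values])
    print("pull counts:", counts)
```

Plotting the returned cumulative reward against the step index gives the kind of curve such a simulation is typically used to inspect. A bandit has no state transitions, so the state-action-reward loop described above reduces here to repeatedly choosing an action and observing its reward.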

Notes

1. Source: https://www.sciencedirect.com/science/article/pii/S0007850618301471.
2. Source: https://iopscience.iop.org/article/10.1088/1742-6596/1717/1/012002.

Author information

Authors and Affiliations

Department of Computer Science, University of Pretoria, Pretoria, South Africa
Herkulaas MvE Combrink & Vukosi Marivate
School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa
Benjamin Rosman

Corresponding author

Correspondence to Herkulaas MvE Combrink.

Editor information

Editors and Affiliations

Central University of Technology, Free State, Bloemfontein, South Africa
Muthoni Masinde
University of the Western Cape, Cape Town, South Africa
Antoine Bagula

About this paper

Cite this paper

Combrink, H.M., Marivate, V., Rosman, B. (2023). Reinforcement Learning in Education: A Multi-armed Bandit Approach. In: Masinde, M., Bagula, A. (eds) Emerging Technologies for Developing Countries. AFRICATEK 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 503. Springer, Cham. https://doi.org/10.1007/978-3-031-35883-8_1

DOI: https://doi.org/10.1007/978-3-031-35883-8_1
Published: 06 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35882-1
Online ISBN: 978-3-031-35883-8
eBook Packages: Computer Science, Computer Science (R0)
