Reinforcement Learning in Education: A Multi-armed Bandit Approach

  • Conference paper
Emerging Technologies for Developing Countries (AFRICATEK 2022)

Abstract

Advances in reinforcement learning research have demonstrated how different agent-based models can learn to perform a task optimally within a given environment. Reinforcement learning solves unsupervised problems in which agents move through a state-action-reward loop to maximize their overall reward, which in turn optimizes the solution of a specific problem in a given environment. However, these algorithms are designed based on our understanding of the actions that should be taken in a real-world environment to solve a specific problem. One such problem is the ability to identify, recommend and execute an action within a system where the users are the subject, such as in education. In recent years, the use of blended learning approaches that integrate face-to-face learning with online learning has increased in the education context. Additionally, online platforms used for education require the automation of certain functions, such as the identification, recommendation or execution of actions that can benefit the user, in this case the student or learner. As promising as these scientific advances are, research is still needed in a variety of areas to ensure the successful deployment of these agents within education systems. Therefore, the aim of this study was to contextualise and simulate the cumulative reward within an environment for an intervention recommendation problem in the education context.
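To make the idea of simulating cumulative reward for an intervention recommendation problem concrete, the following is a minimal sketch of a multi-armed bandit simulation. It is not the authors' implementation: the abstract does not state which bandit policy or reward model the study used, so the epsilon-greedy policy, the Bernoulli (success or failure) reward, the function name `simulate_bandit`, and the example success probabilities are all assumptions made for illustration. Each arm stands for one hypothetical intervention that could be recommended to a student.

```python
import random


def simulate_bandit(success_probs, steps=1000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    success_probs: hypothetical probability that each intervention (arm)
                   yields a reward of 1; these values are assumptions.
    Returns (cumulative reward after each step, value estimates, pull counts).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms      # how often each arm was recommended
    values = [0.0] * n_arms    # running mean reward observed per arm
    cumulative = []
    total = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: recommend a random intervention.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: recommend the intervention with the best estimate.
            arm = max(range(n_arms), key=lambda a: values[a])

        # Bernoulli reward: 1 if the intervention "worked", else 0.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

        total += reward
        cumulative.append(total)

    return cumulative, values, counts


if __name__ == "__main__":
    # Three hypothetical interventions with assumed success rates.
    cumulative, values, counts = simulate_bandit([0.2, 0.5, 0.7], steps=2000)
    print("final cumulative reward:", cumulative[-1])
    print("estimated values:", [round(v, 2) for v in values])
    print("pull counts:", counts)
```

Plotting `cumulative` against the step index gives the kind of cumulative-reward curve the abstract refers to. Varying `epsilon` shows the trade-off between exploring interventions and exploiting the one currently estimated to be best.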

Notes

  1. Source: https://www.sciencedirect.com/science/article/pii/S0007850618301471.

  2. Source: https://iopscience.iop.org/article/10.1088/1742-6596/1717/1/012002.

Author information

Corresponding author

Correspondence to Herkulaas MvE Combrink.

Copyright information

© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Combrink, H.M., Marivate, V., Rosman, B. (2023). Reinforcement Learning in Education: A Multi-armed Bandit Approach. In: Masinde, M., Bagula, A. (eds) Emerging Technologies for Developing Countries. AFRICATEK 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 503. Springer, Cham. https://doi.org/10.1007/978-3-031-35883-8_1

  • DOI: https://doi.org/10.1007/978-3-031-35883-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35882-1

  • Online ISBN: 978-3-031-35883-8

  • eBook Packages: Computer Science, Computer Science (R0)
