Reinforcement Agents for E-Learning Applications

  • Hamid R. Tizhoosh
  • Maryam Shokri
  • Mohamed Kamel
Part of the Advanced Information and Knowledge Processing book series (AI&KP)


Advanced computer systems have become pivotal components of learning. However, many challenges remain in e-learning environments when developing reliable tools to assist users and to facilitate and enhance the learning process. For instance, the problem of creating a user-friendly system that can learn from interaction, adapt to dynamic learning requirements, and deal with large-scale information remains largely unsolved. We need systems that can communicate and cooperate with users, learn their preferences, and increase the learning efficiency of individual users. Reinforcement learning (RL) is an intelligent technique with the ability to learn from interaction with the environment. It learns from trial and error and generally needs neither training data nor a user model. At the beginning of the learning process, the RL agent has no knowledge about the actions it should take; over time, it learns which actions yield the maximum reward. The ability to learn from interaction with a dynamic environment, using reward and punishment independently of any training data set, makes reinforcement learning a suitable tool for e-learning situations, where subjective user feedback can easily be translated into a reinforcement signal.
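The trial-and-error loop described above can be illustrated with a minimal tabular Q-learning sketch. The e-learning setup here is a hypothetical stand-in: the states (learner levels), actions (difficulty of the next learning object), and the `user_feedback` function are illustrative assumptions, with simulated user feedback playing the role of the reinforcement signal.

```python
import random

random.seed(0)

# Hypothetical e-learning setup: states are learner levels, actions are the
# difficulty of the next learning object to present.
N_STATES = 3    # e.g. beginner, intermediate, advanced
N_ACTIONS = 3   # present easy, medium, or hard material

# Q-table initialised to zero: the agent starts with no knowledge of good actions.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def user_feedback(state, action):
    """Stand-in for subjective user feedback: reward material that matches
    the learner's level, punish mismatches."""
    return 1.0 if action == state else -1.0

def choose_action(state, epsilon=0.2):
    """Epsilon-greedy selection: mostly exploit the best-known action,
    occasionally explore a random one."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

alpha, gamma = 0.1, 0.9  # learning rate and discount factor
for episode in range(2000):
    state = random.randrange(N_STATES)
    action = choose_action(state)
    reward = user_feedback(state, action)
    next_state = random.randrange(N_STATES)  # learner's level may change
    # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])

# After training, the greedy action in each state should match the learner's level.
for s in range(N_STATES):
    print(s, max(range(N_ACTIONS), key=lambda a: Q[s][a]))
```

The agent begins with no knowledge (an all-zero Q-table) and, purely from the reinforcement signal, converges on presenting material that matches each learner level; no training data or user model is supplied, mirroring the property the abstract highlights.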


Keywords: Learning Object, Multiagent System, Markov Decision Process, Reinforcement Agent, Partially Observable Markov Decision Process





Copyright information

© Springer-Verlag London Limited 2007

Authors and Affiliations

  • Hamid R. Tizhoosh (1)
  • Maryam Shokri (1)
  • Mohamed Kamel (2)
  1. Pattern Analysis and Machine Intelligence Lab, Department of Systems Design Engineering, University of Waterloo, Ontario, Canada
  2. Electrical & Computer Engineering, University of Waterloo, Waterloo, Canada
