Soft Computing, Volume 20, Issue 7, pp 2855–2881

Learning a robot controller using an adaptive hierarchical fuzzy rule-based system

  • Antony Waldock
  • Brian Carse
Methodologies and Application


The majority of machine learning techniques applied to learning a robot controller generalise over either a uniform or a pre-defined representation selected by a human designer. The approach taken in this paper is to reduce the reliance on the human designer by adapting the representation during the learning process to improve generalisation. An extension of a Hierarchical Fuzzy Rule-Based System (HFRBS) is proposed that identifies and refines inaccurate regions of a fuzzy controller while interacting with the environment, for both supervised and reinforcement learning problems. The paper shows that a controller using an adaptive HFRBS can learn a suitable control policy with fewer fuzzy rules, for both a supervised and a reinforcement learning problem, and that it is not sensitive to the layout of the representation in the way a uniform representation is. In supervised learning problems, a small number of extra trials are required to find an effective representation, but in reinforcement learning problems, adapting the representation is shown to significantly reduce the time taken to learn a suitable control policy, and hence opens the door to high-dimensional problems.
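The abstract describes the mechanism only at a high level. As a rough illustration of the general idea (accumulate error per fuzzy rule while learning, then add finer rules where that error is largest), the Python sketch below uses a hypothetical one-dimensional, zero-order fuzzy model with triangular membership functions and singleton consequents. It is not the authors' HFRBS: it has no layered hierarchy, is trained by supervised gradient steps only, and refines by inserting new rule centres between neighbours, which is closer to variable-resolution discretisation. The names `AdaptiveFuzzyModel` and `train`, the error threshold, the learning rate and the target function are all assumptions made for the example.

```python
import bisect
import math
import random


class AdaptiveFuzzyModel:
    """Hypothetical 1-D zero-order fuzzy rule base (illustrative only, not the paper's HFRBS).

    Rule i fires with a triangular membership centred on centers[i]; the feet of
    each triangle sit on the neighbouring centres, so the firing strengths for any
    input sum to 1 and the output is a piecewise-linear interpolation of weights.
    """

    def __init__(self, n_rules=3, lr=0.5):
        self.centers = [i / (n_rules - 1) for i in range(n_rules)]  # uniform start on [0, 1]
        self.weights = [0.0] * n_rules  # singleton rule consequents
        self.errors = [0.0] * n_rules   # firing-strength-weighted squared error per rule
        self.lr = lr                    # assumed learning rate

    def memberships(self, x):
        c = self.centers
        x = min(max(x, c[0]), c[-1])
        j = min(bisect.bisect_right(c, x) - 1, len(c) - 2)  # active interval [c[j], c[j+1]]
        t = (x - c[j]) / (c[j + 1] - c[j])
        mu = [0.0] * len(c)
        mu[j], mu[j + 1] = 1.0 - t, t
        return mu

    def predict(self, x):
        return sum(m * w for m, w in zip(self.memberships(x), self.weights))

    def update(self, x, y):
        # Gradient step on the consequents of the rules that fired, plus error bookkeeping.
        mu = self.memberships(x)
        err = y - sum(m * w for m, w in zip(mu, self.weights))
        for i, m in enumerate(mu):
            self.weights[i] += self.lr * m * err
            self.errors[i] += m * err ** 2

    def refine(self, threshold):
        """Add rules around the worst rule if its accumulated error exceeds threshold."""
        i = max(range(len(self.errors)), key=self.errors.__getitem__)
        refined = self.errors[i] > threshold
        if refined:
            c = self.centers
            new = [(c[k] + c[k + 1]) / 2 for k in (i - 1, i) if 0 <= k < len(c) - 1]
            # Each new rule takes the current output at its centre, so the
            # input-output mapping is unchanged at the moment of refinement.
            pairs = list(zip(c, self.weights)) + [(p, self.predict(p)) for p in new]
            pairs.sort()
            self.centers = [p[0] for p in pairs]
            self.weights = [p[1] for p in pairs]
        self.errors = [0.0] * len(self.centers)  # start a fresh error-accumulation window
        return refined


def train(model, target, epochs=20, samples=50, threshold=0.05, seed=0):
    rng = random.Random(seed)
    for _ in range(epochs):
        for _ in range(samples):
            x = rng.random()
            model.update(x, target(x))
        model.refine(threshold)  # one refinement check per epoch (assumed schedule)
    return model


def steep(x):
    return math.tanh(20 * (x - 0.7))  # assumed target: changes rapidly only near x = 0.7


if __name__ == "__main__":
    m = train(AdaptiveFuzzyModel(), steep)
    print(len(m.centers), [round(c, 3) for c in m.centers])
```

Because each new rule is initialised with the model's current output at its centre, refinement alone never changes the learned mapping; only subsequent updates exploit the extra resolution. With the steep `tanh` target, most added centres should tend to fall in the region where the target changes fastest, though the exact layout depends on the seed, threshold and learning rate.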


Keywords: Fuzzy systems, Reinforcement learning, Robotics



This research was carried out in collaboration with BAE SYSTEMS, UK.



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. BAE Systems Advanced Technology Centre, Chelmsford, UK
  2. Bristol Robotics Lab at the University of the West of England, Bristol, UK
