Learning a robot controller using an adaptive hierarchical fuzzy rule-based system

Abstract

Most machine learning techniques applied to learning a robot controller generalise over a uniform or pre-defined representation selected by a human designer. The approach taken in this paper is to reduce the reliance on the human designer by adapting the representation during the learning process to improve generalisation. An extension of a Hierarchical Fuzzy Rule-Based System (HFRBS) is proposed that identifies and refines inaccurate regions of a fuzzy controller while interacting with the environment, for both supervised and reinforcement learning problems. The paper shows that a controller using an adaptive HFRBS can learn a suitable control policy with fewer fuzzy rules in both supervised and reinforcement learning problems and that, unlike a uniform representation, it is not sensitive to the layout. In supervised learning problems, a small number of extra trials are required to find an effective representation. In reinforcement learning problems, adapting the representation is shown to significantly reduce the time taken to learn a suitable control policy, and hence opens the door to high-dimensional problems.

Acknowledgments

This research was carried out in collaboration with BAE SYSTEMS, UK.

Author information

Corresponding author

Correspondence to Antony Waldock.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Waldock, A., Carse, B. Learning a robot controller using an adaptive hierarchical fuzzy rule-based system. Soft Comput 20, 2855–2881 (2016). https://doi.org/10.1007/s00500-015-1688-3

Keywords

  • Fuzzy systems
  • Reinforcement learning
  • Robotics