Machine Learning, Volume 19, Issue 3, pp 209–240

Alecsys and the AutonoMouse: Learning to control a real robot by distributed classifier systems

  • Marco Dorigo

Abstract

In this article we investigate the feasibility of using learning classifier systems as a tool for building adaptive control systems for real robots. Use on real robots imposes efficiency constraints, which we address with three main tools: parallelism, a distributed architecture, and training. Parallelism speeds up computation and increases the flexibility of the learning system design. A distributed architecture makes it possible to decompose the overall task into a set of simpler learning tasks. Finally, training guides the system while it learns, reducing the number of cycles required to learn. These tools and the issues they raise are first studied in simulation; the experience gained with simulations is then used to implement the learning system on the real robot. Results show that with this approach the AutonoMouse, a small real robot, can learn to approach a light source under a number of different noise and lesion conditions.
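
The decomposition the abstract describes can be made concrete with a deliberately simplified sketch. The toy program below is not Alecsys: it is a single-module, simulation-only stand-in in which a small set of condition-action classifiers, each carrying a strength, learns the light-approaching behavior from a trainer's reinforcement signal. All sensor states, actions, and parameters (e.g. LEARNING_RATE) are illustrative assumptions, not details taken from the paper.

import random

# Illustrative sensor states and motor actions for the light-approaching task.
SENSOR_STATES = ["light_left", "light_right", "light_ahead"]
ACTIONS = ["turn_left", "turn_right", "go_forward"]

# One classifier per (condition, action) pair; its strength is what
# reinforcement adjusts, loosely standing in for credit assignment
# in a classifier system.
strength = {(s, a): 1.0 for s in SENSOR_STATES for a in ACTIONS}

def select_action(state):
    """Roulette-wheel selection among the classifiers matching `state`."""
    weights = [strength[(state, a)] for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights)[0]

def trainer_reward(state, action):
    """External trainer: rewards actions that move the agent toward the light.
    This stands in for the 'training' guidance described in the abstract."""
    correct = {"light_left": "turn_left",
               "light_right": "turn_right",
               "light_ahead": "go_forward"}
    return 1.0 if action == correct[state] else 0.0

LEARNING_RATE = 0.2  # assumed value, chosen only for the demo
for _ in range(2000):
    state = random.choice(SENSOR_STATES)   # simulated, noiseless sensor reading
    action = select_action(state)
    reward = trainer_reward(state, action)
    # Move the chosen classifier's strength toward the reward it just earned.
    strength[(state, action)] += LEARNING_RATE * (reward - strength[(state, action)])

# After learning, the strongest classifier for each sensor state
# should encode the light-approaching behavior.
for s in SENSOR_STATES:
    best = max(ACTIONS, key=lambda a: strength[(s, a)])
    print(f"{s} -> {best}")

Running the sketch prints the strongest classifier for each sensor state, which converges to the turn-toward-the-light policy within a few hundred cycles; the paper's actual system additionally handles real sensors, noise, parallel execution, and rule discovery by a genetic algorithm.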

Keywords

learning classifier systems, reinforcement learning, genetic algorithms, animat problem


Copyright information

© Kluwer Academic Publishers 1995

Authors and Affiliations

  • Marco Dorigo
  1. Progetto di Intelligenza Artificiale e Robotica, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
