A Machine Learning System for Controlling a Rescue Robot

  • Timothy Wiley
  • Ivan Bratko
  • Claude Sammut
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11175)


Abstract

Many rescue robots are reconfigurable, having subtracks (or flippers) that can be adjusted to help the robot traverse different types of terrain. Knowing how to adjust them requires skill on the part of the operator. If the robot is intended to run autonomously, the control system must have an understanding of how the flippers affect the robot’s interaction with the ground. We describe a system that first learns the effects of a robot’s actions and then uses this knowledge to plan how to reconfigure the robot’s tracks so that it can overcome different types of obstacles. The system is a hybrid of qualitative symbolic learning and reinforcement learning.
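The two-stage idea in the abstract — first learn the effects of the robot's actions, then plan with the learned model — can be illustrated with a deliberately simplified sketch. Everything here is an assumption for illustration: the single "flipper angle" input, the "pitch" output, the sign-counting learner (a crude stand-in for Padé-style qualitative learning from numeric data), and the greedy planner are not the paper's actual method.

```python
def learn_qualitative_effect(samples):
    """From (flipper_angle, pitch) samples, estimate whether pitch is
    monotonically increasing (+1), decreasing (-1), or flat (0) in the
    flipper angle. This sign-counting rule is a toy stand-in for
    learning a qualitative model from numeric data."""
    samples = sorted(samples)
    diffs = [b[1] - a[1] for a, b in zip(samples, samples[1:])]
    score = sum((d > 0) - (d < 0) for d in diffs)
    return (score > 0) - (score < 0)

def plan_adjustments(effect, pitch, target_pitch, step=5.0, max_steps=20):
    """Greedy planner: repeatedly move the flipper in the direction the
    learned qualitative effect predicts will drive pitch toward the
    target. Returns the list of flipper-angle adjustments."""
    actions = []
    for _ in range(max_steps):
        error = target_pitch - pitch
        if abs(error) < step or effect == 0:
            break
        direction = effect * ((error > 0) - (error < 0))
        actions.append(direction * step)
        # Simulate the learned model: pitch changes with the effect sign.
        pitch += direction * step * effect
    return actions

# Example: logged samples in which pitch rises with flipper angle.
data = [(0, 2.0), (10, 5.0), (20, 9.0), (30, 14.0)]
effect = learn_qualitative_effect(data)          # +1: pitch increases with angle
plan = plan_adjustments(effect, pitch=0.0, target_pitch=20.0)
```

The point of the split mirrors the hybrid described above: the learner produces a symbolic (qualitative) summary of an action's effect, and the planner needs only that summary, not a full numeric dynamics model, to choose reconfiguration actions.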


Keywords

Machine learning · Qualitative models · Rescue robots



Acknowledgements

We thank Jure Žabkar of the University of Ljubljana for his help in using the Padé software, and Torsten Schaub and Max Ostrowski of the University of Potsdam for their assistance with the Clingo-4 ASP solver; both tools are used in our symbolic planner. This research was supported by Australian Research Council grant DP130102351 and an Australian Postgraduate Award.


  1. 1.
    Abbeel, P., Coates, A., Ng, A.Y.: Autonomous helicopter aerobatics through apprenticeship learning. Int. J. Robot. Res. 29, 1608–1639 (2010)CrossRefGoogle Scholar
  2. 2.
    Brown, S., Sammut, C.: Learning tool use in robots. In: Langley, P. (ed.) Advances in Cognitive Systems: Papers from the AAAI Fall Symposium, pp. 58–65. AAAI Press, Menlo Park (2011)Google Scholar
  3. 3.
    Dietterich, T.G.: The MAXQ method for hierarchical reinforcement learning. In: 15th International Conference on Machine Learning, pp. 118–126 (1998)Google Scholar
  4. 4.
    Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T.: Answer set solving in practice. Synthesis Lectures of Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, San Rafael (2013)Google Scholar
  5. 5.
    Gebser, M., Kaufmann, B., Kaminski, R., Ostrowski, M., Schaub, T., Schneider, M.: Potassco: the Potsdam answer set solving collection. AI Commun. 24(2), 107–124 (2011)MathSciNetzbMATHGoogle Scholar
  6. 6.
    Hengst, B.: Discovering hierarchy in reinforcement learning with HEXQ. In: 19th International Conference on Machine Learning, pp. 243–250. Morgan Kaufmann (2002)Google Scholar
  7. 7.
    Hoey, J., St-Aubin, R., Hu, A., Boutilier, C.: SPUDD: stochastic planning using decision diagrams. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 279–288. Morgan Kaufmann (1999)Google Scholar
  8. 8.
    Kuipers, B.J.: Qualitative simulation. Artif. Intell. 29(3), 289–338 (1986)MathSciNetCrossRefGoogle Scholar
  9. 9.
    Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell. 147(1–2), 5–34 (2003)MathSciNetCrossRefGoogle Scholar
  10. 10.
    Mahadevan, S.: Learning representation and control in markov decision processes: new frontiers. Found. Trends Mach. Learn. 1(4), 403–565 (2009)CrossRefGoogle Scholar
  11. 11.
    Michie, D., Chambers, R.A.: BOXES: an experiment in adaptive control. Mach. Intell. 2(2), 137–152 (1968)zbMATHGoogle Scholar
  12. 12.
    Mihankhah, E., Kalantari, A., Aboosaeedan, E., Taghirad, H.D., Moosavian, S.A.A.: Autonomous staircase detection and stair climbing for a tracked mobile robot using fuzzy controller. In: Proceedings of the 2008 IEEE International Conference on Robotics and Biometrics, pp. 1980–1985 (2008)Google Scholar
  13. 13.
    Mourikis, A., Trawny, N., Roumeliotis, S.I., Helmick, D.M., Matthies, L.: Autonomous stair climbing for tracked vehicles 26(7), 737–758 (2007)Google Scholar
  14. 14.
    Ohno, K., Morimura, S., Tadokoro, S., Koyanagi, E., Yoshida, T.: Semi-autonomous control system of rescue crawler robot having flippers for getting over unknown-steps. In: 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3012–3018 (2007)Google Scholar
  15. 15.
    Pang, W., Coghill, G.M.: QML-Morven: a novel framework for learning qualitative differential equation models using both symbolic and evolutionary approaches. J. Comput. Sci. 5(5), 795–808 (2014)CrossRefGoogle Scholar
  16. 16.
    Potts, D., Sammut, C.: Incremental learning of linear model trees. Mach. Learn. 6(1–3), 5–48 (2005)CrossRefGoogle Scholar
  17. 17.
    Powers, M., Balch, T.: A learning approach to integration of layers of a hybrid control architecture. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009, pp. 893–898 (2009)Google Scholar
  18. 18.
    Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Manteo (1993)Google Scholar
  19. 19.
    Ryan, M.R.K.: Using abstract models of behaviours to automatically generate reinforcement learning hierarchies. In: Sammut, C., Hoffmann, A. (eds.) Proceedings of the Nineteenth International Conference on Machine Learning, pp. 522–529. Morgan Kaufmann Publishers Inc., Sydney (2002)Google Scholar
  20. 20.
    Sammut, C., Yik, T.F.: Multistrategy learning for robot behaviours. In: Koronacki, J., Raś, Z., Wierzchon, S., Kacprzyk, J. (eds.) Advances in Machine Learning I. SCI, vol. 262, pp. 457–476. Springer, Heidelberg (2010). Scholar
  21. 21.
    Stulp, F., Buchli, J., Theodorou, E., Schaal, S.: Reinforcement learning of full-body humanoid motor skills. In: IEEE-RAS International Conference on Humanoid Robots, pp. 405–410 (2010)Google Scholar
  22. 22.
    Sutton, R.S., Barto, A.G.: Reinforcement learning: An introduction, 1st edn. MIT Press, Cambridge (1998)Google Scholar
  23. 23.
    Sutton, R.S., Precup, D., Singh, S.P.: Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112(1–2), 181–211 (1999)MathSciNetCrossRefGoogle Scholar
  24. 24.
    Tseng, C.K., Li, I.H., Chien, Y.H., Chen, M.C., Wang, W.Y.: Autonomous stair detection and climbing systems for a tracked robot. In: IEEE International Conference on System Science and Engineering, pp. 201–204 (2013)Google Scholar
  25. 25.
    Vincent, I., Sun, Q.: A combined reactive and reinforcement learning controller for an autonomous tracked vehicle. Robot. Auton. Syst. 60(4), 599–608 (2012)CrossRefGoogle Scholar
  26. 26.
    Wiley, T.: A Planning and Learning Hierarchy for the Online Acquisition of Robot Behaviours. Ph.D. thesis, School of Computer Science and Engineering, University of New South Wales (2017)Google Scholar
  27. 27.
    Wiley, T., Sammut, C., Bratko, I.: Qualitative simulation with answer set programming. In: Schaub, T., Friedrich, G., O’Sullivan, B. (eds.) Proceedings of the Twenty-First European Conference on Artificial Intelligence, pp. 915–920. IOS Press, Prague, August 2014Google Scholar
  28. 28.
    Wiley, T., Sammut, C., Hengst, B., Bratko, I.: A planning and learning hierarchy using qualitative reasoning for the on-line acquisition of robotic behaviors. Adv. Cogn. Syst. 4, 93–112 (2016)Google Scholar
  29. 29.
    Žabkar, J., Mozina, M., Bratko, I., Demsar, J.: Learning qualitative models from numerical data. Artif. Intell. 175(9–10), 1604–1619 (2011)MathSciNetCrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia
  2. Faculty of Computer and Information Science, The University of Ljubljana, Ljubljana, Slovenia
