Efficient Supervision for Robot Learning Via Imitation, Simulation, and Adaptation

  • Markus Wulfmeier
Dissertation and Habilitation Abstracts


Recent successes in machine learning have led to a shift in the design of autonomous systems, improving performance on existing tasks and rendering new applications possible. Data-focused approaches gain relevance across diverse, intricate applications when developing data collection and curation pipelines becomes more effective than manual behaviour design. This work aims to increase the efficiency of this pipeline in two principal ways: by utilising more powerful sources of informative data and by extracting additional information from existing data. In particular, we target three orthogonal fronts: imitation learning, domain adaptation, and transfer from simulation.


Data efficiency, Transfer learning, Inverse reinforcement learning, Domain adaptation, Sim2Real



The author would like to acknowledge the support of the UK EPSRC through the Doctoral Training Award, the Hans-Lenze-Foundation, the Dr-Jost-Henkel-Foundation, New College Oxford and the Department of Engineering Science.



Copyright information

© Gesellschaft für Informatik e.V. and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. University of Oxford, Oxford, UK
  2. DeepMind, London, UK
