
Artificial Intelligence for Prosthetics: Challenge Solutions

  • Łukasz Kidziński (Email author)
  • Carmichael Ong
  • Sharada Prasanna Mohanty
  • Jennifer Hicks
  • Sean Carroll
  • Bo Zhou
  • Hongsheng Zeng
  • Fan Wang
  • Rongzhong Lian
  • Hao Tian
  • Wojciech Jaśkowski
  • Garrett Andersen
  • Odd Rune Lykkebø
  • Nihat Engin Toklu
  • Pranav Shyam
  • Rupesh Kumar Srivastava
  • Sergey Kolesnikov
  • Oleksii Hrinchuk
  • Anton Pechenko
  • Mattias Ljungström
  • Zhen Wang
  • Xu Hu
  • Zehong Hu
  • Minghui Qiu
  • Jun Huang
  • Aleksei Shpilman
  • Ivan Sosin
  • Oleg Svidchenko
  • Aleksandra Malysheva
  • Daniel Kudenko
  • Lance Rane
  • Aditya Bhatt
  • Zhengfei Wang
  • Penghui Qi
  • Zeyang Yu
  • Peng Peng
  • Quan Yuan
  • Wenxin Li
  • Yunsheng Tian
  • Ruihan Yang
  • Pingchuan Ma
  • Shauharda Khadka
  • Somdeb Majumdar
  • Zach Dwiel
  • Yinyin Liu
  • Evren Tumer
  • Jeremy Watson
  • Marcel Salathé
  • Sergey Levine
  • Scott Delp
Conference paper
Part of The Springer Series on Challenges in Machine Learning book series (SSCML)

Abstract

In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model, with the goal of matching a given time-varying velocity vector. In this paper, the top participants describe their algorithms. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, exploiting symmetry, and policy blending. However, each team modified the known algorithms in a different way, for example by dividing the task into subtasks, learning low-level control, or by incorporating expert knowledge and using imitation learning.
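To make two of these recurring heuristics concrete, the sketch below shows how frame skipping (repeating each action for several simulator steps, which shortens the effective horizon and reduces the number of expensive OpenSim physics calls per decision) and reward shaping (adding an auxiliary term to the environment reward) are commonly layered on top of a gym-style environment such as osim-rl's ProstheticsEnv. This is a minimal illustration, not any team's published code; the wrapper class and the `skip`, `shaping_fn`, and `shaping_weight` names are assumptions introduced here.

```python
# Illustrative sketch of frame skipping + reward shaping around a gym-style
# environment. Only the reset()/step() interface of the wrapped env is assumed.

class FrameSkipRewardShaping:
    """Repeat each action `skip` times and add a shaping bonus to the reward."""

    def __init__(self, env, skip=4, shaping_fn=None, shaping_weight=0.1):
        self.env = env
        self.skip = skip                     # number of simulator steps per action
        self.shaping_fn = shaping_fn         # maps observation -> scalar bonus
        self.shaping_weight = shaping_weight

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, obs, done, info = 0.0, None, False, {}
        for _ in range(self.skip):           # frame skipping: repeat the action
            obs, reward, done, info = self.env.step(action)
            if self.shaping_fn is not None:  # reward shaping: auxiliary term
                reward += self.shaping_weight * self.shaping_fn(obs)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info


# Hypothetical usage (osim-rl provides ProstheticsEnv; the shaping function
# would depend on the observation layout and is left to the user):
#   from osim.env import ProstheticsEnv
#   env = FrameSkipRewardShaping(ProstheticsEnv(visualize=False), skip=4)
```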

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Łukasz Kidziński (1) (Email author)
  • Carmichael Ong (1)
  • Sharada Prasanna Mohanty (2)
  • Jennifer Hicks (1)
  • Sean Carroll (2)
  • Bo Zhou (3)
  • Hongsheng Zeng (3)
  • Fan Wang (3)
  • Rongzhong Lian (3)
  • Hao Tian (3)
  • Wojciech Jaśkowski (4)
  • Garrett Andersen (4)
  • Odd Rune Lykkebø (4)
  • Nihat Engin Toklu (4)
  • Pranav Shyam (4)
  • Rupesh Kumar Srivastava (4)
  • Sergey Kolesnikov (5)
  • Oleksii Hrinchuk (6)
  • Anton Pechenko (7)
  • Mattias Ljungström (8)
  • Zhen Wang (9)
  • Xu Hu (9)
  • Zehong Hu (9)
  • Minghui Qiu (9)
  • Jun Huang (9)
  • Aleksei Shpilman (10)
  • Ivan Sosin (10)
  • Oleg Svidchenko (10)
  • Aleksandra Malysheva (10)
  • Daniel Kudenko (11)
  • Lance Rane (12)
  • Aditya Bhatt (13)
  • Zhengfei Wang (14, 15)
  • Penghui Qi (14)
  • Zeyang Yu (14, 16)
  • Peng Peng (14)
  • Quan Yuan (14)
  • Wenxin Li (15)
  • Yunsheng Tian (17)
  • Ruihan Yang (17)
  • Pingchuan Ma (17)
  • Shauharda Khadka (18)
  • Somdeb Majumdar (18)
  • Zach Dwiel (18)
  • Yinyin Liu (18)
  • Evren Tumer (18)
  • Jeremy Watson (19)
  • Marcel Salathé (2)
  • Sergey Levine (20)
  • Scott Delp (1)

  1. Department of Bioengineering, Stanford University, Stanford, USA
  2. École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  3. Baidu Inc., Shenzhen, China
  4. NNAISENSE, Lugano, Switzerland
  5. DBrain, Moscow, Russia
  6. Skolkovo Institute of Science and Technology, Moscow, Russia
  7. GiantAI, Athens, Greece
  8. Spaces of Play UG, Berlin, Germany
  9. Alibaba Group, Hangzhou, China
  10. JetBrains Research and National Research University Higher School of Economics, St. Petersburg, Russia
  11. JetBrains Research and University of York, York, UK
  12. Imperial College London, London, UK
  13. University of Freiburg, Freiburg, Germany
  14. inspir.ai, Beijing, China
  15. Peking University, Beijing, China
  16. Jilin University, Changchun, China
  17. Nankai University, Tianjin, China
  18. Intel AI, San Diego, USA
  19. AICrowd Ltd, Lausanne, Switzerland
  20. University of California, Berkeley, Berkeley, USA