Artificial Intelligence for Prosthetics: Challenge Solutions

Conference paper in The NeurIPS '18 Competition

Abstract

In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model, with the goal of matching a given time-varying velocity vector. In this paper, the top participants describe their algorithms. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each team modified the known algorithms in its own way, for example by dividing the task into subtasks, learning low-level control, incorporating expert knowledge, or using imitation learning.
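Two of the heuristics named above, frame skipping (action repeat) and discretization of the action space, are simple enough to sketch in a few lines. The following is a minimal illustration assuming a gym-style environment with reset() and step(), as exposed by the competition's osim-rl package; the wrapper name, the skip factor of 4, and the three-level grid are illustrative assumptions, not any particular team's settings.

```python
import numpy as np


class FrameSkip:
    """Repeat each chosen action for `skip` simulator steps, summing rewards.

    Minimal sketch assuming a gym-style env with reset()/step(); the skip
    factor of 4 is illustrative only.
    """

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, done, obs, info = 0.0, False, None, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info


def discretize(action, levels=3):
    """Snap each muscle excitation in [0, 1] to one of `levels` evenly spaced
    values, a simple form of action-space discretization."""
    grid = np.linspace(0.0, 1.0, levels)
    idx = np.abs(np.asarray(action)[:, None] - grid[None, :]).argmin(axis=1)
    return grid[idx]
```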


Notes

  1. Find open-source code at: https://github.com/PaddlePaddle/PARL.

  2. https://youtu.be/ckPSJYLAWy0.

  3. https://youtu.be/mw9cVvaM0vQ.

  4. https://github.com/scitator/catalyst.

  5. https://mljx.io/x/neurips_walk_2018.gif.

  6. https://github.com/joneswong/rl_stadium.

  7. https://www.alibabacloud.com/press-room/alibaba-cloud-announces-machine-learning-platform-pai.

  8. Each observation provided by the simulator was a Python dict, so it had to be flattened into an array of floats for the agent's consumption. This flattening was done using a function from the helper library [27]; a sketch of the idea appears after these notes. Due to a bug in how this code was used, some of the coordinates were replicated several times, so the actual vector size used in training is 417.

  9. https://github.com/wangzhengfei0730/NIPS2018-AIforProsthetics.

  10. https://github.com/hagrid67/prosthetics_public.

  11. joint_pos hip_l [1] in the observation dictionary.
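As referenced in note 8, the nested observation dict has to be flattened into a fixed-length vector before it can be fed to a neural network policy. The snippet below is a minimal sketch of that idea under stated assumptions, not the exact helper from [27]; the function name and the key ordering are illustrative.

```python
def flatten_observation(obs):
    """Recursively flatten the simulator's nested observation dict
    (dicts of dicts, lists, and floats) into a flat list of floats.

    Sketch only: the helper library [27] has its own implementation, and an
    inconsistency in how it was applied produced the 417-dimensional vector
    mentioned in note 8.
    """
    if isinstance(obs, dict):
        values = []
        for key in sorted(obs):          # fixed key order keeps indices stable
            values.extend(flatten_observation(obs[key]))
        return values
    if isinstance(obs, (list, tuple)):
        values = []
        for item in obs:
            values.extend(flatten_observation(item))
        return values
    return [float(obs)]
```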

References

  1. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., Zaremba, W.: Hindsight experience replay. In: NIPS (2017)

  2. Kapturowski, S., Ostrovski, G., Quan, J., Munos, R., Dabney, W.: Recurrent experience replay in distributed reinforcement learning. https://openreview.net/pdf?id=r1lyTjAqYX (2018)

  3. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)

  4. Barth-Maron, G., Hoffman, M.W., Budden, D., Dabney, W., Horgan, D., Muldal, A., Heess, N., Lillicrap, T.: Distributed distributional deterministic policy gradients. arXiv preprint arXiv:1804.08617 (2018)

  5. Bellemare, M.G., Dabney, W., Munos, R.: A distributional perspective on reinforcement learning. arXiv preprint arXiv:1707.06887 (2017)

  6. Bellman, R.E.: Adaptive control processes: a guided tour. Princeton University Press (1961)

  7. Bhatt, A., Argus, M., Amiranashvili, A., Brox, T.: Crossnorm: Normalization for off-policy td reinforcement learning. arXiv preprint arXiv:1902.05605 (2019)

  8. Crowninshield, R.D., Brand, R.A.: A physiologically based criterion of muscle force prediction in locomotion. Journal of Biomechanics 14(11), 793–801 (1981)

  9. Dabney, W., Rowland, M., Bellemare, M.G., Munos, R.: Distributional reinforcement learning with quantile regression. arXiv preprint arXiv:1710.10044 (2017)

  10. Delp, S.L., Anderson, F.C., Arnold, A.S., Loan, P., Habib, A., John, C.T., Guendelman, E., Thelen, D.G.: Opensim: open-source software to create and analyze dynamic simulations of movement. IEEE transactions on biomedical engineering 54(11), 1940–1950 (2007)

  11. Dhariwal, P., Hesse, C., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y.: OpenAI Baselines. https://github.com/openai/baselines (2017)

  12. Dietterich, T.G., et al.: Ensemble methods in machine learning. Multiple classifier systems 1857, 1–15 (2000)

  13. Farris, D.J., Hicks, J.L., Delp, S.L., Sawicki, G.S.: Musculoskeletal modelling deconstructs the paradoxical effects of elastic ankle exoskeletons on plantar-flexor mechanics and energetics during hopping. Journal of Experimental Biology 217(22), 4018–4028 (2014)

  14. Fortunato, M., Azar, M.G., Piot, B., Menick, J., Osband, I., Graves, A., Mnih, V., Munos, R., Hassabis, D., Pietquin, O., et al.: Noisy networks for exploration. arXiv preprint arXiv:1706.10295 (2017)

  15. Fujimoto, S., van Hoof, H., Meger, D.: Addressing function approximation error in actor-critic methods. arXiv preprint arXiv:1802.09477 (2018)

  16. Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290 (2018)

  17. Horgan, D., Quan, J., Budden, D., Barth-Maron, G., Hessel, M., Van Hasselt, H., Silver, D.: Distributed prioritized experience replay. arXiv preprint arXiv:1803.00933 (2018)

  18. Huang, G., Li, Y., Pleiss, G., Liu, Z., Hopcroft, J.E., Weinberger, K.Q.: Snapshot ensembles: Train 1, get m for free. arXiv preprint arXiv:1704.00109 (2017)

  19. Huang, Z., Zhou, S., Zhuang, B., Zhou, X.: Learning to run with actor-critic ensemble. arXiv preprint arXiv:1712.08987 (2017)

  20. Osband, I., Blundell, C., Pritzel, A., Van Roy, B.: Deep exploration via bootstrapped DQN (2016)

  21. Jaśkowski, W., Lykkebø, O.R., Toklu, N.E., Trifterer, F., Buk, Z., Koutník, J., Gomez, F.: Reinforcement Learning to Run…Fast. In: S. Escalera, M. Weimer (eds.) NIPS 2017 Competition Book. Springer (2018)

  22. John, C.T., Anderson, F.C., Higginson, J.S., Delp, S.L.: Stabilisation of walking by intrinsic muscle properties revealed in a three-dimensional muscle-driven simulation. Computer methods in biomechanics and biomedical engineering 16(4), 451–462 (2013)

  23. Kidziński, Ł., Mohanty, S.P., Ong, C., Huang, Z., Zhou, S., Pechenko, A., Stelmaszczyk, A., Jarosik, P., Pavlov, M., Kolesnikov, S., et al.: Learning to run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments. arXiv preprint arXiv:1804.00361 (2018)

  24. Kidziński, Ł., Mohanty, S.P., Ong, C., Hicks, J., Francis, S., Levine, S., Salathé, M., Delp, S.: Learning to run challenge: Synthesizing physiologically accurate motion using deep reinforcement learning. In: S. Escalera, M. Weimer (eds.) NIPS 2017 Competition Book. Springer (2018)

  25. Klambauer, G., Unterthiner, T., Mayr, A., Hochreiter, S.: Self-normalizing neural networks. arXiv preprint arXiv:1706.02515 (2017)

  26. Lee, G., Kim, J., Panizzolo, F., Zhou, Y., Baker, L., Galiana, I., Malcolm, P., Walsh, C.: Reducing the metabolic cost of running with a tethered soft exosuit. Science Robotics 2(6) (2017)

  27. Lee, S.R.: Helper for NIPS 2018: AI for Prosthetics. https://github.com/seungjaeryanlee/osim-rl-helper (2018)

  28. Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)

  29. Loshchilov, I., Hutter, F.: Sgdr: Stochastic gradient descent with warm restarts. In: International Conference on Learning Representations (ICLR) 2017 Conference Track (2017)

  30. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  31. Moritz, P., Nishihara, R., Wang, S., Tumanov, A., Liaw, R., Liang, E., Elibol, M., Yang, Z., Paul, W., Jordan, M.I., et al.: Ray: A distributed framework for emerging AI applications. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pp. 561–577 (2018)

  32. Ong, C.F., Geijtenbeek, T., Hicks, J.L., Delp, S.L.: Predictive simulations of human walking produce realistic cost of transport at a range of speeds. In: Proceedings of the 16th International Symposium on Computer Simulation in Biomechanics, pp. 19–20 (2017)

  33. Pardo, F., Tavakoli, A., Levdik, V., Kormushev, P.: Time limits in reinforcement learning. arXiv preprint arXiv:1712.00378 (2017)

  34. Pavlov, M., Kolesnikov, S., Plis, S.M.: Run, skeleton, run: skeletal model in a physics-based simulation. ArXiv e-prints (2017)

  35. Peng, X.B., Abbeel, P., Levine, S., van de Panne, M.: Deepmimic: Example-guided deep reinforcement learning of physics-based character skills. arXiv preprint arXiv:1804.02717 (2018)

  36. Plappert, M., Houthooft, R., Dhariwal, P., Sidor, S., Chen, R.Y., Chen, X., Asfour, T., Abbeel, P., Andrychowicz, M.: Parameter space noise for exploration. arXiv preprint arXiv:1706.01905 (2017)

  37. Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp. 627–635 (2011)

  38. Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015)

  39. Schulman, J., Levine, S., Abbeel, P., Jordan, M.I., Moritz, P.: Trust region policy optimization. In: ICML, pp. 1889–1897 (2015)

  40. Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015)

  41. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. CoRR abs/1707.06347 (2017). URL http://arxiv.org/abs/1707.06347

  42. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

  43. Seth, A., Hicks, J., Uchida, T., Habib, A., Dembia, C., Dunne, J., Ong, C., DeMers, M., Rajagopal, A., Millard, M., Hamner, S., Arnold, E., Yong, J., Lakshmikanth, S., Sherman, M., Delp, S.: Opensim: Simulating musculoskeletal dynamics and neuromuscular control to study human and animal movement. PLoS Computational Biology 14(7) (2018)

  44. Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., Riedmiller, M.: Deterministic policy gradient algorithms. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 387–395 (2014)

  45. Song, S., Geyer, H.: A neural circuitry that emphasizes spinal feedback generates diverse behaviours of human locomotion. The Journal of physiology 593(16), 3493–3511 (2015)

  46. Sosin, I., Svidchenko, O., Malysheva, A., Kudenko, D., Shpilman, A.: Framework for Deep Reinforcement Learning with GPU-CPU Multiprocessing (2018). URL https://doi.org/10.5281/zenodo.1938263

  47. Sutton, R.S., Precup, D., Singh, S.: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112 (1999)

  48. Thelen, D.G., Anderson, F.C., Delp, S.L.: Generating dynamic simulations of movement using computed muscle control. Journal of Biomechanics 36(3), 321–328 (2003)

  49. Thelen, D.G., Anderson, F.C., Delp, S.L.: Generating dynamic simulations of movement using computed muscle control. Journal of biomechanics 36(3), 321–328 (2003)

  50. Uchida, T.K., Seth, A., Pouya, S., Dembia, C.L., Hicks, J.L., Delp, S.L.: Simulating ideal assistive devices to reduce the metabolic cost of running. PLOS ONE 11(9), 1–19 (2016). https://doi.org/10.1371/journal.pone.0163417

  51. Wu, Y., Tian, Y.: Training agent for first-person shooter game with actor-critic curriculum learning (2017)

  52. Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: ICML (2009)

Author information

Corresponding author

Correspondence to Łukasz Kidziński.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Kidziński, Ł. et al. (2020). Artificial Intelligence for Prosthetics: Challenge Solutions. In: Escalera, S., Herbrich, R. (eds) The NeurIPS '18 Competition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-29135-8_4

  • DOI: https://doi.org/10.1007/978-3-030-29135-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29134-1

  • Online ISBN: 978-3-030-29135-8

  • eBook Packages: Computer Science (R0)
