The NeurIPS '18 Competition, pp. 69–128
Artificial Intelligence for Prosthetics: Challenge Solutions
Abstract
In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model, with the goal of matching a given time-varying velocity vector. In this paper, the top participants describe their solutions. Many of them rely on similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, exploiting symmetry, and policy blending. However, each team modified the underlying algorithms in its own way, for example by dividing the task into subtasks, learning low-level control, incorporating expert knowledge, or using imitation learning.
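Several of these heuristics amount to thin wrappers around the simulation environment. As a minimal illustrative sketch (not any particular team's implementation), the wrapper below shows frame skipping: each chosen vector of muscle activations is repeated for a fixed number of simulator steps and the rewards are summed, which shortens the effective decision horizon and amortizes the cost of the OpenSim physics step. The skip count of 4 and the variable names are arbitrary examples; wrapping the challenge's ProstheticsEnv from the osim-rl package is assumed in the commented usage line.

```python
class FrameSkip:
    """Repeat each action for `skip` simulator steps and sum the rewards.

    Works with any Gym-style environment exposing reset() and step();
    a sketch only, with an arbitrary default skip of 4.
    """

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, obs, done, info = 0.0, None, False, {}
        for _ in range(self.skip):
            # Re-apply the same muscle activations for `skip` physics steps.
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info


# Assumed usage with the challenge environment (osim-rl package):
# from osim.env import ProstheticsEnv
# env = FrameSkip(ProstheticsEnv(visualize=False), skip=4)
```

A larger skip reduces wall-clock training time per episode but coarsens the control signal, so the choice trades simulation cost against controller precision.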
Copyright information
© Springer Nature Switzerland AG 2020