Motor Learning at Intermediate Reynolds Number: Experiments with Policy Gradient on the Flapping Flight of a Rigid Wing
This work describes a model-free reinforcement learning control methodology for the heaving plate, a laboratory fluid experiment that serves as a model of flapping flight. Using an optimized policy gradient algorithm, we demonstrate rapid convergence (in under 10 minutes of experiments) to a stroke form that maximizes the propulsive efficiency of this complicated fluid-dynamical system. This success was due in part to an improved sampling distribution and a carefully selected policy parameterization, both motivated by a formal analysis of the signal-to-noise ratio of policy gradient algorithms. The resulting optimal policy provides insight into the behavior of the fluid system, and the effectiveness of the learning strategy suggests a number of exciting opportunities for machine learning control of fluid dynamics.
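The abstract's core technique, model-free policy gradient, can be illustrated with a minimal sketch. The snippet below is a generic weight-perturbation (REINFORCE-style) update on a toy reward landscape: the quadratic `reward` function, the Gaussian sampling distribution, and all step sizes are hypothetical stand-ins for illustration, not the paper's experimental setup or its optimized algorithm.

```python
import numpy as np

def reward(theta):
    # Hypothetical stand-in for propulsive efficiency as a function of
    # stroke-form parameters theta (a simple quadratic bowl peaking at 1).
    return -np.sum((theta - 1.0) ** 2)

def policy_gradient_step(theta, sigma=0.1, n_samples=20, lr=0.05, rng=None):
    """One weight-perturbation update: sample perturbed policies near theta
    and estimate the gradient from reward-weighted perturbations."""
    rng = rng or np.random.default_rng(0)
    baseline = reward(theta)  # baseline subtraction reduces estimator variance
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        eps = rng.normal(0.0, sigma, size=theta.shape)  # sampling distribution
        grad += (reward(theta + eps) - baseline) * eps / sigma**2
    grad /= n_samples
    return theta + lr * grad

theta = np.zeros(3)
for _ in range(200):
    theta = policy_gradient_step(theta)
# theta should approach the optimum near [1, 1, 1]
```

The choice of `sigma` here is the kind of design decision the paper's signal-to-noise analysis addresses: too small a perturbation buries the reward difference in measurement noise, while too large a one degrades the local gradient estimate.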