Abstract
We consider the problem of two active particles in 2D complex flows, with the multi-objective goal of minimizing both the dispersion rate and the control activation cost of the pair. We approach the problem by means of multi-objective reinforcement learning (MORL), combining scalarization techniques with a Q-learning algorithm, for Lagrangian drifters that have variable swimming velocity. We show that MORL is able to find a set of trade-off solutions forming an optimal Pareto frontier. As a benchmark, we show that a set of heuristic strategies is dominated by the MORL solutions. We consider the situation in which the agents cannot update their control variables continuously, but only after a discrete (decision) time, \(\tau \). We show that there is a range of decision times, between the Lyapunov time and the continuous-updating limit, where reinforcement learning finds strategies that significantly improve over heuristics. In particular, we discuss how large decision times require enhanced knowledge of the flow, whereas for smaller \(\tau \) all a priori heuristic strategies become Pareto optimal.
Data availability statement
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
References
P. Lermusiaux, D. Subramani, J. Lin, C.S. Kulkarni, A. Gupta, A. Dutt, T. Lolla, P.J. Haley Jr., W. Hajj Ali, C. Mirabito, S. Jana, A future for intelligent autonomous ocean observing systems. J. Mar. Res. 75, 765–813 (2017)
Y. Elor, A.M. Bruckstein, Two-robot source seeking with point measurements. Theor. Comput. Sci. 457, 76–85 (2012)
W. Wu, I.D. Couzin, F. Zhang, Bio-inspired source seeking with no explicit gradient estimation. IFAC Proceedings Volumes 45(26), 240–245 (2012). (3rd IFAC Workshop on Distributed Estimation and Control in Networked Systems)
FSTaxis Algorithm: Bio-Inspired Emergent Gradient Taxis, in Proceedings of ALIFE 2016, the Fifteenth International Conference on the Synthesis and Simulation of Living Systems (2016)
C. Bechinger, R. Di Leonardo, H. Löwen, C. Reichhardt, G. Volpe, G. Volpe, Active particles in complex and crowded environments. Rev. Mod. Phys. 88(4), 045006 (2016)
A. Crisanti, M. Falcioni, A. Vulpiani, G. Paladin, Lagrangian chaos: transport, mixing and diffusion in fluids. Riv. Nuovo Cim. 14(12), 1–80 (1991)
M. Cencini, F. Cecconi, A. Vulpiani, Chaos: From Simple Models to Complex Systems. Series on Advances in Statistical Mechanics (World Scientific, Singapore, 2010)
F. Ginelli, The physics of the Vicsek model. Eur. Phys. J. Spec. Top. 225(11), 2099–2117 (2016)
M.C. Marchetti, J.F. Joanny, S. Ramaswamy, T.B. Liverpool, J. Prost, M. Rao, R. Aditi Simha, Hydrodynamics of soft active matter. Rev. Mod. Phys. 85, 1143–1189 (2013)
M. Ballerini, N. Cabibbo, R. Candelier, A. Cavagna, E. Cisbani, I. Giardina, V. Lecomte, A. Orlandi, G. Parisi, A. Procaccini, M. Viale, V. Zdravkovic, Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. Proc. Natl. Acad. Sci. 105(4), 1232–1237 (2008)
N. Khurana, N.T. Ouellette, Stability of model flocks in turbulent-like flow. New J. Phys. 15(9), 095015 (2013)
L. Biferale, F. Bonaccorso, M. Buzzicotti, P. Clark Di Leoni, K. Gustavsson, Zermelo’s problem: Optimal point-to-point navigation in 2d turbulent flows using reinforcement learning. Chaos 29(10), 103138 (2019)
M. Buzzicotti, L. Biferale, F. Bonaccorso, P. Clark di Leoni, K. Gustavsson, Optimal control of point-to-point navigation in turbulent time-dependent flows using reinforcement learning, in AIxIA 2020—Advances in Artificial Intelligence (Springer, Cham, 2021), pp. 223–234
J.K. Alageshan, A.K. Verma, J. Bec, R. Pandit, Machine learning strategies for path-planning microswimmers in turbulent flows. Phys. Rev. E 101, 043110 (2020)
G. Reddy, A. Celani, T.J. Sejnowski, M. Vergassola, Learning to soar in turbulent environments. Proc. Natl. Acad. Sci. 113(33), E4877–E4884 (2016)
G. Reddy, J. Wong-Ng, A. Celani, T.J. Sejnowski, M. Vergassola, Glider soaring via reinforcement learning in the field. Nature 562(7726), 236–239 (2018)
N. Orzan, C. Leone, A. Mazzolini, J. Oyero, A. Celani, Optimizing airborne wind energy with reinforcement learning. Eur. Phys. J. E 46, 2 (2023)
S. Verma, G. Novati, P. Koumoutsakos, Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proc. Natl. Acad. Sci. 115(23), 5849–5854 (2018)
Z. Zou, Y. Liu, Y.N. Young, O.S. Pak, A.C.H. Tsang, Gait switching and target navigation of microswimmers via deep reinforcement learning. Commun. Phys. 5(1), 158 (2022)
J. Qiu, N. Mousavi, L. Zhao, K. Gustavsson, Active gyrotactic stability of microswimmers using hydromechanical signals. Phys. Rev. Fluids 7(1), 014311 (2022)
A. Daddi-Moussa-Ider, H. Löwen, B. Liebchen, Hydrodynamics can determine the optimal route for microswimmer navigation. Commun. Phys. 4, 15 (2021)
F. Borra, L. Biferale, M. Cencini, A. Celani, Reinforcement learning for pursuit and evasion of microswimmers at low Reynolds number. Phys. Rev. Fluids 7(2), 023103 (2022)
G. Zhu, W. Fang, L. Zhu, Optimizing low-Reynolds-number predation via optimal control and reinforcement learning. J. Fluid Mech. 944, A3 (2022)
S. Goh, R. Winkler, G. Gompper, Noisy pursuit and pattern formation of self-steering active particles. New J. Phys. 24, 093039 (2022)
C.A.C. Coello, Handling preferences in evolutionary multiobjective optimization: a survey, in Proceedings of the 2000 Congress on Evolutionary Computation, CEC00 (Cat. No.00TH8512), vol. 1 (2000), pp. 30–37
C. Coello, D. Veldhuizen, G. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, 2nd edn. (Springer, New York, 2007)
C. Liu, X. Xu, D. Hu, Multiobjective reinforcement learning: a comprehensive overview. IEEE Trans. Syst. Man Cybern. Syst. 45(3), 385–398 (2015)
P. Vamplew, R. Dazeley, A. Berry, R. Issabekov, E. Dekker, Empirical evaluation methods for multiobjective reinforcement learning algorithms. Mach. Learn. 84(1–2), 51–80 (2011)
S. Natarajan, P. Tadepalli, Dynamic preferences in multi-criteria reinforcement learning, in Proceedings of the 22nd International Conference on Machine Learning, ICML'05 (Association for Computing Machinery, New York, NY, USA, 2005), pp. 601–608
A. Castelletti, G. Corani, A.E. Rizzoli, R. Soncini-Sessa, E. Weber, Reinforcement Learning in the Operational Management of a Water System (Pergamon Press, Oxford, 2002), p. 325
P. Vamplew, J. Yearwood, R. Dazeley, A. Berry, On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts, in AI 2008: Advances in Artificial Intelligence (Springer, Berlin, Heidelberg, 2008), pp. 372–378
E. Zitzler, L. Thiele, M. Laumanns, C.M. Fonseca, V.G. da Fonseca, Performance assessment of multiobjective optimizers: an analysis and review. IEEE Trans. Evol. Comput. 7(2), 117–132 (2003)
J. Bec, Multifractal concentrations of inertial particles in smooth random flows. J. Fluid Mech. 528, 255–277 (2005)
C.W. Gardiner, Handbook of Stochastic Methods for Physics, Chemistry, and the Natural Sciences. Springer complexity (Springer, Berlin, 2004)
R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, 2018)
Acknowledgements
This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 882340).
Author information
Contributions
All authors conceived the research. CC performed all the numerical simulations and data analysis. All authors discussed the results. CC wrote the paper with revision and input from all the authors.
Additional information
Topical Issue: Quantitative AI in Complex Fluids and Complex Flows: Challenges and Benchmarks. Guest editors: Luca Biferale, Michele Buzzicotti, Massimo Cencini.
Appendix A: Q-learning implementation
To solve the optimization problem, we used the Q-learning algorithm [35], which is based on evaluating the action-value function, Q(s, a), i.e., the expected future cumulative reward given that the agents are in state s and take action a. The algorithm is expected to converge to the optimal policy through the following iterative trial-and-error protocol. At each decision time \(t_j\), the agent pair measures its state \(s_{t_j}\) and selects an action \(a_{t_j}\) using an \(\epsilon \)-greedy strategy: \(a_{t_j}(s_{t_j})=\arg \max _a\{Q(s_{t_j},a)\}\) with probability \(1-\epsilon \), or \(a_{t_j}\) is chosen randomly with probability \(\epsilon \). Then, we let the dynamical system evolve for a time \(\tau \), according to (1), keeping both control directions and velocity intensity fixed. Afterward, the agents receive a reward \(r_\textrm{tot}(t_{j+1})\) (11) and the Q-matrix is updated as
\[ Q(s_{t_j},a_{t_j}) \leftarrow Q(s_{t_j},a_{t_j}) + \alpha \left[ r_\textrm{tot}(t_{j+1}) + \max _{a} Q(s_{t_{j+1}},a) - Q(s_{t_j},a_{t_j}) \right], \]
where \(\alpha \) is the learning rate. Updates are repeated up to the end of the episode, \(t=T_{\max }\), when no reward is assigned. The learning protocol is then repeated, restarting with another pair with the same initial distance at another flow position, until we reach a “local” optimum, given by the equation \(Q^*(s_{t_j},a)= r_\textrm{tot}(t_{j+1}) +\max _{a} Q^*(s_{t_{j+1}},a)\) and defined by the policy \(\pi ^*(s)=\arg \max _a\{Q^*(s,a)\}\).
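The tabular update described above can be sketched in Python. This is a minimal illustration, not the authors' code: the integer state/action encoding and the reward value are placeholders, and the update is the standard undiscounted Q-learning step consistent with the fixed point quoted in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, s, epsilon):
    """Pick a random action with probability epsilon,
    otherwise the greedy action argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def q_update(Q, s, a, r, s_next, alpha):
    """Undiscounted Q-learning step; its fixed point satisfies
    Q*(s, a) = r + max_a' Q*(s_next, a')."""
    td_error = r + Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td_error
    return Q
```

At the final decision time of an episode (\(t=T_{\max }\)) no reward is assigned, so no update would follow, consistent with the protocol above.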
In order to ease the convergence of the algorithm, the learning rate \(\alpha \) is taken as a decreasing function of the time spent in each state-action pair, while the exploration parameter decreases with the time spent in each visited state. Thus, if n(s, a) denotes the number of decision times at which the pair (s, a) has been visited, and n(s) the number of visits to state s, both \(\epsilon \) and \(\alpha \) decay as power laws of these counts with exponent \(\gamma =4/5\); the numerical values of the prefactors were determined after some preliminary tests. As for the initialization of the matrix Q, we assigned the same large (optimistic) value to all state-action pairs.
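A hedged sketch of such visit-count-based schedules follows. The power-law form with exponent \(\gamma =4/5\) is taken from the text; the prefactors `alpha0` and `eps0` are placeholders, since the paper only states that the constants were tuned in preliminary tests.

```python
def decayed_rates(n_sa, n_s, gamma=0.8, alpha0=1.0, eps0=1.0):
    """Learning rate decays with visits to the state-action pair,
    exploration rate with visits to the state.
    alpha0 and eps0 are assumed prefactors, not values from the paper."""
    alpha = alpha0 / (1 + n_sa) ** gamma
    epsilon = eps0 / (1 + n_s) ** gamma
    return alpha, epsilon
```

Both rates start at their prefactor values and decrease monotonically as the corresponding counters grow, which is what eases convergence while still guaranteeing early exploration.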
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Calascibetta, C., Biferale, L., Borra, F. et al. Taming Lagrangian chaos with multi-objective reinforcement learning. Eur. Phys. J. E 46, 9 (2023). https://doi.org/10.1140/epje/s10189-023-00271-0