Abstract
We consider the problem of two active particles in 2D complex flows, with the multi-objective goal of minimizing both the dispersion rate and the control activation cost of the pair. We approach the problem by means of multi-objective reinforcement learning (MORL), combining scalarization techniques with a Q-learning algorithm, for Lagrangian drifters that have variable swimming velocity. We show that MORL is able to find a set of trade-off solutions forming an optimal Pareto frontier. As a benchmark, we show that a set of heuristic strategies is dominated by the MORL solutions. We consider the situation in which the agents cannot update their control variables continuously, but only after a discrete (decision) time, \(\tau \). We show that there is a range of decision times, between the Lyapunov time and the continuous-updating limit, where reinforcement learning finds strategies that significantly improve over heuristics. In particular, we discuss how large decision times require enhanced knowledge of the flow, whereas for smaller \(\tau \) all a priori heuristic strategies become Pareto optimal.
Data availability statement
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
References
P. Lermusiaux, D. Subramani, J. Lin, C.S. Kulkarni, A. Gupta, A. Dutt, T. Lolla, P.J. Haley Jr., W. Hajj Ali, C. Mirabito, S. Jana, A future for intelligent autonomous ocean observing systems. J. Mar. Res. 75, 765–813 (2017)
Y. Elor, A.M. Bruckstein, Two-robot source seeking with point measurements. Theor. Comput. Sci. 457, 76–85 (2012)
W. Wu, I.D. Couzin, F. Zhang, Bio-inspired source seeking with no explicit gradient estimation. IFAC Proceedings Volumes 45(26), 240–245 (2012). (3rd IFAC Workshop on Distributed Estimation and Control in Networked Systems)
FSTaxis Algorithm: Bio-Inspired Emergent Gradient Taxis, in Proceedings of ALIFE 2016, the Fifteenth International Conference on the Synthesis and Simulation of Living Systems (2016)
C. Bechinger, R. Di Leonardo, H. Löwen, C. Reichhardt, G. Volpe, G. Volpe, Active particles in complex and crowded environments. Rev. Mod. Phys. 88(4), 045006 (2016)
A. Crisanti, M. Falcioni, A. Vulpiani, G. Paladin, Lagrangian chaos: transport, mixing and diffusion in fluids. Riv. Nuovo Cim. 14(12), 1–80 (1991)
M. Cencini, F. Cecconi, A. Vulpiani, Chaos: From Simple Models to Complex Systems. Series on Advances in Statistical Mechanics (World Scientific, Singapore, 2010)
F. Ginelli, The physics of the Vicsek model. Eur. Phys. J. Spec. Top. 225(11), 2099–2117 (2016)
M.C. Marchetti, J.F. Joanny, S. Ramaswamy, T.B. Liverpool, J. Prost, M. Rao, R. Aditi Simha, Hydrodynamics of soft active matter. Rev. Mod. Phys. 85, 1143–1189 (2013)
M. Ballerini, N. Cabibbo, R. Candelier, A. Cavagna, E. Cisbani, I. Giardina, V. Lecomte, A. Orlandi, G. Parisi, A. Procaccini, M. Viale, V. Zdravkovic, Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. Proc. Natl. Acad. Sci. 105(4), 1232–1237 (2008)
N. Khurana, N.T. Ouellette, Stability of model flocks in turbulent-like flow. New J. Phys. 15(9), 095015 (2013)
L. Biferale, F. Bonaccorso, M. Buzzicotti, P. Clark Di Leoni, K. Gustavsson, Zermelo’s problem: Optimal point-to-point navigation in 2d turbulent flows using reinforcement learning. Chaos 29(10), 103138 (2019)
M. Buzzicotti, L. Biferale, F. Bonaccorso, P. Clark di Leoni, K. Gustavsson, Optimal control of point-to-point navigation in turbulent time-dependent flows using reinforcement learning, in AIxIA 2020—Advances in Artificial Intelligence (Springer, Cham, 2021), pp. 223–234
J.K. Alageshan, A.K. Verma, J. Bec, R. Pandit, Machine learning strategies for path-planning microswimmers in turbulent flows. Phys. Rev. E 101, 043110 (2020)
G. Reddy, A. Celani, T.J. Sejnowski, M. Vergassola, Learning to soar in turbulent environments. Proc. Natl. Acad. Sci. 113(33), E4877–E4884 (2016)
G. Reddy, J. Wong-Ng, A. Celani, T.J. Sejnowski, M. Vergassola, Glider soaring via reinforcement learning in the field. Nature 562(7726), 236–239 (2018)
N. Orzan, C. Leone, A. Mazzolini, J. Oyero, A. Celani, Optimizing airborne wind energy with reinforcement learning. Eur. Phys. J. E 46, 2 (2023)
S. Verma, G. Novati, P. Koumoutsakos, Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proc. Natl. Acad. Sci. 115(23), 5849–5854 (2018)
Z. Zou, Y. Liu, Y.N. Young, O.S. Pak, A.C.H. Tsang, Gait switching and target navigation of microswimmers via deep reinforcement learning. Commun. Phys. 5(1), 158 (2022)
J. Qiu, N. Mousavi, L. Zhao, K. Gustavsson, Active gyrotactic stability of microswimmers using hydromechanical signals. Phys. Rev. Fluids 7(1), 014311 (2022)
A. Daddi-Moussa-Ider, H. Löwen, B. Liebchen, Hydrodynamics can determine the optimal route for microswimmer navigation. Commun. Phys. 4, 15 (2021)
F. Borra, L. Biferale, M. Cencini, A. Celani, Reinforcement learning for pursuit and evasion of microswimmers at low Reynolds number. Phys. Rev. Fluids 7(2), 023103 (2022)
G. Zhu, W. Fang, L. Zhu, Optimizing low-Reynolds-number predation via optimal control and reinforcement learning. J. Fluid Mech. 944, A3 (2022)
S. Goh, R. Winkler, G. Gompper, Noisy pursuit and pattern formation of self-steering active particles. New J. Phys. 24, 093039 (2022)
C.A.C. Coello, Handling preferences in evolutionary multiobjective optimization: a survey, in Proceedings of the 2000 Congress on Evolutionary Computation, CEC00 (Cat. No.00TH8512), vol. 1 (2000), pp. 30–37
C. Coello, D. Veldhuizen, G. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, 2nd edn. (Springer, New York, 2007)
C. Liu, X. Xu, D. Hu, Multiobjective reinforcement learning: a comprehensive overview. IEEE Trans. Syst. Man Cybern. Syst. 45(3), 385–398 (2015)
P. Vamplew, R. Dazeley, A. Berry, R. Issabekov, E. Dekker, Empirical evaluation methods for multiobjective reinforcement learning algorithms. Mach. Learn. 84(1–2), 51–80 (2011)
S. Natarajan, P. Tadepalli, Dynamic preferences in multi-criteria reinforcement learning, in Proceedings of the 22nd International Conference on Machine Learning, ICML'05 (Association for Computing Machinery, New York, NY, USA, 2005), pp. 601–608
A. Castelletti, G. Corani, A.E. Rizzoli, R. Soncini-Sessa, E. Weber, Reinforcement Learning in the Operational Management of a Water System (Pergamon Press, Oxford, 2002), p. 325
P. Vamplew, J. Yearwood, R. Dazeley, A. Berry, On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts, in AI 2008: Advances in Artificial Intelligence (Springer, Berlin, Heidelberg, 2008), pp. 372–378
E. Zitzler, L. Thiele, M. Laumanns, C.M. Fonseca, V.G. da Fonseca, Performance assessment of multiobjective optimizers: an analysis and review. IEEE Trans. Evol. Comput. 7(2), 117–132 (2003)
J. Bec, Multifractal concentrations of inertial particles in smooth random flows. J. Fluid Mech. 528, 255–277 (2005)
C.W. Gardiner, Handbook of Stochastic Methods for Physics, Chemistry, and the Natural Sciences. Springer complexity (Springer, Berlin, 2004)
R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, 2018)
Acknowledgements
This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 882340).
Author information
Contributions
All authors conceived the research. CC performed all the numerical simulations and data analysis. All authors discussed the results. CC wrote the paper with revision and input from all the authors.
Additional information
Topical Issue: Quantitative AI in Complex Fluids and Complex Flows: Challenges and Benchmarks. Guest editors: Luca Biferale, Michele Buzzicotti, Massimo Cencini.
Appendix A: Q-learning implementation
To solve the optimization problem, we used the Q-learning algorithm [35], which is based on evaluating the action-value function, Q(s, a), i.e., the expected future cumulative reward given that the agents are in state s and take action a. The algorithm is expected to converge to the optimal policy through the following iterative trial-and-error protocol. At each decision time \(t_j\), the agent pair measures its state \(s_{t_j}\) and selects an action \(a_{t_j}\) using an \(\epsilon \)-greedy strategy: \(a_{t_j}(s_{t_j})=\arg \max _a\{Q(s_{t_j},a)\}\) with probability \(1-\epsilon \), or \(a_{t_j}\) is chosen randomly with probability \(\epsilon \). Then, we let the dynamical system evolve for a time \(\tau \), according to (1), keeping both control directions and velocity intensity fixed. Afterward, the agents receive a reward \(r_\textrm{tot}(t_{j+1})\) (11) and the Q-matrix is updated as
\[ Q(s_{t_j},a_{t_j}) \leftarrow Q(s_{t_j},a_{t_j}) + \alpha \left[ r_\textrm{tot}(t_{j+1}) + \max _{a} Q(s_{t_{j+1}},a) - Q(s_{t_j},a_{t_j}) \right], \]
where \(\alpha \) is the learning rate. Updates are repeated up to the end of the episode, \(t=T_{\max }\), when no reward is assigned. The learning protocol is then repeated, restarting with another pair with the same initial distance at another flow position, until we reach a “local” optimum, given by the equation \(Q^*(s_{t_j},a)= r_\textrm{tot}(t_{j+1}) +\max _{a} Q^*(s_{t_{j+1}},a)\) and defined by the policy \(\pi ^*(s)=\arg \max _a\{Q^*(s,a)\}\).
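The tabular update described above can be sketched in Python. This is a minimal illustration, not the authors' code: the integer state/action encoding and the reward value are placeholders, and the update is the standard undiscounted Q-learning step consistent with the fixed point quoted in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, s, epsilon):
    """Pick a random action with probability epsilon,
    otherwise the greedy action argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def q_update(Q, s, a, r, s_next, alpha):
    """Undiscounted Q-learning step; its fixed point satisfies
    Q*(s, a) = r + max_a' Q*(s_next, a')."""
    td_error = r + Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td_error
    return Q
```

At the final decision time of an episode (\(t=T_{\max }\)) no reward is assigned, so no update would follow, consistent with the protocol above.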
In order to ease the convergence of the algorithm, the learning rate \(\alpha \) is taken as a decreasing function of the time spent in each state-action pair, while the exploration parameter decreases with the time spent in each visited state. Thus, if n(s, a) denotes the number of decision times at which the pair (s, a) has been visited, and n(s) the number of visits to state s, both \(\epsilon \) and \(\alpha \) decay as power laws of these counts with exponent \(\gamma =4/5\); the numerical values of the prefactors were determined after some preliminary tests. As for the initialization of the matrix Q, we assigned the same large (optimistic) value to all state-action pairs.
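A hedged sketch of such visit-count-based schedules follows. The power-law form with exponent \(\gamma =4/5\) is taken from the text; the prefactors `alpha0` and `eps0` are placeholders, since the paper only states that the constants were tuned in preliminary tests.

```python
def decayed_rates(n_sa, n_s, gamma=0.8, alpha0=1.0, eps0=1.0):
    """Learning rate decays with visits to the state-action pair,
    exploration rate with visits to the state.
    alpha0 and eps0 are assumed prefactors, not values from the paper."""
    alpha = alpha0 / (1 + n_sa) ** gamma
    epsilon = eps0 / (1 + n_s) ** gamma
    return alpha, epsilon
```

Both rates start at their prefactor values and decrease monotonically as the corresponding counters grow, which is what eases convergence while still guaranteeing early exploration.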
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Cite this article
Calascibetta, C., Biferale, L., Borra, F. et al. Taming Lagrangian chaos with multi-objective reinforcement learning. Eur. Phys. J. E 46, 9 (2023). https://doi.org/10.1140/epje/s10189-023-00271-0