
Taming Lagrangian chaos with multi-objective reinforcement learning

  • Regular Article - Flowing Matter
  • The European Physical Journal E

Abstract

We consider the problem of two active particles in 2D complex flows with the multi-objective goals of minimizing both the dispersion rate and the control activation cost of the pair. We approach the problem by means of multi-objective reinforcement learning (MORL), combining scalarization techniques with a Q-learning algorithm, for Lagrangian drifters that have variable swimming velocity. We show that MORL is able to find a set of trade-off solutions forming an optimal Pareto frontier. As a benchmark, we show that a set of heuristic strategies is dominated by the MORL solutions. We consider the situation in which the agents cannot update their control variables continuously, but only after a discrete (decision) time, \(\tau \). We show that there is a range of decision times, between the Lyapunov time and the continuous updating limit, where reinforcement learning finds strategies that significantly improve over heuristics. In particular, we discuss how large decision times require enhanced knowledge of the flow, whereas for smaller \(\tau \) all a priori heuristic strategies become Pareto optimal.
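
A minimal Python sketch of the scalarization idea, under the assumption of a simple linear weighting of the two rewards; the weight, the reward names, and the maximization convention are illustrative and do not reproduce the paper's own reward definition. Sweeping the weight trains a family of agents whose non-dominated performances approximate the Pareto frontier.

import numpy as np

def scalarize(r_dispersion, r_control, w):
    # Linear scalarization: a single weight w in [0, 1] blends the
    # dispersion-rate reward with the (negative) control-activation cost.
    # Each value of w defines one single-objective learning problem.
    return w * r_dispersion + (1.0 - w) * r_control

def pareto_front(points):
    # Keep only the non-dominated (objective_1, objective_2) pairs,
    # assuming both objectives are to be maximized.
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = np.any(np.all(pts >= p, axis=1) & np.any(pts > p, axis=1))
        if not dominated:
            keep.append(i)
    return pts[keep]

# Hypothetical usage: each row is the measured performance of one trained policy.
candidates = [(-0.8, -0.1), (-0.5, -0.4), (-0.6, -0.5), (-0.2, -0.9)]
print(pareto_front(candidates))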


Data availability statement

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

References

  1. P. Lermusiaux, D. Subramani, J. Lin, C.S. Kulkarni, A. Gupta, A. Dutt, T. Lolla, P.J. Haley Jr., W. Hajj Ali, C. Mirabito, S. Jana, A future for intelligent autonomous ocean observing systems. J. Mar. Res. 75, 765–813 (2017)

  2. Y. Elor, A.M. Bruckstein, Two-robot source seeking with point measurements. Theor. Comput. Sci. 457, 76–85 (2012)

  3. W. Wu, I.D. Couzin, F. Zhang, Bio-inspired source seeking with no explicit gradient estimation. IFAC Proceedings Volumes 45(26), 240–245 (2012). (3rd IFAC Workshop on Distributed Estimation and Control in Networked Systems)

  4. FSTaxis Algorithm: Bio-Inspired Emergent Gradient Taxis, in ALIFE 2016: The Fifteenth International Conference on the Synthesis and Simulation of Living Systems (2016)

  5. C. Bechinger, R. Di Leonardo, H. Löwen, C. Reichhardt, G. Volpe, G. Volpe, Active particles in complex and crowded environments. Rev. Mod. Phys. 88(4), 045006 (2016)

  6. A. Crisanti, M. Falcioni, A. Vulpiani, G. Paladin, Lagrangian chaos: transport, mixing and diffusion in fluids. Riv. Nuovo Cim. 14(12), 1–80 (1991)

  7. M. Cencini, F. Cecconi, A. Vulpiani, Chaos: From Simple Models to Complex Systems. Series on Advances in Statistical Mechanics (World Scientific, Singapore, 2010)

  8. F. Ginelli, The physics of the Vicsek model. Eur. Phys. J. Spec. Top. 225(11), 2099–2117 (2016)

  9. M.C. Marchetti, J.F. Joanny, S. Ramaswamy, T.B. Liverpool, J. Prost, M. Rao, R. Aditi Simha, Hydrodynamics of soft active matter. Rev. Mod. Phys. 85, 1143–1189 (2013)

  10. M. Ballerini, N. Cabibbo, R. Candelier, A. Cavagna, E. Cisbani, I. Giardina, V. Lecomte, A. Orlandi, G. Parisi, A. Procaccini, M. Viale, V. Zdravkovic, Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. Proc. Natl. Acad. Sci. 105(4), 1232–1237 (2008)

  11. N. Khurana, N.T. Ouellette, Stability of model flocks in turbulent-like flow. New J. Phys. 15(9), 095015 (2013)

  12. L. Biferale, F. Bonaccorso, M. Buzzicotti, P. Clark Di Leoni, K. Gustavsson, Zermelo’s problem: Optimal point-to-point navigation in 2d turbulent flows using reinforcement learning. Chaos 29(10), 103138 (2019)

  13. M. Buzzicotti, L. Biferale, F. Bonaccorso, P. Clark di Leoni, K. Gustavsson, Optimal control of point-to-point navigation in turbulent time dependent flows using reinforcement learning, in AIxIA 2020—Advances in Artificial Intelligence (Springer, Cham, 2021), pp. 223–234

  14. J.K. Alageshan, A.K. Verma, J. Bec, R. Pandit, Machine learning strategies for path-planning microswimmers in turbulent flows. Phys. Rev. E 101, 043110 (2020)

  15. G. Reddy, A. Celani, T.J. Sejnowski, M. Vergassola, Learning to soar in turbulent environments. Proc. Natl. Acad. Sci. 113(33), E4877–E4884 (2016)

  16. G. Reddy, J. Wong-Ng, A. Celani, T.J. Sejnowski, M. Vergassola, Glider soaring via reinforcement learning in the field. Nature 562(7726), 236–239 (2018)

  17. N. Orzan, C. Leone, A. Mazzolini, J. Oyero, A. Celani, Optimizing airborne wind energy with reinforcement learning. Eur. Phys. J. E 46, 2 (2023)

  18. S. Verma, G. Novati, P. Koumoutsakos, Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proc. Natl. Acad. Sci. 115(23), 5849–5854 (2018)

  19. Z. Zou, Y. Liu, Y.N. Young, O.S. Pak, A.C.H. Tsang, Gait switching and target navigation of microswimmers via deep reinforcement learning. Commun. Phys. 5(1), 158 (2022)

  20. J. Qiu, N. Mousavi, L. Zhao, K. Gustavsson, Active gyrotactic stability of microswimmers using hydromechanical signals. Phys. Rev. Fluids 7(1), 014311 (2022)

  21. A. Daddi-Moussa-Ider, H. Löwen, B. Liebchen, Hydrodynamics can determine the optimal route for microswimmer navigation. Commun. Phys. 4, 15 (2021)

  22. F. Borra, L. Biferale, M. Cencini, A. Celani, Reinforcement learning for pursuit and evasion of microswimmers at low Reynolds number. Phys. Rev. Fluids 7(2), 023103 (2022)

  23. G. Zhu, W. Fang, L. Zhu, Optimizing low-Reynolds-number predation via optimal control and reinforcement learning. J. Fluid Mech. 944, A3 (2022)

  24. S. Goh, R. Winkler, G. Gompper, Noisy pursuit and pattern formation of self-steering active particles. New J. Phys. 24, 093039 (2022)

  25. C.A.C. Coello, Handling preferences in evolutionary multiobjective optimization: a survey, in Proceedings of the 2000 Congress on Evolutionary Computation, CEC00 (Cat. No. 00TH8512), vol. 1 (2000), pp. 30–37

  26. C. Coello, D. Veldhuizen, G. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, 2nd edn. (Springer US, 2007)

  27. C. Liu, X. Xu, D. Hu, Multiobjective reinforcement learning: a comprehensive overview. IEEE Trans. Syst. Man Cybern. Syst. 45(3), 385–398 (2015)

  28. P. Vamplew, R. Dazeley, A. Berry, R. Issabekov, E. Dekker, Empirical evaluation methods for multiobjective reinforcement learning algorithms. Mach. Learn. 84(1–2), 51–80 (2011)

  29. S. Natarajan, P. Tadepalli, Dynamic preferences in multi-criteria reinforcement learning, in Proceedings of the 22nd International Conference on Machine Learning, ICML’05 (Association for Computing Machinery, New York, NY, USA, 2005), pp. 601–608

  30. A. Castelletti, G. Corani, A.E. Rizzoli, R. Soncini-Sessa, E. Weber, Reinforcement Learning in the Operational Management of a Water System (Pergamon Press, Oxford, 2002), p. 325

  31. P. Vamplew, J. Yearwood, R. Dazeley, A. Berry, On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts, in AI 2008: Advances in Artificial Intelligence (Springer, Berlin, Heidelberg, 2008), pp. 372–378

  32. E. Zitzler, L. Thiele, M. Laumanns, C.M. Fonseca, V.G. da Fonseca, Performance assessment of multiobjective optimizers: an analysis and review. IEEE Trans. Evol. Comput. 7(2), 117–132 (2003)

  33. J. Bec, Multifractal concentrations of inertial particles in smooth random flows. J. Fluid Mech. 528, 255–277 (2005)

  34. C.W. Gardiner, Handbook of Stochastic Methods for Physics, Chemistry, and the Natural Sciences. Springer complexity (Springer, Berlin, 2004)

  35. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, 2018)

Acknowledgements

This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 882340).

Author information

Contributions

All authors conceived the research. CC performed all the numerical simulations and data analysis. All authors discussed the results. CC wrote the paper with revision and input from all the authors.

Corresponding author

Correspondence to Chiara Calascibetta.

Additional information

Topical Issue: Quantitative AI in Complex Fluids and Complex Flows: Challenges and Benchmarks. Guest editors: Luca Biferale, Michele Buzzicotti, Massimo Cencini.

Appendix A: Q-learning implementation

To solve the optimization problem, we used the Q-learning algorithm [35], which is based on evaluating the action-value function, \(Q(s,a)\), that is, the expected future cumulative reward given that the agents are in state s and take action a. The algorithm is expected to converge to the optimal policy by the following iterative trial-and-error protocol. At each decision time \(t_j\), the agent pair measures its state \(s_{t_j}\) and selects an action \(a_{t_j}\) using an \(\epsilon \)-greedy strategy: \(a_{t_j}(s_{t_j})=\arg \max _a\{Q(s_{t_j},a)\}\) with probability \(1-\epsilon \), or \(a_{t_j}\) is chosen randomly with probability \(\epsilon \). Then, we let the dynamical system evolve for a time \(\tau \), according to (1), keeping both the control directions and the velocity intensity fixed. Afterward, the agents receive a reward \(r_\textrm{tot}(t_{j+1})\) (11) and the Q-matrix is updated as

$$\begin{aligned} Q(s_{t_j},a_{t_j}) \leftarrow&\, Q(s_{t_j},a_{t_j}) + \alpha \big [ r_\textrm{tot}(t_{j+1}) \\&+ \max _{a} Q(s_{t_{j+1}},a)-Q(s_{t_j},a_{t_j}) \big ], \end{aligned}$$
(17)

where \(\alpha \) is the learning rate. Updates are repeated up to the end of the episode, \(t=T_{\max }\), when no reward is assigned. The learning protocol is then repeated, restarting with another pair at the same initial distance in a different position of the flow, until we reach a “local” optimum, given by the equation \(Q^*(s_{t_j},a_{t_j})= r_\textrm{tot}(t_{j+1}) +\max _{a} Q^*(s_{t_{j+1}},a)\) and defined by the policy

$$\begin{aligned} a(s)=\arg \max _a\{Q^*(s,a)\}. \end{aligned}$$
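
For concreteness, a minimal tabular sketch of this trial-and-error loop is given below in Python. The environment interface (reset/step), the integer state and action encodings, and the handling of the episode end are illustrative placeholders rather than the implementation used in the paper; the update itself follows Eq. (17), with the evolution over a decision time \(\tau \) hidden inside env.step.

import numpy as np

def q_learning_episode(Q, env, epsilon, alpha, rng):
    # One episode of the tabular epsilon-greedy Q-learning loop.
    # Q is an (n_states, n_actions) array; env is a placeholder object whose
    # step(a) evolves the pair for one decision time tau with fixed controls
    # and returns (next_state, r_tot, done); rng = np.random.default_rng().
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            a = int(rng.integers(Q.shape[1]))
        else:
            a = int(np.argmax(Q[s]))
        # evolve the dynamics for one decision time with fixed controls
        s_next, r_tot, done = env.step(a)
        # Q-update of Eq. (17); no bootstrap term is added after the last step
        target = r_tot if done else r_tot + np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q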

In order to ease the convergence of the algorithm, the learning rate \(\alpha \) is taken as a decreasing function of the time spent in the state-action pair, while the exploration parameter decreases with the time spent in the visited state. Thus, if \(n(s,a)\) is the number of decision times at which the pair \((s,a)\) has been visited, and

$$\begin{aligned} n(s)=\sum _a n(s,a)/|{\mathcal {A}}|, \end{aligned}$$
(18)

\(\epsilon \) and \(\alpha \) are taken as:

$$\begin{aligned} \alpha&= 5/\big [200^{1/\gamma }+\tau \, n(s,a)\big ]^{\gamma } \end{aligned}$$
(19)
$$\begin{aligned} \epsilon&= 5/\big [200^{1/\gamma }+\tau \, n(s)\big ]^{\gamma } \end{aligned}$$
(20)

with \(\gamma =4/5\); the numerical values of the constants were determined after some preliminary tests. As for the initialization of the matrix Q, we took the same large (optimistic) value for all state-action pairs.
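
The schedules (19)-(20), the visit counter of Eq. (18), and the optimistic initialization can be coded as in the Python sketch below. The table sizes, the value of \(\tau \), and the initial optimistic value are illustrative assumptions; only \(\gamma =4/5\) and the functional form of the schedules come from the text.

import numpy as np

# Illustrative sizes and constants (placeholders, not the paper's values).
n_states, n_actions, tau = 64, 8, 1.0
gamma = 4.0 / 5.0

# Optimistic initialization: the same large value for every state-action pair
# encourages exploration of rarely visited actions early in the learning.
Q = np.full((n_states, n_actions), 10.0)

# n_sa[s, a] counts the decision times spent in the pair (s, a);
# its per-state average over the |A| actions gives n(s) of Eq. (18).
n_sa = np.zeros((n_states, n_actions))

def alpha_schedule(n_sa_value):
    # Learning rate of Eq. (19), decreasing with visits to (s, a).
    return 5.0 / (200.0 ** (1.0 / gamma) + tau * n_sa_value) ** gamma

def epsilon_schedule(n_s_value):
    # Exploration parameter of Eq. (20), decreasing with visits to s.
    return 5.0 / (200.0 ** (1.0 / gamma) + tau * n_s_value) ** gamma

# Example bookkeeping after one decision step in state s with action a.
s, a = 3, 1
n_sa[s, a] += 1
n_s = n_sa[s].sum() / n_actions          # Eq. (18)
alpha, epsilon = alpha_schedule(n_sa[s, a]), epsilon_schedule(n_s)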

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Calascibetta, C., Biferale, L., Borra, F. et al. Taming Lagrangian chaos with multi-objective reinforcement learning. Eur. Phys. J. E 46, 9 (2023). https://doi.org/10.1140/epje/s10189-023-00271-0
