Abstract
Existing methods for optimal control struggle to deal with the complexity commonly encountered in real-world systems, including dimensionality, process error, model bias and data heterogeneity. Instead of tackling these system complexities directly, researchers have typically sought to simplify models to fit optimal control methods. But when is the optimal solution to an approximate, stylized model better than an approximate solution to a more accurate model? While this question has largely gone unanswered owing to the difficulty of finding even approximate solutions for complex models, recent algorithmic and computational advances in deep reinforcement learning (DRL) might finally allow us to address it. DRL methods have to date been applied primarily in the context of games or robotic mechanics, which operate under precisely known rules. Here, we demonstrate the ability of DRL algorithms using deep neural networks to successfully approximate solutions (the “policy function” or control rule) in a non-linear three-variable model for a fishery without knowing or ever attempting to infer a model for the process itself. We find that the reinforcement learning agent discovers a policy that outperforms both constant escapement and constant mortality policies—the standard family of policies considered in fishery management. This DRL policy has the shape of a constant escapement policy whose escapement values depend on the stock sizes of other species in the model.
Notes
A repository with all the relevant code to reproduce our results may be found at https://github.com/boettiger-lab/approx-model-or-approx-soln in the “src” subdirectory. The data used is found in the “data” subdirectory, but the user may use the code provided to generate new data sets.
As will be explained later, all our models are stochastic. If we set stochasticity to zero in Model 1, CMort matches the performance of the other management strategies.
In our mathematical formulation of the decision problem, we have assumed for simplicity that the fishing effort cost is zero and that fish price is stable over time. This way, we equate economic output with harvested biomass.
In this sense, it is important to note that the classical management strategies we compare against have a similar flow of information: data are used to estimate a dynamical model, and that model is used to generate a policy function. The difference from our approach lies in *how* the model is used to optimize a policy. Because of this difference, RL-based approaches can produce good heuristic solutions for complex problems.
Transition operators are commonly discussed without having a direct time-dependence for simplicity, but the inclusion of t as an argument to T does not alter the structure of the learning problem appreciably.
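To make this concrete, the following minimal sketch (not the paper's actual simulation code; the drift coefficient, carrying capacity and noise scale are assumptions chosen for illustration) shows a one-step transition `T(s, a, t)` for a single stock, where the explicit time argument encodes non-stationarity. Folding `t` into the state vector recovers the standard time-independent formulation, which is why the learning problem is not appreciably altered.

```python
import numpy as np

def transition(state, action, t, rng):
    """Hypothetical one-step transition T(s, a, t) for a 1-D stock in [0, 1].

    The time argument encodes non-stationarity via a slowly drifting
    growth rate; all parameter values here are illustrative assumptions.
    """
    r = 0.8 - 0.001 * t           # assumed linear drift in growth rate
    K = 1.0                       # assumed carrying capacity
    s = max(state - action, 0.0)  # harvest is removed first
    # Logistic growth plus multiplicative process noise:
    s_next = s + r * s * (1.0 - s / K) + 0.05 * s * rng.normal()
    return float(np.clip(s_next, 0.0, 1.0))
```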
Policies are, in general, functions from state space to policy space. In our paper, these are \(\pi :[0,1]^{\times 3}\rightarrow {\mathbb {R}}_+\) for the single fishery case, and \(\pi :[0,1]^{\times 3}\rightarrow {\mathbb {R}}_+^2\) for two fisheries. The space of all such functions is highly singular, spanning a non-separable Hilbert space. Even restricting ourselves to continuous policy functions, we end up with a set of policies which span the infinite dimensional space \(L^2([0,1]^{\times 3})\). One way to avoid optimizing over an infinite dimensional ambient space is to discretize state space into a set of bins. This approach runs into tractability problems: First, the dimension of policy space scales exponentially with the number of species. Second, even for a fixed number of species (e.g., 3), the dimension optimized over can be prohibitively large—for example, if one uses 1000 bins for each population in a three-species model, the overall number of parameters being optimized over is \(10^9\). Neural networks with a much smaller number of parameters, on the other hand, can be quite expressive and sufficient to find a rather good (if not optimal) policy function.
All our agents were trained on a local server with two commercial GPUs. The training time was between 30 min and one hour in each case.
As noted before, here we equate economic profit with biomass caught. This is done as an approximation to convey the conceptual message more clearly, and we do not expect our results to significantly change if, e.g., “effort cost” is included in the reward function. When we refer to “large differences” in profit, or “paying dearly,” we mean that the ratio between average rewards is considerable—e.g. a 15% loss in profit.
The raw dataset is found at the data/results_data/2FISHERY/RXDRIFT sub-directory in the repository with the source code and data linked above. Scatter plots visualizing this policy are shown in Appendix B.
Acknowledgements
The title of this piece references a mathematical biology workshop at NIMBioS organized by Paul Armsworth, Alan Hastings, Megan Donahue, and Carl Towes in 2011 which first sought to emphasize ‘pretty darn good’ control solutions to more realistic problems over optimal control to idealized ones. This material is based upon work supported by the National Science Foundation under Grant No. DBI-1942280.
Appendices
Appendix: Results for Stationary Models
In the main text we focused on the non-stationary model (“three species, two fisheries, non-stationary” in Table 1) for the sake of space and because our results were most compelling there. Here we present the reward distributions for the other models considered—the three stationary models, rows 1–3 of Table 1. These results are shown in Figs. 11, 12 and 13.
Appendix: PPO Policy Function for Non-Stationary Model
In the main text, Fig. 7, we presented a visualization of the PPO+GP policy function obtained for the “three species, two fisheries, non-stationary” model. This policy function is a Gaussian process regression of scatter data of the PPO policy function. In Fig. 14 we present a representation of this scatter data in a similar format as Fig. 7.
Appendix: Gaussian Process Interpolation
Here we summarize the procedure used to interpolate the PPO policy (visualized in Fig. 14). We use the GaussianProcessRegressor object of the sklearn Python library with a kernel given by the sum of an RBF component, with a fixed length scale, and a white-noise component, with a fixed noise level.
This interpolation method is applied to scatter data of the PPO policy evaluated on 3 different grids on \((X,Y,Z)\) states: \(G_X\), a \(51\times 5 \times 5\) grid; \(G_Y\), a \(5\times 51 \times 5\) grid; and \(G_Z\), a \(5\times 5 \times 51\) grid. This combination of grids was used instead of a single dense grid in order to reduce the computational intensity of the interpolation procedure. For \(G_X\), the 5 values for \(Y\) and \(Z\) were varied in a “popular window,” i.e. episode time-series data was used to determine windows of \(Y\) and \(Z\) values which were most likely. The grids \(G_Y\) and \(G_Z\) were generated in a similar fashion, mutatis mutandis. The length scale and noise level values of this kernel were chosen arbitrarily—no hyperparameter tuning was needed to produce satisfactory interpolation, as will be shown in the results section.
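The interpolation step can be sketched as follows. This is not the paper's code (see the linked repository for that): the scatter data is replaced by a toy escapement-like rule, and the length scale and noise level are placeholder values standing in for the arbitrarily chosen ones mentioned above. Passing `optimizer=None` keeps the kernel hyperparameters fixed, matching the statement that no tuning was performed.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Stand-in for the PPO policy queried on the grids G_X, G_Y, G_Z:
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))     # (X, Y, Z) states in [0, 1]^3
y = np.clip(X[:, 0] - 0.3, 0.0, None)        # toy escapement-like policy

# RBF term (fixed length scale) plus white-noise term (fixed noise level);
# optimizer=None disables hyperparameter fitting, as in the text.
kernel = RBF(length_scale=0.3) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, optimizer=None).fit(X, y)

query = np.array([[0.5, 0.2, 0.2]])
print(gp.predict(query))  # smoothed harvest value at the queried state
```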
Cite this article
Montealegre-Mora, F., Lapeyrolerie, M., Chapman, M. et al. Pretty Darn Good Control: When are Approximate Solutions Better than Approximate Models. Bull Math Biol 85, 95 (2023). https://doi.org/10.1007/s11538-023-01198-5