Learning Equations from Biological Data with Limited Time Samples

  • Original Article
Bulletin of Mathematical Biology

Abstract

Equation learning methods present a promising tool to aid scientists in the modeling process for biological data. Previous equation learning studies have demonstrated that these methods can infer models from rich datasets; however, the performance of these methods in the presence of common challenges from biological data has not been thoroughly explored. We present an equation learning methodology comprised of data denoising, equation learning, model selection and post-processing steps that infers a dynamical systems model from noisy spatiotemporal data. The performance of this methodology is thoroughly investigated in the face of several common challenges presented by biological data, namely, sparse data sampling, large noise levels, and heterogeneity between datasets. We find that this methodology can accurately infer the correct underlying equation and predict unobserved system dynamics from a small number of time samples when the data are sampled over a time interval exhibiting both linear and nonlinear dynamics. Our findings suggest that equation learning methods can be used for model discovery and selection in many areas of biology when an informative dataset is used. We focus on glioblastoma multiforme modeling as a case study in this work to highlight how these results are informative for data-driven modeling-based tumor invasion predictions.

Acknowledgements

Funding was provided by National Science Foundation (Grant Nos. 1638521, IOS-1838314), National Institute on Aging (Grant No. R21AG059099), National Institutes of Health (Grant No. U01CA220378), James S. McDonnell Foundation (Grant No. 220020264) and Engineering and Physical Sciences Research Council (Grant No. EP/N50970X/1).

Author information

Corresponding author

Correspondence to John T. Nardini.

Additional information

This material was based upon work partially supported by the National Science Foundation under Grant DMS-1638521 to the Statistical and Applied Mathematical Sciences Institute and IOS-1838314 to KBF, and in part by National Institute of Aging Grant R21AG059099 to KBF. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. BM gratefully acknowledges Ph.D. studentship funding from the UK EPSRC (reference EP/N50970X/1). AHD, LC, and KRS gratefully acknowledge funding through the NIH U01CA220378 and the James S. McDonnell Foundation 220020264.

Appendices

Simulating a Learned Equation

To simulate the inferred equation represented by the sparse vector \({\hat{\xi }}\), we begin by removing all zero terms from \({\hat{\xi }}\) as well as the corresponding terms from \(\varTheta \). We can now define our inferred dynamical systems model as

$$\begin{aligned} u_t = \sum _i {\hat{\xi }}_i \varTheta _i. \end{aligned} \tag{13}$$
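The pruning step can be illustrated with a minimal sketch. The function name `prune_library`, the term names, the coefficient values, and the random library matrix below are all hypothetical and are not taken from the paper; the sketch only assumes that \(\varTheta \) is stored as an array with one column per candidate term.

```python
import numpy as np


def prune_library(xi_hat, theta, names):
    """Drop library terms whose learned coefficient is exactly zero."""
    xi_hat = np.asarray(xi_hat, dtype=float)
    keep = xi_hat != 0.0  # boolean mask of the surviving terms
    kept_names = [name for name, k in zip(names, keep) if k]
    return xi_hat[keep], theta[:, keep], kept_names


# Hypothetical library of five candidate terms and a sparse coefficient vector.
names = ["u", "u^2", "u_x", "u_xx", "u*u_x"]
xi_hat = np.array([10.0, -10.0, 0.0, 0.02, 0.0])
rng = np.random.default_rng(0)
theta = rng.normal(size=(50, len(names)))  # placeholder library evaluations

xi, theta_kept, kept = prune_library(xi_hat, theta, names)

# Right-hand side of Eq. (13) evaluated at the 50 sample points.
u_t = theta_kept @ xi
```

Because the removed coefficients are zero, the pruned product equals the product with the full library, so pruning changes only the size of the model that has to be simulated.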

We use the method of lines approach to simulate this equation, in which we discretize the right-hand side in space and then integrate along the t dimension. The SciPy integration subpackage (version 1.4.1) is used to integrate this equation over time using an explicit fourth-order Runge–Kutta method. We ensure that the simulation is stable by enforcing the CFL condition for an advection equation with speed \(2\sqrt{Dr}\), i.e., \(2\sqrt{Dr}\Delta t \le \Delta x\). Some inferred equations may not be well-posed, e.g., \(u_t=-u_{xx}\). If the time integration fails at any point, we manually set the model output to \(10^6\) everywhere to ensure this model is not selected as a final inferred model.
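A minimal sketch of this procedure is given here, assuming the inferred equation is the Fisher–KPP equation \(u_t = Du_{xx} + ru(1-u)\) with zero-flux boundaries. The function name `simulate_fisher_kpp`, the grid, the initial condition, and the parameter values are hypothetical and not from the paper. SciPy's `solve_ivp` with `method="RK45"` is an adaptive explicit Runge–Kutta scheme and is used as a stand-in for the integrator described above; its `max_step` argument caps the time step at the CFL bound.

```python
import numpy as np
from scipy.integrate import solve_ivp


def simulate_fisher_kpp(u0, x, t_eval, D, r, fail_value=1e6):
    """Method-of-lines simulation of u_t = D*u_xx + r*u*(1 - u)."""
    dx = x[1] - x[0]

    def rhs(t, u):
        # Central difference for u_xx; mirrored ghost points give zero-flux
        # boundaries (an assumption made for this sketch).
        u_pad = np.concatenate(([u[1]], u, [u[-2]]))
        u_xx = (u_pad[2:] - 2.0 * u_pad[1:-1] + u_pad[:-2]) / dx**2
        return D * u_xx + r * u * (1.0 - u)

    # CFL bound from the text: 2*sqrt(D*r)*dt <= dx.
    max_step = dx / (2.0 * np.sqrt(D * r))

    try:
        sol = solve_ivp(rhs, (t_eval[0], t_eval[-1]), u0, method="RK45",
                        t_eval=t_eval, max_step=max_step)
        ok = sol.success and bool(np.all(np.isfinite(sol.y)))
    except (ValueError, FloatingPointError, OverflowError):
        ok = False

    if not ok:
        # Failed integration: return a large constant everywhere so that
        # model selection rejects this candidate.
        return np.full((len(u0), len(t_eval)), fail_value)
    return sol.y


# Hypothetical grid, initial condition, and parameters.
x = np.linspace(0.0, 1.0, 101)
t_eval = np.linspace(0.0, 0.5, 5)
u0 = 0.1 * np.exp(-x**2 / 0.005)
u = simulate_fisher_kpp(u0, x, t_eval, D=1e-3, r=10.0)
```

The returned array has one row per spatial grid point and one column per requested time, so it can be compared directly against data sampled on the same grid.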

For the final inferred columns of \(\varTheta =[\varTheta _1 , \varTheta _2 , \dots , \varTheta _n]\), we define nonlinear stencils \(A_{\varTheta _i}\) such that \(A_{\varTheta _i}u\approx \varTheta _i\). As an example, we use an upwind stencil (LeVeque 2007) for first-order derivative terms, such as \(A_{u_x}\), so that \(A_{u_x}u\approx u_x\). We use a central difference stencil for \(A_{u_{xx}}\). For multiplicative terms, we define the stencil for \(A_{uu_x}\) as \(A_{uu_x}v=u\odot (A_{u_x}v),\) where \(\odot \) denotes element-wise multiplication so that \(A_{uu_x}u \approx uu_x\). Similarly, we set \(A_{u_xu_{xx}}=A_{u_x}A_{u_{xx}}\), etc.
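The stencils can be sketched as dense matrices acting on a vector of grid values. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the upwind direction is taken to be a backward difference (appropriate for rightward transport), and the boundary rows use a one-sided difference for \(u_x\) and mirrored ghost points for \(u_{xx}\).

```python
import numpy as np


def upwind_matrix(n, dx):
    """First-order backward (upwind) difference approximating u_x."""
    A = (np.eye(n) - np.eye(n, k=-1)) / dx
    # Left boundary has no upstream neighbor: use a forward difference there.
    A[0, :] = 0.0
    A[0, 0], A[0, 1] = -1.0 / dx, 1.0 / dx
    return A


def central_second_matrix(n, dx):
    """Second-order central difference approximating u_xx."""
    A = (np.eye(n, k=1) - 2.0 * np.eye(n) + np.eye(n, k=-1)) / dx**2
    # Zero-flux boundaries via mirrored ghost points (assumed).
    A[0, 1] = 2.0 / dx**2
    A[-1, -2] = 2.0 / dx**2
    return A


def apply_u_ux(u, v, A_ux):
    """Nonlinear stencil A_{u u_x} v = u * (A_{u_x} v), element-wise."""
    return u * (A_ux @ v)


# Check the stencils on a smooth test function with zero slope at both ends.
n = 201
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
u = np.cos(np.pi * x)

A_ux = upwind_matrix(n, dx)
A_uxx = central_second_matrix(n, dx)

ux_err = np.max(np.abs(A_ux @ u + np.pi * np.sin(np.pi * x)))
uxx_err = np.max(np.abs(A_uxx @ u + np.pi**2 * u))
uux = apply_u_ux(u, u, A_ux)
uux_err = np.max(np.abs(uux + np.pi * u * np.sin(np.pi * x)))
```

The first-order upwind stencil is noticeably less accurate than the central stencil on the same grid, which is the usual trade-off for its stability on advective terms.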

Table 5 Learned 1d equations from our equation learning methodology for all simulations with 5% noisy data

Learning the 1d Fisher–KPP Equation with 5% Noisy Data

In Table 5, we present the inferred equations for all 1d datasets considered with \(\sigma = 0.05\).

The slow simulation on the short time interval. For noisy data sampled over the short time interval for the slow simulation, our equation learning methodology does not infer the correct underlying equation for any values of N considered. Simulating the inferred equation for \(N=10\) time samples over the short time scale does not lead to an accurate description of the true underlying dynamics on the short time interval or prediction of the true dynamics on the long time interval.

The slow simulation on the long time interval. Over the long time interval, our equation learning methodology does infer the Fisher–KPP equation with \(N=10\) time samples. Simulating the inferred equation for \(N=10\) time samples over the long time scale accurately matches the true underlying dynamics on the long time interval and accurately predicts the true dynamics on the short time interval.

The diffuse simulation on the short time interval. For noisy data sampled over the short time interval for the diffuse simulation, our equation learning methodology does not infer the correct underlying equation for any values of N considered. Simulating the inferred equation for \(N=10\) time samples over the short time scale does not lead to an accurate description of the true underlying dynamics on the short time interval or prediction of the true dynamics on the long time interval.

The diffuse simulation on the long time interval. Over the long time interval, our equation learning methodology does infer the Fisher–KPP equation with \(N=3\) time samples. Simulating the inferred equation for \(N=10\) time samples over the long time scale accurately matches the true underlying dynamics on the long time interval and accurately predicts the true dynamics on the short time interval (Fig. 11 in “Appendix C”).

The fast simulation on the short time interval. For noisy data sampled over the short time interval for the fast simulation, our equation learning methodology infers the Fisher–KPP equation with \(N=10\) time samples. Simulating the inferred equation for \(N=10\) time samples over the short time scale accurately matches the true underlying dynamics on the short time interval and accurately predicts the true dynamics on the long time interval (Fig. 11 in “Appendix C”).

The fast simulation on the long time interval. Over the long time interval, our equation learning methodology does not infer the correct underlying equation for any values of N considered. Simulating the inferred equation for \(N=10\) time samples over the long time scale does not lead to an accurate description of the true underlying dynamics on the long time interval or prediction of the true dynamics on the short time interval.

The nodular simulation on the short time interval. For noisy data sampled over the short time interval for the nodular simulation, our equation learning methodology infers the Fisher–KPP equation with \(N=3\) time samples. Simulating the inferred equation for \(N=10\) time samples over the short time scale does not lead to an accurate description of the true underlying dynamics on the short time interval or prediction of the true dynamics on the long time interval.

The nodular simulation on the long time interval. Over the long time interval, our equation learning methodology infers the Fisher–KPP equation with \(N=10\) time samples. Simulating the inferred equation for \(N=10\) time samples over the long time scale accurately matches the true underlying dynamics on the long time interval and accurately predicts the true dynamics on the short time interval.

Fit and Predicted Dynamics

The fit and predicted system dynamics for the fast, diffuse, and nodular simulations with 1% noise and \(N=5\) time samples are depicted in Figs. 8, 9, and 10, respectively. The fit and predicted dynamics for the diffuse and fast simulations with 5% noise and \(N=10\) time samples are depicted in Fig. 11.

Fig. 8 Fit and predicted dynamics for the fast simulation with \(N=5\) time samples and \(1\%\) noise. a The simulated learned equation for the fast simulation that was inferred from data sampled over the time interval [0,0.5]. b The model that was inferred over the time interval [0,0.5] is used to predict the dynamics over the time interval [0,3]. c The simulated learned equation for the fast simulation that was inferred from data sampled over the time interval [0,3]. d The model that was inferred over the time interval [0,3] is used to predict the dynamics over the time interval [0,0.5]. Simulated models are shown in solid lines, and the true underlying dynamics are shown by dots

Fig. 9 Fit and predicted dynamics for the diffuse simulation with \(N=5\) time samples and \(1\%\) noise. a The simulated learned equation for the diffuse simulation that was inferred from data sampled over the time interval [0,0.5]. b The model that was inferred over the time interval [0,0.5] is used to predict the dynamics over the time interval [0,3]. c The simulated learned equation for the diffuse simulation that was inferred from data sampled over the time interval [0,3]. d The model that was inferred over the time interval [0,3] is used to predict the dynamics over the time interval [0,0.5]

Fig. 10 Fit and predicted dynamics for the nodular simulation with \(N=5\) time samples and \(1\%\) noise. a The simulated learned equation for the nodular simulation that was inferred from data sampled over the time interval [0,0.5]. b The model that was inferred over the time interval [0,0.5] is used to predict the dynamics over the time interval [0,3]. c The simulated learned equation for the nodular simulation that was inferred from data sampled over the time interval [0,3]. d The model that was inferred over the time interval [0,3] is used to predict the dynamics over the time interval [0,0.5]. While the simulations in part c may appear to be the result of an unstable numerical simulation, they are instead the result of a noisy initial condition combined with an inferred ODE model of the form \(u_t=-28.58u^2+28.55u\). Small bumps in the initial condition grow to confluence over time as depicted in this figure
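
The growth of small bumps noted in this caption can be checked by writing the inferred ODE in logistic form. The symbols \(K\) and \(u_0\) are introduced here for illustration only and do not appear in the original text:

$$\begin{aligned} u_t = -28.58u^2+28.55u = 28.55\,u\left( 1-\frac{u}{K}\right) , \qquad K=\frac{28.55}{28.58}\approx 0.999, \end{aligned}$$

$$\begin{aligned} u(t)=\frac{K u_0 e^{28.55t}}{K+u_0\left( e^{28.55t}-1\right) }. \end{aligned}$$

Because this model contains no spatial derivatives, each grid point evolves independently, and any positive initial value \(u_0\) approaches \(K\) as \(t\) increases. The e-folding time of the initial growth is \(1/28.55\approx 0.035\), which is short relative to the interval [0,3], so even small positive perturbations in a noisy initial condition saturate near \(K\) well before the end of the simulation.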

Fig. 11 Sample fit and predicted dynamics for simulations with \(N=10\) time samples and \(5\%\) noise. a The simulated learned equation for the diffuse simulation that was inferred from data sampled over the time interval [0,3]. b The model that was inferred over the time interval [0,3] is used to predict the dynamics over the time interval [0,0.5]. c The simulated learned equation for the fast simulation that was inferred from data sampled over the time interval [0,0.5]. d The model that was inferred over the time interval [0,0.5] is used to predict the dynamics over the time interval [0,3]

About this article

Cite this article

Nardini, J.T., Lagergren, J.H., Hawkins-Daarud, A. et al. Learning Equations from Biological Data with Limited Time Samples. Bull Math Biol 82, 119 (2020). https://doi.org/10.1007/s11538-020-00794-z

  • DOI: https://doi.org/10.1007/s11538-020-00794-z
