
IDENT: Identifying Differential Equations with Numerical Time Evolution


Identifying unknown differential equations from a given set of discrete, time-dependent data is a challenging problem. A small amount of noise can make the recovery unstable; nonlinearity and varying coefficients add further complexity. We assume that the governing partial differential equation (PDE) is a linear combination of a few differential terms from a prescribed dictionary, and the objective of this paper is to find the correct coefficients. We propose a new direction based on the fundamental convergence principle of numerical PDE schemes. We utilize Lasso for efficiency, and a performance guarantee is established based on an incoherence property. The main contribution is to validate and correct the results by the time evolution error (TEE). A new algorithm, called identifying differential equations with numerical time evolution (IDENT), is explored for data with non-periodic boundary conditions, noisy data, and PDEs with varying coefficients. Based on the recovery theory of Lasso, we propose a new definition of the noise-to-signal ratio, which better represents the level of noise in the setting of PDE identification. The effects of data generation and downsampling are systematically analyzed and tested. For noisy data, we propose an order-preserving denoising method, called least-squares moving average (LSMA), to preprocess the given data. For the identification of PDEs with varying coefficients, we propose to add base element expansion (BEE) to aid the computation. Various numerical experiments, from basic tests to noisy data, downsampling effects, and varying coefficients, are presented.
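To illustrate the dictionary-based formulation described above, the following is a minimal sketch (not the authors' IDENT code): data are sampled from an exact heat-equation solution of \(u_t = u_{xx}\), a small dictionary \(\{u, u_x, u_{xx}, uu_x\}\) is built with finite differences, and the coefficients are recovered by Lasso. The grid sizes, the dictionary choice, and the ISTA solver are assumptions for this toy example.

```python
import numpy as np

# Sample an exact heat-equation solution u(x,t) = e^{-t} sin x + e^{-4t} sin 2x,
# which satisfies u_t = u_xx.
nx, nt, dt = 128, 20, 1e-3
x = np.linspace(0, 2*np.pi, nx, endpoint=False)
dx = x[1] - x[0]
t = dt * np.arange(nt)
u = np.exp(-t)[:, None] * np.sin(x) + np.exp(-4*t)[:, None] * np.sin(2*x)

# Finite-difference derivatives: periodic central differences in x,
# forward difference in t.
ux  = (np.roll(u, -1, axis=1) - np.roll(u, 1, axis=1)) / (2*dx)
uxx = (np.roll(u, -1, axis=1) - 2*u + np.roll(u, 1, axis=1)) / dx**2
ut  = (u[1:] - u[:-1]) / dt

# Dictionary of candidate terms, evaluated at the first nt-1 time slices.
A = np.stack([u[:-1], ux[:-1], uxx[:-1], (u*ux)[:-1]], axis=-1).reshape(-1, 4)
b = ut.reshape(-1)

# Normalize columns; this induces a weighted L1 norm on the raw coefficients.
scale = np.linalg.norm(A, axis=0)
An = A / scale

# ISTA (proximal gradient) for min 0.5||An c - b||^2 + lam ||c||_1.
lam = 1e-3 * np.max(np.abs(An.T @ b))
L = np.linalg.norm(An, 2)**2          # Lipschitz constant of the gradient
c = np.zeros(4)
for _ in range(2000):
    c = c - An.T @ (An @ c - b) / L
    c = np.sign(c) * np.maximum(np.abs(c) - lam / L, 0.0)
coef = c / scale                       # back to the unnormalized dictionary

print(np.round(coef, 3))  # expect approximately [0, 0, 1, 0]: u_t = u_xx
```

Only the \(u_{xx}\) coefficient survives, up to finite-difference error; in the paper's framework, candidate supports would then be validated by evolving the recovered PDE forward and measuring the time evolution error.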




1. Barth, T., Frederickson, P.: Higher order solution of the Euler equations on unstructured grids using quadratic reconstruction. In: 28th Aerospace Sciences Meeting, p. 13 (1990)

2. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, London (2014)

3. Bongard, J., Lipson, H.: Automated reverse engineering of nonlinear dynamical systems. Proc. Natl. Acad. Sci. 104(24), 9943–9948 (2007)

4. Bongini, M., Fornasier, M., Hansen, M., Maggioni, M.: Inferring interaction rules from observations of evolutive systems I: the variational approach. Math. Models Methods Appl. Sci. 27(05), 909–951 (2017)

5. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

6. Brunton, S.L., Proctor, J.L., Kutz, J.N.: Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. 113(15), 3932–3937 (2016)

7. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)

8. Donoho, D.L., Huo, X.: Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inf. Theory 47(7), 2845–2862 (2001)

9. Fannjiang, A., Liao, W.: Coherence pattern-guided compressive sensing with unresolved grids. SIAM J. Imaging Sci. 5(1), 179–202 (2012)

10. Fuchs, J.-J.: On sparse representations in arbitrary redundant bases. IEEE Trans. Inf. Theory 50(6), 1341–1344 (2004)

11. Harten, A., Engquist, B., Osher, S., Chakravarthy, S.R.: Uniformly high order accurate essentially non-oscillatory schemes, III. J. Comput. Phys. 71(2), 231–303 (1987)

12. Hu, C., Shu, C.-W.: Weighted essentially non-oscillatory schemes on triangular meshes. J. Comput. Phys. 150(1), 97–127 (1999)

13. Kaiser, E., Kutz, J.N., Brunton, S.L.: Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proc. R. Soc. A 474(2219), 20180335 (2018)

14. Khoo, Y., Ying, L.: SwitchNet: a neural network model for forward and inverse scattering problems. arXiv preprint arXiv:1810.09675 (2018)

15. Liu, Y., Shu, C.-W., Tadmor, E., Zhang, M.: Central discontinuous Galerkin methods on overlapping cells with a non-oscillatory hierarchical reconstruction. SIAM J. Numer. Anal. 45, 2442–2467 (2007)

16. Loiseau, J.-C., Brunton, S.L.: Constrained sparse Galerkin regression. J. Fluid Mech. 838, 42–67 (2018)

17. Long, Z., Lu, Y., Ma, X., Dong, B.: PDE-Net: learning PDEs from data. arXiv preprint arXiv:1710.09668 (2017)

18. Lu, F., Zhong, M., Tang, S., Maggioni, M.: Nonparametric inference of interaction laws in systems of agents from trajectory data. arXiv preprint arXiv:1812.06003 (2018)

19. Lusch, B., Kutz, J.N., Brunton, S.L.: Deep learning for universal linear embeddings of nonlinear dynamics. Nat. Commun. 9(1), 4950 (2018)

20. Mangan, N.M., Kutz, J.N., Brunton, S.L., Proctor, J.L.: Model selection for dynamical systems via sparse regression and information criteria. Proc. R. Soc. A 473(2204), 20170009 (2017)

21. Qin, T., Wu, K., Xiu, D.: Data driven governing equations approximation using deep neural networks. arXiv preprint arXiv:1811.05537 (2018)

22. Raissi, M.: Deep hidden physics models: deep learning of nonlinear partial differential equations. arXiv preprint arXiv:1801.06637 (2018)

23. Raissi, M., Karniadakis, G.E.: Hidden physics models: machine learning of nonlinear partial differential equations. J. Comput. Phys. 357, 125–141 (2018)

24. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics informed deep learning (part I): data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561 (2017)

25. Rudy, S.H., Brunton, S.L., Proctor, J.L., Kutz, J.N.: Data-driven discovery of partial differential equations. Sci. Adv. 3(4), e1602614 (2017)

26. Schaeffer, H.: Learning partial differential equations via data discovery and sparse optimization. Proc. R. Soc. A 473(2197), 20160446 (2017)

27. Schaeffer, H., Caflisch, R., Hauck, C.D., Osher, S.: Sparse dynamics for partial differential equations. Proc. Natl. Acad. Sci. 110(17), 6634–6639 (2013)

28. Schaeffer, H., Tran, G., Ward, R.: Extracting sparse high-dimensional dynamics from limited data. SIAM J. Appl. Math. 78(6), 3279–3295 (2018)

29. Schmidt, M., Lipson, H.: Distilling free-form natural laws from experimental data. Science 324(5923), 81–85 (2009)

30. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996)

31. Tran, G., Ward, R.: Exact recovery of chaotic systems from highly corrupted data. Multiscale Model. Simul. 15(3), 1108–1129 (2017)

32. Tropp, J.A.: Just relax: convex programming methods for subset selection and sparse approximation. ICES report, 404 (2004)

33. Tropp, J.A.: Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory 52(3), 1030–1051 (2006)

34. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(1), 49–67 (2006)

35. Zhang, S., Lin, G.: Robust data-driven discovery of governing physical laws with error bars. Proc. R. Soc. A 474(2217), 20180305 (2018)


Author information



Corresponding author

Correspondence to Wenjing Liao.


S. H. Kang: Research is supported in part by Simons Foundation Grants 282311 and 584960.

W. Liao: Research is supported in part by the NSF Grants DMS 1818751 and DMS 2012652.

Y. Liu: Research is supported in part by NSF Grants DMS-1522585 and DMS-CDS&E-MSS-1622453.

Appendix A: Recovery Theory of Lasso with a Weighted \(L^1\) Norm


In the field of compressive sensing, performance guarantees for the recovery of sparse vectors from a small number of noisy linear measurements by Lasso have been established when the sensing matrix satisfies an incoherence property [8] or a restricted isometry property [7]. Here we establish an incoherence-based guarantee for Lasso in the setting of PDE identification, where a weighted \(L^1\) norm is used.

Given a sensing matrix \(\varPhi \in {\mathbb {R}}^{n \times m}\) and the noisy measurement

$$\begin{aligned} \mathbf {b}= \varPhi \mathbf {x}^{\mathrm{opt}} + \mathbf {e}\end{aligned}$$

where \(\mathbf {x}^{\mathrm{opt}}\) is s-sparse (\(\Vert \mathbf {x}^{\mathrm{opt}}\Vert _0 =s\)), the goal is to recover \(\mathbf {x}^{\mathrm{opt}}\) in a robust way. Denote the support of \(\mathbf {x}^{\mathrm{opt}}\) by \(\varLambda \) and let \(\varPhi _\varLambda \) be the submatrix of \(\varPhi \) whose columns are restricted to \(\varLambda \). Suppose \(\varPhi = [\phi [1] \ \phi [2] \ \ldots \ \phi [m]]\) where all \(\phi [j]\)’s have unit norm. Let the mutual coherence of \(\varPhi \) be

$$\begin{aligned} \mu (\varPhi ) = \max _{j\ne l} |\phi [j]^T \phi [l]|. \end{aligned}$$
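The mutual coherence is straightforward to compute; the helper below (function name is ours) normalizes the columns, forms the Gram matrix, and takes the largest off-diagonal absolute entry.

```python
import numpy as np

def mutual_coherence(Phi):
    """mu(Phi): largest |<phi[j], phi[l]>| over distinct unit-normalized columns."""
    Phin = Phi / np.linalg.norm(Phi, axis=0)   # normalize each column to unit norm
    G = np.abs(Phin.T @ Phin)                  # absolute inner products of all pairs
    np.fill_diagonal(G, 0.0)                   # ignore the trivial diagonal entries
    return G.max()

# Orthonormal columns have coherence 0; correlated columns push mu toward 1.
print(mutual_coherence(np.eye(3)))                           # 0.0
print(mutual_coherence(np.array([[1.0, 1.0], [0.0, 1.0]])))  # 1/sqrt(2) ~ 0.707
```

A small coherence means the dictionary columns are nearly orthogonal, which is what the recovery guarantee below requires.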

The principle of Lasso with a weighted \(L^1\) norm is to solve

$$\begin{aligned} \min _{\mathbf {x}} \frac{1}{2} \Vert \varPhi {\mathbf {x}}-\mathbf {b}\Vert _2^2 + \gamma \Vert W\mathbf {x}\Vert _1 \end{aligned}$$

where \(W = \mathrm{diag}(w_1,w_2,\ldots ,w_m)\) with \(w_j \ne 0\) for \(j=1,\ldots ,m\), and \(\gamma > 0\) is a balancing parameter. Let \(w_{\max } = \max _{j}|w_j|\) and \(w_{\min } = \min _{j}|w_j|\). Lasso successfully recovers the support of \(\mathbf {x}^{\mathrm{opt}}\) when \(\mu (\varPhi )\) is sufficiently small. The following proposition generalizes Theorem 8 in [33] from \(L^1\) norm regularization to weighted \(L^1\) norm regularization.
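The weighted objective can be minimized with a simple proximal-gradient (ISTA) iteration, in which each coordinate is soft-thresholded at a level proportional to its weight. The sketch below is ours (function name and iteration count are assumptions), not the paper's implementation.

```python
import numpy as np

def weighted_lasso(Phi, b, w, gamma, iters=5000):
    """ISTA sketch for min_x 0.5*||Phi x - b||_2^2 + gamma*||W x||_1, W = diag(w).
    Coordinate j is soft-thresholded at level gamma * w_j / L each iteration."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(Phi.shape[1])
    w = np.asarray(w, dtype=float)
    for _ in range(iters):
        x = x - Phi.T @ (Phi @ x - b) / L    # gradient step on the quadratic term
        x = np.sign(x) * np.maximum(np.abs(x) - gamma * w / L, 0.0)  # weighted shrink
    return x

# With an orthonormal Phi the minimizer is coordinatewise soft thresholding of b:
x = weighted_lasso(np.eye(2), np.array([1.0, 0.1]), w=[1.0, 1.0], gamma=0.5)
print(x)  # [0.5, 0.0]
```

Large weights penalize a coordinate more heavily, so the weighted norm lets the dictionary columns be treated unequally, which is exactly what the column normalization in the proof of Theorem 1 below induces.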

Proposition 1

Suppose the support of \(\mathbf {x}^{\mathrm{opt}}\), denoted by \(\varLambda \), contains no more than s indices, \(\mu (s-1)<1\), and

$$\begin{aligned} \frac{\mu s}{1-\mu (s-1)}< \frac{w_{\min }}{w_{\max }}. \end{aligned}$$

Let

$$\begin{aligned} \gamma = \frac{1-\mu (s-1)}{w_{\min }[1 -\mu (s-1)] - w_{\max } \mu s}\Vert \mathbf {e}\Vert _2^+, \end{aligned}$$

and \(\mathbf {x}(\gamma )\) be the minimizer of (W-Lasso). Then

1)

    the support of \(\mathbf {x}(\gamma )\) is contained in \(\varLambda \);

2)

    the distance between \(\mathbf {x}(\gamma )\) and \(\mathbf {x}^{\mathrm{opt}}\) satisfies

    $$\begin{aligned} \Vert \mathbf {x}(\gamma ) - \mathbf {x}^{\mathrm{opt}}\Vert _\infty \le \frac{w_{\max }}{w_{\min }[1- \mu (s-1)] - w_{\max } \mu s}\Vert \mathbf {e}\Vert _2; \end{aligned}$$
3) if


    $$\begin{aligned} \mathbf {x}^\mathrm{opt}_{\min } := \min _{j \in \varLambda } |x_j^\mathrm{opt}| > \frac{w_{\max }}{w_{\min }[1 -\mu (s-1)] - w_{\max } \mu s}\Vert \mathbf {e}\Vert _2, \end{aligned}$$

    then \(\mathrm{supp}(\mathbf {x}(\gamma )) = \varLambda \).


Proof

Under the condition \(\mu (s-1)<1\), \(\varLambda \) indexes a linearly independent collection of columns of \(\varPhi \). Let \(\mathbf {x}^\star \) be the minimizer of (W-Lasso) over all vectors supported on \(\varLambda \). A necessary and sufficient condition for such a minimizer is that

$$\begin{aligned} \mathbf {x}^\mathrm{opt}- \mathbf {x}^\star = \gamma (\varPhi _\varLambda ^*\varPhi _\varLambda )^{-1} \mathbf {g}- (\varPhi _\varLambda ^*\varPhi _\varLambda )^{-1} \varPhi _\varLambda ^*\mathbf {e}\end{aligned}$$

where \(\mathbf {g}\in \partial \Vert W\mathbf {x}^\star \Vert _1\), meaning \(g_j = w_j\,\mathrm{sign}(x^\star _j)\) whenever \(x^\star _j \ne 0\) and \(|g_j| \le w_j\) whenever \(x^\star _j = 0\). It follows that \(\Vert \mathbf {g}\Vert _\infty \le w_{\max }\) and

$$\begin{aligned} \Vert \mathbf {x}^\star -\mathbf {x}^\mathrm{opt}\Vert _\infty \le \gamma \Vert (\varPhi _\varLambda ^*\varPhi _\varLambda )^{-1}\Vert _{\infty ,\infty } (w_{\max }+\Vert \mathbf {e}\Vert _2). \end{aligned}$$

Next we prove that \(\mathbf {x}^\star \) is also the global minimizer of (W-Lasso) by demonstrating that the objective function increases when any other component of \(\mathbf {x}^\star \) is changed. Let

$$\begin{aligned} L(\mathbf {x}) = \frac{1}{2} \Vert \varPhi \mathbf {x}-\mathbf {b}\Vert _2^2 + \gamma \Vert W\mathbf {x}\Vert _1. \end{aligned}$$

Choose an index \(\omega \notin \varLambda \) and let \(\delta \) be a nonzero scalar. We will develop a condition which ensures that

$$\begin{aligned} L(\mathbf {x}^\star + \delta \mathbf {e}_\omega ) - L(\mathbf {x}^\star )>0 \end{aligned}$$

where \(\mathbf {e}_\omega \) is the \(\omega \)th standard basis vector. Notice that

$$\begin{aligned} L(\mathbf {x}^\star + \delta \mathbf {e}_\omega ) - L(\mathbf {x}^\star )&= \frac{1}{2}\left[ \Vert \varPhi (\mathbf {x}^\star +\delta \mathbf {e}_\omega )-\mathbf {b}\Vert _2^2 - \Vert \varPhi \mathbf {x}^\star -\mathbf {b}\Vert _2^2\right] + \gamma \left( \Vert W(\mathbf {x}^\star +\delta \mathbf {e}_\omega )\Vert _1- \Vert W\mathbf {x}^\star \Vert _1\right) \\&= \frac{1}{2} \Vert \delta \phi [\omega ]\Vert _2^2 + \mathrm{Re} \langle \varPhi \mathbf {x}^\star -\mathbf {b},\delta \phi [\omega ] \rangle + \gamma |w_\omega \delta |\\&> \mathrm{Re} \langle \varPhi \mathbf {x}^\star -\mathbf {b},\delta \phi [\omega ] \rangle + \gamma |w_\omega \delta |\\&\ge \gamma w_{\min }|\delta | - |\langle \varPhi \mathbf {x}^\star -\varPhi \mathbf {x}^{\mathrm{opt}}- \mathbf {e},\delta \phi [\omega ] \rangle | \quad \text {since } \mathbf {b}= \varPhi \mathbf {x}^{\mathrm{opt}}+\mathbf {e}\\&= \gamma w_{\min }|\delta | - |\langle \varPhi _\varLambda \mathbf {x}_{\varLambda }^\star -\varPhi _\varLambda \mathbf {x}_{\varLambda }^{\mathrm{opt}}- \mathbf {e},\delta \phi [\omega ] \rangle | \\&\ge \gamma w_{\min }|\delta | - |\langle \varPhi _\varLambda (\mathbf {x}_{\varLambda }^\star - \mathbf {x}_{\varLambda }^{\mathrm{opt}}),\delta \phi [\omega ] \rangle | - |\langle \mathbf {e},\delta \phi [\omega ]\rangle | \\&= \gamma w_{\min }|\delta | - |\langle \gamma \varPhi _\varLambda ( \varPhi _\varLambda ^* \varPhi _\varLambda )^{-1}\mathbf {g},\delta \phi [\omega ] \rangle | - |\langle \mathbf {e},\delta \phi [\omega ]\rangle | \quad \text {thanks to (22)} \\&\ge \gamma w_{\min }|\delta | - \gamma |\delta |\cdot |\langle \varPhi _\varLambda ( \varPhi _\varLambda ^* \varPhi _\varLambda )^{-1}\mathbf {g}, \phi [\omega ] \rangle | - |\delta |\Vert \mathbf {e}\Vert _2 \\&= \gamma w_{\min }|\delta | - \gamma |\delta |\cdot |\langle (\varPhi _\varLambda ^\dagger )^*\mathbf {g}, \phi [\omega ] \rangle | - |\delta |\Vert \mathbf {e}\Vert _2 \\&= \gamma w_{\min }|\delta | - \gamma |\delta |\cdot |\langle \mathbf {g}, \varPhi _\varLambda ^\dagger \phi [\omega ] \rangle | - |\delta |\Vert \mathbf {e}\Vert _2 \\&\ge \gamma w_{\min }|\delta | - \gamma |\delta |\Vert \mathbf {g}\Vert _\infty \Vert \varPhi _\varLambda ^\dagger \phi [\omega ] \Vert _1 - |\delta |\Vert \mathbf {e}\Vert _2 \\&\ge \gamma w_{\min }|\delta | - \gamma |\delta |w_{\max }\max _{\omega \notin \varLambda } \Vert \varPhi _\varLambda ^\dagger \phi [\omega ]\Vert _1 - |\delta |\Vert \mathbf {e}\Vert _2. \end{aligned}$$

According to [10, 32], \(\max _{\omega \notin \varLambda } \Vert \varPhi _\varLambda ^\dagger \phi [\omega ]\Vert _1<\frac{\mu s}{1-\mu (s-1)}\). A sufficient condition to guarantee \(L(\mathbf {x}^\star + \delta \mathbf {e}_\omega ) - L(\mathbf {x}^\star )>0\) is

$$\begin{aligned} \gamma \left( w_{\min } - w_{\max } \frac{\mu s}{1-\mu (s-1)} \right) > \Vert \mathbf {e}\Vert _2, \end{aligned}$$

which gives rise to (20). This establishes that \(\mathbf {x}^\star \) is the global minimizer of (W-Lasso). The bound (21) follows from (23) together with \(\Vert (\varPhi _\varLambda ^*\varPhi _\varLambda )^{-1}\Vert _{\infty ,\infty } \le [1-\mu (s-1)]^{-1}\). \(\square \)
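Proposition 1 can be sanity-checked numerically. The construction below is ours, not from the paper: a 4-column dictionary with mutual coherence \(\mu = 0.2\) built from two slightly overlapping pairs, unit weights, a 2-sparse signal, and a small noise vector, with \(\gamma\) chosen slightly above the threshold in (20).

```python
import numpy as np

# Dictionary with mutual coherence mu = 0.2: the pairs (phi1, phi2) and
# (phi3, phi4) overlap by mu; all other column pairs are orthogonal.
mu = 0.2
r = np.sqrt(1 - mu**2)
Phi = np.array([[1, mu, 0, 0],
                [0, r,  0, 0],
                [0, 0,  1, mu],
                [0, 0,  0, r ]], dtype=float)

x_opt = np.array([3.0, 0.0, 2.0, 0.0])       # s = 2, support Lambda = {0, 2}
e = 0.01 * np.array([1.0, -1.0, 1.0, -1.0])  # small noise vector
b = Phi @ x_opt + e

# gamma slightly above the threshold in (20), with w_min = w_max = 1 and s = 2.
s = 2
gamma = 1.1 * (1 - mu*(s-1)) / ((1 - mu*(s-1)) - mu*s) * np.linalg.norm(e)

# Plain ISTA for the (unit-weight) Lasso.
L = np.linalg.norm(Phi, 2) ** 2
x = np.zeros(4)
for _ in range(5000):
    x = x - Phi.T @ (Phi @ x - b) / L
    x = np.sign(x) * np.maximum(np.abs(x) - gamma / L, 0.0)

# Claim 1): the support of x(gamma) is contained in Lambda = {0, 2}.
# Claim 3): since the true entries exceed the L-infinity bound, the support
# is recovered exactly and x stays within that bound of x_opt.
print(np.round(x, 3))  # nonzeros only on coordinates 0 and 2
```

Here \(\mu s / (1-\mu(s-1)) = 0.5 < 1 = w_{\min}/w_{\max}\), so the hypothesis of the proposition holds and the off-support coordinates are driven exactly to zero.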

We prove Theorem 1 based on Proposition 1.

Proof of Theorem 1

Suppose \(\widehat{F}_{\mathrm{unit}}\) is obtained from \(\widehat{F}\) by normalizing each column to unit \(L^2\) norm, and let \(W \in {\mathbb {R}}^{N_3 \times N_3}\) be the diagonal matrix with \(W_{jj} =\Vert \widehat{F}[j]\Vert _\infty \Vert \widehat{F}[j]\Vert _2^{-1}\). The Lasso we solve is equivalent to

$$\begin{aligned} {\widehat{\mathbf {y}}} = \arg \min _{\mathbf {y}} \frac{1}{2} \Vert {\widehat{\mathbf {b}}} - \widehat{F}_{\mathrm{unit}} \mathbf {y}\Vert _2^2 + \lambda \Vert W \mathbf {y}\Vert _1 \end{aligned}$$

where \(\mathbf {z}= W \mathbf {y}\), \(\mathbf {y}^{\mathrm{opt}}_j = \mathbf {a}_j \Vert \widehat{F}[j]\Vert _2\) and \(\mathbf {e}= {\widehat{\mathbf {b}}} - \widehat{F}_{\mathrm{unit}} \mathbf {y}^{\mathrm{opt}} \). Then we apply Proposition 1. The choice of the balancing parameter in (20) suggests

$$\begin{aligned} \lambda = \frac{1-\mu (s-1)}{\min _j\frac{\Vert \widehat{F}[j]\Vert _\infty }{\Vert \widehat{F}[j]\Vert _2}[1-\mu (s-1)] -\max _j\frac{\Vert \widehat{F}[j]\Vert _\infty }{\Vert \widehat{F}[j]\Vert _2}\mu s}\Vert \mathbf {e}\Vert _2^+, \end{aligned}$$

which gives rise to (11). The error bound in (21) gives

$$\begin{aligned} \Vert {\widehat{\mathbf {y}}} - \mathbf {y}^{\mathrm{opt}}\Vert _\infty \le \frac{(\max _j\Vert \widehat{F}[j]\Vert _\infty \Vert \widehat{F}[j]\Vert _2^{-1}+\Vert \mathbf {e}\Vert _2)}{\min _j \Vert \widehat{F}[j]\Vert _\infty \Vert \widehat{F}[j]\Vert _2^{-1} [1- \mu (s-1)] - \max _j \Vert \widehat{F}[j]\Vert _\infty \Vert \widehat{F}[j]\Vert _2^{-1} \mu s}\Vert \mathbf {e}\Vert _2 \end{aligned}$$

which implies

$$\begin{aligned} \max _j \Vert \widehat{F}[j]\Vert _{L^2} \left| \Vert \widehat{F}[j]\Vert _\infty ^{-1}\widehat{\mathbf {a}}_{\text {Lasso}}(\lambda )_j-\mathbf {a}_j\right| \le \frac{w_{\max }+\varepsilon /\sqrt{\varDelta t \varDelta x}}{w_{\min }[1-\mu (s-1)] - w_{\max }\mu s } \varepsilon , \end{aligned}$$

which yields (12). \(\square \)
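The change of variables in this proof, \(y_j = \mathbf {a}_j \Vert \widehat{F}[j]\Vert _2\), can be verified numerically: the raw Lasso objective with penalty \(\lambda \sum _j \Vert \widehat{F}[j]\Vert _\infty |a_j|\) coincides with the column-normalized objective under weights \(\Vert \widehat{F}[j]\Vert _\infty / \Vert \widehat{F}[j]\Vert _2\). The matrix and coefficients below are arbitrary test data, assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((30, 5))   # stand-in for the feature matrix F-hat
a = rng.standard_normal(5)         # stand-in coefficient vector
b = rng.standard_normal(30)        # stand-in data vector b-hat
lam = 0.7

col2 = np.linalg.norm(F, axis=0)   # L2 norms of the columns
colinf = np.max(np.abs(F), axis=0) # L-infinity norms of the columns
F_unit = F / col2                  # unit-L2-normalized dictionary
y = a * col2                       # change of variables y_j = a_j * ||F[j]||_2
w = colinf / col2                  # weights in the normalized problem

obj_raw  = 0.5*np.sum((b - F @ a)**2)      + lam*np.sum(colinf * np.abs(a))
obj_unit = 0.5*np.sum((b - F_unit @ y)**2) + lam*np.sum(w * np.abs(y))
print(abs(obj_raw - obj_unit))  # agrees up to floating-point roundoff
```

Since the two objectives agree at every point under this substitution, their minimizers correspond, which is why Proposition 1 applies directly to the normalized problem.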


Cite this article

Kang, S.H., Liao, W. & Liu, Y. IDENT: Identifying Differential Equations with Numerical Time Evolution. J Sci Comput 87, 1 (2021).



Keywords

  • Identifying unknown differential equations
  • Time evolution error (TEE)
  • Varying coefficients
  • Base element expansion (BEE)
  • Denoising
  • Downsampling