Design of experiments for the calibration of history-dependent models via deep reinforcement learning and an enhanced Kalman filter

  • Original Paper
Computational Mechanics


Experimental data are often costly to obtain, which makes it difficult to calibrate complex models. For many models an experimental design that produces the best calibration given a limited experimental budget is not obvious. This paper introduces a deep reinforcement learning (RL) algorithm for design of experiments that maximizes the information gain measured by Kullback–Leibler divergence obtained via the Kalman filter (KF). This combination enables experimental design for rapid online experiments where manual trial-and-error is not feasible in the high-dimensional parametric design space. We formulate possible configurations of experiments as a decision tree and a Markov decision process, where a finite choice of actions is available at each incremental step. Once an action is taken, a variety of measurements are used to update the state of the experiment. This new data leads to a Bayesian update of the parameters by the KF, which is used to enhance the state representation. In contrast to the Nash–Sutcliffe efficiency index, which requires additional sampling to test hypotheses for forward predictions, the KF can lower the cost of experiments by directly estimating the values of new data acquired through additional actions. In this work our applications focus on mechanical testing of materials. Numerical experiments with complex, history-dependent models are used to verify the implementation and benchmark the performance of the RL-designed experiments.
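As a rough illustration of the information-gain measure mentioned above, the Kullback–Leibler divergence between two Gaussian distributions has a closed form, so it can be evaluated directly from the mean and covariance that a Kalman filter carries before and after an update. The sketch below assumes the gain is measured as KL(posterior || prior); the function name `gaussian_kl` and the choice of direction are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_kl(mu_post, cov_post, mu_prior, cov_prior):
    """KL( N(mu_post, cov_post) || N(mu_prior, cov_prior) ) in nats."""
    mu_post = np.atleast_1d(mu_post).astype(float)
    mu_prior = np.atleast_1d(mu_prior).astype(float)
    cov_post = np.atleast_2d(cov_post).astype(float)
    cov_prior = np.atleast_2d(cov_prior).astype(float)
    k = mu_post.size
    diff = mu_prior - mu_post
    # trace(cov_prior^{-1} cov_post), computed without forming an explicit inverse
    trace_term = np.trace(np.linalg.solve(cov_prior, cov_post))
    # Mahalanobis distance between the means under the prior covariance
    maha = diff @ np.linalg.solve(cov_prior, diff)
    _, logdet_prior = np.linalg.slogdet(cov_prior)
    _, logdet_post = np.linalg.slogdet(cov_post)
    return 0.5 * (trace_term + maha - k + logdet_prior - logdet_post)

# Hypothetical scalar example: a measurement shrinks the variance from 1 to 0.25
# without moving the mean, which gives a gain of about 0.318 nats.
gain = gaussian_kl([0.0], [[0.25]], [0.0], [[1.0]])
```

A larger value indicates that the update changed the parameter belief more, which is the sense in which an action can be called more informative.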

Data availability

The code and data that support the findings of this paper will be posted in an open-source repository upon acceptance of the manuscript.


  1. Ames NM, Srivastava V, Chester SA, Anand L (2009) A thermo-mechanically coupled theory for large deformations of amorphous polymers. Part II: applications. Int J Plast 25(8):1495–1539

  2. Baird L (1995) Residual algorithms: reinforcement learning with function approximation. In: Machine learning proceedings 1995. Elsevier, pp 30–37

  3. Bower AF (2009) Applied mechanics of solids. CRC Press, Boca Raton

  4. Catanach TA (2017) Computational methods for Bayesian inference in complex systems. Ph.D. Thesis, California Institute of Technology

  5. Chaloner K, Verdinelli I (1995) Bayesian experimental design: a review. Stat Sci pp 273–304

  6. Chatzi EN, Smyth AW (2009) The unscented Kalman filter and particle filter methods for nonlinear structural system identification with non-collocated heterogeneous sensing. Struct Control Health Monit 16(1):99–123

  7. Darema F (2004) Dynamic data driven applications systems: a new paradigm for application simulations and measurements. In: Computational science-ICCS 2004: 4th international conference, Kraków, Poland, June 6–9, 2004, Proceedings, Part III 4. Springer, pp 662–669

  8. Daum F (2005) Nonlinear filters: beyond the Kalman filter. IEEE Aerosp Electron Syst Mag 20(8):57–69

  9. De Bruin T, Kober J, Tuyls K, Babuška R (2018) Integrating state representation learning into deep reinforcement learning. IEEE Robot Autom Lett 3(3):1394–1401

  10. Ding Z, Huang Y, Yuan H, Dong H (2020) Introduction to reinforcement learning. In: Deep reinforcement learning: fundamentals, research and applications, pp 47–123

  11. Doya K (2000) Reinforcement learning in continuous time and space. Neural Comput 12(1):219–245

  12. Erazo K, Sen D, Nagarajaiah S, Sun L (2019) Vibration-based structural health monitoring under changing environmental conditions using Kalman filtering. Mech Syst Signal Process 117:1–15

  13. Evensen G (2003) The ensemble Kalman filter: theoretical formulation and practical implementation. Ocean Dyn 53(4):343–367

  14. Feinberg V, Wan A, Stoica I, Jordan MI, Gonzalez JE, Levine S (2018) Model-based value estimation for efficient model-free reinforcement learning. arXiv:1803.00101

  15. Fisher RA et al (1937) The design of experiments. Oliver & Boyd, Edinburgh

  16. Fuchs A, Heider Y, Wang K, Sun WC, Kaliske M (2021) DNN2: A hyper-parameter reinforcement learning game for self-design of neural network based elasto-plastic constitutive descriptions. Comput Struct 249:106505

  17. Ghanem R, Ferro G (2006) Health monitoring for strongly non-linear systems using the ensemble Kalman filter. Struct Control Health Monit 13(1):245–259

  18. Gnecco G, Sanguineti M et al (2008) Approximation error bounds via Rademacher complexity. Appl Math Sci 2:153–176

  19. Gu S, Lillicrap T, Sutskever I, Levine S (2016) Continuous deep Q-learning with model-based acceleration. In: International conference on machine learning. PMLR, pp 2829–2838

  20. Gu S, Holly E, Lillicrap T, Levine S (2017) Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In: 2017 IEEE international conference on robotics and automation (ICRA). IEEE, pp 3389–3396

  21. Heider Y, Wang K, Sun WC (2020) SO(3)-invariance of informed-graph-based deep neural network for anisotropic elastoplastic materials. Comput Methods Appl Mech Eng 363:112875

  22. Heider Y, Suh HS, Sun WC (2021) An offline multi-scale unsaturated poromechanics model enabled by self-designed/self-improved neural networks. Int J Numer Anal Methods Geomech 45(9):1212–1237

  23. Hester T, Stone P (2013) Texplore: real-time sample-efficient reinforcement learning for robots. Mach Learn 90:385–429

  24. Huan X, Marzouk YM (2013) Simulation-based optimal Bayesian experimental design for nonlinear systems. J Comput Phys 232(1):288–317

  25. Huan X, Marzouk YM (2016) Sequential Bayesian optimal experimental design via approximate dynamic programming. arXiv:1604.08320

  26. Huang J, Li D, Li H, Song G, Liang Y (2018) Damage identification of a large cable-stayed bridge with novel cointegrated Kalman filter method under changing environments. Struct Control Health Monit 25(5):e2152

  27. Huang Y, Yu J, Beck JL, Zhu H, Li H (2020) Novel sparseness-inducing dual Kalman filter and its application to tracking time-varying spatially-sparse structural stiffness changes and inputs. Comput Methods Appl Mech Eng 372:113411

  28. Jazwinski AH (2007) Stochastic processes and filtering theory. Courier Corporation, North Chelmsford

  29. Jin C, Jang S, Sun X, Li J, Christenson R (2016) Damage detection of a highway bridge under severe temperature changes using extended Kalman filter trained neural network. J Civ Struct Heal Monit 6(3):545–560

  30. Jones RE, Frankel AL, Johnson KL (2022) A neural ordinary differential equation framework for modeling inelastic stress response via internal state variables. J Mach Learn Model Comput 3(3)

  31. Julier SJ, Uhlmann JK (1997) New extension of the Kalman filter to nonlinear systems. In: Signal processing, sensor fusion, and target recognition VI, volume 3068. SPIE, pp 182–193

  32. Julier SJ, Uhlmann JK (2004) Unscented filtering and nonlinear estimation. Proc IEEE 92(3):401–422

  33. Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4:237–285

  34. Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng 82(1):35–45

  35. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980

  36. Kiumarsi B, Vamvoudakis KG, Modares H, Lewis FL (2017) Optimal and autonomous control using reinforcement learning: a survey. IEEE Trans Neural Netw Learn Syst 29(6):2042–2062

  37. Kober J, Bagnell JA, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238–1274

  38. Kuss M, Rasmussen C (2003) Gaussian processes in reinforcement learning. Adv Neural Inf Process Syst 16

  39. Landajuela M, Petersen BK, Kim S, Santiago CP, Glatt R, Mundhenk N, Pettit JF, Faissol D (2021) Discovering symbolic policies with deep reinforcement learning. In: International conference on machine learning. PMLR, pp 5979–5989

  40. LaViola JJ (2003) A comparison of unscented and extended Kalman filtering for estimating quaternion motion. In: Proceedings of the 2003 American control conference, 2003, volume 3. IEEE, pp 2435–2440

  41. Lee JH, Ricker NL (1994) Extended Kalman filter based nonlinear model predictive control. Ind Eng Chem Res 33(6):1530–1541

  42. Lee S-H, Song J (2020) Regularization-based dual adaptive Kalman filter for identification of sudden structural damage using sparse measurements. Appl Sci 10(3)

  43. Li Y (2017) Deep reinforcement learning: an overview. arXiv:1701.07274

  44. Lubliner J (2008) Plasticity theory. Courier Corporation, North Chelmsford

  45. Ma R, Sun WC (2020) Computational thermomechanics for crystalline rock. Part II: chemo-damage-plasticity and healing in strongly anisotropic polycrystals. Comput Methods Appl Mech Eng 369:113184

  46. McCuen RH, Knight Z, Cutter AG (2006) Evaluation of the Nash–Sutcliffe efficiency index. J Hydrol Eng 11(6):597–602

  47. Moskovitz T, Parker-Holder J, Pacchiano A, Arbel M, Jordan M (2021) Tactical optimism and pessimism for deep reinforcement learning. Adv Neural Inf Process Syst 34:12849–12863

  48. Murphy KP (1998) Switching Kalman filters. Technical report, DEC/Compaq Cambridge Research Labs

  49. Nguyen LH, Goulet JA (2018) Anomaly detection with the switching Kalman filter for structural health monitoring. Struct Control Health Monit 25(4):e2136

  50. Niv Y (2009) Reinforcement learning in the brain. J Math Psychol 53(3):139–154

  51. O’Donoghue B, Osband I, Munos R, Mnih V (2018) The uncertainty Bellman equation and exploration. In: International conference on machine learning, pp 3836–3845

  52. Ormoneit D, Sen A (2002) Kernel-based reinforcement learning. Mach Learn 49(2–3):161

  53. Pukelsheim F (2006) Optimal design of experiments. SIAM, Philadelphia

  54. Reda D, Tao T, van de Panne M (2020) Learning to locomote: understanding how environment design matters for deep reinforcement learning. In: Motion, interaction and games. ACM, pp 1–10

  55. Ryan EG, Drovandi CC, McGree JM, Pettitt AN (2016) A review of modern computational algorithms for Bayesian optimal design. Int Stat Rev 84(1):128–154

  56. Scherzinger WM (2017) A return mapping algorithm for isotropic and anisotropic plasticity models using a line search method. Comput Methods Appl Mech Eng 317:526–553

  57. Schrittwieser J, Hubert T, Mandhane A, Barekatain M, Antonoglou I, Silver D (2021) Online and offline reinforcement learning by planning with a learned model. Adv Neural Inf Process Syst 34:27580–27591

  58. Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T, et al (2017a) Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv:1712.01815

  59. Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T, et al (2017b) Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv:1712.01815

  60. Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A et al (2017) Mastering the game of go without human knowledge. Nature 550(7676):354

  61. Simo JC, Hughes TJR (2006) Computational inelasticity, vol 7. Springer Science & Business Media, Berlin

  62. Sun N-Z, Sun A (2015) Model calibration and parameter estimation: for environmental and water resource systems. Springer, Berlin

  63. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge

  64. Vlassis NN, Sun W (2022) Component-based machine learning paradigm for discovering rate-dependent and pressure-sensitive level-set plasticity models. J Appl Mech 89(2)

  65. Wang K, Sun WC (2019) Meta-modeling game for deriving theory-consistent, microstructure-based traction-separation laws via deep reinforcement learning. Comput Methods Appl Mech Eng 346:216–241

  66. Wang K, Sun WC, Du Q (2019) A cooperative game for automated learning of elasto-plasticity knowledge graphs and models with AI-guided experimentation. Comput Mech 1–33

  67. Wang K, Sun WC, Du Q (2021) A non-cooperative meta-modeling game for automated third-party calibrating, validating and falsifying constitutive laws with parallelized adversarial attacks. Comput Methods Appl Mech Eng 373:113514

  68. West DB et al (2001) Introduction to graph theory, vol 2. Prentice Hall, Upper Saddle River

  69. Williams RJ (1992) Training recurrent networks using the extended Kalman filter. In: [Proceedings 1992] IJCNN international joint conference on neural networks, volume 4. IEEE, pp 241–246

  70. Yang JN, Lin S, Huang H, Zhou L (2006) An adaptive extended Kalman filter for structural damage identification. Struct Control Health Monit 13(4):849–867

  71. Yang Z, Jin C, Wang Z, Wang M, Jordan MI (2020) On function approximation in reinforcement learning: optimism in the face of large state spaces. arXiv:2011.04622

  72. Zhao W, Queralta JP, Westerlund T (2020) Sim-to-real transfer in deep reinforcement learning for robotics: a survey. In: 2020 IEEE symposium series on computational intelligence (SSCI). IEEE, pp 737–744

  73. Zhou L, Shinya W, Yang JN (2008) Experimental study of an adaptive extended Kalman filter for structural damage identification. J Infrastruct Syst 14(1):42–51


The authors are primarily supported by the Sandia National Laboratories Computing and Information Sciences Laboratory Directed Research and Development program, with additional support from the Department of Defense SMART scholarship provided to Nhon N. Phan. NAT acknowledges support from the Department of Energy early career research program. This support is gratefully acknowledged. WCS would also like to thank Dr. Christine Anderson-Cook from Los Alamos National Laboratory for a fruitful discussion on the design of experiments in 2019 and the UPS Foundation Visiting Professorship from Stanford University for providing additional funding for this research. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis, which are also archived in the internal Sandia report SAND2022-13022. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. This article has been co-authored by an employee of National Technology and Engineering Solutions of Sandia, LLC under Contract No. DE-NA0003525 with the U.S. Department of Energy (DOE). The employee owns all right, title and interest in and to the article and is solely responsible for its contents. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this article or allow others to do so, for United States Government purposes.
The DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.

Author information

Corresponding author

Correspondence to Nikolaos N. Vlassis.

Ethics declarations

Credit statement

The extended Kalman filter was implemented by Dr. Ruben Villareal, whereas its incorporation into the deep reinforcement learning framework was implemented by Dr. Nikolaos Vlassis. The remaining authors contributed to developing ideas, writing the manuscript and discussions.


Appendix A: Elastoplasticity

A technologically important class of examples of this problem type is the calibration of traditional elastoplasticity models [44, 56, 61]. In these models the observable stress \({\varvec{\sigma }}\) is a function of the elastic strain \({\varvec{\epsilon }}^e\). Typically, a linear relationship between \({\varvec{\sigma }}\) and \({\varvec{\epsilon }}^e\) is assumed:

$$\begin{aligned} {\varvec{\sigma }}= {\mathbb {C}}{\varvec{\epsilon }}^e, \end{aligned}$$

where \({\mathbb {C}}\) is a fourth-order elastic-modulus tensor. For instance, with transverse isotropy,

$$\begin{aligned} C_{1111}= & {} E \frac{(1-\nu _\perp )}{(1-2 \nu ^2 \nu _\perp )}, \nonumber \\ C_{2222}= & {} C_{3333} = E \frac{(1-\nu ^2)}{(1-2 \nu ^2 \nu _\perp )}, \nonumber \\ C_{1122}= & {} C_{1133} = E \frac{\nu }{(1-2 \nu ^2 \nu _\perp )}, \nonumber \\ C_{2233}= & {} E \frac{(\nu ^2+\nu _\perp )}{(1-2 \nu ^2 \nu _\perp )(1+\nu _\perp )}, \nonumber \\ C_{1212}= & {} C_{1313} = E \frac{1}{(1-\nu )}, \nonumber \\ C_{2323}= & {} E \frac{1}{(1-\nu _\perp )}, \end{aligned}$$

where E is an effective Young’s modulus, \(\nu \) is an in-plane Poisson’s ratio and \(\nu _\perp \) is an out-of-plane Poisson’s ratio. History dependence is incorporated via plastic strain \({\varvec{\epsilon }}^p\), which is a hidden material state variable that elicits dissipative behavior. The elastic strain in (28) is the difference between the controllable, observable total strain \({\varvec{\epsilon }}\) and the irreversible plastic strain \({\varvec{\epsilon }}^p\):

$$\begin{aligned} {\varvec{\epsilon }}^e = {\varvec{\epsilon }}- {\varvec{\epsilon }}^p. \end{aligned}$$

Fig. 21 The modified Hill yield surface model under different stress axes for parameters \(B = 0.5\), 1 and 2

A closed, convex yield surface limits the elastic region and demarcates the elastic, reversible behavior in the interior of the surface from the irreversible plastic flow at the limit defined by the surface. For instance, a modified/simplified Hill anisotropic yield surface

$$\begin{aligned} Y= & {} \phi ({\varvec{\sigma }}) \\\equiv & {} \biggl (\frac{1}{3} \left( (\sigma _{22} - \sigma _{33})^2 +(\sigma _{11} - \sigma _{33})^2 +(\sigma _{22} - \sigma _{11})^2 \right) \\{} & {} +\frac{B}{2} \left( \sigma _{23}^2 +\sigma _{13}^2 + \sigma _{21}^2 \right) \biggr )^{1/2} \end{aligned}$$

generalizes the widely-used von Mises yield surface [44]; in fact, Fig. 21 shows that it reduces to von Mises when \(B=1\). The yield surface evolves with hardening of the material

$$\begin{aligned} Y = Y_0 + h(e^p), \end{aligned}$$

where \(Y_0\) is the initial yield strength, h is the hardening function and \(e^p\) is the equivalent plastic strain. For instance, \(h = H e^p\) induces linear hardening. The plastic strain evolves via the (associative) flow rule

$$\begin{aligned} \dot{{\varvec{\epsilon }}}^p = \dot{\lambda } {\varvec{\partial }}_{\varvec{\sigma }}\phi , \end{aligned}$$

where the direction of evolution is given by the normal to the yield surface \({\varvec{\partial }}_{\varvec{\sigma }}\phi \).
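The yield function and flow direction described above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the names `hill_phi` and `flow_direction` are hypothetical, the stress is assumed to be a 3x3 array, and the normal \({\varvec{\partial }}_{\varvec{\sigma }}\phi \) is approximated by central differences rather than derived analytically.

```python
import numpy as np

def hill_phi(sigma, B):
    """Modified Hill equivalent stress phi(sigma) for a 3x3 stress tensor."""
    s = np.asarray(sigma, dtype=float)
    s = 0.5 * (s + s.T)  # use the symmetric part so sigma_ij and sigma_ji are treated alike
    normal = ((s[1, 1] - s[2, 2]) ** 2
              + (s[0, 0] - s[2, 2]) ** 2
              + (s[1, 1] - s[0, 0]) ** 2) / 3.0
    shear = 0.5 * B * (s[1, 2] ** 2 + s[0, 2] ** 2 + s[1, 0] ** 2)
    return float(np.sqrt(normal + shear))

def flow_direction(sigma, B, h=1e-6):
    """Central-difference estimate of d(phi)/d(sigma), one component at a time."""
    s = np.asarray(sigma, dtype=float)
    n = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            plus, minus = s.copy(), s.copy()
            plus[i, j] += h
            minus[i, j] -= h
            n[i, j] = (hill_phi(plus, B) - hill_phi(minus, B)) / (2.0 * h)
    return n

# Hypothetical uniaxial stress state; the gradient is undefined where phi = 0.
sigma_uniaxial = np.diag([3.0, 0.0, 0.0])
normal_uniaxial = flow_direction(sigma_uniaxial, B=1.0)
```

Because \(\phi \) depends only on stress differences and shear components, the computed normal is traceless, so the associated plastic flow is volume preserving.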

The yield surface

$$\begin{aligned} g \equiv \phi ({\varvec{\sigma }}) - Y(e^p) \le 0 \end{aligned}$$

constrains the possible response of the material. When \(g < 0\), the material is in an elastic state, and the material state variables \({\varvec{\epsilon }}^p\) and \(\lambda = e^p\) are fixed so that the stress at the new state k is

$$\begin{aligned} {\varvec{\sigma }}_k = {\mathbb {C}}({\varvec{\epsilon }}_k - {\varvec{\epsilon }}^p) = {\varvec{\sigma }}_{k-1} + {\mathbb {C}}( \Delta {\varvec{\epsilon }}), \end{aligned}$$

where k indexes load steps. Otherwise, the material is in a plastic state; the evolution equations and the constraint \(g=0\) need to be solved through a Newton iteration with increments \(\Delta {\varvec{\sigma }}^{(i)}\)

$$\begin{aligned} {\varvec{\sigma }}_k = {\varvec{\sigma }}_{k-1} + \sum _i \Delta {\varvec{\sigma }}^{(i)}, \end{aligned}$$

where i indexes the Newton iterations. This aspect complicates obtaining parameter sensitivities. Further details can be found in Simo and Hughes [61].
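The elastic-predictor and plastic-corrector logic can be illustrated with a one-dimensional analogue. The sketch below assumes a scalar stress, linear hardening \(h = H e^p\), and a strain-driven load step; the function name `return_map_1d` and its arguments are hypothetical. In this simplified setting the yield constraint is linear in the plastic multiplier increment, so the Newton loop converges after a single correction, which is not the case for the general tensorial model.

```python
def return_map_1d(eps, eps_p, e_p, E, Y0, H, tol=1e-12, max_iter=20):
    """One load step of 1D plasticity with linear hardening.

    eps is the new total strain; eps_p and e_p are the plastic strain and the
    equivalent plastic strain from the previous step. Returns the updated
    (sigma, eps_p, e_p).
    """
    sigma_trial = E * (eps - eps_p)               # elastic predictor
    g_trial = abs(sigma_trial) - (Y0 + H * e_p)   # yield function at the trial state
    if g_trial <= 0.0:
        return sigma_trial, eps_p, e_p            # elastic step: hidden state unchanged

    sign = 1.0 if sigma_trial > 0.0 else -1.0     # flow direction in 1D
    d_lambda = 0.0
    for _ in range(max_iter):
        # Yield constraint as a function of the plastic multiplier increment
        g = abs(sigma_trial) - E * d_lambda - (Y0 + H * (e_p + d_lambda))
        if abs(g) < tol:
            break
        d_lambda -= g / (-(E + H))                # Newton update with dg/d(d_lambda) = -(E + H)
    eps_p_new = eps_p + d_lambda * sign
    e_p_new = e_p + d_lambda
    return E * (eps - eps_p_new), eps_p_new, e_p_new

# Hypothetical material values: the first step stays elastic, the second yields.
elastic_state = return_map_1d(0.005, 0.0, 0.0, E=100.0, Y0=1.0, H=10.0)
plastic_state = return_map_1d(0.02, 0.0, 0.0, E=100.0, Y0=1.0, H=10.0)
```

After a plastic step the returned stress sits on the updated yield surface, which is the discrete counterpart of the constraint \(g = 0\).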

For this exemplar the parameters are \({\varvec{\theta }}= \{ E, \nu , \nu _\perp , B, Y_0, H \}\). If \(Y_0\rightarrow \infty \) the model reduces to elasticity, and if additionally \(\nu _\perp = \nu \) it reduces to isotropic elasticity, \({\varvec{\theta }}= \{ E, \nu \}\). If \(Y_0\) is finite, \(\nu _\perp = \nu \) and \(B=1\), it reduces to the widely used von Mises plasticity model, \({\varvec{\theta }}= \{ E, \nu , Y_0, H \}\).

Appendix B: EKF for state and parameter estimation of DAEs

In this appendix we will derive and discuss the extended Kalman filter (EKF) in the context of semi-explicit index-1 differential algebraic equations (DAEs) for joint state and parameter estimation. The plasticity model described in “Appendix A” is an example of a DAE system with an algebraic stress rule (28) and an ordinary differential equation (ODE) prescribing the flow of the hidden state variables (32) subjected to the algebraic yield constraint (33). In general, these DAEs have the form

$$\begin{aligned} \dot{{\textbf{z}}}&= {\textbf{f}}\left( {\textbf{z}}, {\varvec{\sigma }}, {\varvec{\theta }}, {\textbf{x}}, t\right) \end{aligned}$$
$$\begin{aligned} \textbf{0}&= {\textbf{g}}\left( {\textbf{z}}, {\varvec{\sigma }}, {\varvec{\theta }}, {\textbf{x}}, t\right) . \end{aligned}$$

Here \({\textbf{z}}\) are unobserved dynamic states, \({\varvec{\sigma }}\) are unobserved algebraic states, \({\varvec{\theta }}\) are model parameters, \({\textbf{x}}\) are known inputs and t is time. Since we are dealing with index-1 DAEs, we can assume that \({\textbf{g}}\left( {\textbf{z}}, {\varvec{\sigma }}, {\varvec{\theta }}, {\textbf{x}}, t\right) = \textbf{0}\) is solvable for \({\varvec{\sigma }}\). In addition to the DAE process model, we assume that there is an observation model for the measurement \({\textsf {d}}\), given by

$$\begin{aligned} {\textsf {d}}= {\textbf{m}}\left( {\textbf{z}}, {\varvec{\sigma }}, {\varvec{\theta }}, {\textbf{x}}, t\right) + \varvec{\varepsilon }, \end{aligned}$$

where \(\varvec{\varepsilon }\) is noise which we assume follows a Gaussian distribution.

Because these dynamics are specified as a continuous-time DAE system, we need to discretize them in time. For simplicity we further assume that the models are not explicitly time dependent; extending the method to the time-dependent case is straightforward. Although the DAEs generally have no closed-form solution, for convenience we denote the discrete solution operator for the dynamic state update by \(\mathcalligra {f}\) and the resulting measurement function by \(\mathcalligra {m}\), which likewise has no closed form when it depends on the algebraic state \({\varvec{\sigma }}\). As a result,

$$\begin{aligned} {\textbf{z}}_k&= \mathcalligra {f}\left( {\textbf{z}}_{k-1}, {\varvec{\theta }}, {\textbf{x}}_k\right) + {\varvec{\eta }}_k \end{aligned}$$
$$\begin{aligned} {\textsf {d}}_k&= \mathcalligra {m}\left( {\textbf{z}}_k, {\varvec{\theta }}, {\textbf{x}}_k \right) + \varvec{\varepsilon }_k \end{aligned}$$

for time \(t_k\). Here the addition of a Gaussian process noise term \({\varvec{\eta }}\) reflects modeling errors due to the discretization in addition to any intrinsic noise. It is important to note that there are many choices of \(\mathcalligra {f}\) depending on the discretization and numerical integration scheme used to solve the DAEs. This explicit construction, though not implementable in a closed form, defines the functions that we need to linearize in order to construct the EKF. We can also augment the state to include fictitious dynamics of the model parameters to aid in model parameter identification:

$$\begin{aligned} {\varvec{\theta }}_k = {\varvec{\theta }}_{k-1} + {\varvec{\delta }}_k, \end{aligned}$$

where \({\varvec{\delta }}\) is again additive Gaussian noise. For exact parameter estimation, \({\varvec{\delta }}_k=0\) because the parameters are fixed; however, adding a small amount of noise can improve the stability of the filter and reduce bias in the estimated parameters, at the cost of increased variance and slower convergence of the estimation.

Under this construction, the EKF prediction step has the form

$$\begin{aligned} {\textbf{z}}_{k\mid k-1}&= \mathcalligra {f}\left( {\textbf{z}}_{k-1 \mid k-1}, {\varvec{\theta }}_{k-1 \mid k-1}, {\textbf{x}}_k\right) \end{aligned}$$
$$\begin{aligned} {\varvec{\theta }}_{k\mid k-1}&= {\varvec{\theta }}_{k-1 \mid k-1} \end{aligned}$$
$$\begin{aligned} {\textsf {d}}_{k\mid k-1}&= \mathcalligra {m}\left( {\textbf{z}}_{k\mid k-1}, {\varvec{\theta }}_{k\mid k-1}, {\textbf{x}}_k \right) , \end{aligned}$$

while the uncertainty propagation on the prediction has the form

$$\begin{aligned} \varvec{\Sigma }_{k \mid k-1}&= {\textsf {F}}_k \varvec{\Sigma }_{k-1 \mid k-1} {\textsf {F}}_k^T + {\textsf {Q}}_k \end{aligned}$$
$$\begin{aligned} {\textsf {S}}_{k \mid k-1}&= {\textsf {A}}_k \varvec{\Sigma }_{k \mid k-1}{\textsf {A}}_k^T + {\textsf {R}}, \end{aligned}$$

where \(\varvec{\Sigma }\) is the covariance of the joint state \(\left[ {\textbf{z}}, {\varvec{\theta }}\right] ^T\), \({\textsf {Q}}\) is the covariance of the additive process and parameter noise, which are assumed to be independent with covariances \({\textsf {Q}}_{\varvec{\eta }}\) and \({\textsf {Q}}_{\varvec{\delta }}\), respectively, and \({\textsf {R}}\) is the measurement noise covariance. We also must construct the linearizations of the dynamics \({\textsf {F}}\) and the measurement \({\textsf {A}}\):

$$\begin{aligned} {\textsf {Q}}_{k}&=\begin{bmatrix} {\textsf {Q}}_{\varvec{\eta }}&{} \textbf{0} \\ \textbf{0} &{} {\textsf {Q}}_{\varvec{\delta }}\end{bmatrix} \end{aligned}$$
$$\begin{aligned} {\textsf {F}}_{k}&=\begin{bmatrix} {\varvec{\partial }}_{\textbf{z}}\mathcalligra {f}&{} {\varvec{\partial }}_{\varvec{\theta }}\mathcalligra {f}\\ \textbf{0} &{} {\textbf{I}}\end{bmatrix}_{{\textbf{z}}_{k-1\mid k-1}, {\varvec{\theta }}_{k-1\mid k-1}, {\textbf{x}}_k} \end{aligned}$$
$$\begin{aligned} {\textsf {A}}_{k}&=\begin{bmatrix} {\varvec{\partial }}_{\textbf{z}}\mathcalligra {m}&{\varvec{\partial }}_{\varvec{\theta }}\mathcalligra {m}\end{bmatrix}_{{\textbf{z}}_{k\mid k-1}, {\varvec{\theta }}_{k\mid k-1}, {\textbf{x}}_k}. \end{aligned}$$
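The prediction and uncertainty-propagation steps can be sketched for black-box update and measurement functions as follows. This is an illustrative outline under stated assumptions: the names `jacobian_fd` and `ekf_predict` are hypothetical, the Jacobians are approximated by central differences instead of the analytic derivatives discussed in this appendix, and the sketch follows the common EKF convention of linearizing the dynamics at the previous estimate and the measurement at the predicted state. The parameter rows of \({\textsf {F}}\) come out as \([\textbf{0}, {\textbf{I}}]\) because the parameters are propagated unchanged.

```python
import numpy as np

def jacobian_fd(fun, y, h=1e-6):
    """Central-difference Jacobian of fun: R^n -> R^m evaluated at y."""
    y = np.asarray(y, dtype=float)
    n_out = np.atleast_1d(fun(y)).size
    J = np.zeros((n_out, y.size))
    for j in range(y.size):
        step = np.zeros_like(y)
        step[j] = h
        J[:, j] = (np.atleast_1d(fun(y + step)) - np.atleast_1d(fun(y - step))) / (2.0 * h)
    return J

def ekf_predict(y_post, Sigma_post, x_k, f, m, n_z, Q, R):
    """EKF prediction for the joint state y = [z, theta].

    f(z, theta, x) is the discrete state update and m(z, theta, x) the
    measurement function; n_z is the number of dynamic states.
    """
    y_post = np.asarray(y_post, dtype=float)

    def joint_step(y):
        z, theta = y[:n_z], y[n_z:]
        # Dynamic states follow f; parameters are carried over unchanged
        return np.concatenate([np.atleast_1d(f(z, theta, x_k)), theta])

    def measure(y):
        return np.atleast_1d(m(y[:n_z], y[n_z:], x_k))

    F = jacobian_fd(joint_step, y_post)       # linearized dynamics
    y_pred = joint_step(y_post)               # predicted joint state
    A = jacobian_fd(measure, y_pred)          # linearized measurement
    Sigma_pred = F @ Sigma_post @ F.T + Q     # predicted covariance
    S = A @ Sigma_pred @ A.T + R              # innovation covariance
    return y_pred, measure(y_pred), Sigma_pred, S, F, A
```

Finite differences keep the sketch short but cost two model evaluations per joint-state component, which is one reason analytic sensitivities are attractive for implicit plasticity updates.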

Once all these variables have been defined and a measurement \({\textsf {d}}_k\) has been made, the EKF update is straightforward:

$$\begin{aligned} {\textsf {r}}_k&= {\textsf {d}}_k - {\textsf {d}}_{k\mid k-1} \end{aligned}$$
$$\begin{aligned} {\textsf {K}}_k&= \varvec{\Sigma }_{k \mid k-1} {\textsf {A}}_{k}^T {\textsf {S}}_{k \mid k-1}^{-1} \end{aligned}$$
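$$\begin{aligned} \begin{bmatrix} {\textbf{z}}_{k\mid k} \\ {\varvec{\theta }}_{k\mid k} \end{bmatrix}&= \begin{bmatrix} {\textbf{z}}_{k\mid k-1} \\ {\varvec{\theta }}_{k\mid k-1} \end{bmatrix} + {\textsf {K}}_k {\textsf {r}}_k \end{aligned}$$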
$$\begin{aligned} \varvec{\Sigma }_{k \mid k}&= \left( {\textbf{I}}- {\textsf {K}}_k {\textsf {A}}_k \right) \varvec{\Sigma }_{k \mid k-1}. \end{aligned}$$

Therefore, in order to apply the EKF to the DAEs, we must compute the derivatives \({\varvec{\partial }}_{\textbf{z}}\mathcalligra {f}\), \({\varvec{\partial }}_{\varvec{\theta }}\mathcalligra {f}\), \({\varvec{\partial }}_{\textbf{z}}\mathcalligra {m}\) and \({\varvec{\partial }}_{\varvec{\theta }}\mathcalligra {m}\).
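To make the update concrete, the sketch below applies it to a toy joint state and parameter estimation problem. Everything about the toy problem is an assumption made for illustration: a scalar state decays as \(z_k = z_{k-1} - \Delta _t \theta z_{k-1}\) (explicit Euler, so the Jacobian can be written by hand), the state is measured directly, and the decay rate \(\theta \) is unknown. The names `ekf_update` and `estimate_decay_rate` are hypothetical.

```python
import numpy as np

def ekf_update(y_pred, Sigma_pred, d_k, d_pred, A, R):
    """EKF update of the joint state y = [z, theta] with measurement d_k."""
    r = d_k - d_pred                                   # innovation
    S = A @ Sigma_pred @ A.T + R                       # innovation covariance
    K = Sigma_pred @ A.T @ np.linalg.inv(S)            # Kalman gain
    y_post = y_pred + K @ r
    Sigma_post = (np.eye(len(y_pred)) - K @ A) @ Sigma_pred
    return y_post, Sigma_post

def estimate_decay_rate(theta_true=1.0, theta_guess=0.5, dt=0.1,
                        n_steps=50, noise_std=0.0, seed=0):
    """Estimate theta in z_k = z_{k-1} - dt * theta * z_{k-1} from measurements of z."""
    rng = np.random.default_rng(seed)
    z_true = 1.0
    y = np.array([1.0, theta_guess])                   # joint state [z, theta]
    Sigma = np.diag([1e-6, 1.0])                       # confident in z, uncertain in theta
    Q = np.zeros((2, 2))                               # no process or parameter noise
    R = np.array([[1e-4]])
    A = np.array([[1.0, 0.0]])                         # only z is measured
    for _ in range(n_steps):
        # Synthetic "experiment": advance the truth and measure it
        z_true = z_true - dt * theta_true * z_true
        d_k = np.array([z_true + noise_std * rng.standard_normal()])
        # Prediction, linearized at the previous estimate
        z, theta = y
        F = np.array([[1.0 - dt * theta, -dt * z],
                      [0.0, 1.0]])
        y_pred = np.array([z - dt * theta * z, theta])
        Sigma_pred = F @ Sigma @ F.T + Q
        # Update with the new measurement
        y, Sigma = ekf_update(y_pred, Sigma_pred, d_k, A @ y_pred, A, R)
    return y[1], Sigma[1, 1]

theta_hat, theta_var = estimate_decay_rate()
```

The returned variance of \(\theta \) shrinks as measurements accumulate, which is the quantity an experiment-design criterion based on the filter covariance would track.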

We will compute these derivatives for the case where the dynamics are implicitly solved using the backward Euler method (as is common for plasticity updates). Thus, returning to the discrete time DAEs, the model becomes

$$\begin{aligned} {\textbf{z}}_{k}&= {\textbf{z}}_{k-1} + \Delta _t {\textbf{f}}\left( {\textbf{z}}_k, {\varvec{\sigma }}_k, {\varvec{\theta }}_k, {\textbf{x}}_k \right) \end{aligned}$$
$$\begin{aligned} {\varvec{\theta }}_k&= {\varvec{\theta }}_{k-1} \end{aligned}$$
$$\begin{aligned} \textbf{0}&= {\textbf{g}}\left( {\textbf{z}}_k, {\varvec{\sigma }}_k, {\varvec{\theta }}_k, {\textbf{x}}_k \right) \end{aligned}$$
$$\begin{aligned} {\textsf {d}}_{k}&= {\textbf{m}}\left( {\textbf{z}}_k, {\varvec{\sigma }}_k, {\varvec{\theta }}_k, {\textbf{x}}_k \right) . \end{aligned}$$
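A backward Euler step for a semi-explicit DAE couples the state update and the algebraic constraint, so both unknowns are solved together. The sketch below uses a hypothetical scalar example chosen only for illustration, \(\dot{z} = -\theta \sigma \) with constraint \(0 = \sigma - z - z^3\), and solves the resulting two-equation residual with Newton's method; the function name `backward_euler_dae_step` is likewise an assumption.

```python
import numpy as np

def backward_euler_dae_step(z_prev, theta, dt, tol=1e-12, max_iter=20):
    """One backward Euler step for z' = -theta * sigma, 0 = sigma - z - z**3.

    Solves jointly for (z_k, sigma_k) by Newton iteration on the residual
    [z_k - z_prev - dt * f(z_k, sigma_k), g(z_k, sigma_k)].
    """
    z, sigma = z_prev, z_prev + z_prev ** 3            # initial guess from the previous state
    for _ in range(max_iter):
        res = np.array([z - z_prev + dt * theta * sigma,
                        sigma - z - z ** 3])
        if np.linalg.norm(res) < tol:
            return z, sigma
        jac = np.array([[1.0, dt * theta],
                        [-(1.0 + 3.0 * z ** 2), 1.0]])
        dz, dsigma = np.linalg.solve(jac, -res)
        z, sigma = z + dz, sigma + dsigma
    raise RuntimeError("Newton iteration did not converge")

z_new, sigma_new = backward_euler_dae_step(z_prev=1.0, theta=1.0, dt=0.1)
```

The Jacobian assembled inside the Newton loop contains the same partial derivatives of the constraint that appear in the sensitivity expressions, so an implementation can reuse it when forming the EKF linearization.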

Before we begin with the derivation of the derivatives needed for the EKF, we will derive the following useful derivatives: \({\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\varvec{\theta }}_k\), \({\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\textbf{z}}_{k-1}\), \({\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\varvec{\sigma }}_k\), \({\varvec{\partial }}_{{\textbf{z}}_k} {\varvec{\sigma }}_k\) and \({\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\varvec{\sigma }}_k\). By inspection, we see that \({\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\varvec{\theta }}_k = \textbf{0}\) and \({\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\textbf{z}}_{k-1} = \textbf{0}\). This realization might seem counter-intuitive because obviously there is a relationship between \({\textbf{z}}_{k-1}\) and \({\varvec{\theta }}_{k-1}\); however, that relationship is already being accounted for via \(\varvec{\Sigma }\). To find \({\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\varvec{\sigma }}_k\) we must implicitly compute the derivative of the algebraic constraint

$$\begin{aligned} {\varvec{\partial }}_{{\textbf{z}}_{k-1}} \textbf{0} \equiv \textbf{0}&= {\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\textbf{g}}\left( {\textbf{z}}_k, {\varvec{\sigma }}_k, {\varvec{\theta }}_k, {\textbf{x}}_k \right) \nonumber \\&= {\varvec{\partial }}_{{\textbf{z}}_{k}} {\textbf{g}}\, {\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\textbf{z}}_k + {\varvec{\partial }}_{{\varvec{\theta }}_{k}} {\textbf{g}}\, {\varvec{\partial }}_{{\textbf{z}}_{k-1}}{\varvec{\theta }}_k\nonumber \\&\quad + {\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\, {\varvec{\partial }}_{{\textbf{z}}_{k-1}}{\varvec{\sigma }}_k \nonumber \\&= {\varvec{\partial }}_{{\textbf{z}}_{k}} {\textbf{g}}\, {\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\textbf{z}}_k + {\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\, {\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\varvec{\sigma }}_k \end{aligned}$$
$$\begin{aligned} \implies {\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\varvec{\sigma }}_k&= - \left( {\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\right) ^{-1} \, {\varvec{\partial }}_{{\textbf{z}}_{k}} {\textbf{g}}\, {\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\textbf{z}}_k. \end{aligned}$$

Here we used the fact that \({\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\varvec{\theta }}_k =\textbf{0}\). Also, we know that \({\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\) is invertible because the system is an index-1 DAE. Following the same argument, we also find that \({\varvec{\partial }}_{{\textbf{z}}_k} {\varvec{\sigma }}_k =-\left( {\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\right) ^{-1} {\varvec{\partial }}_{{\textbf{z}}_{k}} {\textbf{g}}\).

Similarly, to find \({\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\varvec{\sigma }}_k\) we must implicitly compute the derivative of the algebraic constraint

$$\begin{aligned} {\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} \textbf{0} \equiv \textbf{0}&={\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} \textbf{g} \left( {\textbf{z}}_k, {\varvec{\sigma }}_k, {\varvec{\theta }}_k, {\textbf{x}}_k \right) \nonumber \\&= {\varvec{\partial }}_{{\textbf{z}}_{k}} \textbf{g} \, {\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\textbf{z}}_k + {\varvec{\partial }}_{{\varvec{\theta }}_{k}} \textbf{g} \, {\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\varvec{\theta }}_k\nonumber \\&\quad + {\varvec{\partial }}_{{\varvec{\sigma }}_{k}} \textbf{g} \, {\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\varvec{\sigma }}_k \end{aligned}$$
$$\begin{aligned} \implies {\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\varvec{\sigma }}_k&=-\left( {\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\right) ^{-1} \left( {\varvec{\partial }}_{ {\textbf{z}}_{k}} {\textbf{g}}\, {\varvec{\partial }}_{ {\varvec{\theta }}_{k-1}} {\textbf{z}}_k + {\varvec{\partial }}_{ {\varvec{\theta }}_k} {\textbf{g}}\right) . \end{aligned}$$
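The implicit-function identities above can be checked numerically. The following is a minimal sketch, not the model used in this paper: the constraint `g`, the control value `X`, and all helper names are hypothetical and chosen only so that \({\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\) is invertible. It holds \({\textbf{z}}_k\) fixed, so for the parameter derivative only the direct term \(-\left( {\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\right) ^{-1} {\varvec{\partial }}_{{\varvec{\theta }}_k} {\textbf{g}}\) is tested, and it compares both Jacobians against central finite differences.

```python
import numpy as np

X = np.array([0.8, -0.3])  # hypothetical control input x_k


def g(z, sigma, theta):
    """Toy algebraic constraint g(z, sigma, theta, x) = 0 (illustrative only)."""
    return sigma + 0.1 * sigma**3 - theta[0] * (X - z) + theta[1] * z


def solve_sigma(z, theta, iters=50):
    """Newton solve of g = 0 for sigma; d_sigma g is diagonal and >= 1 here."""
    sigma = np.zeros_like(z)
    for _ in range(iters):
        sigma = sigma - g(z, sigma, theta) / (1.0 + 0.3 * sigma**2)
    return sigma


def implicit_jacobians(z, theta):
    """Return d(sigma)/dz and d(sigma)/d(theta) at fixed z via implicit differentiation."""
    sigma = solve_sigma(z, theta)
    dg_dsigma = np.diag(1.0 + 0.3 * sigma**2)
    dg_dz = (theta[0] + theta[1]) * np.eye(2)
    dg_dtheta = np.column_stack([-(X - z), z])
    dsigma_dz = -np.linalg.solve(dg_dsigma, dg_dz)
    dsigma_dtheta = -np.linalg.solve(dg_dsigma, dg_dtheta)
    return dsigma_dz, dsigma_dtheta


def finite_difference(fun, p, eps=1e-6):
    """Central-difference Jacobian of fun at p."""
    cols = []
    for i in range(p.size):
        dp = np.zeros_like(p)
        dp[i] = eps
        cols.append((fun(p + dp) - fun(p - dp)) / (2 * eps))
    return np.column_stack(cols)


z0 = np.array([0.2, 0.1])
theta0 = np.array([2.0, 0.5])
J_z, J_theta = implicit_jacobians(z0, theta0)
fd_z = finite_difference(lambda z: solve_sigma(z, theta0), z0)
fd_theta = finite_difference(lambda t: solve_sigma(z0, t), theta0)
print("max error d(sigma)/dz:    ", np.max(np.abs(J_z - fd_z)))
print("max error d(sigma)/dtheta:", np.max(np.abs(J_theta - fd_theta)))
```

Both printed errors should be at the level of finite-difference noise if the signs and the inverse are applied as in the equations above.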

Now, using the previous definitions, we can use the same implicit strategy to solve for \({\varvec{\partial }}_{{\textbf{z}}} \mathcalligra {f}\). We can derive it as

$$\begin{aligned} {\varvec{\partial }}_{{\textbf{z}}} \mathcalligra {f}&= {\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\textbf{z}}_k\nonumber \\&= {\varvec{\partial }}_{{\textbf{z}}_{k-1}} \left( {\textbf{z}}_{k-1} + \Delta _t \mathcalligra {f}\left( {\textbf{z}}_k, {\varvec{\sigma }}_k, {\varvec{\theta }}_k, {\textbf{x}}_k \right) \right) \nonumber \\&= {\textbf{I}}+ \Delta _t \left( {\varvec{\partial }}_{{\textbf{z}}_k} \mathcalligra {f}\, {\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\textbf{z}}_k +{\varvec{\partial }}_{{\varvec{\theta }}_k} \mathcalligra {f}\, {\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\varvec{\theta }}_k + {\varvec{\partial }}_{{\varvec{\sigma }}_k} \mathcalligra {f}\, {\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\varvec{\sigma }}_k \right) \nonumber \\&= {\textbf{I}}+ \Delta _t \left( {\varvec{\partial }}_{{\textbf{z}}_k} \mathcalligra {f}\, {\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\textbf{z}}_k - {\varvec{\partial }}_{{\varvec{\sigma }}_k} \mathcalligra {f}\left( {\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\right) ^{-1} {\varvec{\partial }}_{{\textbf{z}}_{k}} {\textbf{g}}\, {\varvec{\partial }}_{{\textbf{z}}_{k-1}} {\textbf{z}}_k \right) \end{aligned}$$
$$\begin{aligned}&\!\!\!\!\!\!\implies {\varvec{\partial }}_{{\textbf{z}}} \mathcalligra {f}\nonumber \\&\qquad \quad = \left( {\textbf{I}}- \Delta _t {\varvec{\partial }}_{{\textbf{z}}_k} \mathcalligra {f}+ \Delta _t {\varvec{\partial }}_{{\varvec{\sigma }}_k} \mathcalligra {f}\left( {\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\right) ^{-1} {\varvec{\partial }}_{{\textbf{z}}_{k}} {\textbf{g}}\right) ^{-1}. \end{aligned}$$

Using a similar strategy, we can compute \({\varvec{\partial }}_{{\varvec{\theta }}} \mathcalligra {f}\):

$$\begin{aligned}{} & {} \!\!\!{\varvec{\partial }}_{\theta } \mathcalligra {f}= {\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\textbf{z}}_k\nonumber \\{} & {} \quad = {\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} \left( {\textbf{z}}_{k-1} + \Delta _t \mathcalligra {f}\left( {\textbf{z}}_k, {\varvec{\sigma }}_k, {\varvec{\theta }}_k, {\textbf{x}}_k \right) \right) \nonumber \\{} & {} \quad ={\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\textbf{z}}_{k-1} + \Delta _t\nonumber \\{} & {} \qquad \left( {\varvec{\partial }}_{{\textbf{z}}_k} \mathcalligra {f}\, {\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\textbf{z}}_k + {\varvec{\partial }}_{{\varvec{\theta }}_k} \mathcalligra {f}\, {\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\varvec{\theta }}_k + {\varvec{\partial }}_{{\varvec{\sigma }}_k} \mathcalligra {f}\, {\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\varvec{\sigma }}_k \right) \nonumber \\{} & {} \quad =\Delta _t \left( {\varvec{\partial }}_{{\textbf{z}}_k} \mathcalligra {f}\, {\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\textbf{z}}_k + {\varvec{\partial }}_{{\varvec{\theta }}_k} \mathcalligra {f}- {\varvec{\partial }}_{{\varvec{\sigma }}_k} \mathcalligra {f}\left( {\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\right) ^{-1}\right. \nonumber \\{} & {} \quad \left. \left( {\varvec{\partial }}_{{\textbf{z}}_{k}} {\textbf{g}}\, {\varvec{\partial }}_{{\varvec{\theta }}_{k-1}} {\textbf{z}}_k + {\varvec{\partial }}_{{\varvec{\theta }}_k} {\textbf{g}}\right) \right) \end{aligned}$$
$$\begin{aligned}{} & {} \implies {\varvec{\partial }}_{{\varvec{\theta }}} \mathcalligra {f}= \left( {\textbf{I}}- \Delta _t {\varvec{\partial }}_{{\textbf{z}}_k} \mathcalligra {f}+\Delta _t {\varvec{\partial }}_{{\varvec{\sigma }}_k} \mathcalligra {f}\left( {\varvec{\partial }}_{{\varvec{\sigma }}_k} {\textbf{g}}\right) ^{-1} {\varvec{\partial }}_{{\textbf{z}}_k} {\textbf{g}}\right) ^{-1} \nonumber \\{} & {} \qquad \left( \Delta _t {\varvec{\partial }}_{{\varvec{\theta }}_k} \mathcalligra {f}- \Delta _t {\varvec{\partial }}_{{\varvec{\sigma }}_k} \mathcalligra {f}\left( {\varvec{\partial }}_{{\varvec{\sigma }}_k} {\textbf{g}}\right) ^{-1} {\varvec{\partial }}_{{\varvec{\theta }}_k} {\textbf{g}}\right) . \end{aligned}$$
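As a sanity check on the two sensitivity formulas, the sketch below applies them to a small linear toy model and compares the result with finite differences of the backward-Euler step. Everything in it is an assumption made for illustration: the evolution law \(\mathcalligra {f} = \theta _0 C {\varvec{\sigma }} - \theta _1 {\textbf{z}}\), the constraint \({\textbf{g}} = {\varvec{\sigma }} - \theta _0 ({\textbf{x}} - {\textbf{z}})\), the matrix `C`, the step size `DT`, and the function names are not taken from the models in this paper.

```python
import numpy as np

DT = 0.1                                   # hypothetical time step Delta_t
X = np.array([1.0, -0.5])                  # hypothetical control input x_k
C = np.array([[1.0, 0.2], [0.0, 0.5]])     # hypothetical coupling matrix


def step(z_prev, theta):
    """One backward-Euler step of the toy DAE; returns (z_k, sigma_k)."""
    M = np.eye(2) * (1.0 + DT * theta[1]) + DT * theta[0] ** 2 * C
    z_k = np.linalg.solve(M, z_prev + DT * theta[0] ** 2 * C @ X)
    return z_k, theta[0] * (X - z_k)


def sensitivities(z_prev, theta):
    """d z_k / d z_{k-1} and d z_k / d theta from the implicit formulas."""
    z_k, sigma_k = step(z_prev, theta)
    I = np.eye(2)
    # Partial derivatives of f and g evaluated at the converged step
    df_dz, df_dsigma = -theta[1] * I, theta[0] * C
    df_dtheta = np.column_stack([C @ sigma_k, -z_k])
    dg_dsigma, dg_dz = I, theta[0] * I
    dg_dtheta = np.column_stack([-(X - z_k), np.zeros(2)])
    S = df_dsigma @ np.linalg.inv(dg_dsigma)
    M = I - DT * df_dz + DT * S @ dg_dz
    dz_dzprev = np.linalg.inv(M)
    dz_dtheta = np.linalg.solve(M, DT * df_dtheta - DT * S @ dg_dtheta)
    return dz_dzprev, dz_dtheta


def finite_difference(fun, p, eps=1e-6):
    """Central-difference Jacobian of fun at p."""
    cols = []
    for i in range(p.size):
        dp = np.zeros_like(p)
        dp[i] = eps
        cols.append((fun(p + dp) - fun(p - dp)) / (2 * eps))
    return np.column_stack(cols)


z_prev0 = np.array([0.3, 0.1])
theta0 = np.array([1.5, 0.4])
Fz, Ftheta = sensitivities(z_prev0, theta0)
fd_Fz = finite_difference(lambda z: step(z, theta0)[0], z_prev0)
fd_Ftheta = finite_difference(lambda t: step(z_prev0, t)[0], theta0)
print("max error in d z_k / d z_{k-1}:", np.max(np.abs(Fz - fd_Fz)))
print("max error in d z_k / d theta:  ", np.max(np.abs(Ftheta - fd_Ftheta)))
```

Because the same matrix is inverted in both formulas, an implementation can factor it once per step and reuse the factorization for the state and parameter sensitivities.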

Computing the derivatives for the measurement function is more straightforward because we are no longer using the implicit integrator, and all the key derivatives have already been defined. We find that

$$\begin{aligned} {\varvec{\partial }}_{{\textbf{z}}_k} \mathcalligra {m}&= {\varvec{\partial }}_{{\textbf{z}}_k} \mathcalligra {m}+{\varvec{\partial }}_{{\varvec{\sigma }}_k} \mathcalligra {m}\, {\varvec{\partial }}_{{\textbf{z}}_k} {\varvec{\sigma }}_k \end{aligned}$$
$$\begin{aligned} {\varvec{\partial }}_{{\varvec{\theta }}_k} \mathcalligra {m}&= {\varvec{\partial }}_{{\textbf{z}}_k} \mathcalligra {m}\, {\varvec{\partial }}_{\theta } \mathcalligra {f}+ {\varvec{\partial }}_{{\varvec{\theta }}_k} \mathcalligra {m}+{\varvec{\partial }}_{{\varvec{\sigma }}_k} \mathcalligra {m}\, {\varvec{\partial }}_{{\varvec{\theta }}_k} {\varvec{\sigma }}_k. \end{aligned}$$

For the special case when \(\mathcalligra {m}\left( {\textbf{z}}_k, {\varvec{\sigma }}_k, {\varvec{\theta }}_k, {\textbf{x}}_k \right) = {\varvec{\sigma }}_k\), as in our models, we can significantly simplify these equations as

$$\begin{aligned} {\varvec{\partial }}_{{\textbf{z}}_k} \mathcalligra {m}&={\varvec{\partial }}_{{\textbf{z}}_k} {\varvec{\sigma }}_k =-\left( {\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\right) ^{-1} {\varvec{\partial }}_{{\textbf{z}}_{k}} {\textbf{g}}\end{aligned}$$
$$\begin{aligned} {\varvec{\partial }}_{{\varvec{\theta }}_k} \mathcalligra {m}&= {\varvec{\partial }}_{{\varvec{\theta }}_k} {\varvec{\sigma }}_k = - \left( {\varvec{\partial }}_{{\varvec{\sigma }}_{k}} {\textbf{g}}\right) ^{-1} \left( {\varvec{\partial }}_{{\textbf{z}}_{k}} {\textbf{g}}\,{\varvec{\partial }}_{\theta } \mathcalligra {f}+{\varvec{\partial }}_{{\varvec{\theta }}_k} {\textbf{g}}\right) . \end{aligned}$$

Appendix C: GPB algorithm applied to plasticity model calibration

In GPB2, the material response modes occurring at steps \(k-1\) and \(k\) can be assigned discrete switching variables \(\alpha ,\beta \in \{0,1\}\), respectively, where 0 represents an elastic mode and 1 a plastic mode. The probability of being in either mode depends on the likelihood of the active mode given the observations, the transition model, and the mode probabilities. We partition our exemplar into two modes, \({\mathcal {M}}^0\) and \({\mathcal {M}}^1\), which have a corresponding elastic response (34) and plastic response (35). This allows the yield criterion (33) to be bypassed, so that the responses from both modes can be evaluated simultaneously in order to update the mode probabilities at every step. From the perspective of materials science, there is a low probability of any plastic behavior at the start of an experiment, and this prior knowledge can be incorporated by setting the initial mode probabilities accordingly.

Since we do not know a priori when switching occurs, we assign each mode a probability: \(\pi ({\mathcal {M}}_0| {\varvec{\theta }})\) is the prior probability before we collect any data, and \(\pi ({\mathcal {M}}_k|{\textsf {d}}_{1:k}; {\varvec{\theta }})\) is the probability after we collect data, where \({\textsf {d}}_{1:k}\) is the data observed up to step \(k\) and \({\varvec{\theta }}\) are the model parameters shared between both modes. The update equation for the mode probability \(\pi ({\mathcal {M}})\) is derived from Bayes' rule for the joint probability \(\pi ({\mathcal {M}}_{k-1}^\alpha ,{\mathcal {M}}_{k}^\beta )\) given the data \({\textsf {d}}\) up to the current step \(k\):

$$\begin{aligned}&\pi ({\mathcal {M}}_{k-1}^\alpha ,{\mathcal {M}}_{k}^\beta | {\textsf {d}}_{k},{\textsf {d}}_{1:k-1}) \nonumber \\&\quad \propto \pi ({\mathcal {M}}_{k-1}^\alpha , {\mathcal {M}}_{k}^\beta ,{\textsf {d}}_{k} | {\textsf {d}}_{1:k-1})\nonumber \\&\quad =\pi ({\textsf {d}}_{k}|{\mathcal {M}}_{k-1}^\alpha ,{\mathcal {M}}_{k}^\beta , {\textsf {d}}_{1:k-1}) \pi ({\mathcal {M}}_{k-1}^\alpha ,{\mathcal {M}}_{k}^\beta | {\textsf {d}}_{1:k-1}) \nonumber \\&\quad =\underbrace{\pi ({\textsf {d}}_{k} | {\mathcal {M}}_{k-1}^\alpha , {\mathcal {M}}_{k}^\beta , {\textsf {d}}_{1:k-1})}_{L_{k}(\alpha ,\beta )} \nonumber \\&\qquad \times \underbrace{\pi ({\mathcal {M}}_{k}^\beta | {\mathcal {M}}_{k-1}^\alpha ,{\textsf {d}}_{1:k-1}) }_{Z(\alpha ,\beta )} \underbrace{\pi ({\mathcal {M}}_{k-1}^\alpha |{\textsf {d}}_{1:k-1})}_{W_{k-1|k-1} (\alpha )}, \end{aligned}$$

The likelihood function \(L_k(\alpha ,\beta )\) is a multivariate Gaussian distribution \( L = {\mathcal {N}}(\textbf{r}, {\textsf {A}}\varvec{\Sigma }{\textsf {A}}^T + {\textsf {R}}) \) based on the parameter distributions, where \(\textbf{r}\) is the residual error and the covariance is defined in (C.4). The transition matrix \(Z(\alpha ,\beta )\) contains elements \(z_{\alpha \beta }\), each of which is the probability of transitioning from mode \({\mathcal {M}}^\alpha \) to \({\mathcal {M}}^\beta \). It has the form

$$\begin{aligned} Z(\alpha ,\beta ) = \begin{bmatrix} z_{00} &{} z_{01}\\ z_{10} &{} z_{11} \end{bmatrix} \end{aligned}$$

and was initialized with the following values:

$$\begin{aligned} Z(\alpha ,\beta ) = \begin{bmatrix} 0.99 &{} 0.01\\ 0.01 &{} 0.99 \end{bmatrix}. \end{aligned}$$

The reasoning for these initial values is that, on average, we expect most experimental steps to remain sequentially in either the elastic or the plastic region, with only a few steps around the yield point where switching occurs. The quantity \(W_{k|k}(\beta ) =\sum _{\alpha }W_{k-1,k|k}(\alpha ,\beta )\) is the posterior distribution for \({\mathcal {M}}\). If the transition is partitioned between modes, the switching becomes soft, and the Kalman filter is effectively a mixture of the two discrete filter modes. Since the material response is either elastic or plastic, we manipulate the transition matrix to be binary. When the material is more likely to deform elastically, i.e. \(\pi ({\mathcal {M}}_k=0|{\textsf {d}}_{1:k},{\varvec{\theta }}_k) > 1/2\), the prior parameters \({\varvec{\theta }}_k\) are updated according to \({\mathcal {M}}^0\); otherwise, when the material is more likely to deform plastically, i.e. \(\pi ({\mathcal {M}}_k=1|{\textsf {d}}_{1:k}, {\varvec{\theta }}_k) > 1/2\), the model parameters are updated according to \({\mathcal {M}}^1\). Treating the calibration as separate modes allows for better estimation of both elastic and plastic parameters, because new data can be partitioned appropriately by the mode assignment.
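The mode-probability update and the hard mode assignment described above can be sketched in a few lines. This is only an illustration under stated assumptions: the likelihood values in `L` are made up rather than computed from a filter, the function name is hypothetical, and the hard switch is implemented here as a simple threshold on the marginal posterior, which is one possible reading of the binary assignment.

```python
import numpy as np


def update_mode_probabilities(L, Z, W_prev):
    """Bayes update of the joint mode posterior.

    L[a, b]   : likelihood of the new datum for the mode pair (alpha=a, beta=b)
    Z[a, b]   : probability of transitioning from mode a to mode b
    W_prev[a] : posterior probability of mode a at step k-1
    Returns the normalized joint posterior over (alpha, beta) and its
    marginal over alpha, i.e. the posterior of the current mode beta.
    """
    joint = L * Z * W_prev[:, None]   # unnormalized L * Z * W
    joint = joint / joint.sum()       # normalize over all mode pairs
    return joint, joint.sum(axis=0)   # marginalize out the previous mode


# Transition matrix with the initial values quoted above
Z = np.array([[0.99, 0.01],
              [0.01, 0.99]])
# Hypothetical prior: the specimen is almost certainly still elastic
W_prev = np.array([0.95, 0.05])
# Hypothetical likelihoods: the new datum is far better explained by the plastic model
L = np.array([[0.02, 1.5],
              [0.02, 1.5]])

joint, W = update_mode_probabilities(L, Z, W_prev)
active_mode = int(W[1] > 0.5)         # 0: update with M^0, 1: update with M^1
print("mode posterior:", W, "active mode:", active_mode)
```

With these illustrative numbers the plastic mode becomes the more probable one even though the prior strongly favored the elastic mode, because the likelihood ratio outweighs the small off-diagonal transition probability.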

The GPB2 algorithm extends the Kalman filter (KF) by incorporating a Markovian jump system that models transitions between discrete behavior modes. At each sequential step \(t_{k-1} \rightarrow t_k\), the algorithm generates estimates conditioned on N modes and all possible transitions between them, resulting in \(N^2\) mode-matched KFs. The estimates are then merged using a mixing probability proportional to the likelihood of each KF to obtain the overall estimate of the system state. The interplay between the multiple KFs and the mixing probability is illustrated in Fig. 22. The algorithm begins with previous estimates (or priors) \(W^\alpha _{k-1}\), \(\hat{\theta }^\alpha _{k-1}\), and \(\Sigma ^\alpha _{k-1}\), which are the mode probabilities, mode-conditioned estimates, and covariances, respectively.

Fig. 22: A schematic of one complete cycle of the GPB2 algorithm

A complete recursive cycle of GPB2 is as follows:

  1.

    Mode matched filtering: The \(N^2\) mode-matched KFs take \({\mathcal {N}}(\hat{\theta }^\alpha _{k-1},\Sigma ^\alpha _{k-1})\) and output \({\mathcal {N}}(\hat{\theta }^{\alpha \beta }_{k}, \Sigma ^{\alpha \beta }_{k})\), where the subscript \(k|k-1\) denotes predicted statistics and \(k\) denotes updated statistics. Details of the Kalman filter are described in Sec. 2.2. Note that in our exemplars the material model parameters being estimated are constant, and therefore the state transition \(F_\beta \) is the identity matrix. \(F_\beta \) is retained in the following filtering step to show how the procedure generalizes to calibrating dynamic states.

    $$\begin{aligned}&\hat{\theta }_{k|k-1}^{\alpha \beta } =F_{\beta }\hat{\theta }_{k-1}^{\alpha }, \end{aligned}$$
    $$\begin{aligned}&\Sigma _{k|k-1}^{\alpha \beta } =F_{\beta } \Sigma _{k-1}^{\alpha }F_{\beta }^{T}, \end{aligned}$$
    $$\begin{aligned}&S_{k}^{\alpha \beta } =A_{\beta }\Sigma _{k|k-1}^{\alpha \beta } A_{\beta }^{T}+R_{k}, \end{aligned}$$
    $$\begin{aligned}&K_{k}^{\alpha \beta } =\Sigma _{k|k-1}^{\alpha \beta }A_{\beta }^{T} (S_{k}^{\alpha \beta })^{-1}, \end{aligned}$$
    $$\begin{aligned}&\hat{\theta }_{k}^{\alpha \beta } =\hat{\theta }_{k|k-1}^{\alpha \beta } +K_{k}^{\alpha \beta } (d_{k}-A_{\beta }\hat{\theta }_{k|k-1}^{\alpha \beta }), \end{aligned}$$
    $$\begin{aligned}&\Sigma _{k}^{\alpha \beta } =\Sigma _{k|k-1}^{\alpha \beta } -K_{k}^{\alpha \beta }S_{k}^{\alpha \beta }(K_{k}^{\alpha \beta })^{T}. \end{aligned}$$
  2.

    Mixing probabilities: \(W^{\alpha \beta }_{k-1|k}\) is interpreted as the probability that mode \({\mathcal {M}}^\alpha \) was in effect at step \(k-1\) given that \({\mathcal {M}}^\beta \) is in effect at step \(k\), conditioned on the data \(d_k\). The likelihood is given by the normal distribution

    $$\begin{aligned} L(\alpha ,\beta ) = {\mathcal {N}}(\tilde{r}^{\alpha \beta }_k;0, S^{\alpha \beta }_k) \end{aligned}$$

    where \(\tilde{r}^{\alpha \beta }_k=d_{k}-A_{\beta } \hat{\theta }_{k|k-1}^{\alpha \beta }\) is the residual between the prediction and the data \(d_k\) and the mixing probabilities are calculated as

    $$\begin{aligned} W^{\alpha \beta }_{k-1|k}=\frac{L(\alpha ,\beta )Z(\alpha ,\beta ) W^\alpha _{k-1}}{c^\beta _k} \end{aligned}$$

    where \(Z(\alpha ,\beta )\) is the transition matrix, \(W^\alpha _{k-1}\) is a mode probability, and \(c^\beta _k\) is a normalization constant given by

    $$\begin{aligned} c^\beta _k \equiv \sum ^{N}_{\alpha =0} L(\alpha ,\beta )Z(\alpha ,\beta ) W^\alpha _{k-1}. \end{aligned}$$
  3.

    Merging: The previous mode history \({\mathcal {M}}^\alpha \) of \(\hat{\theta }^{\alpha \beta }_k\) and \(\Sigma ^{\alpha \beta }_k\) is marginalized out using the mixing probabilities to obtain the conditional posterior estimates and covariances given the current mode \({\mathcal {M}}^\beta \). These are calculated as

    $$\begin{aligned} \hat{\theta }^\beta _k&= \sum ^{N}_{\alpha =0} W^{\alpha \beta }_{k-1|k} \hat{\theta }^{\alpha \beta }_k \end{aligned}$$
    $$\begin{aligned} \Sigma ^\beta _k&= \sum ^{N}_{\alpha =0} W^{\alpha \beta }_{k-1|k} [\Sigma ^{\alpha \beta }_k + (\hat{\theta }^\beta _k -\hat{\theta }^{\alpha \beta }_k)\times (\hat{\theta }^\beta _k - \hat{\theta }^{\alpha \beta }_k)^T]. \end{aligned}$$
  4.

    Update mode probabilities: The mode probabilities are updated by the sum of weighted likelihood estimates and are given by

    $$\begin{aligned} W^\beta _k&= \frac{1}{c} \sum ^{N}_{\alpha =0} {\mathcal {N}} (\tilde{r}^{\alpha \beta }_k;0,S^{\alpha \beta }_k) Z(\alpha ,\beta ) W^\alpha _{k-1} = \frac{c^\beta _k}{c} \end{aligned}$$

    where \(c \equiv \sum ^{N}_{\beta =0} c^\beta _k\).

  5.

    Overall estimate: Combining the mode estimates by the updated mode probabilities (C.14) results in the final state estimate and covariance. These are calculated as

    $$\begin{aligned} \hat{\theta }_k&= \sum ^{N}_{\beta =0} W^\beta _k \hat{\theta }^\beta _k \end{aligned}$$
    $$\begin{aligned} \Sigma _k&= \sum ^{N}_{\beta =0} W^\beta _k [\Sigma ^\beta _k + (\hat{\theta }_k - \hat{\theta }^\beta _k)\times (\hat{\theta }_k - \hat{\theta }^\beta _k)^T]. \end{aligned}$$
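The five steps above can be sketched as a single function. This is a minimal sketch under stated assumptions, not the implementation used in this work: the measurement model is taken to be linear with a per-mode matrix \(A_\beta \), modes are indexed \(0, \ldots , N-1\), and all names (`gpb2_cycle`, `gaussian_likelihood`, the example matrices and data) are hypothetical.

```python
import numpy as np


def gaussian_likelihood(r, S):
    """Density of N(r; 0, S) for a residual vector r."""
    quad = r @ np.linalg.solve(S, r)
    norm = np.sqrt((2.0 * np.pi) ** r.size * np.linalg.det(S))
    return np.exp(-0.5 * quad) / norm


def gpb2_cycle(theta, Sigma, W, d, F, A, R, Z):
    """One GPB2 cycle for linear mode models.

    theta[a], Sigma[a], W[a] : mode-conditioned priors from step k-1
    F[b], A[b]               : state transition and measurement matrix of mode b
    d, R                     : new datum and measurement noise covariance
    Z[a, b]                  : probability of transitioning from mode a to mode b
    """
    N = len(W)
    th = [[None] * N for _ in range(N)]
    Sg = [[None] * N for _ in range(N)]
    L = np.zeros((N, N))
    # Step 1: N^2 mode-matched Kalman filters
    for a in range(N):
        for b in range(N):
            th_pred = F[b] @ theta[a]
            Sg_pred = F[b] @ Sigma[a] @ F[b].T
            S = A[b] @ Sg_pred @ A[b].T + R
            K = Sg_pred @ A[b].T @ np.linalg.inv(S)
            r = d - A[b] @ th_pred
            th[a][b] = th_pred + K @ r
            Sg[a][b] = Sg_pred - K @ S @ K.T
            L[a, b] = gaussian_likelihood(r, S)
    # Step 2: mixing probabilities, normalized over the previous mode alpha
    joint = L * Z * W[:, None]
    c_beta = joint.sum(axis=0)
    mix = joint / c_beta
    # Step 3: merge over the previous mode for each current mode beta
    theta_b, Sigma_b = [], []
    for b in range(N):
        m = sum(mix[a, b] * th[a][b] for a in range(N))
        P = sum(mix[a, b] * (Sg[a][b] + np.outer(m - th[a][b], m - th[a][b]))
                for a in range(N))
        theta_b.append(m)
        Sigma_b.append(P)
    # Step 4: updated mode probabilities
    W_new = c_beta / c_beta.sum()
    # Step 5: overall estimate and covariance
    theta_hat = sum(W_new[b] * theta_b[b] for b in range(N))
    Sigma_hat = sum(
        W_new[b] * (Sigma_b[b] + np.outer(theta_hat - theta_b[b], theta_hat - theta_b[b]))
        for b in range(N))
    return theta_b, Sigma_b, W_new, theta_hat, Sigma_hat


# Hypothetical two-mode, two-parameter example with a scalar measurement
I2 = np.eye(2)
F = [I2, I2]                                       # constant parameters
A = [np.array([[1.0, 0.0]]), np.array([[0.3, 1.0]])]
R = np.array([[0.01]])
Z = np.array([[0.99, 0.01], [0.01, 0.99]])
theta0 = [np.array([1.5, 0.8]), np.array([1.5, 0.8])]
Sigma0 = [I2.copy(), I2.copy()]
W0 = np.array([0.9, 0.1])
d = np.array([2.0])

theta_b, Sigma_b, W1, theta_hat, Sigma_hat = gpb2_cycle(theta0, Sigma0, W0, d, F, A, R, Z)
print("mode probabilities:", W1)
print("overall estimate:  ", theta_hat)
```

The sketch evaluates likelihoods directly, so very small likelihood values could underflow; a practical implementation would likely work with log-likelihoods. With a single mode the function reduces to one ordinary Kalman filter update, which is a convenient check.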

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Cite this article

Villarreal, R., Vlassis, N.N., Phan, N.N. et al. Design of experiments for the calibration of history-dependent models via deep reinforcement learning and an enhanced Kalman filter. Comput Mech 72, 95–124 (2023).
