Abstract
Most engineering domains abound with models derived from first principles that have been proven effective for decades. These models are not only a valuable source of knowledge, but they also form the basis of simulations. The recent trend of digitization has complemented these models with data in all forms and variants, such as process monitoring time series, measured material characteristics, and stored production parameters. Theory-inspired machine learning combines the available models and data, reaping the benefits of established knowledge and the capabilities of modern, data-driven approaches. Compared to purely physics- or purely data-driven models, the models resulting from theory-inspired machine learning are often more accurate and less complex, extrapolate better, or allow faster model training or inference. In this short survey, we introduce and discuss several prominent approaches to theory-inspired machine learning and show how they were applied in the fields of welding, joining, additive manufacturing, and metal forming.
1 Introduction
While early approaches to artificial intelligence (AI) were mostly rule-based and thus relied exclusively on expert knowledge, digitization and the advent of deep learning have triggered an era of purely data-driven modeling in which the domain experts’ knowledge appears to have lost its importance. Recently—since purely data-driven modeling is approaching its limits in some application domains—researchers have started to turn back to AI’s roots to combine existing expert knowledge and data in new and promising ways. The scientific communities have realized not only that classical theory-driven models and simulations need to be augmented with available data from measurements and digitization campaigns, but also that AI algorithms need to be adapted to incorporate knowledge from the respective application domains.
In this short survey, which expands on the Portevin Lecture given by the corresponding author at the 2021 International Conference of the International Institute of Welding (IIW), we will introduce and discuss different approaches for including such domain knowledge in data-driven AI or machine learning models (Section 3). We will subsume these approaches under the umbrella of theory-inspired machine learning, contrasting it with machine learning in the narrow sense, which predominantly refers to the process of obtaining models exclusively from data. Before presenting these approaches, we will highlight the main features, advantages, and limitations of purely theory-driven and purely data-driven models, respectively, and show that combining these two paradigms has the potential to improve the trade-offs between accuracy, computational complexity, and data requirements of the respective models (Section 2).
There exist several surveys covering theory-inspired machine learning, both general [1, 2] and domain-specific. Examples of the latter include surveys in turbulence modeling [3], computational fluid dynamics [4], civil engineering [5], chemical engineering [6], earth observation [7], chemical, petroleum, and energy systems [8], materials science [9], and heat transfer modeling [10]. We take inspiration from these surveys and structure our manuscript similarly to [1, 7, 9]. Specifically, we categorize approaches to theory-inspired machine learning based on how theory and data interact (e.g., theory selects the model class, theory regularizes learning), rather than on how theory- and data-driven models are connected (in parallel, in series, as subsystems, etc.).
The selection of presented approaches cannot be exhaustive and thus remains at least partially subjective. For one, we focus only on ways in which existing theory can be utilized to improve data-driven models, namely via data preprocessing or feature engineering (Section 3.1), model selection (Section 3.2), and regularization (Section 3.3). We thus neglect information flowing in the opposite direction, i.e., we do not consider how theory-driven models can benefit from the increasing amounts of available data. As such, we do not cover data-driven parameterization of theory-driven models or defect modeling, in which data-driven models are used to compensate for overly coarse theoretical approximations. Further, we omit discussions about substituting only parts of a theory-driven model by a data-driven one. Rather, we consider these data-driven submodels as special cases of surrogate models, which we treat in Section 4. There is also a growing body of literature on the topic of hybrid or grey-box models, which contain theory- and data-driven components, the former often implemented via numerical solvers. While we do not discuss approaches that rely on numerical solvers as critical components, we argue that theory-inspired machine learning is a way of obtaining such hybrid models, for example, by utilizing a known functional relationship to preprocess the data prior to data-driven modeling. Finally, we briefly discuss settings in which prior knowledge is incomplete and may only encompass knowledge of cause-effect relationships (Section 5). Such settings have recently received a lot of attention in the field of machine learning, and we believe that they can be put to good use in many application domains.
Our manuscript does not claim to be a complete treatment of the emerging topic of theory-inspired machine learning and hybrid modeling. Rather, it is intended as an introduction from which the interested reader can move forward. To assist the reader in this endeavor, the manuscript builds on several examples of theory-inspired machine learning from the fields of welding and joining, additive manufacturing, and metal forming. This simultaneously illustrates the presented approaches with practical applications and suggests how the existing literature can be categorized based on the concepts introduced in this survey.
2 Theory- vs. data-driven modeling
To discuss the fundamental differences between theory- and data-driven modeling, let us consider a simple physical phenomenon that we wish to study. The theory-driven model for this physical phenomenon may be the differential equation depicted in Fig. 1. This differential equation is characterized by the nonlinear operator F and parameterized by a set of parameters, which we collect in the vector 𝜃. We further assume that a forcing function u(t) influences the phenomenon. We are interested in the trajectory of a quantity x describing this phenomenon. In other words, we are interested in solving the differential equation
$$\frac{\mathrm{d}x(t)}{\mathrm{d}t} = F\big(x(t), u(t); \theta\big) \qquad (1)$$
for a known initial condition x(0) and for all t in a given time period \(\mathcal {T}\), the computational domain.
The theory-driven nature of this model is characterized by the fact that it is deduced from a theoretical understanding of the phenomenon under investigation, i.e., F is derived from existing (physical) theories. It is an inherently causal model, in the sense that the forcing function causes changes in the quantity of interest and not vice versa. However, the existing theory is not sufficiently evolved for every phenomenon, and even where it is, modeling all aspects of a phenomenon in full detail may be impractical or exhibit prohibitive computational complexity. Thus, the true operator F is often replaced by an approximation, highlighting the fundamental trade-off between accuracy and model complexity. Finally, in many cases the parameterization 𝜃 of the model is not deducible from existing theories.
At the other end of the spectrum are data-driven models (Fig. 2). Assuming that we wish to study the same physical phenomenon of interest, suppose that we have access to a large dataset \(\mathcal {D}\) of observations. Specifically, suppose we have observed the same phenomenon for (potentially) different parameters 𝜃, different forcing functions u(t), and different initial conditions x(0), yielding different trajectories x(t) on (potentially) different computational domains \(\mathcal {T}\). That is, we have access to a dataset^{Footnote 1}
$$\mathcal{D} = \left\{\left(\theta_{i},\, u_{i}(\mathcal{T}_{i}),\, x_{i}(0),\, x_{i}(\mathcal{T}_{i})\right)\right\}_{i=1}^{N} \qquad (2)$$
where i indexes the separate observations. Data-driven modeling now aims at learning a mapping between the elements influencing a quantity of interest (which are called features in machine learning) and the quantity of interest (which is called the target). In other words, we are interested in finding and/or parameterizing a function f such that
$$\hat{x}(t) = f\left(t, u(\mathcal{T}), x(0), \theta; \psi\right) \qquad (3)$$
is close to x(t) in some well-defined sense, where x(t) is obtained by solving (1) and where \(u(\mathcal {T})\) denotes the entire trajectory of the forcing function. In data-driven modeling, this task is often solved by minimizing a distance function d between x(t) and \(\hat x(t)\) over the parameters ψ of the function f, where the distance is computed on the available (training) dataset \(\mathcal {D}\):
$$\min_{\psi} \sum_{i=1}^{N} d\left(x_{i}(t), \hat{x}_{i}(t)\right) \qquad (4)$$
In (4), f is taken from a specific model class \(\mathcal {F}\). For example, if f is a linear model, then ψ are its coefficients; if f is a neural network model, then ψ are its architectural parameters, weight matrices, and bias terms. Whether one refers to the process of determining the model class \(\mathcal {F}\) and parameters ψ as machine learning, curve fitting, or system identification is immaterial; in all cases we refer to the resulting model as data-driven due to its dependence on \(\mathcal {D}\).
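As a minimal, self-contained sketch of this fitting procedure (using a hypothetical linear ODE and a linear model class, not a system from the cited literature), the parameters ψ of a one-step predictor can be obtained by least squares:

```python
import numpy as np

# Hypothetical system: learn a one-step predictor x(t+dt) ~ f(x(t), u(t); psi)
# from a simulated trajectory of the linear ODE dx/dt = -theta*x + u,
# discretized with forward Euler.
rng = np.random.default_rng(0)
dt, theta = 0.01, 2.0

def simulate(x0, u):
    xs = [x0]
    for uk in u:
        xs.append(xs[-1] + dt * (-theta * xs[-1] + uk))
    return np.array(xs)

u = rng.standard_normal(200)      # forcing function samples
x = simulate(1.0, u)              # "observed" trajectory, the dataset D

# Model class F: linear maps f(x, u) = psi[0]*x + psi[1]*u.
# "Training" = least-squares minimization of the distance on D.
A = np.column_stack([x[:-1], u])
psi, *_ = np.linalg.lstsq(A, x[1:], rcond=None)

# For this linear system, the fit recovers the exact Euler coefficients.
assert abs(psi[0] - (1 - dt * theta)) < 1e-8
assert abs(psi[1] - dt) < 1e-8
```

Here the model class happens to match the true dynamics, so the fit is exact; for a nonlinear phenomenon, the same procedure would only find the best approximation within the chosen class.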
The very nature of these data-driven models is that they model associative relationships rather than causative ones. Essentially, it is equally possible to parameterize a function \(\tilde f\) that maps the trajectory x(t) and the parameter vector 𝜃 to the forcing function u(t)—although the accuracy of the solution to this inverse problem may be much lower than for the forward problem, especially if the inverse problem does not admit a functional description. Furthermore, while theory-driven modeling is very structured, data-driven modeling is often a trial-and-error process, requiring testing several model classes or parameterizations in an iterative and exploratory manner. Moreover, some model classes (such as neural networks) require large datasets \(\mathcal {D}\) to effectively learn their parameters ψ and, once learned, are considered black boxes lacking interpretability. Finally, data-driven models lack guarantees of physical consistency: if we select a parameterization 𝜃 far from the range covered in the dataset \(\mathcal {D}\), then the solution \(\hat x(t)\) provided by the data-driven model may not only be inaccurate, but even unphysical in the sense of violating fundamental physical laws. While the fact that data-driven models rarely extrapolate well outside the range of the training data is known as a lack of generalization in the machine learning community, this shortcoming becomes much more severe when applying data-driven models in domains governed by physical laws.
These drawbacks of purely theory-driven and purely data-driven models call for action. Theory-inspired machine learning, hybrid or grey-box modeling, and theory-guided data science are umbrella terms for a variety of approaches that combine the benefits of theory- and data-driven modeling, mitigating their respective shortcomings. Data can be used to parameterize theory-driven models, to improve their accuracy by modeling their deficiencies, or to replace (parts of) theory-driven models for computational speedup. Insights from theory can help in selecting the model class for the data-driven model f or in preprocessing the data such that the parameters of f can be learned from less data. Finally, incorporating theory into data-driven models may guarantee (or at least improve) physical consistency and add inherent interpretability. Thus, combining the powers of theory- and data-driven models has the potential to achieve better trade-offs in terms of accuracy, computational complexity, the amount of required data, physical consistency, and interpretability, cf. [10, Fig. 3].
3 Approaches for theory-inspired machine learning
In the following sections, we will discuss several approaches to theory-inspired machine learning, i.e., to how domain knowledge can be used to improve data-driven models. For elaborations on how theory-driven models can benefit from data, we refer the reader to other surveys on this topic [1,2,3,4,5,6,7,8, 10].
3.1 Theory-inspired feature engineering
As mentioned in Section 2, data-driven models are obtained by minimizing a certain optimization objective, evaluated on a dataset \(\mathcal {D}\), over the parameters ψ of a function f that should eventually model the relationship of interest, cf. (3). If we have prior knowledge about general properties of this relationship, we can utilize this knowledge to prepare the data such that the data-driven model can be learned more effectively (Fig. 3). For example, suppose that x(t) depends in a highly nonlinear fashion on 𝜃, while the dependence on u(t) and x(0) is much simpler. Now suppose further that we have knowledge about this nonlinear behavior in 𝜃. Then, rather than directly minimizing (4), one may turn to finding the parameters ψ of a function f by modeling
$$\hat{x}(t) = f\left(t, u(\mathcal{T}), x(0), g(\theta); \psi\right)$$
where the function g is chosen based on our knowledge about the nonlinear behavior. Capturing this nonlinear behavior upfront allows us to choose a less complex model class (see also Section 3.2 below) and simultaneously eases the task of data-driven modeling.
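The effect of such a theory-inspired transformation g can be sketched as follows (an entirely synthetic example; the exponential dependence is an assumed, Arrhenius-type nonlinearity, not one from the cited works):

```python
import numpy as np

# The target depends exponentially on theta; this functional form is
# assumed known from theory, while its coefficient is not. Applying
# g(theta) = exp(-theta) up front lets a plain linear model fit the data,
# whereas the raw feature would require a more complex model class.
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 3.0, 100)
y = 5.0 * np.exp(-theta)

def lin_fit_rmse(feature):
    # least-squares linear fit y ~ a*feature + b, returning the RMS error
    A = np.column_stack([feature, np.ones_like(feature)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sqrt(np.mean((A @ coef - y) ** 2))

err_raw = lin_fit_rmse(theta)           # linear model on raw theta
err_eng = lin_fit_rmse(np.exp(-theta))  # linear model on g(theta)

assert err_eng < 1e-10    # engineered feature: linear model fits exactly
assert err_raw > 1e-2     # raw feature: linear model clearly insufficient
```

The same data thus admit a much simpler model class once the known nonlinearity is absorbed into the features.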
Preprocessing data to simplify data-driven modeling is often referred to as feature engineering. While feature engineering also makes use of unsupervised techniques such as dimensionality reduction or clustering, theory-inspired feature engineering utilizes domain knowledge to preprocess data. Both unsupervised and theory-inspired approaches to feature engineering are standard in traditional machine learning. However, the successes of deep learning rely to some extent on the capability of neural networks to learn their own features, allowing them to be applied without any pre- or post-processing. While still successful, the resulting data-driven model is usually more complex than necessary and less interpretable than desired. To give a concrete example, the authors of [11] investigated the problem of clustering patterns in electronic end-of-line tests in the semiconductor industry. Patterns in these tests allow the engineer to detect deviations in the manufacturing process and to react accordingly. A convolutional variational autoencoder (e.g., [12]) was designed to automatically extract features useful for subsequent pattern classification. Despite its satisfactory performance, the model remained a black box. Interpreting the tests as images, however, allowed the authors of [11] to utilize an interpretable set of features that capture well the structures constituting the observed test patterns. After linear dimensionality reduction, the resulting features allowed a clustering performance comparable to that obtained from the convolutional variational autoencoder, but with much lower complexity and much higher interpretability. As a second example, the authors of [13] aimed for a surrogate model (see Section 4) for the energy of carbon crystal structures. While the energy landscape is highly complex, the authors achieved excellent results by performing nonlinear regression based on physically meaningful features extracted from the crystal structures, such as average bond lengths, angular and radial density distributions, and the average number of nearest neighbors.
Theory-inspired features can also improve the generalization performance of machine learning models. For example, there is a class of neural networks that can be used to solve systems of partial differential equations on regular meshes (e.g., by approximating derivatives with predefined, non-trainable convolutional filters). The authors of [14] used an elliptic transform as theory-inspired feature engineering so that these methods can be applied also to irregular domains. As a second example, the authors of [15] explored generalizable surrogate models for the structural analysis of 3D trusses (structures of connected triangles, as in bridges). By using features that encode different geometries, the resulting models generalized better across geometries and outperformed neural network models trained on individual geometries.
Theory-inspired feature engineering has also been employed quite naturally in the fields of welding and manufacturing, e.g., for weld quality assessment. Instead of directly using acoustic emission measurement data as the machine learning model input, the authors of [16] proposed a physics-based step to produce meaningful features such as the absolute signal energy or the centroid frequency of the signal. In [17], the authors suggest detecting abnormal heat using a heat transfer model, the parameters of which are fitted to the data and subsequently used for outlier detection (e.g., via isolation forests). This method, combining off-the-shelf outlier detection with theory-inspired features, has the potential to reduce testing time by 43%. Theory-inspired features were also utilized in modeling a steel-sheet galvanizing production line [18]. These features included the anode voltage (resistance), calculated using Kirchhoff’s laws by summing resistances over the dynamic system, which includes the anode voltage, electrolyte, steel voltage, and other factors. Using these theory-inspired features in training data-driven machine learning models improved the predictions on the test set. Similarly, the authors of [19] used theory-inspired features for the design of new alloys and showed that transforming data through prior physicochemical knowledge can create more accurate machine learning models for the prediction of transformation temperatures. The improvement was explained by the introduction of mathematical nonlinearities given by, e.g., material growth kinetics models, which provide information on material behavior even in temperature ranges not covered by the raw data.
Interesting use cases for theory-inspired feature engineering can also be found in the domain of additive manufacturing (AM). An example is [20], where neural networks are utilized to predict the grain structure in deposition processes during AM. Instead of using complex numerical models, the authors trained neural networks to link thermal data obtained from finite volume simulations (such as the temperature gradient and the cooling rate at the liquidus temperature) to microstructure characteristics. In another research paper in AM [21], the authors utilized theory-informed features to predict porosity in selective laser melting. The raw features, being machine and laser settings, are converted to physically meaningful features such as the laser energy density at a point of the material powder bed, radiation pressure, and power intensity. The engineered features are used in several nonlinear regression models (support vector regression, Gaussian processes, etc.). A further use case in laser-assisted AM is the prediction of balling defects in [22]. The authors constructed theory-inspired features using 3D, transient, heat transfer, and fluid flow models. The inputs to these theory-driven models are process parameters and material properties, while the outputs are 3D temperature and velocity fields. From these outputs, physically meaningful features are computed (e.g., volumetric energy density or surface tension forces), which were subsequently used in a genetic algorithm to understand their relationship to balling defects.
3.2 Theory-inspired model selection
Another avenue to incorporate prior theoretical knowledge in a data-driven model is via an informed selection of the model class \(\mathcal {F}\) (Fig. 4). For example, knowing that the relationship we want to learn is approximately linear or piecewise constant would suggest selecting f from the class of linear or decision tree models, respectively. If the relationship is known to be neither linear nor piecewise constant, then one may resort to nonlinear regression models such as polynomial regression, symbolic regression, or support vector machines, where prior knowledge about the problem at hand can help in selecting the polynomial order, candidate functions for symbolic regression, or appropriate kernel functions.
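A toy sketch of such informed model-class selection (synthetic data; the quadratic dependence is assumed to be known from theory):

```python
import numpy as np

# If theory says the relationship is quadratic in the input, selecting a
# degree-2 polynomial model class suffices, while a linear class cannot
# capture the data regardless of how well it is fitted.
rng = np.random.default_rng(4)
z = rng.uniform(-1.0, 1.0, 60)
y = 1.0 + 0.5 * z - 2.0 * z**2   # ground truth, assumed quadratic by theory

def fit_rmse(degree):
    # least-squares polynomial fit of the chosen degree, RMS error on data
    coef = np.polyfit(z, y, degree)
    return np.sqrt(np.mean((np.polyval(coef, z) - y) ** 2))

assert fit_rmse(2) < 1e-8    # correctly selected class fits (essentially) exactly
assert fit_rmse(1) > 1e-2    # too-simple class leaves large residuals
```

The same reasoning applies, e.g., to choosing kernel functions or symbolic regression candidates mentioned above.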
Theoretical insights about the nature of the data and the problem have further been shown useful for choosing the architecture of neural networks: convolutional neural networks [23] were shown to perform superbly on images and industrial time series, recurrent neural networks [24] achieve impressive results for speech signals, and attention mechanisms [25] are now state-of-the-art in natural language processing. Most recently, neural architectures have been developed that are inspired by decision trees and that achieve state-of-the-art performance for tabular data, e.g., [26]. These types of architectural choices are connected with the way the candidate function f is parameterized (e.g., the class of convolutional neural networks parameterizes f via successive convolutions and nonlinear activation functions) and thus influence the inductive bias of the model. An appropriately chosen inductive bias helps the optimization algorithm select a desirable set of locally optimal function parameters ψ more reliably than if the function were parameterized differently. A concrete example is prior dictionaries [27] in the context of physics-informed neural networks (see Section 3.3), which are analytical or learned functions interacting with the main network and thus enforcing optimization constraints (for example, boundary or initial conditions of a system of differential equations).
Prior knowledge can help in selecting the neural architecture also in a narrower sense, such as choosing kernel sizes and stride parameters for convolutional neural networks or the number of layers and their respective widths for fully connected neural networks. This has been done, for example, in the design of a neural classifier for engine knock [28]. There, the authors adjusted the kernel size in the network’s initial convolutional layer according to the wavelength of the expected vibrations, thus leveraging existing engineering knowledge about the frequency-dependent nature of engine knock. Subsequent Fourier analyses of the trained kernel showed that it indeed amplifies the targeted frequencies in the input signal, leading to higher detection accuracy compared to other parameterized models. The authors of [29] designed a convolutional neural network for fault detection in rotating machines, where the kernels in the initial layers were handcrafted based on prior knowledge about the fault modes, outperforming classical, uninformed convolutional neural networks. A similar approach was used to predict the quality of products produced with electrochemical micromachining [30]. The authors employed a fully connected neural network and assumed that the first layer automatically constructs physically meaningful features (such as current density and void fraction) from the input (voltage, pulse time, etc.). To guide the training process toward this goal, network edges that are inconsistent with the corresponding features were eliminated from the network’s first layer, yielding improved performance in all experiments compared to an exclusively data-driven approach. In other efforts to incorporate theoretical knowledge in machine learning, physics-based constraints have been incorporated in individual layers of Long Short-Term Memory networks [31] to improve the generalizability of the presented reduced-order model for fluid flows.
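The kernel-sizing idea from [28] can be sketched roughly as follows (the sampling rate, knock frequency, and matched-filter kernel below are illustrative assumptions, not values from the paper):

```python
import numpy as np

# Hypothetical numbers: knock vibrations expected around f_knock = 6 kHz,
# signal sampled at fs = 48 kHz. One vibration period then spans
# fs / f_knock = 8 samples, which informs the first layer's kernel size.
fs, f_knock = 48_000, 6_000
kernel_size = int(fs / f_knock)   # kernel covers one expected period
assert kernel_size == 8

# A kernel tuned to the target frequency (here a fixed sinusoid, standing
# in for a learned filter) responds more strongly to the expected knock
# frequency than to an off-target frequency.
t = np.arange(kernel_size) / fs
kernel = np.sin(2 * np.pi * f_knock * t)

def peak_response(f):
    signal = np.sin(2 * np.pi * f * np.arange(480) / fs)
    return np.abs(np.convolve(signal, kernel, mode="valid")).max()

assert peak_response(f_knock) > peak_response(1_000)
```

This mirrors the paper’s finding that the trained kernel amplifies the target frequency band, here demonstrated with a hand-set rather than a learned filter.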
Leveraging specific knowledge of welding defects, machine learning methods have also been enhanced in more detailed ways, such as changing the nature of one network layer depending on the training example [32]. Here, a customized pooling function is designed that processes the input image in a distinct way. For weld quality assessment, the authors of [16] utilized their understanding of the welding process to select a sequence model approach, which treats recorded time steps as distinct training examples, while in [33] the underlying task was distributed to multiple submodels dedicated to different subtasks. In the former case, the approach proved more stable than more commonly employed methods, while in the latter case the resulting architecture is characterized by increased interpretability and trust.
3.3 Model regularization via theory
Once a model class \(\mathcal {F}\) has been selected, training the model can further benefit from existing domain knowledge. Consider the setting in Fig. 5, where a machine learning, system identification, or curve fitting algorithm is used to find a candidate function f that represents the existing dataset \(\mathcal {D}\).
Very often, the problem of finding the most suitable candidate function f (e.g., of finding the most suitable parameters ψ) within the selected model class is a non-convex optimization problem. Furthermore, especially in the field of deep learning, this problem is often underdetermined, i.e., there are multiple candidate functions f in the model class that fit the data perfectly. In these cases, it is necessary to regularize the algorithm towards prioritizing certain candidate functions over others. Classical approaches in machine learning penalize the ℓ_{2} or ℓ_{1} norms of the model parameters, leading to ridge and LASSO regression [34, Sec. 3.1.4] in linear models or weight decay regularization in neural networks [34, Sec. 5.5], respectively. Loosely speaking, these classical approaches prefer simple models over complicated ones, thus formalizing Occam’s razor. Regularization can furthermore be seen as a “soft” version of constraining the hypothesis space provided by the model class, which we discussed in Section 3.2.
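A minimal sketch of how an ℓ_{2} penalty selects among the many solutions of an underdetermined fit (synthetic data; the closed-form ridge solution is standard, the numbers are illustrative):

```python
import numpy as np

# Underdetermined problem: 5 observations, 20 parameters, so infinitely
# many parameter vectors fit the data. The ridge penalty lambda*||w||^2
# biases the solution toward small-norm (i.e., "simple") models.
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 20))   # 5 samples, 20 parameters
y = rng.standard_normal(5)

def ridge(lam):
    # closed-form ridge solution: (A^T A + lam*I)^{-1} A^T y
    return np.linalg.solve(A.T @ A + lam * np.eye(20), A.T @ y)

w_weak, w_strong = ridge(1e-6), ridge(10.0)

# stronger regularization yields a smaller-norm parameter vector
assert np.linalg.norm(w_strong) < np.linalg.norm(w_weak)
```

The theory-based regularizers discussed next replace this generic norm penalty with terms encoding physical knowledge.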
Domain knowledge can successfully be used for regularization. By appropriately setting the regularization terms, candidate functions f that are consistent or in conflict with existing theory can be prioritized or penalized, respectively. For example, in the field of fluid dynamics, we may not only aim at minimizing some ℓ-norm between the ground truth flow field x(t) and its estimate \(\hat {x}(t)\), but we may also regularize f such that the vorticity fields of x(t) and \(\hat x(t)\) are similar or that (for incompressible fluids) the divergence of \(\hat {x}(t)\) is minimized [35]. While these regularizers rely on the availability of ground truth, one can also design regularizers that are based solely on properties of f as suggested by domain knowledge (in the form of algebraic or differential equations). For example, in the domain of lake temperature modeling, neural networks were regularized such that the relationship between water density and depth is monotonic, cf. [36, eq. (3.14)]. Such a physics-guided neural network was also used in [37] to quantify microcrack defects, regularizing the network via approximate mechanistic models. Regularization can also be used to penalize symbolic regression models that violate monotonicity or boundedness constraints [38].
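In the spirit of the monotonicity constraint used for lake temperature modeling [36], a physics-guided loss can be sketched as follows (synthetic numbers; the penalty form is an illustrative hinge-type term, not the exact one from [36]):

```python
import numpy as np

# Physics-guided loss: data-fit term plus a penalty for every adjacent
# pair of depths where the predicted density *decreases* with depth,
# which the physics of stratified lakes forbids.
def physics_guided_loss(y_true, y_pred, lam=10.0):
    mse = np.mean((y_true - y_pred) ** 2)
    # hinge-type violations: positive where density drops with depth
    violations = np.maximum(0.0, y_pred[:-1] - y_pred[1:])
    return mse + lam * np.mean(violations)

# toy densities sorted by increasing depth
truth          = np.array([1.0, 1.1, 1.2, 1.3])
monotone_pred  = np.array([1.0, 1.1, 1.2, 1.3])
violating_pred = np.array([1.0, 1.2, 1.1, 1.3])

assert physics_guided_loss(truth, monotone_pred) == 0.0
# the violating prediction is penalized beyond its pure data-fit error
assert physics_guided_loss(truth, violating_pred) > \
       np.mean((truth - violating_pred) ** 2)
```

During training, the gradient of such a penalty steers the network away from physically inconsistent candidate functions even where data are sparse.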
As mentioned in Section 2, the incorporation of domain knowledge has the potential to improve the tradeoff between the need for training data and the capability to achieve good generalization performance. Taken to the extreme, proper regularization can obviate the need for (labeled) training data altogether: One example is the work of [39], where a neural network is trained to regress the height of a falling object from a series of images. Rather than providing object heights as ground truth labels, training is based only on timestamped images and the prior knowledge that the height trajectory of a falling object is a parabola. Regularizing training based on this knowledge is sufficient to allow the neural network to extract the information of interest (i.e., the object’s height) from data that depend on this quantity (i.e., the images). Another class of models, physics-informed neural networks (PINNs), are regularized via a known system of partial differential equations (PDEs) and can dispense with training data altogether [40]. PINNs thus have the capability of solving systems of PDEs. In the setting of Fig. 1 without the forcing function u(t), PINNs take the time instances t within the computational domain \(\mathcal {T}\) of interest as input and respond with an estimate \(\hat x(t)\) of the solution of the differential equation. In their original formulation, PINNs are trained by minimizing two kinds of losses: a loss component that accounts for the initial condition x(0) (and, potentially, boundary conditions), which is provided to the PINN as training data, and a loss component that penalizes candidate solutions violating the differential equation dx(t)/dt = F(x(t);𝜃). PINNs have also been proposed for inverse problems, where the parameterization 𝜃 is learned from the PDE and its solution x(t) [41].
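The two PINN loss components can be illustrated without a neural network by using a polynomial ansatz for \(\hat x(t)\), so that the combined loss is minimized in closed form (an illustrative simplification of the PINN idea, with F(x;𝜃) = −𝜃x assumed as the right-hand side):

```python
import numpy as np

# PINN-style toy example: approximate the solution of dx/dt = -theta*x,
# x(0) = 1, on T = [0, 1] by jointly minimizing (i) the PDE-residual loss
# at collocation points and (ii) the initial-condition loss. With a
# polynomial ansatz x_hat(t) = sum_k c_k t^k, both losses are quadratic
# in c, so least squares replaces gradient-based training.
theta, x0, degree = 1.0, 1.0, 8
t = np.linspace(0.0, 1.0, 50)          # collocation points in T

powers = np.arange(degree + 1)
X = t[:, None] ** powers               # basis values t^k
dX = np.zeros_like(X)
dX[:, 1:] = powers[1:] * t[:, None] ** (powers[1:] - 1)  # d/dt t^k

# residual operator: dx_hat/dt + theta*x_hat should vanish everywhere;
# the initial condition enters as a strongly weighted extra equation.
w_ic = 100.0
A = np.vstack([dX + theta * X, w_ic * X[:1]])
b = np.concatenate([np.zeros(len(t)), [w_ic * x0]])
c, *_ = np.linalg.lstsq(A, b, rcond=None)

x_hat = X @ c
# the "physics-trained" ansatz closely matches the analytic solution
assert np.max(np.abs(x_hat - x0 * np.exp(-theta * t))) < 1e-3
```

An actual PINN replaces the polynomial by a neural network and the least-squares solve by stochastic gradient descent, but the structure of the objective is the same.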
While PINNs are versatile, numerous reports in the literature show that standard PINN architectures are often hard to train. Their success and accuracy are problem-specific and typically cannot be determined a priori. One major failure mode of PINNs stems from their multi-objective nature, relying on data- and physics-based loss components: during model training, several loss components, encoding initial and/or boundary conditions and (sets of) PDEs, compete against each other to meet the overall objective. Failure to minimize even a single component leaves the overall objective unfulfilled, and large discrepancies between learned and observed solutions result. Whether an optimization algorithm can find a candidate solution \(\hat {x}(t)\) for which all loss components are low is strongly determined by the innate shape of the Pareto front of the multi-objective optimization. System parameters, such as the PDE’s parameterization or the computational domain, have a strong impact on the shape of the Pareto front [42]. Scalability is another issue in the use of PINNs: as the system dimension or complexity increases, PINNs tend to become even more difficult to train.
Proper non-dimensionalization of the system under study appears to facilitate optimization. Additionally, several loss weighting techniques have been proposed to address this issue: loss components are weighted either manually or adaptively based on the history of recorded gradients [43,44,45]. As mentioned in Section 3.2, prior dictionaries [27] are another approach; they implement hard constraints for the boundary conditions and thus reduce the number of objectives in the multi-objective optimization. Further modifications of PINNs include XPINNs [46], which try to break the system down into multiple smaller and simpler problems that are solved separately by multiple PINN instances. While XPINNs show improved accuracy for certain applications, this comes at the cost of increased computational complexity.
Despite these problems, PINNs and their variants have successfully been used in fluid mechanics [44, 47], aerodynamics [48, 49], (nano-)optics [50, 51], and medical science [41, 52], to name a few fields. Furthermore, PINNs have been applied in solid mechanics including additive manufacturing [21, 53], elastodynamics [54,55,56], and thermal engineering [57]. As a concrete example of the latter, PINNs were used in [58] to reduce the need for large datasets when predicting the temperature and melt pool dynamics during metal AM using deep learning methods. In this work, domain knowledge from first physical principles is exploited to physics-inform the learning process, resulting in accurately predicted dynamics with only a moderate amount of labeled data.
4 Data-driven models replacing costly simulations: (reduced-order) surrogate models
In many scientific disciplines, full-order simulations have prohibitive computational complexity. Examples include computational fluid dynamics as well as multi-physics problems that often require high-resolution finite element analyses. In these cases, it may be necessary to replace the full-order model simulation by less expensive computations. A classical example is model order reduction, where the full-order model is replaced by a model with a smaller state space, e.g., using proper orthogonal decomposition (POD); the reduced model is still solved by classical solver schemes. While this approach, too, can benefit from machine learning (e.g., several POD bases can be learned by applying clustering techniques, thus achieving more accurate fits for individual parameter ranges [59]), in this section our focus is on replacing numerical solvers entirely by a learned model (Fig. 6).
Specifically, let us assume that we have access to a dataset \(\mathcal {D}\) of previous simulations of the full-order model as in (2). With this dataset, it is possible to train a data-driven model that encapsulates the relationship between the respective input parameters (x(0), 𝜃, and u(t)) and the solution x(t), i.e., the data-driven model is a function f that satisfies
for all \(t\in \mathcal {T}_{i}\) and \(i=1,\dots ,N\). If the dataset is sufficiently large and diverse (e.g., the parameters 𝜃_{i} cover a large area of the parameter space), then we may assume that \(\hat {x}(t)\) is a good approximation of the true solution x(t) also for other parameters, initial conditions, and forcing functions. The function f is then a surrogate for the full-order simulation. (In this sense, the PINNs discussed in Section 3.3 can also be seen as surrogate models.) Thus, while surrogate modeling requires a one-time investment in the sense of constructing a dataset \(\mathcal {D}\) based on full-order simulations, this investment pays off once the model is trained, allowing one to substitute the full-order model at least approximately and within well-defined parameter ranges.
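A minimal, self-contained sketch of this workflow (our own toy construction): the expensive full-order model is stood in for by an analytic solution so that no solver is required, a dataset \(\mathcal {D}\) of simulation runs is generated, and a surrogate f is fitted by linear least squares. Note how theory informs the feature choice: for this toy model, we know the solution scales linearly with x(0) and depends on 𝜃 and t only through their product.

```python
import numpy as np

# Hypothetical full-order model, stood in for by its analytic solution
# x(t) = x0 * exp(-theta * t); in practice this would be an expensive solver.
def full_order_model(x0, theta, t):
    return x0 * np.exp(-theta * t)

# Build the dataset D: n_runs simulation runs, each evaluated on a time grid.
rng = np.random.default_rng(0)
n_runs, ts = 200, np.linspace(0.0, 5.0, 20)
x0s = rng.uniform(0.5, 2.0, n_runs)
thetas = rng.uniform(0.1, 1.0, n_runs)

def features(x0, theta, t, degree=6):
    # Theory-inspired features: the toy solution is linear in x0 and
    # depends on theta and t only through u = theta * t.
    u = np.asarray(theta) * np.asarray(t)
    return np.column_stack([np.asarray(x0) * u**k for k in range(degree + 1)])

X = np.vstack([features(x0, th, ts) for x0, th in zip(x0s, thetas)])
y = np.concatenate([full_order_model(x0, th, ts) for x0, th in zip(x0s, thetas)])

# Fit the surrogate by linear least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# The trained surrogate now replaces the full-order model within the
# sampled parameter ranges:
x_hat = float(features(1.0, 0.5, 2.0) @ w)   # approximates x0 * exp(-1)
```

The polynomial regressor is chosen here only for self-containedness; the literature surveyed below uses neural networks, tree ensembles, kernel ridge regression, and similar model classes in its place.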
The problem of surrogate modeling simplifies if, instead of the entire solution x(t), only some aggregate statistic is of interest. For example, we may be interested in the solution x(T) at a given time T, or in the average of x(t) over a designated time period; if x(t) is a field, we may further be interested in values at specific positions, etc. In this case, data-driven modeling simplifies because the target to be learned has a lower dimensionality. We call this latter scenario reduced-order surrogate modeling.
There is a huge body of literature regarding surrogate and reduced-order surrogate modeling, covering various fields of science and using various types of surrogate models. For example, graph neural networks, trained on mesh-based simulations, were used for surrogate modeling in aerodynamics, structural mechanics, and fabric simulation [60]. Tree-based models trained on finite element method (FEM) simulations were used to estimate the biomechanical behavior of breast tissue under compression [61] and the mechanical properties of carbon fiber reinforced plastics [62]. Kernel ridge regression was used to approximate the energy potential of carbon crystal structures to sidestep computationally costly density functional theory computations [13]. Fully connected neural networks, or multilayer perceptrons, were used as surrogate models for 3D trusses [15], the mechanical behavior of livers [63], forming load prediction of AZ31 magnesium [64], the grain structure of additively manufactured material [20], and the velocity field and location of the neutral point in cold flat rolling [65]. In [66], the authors predict damage development in forged brake discs reinforced with AlSiC particles from damage maps using neural networks and Gaussian processes. For three-dimensional turbulent flow inside a lid-driven cavity, neural network and random forest-based surrogate models were trained on simulation data to predict local errors as a function of coarse-grid local flow features [67].
For rapid estimation of forming and cutting forces in hot upsetting and extrusion with given process parameters, the authors of [68] utilized neural network-based surrogates. To obtain training data, they executed FEM simulations modeling the processes of hot upsetting and extrusion of a CK45 steel axisymmetric specimen, respectively, to obtain forming forces. The reduced-order surrogates rapidly computed the process load from the coefficient of friction, temperature, velocity, and height-to-diameter ratio for hot upsetting, and from die angle, punch velocity, coefficient of friction, and billet temperature for hot extrusion; they were shown to interpolate well between training parameters. To estimate the forging load in hot upsetting and hot extrusion processes, the authors of [69] used gene expression programming and neural networks. Using FEM simulation data from [68], they showed that the upsetting process was well approximated by the gene expression programming approach, while for extrusion the neural surrogate model was superior. This connects back to our discussion in Section 2, where we mentioned that data-driven modeling is often an iterative procedure relying on trial and error, and that it is not always clear which model class will perform best for a given problem setting. From this perspective, comparative studies and similar guidelines provide useful information to the practitioner. An example of such a comparative study in the field of structural analysis can be found in [70], where the authors compared the performance of several neural and classical surrogate models.
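Since it is rarely clear a priori which model class will win, the comparative protocol can be sketched as follows (entirely synthetic data and illustrative model classes, chosen only to make the example self-contained): candidate models are fitted to the same training split and the one with the lowest held-out validation error is retained.

```python
import numpy as np

# Trial-and-error model selection on synthetic simulation data.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 100)
y = np.sin(2.0 * np.pi * x)      # stand-in for a nonlinear process response

idx = rng.permutation(100)
train, val = idx[:70], idx[70:]  # held-out validation split

def validation_mse(degree):
    """Fit a polynomial of the given degree and return its held-out MSE."""
    coeffs = np.polyfit(x[train], y[train], degree)
    return np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2)

# Candidate classes: linear, cubic, and quintic polynomial models.
errors = {degree: validation_mse(degree) for degree in (1, 3, 5)}
best_degree = min(errors, key=errors.get)
```

On this strongly nonlinear target, the more flexible classes win; on other targets a simpler class (or an entirely different model family, as in the gene-expression-programming example above) may prevail, which is exactly why such comparisons are run.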
Surrogate and reduced-order surrogate models lend themselves to being used for process or design optimization. For example, surrogate models were used in multi-objective optimization to design the shape of textured surfaces with non-Newtonian viscometric functions [71], and Gaussian processes were used for hydropower Kaplan turbine design [72]. The authors of [73] used two single-layer fully connected neural networks for optimizing the forging process for steel discs (the number of neurons in the hidden layer was selected using a cascade learning procedure [74]). They proposed a reduced-order surrogate model mapping from workpiece initial temperature, die temperature, and friction value to flank wear and temperature. The resulting model replaced FEM simulations during sequential approximate optimization. To obtain appropriate training data, the FEM simulations were executed for points in the feature space deemed important, indicating that domain knowledge can also enter in the selection of training data (see also [13]).
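The optimization step itself can be sketched in a few lines; the response surface below is a hypothetical fitted surrogate (made-up coefficients, not from the cited studies) mapping die temperature to a wear measure.

```python
import numpy as np

# Surrogate-assisted process optimization: once a cheap surrogate replaces
# the FEM simulation, candidate process parameters can be evaluated densely.

def surrogate_wear(die_temp):
    """Hypothetical quadratic response surface with its minimum at 450."""
    return 1e-4 * (die_temp - 450.0) ** 2 + 2.0

# A dense grid search that would be prohibitive with full-order FEM runs
# is trivial with the surrogate:
candidates = np.linspace(300.0, 600.0, 3001)
best_temp = candidates[np.argmin(surrogate_wear(candidates))]
```

In practice, the grid search would be replaced by a proper optimizer (e.g., the sequential approximate optimization of [73] or a multi-objective method), but the economics are the same: thousands of surrogate evaluations cost less than a single full-order simulation.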
5 Incomplete prior knowledge: causal machine learning
Triggered by multiple advances in the field [75], the topic of causality has recently generated a lot of interest, especially in the machine learning community. Causal models can be seen as being located between purely theory-driven and purely data-driven models [76], with their exact position within this spectrum determined by the availability of domain knowledge.
At one end of the spectrum, the physical phenomenon under study is well understood, e.g., its description may be given in the form of a system of differential equations (e.g., (1), see Section 2). Structural causal models (SCMs, [77]) are built around these equations, but also integrate (unknown) noise factors, allow for explicit modeling of interventions, and distinguish between observable and/or controllable variables. From this perspective, SCMs can be seen to extend the capabilities of the theory-driven model introduced in (1). For example, while our phenomenon under study certainly has an initial condition x(0), we may only be able to determine it with some measurement noise. Similarly, while we may want to influence the phenomenon via a controlled forcing function u(t), we may only be able to set its values to within a limited precision. All these aspects can be included in SCMs. Indeed, it has been shown that ordinary differential equations can be expressed as SCMs under some (stability) assumptions, as illustrated in [78] for damped harmonic oscillators.
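The following minimal sketch (our own toy SCM with a made-up mechanism and noise scales) illustrates the ingredients just discussed: a noisy initial condition, an exogenous forcing, and an explicit intervention modeled by replacing a structural assignment.

```python
import numpy as np

rng = np.random.default_rng(42)

def scm_sample(n, do_u=None):
    """Sample from the toy SCM
       X0 := 1.0 + N_x0   (noisy initial condition)
       U  := N_u          (forcing; replaced by a constant under do(U = u))
       X  := 0.5 * X0 + 2.0 * U + N_x   (mechanism plus observation noise)
    """
    x0 = 1.0 + 0.1 * rng.normal(size=n)
    u = np.full(n, do_u) if do_u is not None else rng.normal(size=n)
    x = 0.5 * x0 + 2.0 * u + 0.05 * rng.normal(size=n)
    return x

obs = scm_sample(10_000)               # observational regime
interv = scm_sample(10_000, do_u=1.0)  # interventional regime do(U = 1)
# Under do(U = 1), E[X] = 0.5 * 1.0 + 2.0 * 1.0 = 2.5 by construction.
```

The intervention is not a conditioning operation: the structural assignment for U is overwritten while all other mechanisms and noise terms are left intact, which is exactly the distinction SCMs add on top of a plain system of equations.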
Closer to the other end of the spectrum are models where the available domain knowledge only accounts for the presence (or absence) of individual causal relationships. This type of domain knowledge is often represented via causal graphs [79], where nodes represent variables and directed edges indicate direct causal relationships. To give an example, the theory-driven model (1) implies that the trajectory of the quantity of interest \(x(\mathcal {T})\) is causally affected by the forcing function \(u(\mathcal {T})\) and the initial condition x(0), leading to the causal graph depicted in Fig. 7. While the available information in this case is far less than for SCMs, the utility of such models has been shown in a number of applications.
For example, even in the simple setting of a single (unobserved) common cause and two (observed) independent effects, unlabeled data can be used to remove systematic noise from observations and hence improve prediction performance. This has been demonstrated for the detection of exoplanets based on satellite data [80], a task that is traditionally tackled either via theory-driven approaches in combination with simple machine learning methods (cf. Section 3.1), or via limited preprocessing and complex machine learning methods (e.g., deep learning) [81].
The direction of causal relationships has also been shown to be helpful in assessing the utility of unlabeled data for semi-supervised classification scenarios. Of particular interest here is the anti-causal case, where the cause is predicted from the effect, cf. [82, Sec. 3]: the distribution of the cause can be estimated better from unlabeled data if the cause-effect relationship is known [83].
Another advantage of causal models is their ability to make machine learning models robust against changes in the distribution of data, e.g., caused by varying but unknown parameters 𝜃 of the phenomenon under study. As we have discussed in Section 2, purely data-driven models do not generalize or extrapolate well outside the range of training data. Intuitively, knowledge about the causal relationships underlying the data generation process could be used for regularization, such that the resulting model is consistent with these relationships. Indeed, it has been shown in a use case on gene expressions that varying environments and their distribution shifts are even beneficial for obtaining models that generalize better [84].
Finally, in settings where not even knowledge about cause-effect relationships is available, causal discovery (such as structure learning or cause-effect discovery) can be applied. Successful applications range from economy-related scenarios [85] to indoor localization [86].
6 Discussion and conclusion
Tribal knowledge in machine learning suggests that the success of a data-driven modeling project depends on (at least) the following ingredients:

Data (i.e., amount, quality, etc.),

Modeling assumptions (i.e., what mathematical assumptions do we make about the underlying relationship that we aim to learn),

Implementation choices (i.e., how do we implement the model numerically; e.g., architectural choices for neural networks),

Objective function (i.e., based on what quantities do we decide whether learning was successful), and

Optimization algorithm (i.e., how do we determine from data the parameters of the implemented model such that the objective function is optimized).
Theory and domain knowledge can influence the selection of any of these ingredients, and in this short survey we presented several approaches by which this influence can be exerted: theory can assist in selecting or even engineering appropriate features for the subsequent machine learning algorithm (data and modeling assumptions), it can help select the model class (modeling assumptions and implementation choices), or it can regularize model training to ensure consistency with established theory (objective function). Further, we have shown that theory-driven models are often used to generate training data for data-driven modeling, and that the resulting data-driven models can successfully step in for the often computationally costly theory-driven models.
Of course, the distinction between the presented approaches can sometimes be difficult. For example, structural causal models as discussed in Section 5 can be seen as a generalized framework to incorporate data into fully developed theory-driven models, while causal graphs can be used for theory-inspired model selection or regularization. As another example, consider [29], which proposed handcrafting the initial layers of a convolutional neural network based on prior knowledge about the failure modes of rotating machinery. On the one hand, this can be seen as theory-inspired model selection. On the other hand, since the first layers are thus not learnable, these handcrafted convolutional kernels can be interpreted as generating theory-inspired features for the subsequent network layers. This resonates with the fact that the ingredients of a machine learning algorithm are strongly dependent on each other, and that in some cases modeling choice, objective function, and optimization algorithm turn out to be different sides of the same coin, cf. [87].
Further, note that the presented approaches are not mutually exclusive. Different approaches can indeed be combined: e.g., theory can assist both model selection and feature engineering (e.g., [16]), or surrogate models can be designed based on theory-inspired features [13, 20]. PINNs can be seen as surrogate models that are trained exclusively using theory-inspired regularization, and if initial and boundary conditions are implemented via prior dictionaries, the PINN architecture is furthermore selected by theory. Indeed, theory and domain knowledge can influence the selection of any of the ingredients mentioned above, and one can expect that the more ingredients are theory-inspired, the better the performance of the resulting models will be. We are thus convinced that theory-inspired machine learning and hybrid modeling are on the rise, heading towards an all-encompassing synergy between knowledge and data.
Notes
Note that we require that all tuples (𝜃_{i},x^{(i)}(0),u^{(i)}(t),x^{(i)}(t)) in \(\mathcal {D}\) are distinct. However, we do not require that all elements of the tuple are distinct. For example, the dataset may comprise only a single parameterization 𝜃_{i} = 𝜃, but different initial conditions x^{(i)}(0) and forcing functions u^{(i)}(t).
References
Karpatne A, Atluri G, Faghmous JH, Steinbach M, Banerjee A, Ganguly A, Shekhar S, Samatova N, Kumar V (2017) Theory-guided data science: a new paradigm for scientific discovery from data. IEEE Trans Knowl Data Eng 29(10):2318–2331. https://doi.org/10.1109/TKDE.2017.2720168
Karniadakis G, Kevrekidis IG, Lu L, Perdikaris P, Wang S, Yang L (2021) Physics-informed machine learning. Nat Rev Phys 3:422–440. https://doi.org/10.1038/s42254-021-00314-5
Beck A, Kurz M (2021) A perspective on machine learning methods in turbulence modeling. GAMM-Mitteilungen 44(1):e202100002. https://doi.org/10.1002/gamm.202100002
Brunton SL (2021) Applying machine learning to study fluid mechanics. arXiv:2110.02083v1 [physics.flu-dyn]
Vadyala SR, Betgeri SN, Matthews JC, Matthews E (2021) A review of physics-based machine learning in civil engineering. arXiv:2110.04600
Schweidtmann AM, Esche E, Fischer A, Kloft M, Repke JU, Sager S, Mitsos A (2021) Machine learning in chemical engineering: a perspective. Chemie Ingenieur Technik 93(12):2029–2039. https://doi.org/10.1002/cite.202100083
Reichstein M, Camps-Valls G, Stevens B, Jung M, Denzler J, Carvalhais N, Prabhat (2019) Deep learning and process understanding for data-driven earth system science. Nature 566:195–204
Zendehboudi S, Rezaei N, Lohi A (2018) Applications of hybrid models in chemical, petroleum, and energy systems: a systematic review. Appl Energy 228:2539–2566. https://doi.org/10.1016/j.apenergy.2018.06.051
Wagner N, Rondinelli JM (2016) Theory-guided machine learning in materials science. Front Mater 3:28. https://doi.org/10.3389/fmats.2016.00028
Hughes MT, Kini G, Garimella S (2021) Status, challenges, and potential for machine learning in understanding and applying heat transfer phenomena. J Heat Transfer 143(12):120802. https://doi.org/10.1115/1.4052510
Santos T, Schrunner S, Geiger BC, Bluder O, Zernig A, Kaestner A, Kern R (2019) Feature extraction from analog wafer maps: a comparison of classical image processing and a deep generative model. IEEE Trans Semicond Manuf 32(2):190–198. https://doi.org/10.1109/TSM.2019.2911061
Kingma DP, Welling M (2014) In: Proc. Int. Conf. on Learning Representations (ICLR). Banff
Rohrhofer FM, Saha S, Cataldo SD, Geiger BC, von der Linden W, Boeri L (2021) Importance of feature engineering and database selection in a machine learning model: a case study on carbon crystal structures. Technical report: arXiv:2102.00191 [cond-mat.mtrl-sci]
Gao H, Sun L, Wang JX (2021) PhyGeoNet: physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state PDEs on irregular domain. J Comput Phys 428:110079. https://doi.org/10.1016/j.jcp.2020.110079
Nourbakhsh M, Irizarry J, Haymaker J (2018) Generalizable surrogate model features to approximate stress in 3D trusses. Eng Appl Artif Intel 71:15–27
Asif K, Zhang L, Derrible S, Indacochea JE, Ozevin D, Ziebart B (2020) Machine learning model to predict welding quality using air-coupled acoustic emission and weld inputs. J Intell Manuf, 1–15
Baghbanpourasl A, Kirchberger D, Eitzinger C (2021) In: Proc. IEEE Int. Workshop on Metrology for Industry 4.0 and IoT, vol 2021. https://doi.org/10.1109/MetroInd4.0IoT51437.2021.9488550
Lovrić M, Meister R, Steck T, Fadljević L, Gerdenitsch J, Schuster S, Schiefermüller L, Lindstaedt S, Kern R (2020) Parasitic resistance as a predictor of faulty anodes in electro galvanizing: a comparison of machine learning, physical and hybrid models. Advanced Modeling and Simulation in Engineering Sciences 7(46)
Liu S, Kappes BB, Aminahmadi B, Benafan O, Zhang X, Stebner AP (2021) Physicsinformed machine learning for composition–process–property design: shape memory alloy demonstration. Applied Materials Today, 22
Kats D, Wang Z, Gan Z, Liu WK, Wagner GJ, Lian Y (2022) A physics-informed machine learning method for predicting grain structure characteristics in directed energy deposition. Comput Mater Sci 202:110958. https://doi.org/10.1016/j.commatsci.2021.110958
Liu R, Liu S, Zhang X (2021) A physicsinformed machine learning model for porosity analysis in laser powder bed fusion additive manufacturing. Int J Adv Manuf Technol 113(7):1943–1958
Du Y, Mukherjee T, DebRoy T (2021) Physics-informed machine learning and mechanistic modeling of additive manufacturing to reduce defects. Appl Mater Today 24:101123. https://doi.org/10.1016/j.apmt.2021.101123
LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) In: Proc advances in neural information processing systems (NeurIPS), vol 30
Katzir L, Elidan G, ElYaniv R (2021) In: Proc. int. conf. on learning representations (ICLR) (virtual)
Peng W, Zhou W, Zhang J, Yao W (2020) Accelerating physics-informed neural network training with prior dictionaries. arXiv:2004.08151
Ofner AB, Kefalas A, Posch S, Geiger BC (2022) Knock detection in combustion engine time series using a theory-guided 1D convolutional neural network approach. Accepted for publication in IEEE/ASME Trans. Mechatronics; arXiv:2201.06990 [cs.LG]
Sadoughi M, Hu C (2018) In: Proc annual conf of the IEEE industrial electronics society (IECON), pp 5919–5923
Lu Y, Rajora M, Zou P, Liang SY (2017) Physics-embedded machine learning: case study with electrochemical micromachining. Machines 5(1):4
Pawar S, San O, Nair A, Rasheed A, Kvamsdal T (2021) Model fusion with physics-guided machine learning: projection-based reduced-order modeling. Phys Fluids 33(6):067123
Jiang H, Hu Q, Zhi Z, Gao J, Gao Z, Wang R, He S, Li H (2021) Convolution neural network model with improved pooling strategy and feature selection for weld defect recognition. Weld World 65(4):731–744
Mayr A, Lutz B, Weigelt M, Gläßel T, Kißkalt D, Masuch M, Riedel A, Franke J (2018) In: Proc 8th int electric drives production conf (EDPC), pp 1–7
Bishop CM (2006) Pattern recognition and machine learning, 1st edn. Springer
Schweri L, Foucher S, Tang J, Azevedo VC, Günther T, Solenthaler B (2021) A physics-aware neural network approach for flow data reconstruction from satellite observations. Front Clim 3:23. https://doi.org/10.3389/fclim.2021.656505
Karpatne A, Watkins W, Read J, Kumar V (2018) Physics-guided neural networks (PGNN): an application in lake temperature modeling. arXiv:1710.11431 [cs.LG]
Sun H, Peng L, Lin J, Wang S, Zhao W, Huang S (2021) Microcrack defect quantification using a focusing high-order SH guided wave EMAT: the physics-informed deep neural network GuwNet. IEEE Transactions on Industrial Informatics, 1–1
Kronberger G, de Franca FO, Burlacu B, Haider C, Kommenda M (2021) Shape-constrained symbolic regression – improving extrapolation with prior knowledge. Evol Comput, 1–24
Stewart R, Ermon S (2017) In: Proc. AAAI Conf on Artificial Intelligence (AAAI), pp 2576–2582
Raissi M, Perdikaris P, Karniadakis G (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J Comput Phys 378:686–707
Raissi M, Yazdani A, Karniadakis G (2020) Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367(6481):1026–1030
Rohrhofer FM, Posch S, Geiger BC (2021) On the Pareto front of physics-informed neural networks. arXiv:2105.00862 [cs.LG]
Wang S, Teng Y, Perdikaris P (2021) Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM J Sci Comput 43(5):A3055–A3081
Jin X, Cai S, Li H, Karniadakis G (2021) NSFnets (Navier-Stokes flow nets): physics-informed neural networks for the incompressible Navier-Stokes equations. J Comput Phys 426:109951
Maddu SM, Sturm D, Müller CL, Sbalzarini IF (2021) Inverse Dirichlet weighting enables reliable training of physics-informed neural networks. Machine Learning: Science and Technology
Jagtap AD, Karniadakis G (2020) Extended physics-informed neural networks (XPINNs): a generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations. Commun Comput Phys 28(5):2002–2041
Almajid MM, Abu-AlSaud MO (2022) Prediction of porous media fluid flow using physics-informed neural networks. J Pet Sci Eng 208:109205
Mao Z, Jagtap AD, Karniadakis G (2020) Physics-informed neural networks for high-speed flows. Comput Methods Appl Mech Eng 360:112789
Hu L, Zhang J, Xiang Y, Wang W (2020) Neural networks-based aerodynamic data modeling: a comprehensive review. IEEE Access 8:90805–90823
Chen Y, Lu L, Karniadakis G, Dal Negro L (2020) Physics-informed neural networks for inverse problems in nano-optics and metamaterials. Opt Express 28(8):11618–11633
Mishra S, Molinaro R (2021) Physics informed neural networks for simulating radiative transfer. J Quant Spectrosc Radiat Transfer 270:107705
Sahli Costabal F, Yang Y, Perdikaris P, Hurtado DE, Kuhl E (2020) Physics-informed neural networks for cardiac activation mapping. Front Phys 8:42
Zhu Q, Liu Z, Yan J (2021) Machine learning for metal additive manufacturing: predicting temperature and melt pool fluid dynamics using physics-informed neural networks. Comput Mech 67:619–635. https://doi.org/10.1007/s00466-020-01952-9
Rao C, Sun H, Liu Y (2021) Physics-informed deep learning for computational elastodynamics without labeled data. J Eng Mech 147(8):04021043
Haghighat E, Raissi M, Moure A, Gomez H, Juanes R (2021) A physics-informed deep learning framework for inversion and surrogate modeling in solid mechanics. Comput Methods Appl Mech Eng 379:113741
Ghaderi A, Morovati V, Dargazany R (2020) A physics-informed assembly of feed-forward neural network engines to predict inelasticity in cross-linked polymers. Polymers 12(11):2628
Cai S, Wang Z, Wang S, Perdikaris P, Karniadakis G (2021) Physics-informed neural networks for heat transfer problems. J Heat Transfer 143(6):060801
Zhu Q, Liu Z, Yan J (2021) Machine learning for metal additive manufacturing: predicting temperature and melt pool fluid dynamics using physics-informed neural networks. Comput Mech 67(2):619–635
Hess M, Alla A, Quaini A, Rozza G, Gunzburger M (2019) A localized reduced-order modeling approach for PDEs with bifurcating solutions. Comput Methods Appl Mech Eng 351:379–403. https://doi.org/10.1016/j.cma.2019.03.050
Pfaff T, Fortunato M, SanchezGonzalez A, Battaglia PW (2021) In: Proc. Int. Conf. on Learning Representations (ICLR)
Martínez-Martínez F, Rupérez-Moreno MJ, Martínez-Sober M, Solves-Llorens JA, Lorente D, Serrano-López A, Martínez-Sanchis S, Monserrat C, Martín-Guerrero JD (2017) A finite element-based machine learning approach for modeling the mechanical behavior of the breast tissues under compression in real-time, vol 90
Qi Z, Zhang N, Liu Y, Chen W (2019) Prediction of mechanical properties of carbon fiber based on cross-scale FEM and machine learning. Compos Struct 212:199–206
Pellicer-Valero OJ, Rupérez MJ, Martínez-Sanchis S, Martín-Guerrero JD (2020) Real-time biomechanical modeling of the liver using machine learning models trained on finite element method simulations. Expert Syst Appl 143:113083
Önder A (2019) A forming load analysis for extrusion process of AZ31 magnesium. Trans Nonferrous Metals Soc China 29(4):741–753
Gudur P, Dixit U (2008) A neural network-assisted finite element analysis of cold flat rolling. Eng Appl Artif Intel 21(1):43–52
Roberts S, Kusiak J, Liu Y, Forcellese A, Withers P (1998) Prediction of damage evolution in forged aluminium metal matrix composites using a neural network approach. J Mater Process Technol 80:507–512
Hanna BN, Dinh NT, Youngblood RW, Bolotnov IA (2020) Machine-learning based error prediction approach for coarse-grid computational fluid dynamics (CG-CFD). Prog Nucl Energy 118:103140
Raj KH, Sharma RS, Srivastava S, Patvardhan C (2000) Modeling of manufacturing processes with ANNs for intelligent manufacturing. Int J Mach Tools Manuf 40(6):851–868
Bingöl S, Kılıçgedik HY (2018) Application of gene expression programming in hot metal forming for intelligent manufacturing. Neural Comput Applic 30(3):937–945
Hoffer JG, Geiger BC, Ofner P, Kern R (2021) Mesh-free surrogate models for structural mechanic FEM simulation: a comparative study of approaches. Appl Sci 11(20):9411
Dupuis R, Jouhaud JC, Sagaut P (2018) In: Proc. AIAA/ASCE/AHS/ASC structures, structural dynamics, and materials conf., p 1905
Masood Z, Khan S, Qian L (2021) Machine learning-based surrogate model for accelerating simulation-driven optimisation of hydropower Kaplan turbine. Renew Energy 173:827–848
D’Addona DM, Antonelli D (2018) Neural network multiobjective optimization of hot forging. Procedia CIRP 67:498–503
Fahlman SE, et al. (1988) An empirical study of learning speed in backpropagation networks. Carnegie Mellon University, Computer Science Department Pittsburgh, PA USA
Pearl J, Mackenzie D (2018) The book of why: the new science of cause and effect. Basic books
Schölkopf B (2019) Causality for machine learning. arXiv:1911.10500
Bollen KA (1989) Structural equations with latent variables, vol 210. Wiley
Mooij J, Janzing D, Schölkopf B (2013) In: Proc. conf on uncertainty in artificial intelligence (UAI), pp 440–448
Suzuki E, Shinozaki T, Yamamoto E (2020) Causal diagrams: pitfalls and tips. J Epidemiol 30:153–162. https://doi.org/10.2188/jea.JE20190192
Schölkopf B, Hogg DW, Wang D, ForemanMackey D, Janzing D, SimonGabriel CJ, Peters J (2016) Modeling confounding by halfsibling regression. Proc Nat Acad Sci 113(27):7391–7398
Nikolaou N, Waldmann IP, Tsiaras A, Morvan M, Edwards B, Yip KH, Tinetti G, Sarkar S, Dawson JM, Borisov V et al (2020) Lessons learned from the 1st ARIEL machine learning challenge: correcting transiting exoplanet light curves for stellar spots. arXiv:2010.15996
Schölkopf B, Janzing D, Peters J, Sgouritsa E, Zhang K, Mooij J (2012) In: Proc. int. conf. on machine learning (ICML). Edinburgh, pp 459–466
Castro DC, Walker I, Glocker B (2020) Causality matters in medical imaging. Nat Commun 11(1):1–10
Peters J, Bühlmann P, Meinshausen N (2016) Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society Series B (Statistical Methodology), 947–1012
Pamfil R, Sriwattanaworachai N, Desai S, Pilgerstorfer P, Georgatzis K, Beaumont P, Aragam B (2020) In: Proc. int. conf. on artificial intelligence and statistics (AISTATS). PMLR, pp 1595–1605
Koutroulis G, Botler L, Mutlu B, Diwold K, Römer K, Kern R (2021) KOMPOS: connecting causal knots in large nonlinear time series with nonparametric regression splines. ACM Trans Intell Syst Technol (TIST) 12(5):1–27
Achille A, Soatto S (2018) Information dropout: learning optimal representations through noisy computation. IEEE Trans Pattern Anal Mach Intell 40(12):2897–2905
Funding
Open access funding provided by Graz University of Technology. The work of Johannes G. Hoffer and Bernhard C. Geiger was partially supported by the project BrAIN. BrAIN – Brownfield Artificial Intelligence Network for Forging of High Quality Aerospace Components (FFG Grant No. 881039) is funded in the framework of the program ‘TAKE OFF’, which is a research and technology program of the Austrian Federal Ministry of Transport, Innovation and Technology.
The authors further received financial support from the Austrian COMET – Competence Centers for Excellent Technologies – Programme of the Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology, the Austrian Federal Ministry for Digital and Economic Affairs, and the States of Styria, Upper Austria, Tyrol, and Vienna for the COMET Centers KnowCenter and LEC EvoLET, respectively. The COMET Programme is managed by the Austrian Research Promotion Agency (FFG).
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Johannes G. Hoffer, Andreas B. Ofner, and Franz M. Rohrhofer contributed equally to this work. The order is alphabetical.
Recommended for publication by Commission XIV  Education and Training
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Hoffer, J.G., Ofner, A.B., Rohrhofer, F.M. et al. Theoryinspired machine learning—towards a synergy between knowledge and data. Weld World 66, 1291–1304 (2022). https://doi.org/10.1007/s4019402201270z