Optimizing machine learning yield functions using query-by-committee for support vector classification with a dynamic stopping criterion

Shoghi, Ronak; Morand, Lukas; Helm, Dirk; Hartmaier, Alexander

doi:10.1007/s00466-023-02440-6

Original Paper
Open access
Published: 12 February 2024

Volume 74, pages 447–466, (2024)

Computational Mechanics


Abstract

In the field of materials engineering, the accurate prediction of material behavior under various loading conditions is crucial. Machine Learning (ML) methods have emerged as promising tools for generating constitutive models directly from data, capable of describing complex material behavior in a more flexible way than classical constitutive models. Yield functions, which serve as the foundation of constitutive models for plasticity, can be properly described in a data-oriented manner using ML methods. However, the quality of these descriptions heavily relies on the availability of sufficient high-quality and representative training data that needs to be generated by fundamental numerical simulations, experiments, or a combination of both. The present paper addresses the issue of data selection by introducing an active learning approach for Support Vector Classification (SVC) and its application in training an ML yield function with suitable data. In this regard, the Query-By-Committee (QBC) algorithm was employed, guiding the selection of new training data points in regions of the feature space where a committee of models shows significant disagreement. This approach resulted in a marked reduction in the variance of model predictions throughout the active learning process. It was also shown that the decrease in the variance went along with an increase in the quality of the trained model, quantified by the Matthews Correlation Coefficient (MCC). This demonstrated the effectiveness of the approach and made it possible to define a dynamic stopping criterion based on the variance in the committee results.


1 Introduction

In the field of materials engineering, constitutive equations are indispensable for understanding and predicting material behavior. In mechanics, constitutive equations link the strain in a material to the stress it experiences by providing a mathematical description of how a material deforms under load and the subsequent stresses that develop in response. The complexity of these equations can vary significantly, from simple linear elastic models to highly sophisticated elastoplastic or elastoviscoplastic models that account for plastic deformation and hardening effects as well as rate-dependent effects for the case of viscoplasticity. Among the tools used to describe these phenomena, yield functions are of paramount importance. They serve as a basis for constitutive models of plastic material behavior. Von Mises [1] first introduced a yield criterion, based on the hypothesis that the onset of plastic deformation occurs when the ${J}_{2}$ invariant of the Cauchy stress tensor reaches a critical value. However, the von Mises criterion is only valid for materials that exhibit isotropic plasticity and, as it is based on the deviatoric stress, the yield behavior is insensitive to hydrostatic pressure. For pressure-dependent materials, alternative yield criteria such as the Drucker-Prager [2] models are more appropriate. Despite the proven efficacy of classical constitutive equations in predicting material behavior, these models are often constrained by their inherent assumptions. Additionally, expert intervention is frequently required to modify these models to address adaptability to a wider range of scenarios. As a result, Machine Learning (ML)-based techniques that enable the generation of a surrogate model straight from data for describing complex material behavior present opportunities for improved efficiency and adaptability and thus have been investigated extensively [3,4,5,6,7,8]. 
The prediction capability of ML models is tightly bound to the nature of the training data; high-quality data sets can lead to models that precisely capture the intricacies of plastic deformation, while low-quality or insufficient data can result in poor performance. Consequently, the strategy for sampling the feature space and the generation of the training data sets is crucial for building accurate and robust ML models.

The following sections provide a detailed overview of various research studies that underscore the pivotal role of high-quality data and efficient data sampling strategies in enhancing the accuracy and robustness of ML models. Zhang and Mohr [9] highlighted the use of a neural network (NN) model to accurately represent the stress–strain response of a Levy–Mises solid with isotropic hardening. The study points to the necessity of reducing the size of the required training data as the feedforward models utilized in their study exhibited limited generalization capacity unless supplied with copious data, posing a challenge for their direct application in engineering materials. This further underscores the importance of sourcing high-quality data for training, possibly through virtual experiments via Representative Volume Element (RVE) analysis. Weber et al. [10] offered a novel perspective to address the challenges associated with data dependency in ML models. They introduced an NN model that integrates physical constraints, focusing on capturing elastoplastic material behavior. This method holds the potential to significantly enhance the model's generalization capabilities, even when the availability of training data is limited. Grytten et al. [11] developed an elastic–plastic constitutive model based on an adaptive distribution approach in the stress directions, recognizing that in the stress space, points tend to cluster in regions where the yield surface's gradient undergoes rapid changes. A space-filling algorithm in combination with the generalized full constrained (FC)-Taylor theory was then used to determine 690 deviatoric stress states at initial yielding. These stress states were later applied to calibrate the yield surface. 
Research conducted by Ibragimova [12] demonstrated the capability of neural networks trained with data from crystal plasticity (CP) simulations under unique monotonic loading conditions to accurately predict stress–strain curves and texture evolution of face-centered cubic metals. The study employed Sobol sequences [13, 14] for data sampling, which offer an efficient way to uniformly, yet quasi-randomly, fill the domain. However, this approach required a very large dataset, comprising 1,451,161 unique loading condition samples, to effectively train the ML model. Yang et al. [15] developed a data-oriented approach using NN models for constructing elastoplastic constitutive laws for isotropic materials. They utilized homogenized stress–strain data, extracted directly from numerical simulations conducted on an RVE. The average stress corresponding to a given average strain was numerically computed over 500 loading steps, using 22 distinct loading directions in the principal stress space. The approach proposed by Sun and Vlassis [16] utilizes a data augmentation technique that multiplies each original data point into several new ones via the creation of a signed distance function level set. The transformed dataset is used to train an NN model with a specific loss function. This method reduces the stress representation to principal stresses only, thereby decreasing the dimensionality of the input space. Furthermore, their methodology uses a polar coordinate system for yield surface interpolation and applies 140 different Lode angles to partition the $\pi $-plane. In the work of Shoghi and Hartmaier [17], an optimal sampling strategy was introduced for uniform sampling of the yield locus in six-dimensional (6D) stress space, using a Monte Carlo and Fibonacci sequence-based strategy. This method facilitated the training of a Support Vector Classification (SVC) model as a yield function with high-quality data. 
It was shown that 300 loading directions were sufficient to provide a good data-based representation of the yield locus, even for severe anisotropic cases.

The methodologies described, while effective, can sometimes be data-intensive or may not always ensure the optimal selection of informative data for model training. Moreover, these methods often result in the inefficient utilization of computational resources and time, especially for complex materials and higher dimensional feature spaces. In the context of these limitations, active learning methods offer a considerable potential for improving the data sampling process by iteratively guiding the selection of new data in the regions of feature space where the trained model has the largest uncertainties. By strategically choosing the most informative sampling points, it can contribute to building more accurate ML models for yield function prediction with fewer data points. This not only reduces the computational overhead but also allows for an enhanced generalization capability of the models. Among different active learning scenarios in literature, the idea to query for new data points, instead of labeling from a pool or stream of data, was first introduced by Angluin [18] and relies on the learners’ label request at any possible location in the input space. One of the most popular query strategies used in the materials science domain is called uncertainty sampling and is based on probabilistic models, such as Gaussian processes [19,20,21]. New data points are selected at locations where the Gaussian process model predictions show the highest standard deviation and, thus, are most uncertain. However, in this work, the aim is to train SVC models using active learning, which requires a method that incorporates SVCs directly in the active learning loop instead of using Gaussian processes. Therefore, we apply the Query-By-Committee (QBC) algorithm, which was first introduced by Seung et al. [22] as a query selection framework which allows a committee of trained models to vote on the label of a candidate. The most informative query is the instance they most disagree on. 
Morand et al. [23] pointed out that QBC has the advantage of allowing arbitrary learning models to be trained and first showcased its use in the materials science domain on the basis of NN models. Using the same method, Wessel et al. [24] employed QBC for efficient sampling of virtual experiments considering the full stress state, which are then used for identifying parameters of anisotropic yield models. A Gaussian process-based approach, however, was also introduced by Wessel et al. for the reduced stress space in an earlier work [25].

The scientific contribution of the present paper is as follows: we address a fundamental hurdle faced by all data-oriented ML methodologies, namely the need for high-quality, representative, and optimal training data sets. These optimal data sets are paramount in training accurate ML-based constitutive models, especially where data acquisition is computationally expensive. The novel aim of this study is to overcome this challenge by introducing an active learning-based approach, specifically the QBC algorithm, for training an ML yield function using an SVC model. Embracing a 6D stress space, our method extends beyond the conventional 3D principal stress space to provide a more general description of the yield function. The active learning strategy is intended to overcome the limitations of static sampling methods by selecting more informative data for model training. We are particularly interested in investigating whether this improves the training process and overall performance of the ML model and whether a more strategic active learning approach could decrease the reliance on large quantities of training data. Furthermore, this study intends to enhance the current understanding of the sampling process within the stress space and investigate whether active learning tends to target specific regions of the 6D stress space. Moreover, as part of this study, we introduce a dynamic stopping criterion for the active learning process, which leads to a more efficient use of resources and finer control over the learning process.

2 Methods

In this section, a comprehensive explanation of the ML yield function and the principle of active learning is provided, with the goal of integrating these two fundamental elements to optimize training of SVC-based yield functions. The ML yield function is detailed first, and then the principles of active learning are delved into, focusing on how it optimizes training by selecting the most informative data points. Among these principles, the QBC technique is thoroughly explained. Finally, we discuss the integration of active learning with ML yield functions, aiming to boost training efficiency and enhance prediction accuracy in SVC. While these methods might seem distinct at first glance, they are intricately interconnected, forming a cohesive and innovative approach to optimizing the learning process of SVC-based yield functions.

2.1 Machine learning yield function

The elastic–plastic deformation of materials highlights the interdependent relationship between the applied force on a material and the ensuing deformation. The yield function demarcates the transition from elastic to plastic deformation, i.e., the point at which the equivalent stress equals the material's yield strength, and can be written as

$$f\left({\varvec{\sigma}}\right)={\sigma }_{eq}\left({\varvec{\sigma}}\right)-{\sigma }_{y}$$

(1)

Here, ${\sigma }_{y}$ is the yield strength and ${\sigma }_{eq}\left({\varvec{\sigma}}\right)$ is the equivalent yield stress, which in this work follows the definition of Hill for anisotropic materials as

$$\begin{aligned}{\sigma }_{eq}\left({\varvec{\sigma}}\right)&=\frac{1}{\sqrt{2}}\big[{H}_{1}{\left({\sigma }_{1}-{\sigma }_{2}\right)}^{2}+{H}_{2}{\left({\sigma }_{2}-{\sigma }_{3}\right)}^{2}\\&\quad+H_{3}{\left({\sigma }_{3}-{\sigma }_{1}\right)}^{2}+6{H}_{4}{\sigma }_{4}^{2}+6{H}_{5}{\sigma }_{5}^{2}+6{H}_{6}{\sigma }_{6}^{2}\big]^{1/2}\end{aligned}$$

(2)

The components ${\sigma }_{1}, {\sigma }_{2}$ and ${\sigma }_{3}$ represent the normal stresses in three mutually orthogonal directions and ${\sigma }_{4}, {\sigma }_{5}$ and ${\sigma }_{6}$ denote the three independent shear stresses. Each coefficient ${H}_{i}$ modulates the influence of its corresponding stress component, thereby encapsulating the material's anisotropic response in that specific stress direction or plane. Notably, in cases where all the ${H}_{i}$ coefficients equate to one, the equation aligns with isotropic behavior, and the equivalent stress definition resonates with the isotropic von Mises (${J}_{2}$) criterion. Given that a symmetric stress tensor can be completely described by six independent stress components, our study harnesses this 6D stress space to develop and train an ML-based yield function.
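As a minimal sketch of Eqs. 1 and 2, the equivalent stress and the yield function can be evaluated for a stress given as six components (three normal, then three shear). The function names and the tuple layout are assumptions made for illustration and are not taken from pyLabFEA.

```python
import math

def hill_equivalent_stress(sigma, hill=(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)):
    """Equivalent stress of Eq. 2.

    sigma: (s1, s2, s3, s4, s5, s6), three normal then three shear stresses.
    hill:  (H1, ..., H6); all ones recovers the isotropic von Mises case.
    """
    s1, s2, s3, s4, s5, s6 = sigma
    h1, h2, h3, h4, h5, h6 = hill
    total = (h1 * (s1 - s2) ** 2
             + h2 * (s2 - s3) ** 2
             + h3 * (s3 - s1) ** 2
             + 6.0 * (h4 * s4 ** 2 + h5 * s5 ** 2 + h6 * s6 ** 2))
    return math.sqrt(0.5 * total)

def yield_function(sigma, sigma_y, hill=(1.0,) * 6):
    """Eq. 1: negative inside the yield surface, zero on it, positive outside."""
    return hill_equivalent_stress(sigma, hill) - sigma_y
```

With all coefficients equal to one, a uniaxial stress of magnitude s gives an equivalent stress of s, a pure shear stress of magnitude s gives $\sqrt{3}\,s$, and a purely hydrostatic stress gives zero, as expected for the ${J}_{2}$ criterion.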

When the yield limit is reached, $f=0$, the material no longer reverts to its original shape and size upon release of the applied stress, signaling the onset of plastic deformation. Since the focus of this study is solely on the onset of plastic yielding, we assume ideal plasticity, where the yield strength remains constant, independent of the material’s deformation history. This assumption simplifies the model by disregarding work hardening effects. As suggested by Hartmaier [26], the yield function can be characterized in a data-oriented approach, utilizing an ML algorithm known as Support Vector Classification (SVC). To effectively train such an SVC model to serve as an ML yield function, it is crucial to provide a training dataset comprising stress tensors, each labeled with its corresponding state as either elastic or plastic. Once trained, the SVC algorithm can classify any given stress tensor ${\varvec{\sigma}}$ into distinct "elastic" ${f}_{ML}\left({\varvec{\sigma}}\right)= -1$ and "plastic" ${f}_{ML}\left({\varvec{\sigma}}\right)=+1$ regions, thereby creating a comprehensive map of the material's behavior under different stress conditions. The primary objective of this approach is to establish an optimal hypersurface, the yield locus, which serves as the boundary separating the elastic and plastic regions. For the SVC model, the yield function is formulated as

$${f}_{ML}\left({\varvec{\sigma}}\right)=\sum_{i=1}^{{N}_{SV}}{\alpha }_{i}{y}_{i}\psi \left({\varvec{\sigma}}{,{\varvec{\sigma}}}_{i}\right)+b$$

(3)

where ${N}_{SV}$ is the number of support vectors, which are the critical data points within the training set that lie closest to the decision boundary. ${\alpha }_{i}$ are the dual coefficients, which are determined by solving the dual optimization problem. If ${\alpha }_{i}>0$, the corresponding data point ${{\varvec{\sigma}}}_{i}$ is a support vector, actively contributing to the decision boundary. If ${\alpha }_{i}=0$, the data point ${{\varvec{\sigma}}}_{i}$ does not influence the decision boundary. ${y}_{i}$ are the labels of the training data points chosen as support vectors: ${y}_{i}$ is +1 if the data point belongs to the positive class and −1 if it belongs to the negative class. $b$ is the bias term, adjusting the position of the decision boundary. For the nonlinear problem at hand, the radial basis function (RBF) kernel $\psi \left({\varvec{\sigma}},{{\varvec{\sigma}}}_{i}\right)=\exp\left(-\gamma {\Vert {\varvec{\sigma}}-{{\varvec{\sigma}}}_{i}\Vert }_{2}^{2}\right)$ is chosen, where ${\Vert .\Vert }_{2}$ denotes the Euclidean norm. The RBF kernel was chosen for its flexibility and proven efficiency in handling complex, non-linear relationships present in the data, which makes it well suited to our high-dimensional classification problem, see e.g. [27, 28]. The parameter $\gamma $ determines the width of the kernel function and, consequently, the range of impact of a single training point. A larger $\gamma $ value narrows the kernel and corresponds to a more localized influence, whereas a smaller value extends the influence of each training point. To find the optimal hyperplane and construct the decision boundary, the following dual optimization problem needs to be solved:

$$\underset{\alpha }{\text{max}}\left(\sum_{i=1}^{{N}_{SV}}{\alpha }_{i} - \frac{1}{2}\sum_{i=1}^{{N}_{SV}}\sum_{j=1}^{{N}_{SV}}{\alpha }_{i}{\alpha }_{j}{y}_{i}{y}_{j}\psi \left({{\varvec{\sigma}}}_{i}{,{\varvec{\sigma}}}_{j}\right)\right)$$

(4)

Subject to constraints:

$$ \begin{gathered} \mathop \sum \limits_{i = 1}^{{N_{SV} }} \alpha_{i} y_{i} = 0 \hfill \\ 0 \le \alpha_{i} \le C \hfill \\ \end{gathered} $$

(5)

where C penalizes any misclassified data points. A smaller C value implies a less severe penalty for misclassified points, leading to the selection of a wider-margin decision function at the boundary, even though it may result in a greater number of misclassifications. Conversely, a larger C value instructs the training algorithm to restrict the number of misclassified cases by applying a large penalty, which results in a narrower margin. Grid search is a commonly used method to find the optimal hyperparameters C and $\gamma $. It involves searching exhaustively through a specified subset of hyperparameter values and selecting the combination that yields the best performance according to a pre-defined metric. Solving the dual problem yields the dual coefficients ${\alpha }_{i}$ and provides the necessary information to construct the decision boundary defined in Eq. 3 [29].

For training an ML yield function, it is essential to provide critical stresses that indicate the start of plastic deformation and use those yielding stresses to generate training data in elastic and plastic regions of the stress space. This process involves sampling points on the surface of a 6D unit sphere within the corresponding stress space. The next step involves determining a scalar multiplier for each unit stress direction by employing a root-finding method with the reference material's yield function. Solving for the zeros of the yield function identifies the multipliers, which, when applied to the unit stress tensors, result in stress states that lie precisely on the yield surface. These yield stresses serve as the basis for generating training data across the elastic and plastic domains of the stress space. Within the elastic region, the magnitude of these stresses is reduced by using a set of 25 linearly spaced multipliers, ranging from 0.1 to 0.95. This scaling guarantees that the stress states remain within the yield surface, with each state being labeled as “elastic” and assigned a numerical value of −1. In the plastic domain, the yield stresses are augmented using an array of 25 linearly spaced multipliers, ranging from 1.05 to 2, to position them in the plastic domain. These amplified stress states receive the label "plastic" and are assigned a numerical value of +1. Through this approach, a labeled data set is generated with the following structure:

$$\left\{\begin{array}{ll}{x}_{t}=\left[{\sigma }_{1},{\sigma }_{2},{\sigma }_{3},{\sigma }_{4},{\sigma }_{5},{\sigma }_{6}\right],&\quad {y}_{t}= +1\quad \text{plastic if}\; f\left({x}_{t}\right)\ge 0 \\ {x}_{t}=\left[{\sigma }_{1},{\sigma }_{2},{\sigma }_{3},{\sigma }_{4},{\sigma }_{5},{\sigma }_{6}\right],&\quad {y}_{t}= -1 \quad \text{elastic if}\; f\left({x}_{t}\right)<0\end{array}\right.$$

(6)

In this context, ${x}_{t}$ represents the feature vector in the form of a scaled stress tensor and ${y}_{t}$ indicates the labels. As a result, for each unit direction, 50 stress tensors are generated (25 within the elastic region and 25 within the plastic region), each representing a labeled data point. This procedure results in a suitable dataset that captures a wide spectrum of stress states, providing the ML model with the necessary information to accurately distinguish between elastic and plastic regions of the stress space. Following this step, the prepared training data can be used for ML training. Figure 1 offers a simplified schematic representation of the method for generating and labeling training data within a two-dimensional stress space, under plane stress conditions. It is important to note that this illustration depicts equivalent stresses for clarity, whereas the actual method employs the full 6D stress tensor.
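The data generation procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pyLabFEA implementation: the helper names are hypothetical, bisection is used as one possible root-finding method, and the unit directions are simply passed in as a list.

```python
import math

def equivalent_stress(sigma, hill=(1.0,) * 6):
    """Hill-type equivalent stress of Eq. 2 (all ones: von Mises)."""
    s1, s2, s3, s4, s5, s6 = sigma
    h1, h2, h3, h4, h5, h6 = hill
    total = (h1 * (s1 - s2) ** 2 + h2 * (s2 - s3) ** 2 + h3 * (s3 - s1) ** 2
             + 6.0 * (h4 * s4 ** 2 + h5 * s5 ** 2 + h6 * s6 ** 2))
    return math.sqrt(0.5 * total)

def yield_multiplier(direction, sigma_y, hill=(1.0,) * 6, tol=1e-12):
    """Scalar k such that k * direction lies on the yield surface (f = 0)."""
    def f(k):
        return equivalent_stress([k * c for c in direction], hill) - sigma_y
    lo, hi = 0.0, 1.0
    for _ in range(200):            # grow the bracket until f(hi) >= 0
        if f(hi) >= 0.0:
            break
        hi *= 2.0
    else:
        raise ValueError("direction never yields (e.g. purely hydrostatic)")
    while hi - lo > tol * hi:       # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return hi

def linspace(start, stop, num):
    """num linearly spaced values from start to stop, both included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

def make_training_data(directions, sigma_y, hill=(1.0,) * 6, n_per_side=25):
    """Labeled data of Eq. 6: -1 for elastic points, +1 for plastic points."""
    features, labels = [], []
    for d in directions:
        k = yield_multiplier(d, sigma_y, hill)
        for m in linspace(0.1, 0.95, n_per_side):    # inside the yield surface
            features.append([m * k * c for c in d])
            labels.append(-1)
        for m in linspace(1.05, 2.0, n_per_side):    # outside the yield surface
            features.append([m * k * c for c in d])
            labels.append(+1)
    return features, labels
```

For example, `make_training_data([[1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]], sigma_y=100.0)` returns 100 labeled stress states, 50 per unit direction. A purely hydrostatic direction has zero equivalent stress and never reaches the yield surface, so the sketch raises an error for it instead of looping indefinitely.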

Convexity is an important characteristic for a yield function to ensure it represents material behavior accurately. This convexity ensures physically consistent behavior, preventing any unpredictable transitions or non-physical manifestations. By utilizing optimal hyperparameters, the SVC algorithm can maintain the convexity observed in the training data when forming the decision boundary. As a result, SVC is capable of naturally reflecting this crucial characteristic without requiring any additional enforcement of convexity constraints. This characteristic makes SVC a suitable ML algorithm for developing models intended to serve as yield functions, ensuring both accuracy and adherence to the necessary physical properties.

Given that the ML yield function is defined as a weighted sum of kernel functions over the support vectors, the gradient of the SVC decision function can be calculated as

$$\frac{\partial {f}_{ML}\left({\varvec{\sigma}}\right)}{\partial{\varvec{\sigma}}} = -2\gamma \sum_{i=1}^{{N}_{SV}}{\alpha }_{i}{y}_{i}\exp\left(-\gamma {\Vert {\varvec{\sigma}}-{{\varvec{\sigma}}}_{i}\Vert }_{2}^{2}\right)\left({\varvec{\sigma}}-{{\varvec{\sigma}}}_{i}\right)$$

(7)

from which the plastic strain increments can be derived directly for plasticity models based on a normality rule, which are typically used within a standard finite element formulation.
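The decision function of Eq. 3 and its gradient of Eq. 7 can be sketched in a few lines. The function names and argument layout are assumptions for illustration; `dual_coef[i]` is assumed to hold the product ${\alpha }_{i}{y}_{i}$ for the i-th support vector.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def decision_function(sigma, support_vectors, dual_coef, bias, gamma):
    """Eq. 3: weighted sum of kernels over the support vectors plus bias."""
    return bias + sum(c * rbf_kernel(sigma, sv, gamma)
                      for c, sv in zip(dual_coef, support_vectors))

def decision_gradient(sigma, support_vectors, dual_coef, gamma):
    """Eq. 7: analytical gradient of the decision function w.r.t. sigma."""
    grad = [0.0] * len(sigma)
    for c, sv in zip(dual_coef, support_vectors):
        weight = -2.0 * gamma * c * rbf_kernel(sigma, sv, gamma)
        for j, (x, s) in enumerate(zip(sigma, sv)):
            grad[j] += weight * (x - s)
    return grad
```

A simple way to check such an implementation is to compare the analytical gradient with a central finite difference of `decision_function` at a few test points; the two should agree up to the discretization error.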

2.2 Active learning principles

Unlike most conventional learning methods, where the training data remains static, active learning involves an iterative, targeted selection of training examples based on the current state of the learning algorithm. In this paradigm, the learner (algorithm) is no longer a passive recipient of data but rather an active participant in its collection [30].

In the context of ML, the concept of version space plays a crucial role. As introduced by Mitchell [31], it refers to the set of all hypotheses or models that are consistent with the data seen so far, as shown in Fig. 2. As more data or training examples are observed, hypotheses that are inconsistent with the new data are removed from the version space. The hypotheses remaining after all the data has been observed are those that are consistent with all the training examples. This is where the active learning approach exhibits its strength. By strategically requesting the most informative examples, the learning algorithm can effectively navigate the version space, aiming to find the hypothesis or model that not only fits the training data but also generalizes well to unseen data. An active learning strategy such as Query-By-Committee aims to efficiently constrain this version space, enabling a more precise search [30].

The QBC method, first introduced by Seung et al. [22], offers an approach to active learning that seeks to minimize the inherent prediction uncertainty of an ensemble of models. The main concept of the QBC method is having a committee of models, of which each is trained on currently available data. In the context of classification, each committee member is allowed to vote on the categorization or label of a new data instance. The novel data instance that elicits the most disagreement or conflicting votes among the committee, quantified by a voting entropy measure or similar metric, is chosen for labeling. This process is conducted iteratively, with each new labeled instance being added to the data set, serving to refine and improve the models within the committee. The guiding principle of this method is the optimization of anticipated information-gain from querying a new instance. It operates under the assumption that the instances causing the most disagreement within the committee are likely to provide the most valuable learning insights. In active learning, obtaining new data instances within the design or feature space can be achieved by using a variety of methods. Fundamentally, there are three different data selection strategies: (i) Pool-based sampling, where models evaluate a provided pool of unlabeled data. (ii) Stream-based selective sampling, in which models evaluate data instances as they appear in a data stream. (iii) Membership query synthesis, where models can freely evaluate locations within a specified feature space, thereby independently creating new data instances and querying their label. Each of these strategies focuses on efficiently navigating the feature space, identifying, and labeling the instances that cause the most disagreement among the committee of models [30].

When a new data instance is procured, the crucial question of how to measure the level of disagreement among the models in the committee arises. Cohn et al. [32, 33] suggested using variance as a metric to quantify this disagreement. By generating queries that minimize this variance, the potential for future prediction errors can be reduced. As suggested by Krogh et al. [34], variance can be defined as:

$${s}^{2}\left(x\right)=\frac{1}{N}\left(\sum_{\eta =1}^{N}{\left({f}_{\eta }\left(x\right)-\overline{f }\left(x\right)\right)}^{2}\right)$$

(8)

In this equation N denotes the number of committee members, ${f}_{\eta }\left(x\right)$ is the prediction of the ηth model and $\overline{f }\left(x\right)$ is the mean over all predictions at location x. Based on this formulation the next location to query is determined by solving an optimization problem and finding the new data point in the feature space, i.e., stress space, at which the variance among the committee members is maximized as

$${x}^{*}=\underset{x}{{\text{arg max}}}\left({s}^{2}\left(x\right)\right)$$

(9)
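Eqs. 8 and 9 can be illustrated with a short sketch. As a simplifying assumption, the maximization is carried out here over a finite list of candidate points rather than over the continuous feature space, and committee members are represented by plain callables; all names are hypothetical.

```python
def committee_variance(predictions):
    """Eq. 8: population variance of the committee predictions at one point."""
    n = len(predictions)
    mean = sum(predictions) / n
    return sum((p - mean) ** 2 for p in predictions) / n

def most_contested(candidates, committee):
    """Discrete stand-in for Eq. 9: candidate with the largest variance."""
    return max(candidates,
               key=lambda x: committee_variance([f(x) for f in committee]))
```

For committee members that predict labels of −1 or +1, the variance is zero where all members agree and reaches its maximum of one where the votes are split evenly.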

In this approach we generate training data from unit stresses in a 6D stress space encompassing both normal and shear components as detailed in Sect. 2.1. Each unit stress is then systematically escalated until reaching the zero of the yield function indicating the onset of plastic yielding. This approach produces a comprehensive collection of 6D stress tensors right at the threshold of plastic yielding. Given this methodology, it becomes essential to sample points on the surface of a unit sphere in the 6D stress space. Some prior studies, such as the work by Wessel et al. [24], have sought to ensure sampling on a unit sphere using a soft constraint. However, this method is not efficient due to the significant computational resources required to probe infeasible areas of the search space, specifically points outside the unit sphere. To overcome this problem, in this study a novel method is proposed based on transforming the problem so that potential solutions naturally lie on the surface of the 6D unit sphere. This transformation is achieved by shifting from Cartesian to spherical coordinates. In spherical coordinates, one parameter represents the radius (r), and five others correspond to angles $\left({\theta }_{1},\dots ,{\theta }_{5}\right)$. For a unit sphere, the radius is invariably fixed at 1, meaning only the five angles need to be optimized. Consequently, any set of potential angle values corresponds to a point on the sphere, eliminating the requirement for a penalty term to enforce adherence to the constraint. In this approach, the task of optimization focuses on finding the values of the angles that maximize the objective function. By incorporating the constraint directly into the problem formulation, this method enables a more efficient and reliable optimization. In this case, the objective function can be described as:

$${\theta}^{*}={\text{arg}}\,\mathop{\max}\limits_{\theta}\left({s}^{2}\left(\theta \right)\right)$$

(10)

To solve this optimization problem, we use the Differential Evolution (DE) algorithm proposed by Storn and Price [35]. This algorithm, typically utilized for the minimization of functions in continuous space, is attractive because, as a population-based method, it is designed to search for global rather than local optima; a maximization problem such as Eq. 10 can be handled by minimizing the negative of the objective. In the DE algorithm, an initial population of candidate solutions is generated in the bounded search space. Each solution vector (a so-called individual) in the population represents a potential solution to the optimization problem. During the mutation phase, the algorithm creates a mutant vector for each individual by combining the vectors of three other distinct individuals from the current population. The resultant mutant vector is a potential candidate for the next generation. Following mutation, a crossover operation is performed to create a trial vector by mixing the mutant and target vectors. The extent of this mixing is governed by a crossover rate parameter. In the selection phase, a greedy strategy is applied where the trial vector competes with the original individual. The one that provides better fitness according to the objective function is selected to proceed to the next iteration (so-called generation). This process of mutation, crossover, and selection is repeated for a specified number of iterations or until a stopping criterion is met. As a result, the DE algorithm gradually evolves the population towards the optimal solution. After sampling in spherical coordinates, the transformation to Cartesian coordinates on the unit sphere is given by:

$$ \begin{aligned} x_{1} = & \cos \left( {\theta_{1} } \right) \\ x_{2} = & \sin \left( {\theta_{1} } \right)\cos \left( {\theta_{2} } \right) \\ x_{3} = & \sin \left( {\theta_{1} } \right)\sin \left( {\theta_{2} } \right)\cos \left( {\theta_{3} } \right) \\ x_{4} = & \sin \left( {\theta_{1} } \right)\sin \left( {\theta_{2} } \right)\sin \left( {\theta_{3} } \right) \cos \left( {\theta_{4} } \right) \\ x_{5} = & \sin \left( {\theta_{1} } \right)\sin \left( {\theta_{2} } \right)\sin \left( {\theta_{3} } \right) \sin \left( {\theta_{4} } \right) \cos \left( {\theta_{5} } \right) \\ x_{6} = & \sin \left( {\theta_{1} } \right)\sin \left( {\theta_{2} } \right)\sin \left( {\theta_{3} } \right) \sin \left( {\theta_{4} } \right) \sin \left( {\theta_{5} } \right) \\ \end{aligned} $$

(11)

where ${\theta }_{1},{\theta }_{2},{\theta }_{3},{\theta }_{4}$ are in range $\left[0,\pi \right]$, and ${\theta }_{5}$ is in range $\left[0,2\pi \right]$.
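The combination of Eqs. 10 and 11 can be illustrated with a minimal sketch, which is not the implementation used in this work. It assumes a basic DE/rand/1/bin scheme written with the Python standard library; the function names, the population settings and the objective `toy_disagreement` are hypothetical, the latter standing in for the committee variance $s^{2}(\theta)$. Because every angle vector maps onto the unit sphere, no penalty term is needed.

```python
import math
import random


def angles_to_unit_vector(theta):
    """Map five hyperspherical angles to a point on the 6D unit sphere (Eq. 11)."""
    x, sin_prod = [], 1.0
    for t in theta:
        x.append(sin_prod * math.cos(t))
        sin_prod *= math.sin(t)
    x.append(sin_prod)  # the last component carries only sines
    return x


def differential_evolution(objective, bounds, pop_size=30, generations=100,
                           f=0.7, cr=0.9, seed=0):
    """Maximize `objective` with a basic DE/rand/1/bin scheme (illustrative)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:  # crossover
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip back into the angle bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_fit = objective(trial)
            if trial_fit >= fit[i]:  # greedy selection
                pop[i], fit[i] = trial, trial_fit
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


# Angle ranges as stated below Eq. 11.
BOUNDS = [(0.0, math.pi)] * 4 + [(0.0, 2.0 * math.pi)]


def toy_disagreement(theta):
    """Hypothetical stand-in for s^2(theta); it is largest where x_1 = x_2."""
    x = angles_to_unit_vector(theta)
    return 2.0 * x[0] * x[1]


best_theta, best_value = differential_evolution(toy_disagreement, BOUNDS)
best_x = angles_to_unit_vector(best_theta)  # lies on the unit sphere by construction
```

In practice, the toy objective would be replaced by the variance of the committee predictions evaluated at the stress direction `angles_to_unit_vector(theta)`.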

In this work, we apply the methodology proposed by Raychaudhuri et al. [36], which involves training models on a random subset of the initial training dataset. This approach is particularly vital when working with SVC models. These models, when trained with identical hyperparameters on the same dataset, yield identical results, which contradicts the requirements for the QBC approach. The QBC approach relies on variation among the committee members' predictions. To introduce this variation, each committee member is assigned a random subset of the dataset in each active learning iteration. A comprehensive grid search is performed in each iteration for each of these models in the committee to find the optimal hyperparameters.
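The role of the random subsets can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this work: the subsets are drawn without replacement, the committee size of five and the fraction of 80% are taken from Sect. 2.3, the unbiased sample variance stands in for the disagreement measure of Eq. 8, and the committee members are hypothetical linear decision functions instead of trained SVC models.

```python
import random
import statistics


def draw_committee_subsets(n_samples, n_members=5, fraction=0.8, seed=0):
    """Return one random index subset (drawn without replacement) per member."""
    rng = random.Random(seed)
    size = round(fraction * n_samples)
    return [sorted(rng.sample(range(n_samples), size)) for _ in range(n_members)]


def committee_variance(members, x):
    """Sample variance of the members' outputs at x (stand-in for s^2 in Eq. 8)."""
    return statistics.variance([member(x) for member in members])


# Hypothetical committee: five slightly different linear decision functions.
members = [lambda x, w=w: w * x - 1.0 for w in (0.8, 0.9, 1.0, 1.1, 1.2)]

# Five identical members, as obtained when training on identical data.
identical = [lambda x: x - 1.0] * 5

subsets = draw_committee_subsets(n_samples=100)
disagreement = committee_variance(members, 2.0)    # positive: members differ
no_disagreement = committee_variance(identical, 2.0)  # zero: nothing to query
```

Identical members yield zero variance everywhere, so the query step would have no signal to follow; this is why each member must see a different subset.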

2.3 Active learning for machine learning yield function

Following the work of Shoghi and Hartmaier [17] for training an ML yield function using SVC, in our study, we considered two reference materials that exhibit different levels of complexity: (i) an isotropic material, characterized by Hill coefficients of one, and (ii) an anisotropic material, defined by a varying range of Hill coefficients. The exploration of anisotropic material behavior using this methodology provides the possibility to test the model's ability to generalize across diverse material behaviors, thereby evaluating its reliability and applicability in broader contexts. Both materials are defined using the open-source package pyLabFEA [37] which introduces a simple version of finite element analysis for solid mechanics and elastic–plastic materials, fully written in Python. In accordance with the study conducted by Shoghi and Hartmaier [17] the parameters employed for defining both materials are summarized in Table 1.

Table 1 Parameters defined for the isotropic and anisotropic reference material


The described reference materials are used for creating training data for ML training, which is accomplished by first creating unit stresses in 6D stress space with normal and shear components. As mentioned in Sect. 2.1, each unit stress is then increased proportionally until the yield function of the stress tensor reaches zero. At this point, plastic yielding begins for the specific load case. The collection of full 6D stress tensors at the onset of plastic yielding is compiled to represent the yield function in a data-oriented manner and forms the ground truth for training the ML yield function. Using these yield stress tensors as a basis, the training data across both elastic and plastic domains is generated, respectively labeled −1 and +1, through methods of downscaling and upscaling, as described in Sect. 2.1. The significance of this 6D stress space representation lies in its ability to capture the complexity of material behavior under various stress conditions, paving the way for more realistic and detailed yield function descriptions. It should be noted that no hardening is considered in this work and the goal is to represent the initial yield locus.

To initialize the active learning procedure, an initial training set of stresses is generated using the random function available in the NumPy package [38], which generates a random array with 6 components from a uniform distribution within (−1, +1) range. By normalizing, we make sure that any generated data point is located on the surface of a unit 6D sphere. Following this, a committee of five Support Vector Classifiers is trained, each receiving a random subset of the training data, which constitutes 80% of the total dataset. The DE algorithm is then implemented, directed by a sampling scheme aimed at maximizing the variance in the committee's predictions. The optimal solution is used to generate the next unit stress tensor which, as detailed in Sect. 2.1, is scaled, labeled, and then added to the existing dataset. New unit tensors are sampled in areas where the committee's predictions disagree the most, thus facilitating the training of the models on more complex and less-explored areas of the problem space and enhancing the capabilities of the models. After each new unit tensor is sampled, and the training data set is updated, the committee is retrained, and the DE is performed again with the updated dataset. This iterative process continues until a certain termination condition, such as reaching a predetermined number of iterations, is met. Throughout this process, visual inspections of the trained yield functions and support vectors are conducted to assess the quality of the models. Additionally, the variance in the committee's predictions across iterations is monitored and graphed, providing insight into the effectiveness of the active learning procedure. The iterative procedure of QBC strategy used for defining an SVC model can be seen in Fig. 3. The algorithm for the active learning process can be seen in Table 2.
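The structure of the iterative procedure described above can be sketched on a deliberately simple one-dimensional toy problem. Everything in this sketch is an assumption made for illustration: the "classifier" is a single threshold instead of an SVC, a grid search replaces the DE algorithm, the population variance is used as disagreement measure, and all function names are hypothetical. Only the loop (random subsets, committee training, query at maximum disagreement, labeling, retraining) mirrors the text.

```python
import random
import statistics


def fit_threshold(subset):
    """Toy 'classifier': midpoint between the largest -1 and smallest +1 sample."""
    lo = max((x for x, y in subset if y < 0), default=0.0)
    hi = min((x for x, y in subset if y > 0), default=2.0)
    return 0.5 * (lo + hi)


def predict(threshold, x):
    return 1.0 if x >= threshold else -1.0


def most_disputed(committee, candidates):
    """Candidate with the largest prediction variance (grid search replaces DE)."""
    def disagreement(x):
        return statistics.pvariance([predict(t, x) for t in committee])
    best = max(candidates, key=disagreement)
    return best, disagreement(best)


def query_by_committee(data, oracle, n_iterations, n_members=5, fraction=0.8, seed=0):
    rng = random.Random(seed)
    data = list(data)
    candidates = [0.01 * k for k in range(201)]  # search grid on [0, 2]
    history = []
    for _ in range(n_iterations):
        size = max(2, round(fraction * len(data)))
        # Each member is trained on its own random subset of the current data.
        committee = [fit_threshold(rng.sample(data, size)) for _ in range(n_members)]
        x_new, variance = most_disputed(committee, candidates)
        data.append((x_new, oracle(x_new)))  # label the query and grow the set
        history.append(variance)
    return data, history


def oracle(x):
    """Ground truth of the toy problem: 'yielding' starts at x = 1."""
    return 1.0 if x >= 1.0 else -1.0


initial = [(x, oracle(x)) for x in (0.1, 0.5, 0.8, 1.5, 1.7, 1.9)]
data, history = query_by_committee(initial, oracle, n_iterations=20)
final_threshold = fit_threshold(data)
```

The list `history` plays the role of the monitored committee variance; in the setting of this paper, the oracle corresponds to evaluating the reference yield function for the queried unit stress.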

Table 2 Algorithm for QBC as is used in the present paper


2.4 Stop criteria

2.4.1 Variance and rate of performance improvement

In active learning, as opposed to its static counterpart, well-defined stopping criteria are necessary to ensure optimal resource utilization and model efficiency. Margatina et al. [39] identified this challenge as the determination of the 'optimum' stopping point beyond which the model's learning is considered sufficient. According to their discussion, 'optimal' is largely domain-dependent and requires a careful balance between accuracy and cost. Bloodgood [40] investigated the development of efficient and adaptable stopping methods. Based on this work, it was suggested to explore user-adjustable stopping criteria that account for different annotation/performance tradeoff valuations, which could offer users the flexibility to choose a stopping criterion that aligns best with their specific requirements. Assigning a fixed number of iterations in active learning is impractical because of two intrinsic characteristics of the process. Firstly, the training often commences with random data points of varying sizes, introducing a degree of unpredictability. Secondly, the algorithm employed for optimization in active learning typically exhibits a stochastic nature. Therefore, the criteria must adapt based on the current performance of the model, leading us to the question: when should the QBC strategy stop its search for new data points for our specific use case?

Variance, as a reflection of the uncertainty in the model predictions, is one of the measures that is typically considered. However, relying solely on variance as a stopping criterion presents a challenge: it neglects the critical aspect of data exploration and can be overly sensitive to minor fluctuations. Also, for a user, setting an a priori level of variance can pose a significant challenge. To mitigate these issues, we propose a more dynamic approach. Instead of requiring the variance to reach a fixed minimum value, we suggest a dynamic threshold. This threshold is defined by the user as a desired percentage reduction from the maximum variance observed during the initial n iterations (in this study, we use n = 10), which can adapt based on the specific nature of the problem. This approach is particularly effective as it aligns with the inherent dynamics of the learning process; as it progresses, the disagreement among committee members is expected to reduce. This leads to a decrease in the prediction variance of the committee members, underscoring the relevance of a dynamically defined threshold. In addition, monitoring the rate of change in variance can serve as a second stopping criterion. This can provide a robust way to handle fluctuations and identify when the model has reached a point of stability, similar to early stopping for neural network training [41]. By observing the variance's change from one iteration to the next, we can determine the rate of uncertainty reduction. When this rate falls below a specified threshold, indicating minimal decrease in variance, it is reasonable to stop the active learning process. By considering both the magnitude of the variance and the rate of change in variance, it is possible to formulate a robust stopping criterion that follows the general decreasing trend in variance without being overly influenced by minor fluctuations. 
The proposed dual stopping criterion is applicable not only to QBC learning processes but also to other approaches like Gaussian processes.

For the practical usage of this dual stopping criterion: firstly, the variance that quantifies the committee disagreement as described in Eq. 8 must fall below a predefined value, ${{\varepsilon }_{{\text{crit}}}}_{1}$ which can be chosen based on a desired percentage reduction from the maximum variance value observed during the first $n$ iterations. For $\alpha $ being the desired percentage reduction, ${{\varepsilon }_{{\text{crit}}}}_{1}$ can be defined as:

$$ \begin{gathered} \varepsilon_{{{\text{crit1}}}} = {\text{max}}\left( {s^{2} \left( x \right)_{0 } , s^{2} \left( x \right)_{1} , \ldots , s^{2} \left( x \right)_{n } } \right) \left( {1 - \left( {\alpha / 100} \right)} \right) \hfill \\ s^{2} \left( x \right)_{i} < \varepsilon_{{{\text{crit1}}}} ,\quad 0 < \alpha < 100\% \hfill \\ \end{gathered} $$

(12)

This allows the user to adapt the stopping criteria according to their specific needs and the characteristics of the data being used.

Secondly, the rate of change of the variance over a sequence of iterations must fall below a critical threshold ${{\varepsilon }_{{\text{crit}}}}_{2}$, which denotes that the committee disagreement is not significantly decreasing anymore. This rate is computed as the difference between the maximum and minimum variances within the sequence, divided by the sequence length $\Delta n$ (the number of iterations per sequence). Essentially, this measures the slope of the variance within the sequence as:

$$ \begin{aligned} &\frac{{\Delta s^{2} }}{\Delta n} = \frac{{\left( {\max \left( {s^{2} \left( x \right)_{i - n } , s^{2} \left( x \right)_{i - n + 1} , \ldots , s^{2} \left( x \right)_{i } } \right) - \min \left( {s^{2} \left( x \right)_{i - n } , s^{2} \left( x \right)_{i - n + 1} , \ldots , s^{2} \left( x \right)_{i } } \right)} \right)}}{\Delta n} \hfill \\ &\left| {\frac{{\Delta s^{2} }}{\Delta n}} \right| < \varepsilon_{{{\text{crit2}}}} \hfill \\ \end{aligned} $$

(13)

The learning process stops only when both conditions are met and can be summarized as:

$${\text{if }} {s}^{2}{\left(x\right)}_{i}< {{\varepsilon }_{{\text{crit}}}}_{1}\quad {\text{and}}\quad \left|\frac{{\Delta s}^{2}}{\Delta n}\right|< {{\varepsilon }_{{\text{crit}}}}_{2}:{\text{stop}}$$

(14)
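A minimal sketch of how Eqs. 12 to 14 could be evaluated on a recorded variance history is given below. The function names are hypothetical, and two details are assumptions: the maximum in Eq. 12 is taken over the first n recorded values, and the slope in Eq. 13 is computed over a trailing window of the last $\Delta n$ values. The default values ($n = 10$, $\alpha = 99.98$, ${{\varepsilon }_{{\text{crit}}}}_{2} = 0.001$, $\Delta n = 10$) are those reported in Sect. 3.1, and the variance history is synthetic.

```python
def epsilon_crit1(variances, n=10, alpha=99.98):
    """Eq. 12: fraction (1 - alpha/100) of the largest variance in the first n values."""
    return max(variances[:n]) * (1.0 - alpha / 100.0)


def variance_slope(variances, i, window=10):
    """Eq. 13: (max - min) of the `window` variances ending at iteration i, per iteration."""
    block = variances[i - window + 1 : i + 1]
    return (max(block) - min(block)) / window


def should_stop(variances, i, n=10, alpha=99.98, eps_crit2=1e-3, window=10):
    """Eq. 14: stop only if both the variance and its slope are below their thresholds."""
    if i + 1 < max(n, window):
        return False  # not enough history yet
    below_level = variances[i] < epsilon_crit1(variances, n, alpha)
    flat_enough = abs(variance_slope(variances, i, window)) < eps_crit2
    return below_level and flat_enough


# Hypothetical variance history: geometric decay from an initial value of 50.
history = [50.0 * 0.8 ** k for k in range(100)]
stop_iteration = next(i for i in range(len(history)) if should_stop(history, i))
```

For this synthetic history, the level criterion is met several iterations before the slope criterion, so the slope condition determines the stopping point.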

2.4.2 Validation of the stopping criteria

Monitoring the variance and its rate of change can provide insights into the reduction of the committee disagreement and acts as a potential stopping criterion during the learning process. However, to validate the model's predictive performance and generalization ability, a testing process is crucial. This assessment aids in verifying the reliability of the variance-based stopping criterion. While initial testing validates the criterion's effectiveness, users can subsequently use the variance-based approach independently to determine when to halt the learning process. Such an evaluation process involves examining the model's actual predictive performance and its capability to generalize learning to unseen data. During testing, the definition of unbiased test cases is of paramount importance. The model's performance can be assessed using a proper score after each active learning iteration. A higher rate of improvement is typically expected at the start of the learning process, which is likely to slow down over time.

Here, we use the Confusion Matrix, which is a powerful tool often utilized in supervised ML to evaluate the performance of classification models. It provides a comprehensive overview of the prediction results compared to the actual classifications, neatly presenting both correct predictions and the types of errors made. The structure of a typical confusion matrix is shown in Fig. 4.

The confusion matrix consists of four main components: True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). True Positives are the correctly identified positive cases, and True Negatives are the correctly identified negative cases. On the other hand, False Positives are negative cases that were incorrectly identified as positive, and False Negatives are positive cases incorrectly identified as negative. These components provide valuable insights into the model's performance. For instance, a high number of True Positives and True Negatives indicates a model’s good predictive power. In contrast, a high number of False Positives and False Negatives indicates that the model is struggling to make accurate predictions. Based on the confusion matrix, different metrics such as accuracy, precision, recall, and F1 score can be calculated to evaluate the performance of the trained classification model. The Matthews Correlation Coefficient (MCC) is regarded as a superior metric for evaluating binary classifications in ML, particularly in situations involving imbalanced data sets. This is because MCC takes into account all values in the confusion matrix, rather than concentrating on a single dimension. The MCC is a correlation coefficient between the observed and the predicted binary classifications; it returns a value between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 an average random prediction, and −1 an inverse prediction. This makes it a balanced measure, even when the classes are of very different sizes. MCC is generally regarded as a balanced metric because it considers both over-predictions and under-predictions. For instance, a model with a high number of False Positives and False Negatives will be penalized in the MCC score [42]. The MCC score is calculated as

$$MCC=\frac{(TP)(TN)-(FP)(FN)}{\sqrt{\left(TP+FP\right) \left(TP+FN\right) \left(TN+FP\right) \left(TN+FN\right)}}$$

(15)
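Eq. 15 can be evaluated directly from the four entries of the confusion matrix, as in the following sketch. The function name `mcc` is chosen here for illustration, returning 0 for a vanishing denominator is a common convention that is assumed rather than taken from this work, and the counts in the last example are hypothetical.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts (Eq. 15)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


perfect = mcc(tp=300, fp=0, tn=300, fn=0)       # all 600 points correct
inverse = mcc(tp=0, fp=300, tn=0, fn=300)       # all 600 points wrong
chance = mcc(tp=150, fp=150, tn=150, fn=150)    # no better than guessing
typical = mcc(tp=285, fp=30, tn=270, fn=15)     # hypothetical counts
```

The first three cases reproduce the limiting values +1, −1 and 0 discussed above.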

However, an effective application of the confusion matrix hinges on the careful definition of test cases. Creating these cases can pose a challenge, particularly in complex or nuanced domains. It is essential that the test cases accurately capture the diversity and distribution of the data the model is expected to encounter in real-world scenarios. Additionally, a balance between positive and negative cases is crucial to avoid a bias in the model's performance assessment. Poorly defined or biased test cases can lead to a skewed confusion matrix, compromising the reliability of the derived performance metrics. Therefore, establishing robust, representative test cases is critical to fully harness the evaluative power of the confusion matrix and to ensure a fair and accurate assessment of the model's ability to generalize its predictions. To facilitate a comprehensive evaluation, 5 test sets have been created, each including 600 test points situated closely to the decision boundary (hyperplane) as shown in Fig. 5. 300 of the test cases are in the elastic area and 300 in the plastic area, thus forming a balanced test set. These test cases will be used in all the evaluations of this work.

For the test cases, a similar approach is taken to define randomly distributed unit vectors in the 6D stress space. Using the yield function, the scale in each direction to the ideal decision boundary (the critical point where a material undergoes a transition from elastic to plastic deformation) can be determined. To generate test data points that lie close to this decision boundary, the calculated scale is multiplied by factors of 0.99 and 1.01, respectively, which results in two additional sets of points—one just below the yield point (representing elastic states) and another just above the yield point (representing plastic states). The sets of test points form the near boundary test lines as depicted in Fig. 5. The proposed near-boundary test lines lie within a critical zone or margin that extends from 0.95 to 1.05, surrounding the hyperplane located at 1.0 approximated by the SVC model. The test line at 1.01 resides just above the decision boundary, falling within the upper part of this critical margin. Conversely, the test line at 0.99 is just below the boundary, occupying the lower part of the margin. These test cases located near the decision boundary often represent challenging instances for the model to correctly classify. Yet, these test cases offer invaluable insights into the model's capacity to generalize from the training data. If the model can successfully classify these instances, this strongly indicates high model performance and robustness. These near-boundary test lines are not only positioned strategically close to the decision boundary but are also randomly generated, thus introducing a valuable element of variability to the testing process, ensuring that the model's performance is evaluated across a diverse range of instances.
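The construction of the near-boundary test points can be sketched for the isotropic case, where a Hill criterion with all coefficients equal to one reduces to the $J_2$ (von Mises) criterion. This is an illustration only: the yield strength is a hypothetical value that is not taken from Table 1, the Voigt ordering of the stress components is an assumption, and the function names are chosen here for readability.

```python
import math
import random

YIELD_STRENGTH = 100.0  # hypothetical value, not taken from Table 1


def j2_equivalent_stress(s):
    """von Mises equivalent stress of a Voigt vector (s11, s22, s33, s23, s13, s12)."""
    normal = (s[0] - s[1]) ** 2 + (s[1] - s[2]) ** 2 + (s[2] - s[0]) ** 2
    shear = s[3] ** 2 + s[4] ** 2 + s[5] ** 2
    return math.sqrt(0.5 * normal + 3.0 * shear)


def near_boundary_test_set(n_directions, seed=0, lower=0.99, upper=1.01):
    """Return stresses just below (label -1) and just above (label +1) yield."""
    rng = random.Random(seed)
    points, labels = [], []
    while len(labels) < 2 * n_directions:
        raw = [rng.uniform(-1.0, 1.0) for _ in range(6)]
        norm = math.sqrt(sum(c * c for c in raw))
        if norm == 0.0:
            continue
        unit = [c / norm for c in raw]  # random direction on the 6D unit sphere
        seq = j2_equivalent_stress(unit)
        if seq < 1e-12:
            continue  # a purely hydrostatic direction never reaches yield
        scale = YIELD_STRENGTH / seq  # distance to the yield locus in this direction
        points.append([lower * scale * c for c in unit])
        labels.append(-1)  # elastic side
        points.append([upper * scale * c for c in unit])
        labels.append(+1)  # plastic side
    return points, labels


points, labels = near_boundary_test_set(n_directions=300)
```

With 300 directions, the call produces one balanced test set of 600 points, 300 elastic and 300 plastic; different seeds would give the distinct test sets.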

In evaluating the model's performance across iterations, the emphasis is not placed on the exact values. Rather, the focus is on the observable trends in these metrics over time. This approach is based on the understanding that inherent randomness in the learning process can cause fluctuations in the exact values, which may not necessarily reflect the overall learning trajectory of the model. By comparing trends instead of exact values, a more meaningful assessment of the model's learning progress and generalization ability can be achieved.

3 Results and discussion

3.1 Active learning results: the most informative points

3.1.1 Isotropic material

The training process, as detailed earlier, commenced with an initial training set of 100 random unit stresses in a 6D stress space. Figure 6 illustrates the results of the training after 200 iterations. This figure plots the ${J}_{2}$ equivalent stress of the stress tensors at the onset of plastic yielding against the polar angle of the stress tensor in the π-plane, which represents the deviatoric plane in the space of principal stresses. The yield function values are represented by two color schemes: orange and purple. Orange colors signify positive yield function values, indicative of plastic yielding, while purple colors denote negative values of the yield function, indicating that the stress is within the elastic regime. It is important to note that no color scale is provided since the absolute value of the ML yield function has no physical significance. The blue and black lines in the figure represent the stress points where the yield function is zero for the reference material. The blue line corresponds to an analytically formulated Hill-like yield criterion, while the black line corresponds to the trained ML yield function. The open symbols in the figure denote the support vectors identified during the training procedure.

As can be seen, a good agreement between the trained ML function (black line) and the Hill yield locus (blue line) can be observed after the final iteration. In Fig. 7, the evolution of variance and the MCC score over the course of iterations in the learning process is shown.

Each data point, represented by a blue dot, corresponds to the variance in each iteration. The variance values are displayed on a logarithmic scale so that changes at low variance values remain visible. A generally diminishing trend in variance was observed as the iterations increased, signifying a decrease in the prediction uncertainties as training progressed. To smooth out short-term fluctuations and highlight longer-term trends, the moving average of the variance over a window of 10 iterations was computed. It is represented by a solid blue line, delineating the reduction in the logarithmic variance throughout the active learning process.

The line corresponding to ${{\varepsilon }_{{\text{crit}}}}_{1}$, which represents the dynamic threshold for committee disagreement, is superimposed on the plot. This threshold, set at 10⁻², reflects a 99.98% reduction from the maximum variance of 50 observed during the initial 10 iterations. The line is depicted in the figure to visually represent the first stopping criterion.
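As a check, inserting the stated values (maximum initial variance of 50 and $\alpha = 99.98$) into Eq. 12 reproduces this threshold:

$$\varepsilon_{\text{crit1}} = 50\left(1-\frac{99.98}{100}\right) = 50 \times 2\times 10^{-4} = 10^{-2}$$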

The MCC is shown in red. Its moving average over 10 iterations, represented by a solid red line, indicates an overall positive trend in the predictive power of the model, that is, a gradual improvement in its binary classifications over time. The x-axis labels are positioned at multiples of 10, enabling a clear distinction of the corresponding variance and MCC for every tenth iteration. From the graph, it is evident that as the number of iterations increased, the variance decreased while the MCC improved, validating the effectiveness of the active learning process in our model. It should be noted that reaching an MCC of 1 is not always possible or even expected in many real-world situations. Also, this might not be a consistent outcome, even under the same conditions.

It is crucial to note that due to the inherent randomness in the learning process, particular attention should be paid to the moving averages rather than the exact values of MCC scores. Owing to the stochastic nature of the initial training set selection, if the learning process were to be repeated, both the MCC score values and the rate at which the optimal score is reached could vary. This unpredictability underscores the importance of observing overall moving averages rather than focusing on precise metric values. The moving averages offer a more accurate estimation of the model's learning trajectory and its ability to generalize and make accurate predictions over time. By focusing on these trends, a more comprehensive understanding of the model's learning progression can be obtained, even amid the potential variability in the learning outcomes caused by the randomness in the process. This approach is vital for ensuring the reliable evaluation and interpretation of the model's performance throughout the active learning iterations. When the moving averages level off, this suggests that the model's learning progression has either plateaued or drastically slowed. Further training under these conditions often leads to minimal performance improvements, making it an inefficient practice.

To address the critical need for establishing a stopping criterion, the rate of change of variance was calculated. This involved an analysis of variances over blocks of iterations, where each block comprised 10 iterations. The minimum and maximum variance within each block were identified and used to calculate the slope representing the rate of change. This analysis is graphically represented in Fig. 8. On the x-axis, the iteration number is plotted, while on the y-axis, the calculated rate of change in variance is plotted. It can be assumed that the larger the slope, the more significant the improvement in the model over these iterations. Additionally, a line denoting the critical threshold ${{\varepsilon }_{{\text{crit}}}}_{2}$ of 0.001 is depicted. This critical slope line indicates when the rate of change in variance has reached the desired minimum level as per our second stopping criterion.

In this case, after approximately 95 iterations, the rate of change of variance had reached the predefined critical value, and the variance itself had also fallen within the predefined acceptable range as shown in Fig. 7. This satisfied both our first and second stopping criteria. Thus, extending the training beyond this point would have likely yielded only marginal improvements, reinforcing the efficiency of our proposed stopping criteria.

3.1.2 Anisotropic material

To further validate the versatility and general applicability of the proposed active learning method, it was applied to an anisotropic material. This choice of material introduces additional complexity due to the direction-dependent properties, thus presenting a more challenging scenario for the method under consideration. The results shown in Fig. 9 display a good agreement between the ML-trained yield function (represented by the black line) and the anisotropic Hill yield locus (denoted by the blue line) after the final iteration.

In Fig. 10, the learning process for the anisotropic case is displayed. As in Fig. 7, variance is represented in blue, while MCC is shown in red. Variance, displayed on a logarithmic scale, is denoted by blue dots, each representing an active learning iteration. A general declining trend in variance is observed as the number of iterations increases, which signifies a reduction in the committee's disagreement throughout the learning process. This decreasing trend is represented by the solid blue line, computed as the moving average over a window of 10 iterations. A line representing the dynamic threshold ${{\varepsilon }_{{\text{crit}}}}_{1}$ is superimposed on the plot. This threshold, set at 10⁻², reflects a user-defined reduction of 99.98% from the maximum initial variance, providing the acceptable level of variance according to our first stopping criterion. Red dots are used to denote the MCC values. The moving average over a span of 10 iterations is calculated for the MCC, illustrating a generally positive trend, reflecting a gradual improvement in the model's binary classification capability over the course of the learning process.

As in the previous case, the randomness inherent in the learning process requires the focus to be placed on the moving averages rather than on the exact MCC scores. Observing the overall moving averages provides a more accurate depiction of the model's learning trajectory and its capability to generalize and make accurate predictions over time.

Like in the previous case, the rate of change of variance was evaluated, which involved an examination of variances over sets of iterations, with each set comprising 10 iterations. The minimum and maximum variance within each set were discerned and utilized to compute the slope indicative of the rate of change. This analysis is visually presented in Fig. 11, where the iteration number is plotted on the x-axis, while the calculated rate of change in variance is depicted on the y-axis, allowing us to infer that a larger slope corresponds to a more significant improvement in the model over the iterations in question. A line corresponding to the critical threshold ${{\varepsilon }_{{\text{crit}}}}_{2}$ of 0.001 is also added to the plot, signifying the rate of change threshold according to our second stopping criterion. In this case, like the isotropic scenario, after about 120 iterations, the rate of change of variance falls below the critical value, suggesting that further training does not result in significant improvements. Furthermore, the variance values also fall within the acceptable range set by our first criterion, confirming that a training size of approximately 220 data points can be considered sufficient.

3.2 Influence of size of initial training set

In active learning, the size of the initial training set is a critical factor influencing the learning process. This initial set is the primary source from which the model acquires knowledge of the given task. Many active learning strategies necessitate a substantial quantity of initially labeled data to achieve a certain level of quality. This enables the model to 'warm up' and subsequently function optimally. Prior to this warm-up phase, random selection often surpasses most active learning strategies in performance. This situation is typically identified as a high-budget regime. The term 'cold start' as investigated by Zhu et al. [43] refers to the limited capacity of a model to capture uncertainty, a problem that is particularly pronounced in low-budget regimes. In these scenarios, budgetary constraints lead to a smaller initially labeled dataset, exacerbating the model's difficulty in handling uncertainty [44, 45].

Based on these considerations, as part of this study, we aim to explore how the size of the initial training set affects model performance, given a fixed budget of 200 training data instances. For instance, we might begin with 20 initially labeled instances followed by 180 active learning iterations. Alternatively, we could start with a larger seed set of 40 instances, leading to 160 subsequent active learning iterations. Variations on this theme will be tested to explore the range of possible outcomes. The principal motivation behind this method is to delineate the trade-offs between the size of the initial training set and the number of active learning iterations under budget constraints. This approach promises to shed light on how best to allocate resources for optimal learning outcomes. For example, we aim to ascertain whether an increased initial seed set improves model performance sufficiently to counterbalance a reduced number of active learning iterations. Conversely, a more advantageous pathway might emerge in having a smaller initial seed set coupled with an increased number of opportunities for the model to actively learn from new instances. The insights gleaned from this investigation will significantly enhance our understanding of the effects of initial model configuration on long-term learning efficiency within given budgetary parameters.

Building on this, our experiment was designed to test initial training sizes of 20, 40, 80, 100, and 120 instances within a total data budget of 200. For each of these scenarios, the training was done using an anisotropic material as described in Sect. 3.1.2. After training, the average MCC scores using five distinct test sets as described in Sect. 2.4 were recorded and shown in Fig. 12.

According to Fig. 12, the MCC scores improve with increasing initial training size up to a point. For the smallest initial training size of 20, the minimum MCC was just 0.15 and the maximum 0.89. When the initial training size was increased to 40, the minimum MCC rose markedly to 0.41, while the maximum dipped slightly to 0.83. A substantial leap in performance is observed at an initial training size of 80, which yields a minimum MCC of 0.75 and a maximum of 0.95. The crucial consideration, however, is the trade-off between the initial training size and the number of subsequent active learning iterations. The results show that the optimum is reached with an initial training size of 100 instances, for which the MCC ranges from 0.79 to 0.98, i.e., both the minimum and the maximum are higher than for the smaller initial training sizes. Increasing the initial training size further to 120 did not improve the performance (MCC between 0.78 and 0.98) and slightly reduced the minimum MCC compared with an initial size of 100. It can therefore be concluded that, within the budget of 200 training instances, starting with 100 initially labeled instances provides the most effective balance between the warm-up phase and the number of active learning iterations. It should be noted that the performance of SVC is heavily influenced by the initial training data and that any later improvement through active learning builds on this initial warm-up. For very small initial training sets, the initial performance is poor, and in a high-dimensional feature space an active learning process that starts from such a model cannot improve the performance as much as a larger initial training set does. This does not imply that active learning is ineffective. On the contrary, it highlights the importance of completing an adequate warm-up phase before expecting significant improvement through active learning.
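The trade-off discussed above can be made explicit with a short Python sketch that tabulates the MCC ranges reported in the text and the number of points left for active learning. The assumption that each active learning iteration adds exactly one data point, and the ranking rule (highest minimum MCC, ties broken by the maximum), are illustrative choices and not part of the original analysis.

```python
BUDGET = 200  # total number of labeled training points, as stated in the text

# Reported (minimum MCC, maximum MCC) per initial training size, copied from the text.
reported = {
    20: (0.15, 0.89),
    40: (0.41, 0.83),
    80: (0.75, 0.95),
    100: (0.79, 0.98),
    120: (0.78, 0.98),
}

def remaining_queries(initial_size, budget=BUDGET):
    """Points left for active learning, assuming one new point per iteration."""
    return budget - initial_size

def best_initial_size(results):
    """Assumed ranking rule: highest minimum MCC, ties broken by maximum MCC."""
    return max(results, key=lambda n: results[n])

best = best_initial_size(reported)
print(best, remaining_queries(best))
```

Under this ranking rule the sketch selects an initial size of 100, which leaves 100 points for the active learning phase.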

3.3 Comparison of active learning with static learning using random and uniform training data

This section examines and evaluates three data generation strategies: Active Learning, Uniform Data, and Random Data. The motivation for this comparison is to understand their effectiveness under various resource constraints and to shed light on their adaptability and performance across different budget sizes, here limited to the range between 140 and 200. As shown in Sect. 3.1.2, beyond this range the active learning strategy cannot further improve the model quality, because the rate of change in variance has dropped and continuing the learning is no longer efficient. In the static learning models, the training set comprises the budget-defined number of data points (here stresses). The stresses are distributed either randomly or uniformly on the surface of a unit sphere, with a ratio of 6D to 3D unit stresses equal to 2, as suggested by Shoghi and Hartmaier [17]. For the active learning model, the size of the initial training set is fixed at 100, and the remaining budget is used for the iterative generation of new data points based on the model learned from the initial set. For each case, training was performed using an anisotropic material as described in Sect. 3.1.2. The MCC scores were recorded after training across five distinct test sets, as described in Sect. 2.4. The result is shown in Fig. 13.
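The static random strategy described above can be sketched in a few lines of Python. The sketch assumes that "unit stresses" have unit Euclidean norm, that 3D stresses contain only the three normal components, and that random directions are drawn by normalizing Gaussian vectors; the function names are hypothetical, and the construction of the uniform data set is not reproduced here.

```python
import math
import random

def random_unit_stress(rng, full_6d=True):
    """Random stress of unit Euclidean norm, ordered (s11, s22, s33, s23, s13, s12).

    For full_6d=False only the three normal components are non-zero.
    """
    n_free = 6 if full_6d else 3
    v = [rng.gauss(0.0, 1.0) for _ in range(n_free)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] + [0.0] * (6 - n_free)

def static_random_set(budget, rng, ratio_6d_to_3d=2):
    """Spend the whole budget at once with #6D : #3D = ratio_6d_to_3d : 1."""
    n_3d = round(budget / (ratio_6d_to_3d + 1))
    n_6d = budget - n_3d
    return ([random_unit_stress(rng, True) for _ in range(n_6d)]
            + [random_unit_stress(rng, False) for _ in range(n_3d)])

points = static_random_set(150, random.Random(0))
```

For a budget of 150, this yields 100 full 6D stresses and 50 stresses restricted to the normal components.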

Figure 13 compares the average MCC (see Sect. 2.4) at various budget levels for the three learning strategies: Static Learning Using Uniform Data, Active Learning, and Static Learning Using Random Data. The x-axis represents the budget and the y-axis the average MCC. Each strategy is shown as a separate line plot: blue for Uniform Data, red for Active Learning, and green for Random Data. Each point on a line represents a specific budget value and its corresponding average MCC, so the strategies can be compared directly at a given budget level by comparing their y-values. This representation allows the effectiveness of each learning strategy to be assessed under different budgets and provides insight into their performance and efficiency. It can help in selecting the most suitable learning strategy for the available resources, which is particularly beneficial when data collection is costly or time-consuming, and it clarifies the trade-offs between time, cost, and achieving a specific score. For instance, in the active learning approach a fixed part of the budget (100 data points, i.e., half of the maximum budget of 200) is used for the initial warm-up phase, with the remainder allocated to iterative learning. This upfront cost can be offset by the strategic selection of informative data points during the iterative phase, potentially resulting in a quicker achievement of high scores and greater overall efficiency. In contrast, the static learning strategies, with both uniform and random data, spend the entire budget at once. When the budget is not restricted, the static learning approach using uniformly distributed data can perform comparably to active learning because of its even coverage of the data space. The same cannot be said for static learning with randomly selected data: since the data selection is neither strategic nor evenly distributed, this approach may require more resources or time to reach performance levels similar to those of the other strategies.

3.4 Analysis of the sampled points

In this section, the objective is to thoroughly examine the specific regions from which most of the information is gathered by the active learner and to determine whether certain areas are favored for data sampling. The 6D sampling space is subdivided such that the first three dimensions are represented by normal components and the remaining three are represented by shear components. To investigate whether active learning methods display a preference for specific regions or maintain a consistent proportion within these two types of components, a ratio of shear components over normal components is introduced. If $\boldsymbol{\sigma}=\left(\sigma_{11},\sigma_{22},\sigma_{33},\sigma_{23},\sigma_{13},\sigma_{12}\right)$ is a data point in our search space, this ratio can be defined as:

$$\frac{\text{Shear}}{\text{Normal}}=\frac{\sqrt{6\left(\sigma_{23}^{2}+\sigma_{13}^{2}+\sigma_{12}^{2}\right)}}{\sqrt{\left(\sigma_{11}-\sigma_{22}\right)^{2}+\left(\sigma_{22}-\sigma_{33}\right)^{2}+\left(\sigma_{33}-\sigma_{11}\right)^{2}}}$$

(16)
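For illustration, the ratio can be evaluated for a single stress state with the short Python function below. It is a sketch that assumes the denominator is meant to contain the three distinct pairwise differences of the normal components, as in the von Mises equivalent stress; the function name and the handling of a vanishing denominator are choices made here, not taken from the study.

```python
import math

def shear_to_normal_ratio(sigma):
    """Shear/normal ratio for sigma = (s11, s22, s33, s23, s13, s12)."""
    s11, s22, s33, s23, s13, s12 = sigma
    shear = math.sqrt(6.0 * (s23**2 + s13**2 + s12**2))
    normal = math.sqrt((s11 - s22)**2 + (s22 - s33)**2 + (s33 - s11)**2)
    # Assumed convention: a purely hydrostatic normal part makes the
    # denominator vanish, in which case infinity is returned.
    return shear / normal if normal > 0.0 else math.inf
```

A uniaxial stress such as `(1, 0, 0, 0, 0, 0)` gives a ratio of 0, whereas adding a shear component of equal magnitude, `(1, 0, 0, 0, 0, 1)`, gives a ratio of sqrt(3).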

In Fig. 14, the distributions of the ratio of shear components to normal components are illustrated for iterations 0, 25, 50, and 100 of the active learning process. For each iteration, a histogram depicts the frequency of occurrence of different Shear/Normal ratios within the sampled points. The transparency of the histograms allows the distributions at the various iterations to be compared, with darker areas indicating overlaps between the distributions. Additionally, the trend line for each iteration aids in understanding the shape and spread of the distributions.

Based on Fig. 14, it can be observed that the learner consistently selects regions with a shear-to-normal ratio of around 0.6, irrespective of the iteration. This persistent preference could imply that such regions are particularly valuable during the learning process, especially in the early stages. Shoghi and Hartmaier [17] noted that using only 6D stresses may not provide a comprehensive representation for the training, because the subspace of normal stresses is potentially underrepresented. The same calculation was carried out for the anisotropic case and, as can be seen in Fig. 15, the average ratio is also 0.6. This further emphasizes the necessity of targeted sampling in regions with a lower ratio of shear to normal stress.

4 Conclusion

In materials engineering, constitutive equations play a pivotal role in predicting how materials respond when subjected to a load. Central to these models are yield functions, which provide the fundamental framework for describing plasticity. Machine Learning (ML) approaches offer a novel way to construct these yield functions directly from data and to capture the complex behavior of materials. However, the efficacy of these ML models depends significantly on the quality of the training data: for ML to accurately reflect and predict complex material behavior, it is important to provide high-quality, representative training data. Building on this foundation, this paper addresses the generation of suitable training data using the Query-By-Committee (QBC) algorithm, adapted here to Support Vector Classification (SVC). The essence of the QBC approach is to select new training data from regions of the feature space where the predictions of multiple models diverge significantly, which leads to a more comprehensive and efficient training set.
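The query step of QBC summarized above can be sketched in a model-agnostic way. In the following Python snippet the committee members are plain callables that return +1 or -1 and merely stand in for trained SVC models; choosing the next point from a finite candidate pool and measuring disagreement by the population variance of the votes are assumptions of this sketch, not a description of the implementation used in the study.

```python
from statistics import pvariance

def committee_variance(committee, x):
    """Population variance of the committee members' predictions at point x."""
    return pvariance([member(x) for member in committee])

def query_by_committee(committee, candidates):
    """Return the candidate on which the committee disagrees most."""
    return max(candidates, key=lambda x: committee_variance(committee, x))

# Toy committee: three 1D threshold classifiers standing in for trained SVCs.
committee = [
    lambda x: 1 if x > 0.4 else -1,
    lambda x: 1 if x > 0.5 else -1,
    lambda x: 1 if x > 0.6 else -1,
]
candidates = [0.1, 0.45, 0.9]
next_point = query_by_committee(committee, candidates)
```

In this toy example all members agree at 0.1 and 0.9, so the candidate 0.45, which lies between the members' decision boundaries, is selected for labeling.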

Starting with different initial training sets, the effectiveness of active learning combined with SVC was examined over a series of iterations for an isotropic and an anisotropic reference material, with key metrics such as the variance and the Matthews Correlation Coefficient (MCC) being evaluated for a test data set. The observed trends, a consistent decrease in variance and a general increase in MCC, affirm the effectiveness of the active learning approach. It was concluded that reaching a user-defined value of the variance, determined by a desired percentage reduction from the maximum variance, together with a reduction in the rate of change of the variance, can serve as a dynamic stopping criterion that reliably indicates the convergence of the training process to an accurate model. When the rate of change is not significant in comparison with the previous value, the learning progression of the model has plateaued or at least slowed down significantly. At this point, further training becomes inefficient because the performance improvements are marginal, which was observed after 95 iterations for the isotropic material and after 120 iterations for the anisotropic material.
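A minimal Python sketch of such a stopping check is given below. The two thresholds (`reduction` for the desired fractional drop from the maximum variance and `rate_tol` for the relative change between consecutive iterations) and their default values are hypothetical parameters chosen for illustration; the study leaves the target value to the user.

```python
def should_stop(variances, reduction=0.9, rate_tol=0.05):
    """Dynamic stopping check on the history of committee variances.

    Stops when (a) the latest variance is at most (1 - reduction) times the
    maximum variance seen so far and (b) the relative change with respect
    to the previous value is below rate_tol.
    """
    if len(variances) < 2:
        return False
    current, previous = variances[-1], variances[-2]
    reached_target = current <= (1.0 - reduction) * max(variances)
    if previous == 0.0:
        # No relative change can be computed; rely on the target alone.
        return reached_target
    relative_change = abs(current - previous) / abs(previous)
    return reached_target and relative_change < rate_tol
```

The check would be called once per active learning iteration with the variance history collected so far, and the loop ends as soon as it returns `True`.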

Our study highlights the importance of the size of the initial training set and its crucial role in the subsequent learning process. As in any active learning strategy, an adequately sized initial training set is needed to reach a certain level of performance, a stage we referred to as the warm-up phase. This phase allows the model to establish a solid foundation before it can function at its best. Our investigation was carried out in a fixed-budget scenario with a predefined maximum number of training data points in order to find a balance between an initial training set of sufficient quality and the possible gain from the learning process. Our findings underscore that, while increasing the initial training size can indeed improve model performance, there is an optimal balance between the initial training size and the number of active learning iterations. In this work, starting with 100 initially labeled instances within the established budget of 200 training data points yielded the best performance of the final model. This demonstrates that a substantial initial seed set effectively primes the model for the subsequent active learning phase. On the other hand, increasing the initial training size beyond this point did not result in significant further improvement. These findings provide guidance for efficient resource allocation and for choosing the initial model configuration in active learning under budgetary constraints.

Our comparative evaluation of three learning strategies (Active Learning, Static Learning Using Uniform Data, and Static Learning Using Random Data) sheds light on their effectiveness and adaptability under varying resource constraints. Active learning reached higher scores faster, especially when resources were limited. When resources are plentiful, static learning using uniform data can compete with active learning. Static learning using random data, however, is likely to underperform unless additional resources are available, which reaffirms the importance of strategic data selection even within static learning strategies. This result implies that active learning and static learning with uniform data each have their strengths, and that the choice between them may depend on the specific budgetary constraints and performance goals.

The thorough examination of the specific regions within the six-dimensional sampling space from which the active learner predominantly gathers information revealed a consistent preference for areas with a shear to normal stress ratio of around 0.6, regardless of the iteration stage. This finding, which emanates from an initial random distribution, implies that these regions might hold a particular value during the learning process. This becomes more apparent during the early learning stages, when the model is still in the process of establishing its foundational knowledge. This persistent pattern may also emphasize the necessity for a more targeted sampling strategy, particularly focusing on areas with the given ratio of shear to normal stress components.

Data and code availability

The code and data used in this study are available upon request via email to the corresponding author.

References

von Mises R (1928) Mechanik der plastischen Formänderung von Kristallen. ZAMM-Journal Appl Math Mech für Angew Math und Mech 8(3):161–185
Drucker DC, Prager W (1952) Soil mechanics and plastic analysis or limit design. Q Appl Math 10(2):157–165
Ibanez R, Abisset-Chavanne E, Aguado JV, Gonzalez D, Cueto E, Chinesta F (2018) A manifold learning approach to data-driven computational elasticity and inelasticity. Arch Comput Methods Eng 25(1):47–57
Chinesta F, Ladeveze P, Ibanez R, Aguado JV, Abisset-Chavanne E, Cueto E (2017) Data-driven computational plasticity. Procedia Eng 207:209–214
Fuhg JN, Hamel CM, Johnson K, Jones R, Bouklas N (2023) Modular machine learning-based elastoplasticity: generalization in the context of limited data. Comput Methods Appl Mech Eng 407:115930
Jung S, Ghaboussi J (2006) Characterizing rate-dependent material behaviors in self-learning simulation. Comput Methods Appl Mech Eng 196(1–3):608–619
Nascimento A, Roongta S, Diehl M, Beyerlein IJ (2023) A machine learning model to predict yield surfaces from crystal plasticity simulations. Int J Plast 161:103507
Soare SC, Diehl M (2023) Calibration and fast evaluation algorithms for homogeneous orthotropic polynomial yield functions. Comput Mech, pp 1–21
Zhang A, Mohr D (2020) Using neural networks to represent von Mises plasticity with isotropic hardening. Int J Plast 132:102732
Weber P, Wagner W, Freitag S (2023) Physically enhanced training for modeling rate-independent plasticity with feedforward neural networks. Comput Mech, pp 1–31
Grytten F, Holmedal B, Hopperstad OS, Børvik T (2008) Evaluation of identification methods for YLD2004-18p. Int J Plast 24(12):2248–2277
Ibragimova O, Brahme A, Muhammad W, Levesque J, Inal K (2021) A new ANN based crystal plasticity model for FCC materials and its application to non-monotonic strain paths. Int J Plast 144:103059
Sobol IM (1967) On the distribution of points in a cube and the approximate evaluation of integrals. Zhurnal Vychislitel’noi Mat i Mat Fiz 7(4):784–802
Sobol IM (1976) Uniformly distributed sequences with an additional uniform property. USSR Comput Math Math Phys 16(5):236–242
Yang H, Qiu H, Xiang Q, Tang S, Guo X (2020) Exploring elastoplastic constitutive law of microstructured materials through artificial neural network—A mechanistic-based data-driven approach. J Appl Mech 87(9):91005
Vlassis NN, Sun W (2021) Sobolev training of thermodynamic-informed neural networks for interpretable elasto-plasticity models with level set hardening. Comput Methods Appl Mech Eng 377:113695
Shoghi R, Hartmaier A (2022) Optimal data-generation strategy for machine learning yield functions in anisotropic plasticity. Virtual Mater Des 879614154
Angluin D (1988) Queries and concept learning. Mach Learn 2:319–342
Bessa MA et al (2017) A framework for data-driven analysis of materials under uncertainty: countering the curse of dimensionality. Comput Methods Appl Mech Eng 320:633–667
Balachandran PV, Xue D, Theiler J, Hogden J, Lookman T (2016) Adaptive strategies for materials design using uncertainties. Sci Rep 6(1):1–9
Kalidindi SR (2019) A Bayesian framework for materials knowledge systems. MRS Commun 9(2):518–531
Seung HS, Opper M, Sompolinsky H (1992) Query by committee. In: Proceedings of the fifth annual workshop on computational learning theory, pp 287–294
Morand L, Link N, Iraki T, Dornheim J, Helm D (2022) Efficient exploration of microstructure-property spaces via active learning. Front Mater 8:628
Wessel A, Morand L, Butz A, Helm D, Volk W (2022) Machine learning-based sampling of virtual experiments within the full stress state to identify parameters of anisotropic yield models. arXiv:2211.00090
Wessel A, Morand L, Butz A, Helm D, Volk W (2021) A new machine learning based method for sampling virtual experiments and its effect on the parameter identification for anisotropic yield models. In: IOP conference series: materials science and engineering, vol 1157, no 1, p 12026
Hartmaier A (2020) Data-oriented constitutive modeling of plasticity in metals. Materials (Basel) 13(7):1600
Smola AJ, Schölkopf B (1998) Learning with kernels, vol 4. Citeseer
Thurnhofer-Hemsi K, López-Rubio E, Molina-Cabello MA, Najarian K (2020) Radial basis function kernel optimization for support vector machine classifiers. arXiv:2007.08233
Pedregosa F et al (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–2830
Settles B (2009) Active learning literature survey
Mitchell TM (1977) Version spaces: a candidate elimination approach to rule learning. In: Proceedings of the 5th international joint conference on artificial intelligence, vol 1, pp 305–310
Cohn D (1994) Neural network exploration using optimal experiment design. Massachusetts Inst Of Tech Cambridge Artificial Intelligence Lab
Cohn D, Ghahramani Z, Jordan M (1994) Active learning with statistical models. Adv Neural Inf Process Syst 7
Krogh A, Vedelsby J (1994) Neural network ensembles, cross validation, and active learning. Adv Neural Inf Process Syst 7
Storn R, Price K (1997) Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359
RayChaudhuri T, Hamey LGC (1995) Minimisation of data collection by active learning. In: Proceedings of ICNN’95—international conference on neural networks, vol 3, pp 1338–1341
Hartmaier A, Menon S, Shoghi R (2022) Python laboratory for finite element analysis (PyLabFEA). Zenodo
Harris CR et al (2020) Array programming with NumPy. Nature 585(7825):357–362
Margatina K, Aletras N (2023) On the limitations of simulating active learning. arXiv:2305.13342
Bloodgood M, Vijay-Shanker K (2014) A method for stopping active learning based on stabilizing predictions and the need for user-adjustable stopping. arXiv:1409.5165
Prechelt L (2002) Early stopping-but when? In: Neural networks: tricks of the trade. Springer, pp 55–69
Chicco D, Jurman G (2020) The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21(1):6
Zhu Y et al (2019) Addressing the item cold-start problem by attribute-driven active learning. IEEE Trans Knowl Data Eng 32(4):631–644
Attenberg J, Provost F (2010) Why label when you can search? Alternatives to active learning for applying human resources to build classification models under extreme class imbalance. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 423–432
Hacohen G, Dekel A, Weinshall D (2022) Active learning on a budget: opposite strategies suit high and low budgets. arXiv:2202.02794

Acknowledgements

AH gratefully acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project-ID 190 389 738—TRR 103. LM and DH gratefully acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project-ID 415 804 944.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Interdisciplinary Center for Advanced Materials Simulation (ICAMS), Bochum, Germany
Ronak Shoghi & Alexander Hartmaier
Fraunhofer Institute for Mechanics of Materials IWM, Freiburg, Germany
Lukas Morand & Dirk Helm

Authors

Ronak Shoghi
Lukas Morand
Dirk Helm
Alexander Hartmaier

Corresponding author

Correspondence to Ronak Shoghi.

Ethics declarations

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Shoghi, R., Morand, L., Helm, D. et al. Optimizing machine learning yield functions using query-by-committee for support vector classification with a dynamic stopping criterion. Comput Mech 74, 447–466 (2024). https://doi.org/10.1007/s00466-023-02440-6

Received: 09 September 2023
Accepted: 30 December 2023
Published: 12 February 2024
Issue Date: August 2024
DOI: https://doi.org/10.1007/s00466-023-02440-6

1 Introduction