Abstract
Optimization along the chain processing–structure–properties–performance is one of the core objectives in data-driven materials science. In this sense, processes are supposed to manufacture workpieces with targeted material microstructures. These microstructures are defined by the material properties of interest, and identifying them is a question of materials design. In the present paper, we address this issue and introduce a generic multi-task learning-based optimization approach. The approach enables the identification of sets of highly diverse microstructures for given desired properties and corresponding tolerances. Basically, the approach consists of an optimization algorithm that interacts with a machine learning model combining multi-task learning with siamese neural networks. The resulting model (1) relates microstructures and properties, (2) estimates the likelihood of a microstructure being producible, and (3) performs a distance-preserving microstructure feature extraction in order to generate a lower dimensional latent feature space that enables efficient optimization. The proposed approach is applied to a crystallographic texture optimization problem for rolled steel sheets given desired properties.
Introduction
Motivation
The demand for ever more specific and individually designed products with certain performance requirements has become a driving force in the world of manufacturing. For this reason, optimization along the causal chain processing–structure–properties–performance (Olson, 1997) became a fast-growing research topic in the field of integrated computational materials engineering (ICME) (Panchal et al., 2013). Nowadays, such optimization problems can be solved efficiently with the help of machine learning techniques (Ramprasad et al., 2017). Against this background, in a previous work, we investigated the use of reinforcement learning for finding optimal processing routes in a simulated metal forming process aiming to produce microstructures with targeted crystallographic textures (Dornheim et al., 2021). To bridge the remaining gap between microstructures and desired properties, in this work we focus on solving materials design problems. These are to identify appropriate material microstructures or microstructural features (e.g. the crystallographic texture) for given desired properties. It is thereby of particular importance to identify sets of near-optimal and preferably diverse microstructures in order to guarantee a robust design (McDowell, 2007).
Paper structure
In the following, we summarize the related work and point out the contribution of this paper to current research. In “Methods” section, we first describe the siamese multi-task learning and optimization approach. Then, we introduce the fundamentals in materials modeling that are needed for the purpose of this work. After that, in “Results” section, the results of applying the approach to a texture optimization problem for steel sheets are shown (in particular, we fit the material model to DC04 steel). In “Discussion” section, the presented results are discussed. Finally, in “Summary and Outlook” section, we summarize our findings and give an outlook on further research.
Related work
A generic approach to solve materials design problems is the microstructure sensitive design (MSD) approach introduced in Adams et al. (2001). Following Fullwood et al. (2010), MSD can be described by seven steps. First, the properties of interest as well as candidate materials have to be defined. After that, a suitable microstructure definition is applied for these materials yielding a microstructure design space. On this basis, relevant homogenization relations are identified and applied over the whole design space. The resulting property closure can be used to select desired properties, which are then mapped back to the microstructure design space in order to identify optimal microstructures. The last step of MSD aims to determine processes and processing routes needed to produce the identified microstructure.
The works by Adams et al. (2001) and Kalidindi et al. (2004) instantiate the MSD approach for texture optimization. The first describes how optimal crystallographic textures can be identified in order to improve the deformation behavior of a compliant beam. In the latter, a similar approach is shown to optimize the crystallographic texture for the design of an orthotropic plate. The core of both approaches lies in the usage of a lower dimensional spectral representation of the orientation distribution, cf. Bunge (2013). For more complex microstructure representations, like two-point correlations, feature extraction methods can be applied to reduce the dimensionality. Methods that are used in the context of materials design are, for example, principal component analysis (PCA) (Paulson et al., 2017; Gupta et al., 2015) and multidimensional scaling (Jung et al., 2019). A general review of dimensionality reduction techniques can be found in Van Der Maaten et al. (2009).
Besides the MSD approach, machine learning-based approaches for crystallographic texture optimization also exist. Liu et al. (2015) and Paul et al. (2019) describe iterative sampling approaches that interact with crystal plasticity simulations aiming to identify crystallographic textures for given desired properties. To this end, an initial set of texture–property tuples (crystallographic textures and corresponding macroscopic properties) is generated. Via supervised learning, significant features of the parameterized orientation distribution (and in Liu et al. (2015) also significant regions) are identified that yield optimal or near-optimal solutions. Based on the identified features and regions, new texture–property data points are sampled in order to get closer to the optima.
Another approach for identifying optimal textures is described in Kuroda and Ikawa (2004). Therein, a real-coded genetic algorithm (Goldberg, 1991) interacts with a crystal plasticity model in order to find optimal combinations of typical rolling texture components of face-centered cubic metals (Cu, Brass, S, Cube and Goss) for given desired properties. The algorithm starts with an initial set of textures consisting of different fractions of these components. The set of textures evolves iteratively via operators such as mutation, crossover and selection (Herrera et al., 1998).
Recent works (i.e., Jung et al. (2020) and Kamijyo et al. (2022)) use Bayesian optimization for microstructure design. In Kamijyo et al. (2022), a deep neural network is used for the estimation of mechanical properties. On this basis, Bayesian optimization is used to determine optimal volume fractions of texture components of aluminum (cf. Kuroda and Ikawa (2004)) for a desired formability. For designing complex microstructures, Jung et al. (2020) propose using the latent space of a convolutional autoencoder as a lower dimensional design space. Within this design space, Bayesian optimization is adopted to search for optimal dual-phase microstructures for given desired properties (i.e., tensile strength).
Predicting dual-phase microstructure properties using convolutional neural networks (CNN) is also pursued in Mann and Kalidindi (2022), in that case to explore the property space defined by the material stiffness. The CNN architecture was developed to approximate the highly nonlinear microstructure–property linkage, using two-point spatial correlation functions of the microstructure as input.
A further convolutional approach is described in Tan et al. (2020), in which a deep convolutional generative adversarial network (DCGAN) and a CNN are proposed for the design of materials. Therein, the CNN links the microstructures to their properties and acts as a surrogate model, whereas the DCGAN generates design candidates for a desired compliance tensor.
In summary, the solution of microstructure design problems requires a linkage from properties to microstructures. Such a linkage is often achieved by genetic or other optimization algorithms that interact with numerical simulations. However, as these algorithms generally need many function evaluations, it is not reasonable to apply them to complex numerical simulations directly. Instead, the performance can be increased by using numerically simpler surrogate models, see for example Simpson et al. (2001). Typically, these are supervised learning models that learn the input–output relations of the numerical simulation under consideration.
To run optimization algorithms in combination with supervised learning models, it is necessary to limit the region in which they operate to the region covered by the training data. One way to achieve this is by training unsupervised learning models on the input data, as is done in Jung et al. (2019), for example, using support vector machines (SVM). From a machine learning perspective, such an approach can be seen as anomaly detection. Anomaly detection aims to identify data that is characteristically different from the sample data set used for training. An extensive overview of anomaly detection methods is given in Chandola et al. (2009). Moreover, Ruff et al. (2021) and Chalapathy and Chawla (2019) give an overview of recent deep learning-based approaches for anomaly detection, from which we want to point out neural network-based autoencoders (Hinton & Salakhutdinov, 2006), which, unlike SVMs, fit especially well into multi-task learning (MTL) (Caruana, 1997) schemes.
Autoencoder approaches assume that features of a data set can be mapped into a lower dimensional latent feature space, in which the known data points differ substantially from unknown data points. By mapping back into the original space, anomalies can be identified by evaluating the reconstruction error, see for example Sakurada and Yairi (2014). Therein, it is also shown that autoencoder networks are able to detect subtle anomalies that cannot be detected by linear methods like PCA. Furthermore, autoencoder networks require less complex computations than nonlinear kernel-based PCA.
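The reconstruction-error criterion described above can be sketched as follows. For brevity, the neural autoencoder is replaced by its linear analogue (truncated PCA via SVD); the function names, the toy data, and the max-error threshold are illustrative choices, not part of the cited works.

```python
import numpy as np

def fit_linear_autoencoder(X, n_latent):
    """Fit a linear 'autoencoder' (truncated PCA) to data X.

    Returns the data mean and the top principal directions; a neural
    autoencoder would replace these with learned nonlinear maps."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:n_latent]          # tied encoder/decoder weights
    return mu, W

def reconstruction_error(x, mu, W):
    z = W @ (x - mu)           # encode into the latent space
    x_rec = W.T @ z + mu       # decode back to the input space
    return float(np.mean((x - x_rec) ** 2))

rng = np.random.default_rng(0)
# "normal" data lies close to a 2-D plane embedded in 10-D space
Z = rng.normal(size=(500, 2))
A = rng.normal(size=(2, 10))
X = Z @ A + 0.01 * rng.normal(size=(500, 10))

mu, W = fit_linear_autoencoder(X, n_latent=2)
# simple threshold: worst reconstruction error on the known data
threshold = max(reconstruction_error(x, mu, W) for x in X)

anomaly = rng.normal(size=10)  # a point off the learned manifold
is_anomalous = reconstruction_error(anomaly, mu, W) > threshold
```

Points near the training manifold reconstruct well, while points off it produce large errors and are flagged as anomalies.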
Recent developments in anomaly detection include deep learning approaches like the deep support vector data description method (Deep SVDD) (Ruff et al., 2018). Deep SVDD is an unsupervised anomaly detection method, which is inspired by kernel-based one-class classification and minimum volume estimation, and can be traced back to the traditional methods One-Class SVM (Schölkopf et al., 2001) and SVDD (Tax and Duin, 2004). In contrast to autoencoder approaches, Deep SVDD is based on an anomaly detection objective rather than relying on the reconstruction error: a neural network is trained to minimize the volume of a hypersphere that encloses the network representations of the data. By minimizing this objective, Deep SVDD finds a preferably small data-enclosing hypersphere and learns to extract the common factors of variation of the data distribution. The aim is that representations of normal data lie inside the hypersphere, while anomalous data points lie outside. Thereby, anomalies can be detected based on their distance to the centroid of the hypersphere.
The Improved AutoEncoder for Anomaly Detection (IAEAD) (Cheng et al., 2021) extends Deep SVDD by combining it with autoencoders. The autoencoder is used for the embedding of the features and to preserve the local structure of the data generating distribution, whereas Deep SVDD detects anomalies in the feature space. This is achieved by adding the Deep SVDD loss as a regularization term to the original autoencoder optimization objective (i.e. the minimization of the reconstruction error). However, instead of using the reconstruction error, IAEAD uses the distance to the centroid in feature space for anomaly detection, like the original Deep SVDD approach.
Another recently developed approach trains an autoencoder on image data by minimizing the reconstruction error (defined as the loss function) (Kwon et al., 2020). The trained model is used for anomaly detection by evaluating the gradients of the reconstruction error with respect to the neural network weights. Such gradients are generated through backpropagation to train neural networks by minimizing designed loss functions (Rumelhart et al., 1986). When new input data is fed into the neural network, the gradients originating from normal data cause only slight changes with respect to the neural network weights, whereas the gradients from anomalous data cause more drastic changes. Thus, anomalies can be detected by measuring, in terms of the gradients, how much of the input data does not correspond to the information learned by the network.
Summary of related work and contribution
Optimizing the crystallographic texture of sheet metal has been studied in various publications. So far, classic optimization approaches exist that operate on well-established crystallographic texture representations from the field of materials science (i.e. using texture components or a spectral decomposition). In addition, machine learning-based approaches have been developed in order to efficiently guide optimization algorithms to promising regions in texture space.
Regarding microstructure optimization in general, the usage of machine learning models has become popular in recent years. Often, supervised learning models are used to learn and replace time-consuming numerical simulations for property prediction. Furthermore, unsupervised learning models (often PCA) are used to reduce the dimensionality of complex microstructure representations. In the field of machine learning, however, more sophisticated approaches exist, such as nonlinear methods for feature extraction and MTL approaches that combine different learning tasks into one model, with the advantage of having a universal latent feature space.
For the processing of optimal microstructures (which is the next step in the processing–structure–properties chain), it is advantageous to identify not only one optimal microstructure for given desired properties, but a set of nearly optimal microstructures that is as diverse as possible. Such approaches are, however, lacking in the literature. A further important requirement for optimal processing, rarely addressed in microstructure optimization approaches, is to consider the producibility of the identified microstructures.
Therefore, in the present paper, we introduce a generic MTL-based optimization approach to efficiently identify sets of microstructures that are highly diverse and guaranteed to be producible by a dedicated manufacturing process. The approach is based on an optimization algorithm interacting with a machine learning model that combines MTL with siamese neural networks (Bromley et al., 1993). In contrast to Liu et al. (2015), Paul et al. (2019) and also Kuroda and Ikawa (2004), in our approach, a surrogate model is set up to replace the numerical simulation that maps microstructures to properties. The microstructure–property mapping can thus be executed efficiently by means of the surrogate model within the optimization procedure.
In order to increase the efficiency of the optimization approach, the microstructure representation is transformed into a lower dimensional latent feature space by a nonlinear data-driven encoder. The encoder in turn provides the input signal for three attached learning tasks of the MTL approach. The first learning task maps the features to properties (surrogate model). To address the issue of producibility, we include a second learning task, which estimates the validity of a microstructure in the sense of being producible (being part of the region enclosed by the underlying data set). The third learning task is the decoder for the microstructure representation.
As learning takes place simultaneously for the encoder and the attached tasks, it is ensured that the lower dimensional feature space is optimal for all tasks. In addition, we enforce the latent feature space to preserve microstructure distances by employing a siamese neural network and multidimensional scaling. On this basis, we force the optimizer to find a diverse set of optimal microstructures in the latent feature space.
Methods
Materials design via siamese multi-task learning (SMTL) and optimization
General concept
First of all, we present the general concept of our MTL-based optimization approach. The approach can be applied to general materials design problems and starts by defining the desired properties and corresponding tolerances. This in turn defines a target region, for which the approach is supposed to identify a diverse set of microstructures. The approach is schematically depicted in Fig. 1 and summarized in Table 1. It basically consists of three components: the optimizer, the microstructure–property mapping (mpm) and the validity prediction (vp). The optimizer generates candidate microstructures that minimize the combined costs, which result from evaluations based on the mpm and vp components.
The mpm component assigns properties to a candidate microstructure. The deviation of the assigned properties from the target region determines the cost. In general, the mpm component can be realized by a numerical simulation. However, since numerical simulations are computationally expensive, a surrogate model is used instead. The surrogate model is realized by a regression model that learns the relations from a priori generated microstructure–property data.
The vp component is realized by an anomaly detection method, which determines the validity of a candidate microstructure by comparing it to the set of valid microstructures. The concept of the anomaly detection is illustrated in Fig. 2. The vp component returns a value that can be seen as an estimate of a candidate microstructure being an element of the microstructure set under consideration. This is, for example, the set that can be produced by a dedicated process (e.g. rolling). The value returned by the vp component defines the validity cost and is supposed to drive the optimizer solution toward a valid microstructure region. Alternatively, such a microstructure region could be identified by a further optimization loop that interacts with a numerical simulation of the dedicated process, which, however, suffers from high computational costs.
The two components mpm and vp can be realized by training two separate machine learning models. However, when the training procedures are isolated from each other, the models are not able to mutually access information already learned by the other model. Therefore, we combine the two components as tasks into one MTL model (Caruana, 1997). Both tasks have a common backbone (the feature extraction part of a network) and different heads (the feature processing parts of a network) operating on the backbone output. The backbone output vectors form the so-called latent feature space.
The proposed MTL approach furthermore uses the backbone as the encoder network of an autoencoder, where a decoder is also attached to the latent feature space with the purpose of reconstructing the input pattern of the backbone. This is achieved by adding the reconstruction of the microstructures from the latent feature space as a third task. In the MTL approach, all three tasks are represented by a single neural network-based model. The neural weights of the model are trained simultaneously based on a combined loss function. After training the MTL model, the optimizer can operate very efficiently in the lower dimensional latent feature space.
However, the approach has some limitations. Since our concept is based on data-driven modeling (machine learning) and optimization, an adequate data set is required. The components learned within the concept are approximations of the numerical simulations and accordingly not equally exact. The model quality of the components depends on the size and quality of the underlying data set. Therefore, the application of an efficient sampling strategy for exploring the microstructure and property space can be suitable (Morand et al., 2022). Under the assumption of low model errors, however, the components can be efficiently used as surrogate models in the application of the concept (except for extrapolation).
The remainder of this section presents the optimization approach and the MTL approach in detail, as well as an extension based on siamese neural networks (Bromley et al., 1993) that enforces the representation of microstructures in the latent feature space to preserve the microstructure distances of the original representation space.
Multi-task learning (MTL)
The MTL model, as shown in Fig. 3, is trained on pairs of microstructures and corresponding properties \(({\varvec{x}},{\varvec{p}})\). The input microstructures are transformed into latent features \({\varvec{z}}\). The individual outputs of the connected tasks are the estimated properties \(\varvec{{\hat{p}}}\), the reconstructed microstructure \({\varvec{x}}^\prime \) and the reconstructed latent features \({\varvec{z}}^\prime \). In the following, we introduce the information processing scheme of the MTL model in detail.
The processing scheme starts with an encoder network, which extracts significant features by mapping the microstructure space \({\varvec{x}} \in {\mathbb {R}}^K\) into a lower dimensional latent feature space \({\varvec{z}} \in {\mathbb {R}}^M\) via the learned function
$$\begin{aligned} {\varvec{z}} = f_{\textrm{enc}} ({\varvec{x}}, \varvec{\theta }_{\textrm{enc}}), \end{aligned}$$(1)
in which the encoder network is parameterized by its weight values \(\varvec{\theta }_{\textrm{enc}}\). All three previously described tasks are attached to the encoder in the form of feedforward neural networks. Besides, by using convolutional neural networks (see Krizhevsky et al. (2012)), the encoder can be easily adapted to higher dimensional data types representing microstructures, like images (EBSD or micrograph images) or three-dimensional microstructure data. The latter is used in Cecen et al. (2018) in the materials science domain, for example.
To train the MTL model, a loss function that combines all three tasks is needed. This is achieved by a function that cumulates the loss terms of the three tasks, \({\mathscr {L}}_{\textrm{regr}}\) (regression loss), \({\mathscr {L}}_{\textrm{recon}}\) (reconstruction loss) and \({\mathscr {L}}_{\textrm{valid}}\) (validity loss), and weights them using \({\mathscr {W}}_{\textrm{regr}}\), \({\mathscr {W}}_{\textrm{recon}}\) and \({\mathscr {W}}_{\textrm{valid}}\) to allow for prioritization. The total loss function is defined as
$$\begin{aligned} {\mathscr {L}} = {\mathscr {W}}_{\textrm{regr}} {\mathscr {L}}_{\textrm{regr}} + {\mathscr {W}}_{\textrm{recon}} {\mathscr {L}}_{\textrm{recon}} + {\mathscr {W}}_{\textrm{valid}} {\mathscr {L}}_{\textrm{valid}} + \lambda R(\varvec{\theta }), \end{aligned}$$(2)
where \(R(\varvec{\theta })\) is a regularization term that is used to prevent overfitting, with the hyperparameter \(\lambda \) defining the strength of the regularization (also known as weight decay, see Krogh and Hertz (1991) and Hinton (1987)). Each of the feedforward neural networks is parameterized by the respective weight values \(\varvec{\theta }_{\textrm{enc}}\), \(\varvec{\theta }_{\textrm{regr}}\), \(\varvec{\theta }_{\textrm{recon}}\) and \(\varvec{\theta }_{\textrm{valid}}\), which are adjusted simultaneously during training and altogether form the weight vector \(\varvec{\theta }\). In the following, we introduce the three individual loss terms.

1.
The forward mapping of the latent feature vector \({\varvec{z}}\) to the properties vector \(\varvec{{\hat{p}}} \in {\mathbb {R}}^N\) is represented by the learned function
$$\begin{aligned} \varvec{{\hat{p}}} = f_{\textrm{regr}} ({\varvec{z}}, \varvec{\theta }_{\textrm{regr}}) = f_{\textrm{regr}}(f_{\textrm{enc}} ({\varvec{x}}, \varvec{\theta }_{\textrm{enc}}), \varvec{\theta }_{\textrm{regr}}). \end{aligned}$$(3)The regression loss is given by the mean squared error between the predicted properties \(\varvec{{\hat{p}}}\) and the true properties \({\varvec{p}}\):
$$\begin{aligned} {\mathscr {L}}_{\textrm{regr}} ({\varvec{p}}, \varvec{{\hat{p}}}) = \frac{1}{N} \sum _{i=1}^N ({p_i}  {{\hat{p}}_i} )^2, \end{aligned}$$(4)where N denotes the number of properties.

2.
The decoder network, which is responsible for the reconstruction, transforms the latent feature vectors \({\varvec{z}}\) back to the original microstructure space:
$$\begin{aligned} {\varvec{x}}^\prime = f_{\textrm{recon}}({\varvec{z}}, \varvec{\theta }_{\textrm{recon}}) = f_{\textrm{recon}}(f_{\textrm{enc}}({\varvec{x}}, \varvec{\theta }_{\textrm{enc}}), \varvec{\theta }_{\textrm{recon}}). \end{aligned}$$(5)The reconstruction loss is defined on the basis of a distance measure between two microstructural feature vectors \(\text {dist}({\varvec{x}}, \varvec{x^\prime } )\):
$$\begin{aligned} {\mathscr {L}}_{\textrm{recon}} ({\varvec{x}}, \varvec{x^\prime }) = \text {dist}({\varvec{x}}, \varvec{x^\prime } ). \end{aligned}$$(6)The distance measure depends on the microstructure representation and has to be chosen appropriately.

3.
On the basis of the latent feature space, an extra autoencoder network is set up, transforming \({\varvec{z}} \in {\mathbb {R}}^M\) into an even lower-dimensional feature subspace \({\varvec{s}} \in {\mathbb {R}}^S\) with \(S<M\) and back to \({\varvec{z}}^\prime \in {\mathbb {R}}^M\) via
$$\begin{aligned} \varvec{z^\prime } = f_{\textrm{valid}}({\varvec{z}}, \varvec{\theta }_{\textrm{valid}}) = f_{\textrm{valid}}(f_{\textrm{enc}}({\varvec{x}}, \varvec{\theta }_{\textrm{enc}}), \varvec{\theta }_{\textrm{valid}}). \end{aligned}$$(7)The validity loss is defined by the mean squared error between \({\varvec{z}}\) and \({\varvec{z}}^\prime \):
$$\begin{aligned} {\mathscr {L}}_{\textrm{valid}} ({\varvec{z}}, {\varvec{z}}^\prime ) = \frac{1}{M} \sum _{i=1}^M ({z_i}  {z_i}^\prime )^2. \end{aligned}$$(8)
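The three task losses and their weighted combination (Eqs. 2, 4, 6 and 8) can be sketched as follows; the function name, the default weights, and the Euclidean placeholder for the microstructure distance are illustrative assumptions, not values from this work.

```python
import numpy as np

def mtl_loss(p, p_hat, x, x_rec, z, z_rec, theta,
             w_regr=1.0, w_recon=1.0, w_valid=1.0, lam=1e-4,
             dist=None):
    """Combined MTL loss: weighted task losses plus weight decay (Eq. 2).

    `dist` stands for the microstructure distance of Eq. 6; a mean
    squared error is used here as a placeholder default."""
    if dist is None:
        dist = lambda a, b: float(np.mean((a - b) ** 2))
    l_regr = float(np.mean((p - p_hat) ** 2))    # regression loss, Eq. 4
    l_recon = dist(x, x_rec)                     # reconstruction loss, Eq. 6
    l_valid = float(np.mean((z - z_rec) ** 2))   # validity loss, Eq. 8
    reg = lam * float(np.sum(theta ** 2))        # L2 weight decay R(theta)
    return w_regr * l_regr + w_recon * l_recon + w_valid * l_valid + reg
```

With perfect predictions and reconstructions, only the regularization term remains, which is a quick sanity check for an implementation.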
Distance preserving feature extraction using siamese neural networks
The above described MTL approach is used in combination with an optimizer that searches the latent feature space for candidate microstructures with desired properties. However, our approach aims to identify a set of microstructures with high diversity. For the diversity quantification, a distance measure in the latent feature space is required. The MTL approach as defined above is not able to preserve the distances of the original space in the latent feature space. In order to construct a distance-preserving latent feature space, the MTL model is embedded in a siamese neural network (Bromley et al., 1993; Chicco, 2021), which we describe next.
Siamese neural networks consist of two identical networks, which share weights in the encoder part, see Fig. 4. Both networks embed different microstructures \(\textbf{x}_L\) and \(\textbf{x}_R\) as \(\textbf{z}_L\) and \(\textbf{z}_R\) in the latent feature space, which is finally processed by two identical MTL networks. The distance preservation is enforced by defining a distance preservation loss \({\mathscr {L}}_{\textrm{pres}}\) that minimizes the difference between the distance of two different input microstructures in the original space \(\text {dist}({\varvec{x}}_L, {\varvec{x}}_R)\) and the corresponding distance in the latent feature space \(\text {dist}({\varvec{z}}_L, {\varvec{z}}_R)\), with \({\varvec{x}}_L \ne {\varvec{x}}_R\) (Utkin et al., 2017):
$$\begin{aligned} {\mathscr {L}}_{\textrm{pres}} = \left( \text {dist}({\varvec{x}}_L, {\varvec{x}}_R) - \text {dist}({\varvec{z}}_L, {\varvec{z}}_R) \right) ^2, \end{aligned}$$(9)
where \(\text {dist}({\varvec{x}}_L, {\varvec{x}}_R)\) and \(\text {dist}({\varvec{z}}_L, {\varvec{z}}_R)\) are not necessarily the same distance measure. Applying such loss terms leads to multidimensional scaling, see Kruskal (1964) and Cox and Cox (2008). Using the distance preservation loss \({\mathscr {L}}_{\textrm{pres}}\), the MTL loss function defined in Eq. 2 is extended by the weighted preservation loss \({\mathscr {W}}_{\textrm{pres}} {\mathscr {L}}_{\textrm{pres}}\) to
$$\begin{aligned} {\mathscr {L}} = {\mathscr {W}}_{\textrm{regr}} {\mathscr {L}}_{\textrm{regr}} + {\mathscr {W}}_{\textrm{recon}} {\mathscr {L}}_{\textrm{recon}} + {\mathscr {W}}_{\textrm{valid}} {\mathscr {L}}_{\textrm{valid}} + {\mathscr {W}}_{\textrm{pres}} {\mathscr {L}}_{\textrm{pres}} + \lambda R(\varvec{\theta }). \end{aligned}$$(10)
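A minimal sketch of the distance preservation loss follows, assuming Euclidean distances in both spaces for illustration (as noted above, the two measures need not coincide); `preservation_loss` is a hypothetical helper name.

```python
import numpy as np

def preservation_loss(x_l, x_r, z_l, z_r):
    """Squared mismatch between the pair distance in the original space
    and in the latent space (the loss behind Eq. 9).

    Minimizing this term over many pairs amounts to multidimensional
    scaling of the latent embedding."""
    d_x = float(np.linalg.norm(x_l - x_r))  # original-space distance
    d_z = float(np.linalg.norm(z_l - z_r))  # latent-space distance
    return (d_x - d_z) ** 2
```

The loss vanishes exactly when the latent embedding reproduces the original pair distance, and grows quadratically with the mismatch.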
The SMTL approach delivers a function that maps a microstructure representation in the latent feature space to properties. Now, an optimizer can operate on a lower dimensional feature space to find microstructures with desired properties. The SMTL framework also allows reconstructing the original representation of microstructures, assessing the distances between them, and validating them in the latent feature space.
Microstructure optimizer
The microstructure optimization with respect to desired properties uses the distance-preserving SMTL framework with the tasks microstructure–property mapping, validity prediction and reconstruction. The optimization minimizes a loss function, which consists of the cost terms \({\mathscr {C}}_\textrm{prop}\), \({\mathscr {C}}_\textrm{valid}\) and \({\mathscr {C}}_\textrm{divers}\) and the corresponding weights \({\mathscr {V}}_\textrm{prop}\), \({\mathscr {V}}_\textrm{valid}\) and \({\mathscr {V}}_\textrm{divers}\):
$$\begin{aligned} {\mathscr {C}} = {\mathscr {V}}_\textrm{prop} {\mathscr {C}}_\textrm{prop} + {\mathscr {V}}_\textrm{valid} {\mathscr {C}}_\textrm{valid} + {\mathscr {V}}_\textrm{divers} {\mathscr {C}}_\textrm{divers}. \end{aligned}$$(11)
\({\mathscr {C}}_\textrm{prop}\), \({\mathscr {C}}_\textrm{valid}\) and \({\mathscr {C}}_\textrm{divers}\) denote the property, validity and diversity cost terms, respectively. While the property cost term drives the candidate microstructures to lie inside a specified target properties region, the validity cost ensures that the optimizer operates inside the region of valid microstructures, and the diversity cost ensures that candidate microstructures differ from each other. To minimize the loss function, we use genetic algorithms, which generate a population set of P candidate microstructures \(\varvec{{\tilde{z}}}^*\) in the latent feature space in every iteration. The three cost terms are described in more detail in the following.

1.
The property cost is defined by the mean squared error between the desired properties and the predicted properties from the SMTL regression model:
$$\begin{aligned} {\mathscr {C}}_\textrm{prop} = \frac{1}{N} \sum _{i=1}^N (\widetilde{{\mathscr {C}}}_{\textrm{prop},i} )^2. \end{aligned}$$(12)If the i-th predicted property lies inside the target region, the cost \(\widetilde{{\mathscr {C}}}_{\textrm{prop},i}\) equals 0. Otherwise, \(\widetilde{{\mathscr {C}}}_{\textrm{prop},i}\) equals the minimum distance from the predicted property to the border of the target region.

2.
The validity prediction is used to assess whether an identified candidate microstructure is likely to be represented by the sample data set. The validity cost is defined by
$$\begin{aligned} {\mathscr {C}}_\textrm{valid} = \textrm{max}({\mathscr {A}}  \xi _\textrm{valid}, 0), \end{aligned}$$(13)in which \(\xi _\textrm{valid}\) is a threshold to define the maximum tolerated reconstruction error for valid textures and \({\mathscr {A}}\) denotes the anomaly score
$$\begin{aligned} {\mathscr {A}} = \frac{1}{M} \sum _{i=1}^M ({z^*_i}  z^{*\prime }_i )^2. \end{aligned}$$(14) 
3.
The diversity cost is based on the sum of the distances between the candidate microstructure \({\varvec{z}}^*\) in the latent feature space and every other microstructure in the population:
$$\begin{aligned} {\mathscr {C}}_\textrm{divers} =  \sum _{i=1}^P \text {dist}({\varvec{z}}^*_i, {\varvec{z}}^*), \end{aligned}$$(15)in which for \(\text {dist}({\varvec{z}}^*_i, {\varvec{z}}^*)\) the same distance measure has to be used as for the latent feature vectors in Eq. 9.
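The three cost terms and their weighted combination (Eqs. 11-15) can be sketched as follows; the weight values, the threshold, and the function names are illustrative assumptions, and the Euclidean norm stands in for the latent distance measure.

```python
import numpy as np

def property_cost(p_hat, lower, upper):
    """Eq. 12: zero inside the target box [lower, upper], squared
    distance to its border otherwise, averaged over the N properties."""
    d = np.maximum(lower - p_hat, 0.0) + np.maximum(p_hat - upper, 0.0)
    return float(np.mean(d ** 2))

def validity_cost(z, z_rec, xi_valid):
    """Eqs. 13-14: anomaly score of the validity autoencoder, hinged
    at the tolerance threshold xi_valid."""
    score = float(np.mean((z - z_rec) ** 2))
    return max(score - xi_valid, 0.0)

def diversity_cost(z, population):
    """Eq. 15: negative sum of latent distances to the other candidates,
    so that minimization pushes candidates apart."""
    return -float(sum(np.linalg.norm(z - z_i) for z_i in population))

def total_cost(p_hat, lower, upper, z, z_rec, population,
               v_prop=1.0, v_valid=1.0, v_divers=0.1, xi_valid=0.05):
    # Weighted sum minimized by the genetic algorithm (Eq. 11); the
    # weights and threshold here are illustrative, not paper values.
    return (v_prop * property_cost(p_hat, lower, upper)
            + v_valid * validity_cost(z, z_rec, xi_valid)
            + v_divers * diversity_cost(z, population))
```

The hinge in the validity cost leaves candidates with a small anomaly score unpenalized, so the optimizer is only pushed back once it leaves the region of valid microstructures.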
Materials science fundamentals
Representation of crystallographic texture
Crystallographic texture is typically described by the orientation distribution function, which is defined by
$$\begin{aligned} f(g) = \frac{1}{V} \frac{\textrm{d}V(g)}{\textrm{d}g} \end{aligned}$$(16)
for an orientation g (a point in SO(3)) and the volume V(g) in SO(3). The orientation distribution function f(g) is often subject to specific symmetry conditions, under which various regions in SO(3) are equivalent. Therefore, depending on the symmetries, orientations can be mapped into an elementary region of SO(3), the so-called fundamental zone. The orientation distribution function on the basis of the orientations mapped into the fundamental zone is indistinguishable from the original orientation distribution function. Rolling textures, for example, are subject to cubic crystal and orthorhombic sample symmetry, for which 96 elementary regions exist (Hansen et al., 1978).
A popular way to represent the orientation distribution function is to approximate it via generalized spherical harmonic functions (Bunge, 2013). Yet, as there is no straightforward way to measure the distance between two orientation distribution functions in terms of generalized spherical harmonics, we make use of the orientation histogram-based texture descriptor introduced in Dornheim et al. (2021). To this end, the cubic fundamental zone is discretized into a set O of J nearly uniformly distributed orientations \(o_j\). For each individual orientation g in a set of orientations G, a weight vector \(w_\textrm{g}\) is constructed via a soft-assignment approach
where \(N_l\) is the set of l nearest neighbor orientations of g in terms of the orientation distance \(\varPhi \). The orientation distance between two orientations g and o is defined by
where \({\overline{g}}\) and \({\overline{o}}\) are from the sets of all equivalent orientations of g and o in terms of cubic crystal symmetry. The orientation distance measure in SO(3) is defined as
where \(q_\textrm{g}\) and \(q_\textrm{o}\) are the quaternion representations of the orientations g and o (Huynh, 2009).
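The quaternion-based distance and its minimization over symmetry equivalents can be sketched as follows. The Hamilton-product helper and the idea of passing the list of crystal symmetry quaternions explicitly are illustrative assumptions; only the distance formula \(2\arccos (|\langle q_1, q_2\rangle |)\) follows Huynh (2009).

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of two unit quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def orientation_distance(q_g, q_o, sym_quats):
    """Rotation distance 2*arccos(|<q1, q2>|) (Huynh, 2009), minimized
    over the crystal symmetry equivalents of q_g supplied by the caller."""
    best = np.pi
    for s in sym_quats:
        q_eq = quat_mul(s, q_g)
        d = 2.0 * np.arccos(min(abs(float(np.dot(q_eq, q_o))), 1.0))
        best = min(best, d)
    return best
```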
On this basis, the weight vector for the orientation histogram \({\varvec{b}}\) can be calculated by a volume average of the weight vectors of the individual orientations
The distance between two orientation distribution functions can be measured via any kind of histogram-based distance measure, such as the Chi-Squared distance (Pele and Werman, 2010)
The set of nearly uniformly distributed orientations O, needed for the histogram-based texture descriptor, can be generated using the algorithm described in Quey et al. (2018), which is implemented in the software neper (Quey et al., 2011). For the purpose of this study, we sample 512 nearly uniformly distributed orientations over the cubic fundamental zone and choose a soft assignment of \(l=3\).
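The descriptor construction and the Chi-Squared distance can be sketched as follows. The inverse-distance weighting over the l nearest reference orientations is an assumption (the exact soft-assignment weights follow Dornheim et al., 2021), and a precomputed distance matrix stands in for the orientation distance calls.

```python
import numpy as np

def soft_assign_histogram(dist_matrix, l=3):
    """Orientation histogram via soft assignment: each of the n sampled
    orientations spreads a unit weight over its l nearest reference
    orientations. dist_matrix has shape (n, J); inverse-distance weights
    are an assumption of this sketch."""
    n, J = dist_matrix.shape
    hist = np.zeros(J)
    for row in dist_matrix:
        nn = np.argsort(row)[:l]
        w = 1.0 / (row[nn] + 1e-12)   # avoid division by zero
        hist[nn] += w / w.sum()
    return hist / n                    # volume average over n orientations

def chi_squared_distance(b1, b2):
    """Chi-Squared histogram distance (Pele and Werman, 2010)."""
    denom = b1 + b2
    mask = denom > 0
    return 0.5 * float(np.sum((b1[mask] - b2[mask]) ** 2 / denom[mask]))
```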
Crystallographic texture of steel sheets
After rolling, body centered cubic (bcc) materials typically form so-called fiber textures. Following Ray et al. (1994), these textures are composed of the five fibers \(\alpha \), \(\gamma \), \(\eta \), \(\epsilon \), and \(\beta \), which are defined in detail in Table 2. Among these fibers, the \(\alpha \) and \(\gamma \) fibers are the most prominent (Kocks et al., 1998), whereas the presence of the \(\beta \) fiber is only reported from theoretical predictions (Von Schlippenbach et al., 1986). To give an idea of how the fibers affect forming properties, we refer to Ray et al. (1994), where it is found that the \(\gamma \) fiber causes good deep drawability, while the \(\alpha \) fiber has the contrary effect.
In order to generate a data base of (artificial) rolling textures, a 25-parameter model proposed in Delannay et al. (1999) for describing steel sheet textures is used in this work. The model is based on textures composed of the fibers \(\alpha \), \(\gamma \), and \(\eta \). As the \(\eta \) fiber is not always present in steel sheet textures, we limit ourselves to textures that consist of an \(\alpha \) and a \(\gamma \) fiber. Therefore, 6 of the 25 parameters can be neglected.
The texture model describes the orientation distribution function as a set of weighted Gaussian distributions placed along the fibers. The model parameters \(D_i\) are listed in Table 3 and define the standard deviations and mean values of the distributions based on the fiber thickness and the shifts from their ideal positions. Furthermore, the model parameters define the weights of the distributions among each other based on the probability given by the orientation distribution function, which we will call fiber intensity in the following.
To construct the set of Gaussian distributions, the seven base distributions from Table 3 are placed at their ideal positions with respect to the shifts. Between these seven distributions, further distributions are placed with a distance of about \(3^\circ \) to each other, leading to 41 Gaussian distributions overall. Their weights \(w_i\), standard deviations \(\sigma _i\) and mean values \(\mu _i\) are interpolated linearly based on the values of the two neighboring base distributions. This yields a set of Gaussian distributions \({\mathcal {N}}_1(\mu _1,\sigma _1),..., {\mathcal {N}}_{41}(\mu _{41},\sigma _{41})\). The orientation distribution function f(g) is defined by the normalized sum of this set:
Based on this definition, discrete orientations can be sampled. In the following, we denote the set of orientations as G. As f(g) is defined in the cubic-orthorhombic fundamental zone, it is necessary to add the equivalent orientations regarding the orthorhombic sample symmetry to the set of discrete orientations. This is done by applying rotation operations \(g_s\) to each orientation \(g_i\) in G
The rotation operations \(g_s\) for orthorhombic sample symmetry can be found in Hansen et al. (1978).
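As a minimal sketch, the orthorhombic sample symmetry group consists of the identity and 180-degree rotations about the three sample axes (RD, TD, ND), cf. Hansen et al. (1978). Whether \(g_s\) acts from the right (as here) or from the left depends on the chosen orientation convention, so that detail is an assumption.

```python
import numpy as np

# Orthorhombic sample symmetry: identity plus 180-degree rotations
# about the three sample axes (RD, TD, ND).
SAMPLE_SYMMETRY_OPS = [np.eye(3),
                       np.diag([1.0, -1.0, -1.0]),
                       np.diag([-1.0, 1.0, -1.0]),
                       np.diag([-1.0, -1.0, 1.0])]

def apply_sample_symmetry(orientations):
    """Expand a set of orientation matrices by the sample symmetry
    operators, yielding 4 equivalents per orientation."""
    return [g @ g_s for g in orientations for g_s in SAMPLE_SYMMETRY_OPS]
```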
Material model
The sheet metal properties on which we focus in this study are the Young's moduli and the r-values at 0, 45 and 90 degrees to the rolling direction. The properties are calculated by applying uniaxial tension to a crystal plasticity-based material model. As time efficiency is essential for the generation of data, a material model of Taylor type is implemented, as described in Dornheim et al. (2021).
The Taylortype material model is based on the volume averaged stress of a set of n crystals (Kalidindi et al., 1992):
In the above equation, \({\varvec{T}}\) denotes the Cauchy stress tensor, which can be derived from the stress tensor in the intermediate configuration, given by
with the second order identity tensor \({\varvec{I}}\) and the fourth order elastic stiffness tensor \({\mathbb {C}}\). The elastic constants \(C_{11}\), \(C_{12}\) and \(C_{44}\) are set to 218.37, 131.13 and 105.34 GPa, respectively (Eghtesad and Knezevic, 2020). \({\varvec{F}}_\textrm{e}\) is the elastic part of the deformation gradient \({\varvec{F}}\) and can be calculated by a multiplicative decomposition
The intermediate stress tensor can be converted into Cauchy stress using the relation
To describe the evolution of the plastic deformation, the plastic part of the velocity gradient \({\varvec{L}}_\textrm{p}\) is considered, which is given by
and the flow rule (Rice, 1971)
where \({\dot{\gamma }}^{(\eta )}\) denotes the shear rates on the active slip systems \(\eta \), defined by the slip plane normal \({\varvec{n}}^{(\eta )}\) and the slip direction \({\varvec{m}}^{(\eta )}\). For bcc materials, the slip system families in terms of the Miller index are {110}<111>, {112}<111>, and {123}<111>, where the latter is neglected for simplicity.
The shear rates are defined by a phenomenological power-law (Asaro & Needleman, 1985):
where \(r^{(\eta )}\) is the slip system resistance, \({\dot{\gamma }}_0\) the reference shear rate and m the shear rate sensitivity. Here, \({\dot{\gamma }}_0\) and m are set to 0.001 \(\hbox {sec}^{-1}\) and 0.0125, respectively (Pagenkopf et al., 2016). Following Schmid's law, the resolved shear stress on slip system \(\eta \), \(\tau ^{(\eta )}\), is given by
and the evolution of the slip system resistance is defined by
The matrix \(q_{\eta \xi }\) describes the ratio between self and latent hardening. It consists of diagonal elements equal to 1.0 and off-diagonal elements \(q_1\) and \(q_2\), cf. Baiker et al. (2014). Both \(q_1\) and \(q_2\) are set to 1.4 (Asaro and Needleman, 1985). Further, the hardening behavior is realized by an extended Voce-type model (Tome et al., 1984):
The material-dependent parameters are calibrated to DC04 steel^{Footnote 1} and are \(\tau _0=94.9\) MPa, \(\tau _1=50\) MPa, \(\vartheta _0=258\) MPa and \(\vartheta _1=32.8\) MPa (Pagenkopf, 2019). The accumulated plastic shear is defined by
Although material parameters for DC04 steel are used in this study, it should be noted that the described Taylor-type crystal plasticity model and the texture generation approach can be applied to any kind of metallic material with bcc crystal structure.
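The power-law flow rule and the Voce-type hardening described above can be sketched with the quoted DC04 parameters. The closed-form Voce curve used here is the commonly cited form of Tome et al. (1984) and is an assumption insofar as the paper's exact equation is not reproduced in this section.

```python
import numpy as np

# DC04 parameters quoted in the text (Pagenkopf, 2019; Pagenkopf et al., 2016)
TAU_0, TAU_1 = 94.9, 50.0          # MPa
THETA_0, THETA_1 = 258.0, 32.8     # MPa
GAMMA_DOT_0, M = 0.001, 0.0125     # reference shear rate (1/s), sensitivity

def shear_rate(tau, r):
    """Power-law flow rule (Asaro & Needleman, 1985):
    gamma_dot = gamma_dot_0 * |tau / r|**(1/m) * sign(tau)."""
    return GAMMA_DOT_0 * np.abs(tau / r) ** (1.0 / M) * np.sign(tau)

def voce_resistance(gamma_acc):
    """Extended Voce hardening curve, assuming the common form
    tau_0 + (tau_1 + theta_1 * G) * (1 - exp(-G * theta_0 / tau_1))."""
    return TAU_0 + (TAU_1 + THETA_1 * gamma_acc) * (
        1.0 - np.exp(-gamma_acc * THETA_0 / TAU_1))
```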
Results
Texture-property data set
For training, 50000 sets of 2000 discrete orientations are sampled via a Latin hypercube design (McKay et al., 1979), based on Eq. 22. In order to have an independent test set, a further 10000 sets are generated randomly. The ranges inside which the parameters of the texture model vary are adjusted manually such that typical bcc rolling textures found in the literature (Das, 2017; Hölscher et al., 1991; Inagaki and Suda, 1972; Kestens and Pirgazi, 2016; Klinkenberg et al., 1992; Kocks et al., 1998; Pagenkopf et al., 2016) are covered. The parameter ranges are listed in Table 4. In addition, to evaluate the anomaly detection, a set of artificial textures is needed that differs slightly from the generated rolling textures. For this purpose, 10000 anomalies are generated by shifting the \(\alpha \) fiber (i.e. the ideal positions of \(a_1\), \(a_2\), \(a_4\) and \(a_5\)) by 20 degrees in the \(\varphi _1\) direction.
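A minimal Latin hypercube sketch in the spirit of McKay et al. (1979); the parameter bounds passed in are hypothetical stand-ins for the ranges of Table 4, and this is not the authors' sampling code. In practice, a library routine such as scipy.stats.qmc.LatinHypercube could replace it.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, seed=None):
    """Latin hypercube design: per dimension, exactly one sample falls
    into each of n_samples equal strata, with the strata shuffled
    independently. bounds is a (d, 2) array of [low, high] pairs."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    d = len(bounds)
    # one permutation of the strata indices per dimension
    strata = rng.permuted(np.tile(np.arange(n_samples), (d, 1)), axis=1).T
    u = (strata + rng.random((n_samples, d))) / n_samples  # in [0, 1)
    return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])
```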
Moreover, we validate the texture-property mapping and the validity prediction on experimental data. For this purpose, an experimentally measured texture of cold rolled DC04 steel from Schreijäg (2012) is used. Based on this measurement, an orientation distribution function is approximated via the MATLAB toolbox mtex (Bachmann et al., 2010), rotated into its symmetry axes assuming orthorhombic sample symmetry, and mirrored. To visualize the \(\alpha \) and \(\gamma \) fibers of the orientation distribution function, an intersection plot of the Euler space at \(\varphi _2=45^\circ \) is depicted in Fig. 5.
For the generated textures in the training and test sets, the corresponding Young's moduli and r-values at 0, 45, and 90 degrees to the rolling direction are determined using the Taylor-type crystal plasticity model described in the "Material model" section. Both quantities, the Young's modulus and especially the r-value, are highly affected by the crystallographic texture, which is why they are chosen as examples for the purpose of this study.
Validation of SMTL
In this study, the individual tasks of the SMTL model are realized via feedforward neural networks with tanh activation functions to obtain features between \(-1\) and \(+1\) in the latent feature space. The SMTL model is implemented based on the Python TensorFlow API (Abadi et al., 2015). The base network of the siamese architecture is illustrated in Fig. 6. The Glorot normal method (Glorot & Bengio, 2010) is used for weight initialization. In order to adjust the hyperparameters, a random search method (Bergstra & Bengio, 2012) is applied using 5-fold cross-validation.
The best model configuration that was found is shown in Table 5. We use the ChiSquared distance introduced in Eq. 21 as distance measure in the input space. In the latent feature space, we use the sum of squared errors (SSE) between two vectors \({\varvec{z}}_{\textrm{1}}\) and \({\varvec{z}}_{\textrm{2}}\) as distance measure
The SMTL model is trained for 200 epochs, while the best intermediate result on the test set is retained, which can be interpreted as early stopping (Prechelt, 1998). Before the model training is executed, the loss terms are scaled to values between 0 and 1 in order to make them comparable. The following weights for the scaled loss terms were determined by hyperparameter optimization: \({\mathscr {W}}_\textrm{regr} = 0.05\), \({\mathscr {W}}_\textrm{recon} = 0.05\), \({\mathscr {W}}_\textrm{valid} = 0.05\) and \({\mathscr {W}}_\textrm{pres} = 0.85\).
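The latent-space distance measure and the weighted combination of the scaled loss terms can be sketched as follows; treating the loss terms as already-scaled scalars is a simplification of the actual TensorFlow training loop.

```python
import numpy as np

def sse_distance(z1, z2):
    """Sum of squared errors between two latent feature vectors."""
    z1, z2 = np.asarray(z1, dtype=float), np.asarray(z2, dtype=float)
    return float(np.sum((z1 - z2) ** 2))

def smtl_loss(l_regr, l_recon, l_valid, l_pres,
              w_regr=0.05, w_recon=0.05, w_valid=0.05, w_pres=0.85):
    """Weighted sum of the four pre-scaled SMTL loss terms, with the
    weights reported from hyperparameter optimization as defaults."""
    return (w_regr * l_regr + w_recon * l_recon
            + w_valid * l_valid + w_pres * l_pres)
```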
The results for the texture-property mapping and the distance preservation are shown in Table 6, in which the regression errors \(\hbox {MAE}_\textrm{E}\) and \(\hbox {MAE}_\textrm{r}\) denote the mean absolute errors between the true and predicted Young's moduli and r-values depending on the dimension of the latent feature space \({\varvec{z}}\). The quality of the distance preservation is measured by the coefficient of determination \(R^2\) between the distances of two input textures and the distances of their corresponding latent feature vectors
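One plausible reading of this coefficient of determination, treating the latent-space distances as predictions of the corresponding input-space distances over all evaluated texture pairs; the exact pairing scheme is not specified in this section, so this is an assumption.

```python
import numpy as np

def distance_preservation_r2(d_input, d_latent):
    """R^2 between pairwise input-space distances and the corresponding
    latent-space distances; 1.0 means perfect distance preservation."""
    d_input = np.asarray(d_input, dtype=float)
    d_latent = np.asarray(d_latent, dtype=float)
    ss_res = np.sum((d_input - d_latent) ** 2)
    ss_tot = np.sum((d_input - d_input.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```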
It is shown that texture-property mappings with an adequate prediction quality can be achieved even when the dimensionality of the latent feature space is reduced extensively. However, regarding the distance preservation quality, a lower bound of at least 10 latent features can be identified, below which the distance preservation is unsatisfactory. Additionally, the texture-property mapping is evaluated on the experimentally measured texture and the corresponding properties. The results are listed in Table 6. It can be seen that a satisfactory prediction quality (Regr. \(\hbox {MAE}_\textrm{E} \le 1000\) MPa and Regr. \(\hbox {MAE}_\textrm{r} \le 0.1\)) can only be achieved with at least 16 latent features.
On the basis of this 16-dimensional feature space, the validity prediction is evaluated. The anomaly scores for the textures in the test set and for the artificially generated anomalies are shown in Fig. 7. It can be seen that the anomalies can be separated sufficiently well from the textures in the test set.
Rolling texture identification
To validate the texture identification, we define two target regions in property space, see Fig. 8. The first one is defined by the properties of the experimentally measured texture, which lies in a sparsely populated region, and is labeled Target Region 1. As a consequence of its location in the sparsely populated region, the anomaly score of this texture is 0.0099 and lies in the transition zone, shifted towards the generated anomalies (cf. Fig. 7). It is of interest whether the optimizer is generally able to find a whole set of microstructures with properties in this region. The second target region represents a densely populated region located near the center of the properties point cloud and is labeled Target Region 2. The center of each target region is listed in Table 7. The target regions are defined by adding a tolerance of \(\pm 1000\) MPa to the Young's moduli and \(\pm 0.10\) to the r-values, yielding a sufficiently small properties window from an engineering point of view. As a baseline, we collect all data points from the training set that lie inside the target regions. Only two textures can be found in Target Region 1, whereas 13 textures can be found in Target Region 2.
To identify a diverse set of textures, we use the optimization algorithm JADE (Zhang and Sanderson, 2009), which is an extension of the differential evolution algorithm (Storn and Price, 1997). Before starting the optimization via JADE, an initial population has to be selected: for this, 100 textures are sampled from the test set, approximately uniformly distributed over the property space. For the cost function, defined in Eq. 11, we use the weights \({\mathscr {V}}_\textrm{prop}=0.90\), \({\mathscr {V}}_\textrm{valid}=0.03\) and \({\mathscr {V}}_\textrm{divers}=0.07\) and scale \({\mathscr {C}}_\textrm{prop}\) and \({\mathscr {C}}_\textrm{divers}\) to values between 0 and 1 based on the selected 100 initial textures. The threshold \(\xi _\textrm{valid}\) is set to 0.01 based on the maximum anomaly score in the data set, cf. Fig. 7. The optimization is performed for 300 iterations with a fixed population size of 100. During the optimization, all valid textures that fulfill the target properties according to the texture-property mapping are collected. Based on the results from the previous section, we use the trained SMTL model with a 16-dimensional latent feature space.
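The optimization loop can be sketched with a plain DE/rand/1/bin scheme, which is a simplified stand-in for JADE (JADE additionally adapts its control parameters f and cr and maintains an external archive); the cost callback and population shapes are illustrative.

```python
import numpy as np

def differential_evolution(cost, init_pop, n_iter=300, f=0.5, cr=0.9, seed=0):
    """Minimal DE/rand/1/bin loop over a fixed-size population."""
    rng = np.random.default_rng(seed)
    pop = np.array(init_pop, dtype=float)
    costs = np.array([cost(x) for x in pop])
    n, d = pop.shape
    for _ in range(n_iter):
        for i in range(n):
            # three mutually distinct members, excluding the target i
            a, b, c = pop[rng.choice([j for j in range(n) if j != i],
                                     size=3, replace=False)]
            mutant = a + f * (b - c)
            cross = rng.random(d) < cr
            cross[rng.integers(d)] = True   # at least one gene from mutant
            trial = np.where(cross, mutant, pop[i])
            c_trial = cost(trial)
            if c_trial <= costs[i]:         # greedy selection
                pop[i], costs[i] = trial, c_trial
    best = int(np.argmin(costs))
    return pop[best], float(costs[best])
```

In the paper's setting, `cost` would combine the property, validity and diversity terms of Eq. 11, evaluated on the 16-dimensional latent vectors.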
Target region 1
Our approach is able to find a diverse set of textures that meets the property requirements of Target Region 1, according to the texture-property mapping. Figure 9 depicts the mutual distances in the latent feature space between all found textures and between the two baseline textures. It is shown that the set of identified textures contains 221 diverse textures, in contrast to only two in the baseline set. In order to compare the results to the experimentally measured texture, the texture closest to the center point of Target Region 1 is depicted in Fig. 10 as a section through the Euler space at \(\varphi _2 = 45^\circ \). Comparing the two textures, it can be seen that they are roughly the same in terms of the magnitude of the intensities and the shape of the \(\alpha \) and \(\gamma \) fibers. However, they also show differences in terms of smoothness and the location of the intensity peaks.
Target region 2
Compared to Target Region 1, an even more diverse set of 1315 textures can be identified for Target Region 2, as can be seen in the histogram of the mutual distances in Fig. 11. To get an idea of the differences between the textures, two exemplary textures are plotted in Fig. 12 as sections through the Euler space at \(\varphi _2 = 45^\circ \). It can be seen that the \(\alpha \) and \(\gamma \) fibers of both textures differ significantly in terms of intensity. However, the locations of the intensity peaks and the thicknesses of the \(\alpha \) and \(\gamma \) fibers are similar.
Discussion
The results presented in the "Validation of SMTL" section show that the two tasks, texture-property mapping and validity prediction, are solved by the SMTL model. To achieve a sufficient prediction quality for both tasks on the test set as well as for the experimentally measured texture, a minimum dimensionality of the latent feature space is needed. Here, the dimensionality requirements of the siamese distance preservation goal also have to be considered. 16 latent features were found to be sufficient for our example task regarding the texture of cold rolled bcc steel sheets.
However, the prediction error for the experimentally measured texture is higher than for the test set using the same latent feature space dimensionality. This can be explained by the fact that the corresponding property lies in a texture space region with low sampling density, so the model is not well supported by data there. This also results in an instability of the model quality in this region depending on the dimensionality of the latent feature space. This instability can be seen by studying the r-value in Table 6. By choosing a latent feature space size of 16, the results for the experimentally measured texture are also satisfactory, especially keeping in mind that the experimentally measured texture naturally differs from the artificially generated data and additionally lies in a sparsely sampled region, cf. Target Region 1 in Fig. 8.
Due to the sparsity of Target Region 1, the identification of textures in this region is challenging. Nevertheless, the optimization approach is able to identify a set of textures that contains more diverse individuals than the two baseline textures from the training set. Regarding the identified texture that is closest to the experimentally measured texture in terms of properties, one can see that the two are also similar in terms of crystallographic texture, which essentially proves the concept of our approach.
The most obvious difference between the two textures is their smoothness. The irregular distribution of intensity peaks in the identified texture is due to the resolution of the histogram-based texture descriptor. Also, the orthorhombic sample symmetry is not represented locally. However, both issues can be solved by increasing the resolution. Furthermore, a higher resolution of the descriptor decreases the descriptor error, which reflects the deviation between the properties of the original texture and the properties of the texture described by the descriptor. The choice of resolution is, however, a trade-off between accuracy and descriptor complexity. Generally, with the use of the SMTL model and the incorporated feature extraction, the resolution is limited only by computational power.
Compared to Target Region 1, the identification task for Target Region 2 is less challenging, as the target region is located in a densely sampled region. However, as a proper set of diverse textures already exists in the baseline, the main challenge is to outperform the baseline set in terms of diversity. Figure 11 shows that the materials design problem (the identification of multiple equivalent microstructures/textures) is accomplished by the optimization approach. This is illustrated by comparing two of the identified textures in Fig. 12 with each other: similar properties can be reached by different microstructures. The identification of such a highly diverse set of microstructures with similar properties is an important precondition for constructing robust optimizing process control algorithms, which need to choose among multiple optimal paths leading to desired properties.
Summary and outlook
In this work, we present an approach to solve materials design problems. The approach is based on an optimization strategy that incorporates machine learning models for mapping microstructures to properties and for assessing the validity of input microstructures in the sense of their likelihood of being represented by the underlying data. To model these tasks, we use a siamese multi-task learning (SMTL) neural network model. Furthermore, we incorporate feature extraction in order to transform input microstructures to a lower dimensional latent feature space, in which an optimizer (identifying microstructures with dedicated properties) can operate efficiently.
By training the SMTL model with a dedicated loss function term, we are able to preserve the distances between microstructures in the original input space also in the latent feature space. The distance preservation allows the diversity of the solution set (found by the optimizer) to be assessed directly in the latent feature space and therefore enables optimizers to efficiently identify sets of diverse microstructures. By applying the approach to crystallographic texture optimization, we show the ability to identify diverse sets of textures that lie within given property bounds. Such sets of textures form the input of optimal processing control approaches like in Dornheim et al. (2021).
In the present work, we applied our approach to data from mean-field simulations. The next step is to apply the approach to spatially resolved data from full-field simulations. The proposed methods can easily be extended to this task by modifying the encoder part of the SMTL model. However, the problem arises that typically fewer data can be generated via full-field simulations. Nevertheless, such sparse high-quality data can be used to support the modeling with lower-quality data. Concepts to incorporate multi-fidelity data (Batra, 2021) in our SMTL model will be considered in the future.
Data availability
The data used to validate the SMTL approach is made available via the Fraunhofer repository Fordatis at https://fordatis.fraunhofer.de/handle/fordatis/204 (Morand et al., 2021).
Notes
Experiments performed at IUL Dortmund during DFG project Graduate School 1483 (Pagenkopf, 2019).
References
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro,C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., & Zheng, X. (2015). TensorFlow: Largescale machine learning on heterogeneous systems. White paper.
Adams, B. L., Henrie, A., Henrie, B., Lyon, M., Kalidindi, S., & Garmestani, H. (2001). Microstructuresensitive design of a compliant beam. Journal of the Mechanics and Physics of Solids, 49(8), 1639–1663.
Asaro, R. J., & Needleman, A. (1985). Overview no. 42 texture development and strain hardening in rate dependent polycrystals. Acta Metallurgica, 33(6), 923–953.
Bachmann, F., Hielscher, R., & Schaeben, H. (2010). Texture analysis with mtex – free and open source software toolbox. Solid State Phenomena, 160, 63–68.
Baiker, M., Helm, D., & Butz, A. (2014). Determination of mechanical properties of polycrystals by using crystal plasticity and numerical homogenization schemes. Steel Research International, 85(6), 988–998.
Batra, R. (2021). Accurate machine learning in materials science facilitated by using diverse data sources. Nature, 589.
Bergstra, J., & Bengio, Y. (2012). Random search for hyperparameter optimization. Journal of Machine Learning Research, 13(10), 281–305.
Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., & Shah, R. (1993). Signature verification using a Siamese time delay neural network. Advances in Neural Information Processing Systems, 6, 737–744.
Bunge, H.J. (2013). Texture analysis in materials science: Mathematical methods. Burlington: Elsevier Science.
Caruana, R. (1997). Multitask learning. Machine Learning, 28(1), 41–75.
Cecen, A., Dai, H., Yabansu, Y. C., Kalidindi, S. R., & Song, L. (2018). Material structureproperty linkages using threedimensional convolutional neural networks. Acta Materialia, 146, 76–84.
Chalapathy, R., & Chawla, S. (2019). Deep learning for anomaly detection: A survey. arXiv:1901.03407
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41, 3.
Cheng, Z., Wang, S., Zhang, P., Wang, S., Liu, X., & Zhu, E. (2021). Improved autoencoder for unsupervised anomaly detection. International Journal of Intelligent Systems, 36, 7103–7125.
Chicco, D. (2021). Siamese neural networks: An overview. Artificial Neural Networks, 73–94.
Cox, M. A., & Cox, T. F. (2008). Multidimensional scaling. In Handbook of data visualization (pp. 315–347). Springer.
Das, A. (2017). Calculation of crystallographic texture of bcc steels during cold rolling. Journal of Materials Engineering and Performance, 26(6), 2708–2720.
Delannay, L., Van Houtte, P., & Van Bael, A. (1999). New parameter model for texture description in steel sheets. Texture, Stress, and Microstructure, 31(3), 151–175.
Dornheim, J., Morand, L., Zeitvogel, S., Iraki, T., Link, N., & Helm, D. (2021). Deep reinforcement learning methods for structure-guided processing path optimization. Journal of Intelligent Manufacturing.
Eghtesad, A., & Knezevic, M. (2020). Highperformance fullfield crystal plasticity with dislocationbased hardening and slip system backstress laws: Application to modeling deformation of dualphase steels. Journal of the Mechanics and Physics of Solids, 134, 103750.
Fullwood, D. T., Niezgoda, S. R., Adams, B. L., & Kalidindi, S. R. (2010). Microstructure sensitive design for performance optimization. Progress in Materials Science, 55(6), 477–562.
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th international conference on artificial intelligence and statistics (pp. 249–256). JMLR Workshop and Conference Proceedings
Goldberg, D. (1991). Real-coded genetic algorithms, virtual alphabets and blocking. Complex Systems, 5.
Gupta, A., Cecen, A., Goyal, S., Singh, A. K., & Kalidindi, S. R. (2015). Structureproperty linkages using a data science approach: Application to a nonmetallic inclusion/steel composite system. Acta Materialia, 91, 239–254.
Hansen, J., Pospiech, J., & Lücke, K. (1978). Tables for texture analysis of cubic crystals. Springer.
Herrera, F., Lozano, M., & Verdegay, J. L. (1998). Tackling realcoded genetic algorithms: Operators and tools for behavioural analysis. Artificial Intelligence Review, 12(4), 265–319.
Hinton, G. E. (1987). Learning translation invariant recognition in a massively parallel networks. In International conference on parallel architectures and languages Europe (pp. 1–13). Springer.
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
Hölscher, M., Raabe, D., & Lücke, K. (1991). Rolling and recrystallization textures of bcc steels. Steel Research, 62(12), 567–575.
Huynh, D. Q. (2009). Metrics for 3D rotations: Comparison and analysis. Journal of Mathematical Imaging and Vision, 35(2), 155–164.
Inagaki, H., & Suda, T. (1972). The development of rolling textures in lowcarbon steels. Texture, Stress, and Microstructure, 1(2), 129–140.
Jung, J., Yoon, J. I., Park, H. K., Jo, H., & Kim, H. S. (2020). Microstructure design using machine learning generated low dimensional and continuous design space. Materialia, 11, 100690.
Jung, J., Yoon, J. I., Park, H. K., Kim, J. Y., & Kim, H. S. (2019). An efficient machine learning approach to establish structureproperty linkages. Computational Materials Science, 156, 17–25.
Jung, J., Yoon, J. I., Park, S.J., Kang, J.Y., Kim, G. L., Song, Y. H., Park, S. T., Oh, K. W., & Kim, H. S. (2019). Modelling feasibility constraints for materials design: Application to inverse crystallographic texture problem. Computational Materials Science, 156, 361–367.
Kalidindi, S. R., Bronkhorst, C. A., & Anand, L. (1992). Crystallographic texture evolution in bulk deformation processing of FCC metals. Journal of the Mechanics and Physics of Solids, 40(3), 537–569.
Kalidindi, S. R., Houskamp, J. R., Lyons, M., & Adams, B. L. (2004). Microstructure sensitive design of an orthotropic plate subjected to tensile load. International Journal of Plasticity, 20(8–9), 1561–1575.
Kamijyo, R., Ishii, A., Coppieters, S., & Yamanaka, A. (2022). Bayesian texture optimization using deep neural networkbased numerical material test. International Journal of Mechanical Sciences, 223, 107285.
Kestens, L., & Pirgazi, H. (2016). Texture formation in metal alloys with cubic crystal structures. Materials Science and Technology, 32(13), 1303–1315.
Kingma, D. P. & Ba, J. (2015). Adam: A method for stochastic optimization. In: 3rd international conference on learning representations
Klinkenberg, C., Raabe, D., & Lücke, K. (1992). Influence of volume fraction and dispersion rate of grainboundary cementite on the coldrolling textures of lowcarbon steel. Steel Research, 63(6), 263–269.
Kocks, U. F., Tomé, C. N., & Wenk, H.R. (1998). Texture and anisotropy: Preferred orientations in polycrystals and their effect on materials properties. Cambridge University Press.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1106–1114.
Krogh, A., & Hertz, J. A. (1991). A simple weight decay can improve generalization. Advances in Neural Information Processing Systems, 4, 950–995.
Kruskal, J. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1–27.
Kuroda, M., & Ikawa, S. (2004). Texture optimization of rolled aluminum alloy sheets using a genetic algorithm. Materials Science and Engineering: A, 385(1–2), 235–244.
Kwon, G., Prabhushankar, M., Temel, D., & AlRegib, G. (2020). Backpropagated gradient representations for anomaly detection. In: European conference on computer vision
Liu, R., Kumar, A., Chen, Z., Agrawal, A., Sundararaghavan, V., & Choudhary, A. (2015). A predictive machine learning approach for microstructure optimization and materials design. Scientific Reports, 5(1), 1–12.
Mann, A., & Kalidindi, S. R. (2022). Development of a robust CNN model for capturing microstructureproperty linkages and building property closures supporting material design. In Frontiers in materials
McDowell, D. L. (2007). Simulationassisted materials design for the concurrent design of materials and products. JOM, 59(9), 21–25.
McKay, M. D., Beckman, R. J., & Conover, W. J. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2), 239.
Morand, L., Iraki, T., Dornheim, J., Pagenkopf, J., & Helm, D. (2021). Artificially generated crystallographic textures of steel sheets and their corresponding properties calculated by a Taylortype crystal plasticity model. Retrieved from https://fordatis.fraunhofer.de/handle/fordatis/204
Morand, L., Link, N., Iraki, T., Dornheim, J., & Helm, D. (2022). Efficient exploration of microstructure-property spaces via active learning. Frontiers in Materials, 8, 824441. https://doi.org/10.3389/fmats
Olson, G. B. (1997). Computational design of hierarchically structured materials. Science, 277(5330), 1237–1242.
Pagenkopf, J. (2019). Bestimmung der Plastischen Anisotropie von Blechwerkstoffen durch ortsaufgelöste Simulationen auf Gefügeebene. PhD thesis, Fakultät für Maschinenbau des Karlsruher Instituts für Technologie (KIT).
Pagenkopf, J., Butz, A., Wenk, M., & Helm, D. (2016). Virtual testing of dualphase steels: Effect of martensite morphology on plastic flow behavior. Materials Science and Engineering A, 674, 672–686.
Panchal, J. H., Kalidindi, S. R., & McDowell, D. L. (2013). Key computational modeling issues in integrated computational materials engineering. Computer-Aided Design, 45(1), 4–25.
Paul, A., Acar, P., Liao, W.-K., Choudhary, A., Sundararaghavan, V., & Agrawal, A. (2019). Microstructure optimization with constrained design objectives using machine learning-based feedback-aware data-generation. Computational Materials Science, 160, 334–351.
Paulson, N. H., Priddy, M. W., McDowell, D. L., & Kalidindi, S. R. (2017). Reduced-order structure-property linkages for polycrystalline microstructures based on 2-point statistics. Acta Materialia, 129, 428–438.
Pele, O., & Werman, M. (2010). The quadratic-chi histogram distance family. In European conference on computer vision (pp. 749–762). Springer.
Prechelt, L. (1998). Early stopping–but when? In Neural networks: Tricks of the trade (pp. 55–69). Springer.
Quey, R., Dawson, P., & Barbe, F. (2011). Large-scale 3D random polycrystals for the finite element method: Generation, meshing and remeshing. Computer Methods in Applied Mechanics and Engineering, 200(17–20), 1729–1745.
Quey, R., Villani, A., & Maurice, C. (2018). Nearly uniform sampling of crystal orientations. Journal of Applied Crystallography, 51(4), 1162–1173.
Ramprasad, R., Batra, R., Pilania, G., MannodiKanakkithodi, A., & Kim, C. (2017). Machine learning in materials informatics: Recent applications and prospects. NPJ Computational Materials, 3(1), 1–13.
Ray, R., Jonas, J. J., & Hook, R. (1994). Cold rolling and annealing textures in low carbon and extra low carbon steels. International Materials Reviews, 39(4), 129–172.
Rice, J. R. (1971). Inelastic constitutive relations for solids: An internal-variable theory and its application to metal plasticity. Journal of the Mechanics and Physics of Solids, 19(6), 433–455.
Ruff, L., Görnitz, N., Deecke, L., Siddiqui, S. A., Vandermeulen, R. A., Binder, A., Müller, E., & Kloft, M. (2018). Deep one-class classification. In International conference on machine learning.
Ruff, L., Kauffmann, J. R., Vandermeulen, R. A., Montavon, G., Samek, W., Kloft, M., Dietterich, T. G., & Müller, K.-R. (2021). A unifying review of deep and shallow anomaly detection. Proceedings of the IEEE, 109(5), 756–795.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by backpropagating errors. Nature, 323, 533–536.
Sakurada, M., & Yairi, T. (2014). Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis (pp. 4–11).
Schölkopf, B., Platt, J. C., Shawe-Taylor, J. C., Smola, A. J., & Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13, 1443–1471.
Schreijäg, S. (2012). Microstructure and mechanical behavior of deep drawing DC04 steel at different length scales. PhD thesis, Fakultät für Maschinenbau des Karlsruher Instituts für Technologie (KIT).
Simpson, T. W., Poplinski, J., Koch, P. N., & Allen, J. K. (2001). Metamodels for computer-based engineering design: Survey and recommendations. Engineering with Computers, 17(2), 129–150.
Storn, R., & Price, K. (1997). Differential evolution—A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359.
Tan, R. K., Zhang, N. L., & Ye, W. (2020). A deep learning-based method for the design of microstructural materials. Structural and Multidisciplinary Optimization, 61, 1417–1438.
Tax, D. M. J., & Duin, R. P. W. (2004). Support vector data description. Machine Learning, 54, 45–66.
Tomé, C., Canova, G. R., Kocks, U. F., Christodoulou, N., & Jonas, J. J. (1984). The relation between macroscopic and microscopic strain hardening in f.c.c. polycrystals. Acta Metallurgica, 32(10), 1637–1653.
Utkin, L. V., Zaborovsky, V. S., Lukashin, A. A., Popov, S. G., & Podolskaja, A. V. (2017). A Siamese autoencoder preserving distances for anomaly detection in multi-robot systems. In 2017 international conference on control, artificial intelligence, robotics & optimization (ICCAIRO) (pp. 39–44). IEEE.
Van Der Maaten, L., Postma, E., Van den Herik, J., et al. (2009). Dimensionality reduction: A comparative review. Journal of Machine Learning Research, 10(66–71), 13.
Von Schlippenbach, U., Emren, F., & Lücke, K. (1986). Investigation of the development of the cold rolling texture in deep drawing steels by ODF analysis. Acta Metallurgica, 34(7), 1289–1301.
Zhang, J., & Sanderson, A. C. (2009). JADE: Adaptive differential evolution with optional external archive. IEEE Transactions on Evolutionary Computation, 13(5), 945–958.
Acknowledgements
The authors would like to thank the German Research Foundation (DFG) for funding the presented work, which was carried out within research project number 415804944: 'Taylored Material Properties via Microstructure Optimization: Machine Learning for Modelling and Inversion of Structure-Property-Relationships and the Application to Sheet Metals'. We would also like to thank Jan Pagenkopf for providing the crystal plasticity routine on which the implemented Taylor-type material model is based.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Cite this article
Iraki, T., Morand, L., Dornheim, J. et al. A multi-task learning-based optimization approach for finding diverse sets of microstructures with desired properties. J Intell Manuf 35, 1887–1903 (2024). https://doi.org/10.1007/s10845-023-02139-8