Abstract
Optimization along the chain processing–structure–properties–performance is one of the core objectives in data-driven materials science. In this sense, processes are supposed to manufacture workpieces with targeted material microstructures. These microstructures are defined by the material properties of interest, and identifying them is a question of materials design. In the present paper, we address this issue and introduce a generic multi-task learning-based optimization approach. The approach enables the identification of sets of highly diverse microstructures for given desired properties and corresponding tolerances. Basically, the approach consists of an optimization algorithm that interacts with a machine learning model combining multi-task learning with siamese neural networks. The resulting model (1) relates microstructures and properties, (2) estimates the likelihood of a microstructure being producible, and (3) performs a distance-preserving microstructure feature extraction in order to generate a lower dimensional latent feature space that enables efficient optimization. The proposed approach is applied to a crystallographic texture optimization problem for rolled steel sheets given desired properties.
Introduction
Motivation
The demand for ever more specific and individually designed products with certain performance requirements has become a driving force in the world of manufacturing. For this reason, optimization along the causal chain processing–structure–properties–performance (Olson, 1997) became a fast-growing research topic in the field of integrated computational materials engineering (ICME) (Panchal et al., 2013). Nowadays, such optimization problems can be solved efficiently with the help of machine learning techniques (Ramprasad et al., 2017). Against this background, in a previous work, we investigated the use of reinforcement learning for finding optimal processing routes in a simulated metal forming process aiming to produce microstructures with targeted crystallographic textures (Dornheim et al., 2021). To bridge the remaining gap between microstructures and desired properties, in this work we focus on solving materials design problems. These are to identify appropriate material microstructures or microstructural features (e.g. the crystallographic texture) for given desired properties. It is thereby of particular importance to identify sets of near-optimal and preferably diverse microstructures in order to guarantee a robust design (McDowell, 2007).
Paper structure
In the following, we summarize the related work and point out the contribution of this paper to current research. In “Methods” section, we first describe the siamese multi-task learning and optimization approach. Then, we introduce the fundamentals in materials modeling that are needed for the purpose of this work. After that, in “Results” section, the results of applying the approach to a texture optimization problem for steel sheets are shown (in particular, we fit the material model to DC04 steel). In “Discussion” section, the presented results are discussed. Finally, in “Summary and Outlook” section, we summarize our findings and give an outlook on further research.
Related work
A generic approach to solve materials design problems is the microstructure sensitive design (MSD) approach introduced in Adams et al. (2001). Following Fullwood et al. (2010), MSD can be described by seven steps. First, the properties of interest as well as candidate materials have to be defined. After that, a suitable microstructure definition is applied for these materials yielding a microstructure design space. On this basis, relevant homogenization relations are identified and applied over the whole design space. The resulting property closure can be used to select desired properties, which are then mapped back to the microstructure design space in order to identify optimal microstructures. The last step of MSD aims to determine processes and processing routes needed to produce the identified microstructure.
The works by Adams et al. (2001) and Kalidindi et al. (2004) instantiate the MSD approach for texture optimization. The first describes how optimal crystallographic textures can be identified in order to improve the deformation behavior of a compliant beam. In the latter, a similar approach is shown to optimize the crystallographic texture for the design of an orthotropic plate. The core of both approaches lies in the usage of a lower dimensional spectral representation of the orientation distribution, cf. Bunge (2013). For more complex microstructure representations, like two-point correlations, feature extraction methods can be applied to reduce the dimensionality. Methods that are used in the context of materials design are, for example, principal component analysis (PCA) (Paulson et al., 2017; Gupta et al., 2015) and multidimensional scaling (Jung et al., 2019). A general review of dimensionality reduction techniques can be found in Van Der Maaten et al. (2009).
Besides the MSD approach, machine learning-based approaches for crystallographic texture optimization also exist. Liu et al. (2015) and Paul et al. (2019) describe iterative sampling approaches that interact with crystal plasticity simulations aiming to identify crystallographic textures for given desired properties. To this end, an initial set of texture–property tuples (crystallographic textures and corresponding macroscopic properties) is generated. Via supervised learning, significant features of the parameterized orientation distribution (and in Liu et al. (2015) also significant regions) are identified that yield optimal or near-optimal solutions. Based on the identified features and regions, new texture–property data points are sampled in order to get closer to the optima.
Another approach for identifying optimal textures is described in Kuroda and Ikawa (2004). Therein, a real-coded genetic algorithm (Goldberg, 1991) interacts with a crystal plasticity model in order to find optimal combinations of typical rolling texture components of face-centered cubic metals (Cu, Brass, S, Cube and Goss) for given desired properties. The algorithm starts with an initial set of textures consisting of different fractions of these components. The set of textures evolves iteratively via operators such as mutation, crossover and selection (Herrera et al., 1998).
Recent works (i.e., Jung et al. (2020) and Kamijyo et al. (2022)) use Bayesian optimization for microstructure design. In Kamijyo et al. (2022), a deep neural network is used for the estimation of mechanical properties. On this basis, Bayesian optimization is used to determine optimal volume fractions of texture components of aluminum (cf. Kuroda and Ikawa (2004)) for a desired formability. For designing complex microstructures, Jung et al. (2020) propose using the latent space of a convolutional autoencoder as a lower dimensional design space. Within this design space, Bayesian optimization is adopted to search for optimal dual-phase microstructures for given desired properties (i.e., tensile strength).
Predicting dual-phase microstructure properties using convolutional neural networks (CNN) is also pursued in Mann and Kalidindi (2022), in that case to explore the property space defined by the material stiffness. The CNN architecture was developed to approximate the highly nonlinear microstructure–property linkage, using two-point spatial correlation functions of the microstructure as input.
A further convolutional approach is described in Tan et al. (2020), in which a deep convolutional generative adversarial network (DCGAN) and a CNN are proposed for the design of materials. Therein, the CNN links the microstructures to their properties and acts as a surrogate model, whereas the DCGAN generates design candidates for a desired compliance tensor.
In summary, the solution of microstructure design problems requires a linkage from properties to microstructures. Such a linkage is often achieved by genetic or other optimization algorithms that interact with numerical simulations. However, as these algorithms generally need many function evaluations, it is not reasonable to apply them to complex numerical simulations directly. Instead, the performance can be increased by using numerically simpler surrogate models, see for example Simpson et al. (2001). Typically, these are supervised learning models that learn the input–output relations of the numerical simulation under consideration.
To run optimization algorithms in combination with supervised learning models, it is necessary to limit the region in which they operate to the region covered by the training data. One way to achieve this is by training unsupervised learning models on the input data, as is done in Jung et al. (2019), for example, using support vector machines (SVM). From a machine learning perspective, such an approach can be seen as anomaly detection. Anomaly detection aims to identify data that is characteristically different from the sample data set used for training. An extensive overview of anomaly detection methods is given in Chandola et al. (2009). Moreover, Ruff et al. (2021) and Chalapathy and Chawla (2019) give an overview of recent deep learning-based approaches for anomaly detection, from which we want to point out neural network-based autoencoders (Hinton & Salakhutdinov, 2006), which, unlike SVMs, fit especially well into multi-task learning (MTL) (Caruana, 1997) schemes.
Autoencoder approaches assume that features of a data set can be mapped into a lower dimensional latent feature space, in which the known data points differ substantially from unknown data points. By mapping back into the original space, anomalies can be identified by evaluating the reconstruction error, see for example Sakurada and Yairi (2014). Therein, it is also shown that autoencoder networks are able to detect subtle anomalies that cannot be detected by linear methods like PCA. Furthermore, autoencoder networks require less complex computations than nonlinear kernel-based PCA.
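The reconstruction-error criterion described above can be sketched as follows. For brevity, the neural autoencoder is replaced by its linear analogue (truncated PCA via SVD); the function names, the toy data, and the max-error threshold are illustrative choices, not part of the cited works.

```python
import numpy as np

def fit_linear_autoencoder(X, n_latent):
    """Fit a linear 'autoencoder' (truncated PCA) to data X.

    Returns the data mean and the top principal directions; a neural
    autoencoder would replace these with learned nonlinear maps."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:n_latent]          # tied encoder/decoder weights
    return mu, W

def reconstruction_error(x, mu, W):
    z = W @ (x - mu)           # encode into the latent space
    x_rec = W.T @ z + mu       # decode back to the input space
    return float(np.mean((x - x_rec) ** 2))

rng = np.random.default_rng(0)
# "normal" data lies close to a 2-D plane embedded in 10-D space
Z = rng.normal(size=(500, 2))
A = rng.normal(size=(2, 10))
X = Z @ A + 0.01 * rng.normal(size=(500, 10))

mu, W = fit_linear_autoencoder(X, n_latent=2)
# simple threshold: worst reconstruction error on the known data
threshold = max(reconstruction_error(x, mu, W) for x in X)

anomaly = rng.normal(size=10)  # a point off the learned manifold
is_anomalous = reconstruction_error(anomaly, mu, W) > threshold
```

Points near the training manifold reconstruct well, while points off it produce large errors and are flagged as anomalies.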
Recent developments in anomaly detection include deep learning approaches like the deep support vector data description method (Deep SVDD) (Ruff et al., 2018). Deep SVDD is an unsupervised anomaly detection method, which is inspired by kernel-based one-class classification and minimum volume estimation, and can be traced back to the traditional methods One-Class SVM (Schölkopf et al., 2001) and SVDD (Tax and Duin, 2004). In contrast to autoencoder approaches, Deep SVDD is based on an anomaly detection objective rather than relying on the reconstruction error: a neural network is trained to minimize the volume of a hypersphere that encloses the network representations of the data. By minimizing this objective, Deep SVDD finds a preferably small data-enclosing hypersphere and learns to extract the common factors of variation of the data distribution. The aim is that representations of normal data lie inside the hypersphere, while anomalous data points lie outside. Thereby, anomalies can be detected based on their distance to the centroid of the hypersphere.
The Improved AutoEncoder for Anomaly Detection (IAEAD) (Cheng et al., 2021) extends Deep SVDD by combining it with autoencoders. The autoencoder is used for the embedding of the features and to preserve the local structure of the data generating distribution, whereas Deep SVDD detects anomalies in the feature space. This is achieved by adding the Deep SVDD loss as a regularization term to the original autoencoder optimization objective (i.e. the minimization of the reconstruction error). However, instead of using the reconstruction error, IAEAD uses the distance to the centroid in feature space for anomaly detection, like the original Deep SVDD approach.
Another recently developed approach trains an autoencoder on image data by minimizing the reconstruction error (defined as the loss function) (Kwon et al., 2020). The trained model is used for anomaly detection by evaluating the gradients of the reconstruction error with respect to the neural network weights. Such gradients are generated through backpropagation to train neural networks by minimizing designed loss functions (Rumelhart et al., 1986). When new input data is fed into the neural network, the gradients originating from normal data cause only slight changes with respect to the neural network weights, whereas the gradients from anomalous data cause more drastic changes. Thus, anomalies can be detected by measuring, in terms of the gradients, how much of the input data does not correspond to the information learned by the network.
Summary of related work and contribution
Optimizing the crystallographic texture of sheet metal has been studied in various publications. So far, classic optimization approaches exist that operate on well-established crystallographic texture representations from the field of materials science (i.e. using texture components or a spectral decomposition). In addition, machine learning-based approaches have been developed in order to efficiently guide optimization algorithms to promising regions in texture space.
Regarding microstructure optimization in general, the usage of machine learning models has become popular in recent years. Often, supervised learning models are used to learn and replace time-consuming numerical simulations for property prediction. Furthermore, unsupervised learning models (often PCA) are used to reduce the dimensionality of complex microstructure representations. In the field of machine learning, however, more sophisticated approaches exist, such as nonlinear methods for feature extraction and MTL approaches that combine different learning tasks into one model, with the advantage of having a universal latent feature space.
For the processing of optimal microstructures (which is the next step in the processing–structure–properties chain), it is advantageous to identify not only one optimal microstructure for given desired properties, but a set of nearly optimal microstructures that is as diverse as possible. Such approaches are, however, lacking in the literature. A further important requirement for optimal processing, rarely addressed in microstructure optimization approaches, is to consider the producibility of the identified microstructures.
Therefore, in the present paper, we introduce a generic MTL-based optimization approach to efficiently identify sets of microstructures that are highly diverse and guaranteed to be producible by a dedicated manufacturing process. The approach is based on an optimization algorithm interacting with a machine learning model that combines MTL with siamese neural networks (Bromley et al., 1993). In contrast to Liu et al. (2015), Paul et al. (2019) and also Kuroda and Ikawa (2004), in our approach, a surrogate model is set up to replace the numerical simulation that maps microstructures to properties. The microstructure–property mapping can thus be executed efficiently by means of the surrogate model within the optimization procedure.
In order to increase the efficiency of the optimization approach, the microstructure representation is transformed into a lower dimensional latent feature space by a nonlinear data-driven encoder. The encoder in turn provides the input signal for three attached learning tasks of the MTL approach. The first learning task maps the features to properties (surrogate model). To address the issue of producibility, we include a second learning task, which estimates the validity of a microstructure in the sense of being producible (being part of the region enclosed by the underlying data set). The third learning task is the decoder for the microstructure representation.
As learning takes place simultaneously for the encoder and the attached tasks, it is ensured that the lower dimensional feature space is optimal for all tasks. In addition, we enforce the latent feature space to preserve microstructure distances by employing a siamese neural network and multidimensional scaling. On this basis, we force the optimizer to find a diverse set of optimal microstructures in the latent feature space.
Methods
Materials design via siamese multi-task learning (SMTL) and optimization
General concept
First of all, we present the general concept of our MTL-based optimization approach. The approach can be applied to general materials design problems and starts by defining the desired properties and corresponding tolerances. This in turn defines a target region, for which the approach is supposed to identify a diverse set of microstructures. The approach is schematically depicted in Fig. 1 and summarized in Table 1. It basically consists of three components: the optimizer, the microstructure–property mapping (mpm) and the validity prediction (vp). The optimizer generates candidate microstructures that minimize the combined costs, which result from evaluations based on the mpm and vp components.
The mpm component assigns properties to a candidate microstructure. The deviation of the assigned properties from the target region determines the cost. In general, the mpm component can be realized by a numerical simulation. However, since numerical simulations are computationally expensive, a surrogate model is used instead. The surrogate model is realized by a regression model that learns the relations from a priori generated microstructure–property data.
The vp component is realized by an anomaly detection method, which determines the validity of a candidate microstructure by comparing it to the set of valid microstructures. The concept of the anomaly detection is illustrated in Fig. 2. The vp component returns a value that can be seen as an estimate of a candidate microstructure being an element of the microstructure set under consideration. This is, for example, the set that can be produced by a dedicated process (e.g. rolling). The value returned by the vp component defines the validity cost and is supposed to drive the optimizer solution toward a valid microstructure region. Alternatively, such a microstructure region could be identified by a further optimization loop that interacts with a numerical simulation of the dedicated process, which, however, suffers from high computational costs.
The two components mpm and vp can be realized by training two separate machine learning models. However, when the training procedures are isolated from each other, the models are not able to mutually access information already learned by the other model. Therefore, we combine the two components as tasks into one MTL model (Caruana, 1997). Both tasks have a common backbone (the feature extraction part of a network) and different heads (the feature processing parts of a network) operating on the backbone output. The backbone output vectors form the so-called latent feature space.
The proposed MTL approach furthermore uses the backbone as the encoder network of an autoencoder, where a decoder is also attached to the latent feature space with the purpose of reconstructing the input pattern of the backbone. This is achieved by adding the reconstruction of the microstructures from the latent feature space as a third task. In the MTL approach, all three tasks are represented by a single neural network-based model. The neural weights of the model are trained simultaneously based on a combined loss function. After training the MTL model, the optimizer can operate very efficiently in the lower dimensional latent feature space.
However, the approach has some limitations. Since our concept is based on data-driven modeling (machine learning) and optimization, an adequate data set is required. The components learned within the concept are approximations of the numerical simulations and accordingly not equally exact. The model quality of the components depends on the size and quality of the underlying data set. Therefore, the application of an efficient sampling strategy for exploring the microstructure and property space can be suitable (Morand et al., 2022). Under the assumption of low model errors, however, the components can be efficiently used as surrogate models in the application of the concept (except for extrapolation).
The remainder of this section presents the optimization approach and the MTL approach in detail, as well as an extension based on siamese neural networks (Bromley et al., 1993) that enforces the representation of microstructures in the latent feature space to preserve the microstructure distances of the original representation space.
Multi-task learning (MTL)
The MTL model, as shown in Fig. 3, is trained on pairs of microstructures and corresponding properties \(({\varvec{x}},{\varvec{p}})\). The input microstructures are transformed into latent features \({\varvec{z}}\). The individual outputs of the connected tasks are the estimated properties \(\varvec{{\hat{p}}}\), the reconstructed microstructure \({\varvec{x}}^\prime \) and the reconstructed latent features \({\varvec{z}}^\prime \). In the following, we introduce the information processing scheme of the MTL model in detail.
The processing scheme starts with an encoder network, which extracts significant features by mapping the microstructure space \({\varvec{x}} \in {\mathbb {R}}^K\) into a lower dimensional latent feature space \({\varvec{z}} \in {\mathbb {R}}^M\) via the learned function
$$\begin{aligned} {\varvec{z}} = f_{\textrm{enc}} ({\varvec{x}}, \varvec{\theta }_{\textrm{enc}}), \end{aligned}$$(1)
in which the encoder network is parameterized by its weight values \(\varvec{\theta }_{\textrm{enc}}\). All three previously described tasks are attached to the encoder in the form of feedforward neural networks. Besides, by using convolutional neural networks (see Krizhevsky et al. (2012)), the encoder can be easily adapted to higher dimensional data types representing microstructures, like images (EBSD or micrograph images) or three-dimensional microstructure data. The latter is used in Cecen et al. (2018) in the materials science domain, for example.
To train the MTL model, a loss function that combines all three tasks is needed. This is achieved by a function that cumulates the loss terms of the three tasks, \({\mathscr {L}}_{\textrm{regr}}\) (regression loss), \({\mathscr {L}}_{\textrm{recon}}\) (reconstruction loss) and \({\mathscr {L}}_{\textrm{valid}}\) (validity loss), and weights them using \({\mathscr {W}}_{\textrm{regr}}\), \({\mathscr {W}}_{\textrm{recon}}\) and \({\mathscr {W}}_{\textrm{valid}}\) to allow for prioritization. The total loss function is defined as
$$\begin{aligned} {\mathscr {L}} = {\mathscr {W}}_{\textrm{regr}} {\mathscr {L}}_{\textrm{regr}} + {\mathscr {W}}_{\textrm{recon}} {\mathscr {L}}_{\textrm{recon}} + {\mathscr {W}}_{\textrm{valid}} {\mathscr {L}}_{\textrm{valid}} + \lambda R(\varvec{\theta }), \end{aligned}$$(2)
where \(R(\varvec{\theta })\) is a regularization term that is used to prevent overfitting, with the hyperparameter \(\lambda \) defining the strength of the regularization (also known as weight decay, see Krogh and Hertz (1991) and Hinton (1987)). Each of the feedforward neural networks is parameterized by the respective weight values \(\varvec{\theta }_{\textrm{enc}}\), \(\varvec{\theta }_{\textrm{regr}}\), \(\varvec{\theta }_{\textrm{recon}}\) and \(\varvec{\theta }_{\textrm{valid}}\), which are adjusted simultaneously during training and altogether form the weight vector \(\varvec{\theta }\). In the following, we introduce the three individual loss terms.

1.
The forward mapping of the latent feature vector \({\varvec{z}}\) to the properties vector \(\varvec{{\hat{p}}} \in {\mathbb {R}}^N\) is represented by the learned function
$$\begin{aligned} \varvec{{\hat{p}}} = f_{\textrm{regr}} ({\varvec{z}}, \varvec{\theta }_{\textrm{regr}}) = f_{\textrm{regr}}(f_{\textrm{enc}} ({\varvec{x}}, \varvec{\theta }_{\textrm{enc}}), \varvec{\theta }_{\textrm{regr}}). \end{aligned}$$(3)The regression loss is given by the mean squared error between the predicted properties \(\varvec{{\hat{p}}}\) and the true properties \({\varvec{p}}\):
$$\begin{aligned} {\mathscr {L}}_{\textrm{regr}} ({\varvec{p}}, \varvec{{\hat{p}}}) = \frac{1}{N} \sum _{i=1}^N ({p_i}  {{\hat{p}}_i} )^2, \end{aligned}$$(4)where N denotes the number of properties.

2.
The decoder network, which is responsible for the reconstruction, transforms the latent feature vectors \({\varvec{z}}\) back to the original microstructure space:
$$\begin{aligned} {\varvec{x}}^\prime = f_{\textrm{recon}}({\varvec{z}}, \varvec{\theta }_{\textrm{recon}}) = f_{\textrm{recon}}(f_{\textrm{enc}}({\varvec{x}}, \varvec{\theta }_{\textrm{enc}}), \varvec{\theta }_{\textrm{recon}}). \end{aligned}$$(5)The reconstruction loss is defined on the basis of a distance measure between two microstructural feature vectors \(\text {dist}({\varvec{x}}, \varvec{x^\prime } )\):
$$\begin{aligned} {\mathscr {L}}_{\textrm{recon}} ({\varvec{x}}, \varvec{x^\prime }) = \text {dist}({\varvec{x}}, \varvec{x^\prime } ). \end{aligned}$$(6)The distance measure depends on the microstructure representation and has to be chosen appropriately.

3.
On the basis of the latent feature space, an extra autoencoder network is set up, transforming \({\varvec{z}} \in {\mathbb {R}}^M\) into an even lower-dimensional feature subspace \({\varvec{s}} \in {\mathbb {R}}^S\) with \(S<M\) and back to \({\varvec{z}}^\prime \in {\mathbb {R}}^M\) via
$$\begin{aligned} \varvec{z^\prime } = f_{\textrm{valid}}({\varvec{z}}, \varvec{\theta }_{\textrm{valid}}) = f_{\textrm{valid}}(f_{\textrm{enc}}({\varvec{x}}, \varvec{\theta }_{\textrm{enc}}), \varvec{\theta }_{\textrm{valid}}). \end{aligned}$$(7)The validity loss is defined by the mean squared error between \({\varvec{z}}\) and \({\varvec{z}}^\prime \):
$$\begin{aligned} {\mathscr {L}}_{\textrm{valid}} ({\varvec{z}}, {\varvec{z}}^\prime ) = \frac{1}{M} \sum _{i=1}^M ({z_i}  {z_i}^\prime )^2. \end{aligned}$$(8)
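The three task losses and their weighted combination (Eqs. 2, 4, 6 and 8) can be sketched as follows; the function name, the default weights, and the Euclidean placeholder for the microstructure distance are illustrative assumptions, not values from this work.

```python
import numpy as np

def mtl_loss(p, p_hat, x, x_rec, z, z_rec, theta,
             w_regr=1.0, w_recon=1.0, w_valid=1.0, lam=1e-4,
             dist=None):
    """Combined MTL loss: weighted task losses plus weight decay (Eq. 2).

    `dist` stands for the microstructure distance of Eq. 6; a mean
    squared error is used here as a placeholder default."""
    if dist is None:
        dist = lambda a, b: float(np.mean((a - b) ** 2))
    l_regr = float(np.mean((p - p_hat) ** 2))    # regression loss, Eq. 4
    l_recon = dist(x, x_rec)                     # reconstruction loss, Eq. 6
    l_valid = float(np.mean((z - z_rec) ** 2))   # validity loss, Eq. 8
    reg = lam * float(np.sum(theta ** 2))        # L2 weight decay R(theta)
    return w_regr * l_regr + w_recon * l_recon + w_valid * l_valid + reg
```

With perfect predictions and reconstructions, only the regularization term remains, which is a quick sanity check for an implementation.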
Distance preserving feature extraction using siamese neural networks
The above described MTL approach is used in combination with an optimizer that searches the latent feature space for candidate microstructures with desired properties. However, our approach aims to identify a set of microstructures with high diversity. For the diversity quantification, a distance measure in the latent feature space is required. The MTL approach as defined above is not able to preserve the distances of the original space in the latent feature space. In order to construct a distance-preserving latent feature space, the MTL model is embedded in a siamese neural network (Bromley et al., 1993; Chicco, 2021), which we describe next.
Siamese neural networks consist of two identical networks, which share weights in the encoder part, see Fig. 4. Both networks embed different microstructures \(\textbf{x}_L\) and \(\textbf{x}_R\) as \(\textbf{z}_L\) and \(\textbf{z}_R\) in the latent feature space, which is finally processed by two identical MTL networks. The distance preservation is enforced by defining a distance preservation loss \({\mathscr {L}}_{\textrm{pres}}\) that minimizes the difference between the distance of two different input microstructures in the original space \(\text {dist}({\varvec{x}}_L, {\varvec{x}}_R)\) and the corresponding distance in the latent feature space \(\text {dist}({\varvec{z}}_L, {\varvec{z}}_R)\), with \({\varvec{x}}_L \ne {\varvec{x}}_R\) (Utkin et al., 2017):
$$\begin{aligned} {\mathscr {L}}_{\textrm{pres}} = \left( \text {dist}({\varvec{x}}_L, {\varvec{x}}_R) - \text {dist}({\varvec{z}}_L, {\varvec{z}}_R) \right) ^2, \end{aligned}$$(9)
where \(\text {dist}({\varvec{x}}_L, {\varvec{x}}_R)\) and \(\text {dist}({\varvec{z}}_L, {\varvec{z}}_R)\) are not necessarily the same distance measure. Applying such loss terms leads to multidimensional scaling, see Kruskal (1964) and Cox and Cox (2008). Using the distance preservation loss \({\mathscr {L}}_{\textrm{pres}}\), the MTL loss function defined in Eq. 2 is extended by the weighted preservation loss \({\mathscr {W}}_{\textrm{pres}} {\mathscr {L}}_{\textrm{pres}}\) to
$$\begin{aligned} {\mathscr {L}} = {\mathscr {W}}_{\textrm{regr}} {\mathscr {L}}_{\textrm{regr}} + {\mathscr {W}}_{\textrm{recon}} {\mathscr {L}}_{\textrm{recon}} + {\mathscr {W}}_{\textrm{valid}} {\mathscr {L}}_{\textrm{valid}} + {\mathscr {W}}_{\textrm{pres}} {\mathscr {L}}_{\textrm{pres}} + \lambda R(\varvec{\theta }). \end{aligned}$$(10)
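A minimal sketch of the distance preservation loss follows, assuming Euclidean distances in both spaces for illustration (as noted above, the two measures need not coincide); `preservation_loss` is a hypothetical helper name.

```python
import numpy as np

def preservation_loss(x_l, x_r, z_l, z_r):
    """Squared mismatch between the pair distance in the original space
    and in the latent space (the loss behind Eq. 9).

    Minimizing this term over many pairs amounts to multidimensional
    scaling of the latent embedding."""
    d_x = float(np.linalg.norm(x_l - x_r))  # original-space distance
    d_z = float(np.linalg.norm(z_l - z_r))  # latent-space distance
    return (d_x - d_z) ** 2
```

The loss vanishes exactly when the latent embedding reproduces the original pair distance, and grows quadratically with the mismatch.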
The SMTL approach delivers a function that maps a microstructure representation in the latent feature space to properties. Now, an optimizer can operate on a lower dimensional feature space to find microstructures with desired properties. The SMTL framework also allows reconstructing the original representation of microstructures, assessing the distances between them, and validating them in the latent feature space.
Microstructure optimizer
The microstructure optimization with respect to desired properties uses the distance-preserving SMTL framework with the tasks microstructure–property mapping, validity prediction and reconstruction. The optimization minimizes a loss function, which consists of the cost terms \({\mathscr {C}}_\textrm{prop}\), \({\mathscr {C}}_\textrm{valid}\) and \({\mathscr {C}}_\textrm{divers}\) and the corresponding weights \({\mathscr {V}}_\textrm{prop}\), \({\mathscr {V}}_\textrm{valid}\) and \({\mathscr {V}}_\textrm{divers}\):
$$\begin{aligned} {\mathscr {C}} = {\mathscr {V}}_\textrm{prop} {\mathscr {C}}_\textrm{prop} + {\mathscr {V}}_\textrm{valid} {\mathscr {C}}_\textrm{valid} + {\mathscr {V}}_\textrm{divers} {\mathscr {C}}_\textrm{divers}. \end{aligned}$$(11)
\({\mathscr {C}}_\textrm{prop}\), \({\mathscr {C}}_\textrm{valid}\) and \({\mathscr {C}}_\textrm{divers}\) denote the property, validity and diversity cost terms, respectively. While the property cost term drives the candidate microstructures to lie inside a specified target properties region, the validity cost ensures that the optimizer operates inside the region of valid microstructures, and the diversity cost ensures that candidate microstructures differ from each other. To minimize the loss function, we use genetic algorithms, which generate a population set of P candidate microstructures \(\varvec{{\tilde{z}}}^*\) in the latent feature space in every iteration. The three cost terms are described in more detail in the following.

1.
The property cost is defined by the mean squared error between the desired properties and the predicted properties from the SMTL regression model:
$$\begin{aligned} {\mathscr {C}}_\textrm{prop} = \frac{1}{N} \sum _{i=1}^N (\widetilde{{\mathscr {C}}}_{\textrm{prop},i} )^2. \end{aligned}$$(12)If the i-th predicted property lies inside the target region, the cost \(\widetilde{{\mathscr {C}}}_{\textrm{prop},i}\) equals 0. Otherwise, \(\widetilde{{\mathscr {C}}}_{\textrm{prop},i}\) equals the minimum distance from the predicted property to the border of the target region.

2.
The validity prediction is used to assess whether an identified candidate microstructure is likely to be represented by the sample data set. The validity cost is defined by
$$\begin{aligned} {\mathscr {C}}_\textrm{valid} = \textrm{max}({\mathscr {A}}  \xi _\textrm{valid}, 0), \end{aligned}$$(13)in which \(\xi _\textrm{valid}\) is a threshold to define the maximum tolerated reconstruction error for valid textures and \({\mathscr {A}}\) denotes the anomaly score
$$\begin{aligned} {\mathscr {A}} = \frac{1}{M} \sum _{i=1}^M ({z^*_i}  z^{*\prime }_i )^2. \end{aligned}$$(14) 
3.
The diversity cost is based on the sum of the distances between the candidate microstructure \({\varvec{z}}^*\) in the latent feature space and every other microstructure in the population:
$$\begin{aligned} {\mathscr {C}}_\textrm{divers} =  \sum _{i=1}^P \text {dist}({\varvec{z}}^*_i, {\varvec{z}}^*), \end{aligned}$$(15)in which for \(\text {dist}({\varvec{z}}^*_i, {\varvec{z}}^*)\) the same distance measure has to be used as for the latent feature vectors in Eq. 9.
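The three cost terms and their weighted combination (Eqs. 11-15) can be sketched as follows; the weight values, the threshold, and the function names are illustrative assumptions, and the Euclidean norm stands in for the latent distance measure.

```python
import numpy as np

def property_cost(p_hat, lower, upper):
    """Eq. 12: zero inside the target box [lower, upper], squared
    distance to its border otherwise, averaged over the N properties."""
    d = np.maximum(lower - p_hat, 0.0) + np.maximum(p_hat - upper, 0.0)
    return float(np.mean(d ** 2))

def validity_cost(z, z_rec, xi_valid):
    """Eqs. 13-14: anomaly score of the validity autoencoder, hinged
    at the tolerance threshold xi_valid."""
    score = float(np.mean((z - z_rec) ** 2))
    return max(score - xi_valid, 0.0)

def diversity_cost(z, population):
    """Eq. 15: negative sum of latent distances to the other candidates,
    so that minimization pushes candidates apart."""
    return -float(sum(np.linalg.norm(z - z_i) for z_i in population))

def total_cost(p_hat, lower, upper, z, z_rec, population,
               v_prop=1.0, v_valid=1.0, v_divers=0.1, xi_valid=0.05):
    # Weighted sum minimized by the genetic algorithm (Eq. 11); the
    # weights and threshold here are illustrative, not paper values.
    return (v_prop * property_cost(p_hat, lower, upper)
            + v_valid * validity_cost(z, z_rec, xi_valid)
            + v_divers * diversity_cost(z, population))
```

The hinge in the validity cost leaves candidates with a small anomaly score unpenalized, so the optimizer is only pushed back once it leaves the region of valid microstructures.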
Materials science fundamentals
Representation of crystallographic texture
Crystallographic texture is typically described by the orientation distribution function, which is defined by
$$\begin{aligned} f(g) = \frac{1}{V} \frac{\textrm{d}V(g)}{\textrm{d}g} \end{aligned}$$(16)
for an orientation g (a point in SO(3)) and the volume V(g) in SO(3). The orientation distribution function f(g) is often subject to specific symmetry conditions, under which various regions in SO(3) are equivalent. Therefore, depending on the symmetries, orientations can be mapped into an elementary region of SO(3), the so-called fundamental zone. The orientation distribution function on the basis of the orientations mapped into the fundamental zone is indistinguishable from the original orientation distribution function. Rolling textures, for example, are subject to cubic crystal and orthorhombic sample symmetry, for which 96 elementary regions exist (Hansen et al., 1978).
A popular way to represent the orientation distribution function is to approximate it via generalized spherical harmonic functions (Bunge, 2013). Yet, as there is no straightforward way to measure the distance between two orientation distribution functions in terms of generalized spherical harmonics, we make use of the orientation histogram-based texture descriptor introduced in Dornheim et al. (2021). To this end, the cubic fundamental zone is discretized into a set O of J nearly uniformly distributed orientations \(o_j\). For each individual orientation g in a set of orientations G, a weight vector \(w_\textrm{g}\) is constructed via a soft-assignment approach
where \(N_l\) is the set of l nearest neighbor orientations of g in terms of the orientation distance \(\varPhi \). The orientation distance between two orientations g and o is defined by
where \({\overline{g}}\) and \({\overline{o}}\) are from the sets of all equivalent orientations of g and o in terms of cubic crystal symmetry. The orientation distance measure in SO(3) is defined as
where \(q_\textrm{g}\) and \(q_\textrm{o}\) are the quaternion representations of the orientations g and o (Huynh, 2009).
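The quaternion-based distance and its minimization over symmetry equivalents can be sketched as follows. The Hamilton-product helper and the idea of passing the list of crystal symmetry quaternions explicitly are illustrative assumptions; only the distance formula \(2\arccos (|\langle q_1, q_2\rangle |)\) follows Huynh (2009).

```python
import numpy as np

def quat_mul(q, r):
    """Hamilton product of two unit quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def orientation_distance(q_g, q_o, sym_quats):
    """Rotation distance 2*arccos(|<q1, q2>|) (Huynh, 2009), minimized
    over the crystal symmetry equivalents of q_g supplied by the caller."""
    best = np.pi
    for s in sym_quats:
        q_eq = quat_mul(s, q_g)
        d = 2.0 * np.arccos(min(abs(float(np.dot(q_eq, q_o))), 1.0))
        best = min(best, d)
    return best
```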
On this basis, the weight vector for the orientation histogram \({\varvec{b}}\) can be calculated by a volume average of the weight vectors of the individual orientations
The distance between two orientation distribution functions can be measured via any kind of histogram-based distance measure, such as the Chi-Squared distance (Pele and Werman, 2010)
The set of nearly uniformly distributed orientations O, needed for the histogram-based texture descriptor, can be generated using the algorithm described in Quey et al. (2018), which is implemented in the software neper (Quey et al., 2011). For the purpose of this study, we sample 512 nearly uniformly distributed orientations over the cubic fundamental zone and choose a soft assignment of \(l=3\).
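The descriptor construction and the Chi-Squared distance can be sketched as follows. The inverse-distance weighting over the l nearest reference orientations is an assumption (the exact soft-assignment weights follow Dornheim et al., 2021), and a precomputed distance matrix stands in for the orientation distance calls.

```python
import numpy as np

def soft_assign_histogram(dist_matrix, l=3):
    """Orientation histogram via soft assignment: each of the n sampled
    orientations spreads a unit weight over its l nearest reference
    orientations. dist_matrix has shape (n, J); inverse-distance weights
    are an assumption of this sketch."""
    n, J = dist_matrix.shape
    hist = np.zeros(J)
    for row in dist_matrix:
        nn = np.argsort(row)[:l]
        w = 1.0 / (row[nn] + 1e-12)   # avoid division by zero
        hist[nn] += w / w.sum()
    return hist / n                    # volume average over n orientations

def chi_squared_distance(b1, b2):
    """Chi-Squared histogram distance (Pele and Werman, 2010)."""
    denom = b1 + b2
    mask = denom > 0
    return 0.5 * float(np.sum((b1[mask] - b2[mask]) ** 2 / denom[mask]))
```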
Crystallographic texture of steel sheets
After rolling, body centered cubic (bcc) materials typically form so-called fiber textures. Following Ray et al. (1994), these textures are composed of the five fibers \(\alpha \), \(\gamma \), \(\eta \), \(\epsilon \), and \(\beta \), which are defined in detail in Table 2. Among these fibers, the \(\alpha \) and \(\gamma \) fibers are the most prominent (Kocks et al., 1998), whereas the presence of the \(\beta \) fiber is only reported from theoretical predictions (Von Schlippenbach et al., 1986). To give an idea of how the fibers affect forming properties, we refer to Ray et al. (1994), where it is found that the \(\gamma \) fiber causes good deep drawability, while the \(\alpha \) fiber has the contrary effect.
In order to generate a data base of (artificial) rolling textures, a 25-parameter model proposed in Delannay et al. (1999) for describing steel sheet textures is used in this work. The model is based on textures composed of the fibers \(\alpha \), \(\gamma \), and \(\eta \). As the \(\eta \) fiber is not always present in steel sheet textures, we limit ourselves to textures that consist of an \(\alpha \) and a \(\gamma \) fiber. Therefore, 6 of the 25 parameters can be neglected.
The texture model describes the orientation distribution function as a set of weighted Gaussian distributions placed along the fibers. The model parameters \(D_i\) are listed in Table 3 and define the standard deviations and mean values of the distributions based on the fiber thickness and the shifts from their ideal positions. Furthermore, the model parameters define the weights of the distributions among each other based on the probability given by the orientation distribution function, which we will call fiber intensity in the following.
To construct the set of Gaussian distributions, the seven base distributions from Table 3 are placed at their ideal positions with respect to the shifts. Between these seven distributions, further distributions are placed with a distance of about \(3^\circ \) to each other, leading to 41 Gaussian distributions overall. Their weights \(w_i\), standard deviations \(\sigma _i\) and mean values \(\mu _i\) are interpolated linearly based on the values of the two neighboring base distributions. This yields a set of Gaussian distributions \({\mathcal {N}}_1(\mu _1,\sigma _1),..., {\mathcal {N}}_{41}(\mu _{41},\sigma _{41})\). The orientation distribution function f(g) is defined by the normalized sum of this set:
Based on this definition, discrete orientations can be sampled. In the following, we denote the set of orientations as G. As f(g) is defined in the cubic-orthorhombic fundamental zone, it is necessary to add the equivalent orientations regarding the orthorhombic sample symmetry to the set of discrete orientations. This is done by applying rotation operations \(g_s\) to each orientation \(g_i\) in G
The rotation operations \(g_s\) for orthorhombic sample symmetry can be found in Hansen et al. (1978).
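As a minimal sketch, the orthorhombic sample symmetry group consists of the identity and 180-degree rotations about the three sample axes (RD, TD, ND), cf. Hansen et al. (1978). Whether \(g_s\) acts from the right (as here) or from the left depends on the chosen orientation convention, so that detail is an assumption.

```python
import numpy as np

# Orthorhombic sample symmetry: identity plus 180-degree rotations
# about the three sample axes (RD, TD, ND).
SAMPLE_SYMMETRY_OPS = [np.eye(3),
                       np.diag([1.0, -1.0, -1.0]),
                       np.diag([-1.0, 1.0, -1.0]),
                       np.diag([-1.0, -1.0, 1.0])]

def apply_sample_symmetry(orientations):
    """Expand a set of orientation matrices by the sample symmetry
    operators, yielding 4 equivalents per orientation."""
    return [g @ g_s for g in orientations for g_s in SAMPLE_SYMMETRY_OPS]
```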
Material model
The sheet metal properties on which we focus in this study are the Young's moduli and the r-values at 0, 45 and 90 degrees to the rolling direction. The properties are calculated by applying uniaxial tension to a crystal plasticity-based material model. As time efficiency is essential for the generation of data, a material model of Taylor type is implemented, as described in Dornheim et al. (2021).
The Taylortype material model is based on the volume averaged stress of a set of n crystals (Kalidindi et al., 1992):
In the above equation, \({\varvec{T}}\) denotes the Cauchy stress tensor, which can be derived from the stress tensor in the intermediate configuration, given by
with the second order identity tensor \({\varvec{I}}\) and the fourth order elastic stiffness tensor \({\mathbb {C}}\). The elastic constants \(C_{11}\), \(C_{12}\) and \(C_{44}\) are set to 218.37, 131.13 and 105.34 GPa, respectively (Eghtesad and Knezevic, 2020). \({\varvec{F}}_\textrm{e}\) is the elastic part of the deformation gradient \({\varvec{F}}\) and can be calculated by a multiplicative decomposition
The intermediate stress tensor can be converted into Cauchy stress using the relation
To describe the evolution of the plastic deformation, the plastic part of the velocity gradient \({\varvec{L}}_\textrm{p}\) is considered, which is given by
and the flow rule (Rice, 1971)
where \({\dot{\gamma }}^{(\eta )}\) denotes the shear rates on the active slip systems \(\eta \), defined by the slip plane normal \({\varvec{n}}^{(\eta )}\) and the slip direction \({\varvec{m}}^{(\eta )}\). For bcc materials, the slip system families in terms of the Miller index are {110}<111>, {112}<111>, and {123}<111>, where the latter is neglected for simplicity.
The shear rates are defined by a phenomenological power-law (Asaro & Needleman, 1985):
where \(r^{(\eta )}\) is the slip system resistance, \({\dot{\gamma }}_0\) the reference shear rate and m the shear rate sensitivity. Here, \({\dot{\gamma }}_0\) and m are set to 0.001 \(\hbox {sec}^{-1}\) and 0.0125, respectively (Pagenkopf et al., 2016). Following Schmid's law, the resolved shear stress on slip system \(\eta \), \(\tau ^{(\eta )}\), is given by
and the evolution of the slip system resistance is defined by
The matrix \(q_{\eta \xi }\) describes the ratio between self and latent hardening. It consists of diagonal elements equal to 1.0 and off-diagonal elements \(q_1\) and \(q_2\), cf. Baiker et al. (2014). Both \(q_1\) and \(q_2\) are set to 1.4 (Asaro and Needleman, 1985). Further, the hardening behavior is realized by an extended Voce-type model (Tome et al., 1984):
The material-dependent parameters are calibrated to DC04 steel^{Footnote 1} and are \(\tau _0=94.9\) MPa, \(\tau _1=50\) MPa, \(\vartheta _0=258\) MPa and \(\vartheta _1=32.8\) MPa (Pagenkopf, 2019). The accumulated plastic shear is defined by
Although material parameters for DC04 steel are used in this study, it should be noted that the described Taylor-type crystal plasticity model and the texture generation approach can be applied to any kind of metallic material with bcc crystal structure.
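The power-law flow rule and the Voce-type hardening described above can be sketched with the quoted DC04 parameters. The closed-form Voce curve used here is the commonly cited form of Tome et al. (1984) and is an assumption insofar as the paper's exact equation is not reproduced in this section.

```python
import numpy as np

# DC04 parameters quoted in the text (Pagenkopf, 2019; Pagenkopf et al., 2016)
TAU_0, TAU_1 = 94.9, 50.0          # MPa
THETA_0, THETA_1 = 258.0, 32.8     # MPa
GAMMA_DOT_0, M = 0.001, 0.0125     # reference shear rate (1/s), sensitivity

def shear_rate(tau, r):
    """Power-law flow rule (Asaro & Needleman, 1985):
    gamma_dot = gamma_dot_0 * |tau / r|**(1/m) * sign(tau)."""
    return GAMMA_DOT_0 * np.abs(tau / r) ** (1.0 / M) * np.sign(tau)

def voce_resistance(gamma_acc):
    """Extended Voce hardening curve, assuming the common form
    tau_0 + (tau_1 + theta_1 * G) * (1 - exp(-G * theta_0 / tau_1))."""
    return TAU_0 + (TAU_1 + THETA_1 * gamma_acc) * (
        1.0 - np.exp(-gamma_acc * THETA_0 / TAU_1))
```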
Results
Texture-property data set
For training, 50000 sets of 2000 discrete orientations are sampled via a Latin hypercube design (McKay et al., 1979), based on Eq. 22. In order to have an independent test set, a further 10000 sets are generated randomly. The ranges inside which the parameters of the texture model vary are adjusted manually such that typical bcc rolling textures found in the literature (Das, 2017; Hölscher et al., 1991; Inagaki and Suda, 1972; Kestens and Pirgazi, 2016; Klinkenberg et al., 1992; Kocks et al., 1998; Pagenkopf et al., 2016) are covered. The parameter ranges are listed in Table 4. In addition, to evaluate the anomaly detection, a set of artificial textures is needed that differs slightly from the generated rolling textures. For this purpose, 10000 anomalies are generated by shifting the \(\alpha \) fiber (i.e. the ideal positions of \(a_1\), \(a_2\), \(a_4\) and \(a_5\)) by 20 degrees in the \(\varphi _1\) direction.
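A minimal Latin hypercube sketch in the spirit of McKay et al. (1979); the parameter bounds passed in are hypothetical stand-ins for the ranges of Table 4, and this is not the authors' sampling code. In practice, a library routine such as scipy.stats.qmc.LatinHypercube could replace it.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, seed=None):
    """Latin hypercube design: per dimension, exactly one sample falls
    into each of n_samples equal strata, with the strata shuffled
    independently. bounds is a (d, 2) array of [low, high] pairs."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    d = len(bounds)
    # one permutation of the strata indices per dimension
    strata = rng.permuted(np.tile(np.arange(n_samples), (d, 1)), axis=1).T
    u = (strata + rng.random((n_samples, d))) / n_samples  # in [0, 1)
    return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])
```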
Moreover, we validate the texture-property mapping and the validity prediction on experimental data. For this purpose, an experimentally measured texture of cold rolled DC04 steel from Schreijäg (2012) is used. Based on this measurement, an orientation distribution function is approximated via the MATLAB toolbox mtex (Bachmann et al., 2010), rotated into its symmetry axes assuming orthorhombic sample symmetry, and mirrored. To visualize the \(\alpha \) and \(\gamma \) fibers of the orientation distribution function, an intersection plot of the Euler space at \(\varphi _2=45^\circ \) is depicted in Fig. 5.
For the generated textures in the training and test sets, the corresponding Young's moduli and r-values at 0, 45, and 90 degrees to the rolling direction are determined using the Taylor-type crystal plasticity model described in the "Material model" section. Both quantities, the Young's modulus and especially the r-value, are highly affected by the crystallographic texture, which is why they are chosen as examples for the purpose of this study.
Validation of SMTL
In this study, the individual tasks of the SMTL model are realized via feedforward neural networks with tanh activation functions to obtain features between \(-1\) and \(+1\) in the latent feature space. The SMTL model is implemented based on the Python TensorFlow API (Abadi et al., 2015). The base network of the siamese architecture is illustrated in Fig. 6. The Glorot normal method (Glorot & Bengio, 2010) is used for weight initialization. In order to adjust the hyperparameters, a random search method (Bergstra & Bengio, 2012) is applied using 5-fold cross-validation.
The best model configuration that was found is shown in Table 5. We use the ChiSquared distance introduced in Eq. 21 as distance measure in the input space. In the latent feature space, we use the sum of squared errors (SSE) between two vectors \({\varvec{z}}_{\textrm{1}}\) and \({\varvec{z}}_{\textrm{2}}\) as distance measure
The SMTL model is trained for 200 epochs, while the best intermediate result on the test set is retained, which can be interpreted as early stopping (Prechelt, 1998). Before the model training is executed, the loss terms are scaled to values between 0 and 1 in order to make them comparable. The following weights for the scaled loss terms were determined by hyperparameter optimization: \({\mathscr {W}}_\textrm{regr} = 0.05\), \({\mathscr {W}}_\textrm{recon} = 0.05\), \({\mathscr {W}}_\textrm{valid} = 0.05\) and \({\mathscr {W}}_\textrm{pres} = 0.85\).
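The latent-space distance measure and the weighted combination of the scaled loss terms can be sketched as follows; treating the loss terms as already-scaled scalars is a simplification of the actual TensorFlow training loop.

```python
import numpy as np

def sse_distance(z1, z2):
    """Sum of squared errors between two latent feature vectors."""
    z1, z2 = np.asarray(z1, dtype=float), np.asarray(z2, dtype=float)
    return float(np.sum((z1 - z2) ** 2))

def smtl_loss(l_regr, l_recon, l_valid, l_pres,
              w_regr=0.05, w_recon=0.05, w_valid=0.05, w_pres=0.85):
    """Weighted sum of the four pre-scaled SMTL loss terms, with the
    weights reported from hyperparameter optimization as defaults."""
    return (w_regr * l_regr + w_recon * l_recon
            + w_valid * l_valid + w_pres * l_pres)
```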
The results for the texture-property mapping and the distance preservation are shown in Table 6, in which the regression errors \(\hbox {MAE}_\textrm{E}\) and \(\hbox {MAE}_\textrm{r}\) denote the mean absolute errors between the true and predicted Young's moduli and r-values depending on the dimension of the latent feature space \({\varvec{z}}\). The quality of the distance preservation is measured by the coefficient of determination \(R^2\) between the distances of two input textures and the distances of their corresponding latent feature vectors
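One plausible reading of this coefficient of determination, treating the latent-space distances as predictions of the corresponding input-space distances over all evaluated texture pairs; the exact pairing scheme is not specified in this section, so this is an assumption.

```python
import numpy as np

def distance_preservation_r2(d_input, d_latent):
    """R^2 between pairwise input-space distances and the corresponding
    latent-space distances; 1.0 means perfect distance preservation."""
    d_input = np.asarray(d_input, dtype=float)
    d_latent = np.asarray(d_latent, dtype=float)
    ss_res = np.sum((d_input - d_latent) ** 2)
    ss_tot = np.sum((d_input - d_input.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```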
It is shown that texture-property mappings with an adequate prediction quality can be achieved even when the dimensionality of the latent feature space is reduced extensively. However, regarding the distance preservation quality, a lower bound of at least 10 latent features can be identified, below which the distance preservation is unsatisfactory. Additionally, the texture-property mapping is evaluated on the experimentally measured texture and the corresponding properties. The results are listed in Table 6. It can be seen that a satisfactory prediction quality (Regr. \(\hbox {MAE}_\textrm{E} \le 1000\) MPa and Regr. \(\hbox {MAE}_\textrm{r} \le 0.1\)) can only be achieved with at least 16 latent features.
On the basis of this 16-dimensional feature space, the validity prediction is evaluated. The anomaly scores for the textures in the test set and for the artificially generated anomalies are shown in Fig. 7. It can be seen that the anomalies can be separated sufficiently well from the textures in the test set.
Rolling texture identification
To validate the texture identification, we define two target regions in property space, see Fig. 8. The first one is defined by the properties of the experimentally measured texture, which lies in a sparsely populated region, and is labeled Target Region 1. As a consequence of its location in the sparsely populated region, the anomaly score of this texture is 0.0099 and lies in the transition zone, shifted towards the generated anomalies (cf. Fig. 7). It is of interest whether the optimizer is generally able to find a whole set of microstructures with properties in this region. The second target region represents a densely populated region located near the center of the properties point cloud and is labeled Target Region 2. The center of each target region is listed in Table 7. The target regions are defined by adding a tolerance of \(\pm 1000\) MPa to the Young's moduli and \(\pm 0.10\) to the r-values, yielding a sufficiently small properties window from an engineering point of view. As a baseline, we collect all data points from the training set that lie inside the target regions. Only two textures can be found in Target Region 1, whereas 13 textures can be found in Target Region 2.
To identify a diverse set of textures, we use the optimization algorithm JADE (Zhang and Sanderson, 2009), which is an extension of the differential evolution algorithm (Storn and Price, 1997). Before starting the optimization via JADE, an initial population has to be selected: for this, 100 textures are sampled from the test set, approximately uniformly distributed over the property space. For the cost function, defined in Eq. 11, we use the weights \({\mathscr {V}}_\textrm{prop}=0.90\), \({\mathscr {V}}_\textrm{valid}=0.03\) and \({\mathscr {V}}_\textrm{divers}=0.07\) and scale \({\mathscr {C}}_\textrm{prop}\) and \({\mathscr {C}}_\textrm{divers}\) to values between 0 and 1 based on the selected 100 initial textures. The threshold \(\xi _\textrm{valid}\) is set to 0.01 based on the maximum anomaly score in the data set, cf. Fig. 7. The optimization is performed for 300 iterations with a fixed population size of 100. During the optimization, all valid textures that fulfill the target properties according to the texture-property mapping are collected. Based on the results from the previous section, we use the trained SMTL model with a 16-dimensional latent feature space.
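The optimization loop can be sketched with a plain DE/rand/1/bin scheme, which is a simplified stand-in for JADE (JADE additionally adapts its control parameters f and cr and maintains an external archive); the cost callback and population shapes are illustrative.

```python
import numpy as np

def differential_evolution(cost, init_pop, n_iter=300, f=0.5, cr=0.9, seed=0):
    """Minimal DE/rand/1/bin loop over a fixed-size population."""
    rng = np.random.default_rng(seed)
    pop = np.array(init_pop, dtype=float)
    costs = np.array([cost(x) for x in pop])
    n, d = pop.shape
    for _ in range(n_iter):
        for i in range(n):
            # three mutually distinct members, excluding the target i
            a, b, c = pop[rng.choice([j for j in range(n) if j != i],
                                     size=3, replace=False)]
            mutant = a + f * (b - c)
            cross = rng.random(d) < cr
            cross[rng.integers(d)] = True   # at least one gene from mutant
            trial = np.where(cross, mutant, pop[i])
            c_trial = cost(trial)
            if c_trial <= costs[i]:         # greedy selection
                pop[i], costs[i] = trial, c_trial
    best = int(np.argmin(costs))
    return pop[best], float(costs[best])
```

In the paper's setting, `cost` would combine the property, validity and diversity terms of Eq. 11, evaluated on the 16-dimensional latent vectors.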
Target region 1
Our approach is able to find a diverse set of textures that meets the property requirements of Target Region 1, according to the texture-property mapping. Figure 9 depicts the mutual distances in the latent feature space between all found textures and between the two baseline textures. It is shown that the set of identified textures contains 221 diverse textures, in contrast to only two in the baseline set. In order to compare the results to the experimentally measured texture, the texture closest to the center point of Target Region 1 is depicted in Fig. 10 as a section through the Euler space at \(\varphi _2 = 45^\circ \). Comparing the two textures, it can be seen that they are roughly the same in terms of the magnitude of the intensities and the shape of the \(\alpha \) and \(\gamma \) fibers. However, they also show differences in terms of smoothness and the location of the intensity peaks.
Target region 2
Compared to Target Region 1, an even more diverse set of 1315 textures can be identified for Target Region 2, as can be seen in the histogram of the mutual distances in Fig. 11. To get an idea of the differences between the textures, two exemplary textures are plotted in Fig. 12 as sections through the Euler space at \(\varphi _2 = 45^\circ \). It can be seen that the \(\alpha \) and \(\gamma \) fibers of both textures differ significantly in terms of intensity. However, the locations of the intensity peaks and the thicknesses of the \(\alpha \) and \(\gamma \) fibers are similar.
Discussion
The results presented in the "Validation of SMTL" section show that the two tasks, texture-property mapping and validity prediction, are solved by the SMTL model. To achieve a sufficient prediction quality for both tasks on the test set as well as for the experimentally measured texture, a minimum dimensionality of the latent feature space is needed. Here, the dimensionality requirements of the siamese distance preservation goal also have to be considered. 16 latent features were found to be sufficient for our example task regarding the texture of cold rolled bcc steel sheets.
However, the prediction error for the experimentally measured texture is higher than for the test set using the same latent feature space dimensionality. This can be explained by the fact that the corresponding property lies in a texture space region with low sampling density, so the model is not well supported by data there. This also results in an instability of the model quality in this region depending on the dimensionality of the latent feature space. This instability can be seen by studying the r-value in Table 6. By choosing a latent feature space size of 16, the results for the experimentally measured texture are also satisfactory, especially keeping in mind that the experimentally measured texture naturally differs from the artificially generated data and additionally lies in a sparsely sampled region, cf. Target Region 1 in Fig. 8.
Due to the sparsity of Target Region 1, the identification of textures in this region is challenging. Nevertheless, the optimization approach is able to identify a set of textures that contains more diverse individuals than the two baseline textures from the training set. Regarding the identified texture that is closest to the experimentally measured texture in terms of properties, one can see that the two are also similar in terms of crystallographic texture, which essentially proves the concept of our approach.
The most obvious difference between the two textures is their smoothness. The irregular distribution of intensity peaks in the identified texture is due to the resolution of the histogram-based texture descriptor. Also, the orthorhombic sample symmetry is not represented locally. However, both issues can be solved by increasing the resolution. Furthermore, a higher resolution of the descriptor decreases the descriptor error, which reflects the deviation between the properties of the original texture and the properties of the texture described by the descriptor. The choice of resolution is, however, a trade-off between accuracy and descriptor complexity. Generally, with the use of the SMTL model and the incorporated feature extraction, the resolution is limited only by computational power.
Compared to Target Region 1, the identification task for Target Region 2 is less challenging, as the target region is located in a densely sampled region. However, as a proper set of diverse textures already exists in the baseline, the main challenge is to outperform the baseline set in terms of diversity. Figure 11 shows that the materials design problem (the identification of multiple equivalent microstructures/textures) is accomplished by the optimization approach. This is illustrated by comparing two of the identified textures in Fig. 12 with each other: similar properties can be reached by different microstructures. The identification of such a highly diverse set of microstructures with similar properties is an important precondition for constructing robust optimizing process control algorithms, which need to choose among multiple optimal paths leading to desired properties.
Summary and outlook
In this work, we present an approach to solve materials design problems. The approach is based on an optimization strategy that incorporates machine learning models for mapping microstructures to properties and for assessing the validity of input microstructures in the sense of their likelihood of being represented by the underlying data. To model these tasks, we use a siamese multi-task learning (SMTL) neural network model. Furthermore, we incorporate feature extraction in order to transform input microstructures to a lower dimensional latent feature space, in which an optimizer (identifying microstructures with dedicated properties) can operate efficiently.
By training the SMTL model with a dedicated loss function term, we are able to preserve the distances between microstructures in the original input space also in the latent feature space. The distance preservation allows the diversity of the solution set (found by the optimizer) to be assessed directly in the latent feature space and therefore enables optimizers to efficiently identify sets of diverse microstructures. By applying the approach to crystallographic texture optimization, we show the ability to identify diverse sets of textures that lie within given property bounds. Such sets of textures form the input of optimal processing control approaches like in Dornheim et al. (2021).
In the present work, we applied our approach to data from mean-field simulations. The next step is to apply the approach to spatially resolved data from full-field simulations. The proposed methods can easily be extended to this task by modifying the encoder part of the SMTL model. However, the problem arises that typically fewer data can be generated via full-field simulations. Nevertheless, such sparse high-quality data can be used to support the modeling with lower-quality data. Concepts to incorporate multi-fidelity data (Batra, 2021) in our SMTL model will be considered in the future.
Data availability
The data used to validate the SMTL approach is made available via the Fraunhofer repository Fordatis at https://fordatis.fraunhofer.de/handle/fordatis/204 (Morand et al., 2021).
Notes
Experiments performed at IUL Dortmund during DFG project Graduate School 1483 (Pagenkopf, 2019).
References
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro,C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., & Zheng, X. (2015). TensorFlow: Largescale machine learning on heterogeneous systems. White paper.
Adams, B. L., Henrie, A., Henrie, B., Lyon, M., Kalidindi, S., & Garmestani, H. (2001). Microstructuresensitive design of a compliant beam. Journal of the Mechanics and Physics of Solids, 49(8), 1639–1663.
Asaro, R. J., & Needleman, A. (1985). Overview no. 42 texture development and strain hardening in rate dependent polycrystals. Acta Metallurgica, 33(6), 923–953.
Bachmann, F., Hielscher, R., & Schaeben, H. (2010). Texture analysis with mtex – free and open source software toolbox. Solid State Phenomena, 160, 63–68.
Baiker, M., Helm, D., & Butz, A. (2014). Determination of mechanical properties of polycrystals by using crystal plasticity and numerical homogenization schemes. Steel Research International, 85(6), 988–998.
Batra, R. (2021). Accurate machine learning in materials science facilitated by using diverse data sources. Nature, 589.
Bergstra, J., & Bengio, Y. (2012). Random search for hyperparameter optimization. Journal of Machine Learning Research, 13(10), 281–305.
Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., & Shah, R. (1993). Signature verification using a Siamese time delay neural network. Advances in Neural Information Processing Systems, 6, 737–744.
Bunge, H.J. (2013). Texture analysis in materials science: Mathematical methods. Burlington: Elsevier Science.
Caruana, R. (1997). Multitask learning. Machine Learning, 28(1), 41–75.
Cecen, A., Dai, H., Yabansu, Y. C., Kalidindi, S. R., & Song, L. (2018). Material structureproperty linkages using threedimensional convolutional neural networks. Acta Materialia, 146, 76–84.
Chalapathy, R., & Chawla, S. (2019). Deep learning for anomaly detection: A survey. arXiv:1901.03407
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41, 3.
Cheng, Z., Wang, S., Zhang, P., Wang, S., Liu, X., & Zhu, E. (2021). Improved autoencoder for unsupervised anomaly detection. International Journal of Intelligent Systems, 36, 7103–7125.
Chicco, D. (2021). Siamese neural networks: An overview. Artificial Neural Networks, 73–94.
Cox, M. A., & Cox, T. F. (2008). Multidimensional scaling. In Handbook of data visualization (pp. 315–347). Springer.
Das, A. (2017). Calculation of crystallographic texture of bcc steels during cold rolling. Journal of Materials Engineering and Performance, 26(6), 2708–2720.
Delannay, L., Van Houtte, P., & Van Bael, A. (1999). New parameter model for texture description in steel sheets. Texture, Stress, and Microstructure, 31(3), 151–175.
Dornheim, J., Morand, L., Zeitvogel, S., Iraki, T., Link, N., & Helm, D. (2021). Deep reinforcement learning methods for structure-guided processing path optimization. Journal of Intelligent Manufacturing.
Eghtesad, A., & Knezevic, M. (2020). Highperformance fullfield crystal plasticity with dislocationbased hardening and slip system backstress laws: Application to modeling deformation of dualphase steels. Journal of the Mechanics and Physics of Solids, 134, 103750.
Fullwood, D. T., Niezgoda, S. R., Adams, B. L., & Kalidindi, S. R. (2010). Microstructure sensitive design for performance optimization. Progress in Materials Science, 55(6), 477–562.
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th international conference on artificial intelligence and statistics (pp. 249–256). JMLR Workshop and Conference Proceedings
Goldberg, D. (1991). Real-coded genetic algorithms, virtual alphabets and blocking. Complex Systems, 5.
Gupta, A., Cecen, A., Goyal, S., Singh, A. K., & Kalidindi, S. R. (2015). Structureproperty linkages using a data science approach: Application to a nonmetallic inclusion/steel composite system. Acta Materialia, 91, 239–254.
Hansen, J., Pospiech, J., & Lücke, K. (1978). Tables for texture analysis of cubic crystals. Springer.
Herrera, F., Lozano, M., & Verdegay, J. L. (1998). Tackling realcoded genetic algorithms: Operators and tools for behavioural analysis. Artificial Intelligence Review, 12(4), 265–319.
Hinton, G. E. (1987). Learning translation invariant recognition in a massively parallel networks. In International conference on parallel architectures and languages Europe (pp. 1–13). Springer.
Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
Hölscher, M., Raabe, D., & Lücke, K. (1991). Rolling and recrystallization textures of bcc steels. Steel Research, 62(12), 567–575.
Huynh, D. Q. (2009). Metrics for 3D rotations: Comparison and analysis. Journal of Mathematical Imaging and Vision, 35(2), 155–164.
Inagaki, H., & Suda, T. (1972). The development of rolling textures in lowcarbon steels. Texture, Stress, and Microstructure, 1(2), 129–140.
Jung, J., Yoon, J. I., Park, H. K., Jo, H., & Kim, H. S. (2020). Microstructure design using machine learning generated low dimensional and continuous design space. Materialia, 11, 100690.
Jung, J., Yoon, J. I., Park, H. K., Kim, J. Y., & Kim, H. S. (2019). An efficient machine learning approach to establish structureproperty linkages. Computational Materials Science, 156, 17–25.
Jung, J., Yoon, J. I., Park, S.J., Kang, J.Y., Kim, G. L., Song, Y. H., Park, S. T., Oh, K. W., & Kim, H. S. (2019). Modelling feasibility constraints for materials design: Application to inverse crystallographic texture problem. Computational Materials Science, 156, 361–367.
Kalidindi, S. R., Bronkhorst, C. A., & Anand, L. (1992). Crystallographic texture evolution in bulk deformation processing of FCC metals. Journal of the Mechanics and Physics of Solids, 40(3), 537–569.
Kalidindi, S. R., Houskamp, J. R., Lyons, M., & Adams, B. L. (2004). Microstructure sensitive design of an orthotropic plate subjected to tensile load. International Journal of Plasticity, 20(8–9), 1561–1575.
Kamijyo, R., Ishii, A., Coppieters, S., & Yamanaka, A. (2022). Bayesian texture optimization using deep neural networkbased numerical material test. International Journal of Mechanical Sciences, 223, 107285.
Kestens, L., & Pirgazi, H. (2016). Texture formation in metal alloys with cubic crystal structures. Materials Science and Technology, 32(13), 1303–1315.
Kingma, D. P. & Ba, J. (2015). Adam: A method for stochastic optimization. In: 3rd international conference on learning representations
Klinkenberg, C., Raabe, D., & Lücke, K. (1992). Influence of volume fraction and dispersion rate of grainboundary cementite on the coldrolling textures of lowcarbon steel. Steel Research, 63(6), 263–269.
Kocks, U. F., Tomé, C. N., & Wenk, H.R. (1998). Texture and anisotropy: Preferred orientations in polycrystals and their effect on materials properties. Cambridge University Press.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1106–1114.
Krogh, A., & Hertz, J. A. (1991). A simple weight decay can improve generalization. Advances in Neural Information Processing Systems, 4, 950–995.
Kruskal, J. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1–27.
Kuroda, M., & Ikawa, S. (2004). Texture optimization of rolled aluminum alloy sheets using a genetic algorithm. Materials Science and Engineering: A, 385(1–2), 235–244.
Kwon, G., Prabhushankar, M., Temel, D., & AlRegib, G. (2020). Backpropagated gradient representations for anomaly detection. In: European conference on computer vision
Liu, R., Kumar, A., Chen, Z., Agrawal, A., Sundararaghavan, V., & Choudhary, A. (2015). A predictive machine learning approach for microstructure optimization and materials design. Scientific Reports, 5(1), 1–12.
Mann, A., & Kalidindi, S. R. (2022). Development of a robust CNN model for capturing microstructureproperty linkages and building property closures supporting material design. In Frontiers in materials
McDowell, D. L. (2007). Simulationassisted materials design for the concurrent design of materials and products. JOM, 59(9), 21–25.
McKay, M. D., Beckman, R. J., & Conover, W. J. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2), 239.
Morand, L., Iraki, T., Dornheim, J., Pagenkopf, J., & Helm, D. (2021). Artificially generated crystallographic textures of steel sheets and their corresponding properties calculated by a Taylortype crystal plasticity model. Retrieved from https://fordatis.fraunhofer.de/handle/fordatis/204
Morand, L., Link, N., Iraki, T., Dornheim, J., & Helm, D. (2022). Efficient exploration of microstructure-property spaces via active learning. Frontiers in Materials, 8, 824441. https://doi.org/10.3389/fmats
Olson, G. B. (1997). Computational design of hierarchically structured materials. Science, 277(5330), 1237–1242.
Pagenkopf, J. (2019). Bestimmung der Plastischen Anisotropie von Blechwerkstoffen durch ortsaufgelöste Simulationen auf Gefügeebene. PhD thesis, Fakultät für Maschinenbau des Karlsruher Instituts für Technologie (KIT).
Pagenkopf, J., Butz, A., Wenk, M., & Helm, D. (2016). Virtual testing of dualphase steels: Effect of martensite morphology on plastic flow behavior. Materials Science and Engineering A, 674, 672–686.
Panchal, J. H., Kalidindi, S. R., & McDowell, D. L. (2013). Key computational modeling issues in integrated computational materials engineering. Computer-Aided Design, 45(1), 4–25.
Paul, A., Acar, P., Liao, W.-K., Choudhary, A., Sundararaghavan, V., & Agrawal, A. (2019). Microstructure optimization with constrained design objectives using machine learning-based feedback-aware data-generation. Computational Materials Science, 160, 334–351.
Paulson, N. H., Priddy, M. W., McDowell, D. L., & Kalidindi, S. R. (2017). Reduced-order structure-property linkages for polycrystalline microstructures based on 2-point statistics. Acta Materialia, 129, 428–438.
Pele, O., & Werman, M. (2010). The quadratic-chi histogram distance family. In European conference on computer vision (pp. 749–762). Springer.
Prechelt, L. (1998). Early stopping–but when? In Neural networks: Tricks of the trade (pp. 55–69). Springer.
Quey, R., Dawson, P., & Barbe, F. (2011). Large-scale 3D random polycrystals for the finite element method: Generation, meshing and remeshing. Computer Methods in Applied Mechanics and Engineering, 200(17–20), 1729–1745.
Quey, R., Villani, A., & Maurice, C. (2018). Nearly uniform sampling of crystal orientations. Journal of Applied Crystallography, 51(4), 1162–1173.
Ramprasad, R., Batra, R., Pilania, G., MannodiKanakkithodi, A., & Kim, C. (2017). Machine learning in materials informatics: Recent applications and prospects. NPJ Computational Materials, 3(1), 1–13.
Ray, R., Jonas, J. J., & Hook, R. (1994). Cold rolling and annealing textures in low carbon and extra low carbon steels. International Materials Reviews, 39(4), 129–172.
Rice, J. R. (1971). Inelastic constitutive relations for solids: An internal-variable theory and its application to metal plasticity. Journal of the Mechanics and Physics of Solids, 19(6), 433–455.
Ruff, L., Görnitz, N., Deecke, L., Siddiqui, S. A., Vandermeulen, R. A., Binder, A., Müller, E., & Kloft, M. (2018). Deep one-class classification. In International conference on machine learning.
Ruff, L., Kauffmann, J. R., Vandermeulen, R. A., Montavon, G., Samek, W., Kloft, M., Dietterich, T. G., & Müller, K.-R. (2021). A unifying review of deep and shallow anomaly detection. Proceedings of the IEEE, 109(5), 756–795.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by backpropagating errors. Nature, 323, 533–536.
Sakurada, M., & Yairi, T. (2014). Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis (pp. 4–11).
Schölkopf, B., Platt, J. C., Shawe-Taylor, J. C., Smola, A. J., & Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13, 1443–1471.
Schreijäg, S. (2012). Microstructure and mechanical behavior of deep drawing DC04 steel at different length scales. PhD thesis, Fakultät für Maschinenbau des Karlsruher Instituts für Technologie (KIT).
Simpson, T. W., Poplinski, J., Koch, P. N., & Allen, J. K. (2001). Metamodels for computer-based engineering design: Survey and recommendations. Engineering with Computers, 17(2), 129–150.
Storn, R., & Price, K. (1997). Differential evolution—A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359.
Tan, R. K., Zhang, N. L., & Ye, W. (2020). A deep learning-based method for the design of microstructural materials. Structural and Multidisciplinary Optimization, 61, 1417–1438.
Tax, D. M. J., & Duin, R. P. W. (2004). Support vector data description. Machine Learning, 54, 45–66.
Tomé, C., Canova, G. R., Kocks, U. F., Christodoulou, N., & Jonas, J. J. (1984). The relation between macroscopic and microscopic strain hardening in f.c.c. polycrystals. Acta Metallurgica, 32(10), 1637–1653.
Utkin, L. V., Zaborovsky, V. S., Lukashin, A. A., Popov, S. G., & Podolskaja, A. V. (2017). A Siamese autoencoder preserving distances for anomaly detection in multi-robot systems. In 2017 international conference on control, artificial intelligence, robotics & optimization (ICCAIRO) (pp. 39–44). IEEE.
Van Der Maaten, L., Postma, E., Van den Herik, J., et al. (2009). Dimensionality reduction: A comparative review. Journal of Machine Learning Research, 10(66–71), 13.
Von Schlippenbach, U., Emren, F., & Lücke, K. (1986). Investigation of the development of the cold rolling texture in deep drawing steels by ODF analysis. Acta Metallurgica, 34(7), 1289–1301.
Zhang, J., & Sanderson, A. C. (2009). JADE: Adaptive differential evolution with optional external archive. IEEE Transactions on Evolutionary Computation, 13(5), 945–958.
Acknowledgements
The authors would like to thank the German Research Foundation (DFG) for funding the presented work, which was carried out within research project number 415804944: 'Taylored Material Properties via Microstructure Optimization: Machine Learning for Modelling and Inversion of Structure-Property-Relationships and the Application to Sheet Metals'. We would also like to thank Jan Pagenkopf for providing the crystal plasticity routine on which the implemented Taylor-type material model is based.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Cite this article
Iraki, T., Morand, L., Dornheim, J. et al. A multi-task learning-based optimization approach for finding diverse sets of microstructures with desired properties. J Intell Manuf 35, 1887–1903 (2024). https://doi.org/10.1007/s10845-023-02139-8