Introduction

An important technique for innovative materials research is high-energy X-ray diffractometry, available at synchrotron beamlines, where diffraction patterns are usually recorded on a 2D area detector [1]. These 2D XRD patterns, in which each reflection of a grain in a specific phase produces a distinctive spot on a ring, offer a wealth of information for unraveling the intricacies of the atomistic structure and microstructure of materials. However, comprehensively analyzing all the information contained in these diffraction patterns is a tedious and time-consuming process, as current conventional methods rely on laborious manual operations. Recent advancements in computational methods [2, 3] and deep-learning techniques, especially convolutional neural networks (CNNs), have shown outstanding capabilities in addressing the complexities associated with XRD pattern analysis. Hongyang et al. [4] devised a CNN-based model capable of predicting precise scale factors, lattice parameters, and crystallite size maps for multi-phase catalytic material systems. Similarly, WB et al. [5] employed CNNs to classify crystal structures across approximately 150,000 powder XRD patterns. Hin-Woong et al. [6] constructed a CNN that identifies the constituent phases in multi-phase inorganic compounds. Furthermore, Jerardo et al. [7] extended the utility of CNNs to classify a broad spectrum of materials’ crystal systems and space groups under both simulated and experimental conditions.

While much of this research has centered on 1D XRD patterns, the inherent strengths of CNNs, especially their capacity to autonomously learn hierarchical features and discern patterns, make them particularly well-suited for extracting information from image data [8]. Additionally, 2D patterns contain more detailed information than their 1D counterparts. In our previous work [9], we demonstrated the efficacy of CNNs in accurately predicting phase fractions from 2D high-energy synchrotron XRD patterns. In this context, the “ground truth” of a 2D XRD pattern’s attributes plays an important role in developing models. For a deep-learning model to learn the important features in a set of data, those features must first be identified by experts on a sufficiently representative data set so that the computer can be properly trained. Such data are referred to as “ground truth” in the artificial intelligence (AI) and deep-learning community. The utilization of simulated data stands out as a noteworthy strategy. By employing a Kinematic-Diffraction Simulator (KDS) [10] to generate simulated 2D XRD patterns, researchers gain a unique advantage: precise control over the attributes of interest, ensuring accurate ground truth. This allows for the creation of diverse datasets tailored to specific materials, enhancing the model’s adaptability and performance.

To showcase the efficacy of this approach, we focused our study on the Ti–6Al–4V (Ti64) alloy, a material that is widely used in industry [11]. By leveraging simulated 2D XRD patterns for this alloy (assuming Ti64 contains only \(\upalpha\)- and \(\upbeta\)-phases), we aimed to train a CNN to predict the \(\upbeta\)-phase volume fraction within each 2D XRD pattern. Notably, unlike typical CNN training cases, we used exclusively pure \(\upbeta\)- and pure \(\upalpha\)-phase simulated 2D XRD patterns for training. The trained models were then employed to predict the \(\upbeta\)-phase volume fraction of 2D XRD patterns containing a mixture of \(\upalpha\)- and \(\upbeta\)-phases. This approach, which trains on pure-phase XRD data that are easily generated in real-world experiments, enhances the model’s practical utility. This targeted and controlled strategy ensures that the model learns intricate features specific to the Ti64 alloy, highlighting the potential of simulated data in advancing the accuracy and efficiency of XRD pattern analysis.

Methodology

Data set description

The KDS program we employed applies the theory of diffraction physics to generate simulated 2D XRD patterns [10]. This model, written in the Wolfram language [12], operates on the fundamental principles of diffraction, specifically utilizing Fraunhofer diffraction theory [13] to produce diffractograms based on the lattice parameters of the \(\upalpha\)- and \(\upbeta\)-phases. To generate these patterns, the model requires metadata from the diffractometer (beamline), including the wavelength of the primary beam, energy spread, and beam divergence. On the sample side, the model requires information about the phases present, their crystal structures, mole fractions, and grain-size distributions. Additionally, users have the flexibility to introduce texture or strain into the microstructure. The KDS also accommodates specification of the number of grains within the illuminated volume. Further details on the intricacies of the KDS can be found in [10]. We employed KDS to simulate a total of 6,372 2D XRD patterns, each with a size of \(2048 \times 2048\) pixels. These patterns exclusively consisted of two phases of titanium: the \(\upalpha\)-phase (hexagonal close-packed, HCP) and the \(\upbeta\)-phase (body-centered cubic, BCC). The volume fractions of the \(\upalpha\)- and \(\upbeta\)-phases always sum to 1. To generate continuous rings, we used 100,000 grains. Generating each pattern takes about 15 minutes on a single core and 4 GB of memory, so we farmed these jobs out to a High-Performance Computing (HPC) cluster [14] to produce the patterns at scale. The package generates a “.tiff” file for each simulated diffractogram; the “.tiff” format was chosen because it is lossless and preserves the original data quality. Further information on the simulation parameters is provided in Table 2 in the Appendix. Of all the data, 5,372 2D XRD patterns containing either \(\upalpha\)- or \(\upbeta\)-phases only (i.e., a \(\upbeta\)-phase volume fraction of either 0 or 1) were used to train the CNN models. The remaining 1,000 2D XRD patterns constituted our testing set, with each pattern representing a specific \(\upbeta\)-phase volume fraction, incrementally increasing in steps of 0.05 from 0 to 1. Approximately 75 XRD patterns exist for every 0.1 change in \(\upbeta\)-phase volume fraction, and 25 patterns for each 0.05 change (Fig. 1).
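
As a minimal sketch of how such a dataset could be assembled for model training, the snippet below loads the simulated “.tiff” diffractograms and attaches the known \(\upbeta\)-phase volume fraction as the ground-truth label. The directory layout and file names are hypothetical and are not part of the KDS package.

```python
import glob
import numpy as np
import tifffile  # reads the uncompressed .tiff diffractograms as NumPy arrays

def load_patterns(pattern_dir, beta_fraction):
    """Load all simulated 2048x2048 patterns in a directory and attach
    the known beta-phase volume fraction as the ground-truth label."""
    images, labels = [], []
    for path in sorted(glob.glob(f"{pattern_dir}/*.tiff")):
        images.append(tifffile.imread(path).astype(np.float32))
        labels.append(beta_fraction)
    return np.stack(images), np.array(labels, dtype=np.float32)

# Hypothetical layout: pure-phase patterns are used for training, while the
# mixed-phase patterns (beta fractions 0.00-1.00 in 0.05 steps) are held out for testing.
x_alpha, y_alpha = load_patterns("simulated/pure_alpha", beta_fraction=0.0)
x_beta, y_beta = load_patterns("simulated/pure_beta", beta_fraction=1.0)
x_train = np.concatenate([x_alpha, x_beta])
y_train = np.concatenate([y_alpha, y_beta])
```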

Fig. 1

Examples of 2D XRD patterns generated by KDS. \(\upbeta\)-phase volume fractions from left to right are 0, 0.05, 0.5, 0.95, and 1.00, respectively

CNN implementation

The input to the CNN models is the pixel intensities of a 2D XRD pattern, and the output is the \(\upbeta\)-phase volume fraction. To limit pixel intensities to values between 0 and 1, the pixel intensities of each 2D XRD pattern were divided by the maximum value among all intensities. The CNN models were trained using the TensorFlow framework [15, 16] implemented in the HPC computing environment CRADLE [14]. For the training of the CNN models, the 5,372 diffraction patterns were split into a training set comprising 80% of the data and a validation set comprising the remaining 20%. The training set is used for model learning, while the validation set permits monitoring of the model’s performance during training. We employed the mean squared error (MSE) as the loss function; it is computed as the average of the squared differences between the predicted values and the ground truth. The goal is to identify a model that minimizes the MSE, signifying that its predicted values closely align with the ground truth.
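
A minimal sketch of this preprocessing and loss setup is shown below, assuming the pure-phase patterns have already been loaded into `x_train` and `y_train` as in the earlier sketch. The per-pattern normalization shown here is one reading of the description above; a single global maximum could equally be used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Scale each 2D XRD pattern so that its pixel intensities lie in [0, 1].
x_norm = x_train / x_train.max(axis=(1, 2), keepdims=True)

# 80% of the pure-phase patterns for training, 20% for validation.
x_tr, x_val, y_tr, y_val = train_test_split(
    x_norm, y_train, test_size=0.2, random_state=42)

def mse(y_true, y_pred):
    """Mean squared error between predicted and ground-truth beta-phase fractions."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```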

The process of training the model involves the important task of selecting optimal hyperparameters to achieve the best model performance, the importance of which is described in references [9, 17–20]. Hyperparameters that control the architecture of the CNN include the number of convolutional layers, max-pooling layers, and fully connected layers [21]. In general, a model with more convolutional and fully connected layers has the capacity to handle larger datasets and capture more intricate features. However, the trade-off is an increase in training time. It is also important to note that an overly complex model may overfit, performing well on the training set but failing to generalize to new, unseen data. In addition, we tuned various other hyperparameters that affect the training process. The learning rate [22], a hyperparameter that governs the scale of parameter adjustments during training, holds substantial importance. The batch size [23] dictates the volume of data processed on the GPU in a single iteration; it plays a crucial role in balancing computational efficiency and memory requirements during the training of the neural network. Moreover, we carefully considered the selection of activation functions. As each pixel value of a 2D XRD pattern is input to the CNN model, these functions empower the neural network to capture intricate non-linear relationships within these pixel values. Without activation functions, the neural network would behave as a linear model, limited to capturing linear relationships in these pixel values.
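
The sketch below illustrates one way such an architecture could be parameterized in Keras, exposing the number of convolutional/max-pooling blocks, the fully connected layers, and the activation function as hyperparameters. The `build_cnn` helper, filter counts, and node sizes are illustrative assumptions rather than the exact configurations listed in Table 1.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(n_conv=5, n_fc=2, fc_nodes=(128, 64, 32), activation="relu"):
    """Assemble a CNN mapping a 2048x2048 pattern to a single beta-phase volume fraction."""
    model = models.Sequential([layers.Input(shape=(2048, 2048, 1))])
    for i in range(n_conv):
        # Each convolutional layer is followed by a max-pooling layer.
        model.add(layers.Conv2D(8 * 2 ** min(i, 3), kernel_size=3,
                                padding="same", activation=activation))
        model.add(layers.MaxPooling2D(pool_size=2))
    model.add(layers.Flatten())
    for nodes in fc_nodes[:n_fc]:
        model.add(layers.Dense(nodes, activation=activation))
    model.add(layers.Dense(1))  # predicted beta-phase volume fraction
    return model
```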

In this study, two types of activation functions, ReLU and Leaky-ReLU [24], were explored to evaluate their impact on the model’s performance. The ReLU and Leaky-ReLU functions are defined as follows:

$$\begin{aligned} \text{ReLU}(x) = \max (0, x) \end{aligned}$$
(1)
$$\begin{aligned} \text{LeakyReLU}(x) = \max (cx, x) \end{aligned}$$
(2)

where x is the input numeric value and c is a small positive constant. We trained a total of 33 CNNs with different combinations of these hyperparameters. Each model was allocated 64 GB of memory, with the 32 GB of a V100 GPU [25] dedicated to training. Given that our models’ structures are generally not overly complex, each epoch requires approximately 3 minutes. We set 20 epochs for each model, resulting in roughly 1 hour of training time per model.
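
A brief sketch of how the two activation functions and the remaining training hyperparameters might be wired together is given below, reusing the hypothetical `build_cnn` helper from the previous sketch. The Adam optimizer is an assumption (the optimizer is not specified here), and the `alpha` argument of `tf.nn.leaky_relu` plays the role of the constant c in Equation 2.

```python
import tensorflow as tf

# Leaky-ReLU with slope c = 0.01 for negative inputs (c corresponds to alpha here).
leaky = lambda x: tf.nn.leaky_relu(x, alpha=0.01)

model = build_cnn(n_conv=5, n_fc=2, activation=leaky)  # or activation="relu"
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="mse")

# 20 epochs with a batch size of 16, monitoring the validation set after each epoch.
history = model.fit(x_tr[..., None], y_tr,
                    validation_data=(x_val[..., None], y_val),
                    batch_size=16, epochs=20)
```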

Results

Table 1 All 33 models trained on pure \(\upalpha\)- and \(\upbeta\)-phase patterns, along with their hyperparameters and performance metrics

All details regarding model performance and hyperparameters employed are comprehensively outlined in Table 1. The table furnishes the following information:

  1. Architecture of each model: the number of convolutional layers (#CL) ranges from 4 to 7, the number of fully connected layers (#FC) ranges from 1 to 3 with varying numbers of nodes (128, 64, 32), and the number of max-pooling layers (#MP) varies between 3 and 5.

  2. Learning rate: the range of learning rates utilized is documented.

  3. Batch size: batch sizes employed during training are detailed, ranging from 8 to 32.

  4. Activation functions: the activation function used in each model is presented, featuring both ReLU and Leaky-ReLU. For Leaky-ReLU, the slope for negative values is also provided.

  5. Model performance: reported via the MSE observed on the testing data set.

We observed that different activation functions yield distinct trends in the predicted results. We therefore plotted the predicted-result curves for the top 5 models using ReLU and Leaky-ReLU to visualize model performance. Figure 2a and b provide visual insights into the model performance with the ReLU activation function. Figure 2a depicts the models’ predicted results versus the ground truth for each XRD pattern in the testing set. Figure 2b illustrates the differences between the ground truth and the predicted values, where a smaller difference signifies superior model performance. Figure 2c and d display the same comparisons, but with the Leaky-ReLU activation function. The patterns are ordered by \(\upbeta\)-phase volume fraction from 0 to 1.
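
As an illustration of how plots such as Fig. 2 can be produced from a trained model, the sketch below predicts on the testing set and draws the prediction and difference curves. The arrays `x_test` and `y_test`, holding the 1,000 mixed-phase patterns and their \(\upbeta\)-phase fractions, are assumed to have been loaded and normalized in the same way as the training data.

```python
import numpy as np
import matplotlib.pyplot as plt

y_pred = model.predict(x_test[..., None]).ravel()
order = np.argsort(y_test)  # order patterns from 0 to 1 beta-phase volume fraction

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 6), sharex=True)
ax1.plot(y_test[order], label="ground truth")
ax1.plot(y_pred[order], label="predicted")
ax1.set_ylabel("beta-phase volume fraction")
ax1.legend()
ax2.plot(y_pred[order] - y_test[order])  # difference curve, as in Fig. 2b and d
ax2.set_ylabel("predicted - ground truth")
ax2.set_xlabel("2D XRD pattern index (sorted)")
plt.show()
```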

Fig. 2

a and b visualize the performance of the best 5 models using the ReLU activation function. In (a), the ground truth is compared to the models’ predicted values for the testing dataset. b displays the differences between the ground truth and the models’ predicted values for each 2D XRD pattern. The 2D XRD patterns are ordered from 0 to 1 based on their \(\upbeta\)-phase volume fraction. Similar visualizations for the best 5 models using the Leaky-ReLU activation function are presented in (c) and (d)

The performance curves of all models employing the Leaky-ReLU activation function exhibit a bow shape (shown in Fig. 2c), indicating that the predictions for the “end points” (pure \(\upbeta\)-phase or pure \(\upalpha\)-phase 2D XRD patterns) are highly accurate, with accuracy gradually declining toward the central positions. On the other hand, the performance curves for models utilizing ReLU activation are sigmoidal (shown in Fig. 2a). Again, the end points are accurate, and an additional region of nearly accurate predictions appears near the middle, around the 0.50 phase fraction position. Model #5, characterized by 5 convolutional layers, 5 max-pooling layers, and 2 fully connected layers, and employing the ReLU activation function, achieves the lowest MSE among all models, registering a value of 9.4\(\times 10^{-4}\) (visualized in Fig. 3).

Fig. 3

Model #5 generated the lowest MSE (\(9.4 \times 10^{-4}\)) in the testing data set. The top graph shows its predicted values versus ground truth, and the bottom graph displays the differences between the ground truth and the predicted values for each 2D XRD pattern

Discussion

Strategy of hyperparameter tuning

Numerous hyperparameters significantly influence the final results of the models. Effectively determining the optimal hyperparameters remains one of the most complex challenges in the AI community [9, 19, 20]. Due to resource constraints, exploring all conceivable combinations of the hyperparameters is impractical, which makes the appropriate selection of hyperparameters critical.

In our previous work [9], we trained 168 different CNN models to predict the \(\upbeta\)-phase volume fraction using experimental Ti64 XRD data sets [26]. From these, we selected the models that exhibited superior performance and used their architectures as our initial models. In Table 1, the performance of these models (#1, 2, 3, 10, 11, 12, 13, and 14) is presented. The models share several settings: they all employ a learning rate of \(1 \times 10^{-3}\), a batch size of 8, and ReLU as the activation function. We observed that models #1, 2, and 3, each including 4 or 5 convolutional layers, with each followed by a max-pooling layer, tended to deliver better performance. To evaluate the influence of previously unexplored hyperparameters, namely the learning rate and batch size, on the training process, we selected the architecture of model #3 as our baseline. This architecture, characterized by fewer parameters compared to models #1 and #2, resulted in reduced training time. We experimented with batch sizes of 16 and 32, and learning rates of 5\(\times 10^{-5}\), 1\(\times 10^{-4}\), 1\(\times 10^{-3}\), and 5\(\times 10^{-3}\). The results are presented as models #4 through #9 in Table 1. After evaluation, model #5 achieved the top performance (shown in Fig. 3), specifically with a learning rate of 1\(\times 10^{-4}\) and a batch size of 16.
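
A minimal sketch of this learning-rate and batch-size sweep around the baseline architecture is shown below; the loop covers the full grid, slightly more than the six combinations actually reported as models #4 through #9, and the record-keeping is illustrative.

```python
import tensorflow as tf

results = {}
for lr in (5e-5, 1e-4, 1e-3, 5e-3):
    for batch_size in (16, 32):
        model = build_cnn(n_conv=5, n_fc=2, activation="relu")  # baseline-like architecture
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr), loss="mse")
        model.fit(x_tr[..., None], y_tr,
                  validation_data=(x_val[..., None], y_val),
                  batch_size=batch_size, epochs=20, verbose=0)
        # Evaluate on the held-out mixed-phase testing patterns.
        results[(lr, batch_size)] = model.evaluate(x_test[..., None], y_test, verbose=0)

best = min(results, key=results.get)
print("best (learning rate, batch size):", best, "-> test MSE:", results[best])
```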

We also observed that, for an input 2D XRD pattern with a \(\upbeta\)-phase volume fraction of 0 (pure \(\upalpha\)-phase), the output of the last layer of the CNN consists entirely of negative values. Since we used a ReLU activation function, these negative values were adjusted to 0 (see Equation 1), so that the final predicted value would be very close to 0, which is the ground truth for the input 2D XRD pattern. We also sought to investigate the effect of these negative values, opting not to merely filter them out. To do so, we implemented the Leaky-ReLU activation function, which multiplies negative input values by a small, non-zero constant c (see Equation 2) to prevent a node from becoming inactive during training.

For models trained with Leaky-ReLU, we maintained a consistent learning rate of 1\(\times 10^{-4}\) and a batch size of 16 (the same settings used for model #5). We initially explored architectures with fewer max-pooling and convolutional layers, progressively increasing them. We noted that model #20 demonstrates good performance (an MSE of 6.4\(\times 10^{-3}\)) when employing 5 convolutional layers and 4 max-pooling layers. The architecture of model #20 closely resembles the optimal model (#5) trained with ReLU. Building on this, we further refined both architectures: one with 5 convolutional layers and 5 max-pooling layers, and the other with 5 convolutional layers and 4 max-pooling layers. We trained models #20 to 27, experimenting with different fully connected layer configurations and Leaky-ReLU constants c (0.1, 0.01, and 0.005). Among these, models #21, 22, 25, and 27 demonstrated improved results, with smaller MSE than the original model (#20).

Performance of CNN models in analyzing 2D XRD patterns

Current research on deep learning tends to treat neural networks as ‘black boxes.’ While achieving good results is crucial, understanding what a CNN specifically learns from 2D XRD patterns is equally important. Upon observing the feature maps following the convolutional layers, we found that these layers extract ‘rings’ from the input 2D XRD patterns [9, 27]. The fully connected layers then ‘learn’ the significance of each pixel position within these extracted rings by adjusting the associated weights. During the learning process, the model refines these weights based on the intensity of the rings. To generate the final output, the model aggregates the weights multiplied by the intensity values of these rings. For example, when the model is trained with labels of \(\upbeta\)-phase fractions, the weights corresponding to \(\upbeta\) rings will be notably high. As a result, if there is significant intensity in these regions, the model will produce a higher output value.
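
One common way to inspect what the convolutional layers extract is to build an intermediate model that exposes their feature maps. The sketch below assumes the hypothetical Keras model from the earlier sketches and simply visualizes one feature map of the first convolutional layer for a single test pattern.

```python
import tensorflow as tf
import matplotlib.pyplot as plt

# Collect the outputs of every convolutional layer of the trained model.
conv_outputs = [layer.output for layer in model.layers
                if isinstance(layer, tf.keras.layers.Conv2D)]
feature_model = tf.keras.Model(inputs=model.inputs, outputs=conv_outputs)

# Feature maps for one 2D XRD pattern from the testing set.
feature_maps = feature_model(x_test[:1, ..., None])
first = feature_maps[0][0].numpy()  # maps produced by the first convolutional layer

plt.imshow(first[..., 0], cmap="gray")  # one filter's response: the extracted 'rings'
plt.title("Feature map of the first convolutional layer")
plt.show()
```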

A noteworthy point is the potential overlap of \(\upalpha\) and \(\upbeta\) rings in the 2D XRD patterns. We hypothesize that such overlapping rings might not adversely affect the CNN’s performance, because during training the model allocates lower weights to the overlapping regions. Therefore, even if the intensity in these overlapping areas is high, the contribution to the final output, such as the \(\upbeta\)-phase fraction, remains minimal due to the smaller weights assigned. However, it is important to highlight that our dataset comprises only \(\upalpha\)- and \(\upbeta\)-phases, resulting in limited ring-overlap scenarios. In cases where there is extensive ring overlap between two phases, we anticipate that it might introduce more dispersion in the model’s predictions.

In our case, Fig. 3 shows the performance of model #5 in this task. The model clearly distinguishes 2D XRD patterns with different \(\upbeta\)-phase volume fractions ranging from 0 to 1. Model #5 achieved an MSE of \(9.4 \times 10^{-4}\). This result underscores the model’s accuracy, corresponding to an average error of approximately \(\pm 0.03\) for each XRD pattern compared with its ground truth. The other models depicted in Fig. 2 exhibit similar performance, indicating that the performance of model #5 is not accidental. Therefore, compared to our previous work [9], we assert that it is feasible to train effective models exclusively with pure \(\upalpha\)- and \(\upbeta\)-phases. The trained model can then be utilized to predict 2D XRD patterns with mixed phases.
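
The \(\pm 0.03\) figure follows directly from the root of the mean squared error:

$$\begin{aligned} \sqrt{\text{MSE}} = \sqrt{9.4 \times 10^{-4}} \approx 0.03 \end{aligned}$$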

We also believe that exploring more hyperparameter combinations could lead to the discovery of a better model [17, 18, 21]. From the current tuning process, we infer bounding ranges for certain hyperparameters relevant to this task:

  • The optimal number of convolutional and max-pooling layers falls within the range of 4 to 5, while the number of fully connected layers can be limited to 1 or 2.

  • Optimal learning rates for training are observed at \(1\times 10^{-4}\) or \(1\times 10^{-3}\), while batch sizes of 8 or 16 are recommended.

  • The ReLU curve is sigmoidal, while the Leaky-ReLU curve takes on a bow shape. We attribute this dissimilarity to the fact that Leaky-ReLU does not completely discard negative values, leading to smaller predicted values. We also observe that models exhibiting superior performance with Leaky-ReLU often employ a relatively small constant c; as this value approaches 0, the behavior becomes increasingly similar to that of ReLU. As depicted in Fig. 2, regardless of the activation function employed, the models can effectively discern different \(\upbeta\)-phase volume fractions in XRD patterns, even when there is some divergence in their predictions. Therefore, for this specific task, we consider ReLU to remain the optimal activation function.

Conclusion

In this study, we employed simulated 2D XRD patterns to ensure accurate ground truth for the \(\upbeta\)-phase volume fraction and used them to train CNN models. We expanded on our previous work by exploring additional hyperparameters and demonstrated their impact on model performance. Through our hyperparameter tuning strategy, we showed that multiple models produce predictions close to the ground truth. These models, even when trained solely on pure-phase data, exhibit the ability to predict \(\upbeta\)-phase volume fractions for 2D XRD patterns with mixed phases. Generating pure-phase data is easily achievable in real-world scenarios, and our work highlights the potential of using pure-phase XRD data to train models, enhancing accuracy while conserving resources. This also underscores the feasibility of deep-learning techniques for XRD pattern analysis and provides a robust approach for future materials research endeavors.