Introduction

In recent years, deep convolutional neural networks (CNNs) have led to prominent advances in a variety of fields relying on or now exploring the use of Big Data, such as physics,1 medicine,2 civil engineering,3,4 and materials science.5,6 These new approaches allow the exploration of an enormous number of new possibilities. In materials science, and specifically microscopy, the implementation of visual feature extraction7 and different image classification tasks8,9 is particularly popular and has led to significant progress.

However, many of the commonly used neural networks are very large in the sense that they require training and adjustment of millions of parameters to be able to perform their respective tasks. This makes them vulnerable to several effects that are detrimental to their performance. One of the largest risks in using deep neural networks is overfitting,10,11 where the neural network does not perform as well on unseen (test) data as on previously seen (training) data. A further key issue is the lack of generalizability,11 where the network does not perform well on data whose properties do not closely match those of the training data. Difficulties can also arise when dealing with unbalanced classification tasks, where some of the categories between which the network is trained to discriminate are much more prevalent than others in the training dataset.10

To alleviate these problems, several solutions and strategies have been proposed and employed in the literature, such as dropout regularization,12 batch normalization,13,14 transfer learning,15 and pretraining,16 which are based on model architecture manipulation. Data augmentation, on the other hand, is based on the idea of extracting more information from the original dataset while maintaining the same model configuration.17

Previous work by the authors has shown that CNNs can support research in materials science by providing an automated pathway for the analysis of physical mechanisms of damage nucleation in steel.6 To this end, a first set of CNNs was trained to classify the damage sites observed in a high-resolution panoramic image acquired by scanning electron microscopy. That work highlighted that a CNN can be trained successfully for this task based on only a few thousand images, which can be readily labeled by a human researcher.6

The main focus of the current work is to improve the generalizability of the neural network developed in Ref. 6. In our search for the underlying damage mechanisms in forming of steel or deformation during application, it is important to recognize, describe, and model the effect of variations in the strain path on the mechanisms, location, and amount of damage induced by deformation. Here, we aim to provide a data augmentation strategy to transfer an existing classification network originally formulated based on uniaxial straining experiments to images recorded during biaxial straining. This will form the basis for any flexible and transferable use of such networks for as complete a range of straining conditions as possible, ultimately to include industrial processing. Furthermore, we also investigate here how to address overtraining with a limited dataset and the inherent imbalance of classes with different damage mechanisms occurring at different frequencies in more detail.

Damage Analysis of Dual-Phase Steel by Deep Learning

In materials engineering, dual-phase (DP) steels are a class of advanced alloys that are frequently used in automotive applications due to their good specific mechanical properties and ductility allowing lightweight design.18 However, their damage behavior during processing of components and deformation is not yet fully understood, and their heterogeneity at different length scales (internal structure of martensite islands,19 variable phase and grain size,20 and banding of martensite at the sheet level21) impedes full characterization. Moreover, elucidation of the damage behavior of finely structured DP steels requires detailed investigation of many damage sites at high resolution over large areas, which favors automatic statistical analysis rather than extrapolation of comparatively few sites analyzed manually.

To this end, we developed a deep-learning-based algorithm to classify the damage sites detected in the microstructure across large areas and, simultaneously, at high resolution using panoramic SEM images.6 Using this approach, damage sites imaged at different strains have been successfully classified with respect to the underlying damage nucleation mechanism.6,22,23 More specifically, this involved two CNNs designed to first separate inclusions in the material and then further categorize the remaining, deformation-induced damage sites present after forming or mechanical testing. Although this approach allows us to automate the analysis across large regions of the deformed microstructure on the order of 1 mm², significant challenges remain. The previous work was carried out on uniaxial test specimens and also applied to the outer (tensile) fiber of bending specimens.24 Ideally, we would like to apply the same method directly to samples with a different deformation history and hence damage morphology. However, this desired versatility is directly related to the images available in the training dataset. If the networks have never been exposed to images showing, for example, damage patterns in biaxial stress in addition to uniaxial stress during training, it cannot necessarily be expected that the network can identify these patterns reliably. Ideally, one would be able to obtain a large sample of all types of patterns that should be identified in the later analysis. However, in practice obtaining a labeled training dataset as “ground truth” is very time intensive and costly, as it must be done manually. In the case of the analysis of DP steel, the initial sample was obtained6 by manually analyzing samples exposed to uniaxial stress and creating labeled images for thousands of identified damage sites. The resulting dataset is limited in two important aspects: Since the original samples were only exposed to uniaxial stress, damage patterns that occur in samples exposed to biaxial stress are not present and hence cannot be learned by the network. Furthermore, large convolutional neural networks show better performance in general if the training dataset is much larger.

Here, we therefore aimed to investigate the degree to which the existing networks obtained from damage analysis in uniaxially strained samples (our previous approach) can be transferred to those deformed in biaxial tension (our current issue). To this end, we investigated whether this (1) requires a large number of additional labeling steps and therefore manual labor, or (2) might be achieved by data augmentation and other strategies to attain greater flexibility of the network. Ultimately, only the latter will allow generalization to the complex strain paths encountered in metal forming (our future goal) and would also provide a much wider scope for the use of neural networks and their actual potential to reduce labor-intensive tasks or allow statistical analysis of microscopy data on a regular basis. We therefore focus here on the above strategies to improve classification performance, but as an example of the scientific application, we also use the newly developed configurations to analyze and compare the distribution of martensite crack angles under uniaxial versus biaxial loading conditions.

Data Augmentation for Image Analysis by Neural Networks

Data augmentation is a general technique to improve both the generalizability and the convergence of neural networks during training,10,17 since it tackles the qualitative and quantitative limitations of the employed data.10,25,26 Data augmentation techniques can be loosely grouped into two categories: the more traditional approach17,27 is based on methods such as image rotation, shearing, cropping, translation, and color transformation, whereas generative adversarial networks (GANs)10,25,28 show great promise to generate “fake” images.

Using the data generated by both methods can increase the data in terms of quantity, quality, and variety.29 In this work, we investigate whether our method can be expanded directly by data augmentation to increase the generalizability of the automated analysis with respect to different strain paths. Ideally, this would minimize any need to manually acquire and label a large amount of new ground-truth data. For example, to assess the damage induced during the forming of dual-phase steels in a general manner, deformation under different straining conditions, such as uniaxial or biaxial tension, has to be included. Furthermore, augmentation of the microscopy data is expected to lead to an improved performance benchmark of damage site classification. This again reduces the need for manual interventions, aiding the opportunity to achieve statistical analysis of a large number of sites within a given microstructure deformed under variable conditions.

Here, we consider in particular the effect of a change from uniaxial to biaxial straining conditions for the same DP800 steel. We are interested then in (1) the performance of the original network trained on a uniaxially strained sample under these new conditions, (2) how and by which method of data augmentation an improvement in performance might be achieved on both uniaxial and biaxial data, and (3) whether manual labeling of additional ground-truth data is beneficial, and ultimately (4) which changes in damage behavior are accessible as a result of the analysis of micrographs acquired after the different strain paths.

Materials and Methods

Experimental Methods

The material used in this study is commercial DP800 dual-phase steel (ThyssenKrupp Steel Europe AG). As such, it consists of a hard martensite and a softer ferrite phase. The investigated sheets had a thickness of 1.5 mm. All samples were cut by wire erosion, and samples were deformed with the rolling direction in uniaxial tension and along and perpendicular to the rolling direction in biaxial tension. The samples were subsequently prepared by grinding to 4000 grit sandpaper, followed by mechanical polishing with 6-µm, 3-µm, and 1-µm diamond suspension and light etching in 1% Nital for 5 s to obtain a visible microstructure through a topographic step formed between ferrite and martensite due to preferential etching of the ferrite phase. The samples were deformed in a microtensile stage that is capable of processing both uniaxial and biaxial tensile tests (Proxima 100; MicroMecha SAS, France). The gauge length of the tensile samples was 3.65 mm, with a square cross-section of 1.5 mm. The geometry of the cruciform biaxial samples included a spherical notch on the underside of the samples to ensure homogeneous biaxial deformation in the center of the sample. Panoramic images taken from the deformed and metallographically prepared samples were acquired using a field-emission scanning electron microscope (LEO 1530; Carl Zeiss, Germany). The spatial resolution of the obtained images was 32.5 nm/pixel.

Figure 1 shows a selection of micrographs containing martensite cracks imaged after uniaxial and biaxial straining. While in uniaxial tension all martensite cracks are aligned approximately perpendicular to the stress axis, the martensite cracks formed under biaxial stress exhibit a range of orientations.

Fig. 1

Comparison between sample geometry and martensite crack orientation appearing in the sample after (a) biaxial and (b) uniaxial tensile testing.

The original training dataset acquired as part of the work described in Ref. 6 contained only damage sites induced in a uniaxial tensile test and therefore predominantly cracks oriented vertically in the images. As might be expected, martensite cracks inclined towards the horizontal were not correctly classified by the original network, necessitating the data augmentation strategies presented in the following.

Damage Analysis Method Design and Implementation

Data Preprocessing and Augmentation

The sequence of steps applied to analyze the electron micrographs is as follows: As a first step, potential damage sites are identified.6 This is done by defining a threshold as a cutoff value that is based on the grayscale values of the pixels in the image. After this threshold is applied, we keep the darkest parts of the micrograph and apply a clustering algorithm (DBSCAN, Density-Based Spatial Clustering of Applications with Noise30) implemented in the scikit-learn library.31 Fundamentally, clustering by this algorithm is based on grouping of points with the closest Euclidean distance from each other, depending on a defined density of points.30 Thus, noise from a single or a few pixels that are below the grayscale cutoff value is filtered out from the actual damage sites.
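The detection step described above can be illustrated with a minimal sketch. It assumes an 8-bit grayscale image held in a NumPy array and uses `DBSCAN` from scikit-learn; the function name `find_damage_candidates` and the values of `gray_cutoff`, `eps`, and `min_samples` are hypothetical and are not the settings used in this study.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def find_damage_candidates(image, gray_cutoff=40, eps=2.0, min_samples=10):
    """Return the (row, col) centroid of every dense cluster of dark pixels.

    gray_cutoff, eps and min_samples are illustrative values only.
    """
    # Keep only the darkest pixels (grayscale value below the cutoff).
    dark_pixels = np.argwhere(image < gray_cutoff)
    if len(dark_pixels) == 0:
        return []

    # Group dark pixels by spatial density; sparse pixels get the label -1.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(dark_pixels)

    centroids = []
    for label in sorted(set(labels) - {-1}):
        centroids.append(dark_pixels[labels == label].mean(axis=0))
    return centroids


# Synthetic 100 x 100 image: bright background, one dark 6 x 6 void,
# and one isolated dark pixel that should be rejected as noise.
demo = np.full((100, 100), 200, dtype=np.uint8)
demo[20:26, 30:36] = 10
demo[80, 80] = 10
sites = find_damage_candidates(demo)
```

In this toy image only the 6 × 6 void survives as a candidate, while the single dark pixel is labeled as noise, which mirrors the filtering role of the clustering step.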

Around 6000 damage site images have been collected in 250 × 250 pixel window sizes. We originally considered four different categories of damage: inclusion, martensite–ferrite interface decohesion, brittle martensitic cracks, and notches.6 The philosophy behind the definition of each of them can be found elsewhere.6,22,23 However, experimentally, we found that shadows that originate from the imaging process in the electron microscope introduce a systematic uncertainty in the network performance. This is because the networks rely on the imaging information alone and cannot include further domain knowledge that would allow them to distinguish between the damage mechanism and imaging artifacts that closely resemble actual damage. The shadowing sites are formed due to the surface topography during imaging with secondary electrons. Since the martensite islands are higher than the matrix, they can cause shadows that appear as dark as the damage sites and large enough in size not to be filtered out by the clustering algorithm (some examples of such sites can be found in Fig. 5). We therefore added “shadows” as a fifth category to improve the overall performance, reducing misclassifications of shadows as, for example, martensite–ferrite interface decohesion. This also reduces a potential bias in counting the relative fraction of damage types as a function of the applied stress.

Furthermore, the classes in the training data are naturally imbalanced, meaning that the classes do not have the same number of elements. This is mainly due to two reasons: From a materials science perspective, the prevalent damage mechanisms change with the amount of strain applied to the sample.6,32 In fact, the relative occurrence of damage mechanisms, or classes to the neural network, is integral to understanding the properties of the material as various levels of strain are applied. We therefore need to augment the data so that each category has the same number of entries and the network cannot learn an a priori probability of a specific damage category.
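As a simple illustration of this balancing requirement, the following sketch computes how many augmented images would have to be added per class to match the most frequent class. The function name `augmentation_targets` and the label counts are hypothetical; balancing against the largest class is one possible choice and is not necessarily the scheme used here.

```python
from collections import Counter


def augmentation_targets(labels):
    """Number of augmented images to add per class to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {name: target - n for name, n in counts.items()}


# Hypothetical label counts, chosen only to illustrate the calculation.
demo_labels = ["interface"] * 5 + ["crack"] * 3 + ["notch"] * 1
extra = augmentation_targets(demo_labels)
```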

To approach these issues, we use several augmentation techniques. These have only been applied to the training dataset, after explicitly splitting the data into training and test data, which avoids any overlap between the training and test datasets. To increase the robustness against the orientation of damage sites, we apply random rotations in the range from 0° to 180° to the training dataset. This is particularly important for the case of martensite cracks that appear mostly vertical (perpendicular to the tensile direction) in uniaxial tests but change inclination in a biaxial stress state (see “Results” section). We also shear the image by applying a random smooth distortion within the range from −16° to +16°. By shearing, a horizontal parallelogram-like transformation is applied to the image, where the image rectangle is rotated and deformed to its corresponding parallelogram. This mimics damage sites that are near a shear band. To make the network more robust with respect to images that are not sharply focused due to incorrect readjustment of imaging conditions during automated image acquisition, we apply a Gaussian blur to some images. Finally, we remove a small number of pixels randomly from some training images to simulate random image acquisition failures by augmenter dropout with a value of 0.2. Each of these operations is demonstrated in Fig. 2b–e. We use the mechanisms provided by tensorflow.data from the TensorFlow 2.1 data library33 to read and process the data. During this step, a random horizontal or vertical flip, or both, was also applied. The remaining augmentation steps were performed using imgaug.34 So far, our attempts to synthesize data by deep convolutional generative adversarial networks (DCGAN)35 and conditional GANs (cGAN),36 as a promising alternative, have not led to suitable results, since the generated images did not mimic the expected features, such as martensite cracks lying within a distinctive martensite island.
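The classical augmentations listed above can be sketched as follows. This is not the tensorflow.data and imgaug pipeline used in this work; it is a stand-in built on NumPy and `scipy.ndimage` so that each operation is visible. The function name `augment`, the blur probability, the blur strength, and the reading of the value 0.2 as a per-pixel drop probability are assumptions.

```python
import numpy as np
from scipy import ndimage


def augment(image, rng, dropout=0.2, blur_probability=0.5):
    """Return one randomly augmented copy of a 2D grayscale image (uint8)."""
    out = image.astype(np.float32)
    height = out.shape[0]

    # Random horizontal and/or vertical flip.
    if rng.random() < 0.5:
        out = out[:, ::-1]
    if rng.random() < 0.5:
        out = out[::-1, :]

    # Random rotation between 0 and 180 degrees, keeping the image size.
    out = ndimage.rotate(out, rng.uniform(0.0, 180.0), reshape=False, mode="reflect")

    # Horizontal shear by a random angle between -16 and +16 degrees,
    # centered on the middle row so the site stays inside the window.
    shear = np.tan(np.deg2rad(rng.uniform(-16.0, 16.0)))
    out = ndimage.affine_transform(
        out, [[1.0, 0.0], [shear, 1.0]],
        offset=[0.0, -shear * (height - 1) / 2.0], mode="reflect")

    # Gaussian blur on a subset of images (probability is an assumption).
    if rng.random() < blur_probability:
        out = ndimage.gaussian_filter(out, sigma=rng.uniform(0.5, 1.5))

    # Pixel dropout: set a random fraction of pixels to zero. Treating 0.2
    # as the per-pixel drop probability is an assumption.
    out[rng.random(out.shape) < dropout] = 0.0

    return np.clip(out, 0, 255).astype(np.uint8)


rng = np.random.default_rng(seed=0)
window = rng.integers(0, 256, size=(250, 250), dtype=np.uint8)  # stand-in image
augmented = augment(window, rng)
```

Only training images would be passed through such a function; the test images remain untouched, in line with the split described above.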

Fig. 2

Demonstration of the applied classical augmentations: (a) original image, (b) random rotation, (c) random pixel dropout, (d) Gaussian blur, and (e) random shear. Top row: image section for augmentation. Red box: training data (Color figure online).

Model Optimization

Although augmenting the training dataset as described in “Data Preprocessing and Augmentation” section already improves both the generalizability and robustness of the neural networks, we investigate further approaches to prevent overfitting. In the two-stage approach used in our method, the first network that classifies the damage sites into two basic categories, as inclusion or not inclusion, does not overfit during the training. However, the second network that classifies the non-inclusion damage sites into four categories, viz. martensite cracks, interface decohesion, notch, and shadowing, overfits after some iterations, as measured by comparing the training error with the corresponding number obtained from an independent test sample.

A popular technique to reduce overfitting is dropout.12 This approach was originally developed for fully connected neural networks. However, using dropout in convolutional neural networks often does not result in significant improvements and can even lead to negative effects.37 We also found no improvement in our case. Instead, we penalize large network weights, favoring a small network by applying an L2 regularization to the whole network, using a regularization constant of 0.001.
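To make the effect of the regularization constant concrete, the sketch below adds an L2 penalty, defined here as the constant times the sum of all squared weights, to a data loss. The function name `l2_penalty`, the weight arrays, and the data-loss value are hypothetical; only the constant 0.001 is taken from the text.

```python
import numpy as np


def l2_penalty(weights, lam=0.001):
    """L2 penalty: lam times the sum of squared weights over all layers."""
    return lam * sum(float(np.sum(np.square(w))) for w in weights)


# Two hypothetical weight arrays standing in for network layers.
demo_weights = [np.array([[1.0, -2.0], [0.5, 0.0]]), np.array([3.0])]
data_loss = 0.40  # illustrative cross-entropy value
total_loss = data_loss + l2_penalty(demo_weights)
```

Because the penalty grows with the square of each weight, large weights are discouraged during training, which is the mechanism by which this term counteracts overfitting.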

Derivation of Martensite Crack Orientation Variation with Strain State

To quantify the strain dependence of damage in the microstructure, we designed a method to obtain the geometric change in damage morphology of martensitic cracks with strain state. Among all microstructural features prone to failure (damage sites), martensitic islands appear to mimic the stress/strain state in the material by altering the inclinations of their cracks under different strain paths, which is not as immediately obvious for the other damage mechanisms. Therefore, we chose here the statistics of martensite crack angles as an example of how damage identification and classification can make subsequent statistical analyses accessible to grant insights into material behavior and underlying mechanisms.

To calculate the inclination angle of each martensite crack, we follow the procedure described below. In most cases, martensite cracks resemble elongated ellipses (as opposed to curved shapes). We therefore detect straight lines on the crack circumference and calculate their inclination angles as a simple representation of the overall crack inclination.

As a first step, we concentrate on the images that have been identified as a martensite crack (see Fig. 3a as an example). To make the subsequent analysis easier, we remove the remaining noise (indicated by the red arrows) by applying a color thresholding filter followed by a clustering step using DBSCAN (Fig. 3b). After this step, the martensite damage remains as the only feature in the image (Fig. 3c). The approximate elliptical edge of the martensite crack is then identified using Canny edge detection.38

Fig. 3

Martensite crack angle calculation flow: (a) detection of crack, (b) example of noise due to shadowing after thresholding, (c) filtered crack by application of DBSCAN, (d) detection of crack circumference by Canny edge detection, (e) Gaussian blurred edge, and (f) example of lines detected on crack edge overlaid on micrograph.

Finally, we apply a Hough line transform to the image. We use the implementation in Ref. 39. This method was originally proposed for computationally efficient detection of lines in an image using radius–angle (r, θ) pairs.40 There are almost always imperfections from the previous edge detection step. Therefore, the most relevant lines that correspond to the major axis of a crack for the calculation of the inclination are identified by a voting procedure in which higher votes correspond to longer lengths.39
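The voting idea behind the Hough line transform can be illustrated with a bare-bones NumPy sketch. It is not the library implementation of Ref. 39 used in this work. The function name `dominant_line_angle` is hypothetical, the input is assumed to be a binary edge image, and the returned angle is measured in image coordinates relative to the horizontal image axis, which is an assumed convention.

```python
import numpy as np


def dominant_line_angle(edges, n_theta=180):
    """Inclination (degrees, 0 to 179) of the strongest line in a binary edge image."""
    rows, cols = np.nonzero(edges)
    thetas = np.deg2rad(np.arange(n_theta))  # angle of the line normal
    diag = int(np.ceil(np.hypot(*edges.shape)))

    # Each edge pixel votes for every (r, theta) pair it could lie on:
    # r = x cos(theta) + y sin(theta), with x = column and y = row.
    r = np.round(cols[:, None] * np.cos(thetas)
                 + rows[:, None] * np.sin(thetas)).astype(int) + diag
    theta_index = np.broadcast_to(np.arange(n_theta), r.shape)
    accumulator = np.zeros((2 * diag + 1, n_theta), dtype=int)
    np.add.at(accumulator, (r, theta_index), 1)

    # The cell with the most votes corresponds to the longest collinear run.
    _, best_theta = np.unravel_index(np.argmax(accumulator), accumulator.shape)
    # The line itself is perpendicular to its normal.
    return int((best_theta + 90) % 180)


# Synthetic edge images: one vertical and one horizontal line segment.
vertical = np.zeros((50, 50), dtype=np.uint8)
vertical[5:45, 20] = 1
horizontal = np.zeros((50, 50), dtype=np.uint8)
horizontal[10, 5:45] = 1
angle_vertical = dominant_line_angle(vertical)
angle_horizontal = dominant_line_angle(horizontal)
```

Longer collinear edge segments collect more votes in a single accumulator cell, which is why the vote count can serve as a proxy for line length when selecting the line that represents the major crack axis.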

Analysis Pipeline

Our final automated analysis of electron micrographs consists of the following five successive steps:

  (i) Detection by DBSCAN clustering.

  (ii) Cropping: All damage site candidates are cropped out of the original panoramic electron micrograph into separate images of fixed size.

  (iii) Identification of inclusions: The transformation-induced damage sites (here inclusions) are separated from the rest by the first network.

  (iv) Identification of damage mechanisms: The second network classifies all remaining damage site candidates into three classes of deformation-induced damage as well as shadowing artifacts. During the last two steps, the classification is performed if the probability for each decision exceeds a predefined threshold (0.7); otherwise the detected site is labeled as not classified.

  (v) Analysis: Calculation of quantitative damage parameters (here the martensite crack angle).
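The decision logic of the two classification steps, including the probability threshold of 0.7, might be sketched as follows. The function names, the class-name strings, and the representation of the first network's output as a single inclusion probability are assumptions made for illustration.

```python
import numpy as np

MECHANISMS = ["martensite crack", "interface decohesion", "notch", "shadowing"]


def classify_with_threshold(probabilities, class_names, threshold=0.7):
    """Return the winning class name, or 'not classified' below the threshold."""
    probabilities = np.asarray(probabilities, dtype=float)
    best = int(np.argmax(probabilities))
    if probabilities[best] > threshold:
        return class_names[best]
    return "not classified"


def two_stage_label(p_inclusion, p_mechanisms, threshold=0.7):
    """Combine the outputs of the first and second network into one label."""
    stage_one = classify_with_threshold(
        [p_inclusion, 1.0 - p_inclusion], ["inclusion", "other"], threshold)
    if stage_one != "other":
        return stage_one  # either "inclusion" or "not classified"
    return classify_with_threshold(p_mechanisms, MECHANISMS, threshold)


label_a = two_stage_label(0.95, [0.25, 0.25, 0.25, 0.25])
label_b = two_stage_label(0.10, [0.85, 0.05, 0.05, 0.05])
label_c = two_stage_label(0.10, [0.40, 0.30, 0.20, 0.10])
label_d = two_stage_label(0.60, [0.85, 0.05, 0.05, 0.05])
```

Sites for which neither network reaches the threshold are kept apart as not classified rather than being forced into a damage class.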

Technical Configuration

All the calculations, i.e., damage site detection, training and testing of the network, and calculation of the numerical results, were carried out on a workstation equipped with an Intel Core i9-9900K CPU, an NVIDIA GeForce RTX 2070 GPU, 32 GB of memory, and 1 TB of storage. Under this system configuration, training of the first network took around 5 h and training of the second network took just under 3 h.

Results

Classification Results

The first assessment of network performance after including shadowing as a new class and applying data augmentation as well as regularization was based on a comparison between the initial and new configuration on unseen micrographs from a sample deformed by uniaxial tension. The confusion matrices for both configurations are shown in Fig. 4. The confusion matrix shows the predicted class, i.e., inclusions (IN), shadows (SH), martensite cracks (MC), notch effect (NE), and interface decohesion (ID), on the x-axis and the true class, obtained from the labeled test data, on the y-axis. The new configuration shows better performance, especially regarding the erroneous classification of interface decohesion events.
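For readers unfamiliar with this representation, the sketch below builds a confusion matrix with the true class along the rows (y-axis) and the predicted class along the columns (x-axis). Normalizing each row to fractions of the true class is an assumption about the presentation, and the short label lists are invented for illustration only.

```python
import numpy as np

CLASSES = ["IN", "SH", "MC", "NE", "ID"]


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Row-normalized confusion matrix: rows = true class, columns = predicted."""
    index = {name: i for i, name in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)), dtype=float)
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t], index[p]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave rows at zero for classes that are absent from the test data.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


# Invented labels: one of two martensite cracks is mistaken for decohesion.
true_demo = ["MC", "MC", "ID", "ID", "SH"]
pred_demo = ["MC", "ID", "ID", "ID", "SH"]
matrix = confusion_matrix(true_demo, pred_demo)
```

A perfect classifier yields a diagonal matrix, so off-diagonal entries directly show which classes are confused with each other.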

Fig. 4

(a) Examples of each damage category: inclusion (IN), martensite crack (MC), notch effect (NE), and interface decohesion (ID). Shadowing (SH) is not shown here (see Fig. 5). Confusion matrices for damage mechanism classification on uniaxial data by the (b) original configuration of the networks and (c) new configuration after the improvements applied in this work.

Fig. 5

Examples of damage site classification of the extended network including shadowing sites and inclined martensite cracks.

As a benchmark for measuring network performance, we first tested both networks. The new configuration contained uniaxial data with data augmentation as described in “Data Preprocessing and Augmentation” section and the model improvements described in “Model Optimization”. The test data consisted of 1750 images that had not been used during the training of the network, corresponding to 7% of the total augmented data. We measured the following accuracy of the neural networks: The first network discriminating between inclusions and all other classes achieves an accuracy of 97%, and the second network used to identify the remaining deformation-induced damage mechanisms and shadowing effects reaches a final validation accuracy of 81 ± 0.3% (see below for further detail on the fluctuation in the results for the new configuration). In comparison, the original configuration of the network achieved an accuracy of 80 ± 6%.6

The two networks were then applied to analyze new test datasets with micrographs of DP steel samples that had been subjected to biaxial stress. As an example, Fig. 5 illustrates the classification results overlaid on micrograph images. Note in particular with respect to individual aspects of classification performance:

  • New class “shadows”: The algorithm can correctly identify shadows that resemble damage sites. This is particularly important, as these artifacts would otherwise introduce a bias in the subsequent analysis. Notably, the network was able to learn how to discriminate between shadows and interface decohesion.

  • Influence of stress state/martensite crack orientation: Due to the application of biaxial stress, martensite cracks were found with apparently unchanged morphology but at different angles compared with the predominantly vertical cracks in the original training dataset from uniaxial straining. These cracks are also identified correctly using the augmented data.

  • The remaining damage types (inclusions, interface decohesion, and notch) are classified similarly to our previous work using the same network but non-augmented data from uniaxial tensile tests.6

The confusion matrices shown in Fig. 6 highlight the improvement in the classification performance on image data from a biaxial straining experiment achieved by the developments described in this work, for three different datasets (test data 1–3) containing experimental micrographs from biaxial straining experiments depicted in different rows.

Fig. 6

True class versus reconstructed class visualized as confusion matrices for training datasets including (a) only raw images and the original configuration, and (b) raw and augmented images. Images containing artifacts in the form of shadows are included explicitly as part of the test data, but their prediction was not included in the old configuration, indicated by the hatched background in the vertical column “SH” in (a). Small values in (b) represent one standard deviation; more detail on their calculation is given in the text.

Figure 6a shows the starting point for this work. In this case, the networks were trained with the original configuration without augmentation and without the new class “shadows”, i.e., using only the raw dataset obtained from the sample exposed to uniaxial stress. In Fig. 6b, the confusion matrices for the use of all five classes and also augmented data from uniaxial straining experiments as training data are shown. In this case, the performance on the unseen biaxial test data improves considerably, and the confusion matrices are closer to a diagonal form.

Discussion

Performance of Extended Network with Augmented Training Data

In this work, we used several augmentation techniques to extend the available training data to achieve a transfer of the original networks trained on images from uniaxial tensile tests to the biaxial loading case. Furthermore, we added shadowing sites as an additional category for classification to avoid a contribution of this common artifact in the experimental data to potential bias. We used L2 regularization to further improve the performance of the second network and prevent overtraining. As a result, the performance of the new configuration on uniaxial data has improved markedly compared with the original setup. Here, the aim is to also classify damage mechanisms on samples that had been deformed under biaxial tension. When applying the networks to biaxial data, the new network with augmented and regularized training data from uniaxial straining experiments now shows similar performance on the new biaxial data to that achieved before by the original configuration on data from uniaxial tensile tests only. The overall accuracy remains almost unchanged between the old and new configuration quantitatively, but we see a notable improvement when looking at the more detailed evaluation in the confusion matrices.

As a major benefit of using augmented data, the networks generalize well and can be used to analyze samples that have been subjected to biaxial stress even though the original dataset was obtained from a sample that has been exposed to uniaxial stress only. This means that using augmented images alone is sufficient to improve the generalizability of the network. Time-consuming labeling of new training data is therefore not required in this case, and the preparation of a (much smaller) labeled test dataset to confirm transferability would be sufficient initially for future use of the networks on materials strained under different conditions. Therefore, we gain a significant improvement in the flexibility of using this technique in a wider range of application scenarios without having to resort to the time-intensive and costly approach of labeling ground-truth data for each use case.

Although there are many augmentation strategies available in literature as outlined in the “Introduction,” the augmentation strategies employed here have in common that they protect the data from an unintended loss of their nature. In the case of deformation-induced damage, there are determinative features for each damage class which must be conserved and whose alteration would often also prevent a human investigator from correctly classifying a damage site. For this reason, we did not apply any augmentation methods that would manipulate the color or contrast or arrangement related to the martensite or ferrite phase (ferrite appears almost always darker than martensite islands) and damage sites (which are always black).

Our first exploration of GANs, using two different approaches, did not yield usable data within the scope of this work. However, more advanced strategies might reasonably be expected to alleviate the encountered difficulty, for example, by generating new data using a reverse segmentation approach. Namely, data similar to an SEM image (discriminator) may be regenerated from a segmented microstructure (generator).

Application to Analysis of Stress-State-Dependent Damage

In the light of this successful transfer of the networks to use with biaxial data, the analysis pipeline can now be used to investigate differences in damage behavior between the two stress states. Here, we take an analysis of the martensite crack orientation as an example.

The distribution of inclination angles of martensite cracks after uniaxial tension peaks towards an angle of 90° to the tensile axis (Fig. 7a), as shown in the example micrographs in Fig. 1. In contrast, the corresponding distribution obtained from a sample exposed to biaxial stress shows a uniform distribution of crack angles. This outcome agrees well with expectations based on the influence of stress state on mode I cracking in martensite. In the uniaxial case, the internal structure of the martensite islands19 and alteration of the global stress state at the local level of interacting ferrite and martensite phases40 may lead to deviations from a purely perpendicular crack path in uniaxial tension. The observed preference for fracture perpendicular to the tensile axis but including a deviation of a few tens of degrees is therefore in agreement with expectations in the investigated dual-phase steel. Similarly, the diminished orientation dependence in biaxial tension is consistent with the absence of a dominant stress axis. The result obtained here suggests that there is also no or little anisotropy in the average fracture stress in rolling or transverse direction, as this should lead to the preference of specific angles even in the presence of a perfectly biaxial stress state. Whether slight preferences for 0°, 45°, 90°, or other orientations may in fact be present can now be analyzed efficiently using the presented analysis pipeline for future experimental campaigns in biaxial tension.

Fig. 7

Comparison between statistics of martensite crack angles after (a) uniaxial tension and (b) biaxial tension.

Conclusion

We investigated the use of data augmentation to enable the transfer of a previously developed damage classification pipeline for micrographs of samples subjected to uniaxial tension to material strained under different conditions, namely biaxial stress. We conclude that:

  • Data augmentation as a data-space solution plays a significant role in achieving the desired network invariance and robustness.

  • The inclusion of additional classification categories for common imaging artifacts (here shadowing) further improves network performance with respect to the physical mechanisms to be studied.

  • The orientation of martensite cracks depends strongly on the stress state, with uniaxial stress inducing cracks predominantly perpendicular to the stress axis and biaxial loading leading to a random distribution of crack orientation.

The use of neural networks for damage classification is therefore not only an efficient tool for the analysis of recurring deformation conditions but can achieve sufficient transferability for use with other strain paths (here uniaxial to biaxial) without the need for additional labor-intensive labeling of image datasets for training.