Performance evaluation of CNN and R-CNN based line by line analysis algorithms for fibre placement defect classification

The Automated Fibre Placement process is commonly used in aerospace for the manufacturing of structural components, but requires a subsequent inspection to meet the corresponding safety requirements. In order to improve this mostly manual inspection step, machine learning methods for the interpretation of 2D surface images are being increasingly utilised in research. Depending on the manufacturing process, a very long time can elapse between the appearance of a manufacturing defect and its recognition. Hence, in this paper Convolutional and Recurrent Neural Network techniques are presented that allow a line-by-line analysis of the incoming height profile scans of a Laser Line Scan Sensor as a 1D signal, which enables a direct reaction to a defect, even if only one or a few individual height profiles of the defect have been recorded. The combination of Convolutional and Recurrent Neural Network structures is particularly beneficial for this application. The investigations in this paper are especially interesting for developers of automated inspection systems in composite engineering.


Introduction
Lightweight structures are widely used in the aerospace industry, for instance in the Airbus A350 XWB or the Boeing 787 [1,2]. Carbon Fibre Reinforced Plastic (CFRP) materials often have a greater stiffness and strength than metallic materials, which makes them ideal for lightweight structures. The production of these usually complex structures is often quite expensive, which is why fast and efficient manufacturing techniques are essential for economical production. To ensure the quality of the components, an additional visual inspection is carried out after manufacturing in the aerospace industry.
This visual inspection can take up to 50% [3] of the overall manufacturing time, and it is often very difficult to achieve the required inspection quality. Both aspects offer great potential for improvement.
An important step in automated inspection is the reliable classification of manufacturing defects from sensor data [4,5]. Machine learning methods are particularly well suited for this purpose, although currently mostly entire 2D images of a material surface are evaluated [6,7]. This approach involves two challenges: firstly, a large number of full training images is required to train modern Neural Network based classifiers [8][9][10], and secondly, a full image of the defect must be recorded before a classification and subsequent system response can be performed [11][12][13]. Thus, depending on the size of the captured image and the speed of the manufacturing process, considerable time can elapse between the appearance of a manufacturing defect and its classification. To address this issue, this paper investigates approaches to classify fibre layup defects from typical Automated Fibre Placement (AFP) processes through a line-by-line interpretation of the input data. On the one hand, this offers the potential to reduce the time between the appearance of a defect and its classification; on the other hand, it offers the possibility to provide significantly more training data, since every image contributes many individual lines.
For the investigations in this study, the application case of AFP manufacturing is considered, as this is a widely used process in industry, which supports the transferability of the research results [14][15][16][17]. A Laser Line Scan Sensor (LLSS) is often applied for inline inspection in this process, which is why data from such a sensor system is used in this research [11,15,18,19]. This sensor type projects a laser line onto a surface and calculates the surface topology line by line from the reflected beam [20]. The research question addressed in this publication is: Which Artificial Neural Network (ANN) architectures are suitable to perform a line-by-line interpretation of LLSS height profile scans with respect to AFP fibre placement defects?
This paper is structured as follows: a description of the state of the art and research is followed by the development of suitable ANN architectures. Then the performance of the various ANN designs is examined on real and synthetically generated data of fibre layup defects.
For the fabrication of more complex lightweight components, the AFP process is often used, which lays down narrow strips of fibre material along a given path [17,24,25]. This involves an effector guiding the fibre material to the mould's surface where it is heated up and pressed onto the mould or layer underneath [21,24]. Rudberg [26] expects an increasing application of this flexible manufacturing process in the future.
During AFP, various defects can arise which reduce the mechanical properties of a component and are therefore undesirable [17,27]. Typical defect types from the literature are wrinkles, twists, foreign bodies, overlaps and gaps, which are visualised in Fig. 2.

Composite inspection and data
The monitoring of AFP processes is being increasingly investigated in industry and research. InFactory Solutions [15], Electroimpact [16,28], Danobat Composites [29] and Profactor [30] use LLSS for AFP monitoring, which captures the 3D topology of the deposited composite material.
Sacco et al. [18,31] studied the Convolutional Neural Network (CNN) based segmentation of defects from LLSS depth images of AFP layup defects. They utilised 800 × 800 pixel LLSS depth images for training their fully linked CNN. Zambal et al. [10,32] presented an end-to-end deep learning approach for AFP defect segmentation under consideration of synthesised training images. For this purpose, they used a U-net CNN introduced by Ronneberger et al. [33] in 2015 along with realistic validation images from a LLSS.
In previous research, Meister et al. addressed the segmentation of fibre layup defects in full scan images [7,11], the synthetic generation of training images for AFP inspection [12] as well as the robust and traceable classification of such defects [4,5,34]. For the investigations in this paper, the real defect images and synthetic data from these previous papers [7,11,12] are utilised. To record the real scan images, manually produced defect samples are scanned with a LLSS. The defect samples are made of 1/4" prepreg tows, with one or more defects located on a single defect sample. For the scanning process, the LLSS is attached to an articulated robot, which moves it parallel to the composite surface at a speed of 200 mm/s. The image data recorded in this way is then contrast-enhanced using the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm. Subsequently, synthetic defect images are generated on the basis of the real defects via a Deep Convolutional Generative Adversarial Network (DCGAN) in order to obtain a significantly larger database. The studies by Meister et al. [7,11] describe the acquisition and pre-processing of the real defect images in detail, whereas [12] outlines the synthesis of the synthetic defect images that are additionally used in this study. Randomly selected images from the real defect dataset are shown in Fig. 3 and from the synthetic dataset in Fig. 4. In addition to CNNs, Recurrent Neural Network (RNN) architectures are often applied to evaluate time series data, for instance to predict faults or identify changes in sequential data [35][36][37]. Such techniques are explained in more detail in the following section.

Recurrent neural networks
RNNs are ANNs dedicated to processing sequential data. An RNN processes a sequence of data vectors $(\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_T)$ with time step $t$, although in practice it operates on mini-batches of such sequences [38]. In actual operation, gated RNNs such as Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) are frequently used. A fundamental idea of the LSTM is the ability to forget an older state via so-called self-loops. This is implemented by internal recurrence in the individual LSTM cells. Essential components of these cells are the state unit $s_{t,i}$ and the forget gate $f_{t,i}$, which adjusts the self-loop weight and can be formulated as [35,38,39]:

$$f_{t,i} = \sigma\Big(b^f_i + \sum_j U^f_{i,j}\, x_{t,j} + \sum_j W^f_{i,j}\, h_{t-1,j}\Big) \tag{1}$$

where $\sigma$ is the logistic sigmoid function, $\boldsymbol{x}_t$ is the input data and $\boldsymbol{h}_t$ the hidden layer vector containing the outputs of all individual LSTM cells. Moreover, $W^f$ are the recurrent weights, $U^f$ the input weights and $b^f$ the respective biases.
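The internal LSTM long-term state $s_{t,i}$ is updated with the conditional self-loop weight $f_{t,i}$ as follows [35,38,39]:

$$s_{t,i} = f_{t,i}\, s_{t-1,i} + g_{t,i}\, \sigma\Big(b_i + \sum_j U_{i,j}\, x_{t,j} + \sum_j W_{i,j}\, h_{t-1,j}\Big) \tag{2}$$

with $g_{t,i}$ as the external input gate unit, which is calculated analogously to the forget gate in Equation 1 with its own parameters. The output $h_{t,i}$ of a hidden LSTM cell, which represents the short-term state, can also be switched off via the output gate $q_{t,i}$, which uses a sigmoid function [35,38,39]:

$$h_{t,i} = \tanh(s_{t,i})\, q_{t,i} \tag{3}$$

Here $q_{t,i}$ is again calculated analogously to the forget gate from Eq. 1, but with its own parameters $W^o$, $U^o$ and $b^o$:

$$q_{t,i} = \sigma\Big(b^o_i + \sum_j U^o_{i,j}\, x_{t,j} + \sum_j W^o_{i,j}\, h_{t-1,j}\Big)$$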
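The LSTM equations above can be sketched as a single NumPy time step. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter names (`Uf`, `Wf`, `bf`, ...) and the helper functions are hypothetical, the cell input in the state update uses the logistic sigmoid exactly as written above (library LSTMs such as Keras typically use tanh there), and feeding one 128-pixel image row as 128 scalar time steps is only one possible way of presenting a row signal to the network.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid used by all gates."""
    return 1.0 / (1.0 + np.exp(-z))


def affine(p, name, x_t, h_prev):
    """b + U x_t + W h_{t-1} for the parameter set `name` (hypothetical naming)."""
    return p["b" + name] + p["U" + name] @ x_t + p["W" + name] @ h_prev


def lstm_step(x_t, h_prev, s_prev, p):
    """One LSTM time step: returns the new output h_t and state s_t."""
    f = sigmoid(affine(p, "f", x_t, h_prev))       # forget gate f_t
    g = sigmoid(affine(p, "g", x_t, h_prev))       # external input gate g_t
    cell_in = sigmoid(affine(p, "", x_t, h_prev))  # cell input, sigmoid as written
    s = f * s_prev + g * cell_in                   # long-term state s_t
    q = sigmoid(affine(p, "o", x_t, h_prev))       # output gate q_t
    h = np.tanh(s) * q                             # short-term state / output h_t
    return h, s


def init_params(n_in, n_units, seed=0):
    """Small random weights; the forget-gate bias starts at 1 (a common heuristic)."""
    rng = np.random.default_rng(seed)
    p = {}
    for name in ("f", "g", "o", ""):
        p["U" + name] = rng.normal(0.0, 0.1, size=(n_units, n_in))
        p["W" + name] = rng.normal(0.0, 0.1, size=(n_units, n_units))
        p["b" + name] = np.zeros(n_units)
    p["bf"] = np.ones(n_units)
    return p


# Hypothetical input: one 128-pixel image row, fed as 128 scalar time steps.
row = np.random.default_rng(1).random(128)
params = init_params(n_in=1, n_units=200)
h, s = np.zeros(200), np.zeros(200)
for value in row:
    h, s = lstm_step(np.array([value]), h, s, params)
print(h.shape)  # (200,)
```

After the whole row has been processed, the 200-dimensional output vector could be passed to a dense layer that maps it to the defect classes; the 200 units mirror the LSTM configuration mentioned later in the paper.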
Fig. 3 Randomly chosen real defect depth maps captured by a LLSS for different classes, each with a dimension of 128 × 128 px, from the previous papers Meister et al. [11,12]
Fig. 4 Randomly chosen synthetic defect depth maps for different classes, each with a dimension of 128 × 128 px, from the previous paper Meister et al. [12]
The other very popular method, as stated above, is the GRU. The main difference to the LSTM is that the GRU has a single gating unit that simultaneously controls the forgetting factor and the decision to update the state unit, and there is no distinction between short- and long-term state:

$$h_{t,i} = u_{t-1,i}\, h_{t-1,i} + (1 - u_{t-1,i})\, \sigma\Big(b_i + \sum_j U_{i,j}\, x_{t-1,j} + \sum_j W_{i,j}\, r_{t-1,j}\, h_{t-1,j}\Big) \tag{4}$$

with $u_{t-1,i}$ representing the update gate and $r_{t-1,j}$ the reset gate [38,40], both computed analogously to the forget gate in Eq. 1 with their own parameters.
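The GRU update described above can be sketched in the same way. Again, this is a hedged illustration rather than the authors' code: parameter names are hypothetical, and the candidate state uses the sigmoid as in the equation above, whereas Keras' `GRU` uses tanh by default. In contrast to the LSTM sketch, only a single state vector is carried from step to step.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(x_prev, h_prev, p):
    """One GRU step: the new state h_t from the previous input and state."""
    u = sigmoid(p["bu"] + p["Uu"] @ x_prev + p["Wu"] @ h_prev)  # update gate u_{t-1}
    r = sigmoid(p["br"] + p["Ur"] @ x_prev + p["Wr"] @ h_prev)  # reset gate r_{t-1}
    # Candidate state; the reset gate scales the recurrent contribution element-wise.
    candidate = sigmoid(p["b"] + p["U"] @ x_prev + p["W"] @ (r * h_prev))
    # A single gate u blends old state and candidate: no separate long-term state.
    return u * h_prev + (1.0 - u) * candidate


def init_params(n_in, n_units, seed=0):
    rng = np.random.default_rng(seed)
    p = {}
    for name in ("u", "r", ""):
        p["U" + name] = rng.normal(0.0, 0.1, size=(n_units, n_in))
        p["W" + name] = rng.normal(0.0, 0.1, size=(n_units, n_units))
        p["b" + name] = np.zeros(n_units)
    return p


row = np.random.default_rng(1).random(128)  # hypothetical 128-pixel row signal
params = init_params(n_in=1, n_units=200)
h = np.zeros(200)
for value in row:
    h = gru_step(np.array([value]), h, params)
print(h.shape)  # (200,)
```

Because the new state is a convex combination of the previous state and a sigmoid output, it stays within [0, 1] here; this is a property of the sigmoid candidate, not of GRUs in general.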

Experimental data and processing
For the investigations, suitable defect types were selected with reference to Sect. 2.1: flawless areas, wrinkles, twists, foreign bodies, gaps and overlaps are examined. These defect types have already been used in related studies by Oromiehie et al. [17], Harik et al. [27] as well as Heinecke and Willberg [41] and are therefore a suitable choice for the investigations in this paper. As mentioned above, this paper is based on the real and synthetic data from previous studies [7,11,12], which are illustrated in Figs. 3 and 4. The number and distribution of the available data are summarised in Table 1. As explained in Meister et al. [12], the synthetic data were generated using a conditional DCGAN, which consists of a generator with six convolutional layers and a discriminator with five convolutional layers. The processing was performed on a machine with an Intel Xeon Gold 5122 @ 3.60 GHz CPU, 48 GB RAM and an NVIDIA Quadro P6000 GPU.

Analysis concept of images
The idea behind this research was to analyse individual image rows in order to classify a fibre placement defect. Thus, instead of examining an entire image, as is common in the related research from Sect. 2.2, only a single image row was considered. The signal derived from each individual image row represents the grey scale values along the pixels of that row. This is illustrated in Fig. 5 using a twist defect as an example: the left side of Fig. 5 displays a full defect image, of which only the individual row signals, as presented on the right side in Fig. 5b, were used for processing in this paper.
The idea behind this, based on the modelling from Meister et al. [20], was that a single height profile of the LLSS, which represents the height profile of the surface along the laser line at a certain point in time, can also be understood as a sequence of point-by-point scans of the defect's height profile. Therefore, the laser line, which was recorded at a given
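Turning a depth map into row signals can be sketched as follows. The function name is hypothetical, the scaling assumes 8-bit grey values, and it is assumed that every row inherits the class label of its image, which the text implies but does not state explicitly. The output shape (rows, columns, 1) is the usual layout for sequence layers that expect one feature per time step.

```python
import numpy as np


def image_to_row_signals(depth_map, label):
    """Turn a 2D depth map (rows x columns) into one 1D signal per image row.

    Returns an array of shape (rows, columns, 1), i.e. one sequence of grey
    values per row with a single feature per pixel, plus one label per row.
    """
    image = np.asarray(depth_map, dtype=np.float32)
    if image.ndim != 2:
        raise ValueError("expected a 2D grey-scale depth map")
    signals = image / 255.0              # assumes 8-bit grey values
    signals = signals[:, :, np.newaxis]  # add the feature axis
    labels = np.full(image.shape[0], label)
    return signals, labels


# Hypothetical 128 x 128 px depth map of one defect class (label 3).
depth_map = np.random.default_rng(0).integers(0, 256, size=(128, 128), dtype=np.uint8)
signals, labels = image_to_row_signals(depth_map, label=3)
print(signals.shape, labels.shape)  # (128, 128, 1) (128,)
```

A single 128 × 128 px image thus yields 128 training samples, which is the source of the larger amount of training data mentioned in the introduction.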

Neural network setups
In accordance with the analysis idea outlined in Sect. 3 Table 2.
The first network architecture, the 1D CNN, is given in Table 3, which indicates the configurations and dimensions of the individual layers. The column 'output shape' indicates the size of the output data after each individual layer, where 'None' denotes a dimension that is not fixed in advance, such as the batch size. Table 4 presents the layered structure, including the configuration, of the LSTM and GRU. Table 5 provides an overview of the total number of network parameters for the individual architectures and lists the trainable and non-trainable parameters. The non-trainable parameters are constants that were not updated during training, whereas the trainable parameters are adjusted after each iteration of the training process. The parameter count gives an impression of each network's complexity and thus a rough indication of its susceptibility to overfitting.
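Parameter counts such as those in Table 5 follow from simple per-layer formulas. The sketch below assumes a Keras-style implementation (which the 'None' entries suggest but the text does not confirm); for these layer types the formulas reproduce the counts Keras reports in `model.summary()`. The layer sizes in the example (input dimension 1, 200 units, six output classes, a Conv1D with 32 filters of size 3, 32 batch-normalised channels) are illustrative and not taken from Tables 3 and 4. Batch normalisation is one typical source of non-trainable parameters; whether it was used here is not stated.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights (kernel_size x in_channels x filters) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim, units):
    """Four blocks (forget, input, output gate and cell input), each with
    input weights, recurrent weights and a bias vector."""
    return 4 * (units * input_dim + units * units + units)


def gru_params(input_dim, units, reset_after=True):
    """Three blocks (update gate, reset gate, candidate). Keras' default
    reset_after=True variant keeps two bias vectors per block."""
    biases = 2 * units if reset_after else units
    return 3 * (units * input_dim + units * units + biases)


def dense_params(inputs, outputs):
    return inputs * outputs + outputs


def batchnorm_params(channels):
    """(trainable, non-trainable): gamma and beta are learned, while the moving
    mean and variance are only updated as running statistics."""
    return 2 * channels, 2 * channels


print(lstm_params(input_dim=1, units=200))   # 161600
print(gru_params(input_dim=1, units=200))    # 121800
print(dense_params(inputs=200, outputs=6))   # 1206
```

With equal unit counts, a GRU layer has roughly three quarters of the parameters of an LSTM layer, since it has three instead of four weight blocks.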

Performance validation
In order to investigate the performance of the individual ANN architectures, the real image data of the height profile scans of the fibre placement defects were used first. These images were divided in half into a training and a test data set and only then separated into individual lines for the analysis. In this way, a proper separation of the training and test data sets was achieved, since lines from the same image cannot end up in both sets. For the following investigations, the test data were again divided into validation data, which were used for validation during training, and test data, which were finally used to analyse the performance of an approach. This results in a final data split of 50% training data, 25% validation data and 25% test data. This split was chosen in particular because in some cases only very few truly different real data sets per class were available, as indicated in Table 1; it provides sufficient data for a robust training process while still allowing a meaningful validation and testing. The investigations were first carried out for the setups outlined in Sect. 3.3. The accuracy and the loss for the training and validation data were analysed for each of the 40 training epochs. This served to verify that the training of the respective ANN converged and to detect possible overfitting. In the case of overfitting, the respective ANN would show increasing training accuracies while the accuracy on the validation data decreases.
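The image-level split described above can be sketched as follows. It is an assumption that the split was made per class; the text only states the 50/25/25 proportions and that images were split before being separated into lines. Function names and the toy data set are hypothetical. Varying `seed` produces the different random splits used for the repeated runs.

```python
import random
from collections import defaultdict


def split_by_image(labels_by_image, seed):
    """Split image IDs per class into 50% train, 25% validation, 25% test.

    Whole images are assigned to a subset, so rows of one image can never
    appear in two subsets.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image_id, label in sorted(labels_by_image.items()):
        by_class[label].append(image_id)
    train, val, test = [], [], []
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        n_train = len(ids) // 2
        n_val = (len(ids) - n_train) // 2
        train += ids[:n_train]
        val += ids[n_train:n_train + n_val]
        test += ids[n_train + n_val:]
    return train, val, test


def explode_into_rows(image_ids, rows_by_image, labels_by_image):
    """Only after splitting: turn each image into (row_signal, label) samples."""
    return [(row, labels_by_image[i]) for i in image_ids for row in rows_by_image[i]]


# Hypothetical data set: four images per class, each with 128 row signals.
labels = {f"img{i}": ("gap" if i < 4 else "twist") for i in range(8)}
rows = {image_id: [[0.0] * 128 for _ in range(128)] for image_id in labels}
train_ids, val_ids, test_ids = split_by_image(labels, seed=42)
train_rows = explode_into_rows(train_ids, rows, labels)
print(len(train_ids), len(val_ids), len(test_ids), len(train_rows))
```

Splitting rows first and images second would place neighbouring, nearly identical rows of one defect into both training and test data and would overestimate the accuracy.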
Subsequently, the seed for the random training, validation and test data split was varied, and the mean and standard deviation over five runs with different random training and test data sets were determined in order to evaluate the robustness of the training process and the classification results. The resulting standard deviations serve as rough error estimates for interpreting the classification results in this paper. The results for the different ANN configurations are presented as individual confusion matrices in order to investigate the deviations for each defect category.
Subsequently, the performance of the ANNs was analysed for the synthetic defect images generated by Meister et al. [12] via a DCGAN. The accuracy and loss of the training and validation data were plotted and the confusion matrices for the results on the test data were calculated. The analyses were carried out for only one random split of training, validation and test data, assuming that the error rates determined above also apply as a rough approximation. Moreover, only 200 training images, 100 validation images and 100 test images per class were used in order to reduce calculation time.
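Aggregating the per-class classification rates over the five seeds can be sketched with the standard library. The rates below are hypothetical placeholders, not results from this paper, and whether the sample or the population standard deviation was used is not stated; the sketch uses the sample version.

```python
import statistics


def summarise_runs(per_run_rates):
    """Mean and standard deviation of per-class classification rates over runs.

    per_run_rates: one dict per random seed, mapping class name -> rate.
    Uses the sample standard deviation; statistics.pstdev would give the
    population variant.
    """
    classes = per_run_rates[0].keys()
    summary = {}
    for c in classes:
        values = [run[c] for run in per_run_rates]
        summary[c] = (statistics.mean(values), statistics.stdev(values))
    return summary


# Hypothetical rates from five seeds (not results from this paper).
runs = [
    {"none": 0.94, "gap": 0.97},
    {"none": 0.95, "gap": 0.96},
    {"none": 0.93, "gap": 0.97},
    {"none": 0.96, "gap": 0.98},
    {"none": 0.94, "gap": 0.97},
]
for name, (mean, std) in summarise_runs(runs).items():
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```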
Finally, the validation method GAN-Train GAN-Test [42] already used for fibre placement defect images by Meister et al. [12] was applied to estimate the classification behaviour of the different ANN for altered test data.
The sequence of the different experiments and their characteristic properties is illustrated in the flowchart in Fig. 6 for a better overview.

Original process data
Fig. 6 Flowchart of the sequence of the different conducted experiments
Initially, the results for training, validating and testing with real defect image data are discussed. Please note that the data sets are truly different from each other, as described in Sect. 3.4. Figure 7a displays the training history over the training epochs, with accuracy and loss given on the ordinate for each individual ANN from Sect. 3.3. Accordingly, Fig. 7b plots the accuracy on the validation data per training epoch. Figure 7a shows that during the training process the accuracy converges towards one and the loss towards zero. This can be interpreted as an indicator of a stable training process for the utilised data. In addition, each ANN seems to be able to distinguish the training samples well from each other. Figure 7b presents the actual classification accuracy for the validation data set, where the results of the LSTMs and the GRUs converge towards a value > 0.9 or even continue to rise marginally. The 1D CNN, however, seems to have more difficulty classifying the validation data correctly at the beginning, but also reaches a stable classification level > 0.9 from about epoch 30 onwards. Since none of the configurations shows a reduction in validation accuracy with an increasing number of epochs, it can be assumed that the trained ANNs do not overfit within 40 training epochs but reach a stationary classification state, which is why 40 training epochs are chosen for the further experiments in this paper. Figure 8 presents the individual confusion matrices for the classification performance of the trained ANNs on the test dataset. As described in Sect. 3.4, the seed for the random splitting of the data was changed several times and the training and testing steps were repeated. Based on this, mean classification rates and the corresponding standard deviations were calculated.
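The overfitting criterion used above (validation accuracy decreasing while training accuracy keeps improving) can be checked automatically on the recorded training histories. The sketch below is one possible formalisation; the thresholds `patience` and `tol` and both example curves are hypothetical and not taken from Fig. 7.

```python
def overfitting_onset(train_acc, val_acc, patience=5, tol=0.01):
    """Return the epoch of best validation accuracy after which overfitting sets in, else None.

    Overfitting is flagged when, at least `patience` epochs after the best
    validation accuracy, validation accuracy has dropped by more than `tol`
    while training accuracy is still higher than at that best epoch.
    """
    best = 0
    for epoch in range(1, len(val_acc)):
        if val_acc[epoch] > val_acc[best]:
            best = epoch
        elif (epoch - best >= patience
              and val_acc[epoch] < val_acc[best] - tol
              and train_acc[epoch] > train_acc[best]):
            return best
    return None


train = [0.5 + 0.05 * e for e in range(10)]
stable_val = [0.50, 0.70, 0.85, 0.90, 0.91, 0.91, 0.92, 0.92, 0.92, 0.93]
falling_val = [0.50, 0.70, 0.80, 0.85, 0.80, 0.78, 0.76, 0.75, 0.74, 0.73]
print(overfitting_onset(train, stable_val))   # None
print(overfitting_onset(train, falling_val))  # 3
```

The tolerance prevents small epoch-to-epoch fluctuations of the validation accuracy, such as those visible for the 1D CNN, from being mistaken for overfitting.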
The overall accuracy of GRU and LSTM is only 2.9 to 3.7% higher than that of the 1D CNN architecture. Given standard deviations of up to 1.84% for certain defect categories, the overall performance of the RNN architectures is considered slightly better, but relatively similar across the individual ANN architectures. It is noteworthy that such a high standard deviation occurs for the flawless category across all ANN architectures. When looking at the individual defect classes, however, larger differences become partially evident. None, twists and gaps yield relatively similar classification rates across all ANN architectures. Wrinkles and overlaps show improved classification accuracies for the RNN architectures but lower classification performance for the 1D CNN. Foreign bodies generally have a relatively high misclassification rate: their classification rate falls below the respective overall accuracy by 32.36% for the 1D CNN, 20.85% for the LSTM and 26.77% for the GRU.
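Class-wise comparisons such as the gap between the foreign-body classification rate and the overall accuracy can be derived directly from a confusion matrix. The sketch assumes rows are true classes and columns predicted classes; the three-class matrix is hypothetical and not one of the matrices in Fig. 8. With an equal number of test samples per class, the overall accuracy equals the mean of the per-class rates.

```python
import numpy as np


def class_rates(confusion):
    """Per-class classification rates from a confusion matrix.

    Rows are true classes and columns predicted classes (an assumed layout).
    Returns the per-class rates (diagonal / row sums), the overall accuracy
    (trace / total) and the row-normalised matrix of misclassification rates.
    """
    cm = np.asarray(confusion, dtype=float)
    normalised = cm / cm.sum(axis=1, keepdims=True)
    per_class = np.diag(normalised)
    overall = np.trace(cm) / cm.sum()
    return per_class, overall, normalised


classes = ["none", "foreign body", "gap"]
cm = [[90, 5, 5],
      [10, 80, 10],
      [0, 0, 100]]
per_class, overall, normalised = class_rates(cm)
for name, rate in zip(classes, per_class):
    print(f"{name}: {rate:.2f} (difference to overall: {overall - rate:+.2f})")
```

The off-diagonal entries of `normalised` give rates such as "foreign body classified as none", which correspond to the misclassification percentages quoted in the text.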

Synthesised data
In order to assess the classification performance of the developed ANN architectures and configurations for alternative training and test data, the networks are trained and tested with synthetic defect images. Figure 9a presents the loss and accuracy of the training process per training epoch and Fig. 9b gives the accuracy on the validation data per epoch. In Fig. 9a it is evident that the training accuracy and training loss converge towards one and towards zero, respectively, and thus a stable training process can be concluded. This behaviour is very similar to the analysis of the real defect images above. In Fig. 9b the validation accuracy of LSTM and GRU converges towards 0.85. This is slightly less than for the real data and might be related to the design or parametrisation of the upstream convolutional layers in the ANN layouts. The confusion matrices in Fig. 10 show the classification results for the synthetic data in detail. Only one run was performed, which gives an impression of the ANN performance while keeping the computation time for this much larger data set manageable. Please note that the standard deviation for the synthetic data might be slightly increased compared to the real data since the overall
In general, it is apparent that the overall accuracy for all three ANN layouts is about 10% lower than for the real defect images. Compared to the real defect data, almost all classification rates of the individual defects are above the mean classification rate of the respective ANN. The only exceptions are twists for the LSTM architecture and overlaps across all ANN designs. For twists, the classification rate is only 1.66% below the mean accuracy of the LSTM. Overlaps, on the other hand, show severe misclassification rates, ranging from 25.53% for the GRU to 34.01% for the 1D CNN.
These synthetic overlaps are mistakenly identified as flawless areas in 22.9% to 25.91% of cases, which is critical for practical applications. Conversely, flawless regions are also incorrectly categorised as overlaps in 7.58% (LSTM) to 13.99% (1D CNN) of cases. The misclassification as 'none' is somewhat lower for gaps, but is still between 9.45% and 10.39% for the three ANN setups considered.
This classification behaviour may be due to artefacts that were introduced into the images by the DCGAN-based synthesis process. Such artefacts are particularly noticeable when comparing synthetic to real data, but do not represent the actual defect. They occur predominantly with less pronounced defect types such as none, gaps and overlaps, as can be seen in the example images in Fig. 4. The 1D analysis approach introduced in this paper may therefore have difficulty classifying these defects correctly.

Cross-validation
In this final experiment, the performance of the ANNs is evaluated crosswise when trained with synthetic data and validated/tested with real data, and compared to the inverse case, i.e. training with real data and validating/testing with synthetic data. For this purpose, the loss and accuracy of the training processes as well as the validation accuracy for both scenarios are illustrated in Fig. 11. The rates are displayed on the ordinate and the number of training epochs on the abscissa.
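The two directions of this crosswise evaluation, which correspond to GAN-train (train on synthetic, test on real) and GAN-test (train on real, test on synthetic) [42], can be sketched as follows. The callable `train_and_score` is a hypothetical interface; a simple nearest-centroid classifier serves as a stand-in for the paper's 1D CNN, LSTM and GRU, and the two-class toy signals are invented for illustration.

```python
import numpy as np


def nearest_centroid_score(train, test):
    """Stand-in classifier (not the paper's ANNs): fit class centroids on
    `train`, return the accuracy on `test`. Each data set is (signals, labels)."""
    x_tr, y_tr = np.asarray(train[0]), np.asarray(train[1])
    x_te, y_te = np.asarray(test[0]), np.asarray(test[1])
    classes = np.unique(y_tr)
    centroids = np.stack([x_tr[y_tr == c].mean(axis=0) for c in classes])
    distances = ((x_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predictions = classes[distances.argmin(axis=1)]
    return float((predictions == y_te).mean())


def gan_train_gan_test(train_and_score, real, synthetic):
    """GAN-train: train on synthetic, test on real.
    GAN-test: train on real, test on synthetic."""
    return {
        "GAN-train": train_and_score(synthetic, real),
        "GAN-test": train_and_score(real, synthetic),
    }


def toy_rows(rng, offset, n=50, width=16):
    """Hypothetical two-class row signals around levels 0 and 1."""
    labels = np.repeat([0, 1], n)
    signals = labels[:, None] + offset + rng.normal(0.0, 0.1, size=(2 * n, width))
    return signals, labels


rng = np.random.default_rng(0)
real = toy_rows(rng, offset=0.0)
synthetic = toy_rows(rng, offset=0.1)  # slightly shifted "synthetic" domain
print(gan_train_gan_test(nearest_centroid_score, real, synthetic))
```

Keeping the classifier behind a single callable makes it straightforward to run both directions with identical settings, so that differences between the two scores reflect the data rather than the training setup.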
At first, it can be seen once more in Fig. 11a, b that the loss and accuracy for the training of all ANN converge towards zero and one, respectively, both for the training with synthetic data and for the training with real data. When training with synthetic data, however, a stationary state is reached after approximately 35-40 epochs, whereas for training with real data this is already achieved after about 20 training epochs. In general, however, a stable training process can be assumed for all considered ANN architectures and both training datasets.
Fig. 11 The loss and accuracy for the various training processes as well as the accuracy for a different data set are displayed
The results are different for the validation accuracy. For training with synthetic data and validating with real data in Fig. 11c, the validation accuracy is around 0.85 for LSTM and GRU, where the accuracy over the various epochs is slightly higher and fluctuates less for the LSTM than for the GRU. For the 1D CNN architecture, no stable classification accuracy can be observed, which indicates that the 1D CNN has much more difficulty in extracting suitable features from the input defect images.
For training with real data and validation with synthetic data, the validation accuracy of the 1D CNN again fluctuates strongly and is very unstable. The validation accuracy of LSTM and GRU stabilises at a relatively low value of slightly above 0.7, which is about 0.15 lower than for the inverse case of training with synthetic data. From this it can be concluded that the synthetic data represent the real data much better than the other way round. However, the significantly lower classification rate compared to the results from Sects. 4.1 and 4.2 suggests that using synthetic data for training ANNs offers only minor advantages, especially when sufficient training data for the respective use case is available. Figure 12 presents the classification results in detail as confusion matrices, where the matrices on the left side represent the performance for training the ANNs with synthetic data and those on the right side the results for training with real data. The best mean classification accuracy of 85.07% is obtained in Fig. 12c for the LSTM with 200 recurrent units trained with synthetic data. The worst accuracy is obtained in Fig. 12b for the 1D CNN trained with real images.
Furthermore, it is evident that flawless regions yield relatively high classification rates compared to the respective mean overall classification rates with classification scores This also applies to the right column of Fig. 12, where the ANNs have been trained with real data and tested on synthetic data. Moreover, it is noteworthy that gaps are always classified between 4.45 and 7.89% worse than the respective mean classification rate when the ANNs are trained with synthetic data, whereas these defects are always classified with at least 2.52% and up to 11.25% higher accuracy than the respective average classification rate when the ANNs are trained with real images. Synthetic gap images therefore do not seem to represent real gap images very well. Beyond that, an increased tendency to misclassify overlaps and gaps as none is apparent. For training the ANNs with synthetic data, these misclassification rates are between 7.24 and 20.33%. For training with real data, the misclassification rates are considerably higher, ranging from 13.12% to 47.32%. During training with real data, as well as in general for the 1D CNN setup, wrinkles, twists and foreign bodies are increasingly confused with each other. For the LSTM and GRU trained with synthetic data, this tendency is significantly lower, except for the misclassification of foreign bodies as wrinkles, which reaches a rate of up to 14.43%.
The observed lower classification rates, with different levels of severity for the individual classes, result from the differing nature of real and synthetic images. This does not necessarily mean that synthetic data are generally unsuitable for the 1D data analysis considered in this study. However, artefacts or noise may have been introduced into the artificial defect images during the data synthesis, which may have a much stronger effect in the 1D analysis case than when considering the entire 2D input image. It can nevertheless be concluded that training the ANNs with representative data sets from the given application case is to be preferred, unless data augmentation is absolutely necessary.

Discussion
The presented approach of a line-by-line interpretation of fibre layup defect images provides a sound alternative to the 2D CNN approaches from the previous research of Meister et al. [4,5,12]. In particular, the issue of having an insufficient amount of training data can be significantly reduced. With a window-based implementation of the presented method, defect segmentation as described by Sacco et al. [18,31] could presumably also be realised, although this was not part of this study. However, the investigations showed that deviations between training and test data or image artefacts potentially have a greater influence on the result of this 1D analysis than when examining the overall defect image. This concern has already been discussed in Meister et al. [4] when examining the input data of a CNN. Accordingly, the results of this paper show that while the amount of image data can be reduced with the presented 1D analysis approach, it is even more crucial to use representative data for training the ANNs.
Regarding the research question, it can be concluded that especially the recurrent network architectures LSTM and GRU with 200 recurrent units show particularly stable classification properties. When representative training and test data are used, average classification accuracies of > 94 % can be achieved with these methods. The pure CNN setup, however, reveals quite unstable classification results in certain cases.
The results of this paper are particularly beneficial for developers of LLSS based inline inspection systems, which perform a direct, line-by-line interpretation of the height profile of a surface. The proposed RNN architectures can be trained directly for the respective application case after adapting the input dimension. That enables a rapid transfer of the findings into industrial applications. It can be assumed that the layout and configuration of these ANNs can also be applied beyond inspection in composite manufacturing.
In future research, the presented methodology should be applied to height profiles of an entire fibre placement course with several parallel aligned tows. In this context, a window-based analysis approach or the direct assignment of a classification result to an image coordinate could be investigated. Furthermore, the presented technique needs to be investigated within a realistic manufacturing process on a geometrically more complex component in order to evaluate the method's performance for this use case.

Conclusion
Three line-based analysis algorithms were investigated, where LSTM and GRU enable a stable training process with classification accuracies of > 94 %. In particular, the ability to perform the ANN training with much smaller datasets is a major advantage of this approach. The classification rate is comparable to that of 2D image analysis methods.
The contribution to the community and industry consists of ANN setups that are suitable for the line-by-line evaluation of topographical data.
Funding Open Access funding enabled and organized by Projekt DEAL. This research is part of the project HyStor and was financially supported by the Investment and Development Bank of Lower Saxony -NBank. This project has received funding from the NBank under the funding code No. ZW180159715.
Availability of data and material The image data used will be provided on request.
Code availability The used Python code is available on request.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.