Machine Learning Based Preprocessing to Ensure Validity of Cross-Correlated Ultrasound Signals for Time-of-Flight Measurements

High-precision ultrasonic time-of-flight measurement is a well-known part of non-destructive evaluation used in many scientific and industrial applications, for example stress evaluation or defect detection. Although ultrasonic time-of-flight measurements are widely used, they reach their limits where high noise and distorted ultrasonic signals conflict with the demand for high-precision measurements. Cross-correlation based time-of-flight measurement is one strategy to increase reliability, but it also exhibits ambiguous correlation states that yield wrong time-of-flight results. To improve the reliability of these measurements, a new machine learning based approach is presented, based on experimental data collected on tightened bolts. Due to the complex structure of the bolts, the ultrasonic signal is influenced by boundary conditions of the geometry, which in practice leads to a high number of ambiguous cross-correlation results. In this particular application, bolts are evaluated discontinuously and without knowledge of the time-of-flight in the unloaded condition, which prevents the use of all other available comparative preprocessing techniques to detect time-of-flight shifts. Three different preprocessing strategies were investigated based on variations in the bolting configurations to obtain a machine learning based model capable of predicting the state of the cross-correlation function for different bolting parameters. With this approach, we achieve up to 100% classification accuracy for both longitudinal and transversal ultrasonic signals under laboratory conditions. In the future, the method should be extended to become more robust and applicable in real time for industrial applications.


Introduction
The time-of-flight (TOF) measurement is an important application of ultrasound. It is widely used in scientific and industrial applications, e.g. in distance measurement, characterization of elastic constants or defect evaluation. The precision of a TOF measurement depends on the signal quality and ambiguity. For signals with little noise or superposition from boundary conditions, and for applications with typically low accuracy requirements such as thickness measurement, gating and peak detection are sufficient.
In the presence of distorted or noisy ultrasonic data, caused for example by a complex structure combined with changes in the material properties in the range of parts per thousand, inhomogeneous material, interference, attenuation, scattering or mode conversion, the TOF analysis becomes more complex [1,2]. In these cases, cross-correlation based TOF processing has been shown to be a reliable analysis, achieving a theoretical precision limited only by the sampling rate of the signal digitization.
Simon Herter and Sargon Youssef have contributed equally to this work.

Cross-correlation provides a measure of the similarity of two signals, with the maximum of the cross-correlation representing the temporal shift that results in the highest similarity of both signals [3]. This temporal shift is used to determine the TOF (Fig. 1). To calculate the TOF, segments of two back wall echoes with starting point coordinates SPW1 and SPW2 and identical window length WL (Fig. 1a) are selected in the ultrasonic time signal and extracted (Fig. 1b). Then the maximum position of the cross-correlation (MPCC) function is determined (Fig. 1c). Together with the sampling frequency (SF) of the digitized signal, the time-of-flight can be determined analytically based on Eq. 1 [1,3]. Unfortunately, even cross-correlation can yield ambiguous results, for example for preload determination on tightened bolts. Because of the complex geometry of bolts and the required high precision, which is in the range of a few nanoseconds, the TOF evaluation becomes a limiting factor due to superposition of the above-mentioned effects. The ambiguity of the cross-correlation function due to interference, mode conversion and scattering as well as bad coupling conditions results in so-called "TOF-shifts" when the self-similarity function has several maxima of similar height (Fig. 2). Due to these artefacts, the global maximum is not necessarily indicative of the real TOF value of the ultrasonic wave and therefore yields errors in the determination of the time-of-flight. The shift of the maximum is a multiple of the period of the excited ultrasound frequency, hence the calculated TOF value can be classified as valid or invalid if physical references including dimension, sound velocity or mechanical properties of the investigated object are available. The absence of these references in the bolting application targeted by the authors and the non-Gaussian appearance of the TOF-shifts create the need for transferable comparative evaluations. In practice, this artefact strongly impacts the reliability and usability of the TOF method for the determination of preloads in bolts that are already tightened, as there is a risk of over- or underestimating the preload, which could lead to failure of the structures or loosening of the bolts (Fig. 3).
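The windowing and peak search described above can be sketched in a few lines of Python. Eq. 1 is not reproduced in this text, so the relation used here, TOF = (SPW2 - SPW1 + lag) / SF with the lag measured from the zero-lag position of the cross-correlation, is an assumption and not necessarily the authors' exact formula. The function name `tof_from_xcorr` and the synthetic echo parameters are hypothetical.

```python
import numpy as np

def tof_from_xcorr(signal, spw1, spw2, wl, sf):
    """Estimate the time between two echoes from a windowed cross-correlation.

    Assumed relation (not taken from the paper's Eq. 1):
        TOF = (SPW2 - SPW1 + lag) / SF
    where lag is the offset of the correlation maximum from zero lag.
    """
    w1 = signal[spw1:spw1 + wl]              # window around the first echo
    w2 = signal[spw2:spw2 + wl]              # window around the second echo
    cc = np.correlate(w2, w1, mode="full")   # 2 * wl - 1 points
    mpcc = int(np.argmax(cc))                # maximum position of the cross-correlation
    lag = mpcc - (wl - 1)                    # zero lag sits at index wl - 1
    return (spw2 - spw1 + lag) / sf

# Synthetic test signal (hypothetical): two 5 MHz bursts sampled at 240 MHz,
# the second one attenuated and 3007 samples later than the first.
sf = 240e6
n = np.arange(8000)

def echo(center, amplitude):
    envelope = np.exp(-0.5 * ((n - center) / 40.0) ** 2)
    return amplitude * envelope * np.sin(2 * np.pi * 5e6 * (n - center) / sf)

signal = echo(1500, 1.0) + echo(4507, 0.6)
tof = tof_from_xcorr(signal, spw1=1000, spw2=4000, wl=1024, sf=sf)
print(f"TOF = {tof * 1e6:.4f} us")
```

In this noise-free example the global maximum is unambiguous. With noise or interference, a neighbouring maximum one burst period away can become the global maximum, which corresponds to the TOF-shift discussed in the text.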
The TOF-shifts cause a parallel shift of the predicted preloads and result in their misinterpretation. While some values can be excluded based on physical reasoning (e.g. negative preloads or loads above the tensile strength), the presence of TOF-shifts cannot be identified purely on the basis of physical or mathematical constraints.

Fig. 1 Principle of cross-correlation as used on an ultrasonic signal from TOF measurement: the pulses of the first and second back-wall echo (red windows) (a) are selected (b) and cross-correlated (c)
Fig. 2 Illustration of a TOF-shift observed in the cross-correlation function measured on a bolt with identical loading conditions
Fig. 3 Effect of TOF-shifts on calculated preloads based on a TOF measurement on bolts using two ultrasound modes with different physical correlation between sound velocities in direction of an applied mechanical load
To eliminate this limitation of TOF for demanding, high-precision applications, the authors propose a machine learning based model capable of classifying measurements according to the presence of TOF-shifts, in order to exclude ambiguous values from the analysis. The machine learning model is applied to the cross-correlated signals to exclude data with TOF-shifts before the TOF is even calculated. Consequently, the reliability of the TOF measurements is improved significantly.
In this publication, the principle is demonstrated as preprocessing for cross-correlated ultrasound back wall echoes used for TOF measurements, performed by way of example on bolts subject to different preloading conditions. It can be transferred to other applications where these types of ambiguities in the cross-correlated signals occur.

Materials
All experiments were performed on 34CrNiMo6 steel M24 hexagon head bolts with varying lengths, shaft lengths and thread lengths (Fig. 4a). The parameters for the ten configurations are listed in Fig. 4b. Bolt 1* and Bolt 2* have the same geometrical features as Bolt 1 and Bolt 2 but exhibit different manufacturing tolerances. The ends of all threads were mechanically planarized to reduce the influence of the concave-shaped ends and thereby ensure a good reflection of the ultrasonic pulse.

Experimental Method
A custom-built experimental setup [4] enables TOF measurement on the bolts under increasing preload from 0 to 100 kN in 20 kN steps while simultaneously collecting reference data of the real preloads using a load cell (type 8524, Burster GmbH & Co KG), so that the results can be classified according to the presence or absence of TOF-shifts. At each preload level, at least 100 individual TOF measurements were collected with slightly different sensor positions to generate data with and without TOF-shifts.
A piezoelectric longitudinal transducer with a mean frequency of 5 MHz and a diameter of 12.7 mm is used for TOF measurements (SMP212). Additionally, a piezoelectric shear wave transducer with a mean frequency of 5 MHz and a diameter of 8.7 mm (V156 (66416)) is used to induce shear waves. The excited ultrasound signals are digitized at a sampling rate of 240 MHz, and a shear wave compatible coupling agent is used for the longitudinal as well as for the shear wave.
For ultrasonic excitation, a UNIUS system is used. UNIUS is a high-performance single-channel ultrasound electronics system fabricated by the Fraunhofer Institute for Nondestructive Testing IZFP [5]. Additionally, customized software that enables the user to set all ultrasonic parameters in a User Interface (UI) is part of the experimental setup. For acquisition of the ultrasonic A-scans, the system is parameterized with the following settings: 5 MHz filter (range: 3.7-7.6 MHz), 2 repetitions, SAP-length 100 ns resulting in an excitation frequency of 5 MHz.

Computational Methods
For preprocessing and classification, Python (Version: 3.7, Python Software Foundation) is used in conjunction with the scikit-learn library for the machine learning algorithms [6]. Additionally, the NumPy and pandas libraries are used for data preparation and handling. Visual representation is done via MATLAB and Matplotlib [7-10].
The first step of data preprocessing is to perform the cross-correlation and determine the TOF as shown in Fig. 1. Based on empirical findings, a window length (WL) of 1024 data points is used throughout this analysis, resulting in 2047 data points of the cross-correlation function. The cross-correlation is calculated using the normalized xcorr function in MATLAB (Version: MATLAB 2019a), and the results are labeled as data with or without TOF-shift based on reference measurements with the load cell. To avoid the negative influence of the curse of dimensionality [11] and to reduce computation time, the 2047 features from the cross-correlation function are reduced to 256 data points selected around the maximum of the cross-correlation function.
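The feature extraction step can be illustrated with NumPy as a rough equivalent of the MATLAB workflow. This is a minimal sketch under two assumptions that the text does not confirm: the normalization divides by the square root of the product of the two window energies, and the 256 points are centred on the maximum. The function name `xcorr_features` is hypothetical.

```python
import numpy as np

def xcorr_features(w1, w2, n_features=256):
    """Normalized cross-correlation cropped to n_features points around its maximum."""
    cc = np.correlate(w2, w1, mode="full")               # 2 * WL - 1 points
    cc = cc / np.sqrt(np.dot(w1, w1) * np.dot(w2, w2))   # assumed energy normalization
    peak = int(np.argmax(cc))
    half = n_features // 2
    # Clamp the crop so that it always stays inside the correlation function.
    start = min(max(peak - half, 0), len(cc) - n_features)
    return cc[start:start + n_features]

# Hypothetical stand-in for an extracted echo window with WL = 1024.
rng = np.random.default_rng(0)
window = rng.normal(size=1024)
features = xcorr_features(window, window)
print(features.shape)
```

Cropping around the maximum keeps the neighbouring maxima, whose relative heights carry the information about a possible TOF-shift, while discarding the tails of the correlation function.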
For further dimensionality reduction, three machine learning algorithms were studied:
1. Linear Discriminant Analysis (LDA) [12,13]
2. Principal Component Analysis (PCA) [14]
3. Independent Component Analysis (ICA) [15]
LDA is a supervised dimensionality reduction technique and can be used as a linear classifier. Supervised dimensionality reduction uses the given features and the associated classes of the dataset. Hence the LDA algorithm is able to transform data into feature spaces with a dimension smaller than the number of classes. In contrast to LDA, PCA and ICA are unsupervised techniques. Unsupervised dimensionality reduction is based only on the features, and no information about the associated classes is needed. Therefore the resulting dimension of the dataset can be anywhere between one and the number of features. To determine the algorithm with the highest classification accuracy, loops are used to optimize the algorithm-specific parameters.
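The difference between the supervised and unsupervised reductions can be made concrete with scikit-learn. The sketch below runs on random stand-in data, not the measured cross-correlation functions, and it assumes that `FastICA` is an acceptable stand-in for the ICA variant of [15]; the component counts are arbitrary illustration values.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical stand-in data: 600 samples with 256 features and two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 300)
X = rng.normal(size=(600, 256)) + 0.5 * y[:, None]

# Supervised: needs the labels, and two classes allow at most one component.
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X, y)

# Unsupervised: labels are not used, the number of components is a free parameter.
pca = PCA(n_components=7)
X_pca = pca.fit_transform(X)

# FastICA is scikit-learn's ICA implementation (an assumption about the variant used).
ica = FastICA(n_components=7, random_state=0, max_iter=500)
X_ica = ica.fit_transform(X)

print(X_lda.shape, X_pca.shape, X_ica.shape)
```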
The classification is performed by a k-nearest-neighbor (KNN) classifier. Within the KNN classifier, the Euclidean metric is used as the distance measure, the number of nearest neighbors k is set to 7, and a weighted distance measurement is used [6,11].
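In scikit-learn, these settings might map onto `KNeighborsClassifier` as follows. Reading "weighted distance measurement" as `weights="distance"` is an assumption, and the two-dimensional toy data are hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# k = 7, Euclidean metric, closer neighbors get a larger vote (assumed meaning
# of the weighted distance measurement).
knn = KNeighborsClassifier(n_neighbors=7, weights="distance", metric="euclidean")

# Hypothetical toy features: class 0 ("OK") around 0, class 1 ("SHIFT") around 5.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)

knn.fit(X, y)
pred = knn.predict([[0.0, 0.0], [5.0, 5.0]])
print(pred)
```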
The training and test datasets are determined in one of two ways: by dividing the dataset randomly (using the train_test_split function from scikit-learn) or manually (based on different experimental specimens predetermined to be training and test data).
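Both ways of splitting can be sketched as follows. The data and the `bolt` label array are hypothetical; the 60/40 ratio is the one reported in the Results section, and the assignment of bolts to the training set is only an example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with one specimen label per measurement.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 256))
y = rng.integers(0, 2, size=100)
bolt = np.repeat(["Bolt 1", "Bolt 2", "Bolt 3", "Bolt 4"], 25)

# Alternative 1: random split into 60% training and 40% test data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

# Alternative 2: manual split, whole specimens are assigned to one side only.
train_mask = np.isin(bolt, ["Bolt 1", "Bolt 2"])
X_tr_m, y_tr_m = X[train_mask], y[train_mask]
X_te_m, y_te_m = X[~train_mask], y[~train_mask]

print(len(X_tr), len(X_te), len(X_tr_m), len(X_te_m))
```

With the manual split, no measurement from a test specimen can leak into the training data, which is what makes it the harder and more realistic test of generalization.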
The quality of the preprocessing is evaluated by the accuracy of the model as well as by the False Omission Rate (FOR) and the False Discovery Rate (FDR). There are two possible false classifications: "False-invalid TOF" refers to valid measurements that are classified as invalid TOF, and "False-valid TOF" describes data that represent an invalid TOF value but are classified as valid. The FDR and FOR are calculated based on the following two equations (Eq. 2) and (Eq. 3). In Eq. 2, FIT is the number of samples classified as "False-invalid TOF" and IT is the number of samples correctly classified as "invalid TOF". In Eq. 3, FVT is the number of samples classified as "False-valid TOF" and VT is the number of samples correctly classified as "valid TOF". The overall accuracy of the model is calculated according to Eq. 4.
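Eqs. 2 to 4 are not reproduced in this text. The sketch below therefore uses the standard textbook definitions of FOR, FDR and accuracy with "valid TOF" as the positive class, which pair FIT with IT and FVT with VT as in the symbol definitions above. Whether these are exactly the authors' formulas is an assumption, and the function name `tof_metrics` and the label encoding (0 = invalid, 1 = valid) are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def tof_metrics(y_true, y_pred):
    """Return (accuracy, FOR, FDR) with 1 = valid TOF as the positive class."""
    # Rows are true classes, columns are predicted classes, order [invalid, valid].
    (it, fvt), (fit, vt) = confusion_matrix(y_true, y_pred, labels=[0, 1])
    accuracy = (it + vt) / (it + vt + fit + fvt)
    # Assumed standard definitions, not taken from the paper's equations:
    false_omission = fit / (fit + it) if (fit + it) else 0.0    # among predicted invalid
    false_discovery = fvt / (fvt + vt) if (fvt + vt) else 0.0   # among predicted valid
    return accuracy, false_omission, false_discovery

# Small hand-made example: one false-invalid and one false-valid classification.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 0])
acc, f_or, f_dr = tof_metrics(y_true, y_pred)
print(acc, f_or, f_dr)
```

For the preload application, a false-valid classification is the more critical error, because a shifted TOF value would then be used to calculate a preload.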

Results
The results are presented separately for longitudinal and shear waves because the cross-correlation function is slightly different for the two wave types. For both wave types, we evaluated two levels of complexity for the machine learning model:
- Training and test data are chosen as subsets from measurements performed on the same bolts; no unknown configurations are introduced in the test set.
- Test data include some experimental configurations that are not included in the training data, to evaluate how robust the method is and how well it generalizes.
Results are presented in terms of classification accuracy. With respect to our experiments, this is the percentage of measurements that were correctly classified as being without or with TOF-shift. Labeling of the training and test data sets is based on the separately determined ultrasonic velocities for the pristine material as well as the measured length of the bolts. As described in the Experimental Method section, the custom-built setup enables a time-synchronous measurement of the actual preloads in the bolts, which can be compared to the preload predicted from the TOF to assign each measurement to the SHIFT or OK data set.

Longitudinal Waves
For the first level of complexity, training and test dataset are determined via the train-test-split algorithm. Data are collected on Bolt 1 and Bolt 2 taking into consideration three clamping lengths and six different preload levels (Fig. 4b).
For each preload level and each clamping length as well as for both bolts at least 100 measurements were recorded. 3610 measurements were recorded in total (2140 OK and 1570 SHIFT) and subdivided into 60% training and 40% test data.
The LDA algorithm results in only one component due to its supervised nature. The PCA and ICA algorithms can transform the data into any dimension smaller than or equal to 256 and therefore require a systematic study of the optimal number of components. For each number of components, a KNN classifier carries out the classification after the transformation.
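Such a study over the number of components might be organized as a simple loop. The sketch below uses random stand-in data with only 32 features to keep the run short and sweeps PCA only; replacing `PCA` with `FastICA` would give the corresponding ICA sweep. The data, the variable names and the choice of a pipeline are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in data: the class information sits in the first 3 features.
rng = np.random.default_rng(3)
y = np.repeat([0, 1], 150)
X = rng.normal(size=(300, 32))
X[:, :3] += 2.0 * y[:, None]

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)

scores = {}
for n_components in range(1, X.shape[1] + 1):
    # The transformation is fitted on the training data only, then KNN classifies.
    model = make_pipeline(
        PCA(n_components=n_components),
        KNeighborsClassifier(n_neighbors=7, weights="distance"),
    )
    scores[n_components] = model.fit(X_tr, y_tr).score(X_te, y_te)

best_n = max(scores, key=scores.get)
print(best_n, scores[best_n])
```

Wrapping the transformation and the classifier in one pipeline ensures that the test data never influence the fitted components.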
Fig. 5 shows the accuracy of the PCA and ICA algorithms over the number of components. After an increase in accuracy for the first few components, both algorithms exhibit a range with optimal performance. The PCA shows a constant performance from 7 up to 256 components. The ICA provides the best results between 6 and 12 components. Beyond 20 components, the ICA shows a significant drop in accuracy as the number of components increases, which is in accordance with the literature [16].
Based on the results shown in Fig. 5, both PCA and ICA achieve a perfect classification result for this test data configuration. Results from the transformation based on LDA, PCA and ICA followed by classification with a KNN are shown in Table 1. LDA was able to classify the data with an accuracy of 99.58%, with some instances of false-invalid TOF (FOR = 0.24%) and false-valid TOF (FDR = 0.66%) classification. PCA and ICA successfully classified 100% of the test data.
To increase the complexity of the classification task, the training and test datasets were compiled manually to evaluate the capability of the algorithms to classify data with parameters not included in the training dataset (e.g. different preload levels, bolt parameters). The training data are based on data from Bolt 1 and 2 (Table 2). The test data were recorded on Bolt 3 and 4 as well as Bolt 1* and Bolt 2* (Table 3), and data for different preload levels on Bolt 1 with clamping length 196.3 mm were added. The test data set covers 3000 measurements with 684 data points of invalid TOF values. The training data consist of 3610 measurements including 1570 invalid TOF instances.
The highest classification accuracy was achieved with 4 components for PCA (Fig. 6a) and 11 components for the ICA (Fig. 6b). The trends of accuracy with increasing number of components are similar to the trends described above, but the overall accuracy for both algorithms is lower

Shear Waves
Similar to the procedure for the longitudinal wave data, the train_test_split function is applied first to compile the training and test datasets for the shear wave from data recorded on Bolt 1 and 2. For both bolts, three different clamping lengths and six preload levels are considered (18 configurations for each bolt). A total of 3600 measurements (2729 valid TOF samples and 871 invalid TOF samples) are divided into 60% training and 40% test data.
The best results for the PCA algorithm are obtained between 5 and 256 components, with the exception of 6 components (Fig. 7a). For the ICA algorithm, 10 to 21 components yield the best results (Fig. 7b). Similarly to the longitudinal wave data, the ICA algorithm also exhibits a drop in accuracy with an increasing number of components. Table 5 shows the results for each algorithm. LDA classified the data with 97.85% accuracy, with a small percentage of false-invalid TOF values and 4.55% FDR. The PCA reaches up to 99.65% with an FDR of 1.42%. ICA was able to classify the data set with 100% accuracy.
Following the same procedure as for longitudinal waves to increase the complexity of the data set, the training and test datasets were compiled manually. The experimental configurations chosen as training and test datasets are summarized in Tables 6 and 7. The training data include 3000 measurements with 871 invalid TOF occurrences. The test data cover 2400 measurements with 161 invalid TOF values. For PCA, 6 components show the best results (Fig. 8a), while for ICA 17 components provide the highest accuracy (Fig. 8b). Based on the optimal number of components for the three algorithms, results are summarized in Table 8. LDA exhibits the lowest accuracy at 98.33%, with a low FOR of 1.21% but a relatively high FDR of 8.07%. Both the PCA and the ICA algorithm provide an accuracy of more than 99%. PCA reached 99.79% accuracy with an FDR of 1.24% and a low FOR of 0.13%. The ICA achieved 99.92% accuracy with 0% FOR and 1.24% FDR.

Discussion
The results (Tables 1 and 5) indicate a high accuracy of up to 100% for preprocessing with PCA and ICA and up to 98% for LDA when the test dataset does not include any unknown parameters. This holds for both wave types. If unknown data are included in the test dataset for longitudinal wave data, the accuracy decreases. For the longitudinal waves, only the ICA showed an accuracy above 98%, whereas the PCA and the LDA exhibited accuracies of 81.4% and 78%, respectively. For the shear wave data, the decrease in accuracy for the more complex data sets is small compared to the longitudinal waves. One reason for this small decrease is that the manually configured test set for the shear wave data includes relatively few unknown parameters. Furthermore, the invalid TOF measurements in shear wave data are often of a higher order than for the longitudinal wave data, which means that the maximum of the cross-correlation function shifts not to the neighboring maximum but to the second or an even higher-order maximum. It is very likely that TOF-shifts of a higher order are easier to detect due to a more characteristic change in the cross-correlation function, resulting in the higher accuracies for shear wave data.
The three algorithms studied in this paper exhibited varying accuracies depending on the wave type and on the training and test data configurations. One reason the LDA shows the lowest accuracy in each configuration is that one feature, which is the only possible number due to the supervised nature of the algorithm, is not enough to separate the data, especially if unknown parameters are considered. The reason the ICA outperforms the PCA on the test data with unknown configurations is likely related to the way the ICA transforms the data onto lower-dimensional vectors. The ICA determines components with high independence in the data and projects the data onto those components, whereas the PCA identifies vectors in the direction of maximal variance. Hence the PCA is a way of compressing the data, while the ICA recovers the independent components hidden in the data, which could be the reason why the ICA enables the KNN to distinguish between valid and invalid TOF values.
One restriction of the ICA algorithm is the relatively high number of components needed to give an appropriate lower-dimensional representation: 11 components are needed for longitudinal waves and 17 components for shear waves.
The presented results were obtained on an application in which many TOF-shifts occur due to its complexity, in order to establish a method to successfully preprocess ultrasonic TOF measurements. In the future, a more extensive validation will be necessary to systematically investigate the robustness of the algorithms for different training and test data sets. It is especially important to establish minimum requirements to be met by the training data in order to translate the principle to practical applications of preload evaluation in bolts and to similar applications that require high TOF precision.

Conclusion
In this study, we demonstrated that preprocessing of ambiguous cross-correlated ultrasonic signals for TOF measurement is possible. We propose two different models for the longitudinal and shear wave data. Over various bolting parameters for the longitudinal wave data, it is possible to achieve an accuracy of 98.83% with a FOR of 2.1% and zero FDR if the preprocessing is performed by ICA with 11 components. For the shear wave data, it is possible to set up a model with an accuracy of 99.92% with zero FOR and an FDR of 1.24% if the preprocessing is performed via ICA with 17 components. In contrast, the PCA and LDA showed considerably lower accuracy values for both wave types. PCA and LDA achieved accuracies of only 81.4% and 78%, respectively, for longitudinal wave data when unknown parameters were introduced in the test data set.
In this study, we limited variables to bolting parameters and did not include a variation of ultrasonic parameters such as different ultrasonic transducers, different coupling agents as well as other ultrasonic systems. The main challenge for further investigation of machine learning based models consists in varying more boundary conditions and investigating the underlying effects. Influences of all these factors on the model's accuracy have to be examined for this method to be translated to technical applications.