Introduction

The health and preservation of trees rely on the ability to detect internal defects without causing any damage. Researchers have found that stress wave tomography is an effective method for this purpose (Wang and Allison 2008). Stress wave velocity in wood depends on its physical and mechanical properties. The underlying principle of the method is that defects in wood cause changes in density, which in turn affect the speed of stress wave propagation: a stress wave travels faster in defect-free wood, while its velocity decreases in low-density areas caused by cracks, cavities, or decay. Therefore, by measuring stress wave velocities along a tree trunk perpendicular to the grain, the internal condition of the tree can be assessed. Stress wave velocity in non-decayed wood is much greater than in decayed wood (Beall and Wilcox 1987; Ross 2015). Studies using stress wave-based acoustic tomography have successfully detected internal decay hidden inside tree trunks (Nicolotti et al. 2003; Wang et al. 2009).

Several studies have contributed to the development of wood examination using tomographic methods (Tomikawa et al. 1986; Brancheriau et al. 2008; Feng et al. 2014; Du et al. 2015; Huan et al. 2018; Du et al. 2018; Palma and Gonçalves 2022; Liu and Li 2018). Tomikawa and colleagues introduced straight-ray tomography for examining wooden poles (Tomikawa et al. 1986), while Brancheriau et al. used ultrasonic computed tomography in reflection to evaluate tree integrity (Brancheriau et al. 2008; Espinosa et al. 2019). Feng et al. (2014) proposed an interpolation-based image reconstruction algorithm to detect internal wood decay. Feng et al. (2023) used the ground penetrating radar technique to evaluate defects inside the tree trunk. Building on previous research, Du et al. (2015) presented a stress wave tomography method, one of the important techniques for defect detection (Wei et al. 2022), using ellipse-based spatial interpolation and velocity compensation. Another stress wave imaging algorithm, known as IABLE, was proposed by Huan et al. (2018). This method collects stress wave velocity signals from tree sections using professional equipment and employs an iterative inversion method to calculate the grid distribution of wave velocity across internal defects (Huan et al. 2018). Du et al. (2018) also proposed an image reconstruction method based on fragmented propagation rays of stress waves. This technique uses sensors distributed evenly around the timber to acquire stress wave velocity data, which are then used to visualize and reconstruct the image of internal defects through the visualization of propagation rays. The algorithm incorporates a ray segmentation step based on the elliptical neighborhood technique. Palma and Gonçalves (2022) conducted a study examining the influence of the number of sensors on interpolation and tomographic imaging techniques similar to those in the previous studies.
Liu and Li (2018), on the other hand, propose a method that they suggest is more accurate than assuming that acoustic waves propagate along straight paths. Their method is based on a hybrid wave propagation model (HWPM) comprising two velocity inversion phases: straight-ray inversion and curved-ray inversion (Liu and Li 2018). Beyond reconstructing internal images, a technique based on a 3D laser scanner for detecting surface defects of the tree trunk, such as external cracks, penetrating dead knots, and wanes, was proposed by Ai et al. (2022).

Although stress waves and machine learning techniques have not previously been combined to visualize internal defects in the literature, there are defect detection studies that employ deep learning. Zhou et al. (2021) developed a deep learning algorithm that detects single and multiple defects, but only on synthetic data. In a study conducted in 2022, Sun (2022) performed defect analysis with deep learning techniques; the data for that study were collected as photographs and processed as images (Sun 2022). Machine learning techniques have also been used for classification problems involving external wood defects (He et al. 2020).

The main objective of this study is to present an innovative approach for detecting internal defects in trees. In this context, the article makes two primary contributions. First, a methodology is developed for creating a ray map independent of any parameters and without using a grid structure. This provides a more flexible and scalable approach compared to traditional methods. Second, a method is introduced for creating a tomographic image by classifying the obtained ray datasets and integrating them with machine learning techniques, a process not previously attempted. This enables more precise and accurate detection of internal defects in trees. These two fundamental contributions underscore the overall innovative and valuable nature of the article.

In the first step, stress wave velocity data obtained from the sensors are visualized as velocity rays drawn between the sensors based on their positions and mutual distances (Du et al. 2018). The ray map is then colored according to velocity thresholds and segmented, taking the intersection areas into account. In more detail, the velocity rays drawn between pairs of sensors are segmented by the proposed ray segmentation method in the first step of the algorithm. The area of influence of each ray is assumed to be equal or proportional to its individual value. To determine the value at points where influence areas intersect, the rays are examined by dividing them into small pieces. For each segment, the rays intersecting it are identified, the average value of all intersecting rays is calculated, and this average is assigned to the segment. All ray segments are then recolored according to their new values. In the second step, selected points on the split rays are labeled and fed to the K-nearest neighbor (KNN) and Gaussian process classifier (GPC) algorithms to generate the image of the defect in the wood. The results of these two algorithms are visualized and a tomographic map is obtained.

The KNN algorithm is a popular instance-based machine learning algorithm used for classification and regression tasks. It works on the principle of finding the K data points closest to a given test sample in the training set and making predictions based on the class labels or values of these nearest neighbors (Cover and Hart 1967). KNN is a non-parametric algorithm, meaning it makes no assumptions about the underlying data distribution. The choice of the parameter K determines the number of neighbors considered, which affects the performance of the algorithm and the bias-variance trade-off (Altman 1992).

Gaussian Process Classifier (GPC) is a powerful machine learning algorithm that belongs to the family of Gaussian processes (Friedman et al. 2009; Rasmussen and Williams 2006). It is primarily used for classification tasks where the goal is to predict the class label of a given input based on the training data. Rather than directly modeling class labels, GPC models the underlying decision boundary as a smooth function and captures the uncertainty associated with predictions (Seeger 2003; Snelson and Ghahramani 2007). The flexibility and probabilistic nature of GPC make it suitable for a wide variety of classification tasks, particularly in areas where uncertainty estimation is crucial.

The proposed algorithm is implemented using the Python programming language, providing a practical and efficient tool for non-destructive detection of tree interior defects. This method enables researchers to identify and analyze internal defects in trees, thereby contributing to the overall health assessment and preservation of forests through the utilization of stress wave tomography.

Instead of relying solely on the segmented area when visualizing the estimated damage within the wood, using machine learning algorithms whose accuracy can be quantified increases the reliability of the result. The outputs can be further improved by tuning the parameters of the machine learning algorithms to suit the data, which can lead to more successful results.

Data and materials

Measuring the acoustic signals in tree trunk

The acoustic signals in the tree trunk and wooden samples are measured using twelve sensors of the FAKOPP ArborSonic 3D Acoustic Tomography device (Fakopp Enterprise Bt., Hungary) shown in Fig. 1. The samples used in the experiments were obtained from historical plane trees. The diameters of samples T1, T2, T3, and T4 are 120 cm, 110 cm, 140 cm, and 45 cm, respectively. While T1, T2, and T3 were obtained from tree trunks, T4 was obtained from a tree branch. All samples were in the green moisture condition, and the defects they contain are natural.

Fig. 1
figure 1

FAKOPP acoustic tomography device and a sample application

The sensors are located evenly around the circumference of the samples, and the measurement is performed three times by tapping each sensor sequentially with the hammer. The circumference and the distances between sensors are measured with a caliper. The time of flight of the acoustic signal between the sensors is measured and converted into a velocity matrix by dividing the distance between sensors by the time of flight. The resulting sample velocity matrix is shown in Fig. 2. After the velocity matrix is obtained, a tomographic image is constructed for each sample by the algorithm developed in this study.
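The conversion from time-of-flight readings to a velocity matrix described above can be sketched as follows. The sensor layout, units, and time-of-flight values here are hypothetical illustrations, not values taken from the FAKOPP measurements:

```python
import numpy as np

def velocity_matrix(positions, tof_us):
    """Pairwise velocities (m/s) from sensor positions (m) and times of flight (µs)."""
    n = len(positions)
    v = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and tof_us[i, j] > 0:
                dist = np.linalg.norm(positions[i] - positions[j])
                v[i, j] = dist / (tof_us[i, j] * 1e-6)  # µs -> s
    return v

# Hypothetical example: 4 sensors evenly spaced on a 1 m-diameter circle.
angles = np.linspace(0, 2 * np.pi, 4, endpoint=False)
pos = 0.5 * np.column_stack([np.cos(angles), np.sin(angles)])
tof = np.full((4, 4), 500.0)   # 500 µs for every pair (illustrative only)
np.fill_diagonal(tof, 0.0)
v = velocity_matrix(pos, tof)
```

With a real device, `tof` would be filled with the measured times of flight and the resulting matrix would correspond to Fig. 2.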

Fig. 2
figure 2

Velocity matrix calculated based on time-of-flight values and the distances between sensors

Synthetic data generation

In order to create synthetic data for the study, polygons are first produced by combining points at random positions so as to form twelve triangles. Then, geometrically shaped defects are added at predetermined coordinates within the polygonal area. Considering each corner of the polygon as a sensor, different velocity values are assigned according to whether the rays emanating from the sensors pass through the defect, and a velocity table is created for each synthetic dataset.
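A minimal sketch of this kind of synthetic data generation is given below, assuming a circular cross section, a single circular defect, and illustrative velocity values; the actual defect shapes and velocities used in the study are not reproduced here:

```python
import numpy as np

def ray_hits_circle(p1, p2, center, r):
    """Does the straight ray (segment) p1–p2 pass through a circular defect?"""
    d = p2 - p1
    f = p1 - center
    t = np.clip(-np.dot(f, d) / np.dot(d, d), 0.0, 1.0)  # closest approach
    closest = p1 + t * d
    return np.linalg.norm(closest - center) < r

def synthetic_velocity_table(n_sensors=12, defect_center=(0.1, 0.0),
                             defect_radius=0.15, v_sound=1800.0, v_defect=600.0):
    """Velocity table: slow value if the ray crosses the defect, fast otherwise."""
    ang = np.linspace(0, 2 * np.pi, n_sensors, endpoint=False)
    pts = 0.5 * np.column_stack([np.cos(ang), np.sin(ang)])
    c = np.asarray(defect_center)
    v = np.zeros((n_sensors, n_sensors))
    for i in range(n_sensors):
        for j in range(n_sensors):
            if i != j:
                hit = ray_hits_circle(pts[i], pts[j], c, defect_radius)
                v[i, j] = v_defect if hit else v_sound
    return pts, v

pts, v = synthetic_velocity_table()
```

Rays between opposite sensors cross the defect and receive the slow velocity, while rays between adjacent sensors skirt it and stay fast.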

Proposed method

The proposed algorithm presents a novel approach to analyze the stress wave velocity in wood and generate tomographic images of the cross section of the trunk. By employing a two-stage process, it effectively addresses the challenges associated with accurate segmentation and classification of stress wave velocity rays.

Ray segmentation for stress wave propagation

In this step, a similar segmentation technique inspired by the elliptical neighborhood technique proposed by Du et al. (2018) has been employed. The algorithm, as proposed in their publication, relies on the principle of dividing the ray map into grids, operating under the assumption of even propagation of rays within the wood. The content of the grid is described in Fig. 1 of Du et al.’s (2018) publication.

According to the method by Du et al. (2018), ray segmentation by elliptical neighborhood method involves splitting each original ray multiple times and calculating the appropriate elliptical neighborhood for each split ray.

The scheme presented in the diagram illustrates the segmentation technique inspired by Du et al.’s (2018) elliptical neighborhood method for ray propagation. In this technique, a ray map (Fig. 3a) is divided into a grid map (Fig. 3b) in which specific areas affected by rays (Fig. 3c) are identified. A selected ray (Fig. 3d), such as L1, L2, or L3, is segmented repeatedly, constructing elliptical neighborhoods (Fig. 3f) around these segmented rays and estimating their values (Fig. 3e). The control coefficient \({c}_{1}\), calculated as the ratio of the ellipse’s short axis to its long axis, defines the elliptical shape; its value plays a decisive role in ray segmentation and consequently in image reconstruction, so the chosen parameter values can affect the effectiveness and results of the method. The strategy for selecting affected regions (Fig. 3g) depends on the values of the segmented rays within these elliptical neighborhoods. The angle θ between the line segment from the starting point to the ending point (LSE) and the segment from a specific point on the ellipse to the ending point (LPE) determines whether a grid cell falls within a segmented region. This iterative process continues until all original rays are processed, resulting in segmented rays (Fig. 3h) and a reconstructed ray map based on the estimated grid cell values.

Fig. 3
figure 3

Diagram of the Elliptical Neighborhood Segmentation Technique inspired by Du et al. (2018)

In this study, unlike Du et al.’s (2018) study, all areas requiring coloring to obtain a tomographic image are examined point-based rather than grid-based. This approach allows for a more accurate estimation of missing data. Additionally, the segmentation algorithm proposed in this study does not require any parameters, making the method more stable. As in Du et al.’s method, the smallest ray length in the relevant ray map is determined. The rays are then repeatedly divided until each piece is at most one third of the smallest ray length. In this study, the ratio was taken as 1/3 rather than 1/2. While it is possible to reduce this ratio further, this was not preferred due to the significant increase in computational complexity it would entail. The iteration process concludes when the size of the smallest ray segment reaches one third of the smallest ray length, which serves as the threshold value. Consequently, the number of segments into which each ray is divided is determined individually for each ray map. Thus, instead of selecting a fixed constant and applying it to data of varying sizes, a specific segmentation is computed for each dataset. The ray segmentation algorithm developed in this study can be described as follows:

Let’s define each ray in the dataset as \({r}_{i}\), and their sizes as \({s}_{i}\). We denote the minimum size of the rays as \({s}_{min}\), and the threshold value as threshold.

The algorithm operates as follows:

  • Initially, the minimum ray size is determined as \({s}_{min}= min({s}_{i})\).

  • For each ray \({r}_{i}\), while \(\frac{{s}_{i}}{{2}^{{n}_{i}}}>\frac{{s}_{min}}{3}\), the ray is halved again, i.e., \({n}_{i}\) is incremented. The factor 1/3 applied to \({s}_{min}\) was found to improve the results; other coefficients can be chosen if desired.

  • Here \({n}_{i}\) is the number of halvings applied to the ray, so the ray is divided into \({2}^{{n}_{i}}\) segments.

  • The process terminates for each ray when \(\frac{{s}_{i}}{{2}^{{n}_{i}}}\le \frac{{s}_{min}}{3}\).

After segmenting the rays into segments, it is checked whether each segment has a point that intersects with other rays. The average of the speed values of the rays detected to intersect is taken and transferred to the relevant segment, and the segment is recolored according to its new speed value. This approach allows more precise determination of the velocity values at the intersecting points of impact areas.
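The halving-and-averaging procedure above can be sketched as follows. The halving rule and the 1/3 threshold follow the algorithm described in the text; whether a segment's own velocity enters the average is not stated explicitly, so including it here is a modeling assumption:

```python
import numpy as np

def split_ray(p1, p2, s_min):
    """Halve a ray until each piece is at most one third of the shortest ray."""
    length = np.linalg.norm(p2 - p1)
    n = 0
    while length / 2**n > s_min / 3:
        n += 1
    ts = np.linspace(0, 1, 2**n + 1)
    return [(p1 + t0 * (p2 - p1), p1 + t1 * (p2 - p1))
            for t0, t1 in zip(ts[:-1], ts[1:])]

def segments_intersect(a1, a2, b1, b2):
    """Proper 2-D segment intersection via orientation (cross-product) tests."""
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    d1, d2 = cross(b1, b2, a1), cross(b1, b2, a2)
    d3, d4 = cross(a1, a2, b1), cross(a1, a2, b2)
    return d1 * d2 < 0 and d3 * d4 < 0

def recolor_segments(rays):
    """rays: list of (p1, p2, velocity). Returns (seg_start, seg_end, value)."""
    s_min = min(np.linalg.norm(p2 - p1) for p1, p2, _ in rays)
    out = []
    for i, (p1, p2, v) in enumerate(rays):
        for a1, a2 in split_ray(p1, p2, s_min):
            hits = [w for j, (q1, q2, w) in enumerate(rays)
                    if j != i and segments_intersect(a1, a2, q1, q2)]
            # Assumption: the segment's own velocity is included in the mean.
            out.append((a1, a2, float(np.mean(hits + [v]))))
    return out
```

For two crossing rays with velocities 100 and 300, only the segments containing the crossing point are recolored to the average value 200; all other segments keep their ray's original velocity.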

To achieve this, the sensor positions are first used to construct the cross-sectional shape of the wood, assuming stress waves propagate as straight rays. The sensor locations are linked to each other to generate a ray map representing the propagation paths of stress waves in the wood. An example ray map is given in Fig. 4a, serving as the basis for further analysis.

Fig. 4
figure 4

a Ray map obtained as a result of measuring stress waves by sensors. b Ray map which is colored according to threshold values. c Segmented ray map

Before the segmentation process, the ray map is colored according to the speed of the stress waves. Three colors are assigned to the rays: red, green, and yellow. According to predefined threshold values, green, red, and yellow indicate fast, slow, and moderate stress wave velocities, respectively. This color-coding scheme, shown in Fig. 4b, facilitates the visual identification of changes in stress wave velocity across the wood.

After the ray map is colored, the proposed ray segmentation process is performed to improve the analysis further. This yields detailed information about stress wave propagation patterns, leading to a more precise visual representation of defective areas within the wood. The segmented ray map shown in Fig. 4c is an example created to provide comprehensive visualization of defects in wood.

Green rays, which correspond to fast stress waves, represent healthy areas in the wood. Conversely, the red rays, representing slow stress waves, and the yellow rays represent the defective and potentially defective areas, respectively (He et al. 2020; Wang and Allison 2008).
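The threshold-based coloring can be expressed as a small helper function. The numeric thresholds below are purely illustrative, since the study does not report its cutoff values:

```python
def color_ray(velocity, slow=900.0, fast=1500.0):
    """Map a stress-wave velocity (m/s) to a color class.

    `slow` and `fast` are illustrative thresholds; in practice they would be
    chosen for the species and measurement setup at hand.
    """
    if velocity >= fast:
        return "green"    # fast wave -> healthy wood
    if velocity <= slow:
        return "red"      # slow wave -> defective area
    return "yellow"       # moderate -> potentially affected area
```

Each ray (and later each recolored segment) is passed through this mapping before being drawn on the ray map.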

The method operates in the following manner:

  1. Initialization: The ray map, which consists of colored rays representing stress wave paths, is prepared. Each ray is assigned a color based on the speed of the stress wave, with green, red, and yellow indicating high, low, and intermediate velocities, respectively.

  2. Segmentation Process: The segmentation process begins with each ray being divided into a certain number of segments for subsequent computations.

  3. Neighborhood Comparison: Each ray segment is checked for intersections with other rays. The method proceeds on the assumption that the studied area is more similar to intersecting rays than to non-intersecting ones. The intersecting rays are grouped with the segment, their average velocity is calculated, this average value is assigned to the segment under consideration, and the segment is recolored.

  4. Iterative Process: Upon selecting a new segment, its intersection with all rays is examined, and the color information from each intersecting ray is incorporated into the recoloring of the current segment. This iterative process continues until all segments in the ray map have been analyzed and grouped into their respective sections.

  5. Recoloring of Segments: Each segment is recolored according to its new speed value, obtained by averaging the speeds of all rays it intersects.

Classification and clustering algorithms for tomographic image generation

This step of the algorithm utilizes machine learning algorithms to accurately detect and classify different types of defects in wood based on stress wave velocity data. It involves labeling specific points on the split rays, which are then input into classification algorithms to produce a defect image of the wood.

The segmented ray map partially provides information about the location and size of a defect; however, this is not sufficient. To observe the size of the defect, the area it covers within the tree, and the area potentially affected by it, the gaps in the segmented ray map must be filled, transforming it into a tomographic image. At this point, the classification process ensures a meaningful interpretation of the gaps in the image by assigning all missing color information to one of three classes (0 (green), 1 (yellow), 2 (red)) using the selected methods. The defective area is labeled “2” and shown in red, the healthy area is labeled “0” and shown in green, and the area between them, which is thought to be affected by the defect or potentially defective, is labeled “1” and shown in yellow.

In the segmented ray maps, a random number of points, between half the length of the smallest ray and that length, is selected on each ray. The (x, y) coordinates of the selected points and the color value of each point are saved in a data table used to obtain a tomogram from the ray map of each tree. Sample data are shown in Table 1.
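Building the labeled point table can be sketched as follows. The label mapping (0 green/healthy, 1 yellow/intermediate, 2 red/defective) and the interpretation of the sampling rule as a per-segment point count are assumptions made for illustration:

```python
import random
import numpy as np

# Assumed label mapping: healthy -> 0, potentially affected -> 1, defective -> 2.
LABELS = {"green": 0, "yellow": 1, "red": 2}

def sample_points(segments, s_min, seed=0):
    """segments: list of (p1, p2, color). Returns rows of (x, y, label).

    The number of points per segment is drawn between s_min/2 and s_min,
    interpreted here as a count; the study's exact rule may differ.
    """
    rng = random.Random(seed)
    rows = []
    for p1, p2, color in segments:
        n = rng.randint(max(1, int(s_min / 2)), max(1, int(s_min)))
        for _ in range(n):
            t = rng.random()  # random position along the segment
            x, y = p1 + t * (np.asarray(p2) - np.asarray(p1))
            rows.append((float(x), float(y), LABELS[color]))
    return rows
```

The resulting rows play the role of Table 1: coordinates as features and the color class as the target label for KNN and GPC.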

Table 1 Sample data for KNN and GPC

The representation of the data created with the ray map and selected points, colored according to their labels, is given in Fig. 5c.

Fig. 5
figure 5

a Synthetic data of sample. b The ray map of the synthetic data. c Labeled input data generated from the ray map

Different machine learning algorithms were tried for obtaining a tomographic image from the data generated from the ray map shown in Fig. 5c. The experiments were continued with the two most successful methods, the K-nearest neighbor (KNN, KNeighborsClassifier) and Gaussian process classifier (GPC, GaussianProcessClassifier) algorithms. These algorithms are utilized to classify each point in the data table into a specific color class. KNN performs classification by analyzing the nearest neighbors of a point, while GPC performs classification taking uncertainties into account as well.

After the missing points are classified, a tomographic map is generated from the classification boundaries, offering a visual representation of the tree’s internal structure. The red, green, and yellow classification regions on the map denote the nature and extent of internal flaws. Acquiring data from multiple points and classifying them can mitigate measurement errors: averaging multiple measurements rather than relying on a single one decreases error rates, ensuring more dependable and precise outcomes. KNN and GPC can rectify such errors, enhancing the reliability of the results, while also facilitating the visualization of the classifications. This aids in the distinct mapping of internal flaws.

The KNN algorithm follows these steps:

  1. Data visualization: Initially, the dataset, comprising points derived from the ray map, is visualized. This step aids in understanding the distribution of the data and the classes it contains, helping to identify any patterns or clusters present.

  2. Data storage: The training data, which consist of input features and their corresponding target labels, are stored in memory.

     No explicit training: Unlike traditional machine learning models, where training involves adjusting parameters to minimize a loss function, KNN does not have a training step in the same sense. Instead, “training” KNN involves simply storing the data in memory.

     Lazy learning: KNN is often referred to as a “lazy learner” because it postpones generalizing from the training data until a prediction needs to be made. KNN does not explicitly learn a model during the training phase; rather, it defers computation until inference time.

  3. Classification and visualization of results: The distance between each data point and all other data points in the dataset is computed, typically using the Euclidean distance:

     $$d=\sqrt{\left(x_{2} - x_{1}\right)^{2} + \left(y_{2} - y_{1}\right)^{2}}$$

The three nearest neighbors of each data point are identified based on the calculated distances, and the majority class among these neighbors is assigned to the point being classified. For example, if two neighbors are red and one is green, the new data point is classified as red. This process is repeated for all points in the dataset; once completed, the entire dataset is classified into the red, green, and yellow classes by majority vote of the three nearest neighbors.
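The steps above amount to a plain nearest-neighbor vote, sketched here from scratch with NumPy; the study itself used scikit-learn's KNeighborsClassifier, which behaves equivalently for k = 3 with Euclidean distance:

```python
import numpy as np
from collections import Counter

def knn_predict(train_xy, train_labels, query_xy, k=3):
    """Majority vote over the k nearest training points (Euclidean distance)."""
    train_xy = np.asarray(train_xy, dtype=float)
    preds = []
    for q in np.atleast_2d(np.asarray(query_xy, dtype=float)):
        d = np.sqrt(((train_xy - q) ** 2).sum(axis=1))   # distances to all points
        nearest = np.argsort(d)[:k]                      # indices of k nearest
        votes = Counter(train_labels[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])         # majority class
    return preds

# Hypothetical labeled points: class 0 (healthy) near the origin,
# class 2 (defective) near (1, 1).
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0)]
y = [0, 0, 0, 2, 2]
preds = knn_predict(X, y, [(0.05, 0.05), (1.05, 1.0)])
```

Each gap pixel of the tomogram is classified this way and colored according to the predicted label.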

The results obtained after feeding the data (Fig. 6a) to the KNN algorithm are shown in Fig. 6b.

Fig. 6
figure 6

a Labeled input data generated from the ray map. b Image reconstructed using KNN

The GPC algorithm follows these steps:

  1. Data preparation: The data consist of two arrays: one containing the coordinates of each point and the other containing the corresponding class labels.

  2. Model initialization: The GPC model is initialized with appropriate hyperparameters. The default values (kernel “1.0 * RBF(1.0)”) were used in this study and proved sufficient.

  3. GPC process: The data are given to the GPC model. The GPC algorithm learns the underlying relationships between the input features (coordinates) and the corresponding class labels (0, 1, or 2) within the Gaussian process framework, estimating the posterior distribution over functions representing the class probabilities for each data point.

     GPC can then make predictions on new, unseen data points. For each point, the model predicts the probability of belonging to each class (0, 1, or 2) based on the learned Gaussian process, and the class with the highest predicted probability is assigned as the predicted label.

  4. Visualization: The classification results obtained from the GPC model are visualized by plotting the predicted class boundaries as a tomographic map, as described for the KNN algorithm. Classification boundaries drawn in red, green, and yellow tones represent the nature and intensity of defects in the internal structure of the tree.

The results obtained after feeding the data (Fig. 7a) to the GPC algorithm are shown in Fig. 7b.
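Using scikit-learn, the GPC stage described above can be sketched as follows; the sample points and cluster locations are synthetic stand-ins for the labeled ray-map data, and only two of the three classes are shown for brevity:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Hypothetical labeled points sampled from a segmented ray map:
# class 0 (healthy) clustered near (0, 0), class 2 (defective) near (3, 3).
rng = np.random.default_rng(0)
healthy = rng.normal([0.0, 0.0], 0.2, size=(10, 2))
defective = rng.normal([3.0, 3.0], 0.2, size=(10, 2))
X = np.vstack([healthy, defective])
y = np.array([0] * 10 + [2] * 10)

# Default kernel 1.0 * RBF(1.0), as stated in the study.
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(1.0), random_state=0).fit(X, y)

# Predicted class probabilities and labels for two unseen query points.
queries = [[0.1, 0.1], [2.9, 3.1]]
probs = gpc.predict_proba(queries)
labels = gpc.predict(queries)
```

Evaluating `predict` over a dense grid of (x, y) points and coloring each cell by its label yields the GPC tomographic map of Fig. 7b.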

Fig. 7
figure 7

a Labeled input data generated from the ray map. b Image reconstructed using GPC

Experiments and evaluations

In this study, real and synthetic data are used to measure the effectiveness of the designed algorithm. Results obtained from the real data are given in Fig. 8. The first and the second columns show the names of the samples, and the actual images of the trees used as input, respectively. The images in the third column show the results of the first part of the algorithm produced in the study, namely the segmented ray maps. The images in the fourth and fifth columns show the results obtained with the KNN and GPC methods proposed in the second part of the algorithm using the ray map, respectively.

Fig. 8
figure 8

Images reconstructed using different methods for real tree data

Experiment setup

The data collected from four real trees and four synthetic tree datasets, as mentioned before, are used for the analysis. All experiments were implemented in Visual Studio and Jupyter Notebook on a laptop with an Apple M1 processor running macOS Sonoma 14.2.1, written in Python 3.9.16.

Ground truth acquisition

The real tree samples are displayed in horizontal sections in order to measure the accuracy of the analysis. The obtained images are given to an edge detection algorithm, the outer wall is colored red, and the edges of the defective area are colored green, as in the third column of Tables 2 and 3, establishing the ground truth for each tree. The same process is applied to the horizontal sections of the generated synthetic data.

Table 2 Comparison with ground truth for real tree data and results of evaluation metrics
Table 3 Comparison with ground truth of the synthetic data and results of evaluation

Figure 9 shows the results obtained from the synthetic data. The first column defines the names given to the data. The second column of the image describes the representative images of the synthetic trees used as input. Here, the dark areas represent the defective regions, and the yellow parts represent the intact regions of the tree. The images in the third column show the results of the first part of the algorithm produced in the study, namely the segmented ray maps. The images in the fourth and fifth columns show the results obtained by the KNN and GPC methods proposed in the second part of the algorithm using the ray map, respectively.

Fig. 9
figure 9

Images reconstructed using different methods for synthetic tree data

Evaluation metrics

To evaluate the performance of the method, the ground truth and model outputs are compared. For this purpose, evaluation metrics such as accuracy (ACC), Dice similarity coefficient (DSC), F1 score, and precision (PPV) are used. These metrics are calculated from four quantities: TP, the number of true positives (pixels correctly included as defective-area pixels); FP, false positives (pixels incorrectly included as defective-area pixels); FN, false negatives (pixels incorrectly excluded although they are part of the defective area); and TN, true negatives (pixels correctly identified as background).

Pixel accuracy measures the proportion of correctly classified pixels out of the total number of pixels in the images.

$$\text{Accuracy}=\frac{TP + TN}{TP + TN + FP + FN}$$
(1)

The Dice coefficient is a measure of overlap between two sets. In the context of image segmentation, it evaluates the similarity of the segmented regions between the two images.

$$\text{Dice coefficient }=\frac{2 \times |A \cap B|}{|A|+|B|} =\frac{2 \times TP}{2 \times TP + FP + FN}$$
(2)

Precision is a metric that focuses on the accuracy of positive predictions (here, the pixels marked as defective). It measures the ratio of true positive predictions (correctly identified defective pixels) to the total number of pixels predicted as defective.

$$\text{Precision}=\frac{ TP }{TP+FP}$$
(3)

The F1 score is a balanced metric that considers both precision and recall. It is the harmonic mean of the two, providing a single value for evaluating performance. As with precision, higher F1 scores are better, indicating a better balance between precision and recall.

$$\text{F}1\text{ score }=\frac{ 2 \times TP }{2 \times TP+FP + FN}$$
(4)
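Equations 1–4 can be computed directly from the four pixel counts; the counts in the example below are hypothetical:

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and Dice coefficient from pixel counts (Eqs. 1-4).

    Note that with TP/FP/FN counts the Dice coefficient and the F1 score
    reduce to the same expression, so only one value is returned for both.
    """
    acc = (tp + tn) / (tp + tn + fp + fn)          # Eq. 1
    precision = tp / (tp + fp)                     # Eq. 3
    dice = 2 * tp / (2 * tp + fp + fn)             # Eqs. 2 and 4
    return acc, precision, dice

# Hypothetical 1000-pixel image: 80 defective pixels found correctly,
# 10 false alarms, 10 missed, 900 correct background pixels.
acc, precision, dice = segmentation_metrics(tp=80, fp=10, fn=10, tn=900)
```

These are the values reported per sample in Tables 2–4.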

To assess the performance and accuracy of the generated outputs, a comprehensive evaluation was conducted, comparing the tree view results (distinguished by a green outline), the proposed method with KNN (PM with KNN, red outline), and the proposed method with GPC (PM with GPC, purple outline). The evaluation criteria comprised accuracy, Dice coefficient, and precision. In Table 2, the images portray a pixel-by-pixel comparison of the regions identified as defective by the algorithms on real-world data versus the actual defective areas, quantifying the agreement between the algorithmic predictions and the ground truth. The results demonstrated a substantial average similarity score of 94.7%, indicating a highly favorable concordance between the algorithms’ findings and the real defective regions and underscoring the effectiveness of the employed techniques in identifying defects with a high degree of accuracy.

Within the context of Table 3, the regions designated as defective by the algorithms on synthetic data were scrutinized. These algorithmically predicted defective areas were compared pixel by pixel against the established ground truth, which served as the benchmark for accuracy: every pixel in the algorithmically identified areas was assessed for correspondence with the corresponding region in the ground truth. This analysis yielded an average similarity score of 95.2%, underscoring the robust similarity between the algorithms’ outcomes and the actual defective areas in the synthetic data and signifying a high degree of precision and reliability in the defect detection process.

More sensitive data need to be collected to detect gap-type defects in the tree trunk as well as other defects such as cracks and knots. For this purpose, either the rays need to be divided further during the ray map segmentation process, or the measurement device needs to collect more informative data by increasing the number of sensors. If the device can effectively collect data and distinguish between internal void defects, cracks, or knots, the proposed algorithm can produce corresponding results based on the collected data. Thus, theoretically, improving the tomographic image generated in the second part of the algorithm for other defect types, such as cracks, is possible. Two synthetic datasets were produced to support this theory. Figure 10 shows synthetic data produced using twelve sensors and the results obtained with the proposed algorithm. Figure 11 shows the data produced by doubling the number of sensors and the corresponding results. A crack/slit-type defect was added to these synthetic datasets. The numerical evaluation of these outputs, which shows that the results have improved, is given according to the evaluation metrics in Table 4.

Fig. 10

a Synthetic data with 12 sensors with crack defects, b ray map, c output of the proposed method with KNN, d output of the proposed method with GPC

Fig. 11

a Synthetic data with 24 sensors with crack defects, b ray map, c output of the proposed method with KNN, d output of the proposed method with GPC

Table 4 Results of the examination of the tree with crack/slit type defect using 12 sensors and 24 sensors, according to evaluation metrics

Table 4 presents an evaluation of the proposed method with the two classification algorithms, KNN and GPC, on synthetic data representing a defective tree examined with different numbers of sensors. Although both produced acceptable results on the initial 12-sensor data (S5), neither reached the overall success levels the proposed algorithm achieved on the earlier datasets. KNN showed higher precision, recall, F1-score, and Dice coefficient than GPC, indicating a better ability to classify defective instances correctly. For the 24-sensor data (S5*), both methods improved further, with higher accuracy, precision, and F1-score, demonstrating the effectiveness of the increased sensor input.

The transition from the first dataset, with 12 sensors, to the second, with 24 sensors, yielded notable proportional improvements across the performance metrics for both KNN and GPC. KNN improved by approximately 9.44% in accuracy, 8.95% in precision, 15.16% in F1-score, and 20.47% in Dice coefficient; GPC improved by approximately 9.16% in accuracy, 10.01% in precision, 6.56% in F1-score, and 20.32% in Dice coefficient. These gains underscore the significant benefit of increased sensor density for defect detection, raising accuracy, precision, and overall performance for both classification algorithms. The results also show that identifying different defect types depends primarily on the capabilities of the measurement device: if the device can discriminate between defect types, the algorithm can reliably produce results from the measured data, as noted above. The experimental results of Wang et al. (2009), who observed broadened images of cracks when using 12 sensors, confirm this outcome.
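The four metrics reported in Tables 4–6 can be computed from the binary defect masks as sketched below. This is a generic implementation, not the study's code; note that for binary masks the Dice coefficient coincides numerically with the F1-score, so any difference in the reported tables would stem from how the two are computed (e.g. region overlap vs. point classification).

```python
import numpy as np

def binary_metrics(pred, truth):
    """ACC, PPV (precision), F1, and Dice from two binary defect masks."""
    pred, truth = np.asarray(pred).ravel(), np.asarray(truth).ravel()
    tp = np.sum((pred == 1) & (truth == 1))
    tn = np.sum((pred == 0) & (truth == 0))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    acc = (tp + tn) / pred.size
    ppv = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * rec / (ppv + rec) if ppv + rec else 0.0
    dsc = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
    return {"ACC": acc, "PPV": ppv, "F1": f1, "DSC": dsc}

# Toy example: one false positive, no false negatives.
print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```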

Discussion

The steps of ray segmentation proposed in this study, along with the data points obtained from the segmented rays and the tomographic images generated by the KNN and GPC algorithms, are presented in Fig. 12 for the T1 dataset. For visual comparison, Fig. 12 also includes images obtained from Du's method and from the licensed software Arborsonic.

Fig. 12

Images reconstructed using different methods for real tree data

Examination of Fig. 12 shows that the results obtained are more successful than those of Arborsonic and Du's method. In particular, treating the areas to be predicted in the segmented ray maps as individual points, rather than as grid cells, significantly increased the success rates.
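The point-format pipeline can be sketched as follows: labeled points derived from the segmented rays train a classifier, which is then evaluated at every pixel of a dense grid to render the tomographic image. This is a minimal illustration under stated assumptions, not the study's implementation; the synthetic "central cavity" labels and all parameter choices (k = 5, 64 × 64 grid) are hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.gaussian_process import GaussianProcessClassifier

rng = np.random.default_rng(0)
# Hypothetical training points from segmented rays: (x, y) coordinates labeled
# 1 = defective, 0 = sound, as would be derived from per-segment velocities.
pts = rng.uniform(-1, 1, size=(200, 2))
labels = (np.hypot(pts[:, 0], pts[:, 1]) < 0.4).astype(int)  # toy central cavity

# Dense pixel grid covering the cross-section.
xs, ys = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
grid = np.column_stack([xs.ravel(), ys.ravel()])

knn = KNeighborsClassifier(n_neighbors=5).fit(pts, labels)
gpc = GaussianProcessClassifier(random_state=0).fit(pts, labels)

knn_img = knn.predict(grid).reshape(xs.shape)  # tomographic image via KNN
gpc_img = gpc.predict(grid).reshape(xs.shape)  # tomographic image via GPC
```

Because the classifiers are trained on scattered points rather than coarse grid cells, the rendered boundary between sound and defective regions can follow the ray geometry at pixel resolution.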

To demonstrate the robustness and reliability of the proposed model, the results obtained on the synthetic dataset and on our real data are presented as tomographic images in Figs. 13 and 14, together with the results of related studies from the literature. In both figures, the column entitled "True Distribution" shows the actual defect of the respective tree. Figure 13 shows the outputs of the ArborSonic 3D software (Fakopp Enterprise Bt. 2005), the method of Du et al. (2018), and the proposed method with the GPC and KNN algorithms, respectively. Figure 14 does not include ArborSonic 3D results, because synthetically generated data cannot be uploaded to the publicly available version of the application; ArborSonic 3D was therefore tested only with data collected from real trees.

Fig. 13

Comparative analysis of tree defect detection methods on synthetic data

Fig. 14

Comparative analysis of tree defect detection methods on real tree data

In wood interior defect detection, the proposed algorithm with KNN demonstrated notable superiority over existing methodologies, outperforming the state-of-the-art methods by a significant margin.

Comparing our results on real tree data with the algorithm proposed by Du et al. (2018) and ArborSonic 3D, we found improvements of 7–14% in ACC, 17–22% in PPV, 7–10% in F1-score, and 8–12% in DSC. The metric results for each algorithm are shown in Table 5.

Table 5 Performance metrics comparison of wood interior defect detection algorithms on real tree data

Comparing our results on synthetic data with the algorithm proposed by Du et al. (2018), we observed substantial improvements of 8.65% in ACC, 10.36% in PPV, 4.74% in F1-score, and 14.64% in DSC. The results for the synthetic data are shown in Table 6.

Table 6 Performance metrics comparison of wood interior defect detection algorithms on synthetic data

For real tree data, our algorithm with KNN consistently achieved higher accuracy, positive predictive value, F1-score, and Dice similarity coefficient than our algorithm with GPC, ArborSonic 3D, and Du's method. This comparison demonstrates that the proposed method with the KNN algorithm offers superior detection accuracy in capturing the intricate patterns associated with internal defects.

Conclusion

In this research study, the objective was to estimate internal defects in wood by analyzing the transmission time of stress waves between sensors. To achieve this, a comprehensive machine learning-based algorithm consisting of two main components was developed. Integrating machine learning techniques with the segmented ray data enabled accurate defect detection and mapping within the wood structure. The study produced significantly more reliable results than previous studies and applications: its effectiveness, assessed using four evaluation criteria, exceeded 90% across all metrics, and compared with prior research the outputs showed improvements ranging from 7 to 22% in performance.

As a potential avenue for future research, the proposed method could be enhanced by exploring alternative approaches to calculating threshold values. Such an exploration could involve various methodologies for determining optimal thresholding strategies, thereby refining the accuracy and robustness of the defect estimation process and strengthening the overall efficacy and applicability of the algorithm in the field of wood defect analysis and characterization. Another potential improvement is the incorporation of additional features or data sources to enhance classification accuracy.