Introduction

Background

For clear wood specimens of softwood species, strong relationships exist between the distance to pith and different mechanical and physical properties. For Norway spruce [Picea abies (L.) H. Karst], density, longitudinal modulus of elasticity (MOE) and modulus of rupture (MOR) increase significantly in the radial direction from pith to bark, whereas the longitudinal shrinkage coefficient decreases in the same direction (Blouin et al. 2007; Ormarsson et al. 1999). In general, the annual ring width also decreases from pith to bark, but thinning of trees in the stand may change this condition. Mechanical properties of sawn structural timber depend on both clear wood properties and occurrence of knots (Johansson 2003; Kliger et al. 1998), meaning that relationships between different properties of sawn timber are not identical to those valid for clear wood. Nonetheless, pith location and annual ring width, and how these properties change in the longitudinal direction of a board, are relevant for assessment of stiffness and strength (Hu et al. 2016) and shape stability (Ormarsson et al. 1999) of sawn timber. Knowledge of pith location is also needed to establish detailed and accurate three-dimensional (3D) models of sawn timber, including geometry of knots and local fibre orientation on the basis of surface scanning, and attempts to develop such models have been made in Hu et al. (2016) and Lukacevic et al. (2019). Furthermore, pith location and annual ring width affect the visual appearance of wood products. Board pieces with the pith visible on the surface are often downgraded to lower appearance classes (EN 1611–1:1999, 1999). In some cases, boards with the pith enclosed within the cross section should be rejected (EN 1611–1:1999, 1999). Thus, it would be of practical value if the commercially available scanners used for automated assessment of wood specimens could be used to identify annual rings and accurately determine the location of the pith.

Over the years, few attempts have been made to non-destructively detect the pith location of sawn timber boards (Briggert et al. 2016; Habite et al. 2020; Perlin et al. 2018). In the work presented by Perlin et al. (2018), a method was proposed to locate the pith of a wood cross section by utilising an ultrasonic tomography measurement technique. The method comprised mounting a fixed transmitter transducer and moving the receiver transducer around the cross section of the specimen to record several readings of ultrasonic pulse velocities (UPVs). The basis for the proposed method was that acoustic waves travel faster in the radial direction than in the tangential direction. Therefore, by mapping the directions of the highest UPV values, the pith can, according to Perlin et al. (2018), be located at the position where most of these high-velocity paths intersect. However, only two test specimens, a 25-cm-diameter circular Eucalyptus grandis specimen and a 20 cm square Apuleia leiocarpa specimen, were used to validate the proposed method. According to Perlin et al. (2018), the accuracy of the proposed method could be affected by the presence of internal defects within the timber cross section.

Briggert et al. (2016) developed a method to reconstruct the 3D geometry of knots on the basis of data from surface laser scanning of Norway spruce timber boards. The method comprised detection of knot areas visible on the longitudinal surfaces of the board by means of tracheid effect scanning (Soest et al. 1993). Briggert et al. (2016) further utilised the detected orientation of knots to estimate the pith location along the length direction of the board. However, application of the method required the pith to be located outside the cross section of the assessed board. In addition, to be able to determine which knot surfaces (visible on different board surfaces) are parts of the same knot, knowledge of an approximate location of the pith was needed from the outset. This information was obtained by examination of the end cross section at one of the board ends.

In addition to the above-mentioned studies, numerous studies have utilised images of cross sections of logs generated from computer tomography (CT) X-ray scanning to predict the pith location of logs. Most of the studies involved (1) detection of growth rings on the cross-sectional CT images of the log slices with an assumption that the growth rings are concentric circles centred at the pith, and (2) application of Hough transform (HT) to the detected growth rings to estimate the pith location of the log slices. For a brief presentation of these research works, see Habite et al. (2020).

Habite et al. (2020) developed an algorithm, which was based on the information obtained from optical scanning of timber boards to automatically and non-destructively estimate the pith location of knot-free clear wood sections along Norway spruce boards. The first step in the proposed method was to automatically identify knot-free clear wood sections along the board by considering local fibre directions on the surfaces. Continuous wavelet transform (CWT), with the generalised Morse wavelet method, was then applied to low-pass-filtered images of boards to detect the annual ring width on all four longitudinal surfaces around the board. Finally, assuming that annual rings are shaped as concentric circles with the pith in the centre and with constant distance between the rings, the pith location of knot-free board sections was estimated through an optimisation technique. The proposed algorithm was applied to a total sample of 104 Norway spruce boards, and for a sub-sample of boards with the pith located within the cross section, median estimation errors of 2.3 mm and 3.1 mm in the larger and smaller direction of the board cross section, respectively, were obtained. For a larger sub-sample of boards with the pith located outside the board cross section in most positions along the boards, slightly higher estimation errors were obtained, with a median of 2.6 mm and 3.8 mm in the respective directions.

The method proposed in Habite et al. (2020) is based on data obtained from a high-speed industry scanner and has been shown to provide accurate results, which is promising from an industrial perspective. However, the method is limited by the inherent assumptions that the growth rings are concentric circles with the pith in the centre and that the distance between consecutive growth rings is constant. Annual rings of real board cross sections do not always comply well with these assumptions. In addition, the filter parameters needed for the low-pass pre-processing technique may need frequent manual adjustment, depending on the quality and characteristics of the scanned board surfaces. Unfortunately, this may be an important obstacle for industrial use. Moreover, the applied optimisation algorithm, as implemented in Habite et al. (2020), made the calculation time too long with respect to the typical industrial speed requirement of about one second per board. To overcome these disadvantages, it is worthwhile to evaluate alternative methods to determine the location of the pith on the basis of data obtained from industry-speed optical scanning.

Purpose, objectives and limitations

The purpose of the present study is to examine the possibility of developing an accurate, operationally simple and robust deep learning-based method and algorithm, which is solely based on information obtained from optical scanning of longitudinal surfaces, to estimate the pith location of Norway spruce timber boards. The information obtained from optical scanning is raw RGB images of board surfaces without application of any image pre-processing. The work comprises two objectives, namely to accurately detect the discrete growth rings visible on the four sides of the longitudinal timber board surfaces and to determine the pith location at any clear wood section along timber boards on the basis of the detected surface growth rings.

The scope of the current study is limited to applications to knot-free clear wood cross sections of planed Norway spruce timber boards. However, the pith location for cross sections that include knots could be estimated using linear interpolation between the pith locations determined at adjacent clear wood sections, and the method presented should be applicable to many other species besides Norway spruce.

Material and data obtained from scanning

A total sample of 112 planed Norway spruce timber boards with nominal dimensions of \(45\times 145\times 4500\,\hbox {mm}^{3}\), originating from the areas around the lake Siljan in mid-Sweden and Hamina in south Finland, was analysed. Out of the 112 boards, seven boards were used to train and one to validate an algorithm developed herein for detecting each individual growth ring on the four sides of the board. The remaining 104 boards were used to test another algorithm developed for estimation of pith location on clear wood sections along the boards, after detection of growth rings on surfaces. The sample of 104 boards was further divided into two subsets, consisting of 4 and 100 boards, respectively. The boards in the first subset had the pith located within their cross sections, and these boards were physically available for comparative manual assessment. The boards in the second subset were available in digital form through high-resolution RGB images, in-plane fibre direction information of all the four surfaces obtained from scanning of surfaces, and manually determined pith locations. Regarding the position of pith, the second subset contained boards with pith located both inside and outside their cross sections. The sample of \(4+100\) boards was identical with the sample used in Habite et al. (2020).

The data used to detect annual rings on board surfaces were obtained using an optical industry wood scanner equipped with LED lights, colour cameras, multi-sensor cameras, and line and dot lasers. Data delivered by the scanner consist of red, green and blue (RGB) channel images, and data of local in-plane fibre direction, of all the four sides of the scanned timber board. An approximate pixel size in the RGB images is \(0.8\times 0.07\,\hbox {mm}^{2}\) (lengthwise \(\times\) crosswise resolution), and the resolution of the local in-plane fibre direction data is approximately \(1\times 4.4\,\hbox {mm}^{2}\) (lengthwise \(\times\) crosswise). The resolution of the RGB images is about \(2070\times 5625\) and \(642\times 5625\) pixels for the wider \(145\times 4500\,\hbox {mm}^{2}\) and the narrower \(45\times 4500\,\hbox {mm}^{2}\) surfaces, respectively. The in-plane fibre directions were determined by utilising the so-called tracheid effect, which means that when a concentrated light source illuminates a wood surface, parts of the light will scatter into the cell structure and this scattered light will transmit more in the direction parallel to the fibres (tracheids) than in the perpendicular direction (Briggert et al. 2018; Soest et al. 1993). All the boards included in this study had already been dried to a moisture content (MC) of 12 % and examined, manually and by means of an optical scanner, within a previous research project reported in Olsson and Oscarsson (2017), which facilitated the present study. Additional set-up details of the scanner are provided in Olsson and Oscarsson (2017).

Artificial neural networks

Artificial neural networks (ANNs) are machine learning models that are loosely based on the framework of neurons in the human central nervous system. A typical ANN consists of nonlinear processing units called artificial neurons, arranged in layers and interconnected by a number of connections. As with any other machine learning method, ANNs learn the required knowledge from a given training data set. The learned experience is stored in the connections between the ANN’s neurons. The following subsections provide a brief background on the ANN models used in this article, including multilayer perceptrons, convolutional neural networks and conditional generative adversarial networks. Readers familiar with these ANN models can skip ahead to the “Method” section.

Multilayer perceptrons

Multilayer perceptrons (MLPs) are perhaps the most widely used class of ANNs. As illustrated in Fig. 1, MLPs are composed of a number of interconnected MLP neurons arranged in layers. The first layer of an MLP is called the input layer, while the last one is called the output layer. The layers lying between the input and the output layers are referred to as the hidden layers. The number of neurons in the input and output layers is determined by the number of inputs and outputs of the modelled system, respectively. On the other hand, the number of hidden layers in the network, along with the number of neurons in each hidden layer, are hyperparameters that must be defined by the designer before running the training process.

Fig. 1

Schematic of a typical MLP (fully connected) neural network showing the components of an MLP neuron

MLPs fall in the category of multilayer feedforward ANNs since the inputs are only allowed to propagate in the forward direction. Each neuron in any MLP layer is connected to all neurons in the preceding layer, which is why MLPs are commonly referred to as fully connected networks.

The artificial neurons of an MLP network are nonlinear units composed of the following components (Goodfellow et al. 2016):

  1.

    Connection links that connect the neuron to all neurons in the preceding layer. A scalar called the connection weight \(w_{ik}\) is assigned to each link, where the subscript i denotes the neuron at the input end of the link, while the subscript k represents the neuron at the receiving end (i.e. the current neuron).

  2.

    A linear aggregator that sums the weighted inputs from the N preceding neurons together with a bias \(\theta _k\):

    $$\begin{aligned} x_k = \theta _k + \sum _{i=1}^{N} w_{ik}y_i \end{aligned}$$
    (1)
  3.

    An activation function f(.) that processes \(x_k\) to produce the final output of the neuron \(y_k\):

    $$\begin{aligned} y_k = f(x_k) \end{aligned}$$
    (2)

MLPs belong to the class of supervised neural networks, which means that they are trained over a data set (sample) that contains a number of input observations along with the corresponding desired targets. The weights \(w_{ik}\) and biases \(\theta _k\) are initially assigned with random values. The random ANN parameters are then tuned through a systematic and iterative training process that involves two operations: forward and back-propagation. In forward propagation, an input observation is propagated in the forward direction until the output emerges from the output layer. A certain loss function is then used to compute the error between the actual output of the neural network and the desired target associated with the applied input observation. Mean squared error (MSE) and mean absolute error (MAE) are examples of commonly used loss functions. The computed error is then back-propagated from the output layer through the hidden layers and finally to the input layer. During the back-propagation process, the sensitivity of each weight and bias in the network to the error is obtained. The sensitivities are then used to iteratively update the ANN parameters until a certain stopping criterion is achieved. Several gradient descent (GD) optimisation methods can be used in the training process such as stochastic gradient descent (SGD) presented in Ruder (2016) and Adam optimiser in Kingma and Ba (2014). In GD optimisation algorithms, the learning rate controls the size of the step taken at each iteration towards a local minimum of a loss function until convergence (Ruder 2016). Therefore, the learning rate is another key hyperparameter in the training process that determines how fast the ANN weights are adjusted with respect to the calculated sensitivities (Goodfellow et al. 2016).
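The forward pass described by Eqs. 1 and 2 can be sketched in a few lines of NumPy. This is an illustrative toy example, not the implementation used in this study; the tanh activation and all numerical values are arbitrary assumptions:

```python
import numpy as np

def neuron_forward(y_prev, w_k, theta_k, f=np.tanh):
    """Forward pass of a single MLP neuron.

    y_prev  -- outputs y_i of the N neurons in the preceding layer
    w_k     -- connection weights w_ik into the current neuron k
    theta_k -- bias theta_k of the current neuron
    f       -- activation function
    """
    x_k = theta_k + np.dot(w_k, y_prev)  # linear aggregator (Eq. 1)
    return f(x_k)                        # activation (Eq. 2)

# Toy forward pass through one neuron with three inputs
y_prev = np.array([0.5, -1.0, 2.0])
w_k = np.array([0.1, 0.2, 0.3])
y_k = neuron_forward(y_prev, w_k, theta_k=0.05)  # f(0.5) = tanh(0.5)
```

During training, the gradient of the loss with respect to each \(w_{ik}\) and \(\theta _k\) would then be obtained by back-propagation and used to update the parameters as described above.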

Conditional generative adversarial networks

Convolutional neural networks (CNNs) are another type of ANNs commonly used for image classification and processing. A standard CNN consists mainly of alternating convolution and pooling layers, which are responsible for extracting features (for example, a vertical boundary line between two fields of different colour) from the input image. Each convolution layer is composed of a number of 2D weights known as filters or kernels. The input to a convolution layer is convolved with the kernels and then activated by an activation function in order to extract feature maps. This process can be expressed as:

$$\begin{aligned} y_{j}^l = f\left( b_{j}^l + \sum _{i=1}^{N_{l-1}} y_{i}^{l-1} *k_{ij}^l\right) \end{aligned}$$
(3)

where \(y_{j}^l\) is the jth feature map of the current layer, \(y_{i}^{l-1}\) is the ith feature map of the previous layer, \(k_{ij}^l\) is the kernel between the ith feature map of the previous layer and the jth feature map of the current layer, \(b_{j}^l\) is the 2D bias associated with the jth feature map of the current layer, \(N_{l-1}\) is the number of feature maps in the previous layer, f(.) is the activation function, and the operator \(*\) denotes a standard convolution operation. The extracted feature maps \(y_{j}^l\) are then down-sampled by a pooling layer in order to enhance the performance of the CNN and reduce the computational burden.
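Equation 3 amounts to summing, for each output feature map, the correlations of all input feature maps with the corresponding kernels, adding a bias and applying the activation. A minimal NumPy sketch with explicit loops follows; a ReLU activation is assumed here purely for illustration, and, as in most deep learning frameworks, the \(*\) operation is implemented as cross-correlation:

```python
import numpy as np

def conv_layer(feature_maps_prev, kernels, biases, f=lambda x: np.maximum(x, 0.0)):
    """Compute the feature maps of one convolution layer following Eq. 3.

    feature_maps_prev -- array (N_prev, H, W), the maps y_i^{l-1}
    kernels           -- array (N_prev, N_cur, kh, kw), the kernels k_ij^l
    biases            -- array (N_cur,), broadcast as the bias b_j^l
    f                 -- activation function (ReLU here)
    """
    n_prev, h, w = feature_maps_prev.shape
    _, n_cur, kh, kw = kernels.shape
    out = np.zeros((n_cur, h - kh + 1, w - kw + 1))  # 'valid' output size
    for j in range(n_cur):
        acc = np.zeros_like(out[j])
        for i in range(n_prev):
            # slide the kernel over the i-th input feature map
            for r in range(acc.shape[0]):
                for c in range(acc.shape[1]):
                    patch = feature_maps_prev[i, r:r + kh, c:c + kw]
                    acc[r, c] += np.sum(patch * kernels[i, j])
        out[j] = f(biases[j] + acc)  # bias + activation
    return out

# One input map, one output map, 2x2 averaging kernel
x = np.arange(9.0).reshape(1, 3, 3)
k = np.full((1, 1, 2, 2), 0.25)
b = np.zeros(1)
y = conv_layer(x, k, b)  # one 2x2 output feature map
```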

A U-net is a special CNN that is suitable for image-to-image translation tasks (Ronneberger et al. 2015). It consists of successive convolutional and pooling layers followed by a number of deconvolution and upsampling layers. The contracting part of the U-net (i.e. the convolution and pooling layers) extracts deep features from the input image, whereas the expansive part of the network (i.e. the deconvolution and up-sampling layers) uses the extracted features to construct a full-resolution output image that corresponds to the input image.

Training of the U-net is carried out in a supervised manner using a training data set composed of input images together with the corresponding ground truth images, the latter representing what the U-net should produce as output images on the basis of the input images. During the training process, MAE or MSE is typically used as the loss function for computing the error between the U-net output and the desired target. The objective of the training process is hence to minimise the Euclidean distance between the U-net output pixels and the ground-truth pixels over all input–target samples in the training data set. However, relying on the Euclidean distance alone as a loss function has been found to often result in unrealistic, blurry output images. To overcome this limitation, conditional generative adversarial networks (conditional GANs or cGANs) have recently been proposed by Isola et al. (2017).
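The way a skip connection combines a full-resolution encoder feature with an upsampled deep feature can be illustrated by a NumPy sketch in which the learned (de)convolutions of a real U-net are omitted and replaced by plain resampling. All choices here (a single level, average pooling, nearest-neighbour upsampling, an averaging merge) are illustrative assumptions only:

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling (contracting path)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling (expansive path)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def toy_unet_pass(img):
    """One contracting and one expansive level with a skip connection.

    A real U-net interleaves these resamplings with learned convolutions
    and deconvolutions; this sketch only shows how a full-resolution skip
    feature is merged with the upsampled deep feature to rebuild a
    full-resolution output.
    """
    skip = img              # feature kept for the skip connection
    deep = avg_pool2(img)   # contracting path: halve the resolution
    up = upsample2(deep)    # expansive path: restore the resolution
    return 0.5 * (up + skip)  # simple averaging merge, stand-in for a learned one

out = toy_unet_pass(np.ones((4, 4)))
```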

Fig. 2

The generator of Pix2Pix. The “Encode” blocks denote a convolution + batch normalisation + activation operation. The “Decode” blocks represent a deconvolution + batch normalisation + activation operation. The dashed arrows represent “skip connections” introduced to enhance the performance of the generator. The generator attempts to translate the input image into a believable output image that is indistinguishable from the target image in the training data set

Fig. 3

The discriminator of Pix2Pix. The “Encode” blocks denote a convolution + batch normalisation + activation operation. The discriminator attempts to determine whether the unknown image is “true” (i.e. same as the target image in the training data set corresponding to the input image) or “false” (i.e. generated by the generator)

Conditional GANs are image-to-image translation tools consisting of two CNNs, the generator and the discriminator. Both CNNs are trained simultaneously over a data set of input–target pairs. The generator is responsible for translating the input image into an output image. The discriminator assesses the input image together with a corresponding unknown image to determine whether the unknown image is “true” (i.e. analogous to the target image in the training data set) or “fake” (i.e. an output image generated by the generator). The generator is therefore trained to “trick” the discriminator by producing output images that are indistinguishable from target images. Meanwhile, the discriminator is trained to become better at distinguishing between output/fake images generated by the generator and target images. The idea of this adversarial training process is to use the discriminator’s output as a loss function in the training of the generator instead of relying exclusively on MAE or MSE.

In this work, a powerful cGAN model called “Pix2Pix” (Isola et al. 2017) was trained to translate RGB images of scanned boards into a binary output that represents the growth rings. The choice of the Pix2Pix model was motivated by its success in challenging image-to-image translation problems, including translating aerial photographs to maps and sketches to photographs, as well as semantic labelling of scenes (Isola et al. 2017). As shown in Fig. 2, the generator of Pix2Pix is a modified version of the U-net designed to translate a \(256\times 256\) pixels RGB input image into an output image of the same resolution. Note that the output image shown in Fig. 2 is very similar to a target image, which implies that the generator is well trained.

Fig. 4

A single training iteration of Pix2Pix cGAN. a Discriminator training, b generator training. The “input” here is a \(256\times 256\) pixels RGB image representing a portion of a scanned board. The “output” is the output image generated when the generator processes the input image. The “target” is the ground-truth image in the training data set corresponding to the input image. The weights and biases of both the generator and discriminator are iteratively tuned using the “optimiser” in an attempt to minimise the total loss

The discriminator of Pix2Pix (Fig. 3) is another CNN that takes an input image together with an unknown image and tries to determine whether the second image is a true, target image or an output image produced by the generator. The output of the discriminator is a \(30\times 30\) matrix, where each element represents the believability of one \(70\times 70\) pixels overlapping portion of the unknown image. An output matrix of zeros (red colour) indicates that all portions of the unknown image are certainly produced by the generator, while a matrix of ones (green colour) indicates that the unknown image is indistinguishable from the ground-truth target image corresponding to the input. The unknown image shown in Fig. 3 is actually a rather poor output image produced by a not very well-trained generator. For such an unknown image, a properly trained discriminator produces a \(30\times 30\) matrix with many values close to zero (i.e. red colour in the matrix illustrated in Fig. 3), which indicates that the unknown image is a rather poor output image, clearly distinguishable from a target image.

In order to train Pix2Pix, both the discriminator and the generator must be trained according to the procedure illustrated in Fig. 4. The first step is to randomly initialise the parameters (i.e. weights and biases) of both CNNs. An input image from the training data set together with the corresponding target image is then fed into the discriminator; see Fig. 4a. The \(30\times 30\) output matrix is compared with a \(30\times 30\) reference matrix of ones. The error between the output and reference matrices, computed in terms of sigmoid cross-entropy, is called the real loss. Next, the input image is fed into the generator, which produces an output image. Both the input and output images are sent to the discriminator, which computes another \(30\times 30\) matrix. Sigmoid cross-entropy is used to calculate the error (the generated loss) between the resulting matrix and a reference \(30\times 30\) matrix of zeros. The total discriminator loss (i.e. real loss + generated loss) is then used to update the discriminator parameters. After that, the input image together with the output image produced by the generator is fed into the updated discriminator; see Fig. 4b. The sigmoid cross-entropy between the output of the discriminator and a reference matrix of ones is then calculated. The resulting loss is denoted by \(L_{cGAN}\). The error between the output and target images is also computed in terms of MAE (\(L_{MAE}\)). The total generator loss is calculated as (Isola et al. 2017):

$$\begin{aligned} L_{total}=L_{cGAN} + \lambda L_{MAE} \end{aligned}$$
(4)

where \(\lambda\) is a weighting factor for \(L_{MAE}\). The total generator loss is then used to update the parameters of the generator. This adversarial training procedure is iterated over all images in the training data set and repeated for a number of training epochs. The output of a successful cGAN training process is a generator capable of producing realistic images that cannot be distinguished from the ground-truth images even by a well-trained discriminator.
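The losses in the training procedure above can be sketched in NumPy. The numerically stable sigmoid cross-entropy below mirrors the standard formulation used in common deep learning frameworks; the example values are arbitrary, and this is an illustration of Eq. 4 and the discriminator loss, not the training code used in the study:

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    """Numerically stable mean sigmoid cross-entropy between raw logits
    and 0/1 labels (same formulation as common DL frameworks)."""
    return float(np.mean(np.maximum(logits, 0.0) - logits * labels
                         + np.log1p(np.exp(-np.abs(logits)))))

def discriminator_loss(logits_real, logits_fake):
    """Total discriminator loss: real loss + generated loss."""
    real_loss = sigmoid_cross_entropy(logits_real, np.ones_like(logits_real))
    generated_loss = sigmoid_cross_entropy(logits_fake, np.zeros_like(logits_fake))
    return real_loss + generated_loss

def generator_loss(logits_fake, output_img, target_img, lam=100.0):
    """Total generator loss of Eq. 4: L_total = L_cGAN + lambda * L_MAE,
    with lambda = 100 as recommended by Isola et al. (2017)."""
    l_cgan = sigmoid_cross_entropy(logits_fake, np.ones_like(logits_fake))
    l_mae = float(np.mean(np.abs(output_img - target_img)))
    return l_cgan + lam * l_mae

# Arbitrary example: zero logits from the discriminator, perfect output image
g_loss = generator_loss(np.zeros((30, 30)), np.zeros((4, 4)), np.zeros((4, 4)))
```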

Method

In this section, the method employed to automatically detect individual growth rings and estimate the pith location of timber boards is presented. The algorithm is solely based on information obtained from industrial optical scanning of longitudinal surfaces. In order to verify the results obtained from the automatic algorithms, manual determination of pith locations is performed as well. Accordingly, this section is divided into two sub-sections giving details of the employed automatic and manual procedures, respectively.

Automatic procedure for estimation of pith location

The method developed to detect discrete surface growth rings visible on the four sides of boards and to estimate pith location of clear wood sections along the boards consists of three automatic steps:

  Step 1:

    Identify the knot-free clear wood sections along boards on the basis of the knowledge of local fibre orientation obtained from tracheid effect scanning.

  Step 2:

    Detect individual growth rings that are visible on all four sides of the board, on the basis of RGB images of the scanned board surfaces, by using trained cGANs.

  Step 3:

    Estimate the pith location for the identified clear wood sections along timber boards using a trained MLP neural network.

Regarding identification of knot-free clear wood sections, a procedure presented in Habite et al. (2020) was used. According to Habite et al. (2020), a clear wood section is defined as the centre of a 10-mm-long segment in the longitudinal board direction, across the four sides, within which a maximum of 10 % of all the determined in-plane fibre directions have an angle that exceeds \(12^{\circ }\) with respect to the longitudinal direction of the board. In-depth explanation of the remaining two steps is presented in the following sub-sections.
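The clear wood criterion above can be sketched as follows. The data layout assumed here (a lengthwise-by-crosswise grid of fibre angles pooled over the four sides, with rows about 1 mm apart) is an illustrative assumption, not the exact representation used by Habite et al. (2020):

```python
import numpy as np

def clear_wood_sections(fibre_angles, seg_len=10, max_angle=12.0, max_fraction=0.10):
    """Flag knot-free clear wood sections along a board (sketch of Step 1).

    fibre_angles -- array (L, K) of in-plane fibre angles in degrees, pooled
                    over all four board sides; rows are lengthwise positions
                    (assumed here to be ~1 mm apart), columns crosswise ones.
    Returns a boolean array: True where the 10-mm segment centred at the row
    qualifies as clear wood, i.e. at most 10 % of its angles exceed 12 degrees.
    """
    n_rows = fibre_angles.shape[0]
    half = seg_len // 2
    clear = np.zeros(n_rows, dtype=bool)
    for r in range(half, n_rows - half):
        seg = fibre_angles[r - half:r + half]        # 10-mm window
        frac = np.mean(np.abs(seg) > max_angle)      # fraction of deviating angles
        clear[r] = frac <= max_fraction
    return clear

# Hypothetical board: straight fibres except a knot-like region
angles = np.zeros((30, 8))    # 30 lengthwise positions, 8 crosswise
angles[12:18] = 45.0          # deviating fibre directions around a knot
clear = clear_wood_sections(angles)
```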

Detection of surface growth rings

Conditional generative adversarial networks (cGANs) (see “Conditional generative adversarial networks” section) were trained and used to detect individual growth rings visible on the four surfaces of boards. Out of the total 112 Norway spruce boards investigated in the current study, seven boards were used to generate the input–target training data sets required for cGAN training, and one board was used as a control board to validate the accuracy of the trained cGANs. The annual ring pattern visible on the wide sides of the investigated boards (145 mm) was quite different from the ring width and pattern visible on the narrow sides of the boards (45 mm). Annual ring widths observed on narrow sides were larger. Due to this difference in annual ring patterns, together with the limited size of the available training data set, two separate input–target training data sets were generated, one using the two wide sides and the other using the two narrow sides of the seven boards. These two data sets were used to train two corresponding cGANs, one to detect growth rings visible on wide sides and the other to detect growth rings visible on narrow sides of boards. With a larger training data set, i.e. more than seven boards, it should be possible to train a single cGAN to capture all annual ring patterns/ring widths occurring on any side. Regarding the current use of two different networks for wide and narrow sides, respectively, it was noted that parts of the wide face surfaces, where annual rings are more or less tangential to the surface, actually have a ring pattern that looks more like the patterns on the narrow face surfaces. Still, the same network was used for all areas on wide surfaces.

The adopted cGAN is a Pix2Pix model designed to translate a \(256\times 256\) pixels RGB input board image into a binary output image of the same resolution (Isola et al. 2017). In the output binary image, the growth rings visible on the surface of the board (borders between late wood and early wood) are represented by ones (1) and the background by zeros (0). Accordingly, the training data sets were generated by following the same input–output/target structure with a resolution of \(256\times 256\) pixels. The input images of the data set were obtained by sliding a \(256\times 256\) pixels window over the RGB images of the four sides of the boards with an overlap of 200 and 70 pixels for the wide and narrow sides, respectively. The target part of the data set was produced by manually tracing the surface growth rings visible on the four sides of the boards to create binary images corresponding to annual rings on the RGB images of the seven boards. The resulting binary images were then sliced into several \(256\times 256\) pixels binary images to match the input RGB images produced from the scanning data.
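The extraction of training patches can be sketched as a sliding window. How partial windows at the board edges were handled is not stated in the text, so they are simply skipped in this illustration:

```python
import numpy as np

def extract_patches(image, size=256, overlap=200):
    """Slide a size x size window over a board image with a given overlap
    (200 px was used for wide sides, 70 px for narrow sides). Partial
    windows at the right/bottom edges are skipped in this sketch.
    """
    step = size - overlap
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - size + 1, step):
        for left in range(0, w - size + 1, step):
            patches.append(image[top:top + size, left:left + size])
    return patches

# Stand-in for a scanned RGB board image
board = np.zeros((312, 312, 3))
patches = extract_patches(board, size=256, overlap=200)
```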

Fig. 5

Examples of input–target pairs of the training data sets: a no augmentation applied and b two different augmentation techniques applied, namely \(90^{\circ }\) rotation (left) and 50 % horizontal shrinking (right)

Fig. 6

a Manually traced rings plotted over the colour image of part of a board, b cGAN-detected surface growth rings, c zoomed-in RGB image of part of a board, d zoomed-in image of cGAN-detected surface growth rings, and e annual ring width distribution for manually traced and cGAN-detected annual rings

Before proceeding to the training stage, two data augmentation procedures were applied to the training input and target images with the aim to enrich the training data sets and improve the performance of the cGANs. The augmentation procedures were to

  1.

    rotate the input and target images by \(90^{\circ }\) in the counterclockwise direction to enhance the generalisation ability of the cGANs and

  2.

    shrink the input and target images by 50 % in the horizontal direction in order to improve the cGANs' ability to detect closely spaced growth rings.

Image pairs resulting from each of the augmentation procedures were added to the original input–target image pairs (giving three times as many pairs as the original number) and shuffled randomly to constitute the final training data set. With this procedure, 9,981 input–target training pairs of \(256\times 256\) pixels were generated. Figure 5a shows examples of six \(256\times 256\) pixels RGB input images paired with the corresponding \(256\times 256\) pixels binary target images with no augmentation applied. Figure 5b shows examples of another six input–target pairs, where the first augmentation procedure is applied to the three image pairs to the left and the second procedure to the three pairs to the right. Finally, the generated training data sets were used to train the cGANs from scratch using the adaptive moment estimation (Adam) optimiser (Goodfellow et al. 2016) with an initial learning rate of 0.0002 for 200 epochs. A Python (Team 2019) code based on the TensorFlow 1.14 implementation of Pix2Pix cGAN developed by Isola et al. (2017) was used to train the cGANs. The weighting coefficient \(\lambda\) in Eq. 4 was taken as 100 as recommended in Isola et al. (2017).
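The two augmentation procedures can be sketched as follows. Dropping every second pixel column is used here as a stand-in for the 50 % horizontal shrinking, since the interpolation actually applied is not specified; likewise, how the shrunk pairs were brought back to the required \(256\times 256\) resolution is not stated:

```python
import numpy as np

def augment_pair(input_img, target_img):
    """Generate the two augmented versions of one input-target pair:
    (1) rotation by 90 degrees counterclockwise (np.rot90 default), and
    (2) 50 % horizontal shrinking, approximated here by keeping every
        second pixel column.
    """
    rotated = (np.rot90(input_img), np.rot90(target_img))  # procedure 1
    shrunk = (input_img[:, ::2], target_img[:, ::2])       # procedure 2
    return rotated, shrunk

pair_in = np.zeros((256, 256, 3))  # RGB input patch
pair_tg = np.zeros((256, 256))     # binary target patch
(rot_in, rot_tg), (shr_in, shr_tg) = augment_pair(pair_in, pair_tg)
```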

Since the cGANs were trained over \(256\times 256\) pixels images, the first step in applying the trained networks was to partition scanned RGB board images into images of size \(256\times 256\) pixels. Then, the trained cGANs were applied to the resulting \(256\times 256\) pixels images to generate binary images that were finally stitched together to match the original RGB images of the boards.
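The partition-and-stitch step can be sketched as below. Zero-padding of the right and bottom edges, so that the board image divides evenly into tiles, is an assumption; the paper does not state how edge tiles are handled.

```python
import numpy as np

def tile_image(img, size=256):
    """Partition a board image into size x size tiles, zero-padding the
    right/bottom edges so the tiling is exact."""
    h, w = img.shape[:2]
    H = -(-h // size) * size  # round up to a multiple of `size`
    W = -(-w // size) * size
    padded = np.zeros((H, W) + img.shape[2:], dtype=img.dtype)
    padded[:h, :w] = img
    tiles = [padded[r:r + size, c:c + size]
             for r in range(0, H, size) for c in range(0, W, size)]
    return tiles, (h, w)

def stitch_tiles(tiles, orig_shape, size=256):
    """Reassemble translated tiles and crop back to the original board size."""
    h, w = orig_shape
    H = -(-h // size) * size
    W = -(-w // size) * size
    out = np.zeros((H, W) + tiles[0].shape[2:], dtype=tiles[0].dtype)
    i = 0
    for r in range(0, H, size):
        for c in range(0, W, size):
            out[r:r + size, c:c + size] = tiles[i]
            i += 1
    return out[:h, :w]

# Round trip on a hypothetical 300 x 600 pixel greyscale board image.
board = np.arange(300 * 600).reshape(300, 600).astype(np.uint8)
tiles, shape = tile_image(board)
restored = stitch_tiles(tiles, shape)
```

In the actual pipeline, each tile would be passed through the trained cGAN before stitching; here the tiles are stitched back unchanged to verify the round trip.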

The trained cGANs were validated herein using the control board. Figure 6a shows the RGB image obtained from scanning of a part of the wide side of this board. Dark lines, which represent manually identified annual rings, are drawn on top of the RGB image. Figure 6b shows the translated and stitched binary image indicating the surface annual rings detected by the cGAN. Figure 6c, d shows zoomed-in images of a selected part of Fig. 6a, b, respectively. A selected section along the board is marked by a red line in Fig. 6a, c and a green line in Fig. 6b, d. In Fig. 6e, the lateral distance between consecutive identified annual rings at this section is plotted. Red colour is used for the graph representing the distance between manually identified rings and green colour for the graph representing the distance between cGAN-detected rings. A local cGAN surface error is defined herein as the absolute difference in annual ring distance between the manual and the cGAN-based detection at a position on the board surface. Thus, the vertical distance between the red and green graphs in Fig. 6e constitutes the local cGAN surface error along the displayed section.
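The local cGAN surface error and the section-wise root-mean-square error can be illustrated numerically as follows. The ring positions are hypothetical, and the sketch assumes that both detections yield the same number of rings at the section.

```python
import numpy as np

# Hypothetical ring positions (mm) across one board section, measured from
# one board edge: one sequence from manual tracing, one from cGAN detection.
manual = np.array([3.0, 7.5, 13.0, 19.5, 27.0])
cgan = np.array([3.2, 7.4, 13.5, 19.0, 27.3])

# Ring widths: lateral distances between consecutive identified rings.
w_manual = np.diff(manual)
w_cgan = np.diff(cgan)

# Local cGAN surface error: absolute difference in annual ring distance
# between the manual and the cGAN-based detection at each position.
local_error = np.abs(w_manual - w_cgan)

# Section-wise root-mean-square error, of the kind summarised in Table 1.
rmse = np.sqrt(np.mean(local_error ** 2))
```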

Figure 7 shows the local cGAN surface error at every grid point (resolution \(5\times 2\,\hbox {mm}^{2}\), in lengthwise \(\times\) crosswise direction) for the side and part of the board also shown in Fig. 6a, b. As can be seen, the highest error is registered at a section where a knot is present. It should be noted, however, that the cGANs were not trained to detect annual rings on surfaces containing knots. Moreover, Table 1 presents the statistics of the mean local cGAN surface errors in terms of the root-mean-square errors calculated for individual sections along the entire length of the control board.

Fig. 7
figure 7

a Manually traced rings drawn on top of the RGB image of part of a board, b cGAN-detected surface growth rings, c local cGAN surface error with a range of 0–30 mm, and d local cGAN surface error with a range of 0–5 mm

Table 1 Statistical results for the mean local cGAN surface error in terms of root-mean-square errors calculated for the individual sections along the control board

Automatic estimation of pith location

Once the discrete surface growth rings are detected, the next step is to estimate the pith locations of the identified clear wood sections along the board. According to the study presented in Habite et al. (2020), the pith location of a clear wood section can be related to the annual ring width distribution visible across the four sides of the section. In Fig. 6e, the annual ring width distribution (in terms of distance between adjacent rings) across one wide face of such a section is shown. In the present research, an MLP network was trained to estimate the pith location by taking the annual ring width distributions of the four sides as input. To train this MLP network, it is necessary to have a data set that contains a large number of inputs, which are annual ring width distributions of the four sides of sections, along with the desired targets, which are the corresponding pith locations. Obviously, obtaining such a data set for actual boards is rather difficult. Therefore, an artificial training data set was generated, consisting of artificial annual ring width distributions of simulated board cross sections together with the corresponding pith locations. The artificial cross sections were intended to simulate clear wood cross sections of dimensions \(45\times 145\,\hbox {mm}^{2}\).

The first step in generating an artificial board cross section was to randomly select the x- and y-coordinates of a pith location \((x_p,y_p)\) within a specified domain in relation to the cross section. The second step was to generate a finite number of discrete circles, sufficient to cover the cross section, by using the following equations:

$$\begin{aligned} r_i = \sqrt{[x -(x_p + n_i^x)]^2 + [y -(y_p + n_i^y)]^2} \end{aligned}$$
(5)
$$\begin{aligned} r_i = r_{i-1} + dR_i, \quad r_0 = 0 \end{aligned}$$
(6)

where \(r_i\) represents the radius of the ith discrete circle, corresponding to the ith annual ring of a real tree. Of course, annual rings of real trees are not perfectly concentric circles. To take this into account, to some extent, an eccentricity was applied to the centre of each generated circle by adding random noise \((n_i^x,n_i^y)\) to the x- and y-coordinates of the pith location \((x_p,y_p)\). The random noise \((n_i^x,n_i^y)\) was drawn from a normal distribution with mean 0.05 mm and standard deviation 0.2 mm. As can be seen in Eq. 6, the radius \(r_i\) is calculated by adding a small radial increment \(dR_i\) to the radius \(r_{i-1}\) of the preceding discrete circle. The radial increment \(dR_i\) is a stochastic value drawn from a normal distribution whose mean value and standard deviation both depend on i, as defined in Table 2. The mean values and standard deviations applied are based on measurements of the radial variation of annual growth ring widths of 35–70-year-old Norway spruce trees (Blouin et al. 2007).
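The ring-generation procedure of Eqs. 5 and 6 can be sketched as below. The increment statistics used here are placeholder constants; in the paper, the mean value and standard deviation of \(dR_i\) depend on i, as given in Table 2.

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_rings(x_p, y_p, r_max=160.0):
    """Generate radii and noisy centres of the discrete circles simulating
    annual rings (Eqs. 5 and 6). The increment statistics are placeholder
    constants standing in for the i-dependent values of Table 2."""
    radii, centres = [], []
    r = 0.0                                    # r_0 = 0
    while r < r_max:                           # enough circles to cover section
        dR = max(rng.normal(2.0, 0.5), 0.1)    # dR_i > 0 (placeholder stats)
        r = r + dR                             # Eq. 6: r_i = r_{i-1} + dR_i
        # Eccentricity: noise on the centre, N(0.05 mm, 0.2 mm) as in the text.
        nx, ny = rng.normal(0.05, 0.2, size=2)
        radii.append(r)
        centres.append((x_p + nx, y_p + ny))   # centre of the ith circle (Eq. 5)
    return np.array(radii), np.array(centres)

# Hypothetical pith location relative to the cross-section coordinate system.
radii, centres = generate_rings(x_p=20.0, y_p=-30.0)
```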

Table 2 Mean values and standard deviations for radial increments used to generate circles corresponding to annual rings of artificial board cross sections
Fig. 8
figure 8

Artificially generated board cross section, corresponding annual ring width distributions on the four sides and the orthogonal coordinate system used in this work

Once the artificial annual rings were obtained, the next step was to identify the positions of intersection between the annual rings and the four sides of the cross section. Then, the distances between adjacent intersection points were calculated to obtain the annual ring width distribution of the four sides. Figure 8 shows an example of an artificially generated cross section of dimensions \(45\times 145\,\hbox {mm}^{2}\), with the pith location indicated by a red cross, and the extracted annual ring width distributions of the four sides.
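For a horizontal side of the cross section, the intersection positions follow directly from the circle equation (Eq. 5). The sketch below is illustrative: the function name and the single-ring example are assumptions, and only horizontal sides are handled (vertical sides are analogous with x and y swapped).

```python
import numpy as np

def side_intersections(radii, centres, y_side, x_min=0.0, x_max=145.0):
    """x-coordinates where each ring circle crosses the horizontal side
    y = y_side, kept only if they fall within the side's extent; also
    returns the ring widths between adjacent intersection points."""
    xs = []
    for r, (cx, cy) in zip(radii, centres):
        d2 = r ** 2 - (y_side - cy) ** 2
        if d2 <= 0.0:
            continue                     # this circle does not reach the side
        for x in (cx - np.sqrt(d2), cx + np.sqrt(d2)):
            if x_min <= x <= x_max:
                xs.append(x)
    xs = np.sort(np.array(xs))
    return xs, np.diff(xs)

# Single hypothetical ring of radius 35 mm centred 30 mm below the bottom side.
xs, widths = side_intersections(np.array([35.0]), [(20.0, -30.0)], y_side=0.0)
```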

From the annual ring width distribution, a fixed number of data points was extracted from each side by linear interpolation with a resolution of about 2 mm. This was necessary because the size of the input and output layers of an MLP network must be kept constant, as described in “Artificial neural networks” section. Thus, for the 145-mm-wide sides of the cross section, i.e. the top and bottom surfaces, 72 data points were extracted from each side. From the left and right sides of the cross section, a total of 42 data points (21 from each side) were extracted. Once the artificial annual ring width distributions of simulated board cross sections together with the corresponding pith locations were defined, it was possible to produce the training data set for the MLP neural network. In the current study, a total of 100,000 artificial cross sections were used to generate the training data set. The input layer of the MLP network consisted of a column vector obtained by concatenating the extracted data points from all four sides in a consistent order. This resulted in a training data set consisting of an input matrix of size \(186\times 100,000\) and an output matrix of size \(2\times 100,000\), the latter giving the x- and y-coordinates of the pith locations of the artificial cross sections.
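The fixed-length resampling of each side's ring-width profile can be sketched with `np.interp`. The measurement positions and widths below are hypothetical; only the output lengths (72 + 72 + 21 + 21 = 186) follow the text.

```python
import numpy as np

def resample_widths(positions, widths, n_points):
    """Linearly interpolate a ring-width profile measured at `positions`
    (mm along one side) onto `n_points` equally spaced points, giving every
    side a fixed-length vector for the MLP input layer."""
    grid = np.linspace(positions[0], positions[-1], n_points)
    return np.interp(grid, positions, widths)

rng = np.random.default_rng(0)
# Hypothetical ring-width profiles for a 45 x 145 mm cross section.
top = resample_widths(np.linspace(0, 145, 30), rng.random(30), 72)
bottom = resample_widths(np.linspace(0, 145, 28), rng.random(28), 72)
left = resample_widths(np.linspace(0, 45, 12), rng.random(12), 21)
right = resample_widths(np.linspace(0, 45, 11), rng.random(11), 21)

# Concatenate in a consistent order: one 186-element input vector per section.
x = np.concatenate([top, bottom, left, right])
```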

Fig. 9
figure 9

Training performance measures presented as MSE

Of the training data set, 70 % of the samples were used to train the MLP network, 15 % were used for validation, and the remaining 15 % for testing the trained network. Having a data set for validation is necessary to prevent the network from overfitting the training data (Goodfellow et al. 2016). The neural network was trained from scratch in TensorFlow 2.0 (Abadi 2016), using the adaptive moment estimation (Adam) solver with an initial learning rate of 0.001 and the rectified linear unit (ReLU) activation function, for 200 epochs. The training performance was assessed by calculating the MSE between the predicted pith locations and the target pith locations included in the output part of the training data set. Figure 9 shows the performance of the MLP network in terms of MSE for both the training and validation samples over the 200 epochs.
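The training set-up (ReLU hidden layer, Adam with learning rate 0.001, MSE loss, 186 inputs and 2 outputs) can be illustrated with a self-contained NumPy sketch on synthetic data. The hidden-layer size and the synthetic data are assumptions, since the paper does not state the network architecture; the actual training used TensorFlow 2.0 on 100,000 samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Network sizes: 186 inputs (ring widths of the four sides), 2 outputs
# (x- and y-coordinates of the pith). The hidden size of 64 is illustrative.
n_in, n_hid, n_out = 186, 64, 2
W1 = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0.0, np.sqrt(2.0 / n_hid), (n_hid, n_out)); b2 = np.zeros(n_out)
params = [W1, b1, W2, b2]
m = [np.zeros_like(p) for p in params]   # Adam first-moment estimates
v = [np.zeros_like(p) for p in params]   # Adam second-moment estimates

# Tiny synthetic stand-in for the 100,000-sample training set.
X = rng.random((256, n_in))
Y = rng.random((256, n_out))
lr, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8

mse0 = np.mean((np.maximum(X @ W1 + b1, 0.0) @ W2 + b2 - Y) ** 2)

for t in range(1, 201):                      # 200 full-batch epochs
    H = np.maximum(X @ W1 + b1, 0.0)         # ReLU hidden layer
    P = H @ W2 + b2                          # predicted pith coordinates
    dP = 2.0 * (P - Y) / len(X)              # gradient of the MSE loss
    dH = (dP @ W2.T) * (H > 0)               # backprop through ReLU
    grads = [X.T @ dH, dH.sum(0), H.T @ dP, dP.sum(0)]
    for p, g, mi, vi in zip(params, grads, m, v):   # Adam update
        mi[:] = beta1 * mi + (1 - beta1) * g
        vi[:] = beta2 * vi + (1 - beta2) * g * g
        p -= lr * (mi / (1 - beta1 ** t)) / (np.sqrt(vi / (1 - beta2 ** t)) + eps)

mse = np.mean((np.maximum(X @ W1 + b1, 0.0) @ W2 + b2 - Y) ** 2)
```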

Manual determination of pith location

For the first subset of four boards, where the pith was located within the board cross sections, manual measurement of pith locations was done by first cutting the boards at the selected clear wood sections and then using a ruler to measure the horizontal and vertical distances, respectively, from one corner of the cross section to the pith. A predefined coordinate system, as shown in Fig. 10a, was applied. One error source that affects the result is the limited precision obtained by the naked eye when measuring the x- and y-coordinates of the pith with a ruler. Another is that board cross sections are not exactly rectangular in shape, for example due to warping during drying, and thus do not comply perfectly with the orthogonal coordinate system used to define positions. Still, the precision and accuracy obtained should be within one millimetre, giving a manual pith error of up to about one millimetre.

Fig. 10
figure 10

Manual detection of pith; a measurement of pith location for subset 1, b plastic sheet applied to pith location of boards of subset 2, c concentric circles fitted to annual rings, d scatter plot of the manually determined pith location for subset 2

Fig. 11
figure 11

Absolute difference between manually and algorithmically determined pith locations for boards of subset 1 where a cGAN and b manual tracing are used to identify annual rings in the algorithmic determination

For the second subset of 100 boards, pith locations were determined only at the two end cross sections of each board, resulting in 200 manually determined pith locations. The method was to use a transparent plastic sheet with a coordinate system, a scale and closely spaced concentric circles drawn upon it; see Fig. 10b. By trying to fit concentric circles of different radii to the growth rings visible on the board end cross sections, as illustrated in Fig. 10c, the pith locations were determined both for cases where the pith was located within the board cross section and for cases where it was located outside it. In Fig. 10d, a scatter plot of the 200 pith locations determined this way is displayed. About 60 % of the pith locations were outside the board cross section. Regarding precision and accuracy, the result presented in Fig. 10d reveals that a precision of about 5 millimetres was applied. (Note, for example, the vertical distance between some blue marks.) The accuracy obtained depends on several factors, and in cases where the pith was located outside the cross section it may be rather low, especially where the pith was located far outside the cross section. For such cross sections, the manual determination was most difficult when the annual rings visible on the cross sections did not coincide with concentric circles and/or when knots were present in the end cross section. Overall, it is assessed that the manual pith error for board cross sections of subset 2 was often about 5 mm and in some cases probably even larger.

Results and discussion

As described in “Method” section, the proposed automatic method to estimate the pith location of a clear wood section consisted of three steps. The first was to identify knot-free clear wood sections, the second to detect the surface annual rings visible on all the four sides of the board by using the trained cGAN networks and the third to use the trained MLP network to estimate pith locations along clear wood sections. This was done for the 4+100 Norway spruce timber boards described in “Material and data obtained from scanning” section. Comparisons between automatically/algorithmically and manually determined pith locations were made for the two subsets of 4 and 100 boards, respectively. This gives the basis for assessment of the performance of the suggested algorithms.

Assessment on the basis of subset 1

For the first subset of four boards, pith locations were estimated on average at around 11 clear wood sections per board (clear wood sections along boards were identified automatically based on tracheid effect scanning and a criterion of straight fibres in the section; for details, see Habite et al. 2020), resulting in a total of 45 estimated pith locations. The errors involved in the automatically and manually determined pith locations can be divided into three categories or error sources: the error introduced during the manual estimation of the pith location (manual pith error), the error introduced during the cGAN surface annual ring detection and the error introduced during the MLP pith location estimation. The manual pith error, which is assumed to be much larger for board cross sections of subset 2 than of subset 1, is discussed in “Manual determination of pith location” section. Errors related to the cGANs are to some extent illustrated and discussed in “Detection of surface growth rings” section (Fig. 7). However, the significance of cGAN errors for the estimated pith locations was not covered in that section. Therefore, from here on, the term cGAN pith error is used to represent the influence of cGAN surface errors on the determined pith location. Correspondingly, the term MLP pith error is used to represent the influence of errors related to the MLP network on the determined pith location. In order to distinguish between the cGAN pith error and the MLP pith error, algorithmically determined pith locations were calculated on the basis of annual ring width data obtained both from the cGANs and from manually traced rings. Thus, the MLP was applied to two sets of annual ring width data. Figure 11 shows the absolute difference between manually measured pith locations and the algorithmically estimated pith locations of the first subset, where in Fig. 11a the cGAN-detected annual rings were used and in Fig. 11b the manually traced annual rings were utilised. The absolute difference shown in Fig. 11a includes all three error sources (manual pith error, cGAN pith error and MLP pith error), whereas the absolute difference shown in Fig. 11b excludes the cGAN pith error.

Table 3 Absolute difference between manually and algorithmically determined pith locations, with and without application of cGAN, for boards of subset 1 comprising 45 estimated pith locations

In Table 3, the very same results are displayed in terms of statistics, i.e. with mean values, medians, standard deviations and percentiles (80th, 85th, 90th and 95th) of the differences between manually measured and algorithmically estimated pith locations, with and without the cGAN pith error included in the algorithmically determined pith locations. Table 3 also displays the direct differences between the two algorithmically estimated pith locations with and without the cGAN pith errors. In the following, “estimation error” is frequently used for the absolute difference between the manually and algorithmically determined pith locations. Using the suggested algorithm, including the cGAN pith error (Fig. 11a), a median estimation error of 1.4 mm and 2.9 mm, a standard deviation of 1.7 mm and 2.7 mm, and a 90th percentile of 4.8 mm and 5.7 mm were achieved in the x- and y-direction, respectively.

As can be seen from Fig. 11 and Table 3, a somewhat smaller error was obtained for the case where the cGAN pith error was eliminated, i.e. where annual rings were traced manually (Fig. 11b), as compared to the results that included the cGAN error. The errors of the algorithmically determined pith location are, in the x-direction, about the same but, in the y-direction, typically 1–2 mm smaller when the manually traced annual rings are used instead of the cGAN-detected annual rings. However, for board cross sections where the errors in the y-direction of the algorithmically determined pith location are comparatively large, say above 5 mm, the cGAN pith error is not the main explanation for the total error.

In Fig. 12a, b, two different clear wood sections, selected among the 45 evaluated sections of subset 1, are shown. In addition to colour images of the two cross sections, three different images/stripes, each representing a 10-mm-long section in the longitudinal board direction, are shown at each of the four sides of each of the two cross sections. The middle image/stripe of each set of three shows a greyscale image of the side of the board. The other two are binary images showing manually traced and cGAN-detected annual rings, respectively. The red lines drawn on top of the stripes indicate the longitudinal position along the board of the cross sections displayed. The blue and green cross marks drawn on top of the cross sections indicate the algorithmically determined pith locations including and excluding the cGAN pith error, respectively. The red crosses drawn on top of the cross sections indicate the manually measured pith locations, which may here be regarded as true pith locations (since the manual pith error is comparatively small for subset 1).

The clear wood cross section displayed in Fig. 12a is the one, out of the 45 evaluated, with the largest distance (7.3 mm and 4.6 mm in the x- and y-direction, respectively) between the pith locations determined with the algorithm including and excluding the cGAN pith error. Thus, the difference between the two (the distance between the blue and green cross marks) can be attributed to the cGAN pith error. This can be understood by comparing the three images/stripes on the bottom side of this cross section, which show that the cGAN failed to detect a few annual rings on the bottom side where the annual rings are close to tangential to the board surface.

For the cross section shown in Fig. 12b, on the other hand, the distance between the two algorithmically determined pith locations is the smallest among the 45 evaluated sections, only 0.3 mm and 0.1 mm in the x- and y-direction, respectively. In this case, the cGANs seem to give quite accurate annual ring detection on all the four sides, and as a result, the two algorithmically determined pith locations (including and excluding the cGAN pith error) almost coincide with each other. However, the algorithmically determined pith locations are different from the manually determined pith location, with a difference in y-direction of around 8.3 mm. In this case, the estimation error originates from the assumption made during the training of the MLP neural network that annual rings would be circular in shape. This does not agree very well with the shape of the actual growth rings of this cross section. If, in the training of the MLP network (see “Automatic estimation of pith location” section), a more accurate model for artificial cross sections had been used, i.e. if non-circular annual rings had been included in the training data, it is possible that the MLP error would have been smaller.

However, for the present algorithm and the boards of subset 1 it can be concluded that both cGAN errors and MLP errors contribute to the total error. For individual cross sections, any of these error sources may dominate. Of course, the manual pith error may also contribute to the total error, but for a board of subset 1, where the pith is located within the cross section, this error is very small.

Fig. 12
figure 12

Clear wood section where a the highest absolute difference between the cGAN- and manual-based pith location estimation is recorded and b the lowest absolute difference between the cGAN- and manual-based pith location estimation is recorded

Assessment on the basis of subset 2

For the second subset of 100 boards, pith locations were manually determined at the two end sections of each board, resulting in a total of 200 determined pith locations. By utilising the cGAN-detected annual rings, automatic estimation of pith locations was done on the clear wood sections closest to the two ends of each board. Figure 13a, b shows histograms of the difference between the 200 manually determined and automatically estimated pith locations in the x- and y-direction, respectively. The results shown in Fig. 13a, b include all three error sources defined in “Assessment on the basis of subset 1” section: manual pith error, cGAN pith error and MLP error. In Table 4, the very same results are displayed in terms of statistics, i.e. mean values, medians, standard deviations and percentiles (80th, 85th, 90th and 95th) of the absolute differences between manually determined and automatically estimated pith locations. Using the proposed algorithm, a median absolute difference of 3.9 mm and 5.4 mm and a standard deviation of 6.7 mm and 10.8 mm were achieved in the x- and y-direction, respectively. As can be seen, the estimation errors presented in Table 4 are slightly higher than those obtained for the first subset, shown in Table 3. This may be explained by the significantly higher magnitude of the manual pith error introduced during the manual determination of pith location for subset 2 than for subset 1; see “Manual determination of pith location” section. Thus, the calculated absolute differences shown in Fig. 13a, b and Table 4 should not be interpreted as errors of the suggested automatic procedure alone, but rather as “discrepancies” or upper limits for such errors.

Fig. 13
figure 13

Histograms showing discrepancy between manually and automatically determined pith locations for board end sections of subset 2 a in the x-direction and b in the y-direction

Table 4 Statistical results for subset 2, comprising 200 estimations of pith location

Computational complexity

Training and testing of the cGANs and MLP networks were done using Python on a PC with an Intel Xeon E5-2623 v3 CPU at 3.00 GHz (32 GB memory) and an NVIDIA Quadro P4000 GPU. After training of the networks, a Python code was implemented to perform the procedures explained in “Detection of surface growth rings” section (detection of surface growth rings) and “Automatic estimation of pith location” section (estimation of pith location). The computational time required to detect the surface growth rings visible on the four sides of a board with nominal dimensions of \(45\times 145\times 4500\,\hbox{mm}^{3}\) was on average 1.4 s, which is equivalent to approximately 300 ms per metre of board. The computational time required to estimate the pith location of a single clear wood section was on average only 1.3 ms, which is insignificant compared to the time required for the application of the cGAN network.

Conclusion

A new, operationally simple deep learning-based algorithm was presented to automatically detect individual surface growth rings and estimate pith location at any clear wood section along Norway spruce boards. The proposed algorithm, which is based on RGB images of board surfaces obtained using an industrial optical scanner, is capable of estimating the pith location accurately even in cases where the pith is located outside the cross section of the board.

Detection of the surface annual rings was done by applying trained conditional generative adversarial networks (cGANs) on the RGB images of the four longitudinal sides of boards. Two separate cGANs were trained: one for the two wide sides and another for the two narrow sides of the boards. The results of the cGANs were then validated against manually traced surface annual rings, and median errors of distances between annual rings on surfaces were only 1.3 mm and 0.3 mm, for the wide and narrow sides, respectively. Following the cGAN operation, a multilayer perceptron (MLP) neural network was trained to estimate the pith location by using the detected surface growth rings of the four sides of a clear wood section. For the first subset of four boards (45 evaluated cross sections), where the pith was located within the cross section, a median estimation error, i.e., difference between the manually and automatically determined pith locations, of 1.4 mm and 2.9 mm in the x- and y-direction, respectively, was achieved. This accuracy is better than what was obtained by Habite et al. (2020). For the boards of subset 1, it can be concluded that both the cGAN pith error and the MLP pith error contribute to the total pith estimation error.

For the second subset of boards, where in about 60 % of the boards the pith was located outside the cross section, a median estimation discrepancy, calculated between the automatically and manually determined pith location, of 3.9 mm and 5.4 mm in the x- and y-direction, respectively, was achieved. Here, it must be noted, however, that for this set of boards it was difficult to distinguish between, on the one hand, errors related to the automatic algorithms utilising cGAN and MLP and, on the other hand, errors related to the manual determination of pith location.

In summary, it has been shown that the proposed deep learning-based method allows for an accurate and operationally simple detection of individual surface annual rings and pith locations in clear wood sections of Norway spruce timber boards. The operational simplicity of the proposed method lies in the fact that the algorithm is based on raw RGB images of boards obtained from optical scanning, without application of any image pre-processing. Further work should focus on: training a single, more general cGAN to detect the surface annual rings instead of the two networks used in the current algorithm; improving the model for annual rings used in the training of the MLP network (for example, eliminating the assumption of circular annual rings); and reducing the overall calculation time in order to fully meet the speed requirements of the industry. cGAN networks could also be trained and evaluated for species other than Norway spruce. Moreover, the potential of utilising knowledge of pith location and ring width distribution to improve assessment and grading of timber boards should be evaluated.