1 Introduction

Wood species recognition is still a young topic in computer vision, and it remains a challenging task even for well-trained experts, who must study the characteristics of wood surfaces under macroscopic and microscopic views. The quality of a wood image depends strongly on how the wood was captured. Many objective image quality assessment (IQA) methods have been proposed to quantify image quality. With full-reference image quality assessment (FR-IQA), the observer can assess an image more reliably by comparing the distorted image with its undistorted counterpart. In this work, we assess wood image quality without a reference image, i.e., with no-reference IQA (NR-IQA).

The study in [1] considered two types of IQA methods: those that work on images distorted by Gaussian white noise or Gaussian blur, and those based on the human visual system (HVS). We address both points by combining a deep convolutional neural network (DCNN) with a saliency map. Earlier IQA methods were designed for specific distortion types. For example, the study in [2] offered an NR-IQA method for JPEG2000 compression by combining a Gaussian mixture with wavelet coefficients. More recent studies cover more distortion types, including unknown ones.

NR-IQA methods can be divided into natural scene statistics (NSS) methods and training-based methods. NSS methods detect distortion as a deviation from the statistics of undistorted images [2,3,4,5,6]. Training-based methods learn features from images and train a classifier on them; they can be regarded as conventional machine learning assessment [7,8,9,10,11]. CNN-based methods extract image features for recognition and for training-based IQA. CNNs have been applied to object classification [12], age and gender recognition [13], and fashion recognition [14]. Several CNN models have been applied to NR-IQA and achieved state-of-the-art results [7,8,9], and CNNs have also been used for feature extraction [12, 13, 15, 16].

We propose a deeper CNN combined with a salient wood image map, namely deep salient wood image-based quality assessment (DS-WIQA). The proposed DCNN in DS-WIQA has five convolutional layers, which is deeper than AlexNet, which has three convolutional layers [17]. Compared with our DCNN, the closely related AlexNet [17] is not suited to a deeper training model. DS-WIQA uses the convex and concave n-square methods for the salient wood image map. The CNN-based saliency map in [1] did not incorporate the HVS into IQA. In [7, 8], each image is divided into many image patches. As Fig. 3 illustrates, the HVS determines the perceived quality of a wood image. Unfortunately, it is difficult to perceive the difference between one wood image patch and another, which leads to a poor quality assessment for all patches within a wood image. To introduce the HVS, DS-WIQA combines the proposed DCNN with the convex and concave n-square methods for the salient wood image map. The studies closest to ours [9, 18] exploit a gradient map as the image patch score. Others [19, 20] calculate a saliency map for each image and a saliency score for each patch.

We evaluated the proposed DCNN on the Zenodo wood species [21] and Lignoindo [22] datasets, using Spearman's rank order correlation coefficient (SROCC) and the linear correlation coefficient (LCC) as scores. The results show that our salient wood image maps improve the DCNN in NR-IQA. To validate DS-WIQA, we also train a DS-WIQA model on the Zenodo dataset and apply it to the Lignoindo dataset for cross-dataset evaluation. DS-WIQA obtains state-of-the-art results on both the Zenodo and Lignoindo datasets.

2 Related work

NSS-based NR-IQA methods aim to capture the statistical properties of undistorted wood images. To evaluate NSS, most algorithms model the distributions or train a model. In [6], NSS is evaluated on a set of wavelet coefficients, and the distorted image is identified before a distortion classifier is applied. In contrast, the method in [2] transforms each image using the discrete cosine transform (DCT) and fits a generalized Gaussian density model to the resulting coefficients. The method in [4] extracts features in the shearlet domain and combines NR-IQA with a neural network. However, the auto-encoder used in that method differs from a CNN. Many methods [12,13,14,15,16] achieve state-of-the-art results in unsupervised reduction. More recently, the method in [3] extracts a quality feature from the wavelet transform domain. Unfortunately, this wavelet-domain transformation is highly time-consuming. The method in [5] proposes an NSS-based IQA method that is applied in the spatial domain. It shows that locally normalized (subtracted and contrast-normalized) coefficients can represent the statistical properties of distortion.

The CNN-based NR-IQA methods in [7, 8], like our proposed method, also operate in the spatial domain; the difference is that our DCNN learns quality features instead of the naturalness of wood images. The method in [10] builds a codebook from image patches. Its training model is similar to CNN-based methods in that the learned quality features are not handcrafted. In [11], the codebook was combined with object detection.

The saliency map is more suitable for handling distorted and undistorted images [23, 24] in NR-IQA methods. Regrettably, it has a drawback: the quality features are learned by sparse coding. To explore CNN performance, the methods in [7, 8] proposed a CNN architecture that uses the median subtracted contrast normalised (MDSCN) coefficients [5]. However, the depth of this CNN model limits its feature extraction, and it is not consistent with the HVS. The studies closest to our proposed method are [9, 25], which couple CNNs with a salient gradient map algorithm. A two-layer CNN architecture is used for feature extraction and classification. Notably, the Prewitt method is used in [9] to detect the edges of each image. However, weighting by edges can neglect other important characteristics of image quality, such as luminance [26, 27].

Two other related studies [19, 20] explore salient image patches to assess image quality. In [19], the saliency-based DCNN (SDCNN) divides an image into patches for saliency mapping and applies a threshold, with weights in the range [0, 1], to cut out non-salient image patches. The image quality is then determined artificially by the weighted average over the salient image patches. Consequently, SDCNN evaluates the image quality score imprecisely. The other deep learning and saliency map method [20] measures the quality of HDR images from salient image patches. However, this method needs a lot of training data.

The proposed method uses a DCNN for feature extraction. Two or more convolutional layers with small filter sizes derive sufficiently good features [15]. Accordingly, our proposed DCNN model contains five convolutional layers. To bring the HVS into NR-IQA, we propose salient wood image maps to evaluate the importance of each wood image patch. In contrast, the Gaussian white noise, Gaussian blur, and HVS methods [1] have some drawbacks: they lower the contrast and yield a level of detail that is inconsistent with human visual perception. Among the existing saliency algorithms for IQA, we are inspired by the method of [18].

3 Methods

3.1 DCNN architecture

The multilayer perceptron (MLP) in [7] stacks multiple convolution and max-pooling layers. It is used as a one-column CNN, whose only input is the image patch of the difference image, and as a three-column CNN, whose inputs are the image patches of the left-view, right-view, and difference images. The study in [7] also constructs CNN-learned features for stereoscopic images in NR-IQA. However, it is very difficult to obtain CNN features that closely match HVS quality perception. The study in [28] demonstrates fine-tuning, which uses CNNs pre-trained on visual recognition tasks as feature extractors.

Other studies [12, 13, 15, 16] show that CNN architectures with more layers perform better in feature extraction. Our proposed DCNN architecture extends our previous study [25], in which we explored five convolutional layers with effective transfer learning for wood image classification in NR-IQA. Inspired by [7], we split each input into wood image patches of size 227x227. To build the DCNN architecture, we examined the large kernels of [15] and adopted small 7x7 kernels. Three overlapping max pooling units (Pool 1, Pool 2, and Pool 3) are applied to the outputs of the first, second, and fifth convolutional layers, respectively. We also apply local response normalization (LRN) as a local step in data pre-processing: LRN1 and LRN2 are computed on the outputs of the Pool 1 and Pool 2 units, respectively. We configure the first, second, third, and fourth convolutional layers with 100 channels each to lighten the model. The fifth convolutional layer is configured with 25 channels because it is the interface between the convolutional and fully connected (FC) layers. Connecting the FC layers to a higher number of channels would result in a large total number of learnable parameters, because FC layers do not share weights as convolutional layers do.
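To make the channel configuration concrete, the following minimal sketch counts the learnable parameters of the five convolutional layers described above. It is an illustration only: the 3-channel input, the hypothetical 6x6 feature map entering the FC layer, and the hypothetical 512 FC units are assumptions that do not come from the text, and the helper names are made up for this example.

```python
# Hypothetical layer table: (name, in_channels, out_channels, kernel_size).
# The 3-channel input is an assumption; channel counts and 7x7 kernels
# follow the description in the text.
CONV_LAYERS = [
    ("conv1", 3, 100, 7),
    ("conv2", 100, 100, 7),
    ("conv3", 100, 100, 7),
    ("conv4", 100, 100, 7),
    ("conv5", 100, 25, 7),
]


def conv_params(in_ch, out_ch, k):
    """Weights of a k x k convolution plus one bias per output channel."""
    return k * k * in_ch * out_ch + out_ch


def fc_params(in_features, out_features):
    """Weights of a fully connected layer plus one bias per output unit."""
    return in_features * out_features + out_features


params = {name: conv_params(i, o, k) for name, i, o, k in CONV_LAYERS}
total_conv = sum(params.values())

# Assumed 6x6 spatial size and 512 FC units, purely for illustration:
# feeding 25 channels instead of 100 into the first FC layer cuts its
# parameter count by roughly a factor of four.
fc_with_25 = fc_params(6 * 6 * 25, 512)
fc_with_100 = fc_params(6 * 6 * 100, 512)

print(params, total_conv, fc_with_25, fc_with_100)
```

Under these assumptions, the narrow fifth layer keeps the first FC layer, which has no weight sharing, from dominating the parameter budget.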

SR-CNN [29] applies the ReLU activation function [30] and its derivatives, such as leaky ReLU (LReLU) [31] and parametrized ReLU (PReLU) [32], which avoid a vanishing gradient for positive values. However, ELU [33, 34] improves on the learning characteristics of these activation functions [30,31,32]. ELU also has a shorter computational time and mean activations close to zero. LReLU and PReLU [31, 32] have negative values as well, but they do not ensure a noise-robust deactivation state. We therefore extend the \(\alpha\) hyperparameter of the ELU activation function from our previous study [25], as shown in Eq. 1.

$$\begin{aligned} ELU(\rho_i) = \left\{ \begin{array}{ll} \rho_i, & \rho_i \ge 0 \\ \alpha \left( e^{\rho_i} - 1 \right), & \rho_i < 0 \end{array} \right. \end{aligned}$$
(1)

where \(\alpha\) is the ELU hyperparameter, which controls the negative values for inputs \(\rho_i\). When \(\rho_i\) is greater than or equal to zero, ELU behaves like a rectified linear unit. Otherwise, when \(\rho_i\) is less than zero, the function approaches the negative constant \(-\alpha\), i.e., \(-1\) for \(\alpha = 1\) (Fig. 1).
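As a small illustration of this activation, the sketch below implements the standard ELU form, \(\alpha(e^{\rho} - 1)\) for negative inputs. The function name `elu` and the default \(\alpha = 1\) are choices made for this example, not values fixed by the text.

```python
import math


def elu(rho, alpha=1.0):
    """Standard ELU: identity for rho >= 0, alpha * (exp(rho) - 1) otherwise."""
    if rho >= 0:
        return rho
    # expm1 computes exp(rho) - 1 accurately for small |rho|.
    return alpha * math.expm1(rho)


# Positive inputs pass through unchanged, as with ReLU.
positive = elu(2.0)
# Strongly negative inputs saturate towards -alpha.
saturated = elu(-50.0, alpha=0.5)
print(positive, saturated)
```

For large negative inputs the output saturates at \(-\alpha\), which is what keeps the mean activation close to zero.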

Fig. 1 Proposed DCNN architecture [25]

3.2 DS-WIQA architecture

Fig. 2 Proposed DS-WIQA architecture

In DS-WIQA, we propose convex and concave n-square salient wood image maps, where an n-square denotes a polygon with n vertices. The salient wood image map of a convex n-square is shown in Fig. 2 (red). Convexity means that all diagonals from each vertex lie inside the n-square, except at their end points. From \(A_1\), we therefore draw the diagonals \(\overline{{\mathrm{A}}_1 {\mathrm{A}}_{{\mathrm{j}}}}\), \(j = 3, 4, \dots , n-1\), all of which lie inside the n-square except at their end points. The n-square is thus a union of the triangles \(\triangle A_1 A_i A_{i+1}\), \(i = 2, 3, \dots , n-1\). With vertices \(A_i (x_i, y_i)\), \(i = 1, 2, \dots , n\), the area \(L_i\) of each triangle is as follows.

$$\begin{aligned} L_i = \frac{1}{2} \left| \det \left( \begin{array}{cc} x_i - x_1 & y_i - y_1 \\ x_{i+1} - x_1 & y_{i+1} - y_1 \end{array} \right) \right| ; \quad i = 2, 3, \ldots , n-1 \end{aligned}$$
(2)

The area \(L\) of the convex n-square salient wood image map then follows from Eq. 2:

$$\begin{aligned} L = \frac{1}{2} \sum _{i=2}^{n-1} \left| \det \left( \begin{array}{cc} x_i - x_1 & y_i - y_1 \\ x_{i+1} - x_1 & y_{i+1} - y_1 \end{array} \right) \right| \end{aligned}$$
(3)
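The triangle-fan computation of Eqs. 2 and 3 can be sketched in a few lines. This is a minimal illustration, assuming the vertices are given in boundary order as `(x, y)` tuples; the function name `fan_area`, the `apex` parameter, and the example pentagon are hypothetical.

```python
def fan_area(vertices, apex=0):
    """Sum of |det| / 2 over the triangles (A_1, A_i, A_{i+1}), as in Eqs. 2-3.

    `vertices` lists the polygon corners in boundary order; `apex` is the
    index of the corner that plays the role of A_1.
    """
    n = len(vertices)
    # Rotate the list so that the chosen apex comes first.
    pts = vertices[apex:] + vertices[:apex]
    x1, y1 = pts[0]
    area = 0.0
    for i in range(1, n - 1):
        xi, yi = pts[i]
        xj, yj = pts[i + 1]
        det = (xi - x1) * (yj - y1) - (xj - x1) * (yi - y1)
        area += abs(det) / 2.0
    return area


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
pentagon = [(0, 0), (4, 0), (5, 2), (2, 4), (-1, 2)]  # convex example

print(fan_area(unit_square))       # area of the unit square
print(fan_area(pentagon, apex=0))  # same value for any apex of a convex shape
print(fan_area(pentagon, apex=2))
```

For a convex shape, every corner can serve as \(A_1\) and the result does not change.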

The salient wood image map of a concave n-square is shown in Fig. 2 (blue). We distinguish two types of concave n-square: those that have a suitable vertex and those that do not. In the first type, every vertex other than \(A_1\) has a diagonal that, apart from its end points, lies outside the n-square. Therefore, this n-square can be formed from the triangles \(\triangle A_1 A_i A_{i+1}\), \(i = 2, 3, \dots , n-1\). We emphasize that \(A_1\) cannot be chosen arbitrarily among the points of the concave n-square: the decomposition is valid only if \(A_1\) is a corner from which all diagonals lie inside or on the concave n-square.

Replacing \(A_1\) with \(A_2\) is incorrect, because the diagonals \(\overline{{\mathrm{A}}_2 {\mathrm{A}}_{{\mathrm{j}} + 1}}\) and \(\overline{{\mathrm{A}}_2 {\mathrm{A}}_{{\mathrm{n}}-1}}\) from \(A_2\) lie, apart from their end points, outside the concave n-square. The same holds if \(A_1\) is replaced by \(A_3\): there is a diagonal \(\overline{{\mathrm{A}}_3 {\mathrm{A}}_{\mathrm{k}}}\) from \(A_3\) that, apart from its end points, lies outside the concave n-square. Replacing \(A_1\) with \(A_m\), \(m = 4, 5, \dots , n\), is also incorrect, because one can always find a diagonal from \(A_m\) that, apart from its end points, lies outside the concave n-square.

Building the triangles from a vertex whose diagonals lie, apart from their end points, outside the concave n-square gives a wrong result. The reason is that a side of some triangle \(\triangle A_m A_p A_{p+1}\), \(1 \le p \ne m \le n\), lies outside the concave n-square, so that:

$$\begin{aligned} \sum _{1 \le p \ne m \le n}^{n-1} L_{\triangle A_m A_p A_{p+1}} > L_n; \end{aligned}$$
(4)

where \(L_{\triangle A_m A_p A_{p+1}}\) is the area of the p-th triangle, \(1 \le p \ne m \le n\), and \(L_n\) is the area of the n-square.
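The overestimate in Eq. 4 can be checked numerically. The sketch below uses a hypothetical L-shaped concave example and compares the triangle fan from two different corners against the shoelace formula, which is brought in here only as an independent reference and is not part of the method described above. All names are illustrative.

```python
def fan_area(vertices, apex):
    """Sum of |det| / 2 over the triangle fan anchored at vertices[apex]."""
    pts = vertices[apex:] + vertices[:apex]
    x1, y1 = pts[0]
    total = 0.0
    for i in range(1, len(pts) - 1):
        (xi, yi), (xj, yj) = pts[i], pts[i + 1]
        total += abs((xi - x1) * (yj - y1) - (xj - x1) * (yi - y1)) / 2.0
    return total


def shoelace_area(vertices):
    """Reference area from the shoelace formula (valid for simple polygons)."""
    s = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


# Hypothetical concave L-shape; the reflex corner is (1, 1).
L_SHAPE = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]

true_area = shoelace_area(L_SHAPE)
good_apex = fan_area(L_SHAPE, apex=3)  # from (1, 1): all diagonals stay inside
bad_apex = fan_area(L_SHAPE, apex=1)   # from (2, 0): one diagonal leaves the shape

print(true_area, good_apex, bad_apex)
```

In this example the fan from the corner (2, 0) counts a region outside the shape and therefore exceeds the true area, while the fan from the reflex corner (1, 1) matches it.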

If a concave n-square has a vertex from which all diagonals lie inside or on the n-square, then its area can be determined from Eqs. 2 and 3 after taking that vertex as \(A_1 (x_1, y_1)\).

If the concave n-square has no vertex from which all diagonals lie inside or on the n-square, then it is divided into \(k\) parts \(n\text{-square}_{m}\), \(m = 1, 2, \dots , k\), such that every two adjacent parts share exactly one side. Each \(n\text{-square}_{m}\) then has a vertex from which all diagonals lie inside or on \(n\text{-square}_{m}\), so the area of each part can be determined from Eq. 3. If the area of each \(n\text{-square}_{m}\), \(m = 1, 2, \dots , k\), is \(A_m\), then the area of the salient wood image map is

$$\begin{aligned} L_n = \sum _{m=1}^{k} {A_m} \end{aligned}$$
(5)

The number of vertices \(n_m\) of each part \(n\text{-square}_{m}\), \(m = 1, 2, \dots , k\), is related to the number of vertices \(n\) of the n-square by

$$\begin{aligned} \sum _{m=1}^{k} {n_m} = n + 2(k-1) \end{aligned}$$
(6)
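Eqs. 5 and 6 can be illustrated on a small example. The sketch below assumes a hypothetical U-shaped n-square with \(n = 8\) that has no corner from which every diagonal stays inside, and a hand-chosen split into \(k = 3\) parts along two diagonals. The coordinates and names are made up for this illustration.

```python
def fan_area(vertices):
    """Sum of |det| / 2 over the triangle fan anchored at the first vertex."""
    x1, y1 = vertices[0]
    total = 0.0
    for i in range(1, len(vertices) - 1):
        (xi, yi), (xj, yj) = vertices[i], vertices[i + 1]
        total += abs((xi - x1) * (yj - y1) - (xj - x1) * (yi - y1)) / 2.0
    return total


# Hypothetical U-shape with n = 8 corners.
U_SHAPE = [(0, 0), (3, 0), (3, 2), (2, 2), (2, 1), (1, 1), (1, 2), (0, 2)]

# Hand-chosen split along the diagonals (0,0)-(1,1) and (3,0)-(2,1).
# Adjacent parts share exactly one side, and each part is convex.
PARTS = [
    [(0, 0), (1, 1), (1, 2), (0, 2)],  # left arm
    [(0, 0), (3, 0), (2, 1), (1, 1)],  # base
    [(3, 0), (3, 2), (2, 2), (2, 1)],  # right arm
]

part_areas = [fan_area(part) for part in PARTS]  # A_m in Eq. 5
total_area = sum(part_areas)                     # L_n in Eq. 5

n = len(U_SHAPE)
k = len(PARTS)
vertex_sum = sum(len(part) for part in PARTS)    # left-hand side of Eq. 6

print(part_areas, total_area, vertex_sum, n + 2 * (k - 1))
```

Each cut along a diagonal duplicates its two end points, which is where the \(2(k-1)\) term in Eq. 6 comes from.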

4 Experiments and results

4.1 Datasets

The Zenodo dataset [21] is used to train and test our DCNN and DS-WIQA methods, and the Lignoindo dataset [22] is used for DS-WIQA cross-dataset evaluation. We train a classifier on the five types of distortion included in the Zenodo dataset: JPEG2000 (JP2K), JPEG, white Gaussian noise (WG), blocking artifact (BA), and fast fading (FF). We assess our proposed model on the same distortion types in the Lignoindo dataset. The Zenodo dataset contains 544 distorted wood images across the five distortion types, together with 109 reference wood images. As the perceptual score for this dataset, we use the difference mean opinion score (DMOS) in the range [0, 99], as in [35,36,37,38]. A higher DMOS indicates lower quality.

In this work, each wood image of the whole Zenodo dataset is divided into smaller non-overlapping patches and assigned to the training, validation, or test set. The Lignoindo dataset is used for cross-dataset validation, with scores in the range [0, 1]. We therefore rescale the scores predicted on the Zenodo dataset into the same range as the Lignoindo dataset by applying a nonlinear function.

4.2 Evaluation measurement

We evaluate our proposed DCNN and DS-WIQA methods, illustrated in Fig. 3, by calculating LCC and SROCC. LCC is computed between the predictions and the ground truth on the validation and test sets, and SROCC is computed against the ground truth on the same sets. The distorted wood images of all five distortion types are split into \(70\%\), \(15\%\), and \(15\%\) for training, validation, and testing, respectively. The wood images of the whole Zenodo dataset are also split following the same protocol. On the validation set, the highest LCC selects the best result of each training run, which is repeated for 10 iterations. For cross-dataset validation, we use the five types of distorted wood images from the Zenodo and Lignoindo datasets, also over 10 iterations.
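For reference, the two scores can be sketched in plain Python. This is a minimal illustration: LCC is taken to be the Pearson correlation, SROCC is computed as the Pearson correlation of average ranks, and the sample predictions and DMOS values are invented for the example.

```python
import math


def lcc(x, y):
    """Linear (Pearson) correlation coefficient between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman's rank order correlation: Pearson correlation of the ranks."""
    return lcc(average_ranks(x), average_ranks(y))


# Invented example: predictions that are monotonic but not linear in DMOS.
predicted = [10.0, 20.0, 30.0, 40.0, 50.0]
dmos = [1.0, 4.0, 9.0, 16.0, 25.0]
print(lcc(predicted, dmos), srocc(predicted, dmos))
```

SROCC rewards any monotonic agreement with the ground truth, while LCC additionally requires the relationship to be linear, which is why the two scores can differ for the same predictions.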

To generate the training and testing data, the five distortion types are applied to the wood images using a distortion coefficient on the order of \(10^{-7}\). To obtain each wood image with a given distortion coefficient, we use the set of numbers \([-250, -249, -248, \dots , -1, 0]\), where \(-250\) corresponds to \(-250 \times 10^{-7}\), and so on, over 10 epochs. The saliency coefficient is determined during testing by our DS-WIQA on the whole Lignoindo dataset.

4.3 DCNN experiment

For DCNN training, we divide each wood image into small 7x7 wood image patches as the initial wood image in the range [0.01, 0.9], and we train for 10 epochs with a mini-batch size of 32. The learning rate decays by a factor of 0.1 every five epochs.
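The learning rate schedule can be read as a step decay. The sketch below is one plausible interpretation, assuming the rate is multiplied by 0.1 every five epochs; the initial rate of 0.01 and the function name `step_decay` are assumptions made for this example.

```python
def step_decay(initial_lr, epoch, drop=0.1, epochs_per_drop=5):
    """Multiply the learning rate by `drop` once every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)


EPOCHS = 10        # from the text
BATCH_SIZE = 32    # from the text
INITIAL_LR = 0.01  # assumed value, not stated in the text

schedule = [step_decay(INITIAL_LR, epoch) for epoch in range(EPOCHS)]
print(schedule)
```

With 10 epochs this gives two plateaus: the initial rate for epochs 0 to 4 and one tenth of it for epochs 5 to 9.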

Fig. 3 The salient wood image map and the distorted wood image estimation

Table 1 reports the average LCC and SROCC over the ten test iterations. For the distorted wood images, our DCNN outperforms the CNN-based multilayer perceptron (MLP) [7] on all five distortion types. In SROCC, we outperform MLP by \(18.84\%\) on JP2K, \(21.29\%\) on JPEG, \(16.62\%\) on WG, \(48.44\%\) on BA, and \(26.06\%\) on FF. In LCC, our DCNN is superior to MLP by \(29.93\%\) on JP2K, \(32.43\%\) on JPEG, \(29.10\%\) on WG, \(39.66\%\) on BA, and \(33.69\%\) on FF. The undistorted wood image is represented as All in Table 1; in this setting, all wood images from the Zenodo dataset are used in training regardless of their distortion types, and the same measurements are used as for the distorted wood images. Our proposed DCNN again achieves the higher LCC and SROCC, outperforming MLP by \(25.44\%\) in SROCC and \(32.88\%\) in LCC.

Table 1 Proposed DCNN performance: SROCC and LCC appraisals on the Zenodo dataset

4.4 DS-WIQA experiment

Next, we combine the proposed DCNN with the salient wood image map of [21]. When DS-WIQA computes the wood image quality score, insignificant wood image patches can be ignored by using a coefficient \(\alpha \in \{ 0, 0.01, 0.1, 0.7 \}\). If \(\alpha = 0\), DS-WIQA is equivalent to the DCNN; at the other extreme (\(\alpha \ge 0.7\)), DS-WIQA reads only a subset of the patches, as shown in Fig. 4.
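The role of the coefficient \(\alpha\) can be sketched as a simple threshold on patch saliency. This is an illustration under stated assumptions, not the exact aggregation used by DS-WIQA: it assumes each patch has a saliency value in [0, 1] and that the image score is the plain mean of the retained patch scores. The function name and the sample values are hypothetical.

```python
def image_score(patch_scores, patch_saliency, alpha=0.0):
    """Average the scores of patches whose saliency is at least `alpha`.

    Assumes saliency values in [0, 1], so alpha = 0 keeps every patch and
    reduces to the plain DCNN average.
    """
    kept = [s for s, w in zip(patch_scores, patch_saliency) if w >= alpha]
    if not kept:
        # Fall back to all patches if the threshold removes everything.
        kept = list(patch_scores)
    return sum(kept) / len(kept)


# Invented per-patch predictions and saliency values for one image.
scores = [20.0, 40.0, 60.0, 80.0]
saliency = [0.0, 0.05, 0.5, 0.9]

for alpha in (0.0, 0.01, 0.1, 0.7):
    print(alpha, image_score(scores, saliency, alpha))
```

As \(\alpha\) grows, fewer patches contribute, so the image score is driven more and more by the most salient regions.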

Table 2 DS-WIQA validation: SROCC and LCC on Zenodo validation set with the best performance coefficient \(\alpha ^*\)
Table 3 DS-WIQA testing: SROCC and LCC appraisals on the Zenodo test set

Table 2 presents the SROCC and LCC over the ten iterations on the Zenodo validation set, where distinct \(\alpha\) values are applied to each type of distorted wood image. The saliency map improves performance on all five distortion types (JP2K, JPEG, WG, BA, and FF) in both SROCC and LCC. However, the saliency map yields a larger gain in SROCC than in LCC. For the fine wood image of the Zenodo dataset, our proposed DS-WIQA also achieves an improvement of \(7.85\%\) on SROCC compared with LCC.

The best coefficient \(\alpha ^*\) for each distortion type is determined from the SROCC and LCC over the ten iterations on the Zenodo dataset (Table 2). For JP2K and JPEG, the highest SROCC and LCC are achieved when \(\alpha = 0\) on the Zenodo validation set. Accordingly, Table 3 shows that DS-WIQA performs similarly to the DCNN on the test set for these two distortion types, for which no saliency map is used.

Table 3 compares the performance of DS-WIQA, using \(\alpha ^*\), with that of the other methods on the Zenodo dataset. DS-WIQA exceeds the other image quality assessment methods on all five distortion types. Over all distortion types, our DCNN and DS-WIQA achieve 0.9763 and 0.9800, respectively, in SROCC and 0.9686 and 0.9705, respectively, in LCC, which surpass the other state-of-the-art no-reference image quality assessment methods [2,3,4,5,6,7, 9,10,11]. DS-WIQA achieves the highest result overall.

In the DS-WIQA cross-dataset experiment, we train on the Zenodo dataset and test on the Lignoindo dataset for the five distortion types. The output of DS-WIQA is a DMOS in the range [0, 99]. Following the settings in [8], the Lignoindo dataset is split into two subsets: \(90\%\) for training and \(10\%\) for testing. We repeat training and testing 10 times to obtain the cross-dataset performance on the Lignoindo dataset. The best-performing DS-WIQA coefficient is \(\alpha =0.1\), as used on the fine wood image of blocking artifact in Table 2. DS-WIQA performs better than the other state-of-the-art methods [2,3,4,5,6,7, 9,10,11, 24], as shown in Table 4.

Our study shows that overlapping max pooling units perform best among the pooling types in both SROCC and LCC. Overlapping max pooling outperforms non-overlapping max pooling by \(5.6\%\) in SROCC and \(4.2\%\) in LCC. Based on this, we omit the overlapping max pooling units at the third and fourth layers, because our analysis indicates that overlapping max pooling tends to perform best at the fifth layer, as shown in Table 5.
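The difference between the two pooling variants comes down to whether the stride is smaller than the window. The sketch below is a plain-Python illustration; the 3x3 window with stride 2 for the overlapping case and the 2x2 window with stride 2 for the non-overlapping case are assumed settings, and the 5x5 feature map is invented.

```python
def max_pool2d(feature_map, kernel, stride):
    """Max pooling over a 2D list without padding."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - kernel + 1, stride):
        pooled_row = []
        for c in range(0, cols - kernel + 1, stride):
            window = [
                feature_map[r + dr][c + dc]
                for dr in range(kernel)
                for dc in range(kernel)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Invented 5x5 feature map with the values 1..25 in row-major order.
fmap = [[r * 5 + c + 1 for c in range(5)] for r in range(5)]

# Overlapping: stride (2) smaller than the window (3), so windows share pixels.
overlapping = max_pool2d(fmap, kernel=3, stride=2)
# Non-overlapping: stride equals the window size.
non_overlapping = max_pool2d(fmap, kernel=2, stride=2)

print(overlapping)
print(non_overlapping)
```

In the overlapping case each interior pixel can fall into more than one window, so neighbouring outputs share information from the same region of the feature map.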

Overall, DS-WIQA surpasses the other state-of-the-art methods [2,3,4,5,6,7, 9,10,11, 24]. On SROCC, DS-WIQA surpasses our DCNN by \(0.38\%\) and the other methods [2,3,4,5,6,7, 9,10,11, 24] by \(34.84\%\), while our DCNN surpasses the other methods by \(34.33\%\). On LCC, DS-WIQA exceeds our DCNN by \(0.22\%\) and the other methods [2,3,4,5,6,7, 9,10,11, 24] by \(30.15\%\), while our proposed DCNN exceeds the other methods by \(29.86\%\).

Fig. 4 Evaluation of the trained model

Table 4 DS-WIQA evaluation: SROCC and LCC appraisals on the Lignoindo dataset and trained on the Zenodo dataset
Table 5 DS-WIQA: pooling
Table 6 Computational complexity in functions times (\(\mu\) second) for shift\(\sim\)add operation of [2,3,4,5,6,7, 9, 10]
Table 7 Computational complexity in functions times (\(\mu\) second) for shift\(\sim\)add operation of [11, 24] and proposed methods

In terms of computational complexity, our proposed methods require less function time for shift-add operations than the other methods [2,3,4,5,6,7, 9,10,11, 24], as reported in Tables 6 and 7. Time is measured in \(\mu\) seconds over the number of elementary operations performed. Our DCNN and DS-WIQA reduce the number of shift-add operations. DS-WIQA proves superior to our DCNN and the other state-of-the-art methods [2,3,4,5,6,7, 9,10,11, 24].

5 Conclusion

We propose a new DCNN for NR-IQA that achieves better results than the other NR-IQA methods [2,3,4,5,6,7, 9,10,11, 24]. For distorted wood images, our DCNN outperforms the recent MLP [7] on all five distortion types: by \(18.84\%\) on JP2K, \(21.29\%\) on JPEG, \(16.62\%\) on WG, \(48.44\%\) on BA, and \(26.06\%\) on FF in SROCC, and by \(29.93\%\) on JP2K, \(32.43\%\) on JPEG, \(29.10\%\) on WG, \(39.66\%\) on BA, and \(33.69\%\) on FF in LCC. For undistorted wood images, our proposed DCNN is superior to MLP [7] by \(25.44\%\) in SROCC and \(32.88\%\) in LCC.

DS-WIQA exceeds the other state-of-the-art methods [2,3,4,5,6,7, 9,10,11, 24]. It scores \(0.38\%\) and \(0.22\%\) higher than our DCNN, and \(34.84\%\) and \(30.15\%\) higher than the other methods [2,3,4,5,6,7, 9,10,11, 24], on SROCC and LCC, respectively. The experimental results also show that our DCNN and DS-WIQA reduce the shift-add operations needed for exponential, logarithmic, and trigonometric functions, which lowers the computational complexity. Overall, DS-WIQA proves more effective than our DCNN and the other methods [2,3,4,5,6,7, 9,10,11, 24].