Mechanical properties of wood materials using near-infrared spectroscopy based on correlation local embedding and partial least-squares

This study used near-infrared (NIR) spectroscopy to predict the mechanical properties of wood. NIR spectra were collected over the wavelength range 900–1700 nm, and the spectra averaged over the radial and tangential surfaces were used to establish a partial least squares (PLS) model based on correlation local embedding (CLE). Mongolian oak (Quercus mongolica Fisch. ex Ledeb.) was used to test the effectiveness of the model. Cross-validation was used to verify the robustness of the CLE–PLS model. Ninety samples were used as the calibration set and forty-five as the validation set. The results show that the prediction coefficient of determination ($R_{p}^{2}$) is 0.80 for the modulus of rupture (MOR) and 0.78 for the modulus of elasticity (MOE). The ratio of performance to deviation is 2.23 for MOR and 2.15 for MOE.

(Picea mariana). Todorović et al. (2015) evaluated whether thermally modified beech wood (Fagus moesiaca (K. Maly) C.) could enhance the accuracy of prediction models. NIR spectra are usually obtained from the radial, tangential and transverse surfaces of solid wood samples. Zhao et al. (2012) investigated the effects of averaging NIR spectra from radial and tangential surfaces. Unfortunately, recent research shows that the coefficient of determination ($R^{2}$) and the ratio of performance to deviation (RPD) are lower for mechanical properties than for other physical and chemical properties. The reason is that the partial least squares (PLS) regression model is well adapted only to linear relationships between NIR spectra and properties. Where the relationship is non-linear, as for the modulus of elasticity (MOE), accuracy is reduced.
To improve the accuracy of MOE determination, different data pre-processing methods have been evaluated, such as multiplicative scatter correction (MSC), first derivative (1stDer), second derivative (2ndDer), and Savitzky-Golay smoothing (S-G) (Andrade et al. 2010; Bächle et al. 2010). In mathematics, many improved PLS models have been reported, including moving window PLS (MWPLS) and synergy interval PLS (SiPLS) (Yang et al. 2015; Deng et al. 2016). These models commonly use partial spectral bands to select linear data. Saul and Roweis (2003) implemented a non-linear method to reduce data dimensionality, as a tool to simplify and accelerate machine learning in high-dimensional spaces. Roweis and Saul (2012) used one of the non-linear manifold methods, locally linear embedding (LLE), which selects neighbor points by Euclidean distance before mapping to a lower dimensionality; however, errors may be larger when certain neighbor points are mapped into the lower-dimensional space. To address this problem, Nguyen et al. (2015) showed that using correlation to seek neighbor points is better than using Euclidean distance. While somewhat effective, the disadvantage of these methods is that information related to mechanical properties may be missed when only relevant spectra are selected.
This study uses NIR spectroscopy to predict the modulus of rupture (MOR) and MOE of Mongolian oak wood. A correlation local embedding PLS (CLE-PLS) model was implemented to accommodate the non-linear relationship between the NIR spectra and MOR and MOE, and could be applied to the nondestructive determination of wood properties.

Materials
Mongolian oak (Quercus mongolica) was used to build the near-infrared prediction model. It is a deciduous species distributed in the northeast of China and is commonly used as structural material. Ten trees were randomly chosen from the 'Chong He' forest farm located in Wuchang, Heilongjiang province. The average diameter at breast height was 16 cm and the average height was 13 m. All the logs were taken above 1.3 m from the ground and cut into 1-m-long or 2-m-long pieces. After air-drying, each log (40 logs in total) was cut into bending mechanical samples (no pith) of 300 mm × 20 mm × 20 mm. Samples were numbered 1-4, with 1 and 4 taken from the sapwood and 2 and 3 from the heartwood (Fig. 1).
After removing samples with visible defects and deformations, 135 bending mechanical samples remained. These were numbered from 1 to 135; sapwood and heartwood samples had different serial numbers. All samples were conditioned to 12% moisture content prior to testing.

NIR spectroscopy test
NIR spectra were collected with a one-chip spectrometer produced by INSION Co., Germany. This spectrometer is equipped with an optical fiber probe covering wavelengths from 900 to 1700 nm, with a 7.0 nm resolution. The NIR spectra were acquired by an optical fiber probe 5 mm in diameter (Fig. 2). The spectra were collected and recorded with SPEC View 7.1. For each scanning point, 30 scans were collected and averaged into a single spectrum. For each sample (300 mm × 20 mm × 20 mm), eight longitudinal spectra on the radial surface and eight longitudinal spectra on the tangential surface were averaged into one spectrum. The sampling points were equally spaced at 30 mm on both surfaces (Fig. 3).

NIR spectra pre-processing methods
Spectral pre-processing was carried out in Matlab 2014a. The scattering effect in diffuse reflectance measurements can be reduced by multiplicative scatter correction (MSC), baseline shift can be corrected by first or second derivatives, and high-frequency noise can be smoothed by Savitzky-Golay (S-G) filtering.
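As an illustration of the scatter and baseline corrections named above, the following is a minimal Python sketch (the study itself used Matlab). The function names `msc` and `derivative`, the use of the mean spectrum as the MSC reference, and the finite-difference derivative are assumptions made for this example, not details taken from the study.

```python
import numpy as np

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference)."""
    spectra = np.asarray(spectra, dtype=float)
    reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        # Fit spectrum ~ intercept + slope * reference, then remove both terms.
        slope, intercept = np.polyfit(reference, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected

def derivative(spectra, wavelengths, order=1):
    """First or second derivative along the wavelength axis by finite differences."""
    result = np.asarray(spectra, dtype=float)
    for _ in range(order):
        result = np.gradient(result, wavelengths, axis=1)
    return result

# Synthetic demonstration: three copies of one band shape with different
# multiplicative and additive scatter terms (hypothetical data).
wavelengths = np.linspace(900.0, 1700.0, 81)
band = np.exp(-((wavelengths - 1450.0) / 60.0) ** 2)
raw = np.array([a * band + b for a, b in [(1.0, 0.00), (1.4, 0.15), (0.8, -0.05)]])

corrected = msc(raw)                          # rows collapse onto the mean spectrum
first_der = derivative(raw, wavelengths, 1)   # additive offsets b vanish
```

In this synthetic case MSC removes the sample-to-sample scale and offset entirely, while the first derivative removes only the additive offset, which is why the two corrections are often combined.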

NIR CLE-PLS calibration models development
CLE-PLS calibrations were developed by PLS analysis. The coefficient of determination is denoted $R_{c}^{2}$ for the calibration set and $R_{p}^{2}$ for the validation set. The standard error of calibration (SEC), the standard error of prediction (SEP) and the ratio of performance to deviation (RPD) were used to evaluate the performance of the calibration model. The RPD is the ratio of the standard deviation (SD) of the original data to the SEP.
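The evaluation statistics can be sketched in a few lines of Python. This is a hedged illustration: the standard error is taken here as the root mean square error of the residuals, and the SD uses the sample (n − 1) normalisation; the study does not state which conventions it used, so both are assumptions. The same function would give SEC on the calibration set and SEP on the validation set.

```python
import numpy as np

def r_squared(measured, predicted):
    """Coefficient of determination between measured and predicted values."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def standard_error(measured, predicted):
    """Standard error taken as RMSE (assumption; other definitions correct for bias)."""
    residuals = np.asarray(measured, dtype=float) - np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean(residuals ** 2))

def rpd(measured, predicted):
    """Ratio of performance to deviation: SD of the measured data divided by the SEP."""
    return np.std(measured, ddof=1) / standard_error(measured, predicted)

# Hypothetical measured and predicted values, for demonstration only.
measured = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
print(r_squared(measured, predicted), standard_error(measured, predicted), rpd(measured, predicted))
```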
Correlation local embedding is a non-linear dimensionality reduction method. First, neighbor points are found for each spectrum by correlation. Second, a local reconstruction weight matrix is defined, and mapping invariance is used to calculate the lower-dimensional data from the local reconstruction weight matrix and the nearest neighbor points. Finally, the lower-dimensional data and the measured values are used to establish the PLS calibration model (Fig. 4).
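The three stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that CLE follows standard locally linear embedding with the neighbor search replaced by a correlation ranking, that the correlation is computed after centering on the mean calibration spectrum, and that a small regularisation term (`reg`, hypothetical) stabilises the local covariance. The function name `cle_embed` is also hypothetical. Only the calibration-set embedding is covered.

```python
import numpy as np

def cle_embed(X, k=8, d=3, reg=1e-3):
    """Sketch of correlation local embedding for calibration spectra X (n x N)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Stage 1: neighbours are the k spectra most correlated with each spectrum.
    centred = X - X.mean(axis=0)                      # centre on the mean spectrum
    unit = centred / np.linalg.norm(centred, axis=1, keepdims=True)
    corr = unit @ unit.T
    np.fill_diagonal(corr, -np.inf)                   # a spectrum is not its own neighbour
    neighbours = np.argsort(-corr, axis=1)[:, :k]

    # Stage 2: reconstruction weights, each row of G summing to 1.
    G = np.zeros((n, n))
    for i in range(n):
        diffs = X[neighbours[i]] - X[i]
        H = diffs @ diffs.T                           # local covariance, k x k
        H += reg * np.trace(H) * np.eye(k)            # regularisation (assumption)
        w = np.linalg.solve(H, np.ones(k))
        G[i, neighbours[i]] = w / w.sum()

    # Stage 3: embedding from the bottom eigenvectors of M = (I - G)^T (I - G).
    I = np.eye(n)
    M = (I - G).T @ (I - G)
    eigenvalues, eigenvectors = np.linalg.eigh(M)     # ascending eigenvalues
    C = eigenvectors[:, 1:d + 1]                      # skip the near-zero eigenvalue
    return C, G
```

The rows of `C` would then be regressed against the measured MOR or MOE values with a PLS routine, for example `PLSRegression` from scikit-learn.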
The detailed algorithm development process is:
Step 1 Input the spectra data of the calibration set, $X = \{x_i \mid i = 1, 2, \ldots, n\}$, where n is the number of samples.
Step 2 Set a value for k, the number of neighbor points.
Step 3 Calculate the correlation coefficient between any two spectra by Eq. (1), where $\bar{x}$ is the average spectrum of the calibration set and j = 1, 2, … , n.
Step 4 Define a local reconstruction weight matrix G.
(1) First, an error function is designed as Eq. (2), where $x_{ij}$ are the neighbor points of $x_i$ and $g_{ij}$ are the weights, subject to the constraint condition of Eq. (3).
(2) Second, a covariance matrix $H_i$ is designed as Eq. (4), where $x_{ij}$ and $x_{im}$ are both neighbor points of $x_i$.
(3) Finally, a Lagrange multiplier algorithm is used to combine Eqs. (3) and (4) and to calculate the elements $g_{ij}$ of G (Eq. 5).
Step 5 Preserve the mapping invariance to calculate the lower-dimensional data from the neighbor points and G. Assuming C is d-dimensional data (d ≪ N), the constraint of the mapping is provided in Eq. (6), where J(C) is the objective function, $c_i$ is the output vector of $x_i$, and $c_{ij}$ are the neighbor points of $c_i$ in the lower-dimensional space. Equation (6) can be simplified as Eq. (7), subject to the constraint conditions of Eqs. (8) and (9), where I is the d × d identity matrix. Equation (7) can then be simplified further, as shown in Eq. (10).
Step 6 Output the lower-dimensional data C. C consists of the eigenvectors corresponding to the d smallest non-zero eigenvalues.
Step 7 Use C and the measured values to establish the PLS regression model.
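For orientation, the quantities named in Steps 3 to 6 can be written out under the assumption that CLE follows the standard locally linear embedding formulation with correlation-based neighbor selection. The expressions below are a hedged sketch in the notation of the steps; they are not a transcription of the study's own equations, and no correspondence to its equation numbers is asserted. G denotes the n × n matrix holding the weights $g_{ij}$ (zero for non-neighbors), and C is arranged with the vectors $c_i$ as columns.

```latex
\begin{align*}
r_{ij} &= \frac{(x_i-\bar{x})^{\top}(x_j-\bar{x})}
               {\lVert x_i-\bar{x}\rVert\,\lVert x_j-\bar{x}\rVert}
   && \text{correlation between spectra (Step 3)} \\
\varepsilon(G) &= \sum_{i=1}^{n}\Bigl\lVert x_i-\sum_{j=1}^{k} g_{ij}\,x_{ij}\Bigr\rVert^{2},
   \qquad \sum_{j=1}^{k} g_{ij}=1
   && \text{reconstruction error and constraint (Step 4)} \\
H^{i}_{jm} &= (x_i-x_{ij})^{\top}(x_i-x_{im})
   && \text{local covariance (Step 4)} \\
g_{ij} &= \frac{\sum_{m=1}^{k}\bigl(H^{i}\bigr)^{-1}_{jm}}
               {\sum_{p=1}^{k}\sum_{q=1}^{k}\bigl(H^{i}\bigr)^{-1}_{pq}}
   && \text{weights from the Lagrange condition (Step 4)} \\
J(C) &= \sum_{i=1}^{n}\Bigl\lVert c_i-\sum_{j=1}^{k} g_{ij}\,c_{ij}\Bigr\rVert^{2}
      = \operatorname{tr}\bigl(C\,M\,C^{\top}\bigr),
   \qquad M=(I-G)^{\top}(I-G)
   && \text{embedding objective (Step 5)} \\
\sum_{i=1}^{n} c_i &= 0, \qquad \frac{1}{n}\sum_{i=1}^{n} c_i c_i^{\top}=I
   && \text{mapping constraints (Step 5)}
\end{align*}
```

Under these assumptions, minimising J(C) subject to the constraints reduces to an eigenvalue problem for M, and the rows of C are the eigenvectors belonging to its d smallest non-zero eigenvalues, which matches the description in Step 6.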

Results and discussion
Pre-processing and PLS model
Spectra data were collected and averaged from the radial and tangential surfaces of 135 samples (Fig. 5). The original spectra are shown in Fig. 5a. The spectra were pre-processed with different methods: MSC (Fig. 5b), first derivative (1stDer, Fig. 5c), second derivative (2ndDer, Fig. 5d), and S-G smoothing (Fig. 5e). Among these methods, 1stDer could eliminate the baseline drift and reduce background interference, and S-G smoothing could suppress high-frequency noise. To achieve the best prediction of mechanical properties, these methods were combined in different ways; the spectra after pre-processing are shown in Fig. 5f. S-G window sizes of 7, 9, 11, and 13 points were tested, of which 9 performed best. The corresponding weight coefficients were [−21, 14, 39, 54, 59, 54, 39, 14, −21].
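The nine weight coefficients quoted above are the standard Savitzky-Golay smoothing weights for a 9-point window, with common divisor 231. The sketch below derives them by least squares and applies them to a spectrum. It assumes a quadratic fitting polynomial (a cubic gives the same smoothing weights), which the study does not state; the function names and the edge padding are also choices made for this example.

```python
import numpy as np

def savgol_weights(window, order):
    """Weights that return the fitted polynomial's value at the window centre."""
    half = window // 2
    positions = np.arange(-half, half + 1, dtype=float)
    design = np.vander(positions, order + 1, increasing=True)  # columns 1, x, x^2, ...
    # Row 0 of the pseudo-inverse maps window values to the fitted constant term,
    # i.e. the smoothed value at position 0.
    return np.linalg.pinv(design)[0]

def savgol_smooth(spectrum, window=9, order=2):
    """Smooth one spectrum; the ends are padded by repeating the edge values."""
    weights = savgol_weights(window, order)
    half = window // 2
    padded = np.pad(np.asarray(spectrum, dtype=float), half, mode="edge")
    return np.convolve(padded, weights[::-1], mode="valid")

weights = savgol_weights(9, 2)
integer_weights = np.rint(weights * 231).astype(int)
print(integer_weights.tolist())
```

The printed list should match the coefficients reported in the text, and dividing it by 231 gives weights that sum to 1, so smoothing leaves a constant signal unchanged.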

Fig. 5 Original spectrum and pre-processed spectrum
With these pre-processing methods, MSC makes the spectra more consistent, 1stDer and 2ndDer resolve the flat background regions more effectively, and useless noise or irrelevant absorption peaks can be eliminated when they are combined with S-G smoothing (Table 1). The number of PLS factors was set to 4 for all models.
The performance of the PLS calibration model varies with the pre-processing method. The most frequently used method is the 2ndDer of the spectra. We found that using 2ndDer is better than 1stDer, a result that is consistent with the research on Eucalyptus pellita. For the spectra of Mongolian oak, 1stDer + S-G is the best of the above methods, as shown by the higher $R_{c}^{2}$ and $R_{p}^{2}$ and the lower SEC and SEP (Table 1).
Even with the best pre-processing method (1stDer + S-G), $R_{c}^{2}$ and $R_{p}^{2}$ are only 0.70 and 0.62 for MOR, and only 0.67 and 0.60 for MOE. For both MOR and MOE, the RPD lies between 1.5 and 2, indicating that the model cannot provide an adequate prediction. Compared with other NIR studies of wood samples, the coefficient of determination (Table 1) is slightly lower because of the linearity of the PLS model. Higher correlations have been reported for MOE in various wood species. For example, in Schimleck's paper, the $R_{p}^{2}$ values for six softwoods are 0.75 for MOR ($R_p$ = 0.87, factors = 5) and 0.70 for MOE ($R_p$ = 0.84, factors = 5), and Zhao et al. obtained $R_{p}^{2}$ = 0.77 ($R_p$ = 0.88) for MOR and $R_{p}^{2}$ = 0.79 ($R_p$ = 0.89) for MOE by using both radial and tangential surfaces. The $R_{p}^{2}$ values in these studies are still lower than 0.80, and are lower than those obtained for other wood properties such as density, moisture, and some chemical properties.

CLE-PLS model
The spectra pre-processed by 1stDer + S-G were used as the input data of the calibration model. Two significant parameters of CLE-PLS need to be selected: the lowest dimension d and the number of neighbor points k. The effect of different values on the SEC (standard error of calibration) is shown in Fig. 6; d and k were selected where the SEC reached its minimum. The results are given in Table 2, and the correlation between predicted and measured values of the CLE-PLS validation model is shown in Fig. 7.
To verify the robustness of the CLE-PLS model, cross-validation was applied by repeatedly dividing the samples into calibration and validation sets. The calibration set (90 samples) and the validation set (45 samples) were randomly generated 20 times. The comparison of the PLS and CLE-PLS models is shown in Table 3.
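The repeated random division described above can be sketched as a small generator. This is an illustrative sketch only: the function name `random_splits`, the fixed seed, and the use of NumPy's random generator are assumptions, and the study does not describe how its random sets were drawn.

```python
import numpy as np

def random_splits(n_samples=135, n_calibration=90, n_repeats=20, seed=0):
    """Yield (calibration, validation) index arrays for repeated random division."""
    rng = np.random.default_rng(seed)  # fixed seed (hypothetical) for repeatability
    for _ in range(n_repeats):
        order = rng.permutation(n_samples)
        yield order[:n_calibration], order[n_calibration:]

# Each repeat gives 90 calibration and 45 validation indices with no overlap.
for calibration_idx, validation_idx in random_splits():
    assert len(calibration_idx) == 90 and len(validation_idx) == 45
```

A model would be fitted on each calibration subset and scored on the matching validation subset, and the resulting statistics compared across the 20 repeats.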
Compared with the PLS model, CLE-PLS performed better for MOR and MOE, with higher $R_{p}^{2}$ and RPD and lower SEP (Fig. 7). Using the CLE-PLS model, the values of $R_{p}^{2}$ (0.80 for MOR and 0.78 for MOE) were both above 0.75, and the RPD values (2.23 for MOR and 2.15 for MOE) were both above 2.0. This suggests that the CLE-PLS model could be used for preliminary screening (2 < RPD < 3) of Mongolian oak wood products. For predicting the bending mechanical properties of Mongolian oak, the results show that the CLE-PLS model is more effective than the PLS model.
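As a rough consistency check on these figures, RPD and $R_{p}^{2}$ are linked when the SEP is taken as the root mean square prediction error and the SD is computed on the same validation data with the same normalisation. Both conditions are assumptions here, since the study does not state its conventions.

```latex
\begin{align*}
R_{p}^{2} &\approx 1-\Bigl(\frac{\mathrm{SEP}}{\mathrm{SD}}\Bigr)^{2}
           = 1-\frac{1}{\mathrm{RPD}^{2}} \\
\text{MOR:}\quad 1-\frac{1}{2.23^{2}} &= 1-\frac{1}{4.9729} \approx 0.80 \\
\text{MOE:}\quad 1-\frac{1}{2.15^{2}} &= 1-\frac{1}{4.6225} \approx 0.78
\end{align*}
```

Under these assumptions the RPD values imply coefficients of determination close to those reported, and the screening threshold RPD > 2 corresponds to $R_{p}^{2}$ above 0.75.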

Conclusions
Near-infrared (NIR) spectra in the range 900 to 1700 nm were used to predict the modulus of rupture (MOR) and modulus of elasticity (MOE) of Mongolian oak. The reference values of MOR and MOE were determined by the four-point bending method, and the spectra were collected from the radial and tangential surfaces of the samples. Among the five pre-processing methods compared, 1stDer + S-G performed best. A correlation local embedding (CLE) dimensionality reduction method was used to improve the performance of the partial least squares (PLS) model. Compared with the PLS model, CLE-PLS improved the coefficient of determination by 0.18 for both MOR and MOE. In addition, the ratio of performance to deviation (RPD) values for MOR and MOE were both above 2.0 with the CLE-PLS model (improvements of 0.61 and 0.57, respectively). The CLE algorithm is effective for non-linear dimensionality reduction, and the CLE-PLS model can be used for preliminary screening of Mongolian oak without visible defects or deformations to predict MOR and MOE. Our results also indicate that non-linear dimensionality reduction methods may be more suitable for developing NIR spectral prediction models.