Wavelet geographically weighted regression for spectroscopic modelling of soil properties

Song, Yongze; Shen, Zefang; Wu, Peng; Viscarra Rossel, R. A.

doi:10.1038/s41598-021-96772-z

Article
Open access
Published: 01 September 2021

Volume 11, article number 17503, (2021)
Scientific Reports

Yongze Song¹,
Zefang Shen²,
Peng Wu¹ &
…
R. A. Viscarra Rossel²

Abstract

Soil properties, such as organic carbon, pH and clay content, are critical indicators of ecosystem function. Visible–near infrared (vis–NIR) reflectance spectroscopy has been widely used to cost-efficiently estimate such soil properties. Multivariate modelling, such as partial least squares regression (PLSR), and machine learning are the most common methods for modelling soil properties with spectra. Often, such models do not account for the multiresolution information presented in the vis–NIR signal, or the spatial variation in the data. To address these potential shortcomings, we used wavelets to decompose the vis–NIR spectra of 226 soils from agricultural and forested regions in south-western Western Australia and developed a wavelet geographically weighted regression (WGWR) for estimating soil organic carbon content, clay content and pH. To evaluate the WGWR models, we compared them to linear models derived with multiresolution data from a wavelet decomposition (WLR) and PLSR without multiresolution information. Overall, validation of the WGWR models produced more accurate estimates of the soil properties than WLR and PLSR. Around 3.5–49.1% of the improvement in the estimates was due to the multiresolution analysis and 1.0–5.2% due to the integration of spatial information in the modelling. The WGWR improves the modelling of soil properties with spectra.

Introduction

Soil properties are critical indicators of ecosystem function^1,2. They can directly indicate the quality of ecosystem services, including food and energy production, plant growth, carbon storage, regulation of greenhouse gas emissions and climate change^3,4,5,6,7. Soil organic carbon, clay content and pH are essential soil properties affecting soil nutrient supply and plant development⁸. However, the measurement of these soil properties remains challenging because conventional analytical methods are time-consuming and expensive^9,10. Diffuse reflectance soil spectroscopy, for example, using visible and near infrared (vis–NIR) spectra, has been proposed as a means to overcome those issues. The physical basis of vis–NIR spectroscopy relies on overtones and combination bands of the fundamental molecular vibrations of bonds in soil constituents; the fundamental vibrations themselves occur in the mid infrared region^11,12. Increasingly, the method has been used to estimate soil properties more rapidly and cost-efficiently than conventional laboratory analytical methods^13,14. Another advantage of the method is that a vis–NIR spectrum can be used to simultaneously characterise multiple soil properties⁴.

Methods for modelling continuous soil properties with highly collinear spectra include multivariate statistics and machine learning. The most common statistical methods are principal component regression (PCR)^15,16 and partial least squares regression (PLSR)^17,18. Different machine learning algorithms have also been used, including support vector machines, artificial neural networks, random forests and other regression trees¹⁹. More recently, convolutional neural networks (CNNs) and other deep learning architectures have also been developed^20,21,22.

Wavelets have been successfully used with spectra in soil science and other fields of research^4,23,24. Studies have demonstrated that the discrete wavelet transform (DWT) can improve the analysis of soil diffuse reflectance spectra for the prediction of soil properties¹⁴. They showed that multiresolution analysis (MRA) of soil diffuse reflectance spectra could identify different spectral features that occurred over different resolutions (or scales). They also showed that the highly collinear spectra could be transformed into a smaller number of orthogonal wavelet coefficients that produced more parsimonious and accurate multivariate calibrations.

Soil properties, like other natural phenomena, vary spatially and at different scales^25,26,27. This variability is due to complex interactions between the environmental factors that affect the formation and distribution of soil²⁸. The incorporation of spatial information in aspatial models can improve the accuracy of their predictions²⁹. Spectroscopic modelling of soil properties often ignores geography and the spatial dependence of soil properties. Only a few studies have tried to account for geography in spectroscopic modelling. For example, the states or territories of Australia were used as categorical variables to account for any variance in the modelling of Australian soil spectra resulting from geography³⁰. Sila et al.³¹ used regression–kriging to predict soil properties with mid-infrared spectra of soil samples, where residuals from a regression fit were informed using variograms.

Geographically weighted regression (GWR) might be a useful tool for modelling spectra and accounting for the geographic relationships and spatial non-stationarity in the data³². GWR, developed by Brunsdon et al.³³, supports locally varying regression parameter estimates for each explanatory variable across space^34,35. Recent advances in the methodology and applications of GWR have helped to build new understanding of spatial processes. For example, basic GWR has been adapted for improved local inference of soil property data³⁶, extended to a multiscale form^37,38, modified to address issues of local multicollinearity³⁹, and made robust by down-weighting the influence of outliers when estimating the variability of local coefficients in social data^40,41. These studies demonstrate the power and versatility of GWR for measuring spatial non-stationarity^37,38,42,43. As such, GWR has been used in various fields of research, including ecology and environment⁴⁴, climate⁴⁵, social science⁴⁶ and public health⁴⁷.

Here, we propose to combine wavelets with geographically weighted regression (WGWR). Our hypothesis is that the spectral modelling of soil properties can be improved by accounting for the multiresolution information in the spectra and the spatial variations of the data. Thus, our aim is to demonstrate the implementation of WGWR for modelling soil properties with vis–NIR spectra and to evaluate the performance of the WGWR by comparing its predictions to those from PLSR and from linear models derived with wavelet coefficients (WLR). Experiments were conducted using a spectral library from the south-west of Western Australia (WA).

Results

The spatial distributions of the organic carbon, clay content and pH data at the sampling locations are shown in Fig. 1a–c. The statistical summary of soil properties is shown in Table 1. In the study area, the mean soil organic carbon is 1.61%, mean clay content is 16.64% and mean pH is 5.77. Maps of spatial distributions, density figures, and statistical summaries of soil properties indicate their spatial variation across the study area.

Table 1 Descriptive statistical summary of the soil properties. SD is the standard deviation, CV the coefficient of variation and Skew. is the skewness coefficient.

Figure 1d, e shows the measured and transformed spectra. The broad absorptions between 350 and 1100 nm are associated with the iron oxides hematite or goethite, but also with organic carbon⁴⁸. The wavelengths near 1412 nm are generally associated with the first overtone of hydroxyl stretching modes of water and minerals⁴⁹. The wavelengths near 1917 nm are linked with hydroxyl and H-bonding hydroxyl stretching vibrations of water molecules and mineral constituents of water⁵⁰. The neighbouring wavelengths near 2207 nm are related to clay minerals^51,52.

Wavelet Geographically Weighted Regression (WGWR) of soil properties

Multiresolution analysis

The MRA of a vis–NIR spectrum shows six scales with detail coefficients and a smooth component at the coarsest scale (Fig. 2a). The details at the different wavelet scales reveal the multiresolution features of soil spectra. At the finest scales $\lambda$ = 2 and 4, the high frequency elements of the spectra occur at the interface between the three detectors in the spectroscopic sensor, where the signal is ‘noisier’. At the medium scales $\lambda$ = 8 and 16, the wavelet coefficients depict the edges of the absorptions of the soil constituents near 595 nm, 1007 nm, 1415 nm, 1831 nm, 1903 nm, and 2207 nm. At the coarse scales $\lambda$ = 32 and 64, the wavelet coefficients represent the broader absorptions of soil constituents primarily near 1400 nm and 1900 nm. The MRA results indicate that wavelet transformation can effectively identify the multiresolution local features of soil spectra.

Optimal identification of wavelets

Figure 2b shows the absolute correlations (|R|) between soil organic carbon, clay and pH, and the wavelet coefficients at the different wavelet scales. The larger |R| values occur at different wavelengths and wavelet scales, showing the multiresolution features in the spectra. For organic carbon, the largest |R| values at the smooth and detail components occur near 632 nm, 1894 nm, 1984 nm, 1953 nm, 985 nm, 1004 nm and 1003 nm. For clay content the largest |R| values occur near 2246 nm, 2399 nm, 2455 nm, 1927 nm, 1893 nm, 1892 nm and 1890 nm, and for pH near 1940 nm, 1949 nm, 1862 nm, 1906 nm, 1905 nm, 1892 nm and 1890 nm. The wavelets around these wavelengths show greater correlations with the soil properties as they represent absorptions due to the mineral and organic composition of soil⁵³.

Figure 3 illustrates the procedure for selecting the optimal combinations of wavelet coefficients for soil organic carbon. The selection of optimal coefficients for soil clay and pH followed a similar process. According to the distributions of |R| values, wavelets are ranked from the highest to the lowest |R| values (Fig. 3a). Figure 3b shows the frequency of the ranked wavelets grouped by wavelet scales. The statistical summaries indicate that the wavelets at coarse scales tend to be more correlated with soil organic carbon compared with the wavelets at fine scales.

Figure 3c shows the wavelets selected by the multicollinearity analysis, using the threshold that the maximum variance inflation factor (VIF) value is lower than 10 (see "Methods" section). In this step, 17 wavelet coefficients were selected. This shows that vis–NIR spectra are highly collinear and redundant. The number of explanatory wavelets is much reduced and the multicollinearity is eliminated. Figure 3d shows the fit of the models as wavelet coefficients are added, which we assessed with ten-fold cross validation. As the number of wavelet coefficients increased, the fit to the training data gradually increased, but the fit to the testing data increased initially and decreased after four coefficients. This shows that the combination of the four coefficients was optimal for modelling soil organic carbon (Fig. 3e). The selected optimal combinations of wavelet coefficients for clay and pH are shown in Fig. 4.

Geographically weighted regression

Due to the spatial non-stationarity of soil properties, GWR is used to model the relationships between soil properties and the selected optimal combinations of wavelets. Figure 5 shows the spatial distributions of the local coefficients of wavelets in the WGWR models; the significance of the local coefficients was tested but is not shown on the maps. The coefficients of both training and testing data are combined on the maps of Fig. 5. The maps indicate spatially variable coefficients of wavelets across the study area for predicting soil organic carbon, clay and pH values. The spatially variable local coefficients also reveal the spatial non-stationarity of the relationships between soil properties and spectral data.

Comparing WGWR to other methods

Figure 6 shows maps of the PLSR, WLR, and WGWR residuals calculated on the test dataset for soil organic carbon, clay and pH. The maps indicate that the absolute values of the residuals are smaller for WLR and WGWR than for PLSR, due to the wavelet-based multiresolution analysis. In addition, compared to the WLR, the residuals from the WGWR are smaller.

The validation results of the PLSR, WLR and WGWR are given in Table 2. In the PLSR, the R$^2$ values of the models derived using the training data of soil organic carbon, clay and pH are 0.547, 0.674 and 0.445, and the R$^2$ values of the models when generalised on the test data are 0.477, 0.389 and 0.347, respectively. Due to the incorporation of multiresolution information, the WLR models performed better than PLSR. Compared to PLSR, the R$^2$ of the WLR of organic carbon, clay and pH increased by 22.4%, 49.1% and 3.5%, and the RMSE reduced by 10.8%, 17.1% and 0.9%, respectively. The incorporation of geographical information further improved the accuracy of the spectroscopic soil property estimates. In the WGWR, the training R$^2$ values of soil organic carbon, clay and pH are 0.702, 0.678 and 0.414, and the test set R$^2$ values are 0.590, 0.587 and 0.378, respectively. Compared to WLR, the R$^2$ of the WGWR estimates of organic carbon, clay and pH increased by 1.0%, 1.2% and 5.2%, and the RMSE decreased by 0.7%, 0.8% and 1.5%, respectively. Thus, compared to PLSR, the R$^2$ of the WGWR estimates of organic carbon, clay and pH increased by 23.6%, 50.9% and 8.8%, and their RMSE decreased by 11.4%, 17.8% and 2.4%, respectively.

Table 2 Assessment statistics for the validation of PLSR, WLR and WGWR.

In addition, the corrected Akaike information criterion (AICc) values also demonstrate the improved accuracy and parsimony of WGWR.

Discussion

This study proposes a WGWR to more accurately estimate soil properties using reflectance spectra. We demonstrate that the integration of an MRA of reflectance spectra and spatial non-stationarity in the relationships between soil properties and spectra can improve the spectroscopic modelling of soil properties. The advantages of WGWR are the improved prediction accuracy, fewer spectral variables with reduced multicollinearity, and more robust estimates compared to PLSR and WLR.

The assessments of the soil organic carbon, clay and pH estimates indicate that the multiresolution features of spectra modelled by wavelet-based MRA can improve the skill of the modelling by 3.5–49.1%. Viscarra Rossel & Lark¹⁴ developed the modelling of soil properties with wavelets using an MRA. Compared to their approach, this study provides an alternative framework for the selection of coefficients. Here, we use correlation rather than variance for the ranking of coefficients and the VIF for eliminating multicollinearity, followed by ten-fold cross validation to minimise overfitting. As a result, the number of predictors was much reduced for modelling with WLR and WGWR. A total of 2151 vis–NIR wavelengths were used in the PLSR, but only 4, 7 and 6 wavelets were selected for modelling soil organic carbon, clay and pH, respectively, with WLR and WGWR.

The consideration of spatial non-stationarity in the WGWR reduced errors and improved the accuracy of the models. Thus, we show that WGWR can improve the modelling of soil properties with spectra by accounting for both multiresolution information and spatial non-stationarity.

Our results suggest that, if spatial information is available, geographical characteristics of soil properties should be considered and used in spectroscopic modelling of soil properties.

Reflectance spectroscopy is a rapid and cost-efficient approach for estimating soil properties. We developed the WGWR, which integrates the multiresolution characteristics of soil vis–NIR spectra, the identification of optimal wavelets, and the spatial variation of soil properties and spectra. Compared to PLSR and WLR, WGWR produced more accurate estimates of soil organic carbon, clay content and pH. The models were more parsimonious, which reduced the danger of multicollinearity among spectral variables and of overfitting. Improved modelling of soil properties with spectra, as we have done here, can also provide insights into the geographical characteristics of soil-related ecosystem services, climate responses and sustainable development. Future studies might investigate the use of other geospatial methods with soil spectra, such as kriging with external drift⁵⁴.

Methods

Study area and soil observations

The study region in the south-west of WA covers about 252,100 km$^2$. It represents diverse land uses, including cropping, native forests and nature conservation. It is one of the primary agricultural production regions in Australia⁵⁵. In 2018–2019, the gross value of agricultural production (GVAP) in WA, primarily in the South West Agricultural Area, was about 18% of the national GVAP⁵⁶. The primary grains produced in this area include wheat, barley, canola, lupins, oats, and field peas, of which wheat accounts for 65% of annual grains in WA⁵⁷. Understanding the soil properties is essential for agricultural and environmental management.

We used a set of 226 soil samples collected within the study area. The shortest distances between soil samples and their nearest neighbours vary from 0.24 to 84.16 km. Among the samples, 20.4% of the sample points have neighbouring samples within 1 km, and 33.2% of the sample points are the only samples within a radius of 10 km. At each sampling location, samples were taken at multiple depths from the surface down to 135 cm. The soil properties measured included soil organic carbon, clay content and pH measured in water. The analytical methods used to measure these soil properties are described in Rayment^58,59,60. To derive a more consistent dataset for the modelling (described below), at each sampling location we took a weighted average of the soil properties from the different depths, using the depths as weights.
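The depth-weighted averaging might be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the example profile are hypothetical, and the text does not state whether the weight of a layer is its thickness or its lower depth, so the weights are simply passed in.

```python
def depth_weighted_mean(values, depths):
    """Weighted average of a soil property over layers, with depths as weights."""
    if len(values) != len(depths) or not values:
        raise ValueError("values and depths must be non-empty and the same length")
    total = sum(depths)
    if total <= 0:
        raise ValueError("depth weights must sum to a positive number")
    return sum(v * d for v, d in zip(values, depths)) / total


# Hypothetical profile: organic carbon (%) in three layers with weights 10, 20 and 30
profile_mean = depth_weighted_mean([2.0, 1.0, 0.5], [10, 20, 30])
print(round(profile_mean, 4))
```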

The statistical distributions of soil organic carbon and clay content were positively skewed, so they were transformed to approximate normality using logarithms. Outliers were identified by setting thresholds of 2.5 standard deviations from the mean values⁶¹. Values that exceeded the thresholds were removed. As a result, 4, 6 and 3 outliers were removed from the organic carbon, clay and pH data, respectively.
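A minimal sketch of this preprocessing is given below. It assumes the natural logarithm, the sample standard deviation, and that the threshold is applied to the transformed values; the text does not confirm any of these choices, and the data are invented for illustration.

```python
import math
import statistics


def log_transform(values):
    """Natural-log transform of strictly positive values."""
    return [math.log(v) for v in values]


def remove_outliers(values, n_sd=2.5):
    """Keep values within n_sd sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


# Invented organic carbon values (%) with one extreme observation
values = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.3, 0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 0.9, 50.0]
logged = log_transform(values)
kept = remove_outliers(logged)
print(len(values) - len(kept), "value(s) removed")
```

Note that with very small samples no value can lie far from the mean in standard-deviation units, so a fixed threshold of this kind only makes sense for reasonably large datasets.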

vis–NIR spectroscopy

The vis–NIR reflectance spectra of the 226 soil samples were measured with a Labspec vis–NIR spectrometer (PANalytical Company, Boulder, CO., USA). The spectral range of the spectrometer spans from 350 to 2500 nm, and it has a spectral resolution of 3 nm at 700 nm and 10 nm at 1400 and 2100 nm. Measurements were made with a high-intensity contact probe (also from PANalytical) that uses a halogen bulb ($2901\pm 10$ K) for illumination. The contact probe measures a spot of diameter 10 mm, and it is designed to minimise errors associated with stray light. The sensor was calibrated with a Spectralon® white reference panel once every ten measurements. For each soil sample, 30 spectra were averaged to reduce noise and thereby improve the signal-to-noise ratio. The measurements were made following the protocols described in Viscarra Rossel et al.⁴. Spectra were recorded with a sampling resolution of 1 nm so that each spectrum comprised reflectances at 2151 wavelengths. The measured reflectances, R, were first converted to apparent absorbance as log10(1/R).
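The conversion to apparent absorbance, and the count of 2151 wavelengths implied by a 1 nm sampling interval from 350 to 2500 nm, can be checked with a short sketch. The function name and the example reflectances are illustrative only.

```python
import math


def apparent_absorbance(reflectance):
    """Convert reflectance values R (R > 0) to apparent absorbance log10(1/R)."""
    out = []
    for r in reflectance:
        if r <= 0:
            raise ValueError("reflectance must be positive")
        out.append(math.log10(1.0 / r))
    return out


# 350 to 2500 nm inclusive at 1 nm steps
n_wavelengths = len(range(350, 2501))

# Illustrative reflectances: a perfect reflector, 50% and 10% reflectance
absorbance = apparent_absorbance([1.0, 0.5, 0.1])
print(n_wavelengths, [round(a, 3) for a in absorbance])
```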

Wavelet geographically weighted regression

To improve the modelling of soil properties with spectra, we developed a WGWR model. It integrates the multiresolution information in the spectra and the spatial variations of soil properties. The WGWR model consists of three steps: (1) decomposition of the vis–NIR spectra with a DWT and MRA, (2) selection of an optimal set of wavelet coefficients for the regression, and (3) GWR. The workflow is shown in Fig. 7.

First, we decomposed each vis–NIR spectrum using the DWT and MRA to reveal the multiresolution nature of the spectra. For the decomposition, we used the Daubechies wavelet with 4 vanishing moments. The MRA is implemented via a pyramid algorithm⁶², in which a spectrum is decomposed into the detail components ($D_i$) at different wavelet scales ($\lambda _i$) up to a coarsest scale, when a smooth or approximation component (S) is obtained. In this study, the spectrum beyond its boundaries, including the start and end of the data, is assumed to be a symmetric reflection of the spectrum⁶³. The sum of the detail and smooth components is the original spectrum. Viscarra Rossel and Lark¹⁴ provide a description of the approach for the analysis of soil spectra. The decomposition was performed, as above, for the vis–NIR spectra of all samples.
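The additive property of the MRA, that the detail components and the smooth component sum to the original spectrum, can be illustrated with a small sketch. To keep it dependency-light it uses the Haar wavelet rather than the Daubechies wavelet used in the study, and a synthetic signal of length 64 so that no boundary extension is needed; the function name is hypothetical and this is not the "waveslim" implementation.

```python
import numpy as np


def haar_mra(signal, levels):
    """Additive Haar multiresolution analysis.

    Returns (details, smooth): details[j] is the detail component at level j + 1
    and smooth is the approximation at the coarsest level. Every component has
    the same length as the input.
    """
    x = np.asarray(signal, dtype=float)
    if x.size % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    approx = x.copy()  # the level-0 approximation is the signal itself
    for level in range(1, levels + 1):
        block = 2 ** level
        # Haar approximation: block means over windows of width 2**level
        coarser = np.repeat(x.reshape(-1, block).mean(axis=1), block)
        details.append(approx - coarser)  # information lost at this level
        approx = coarser
    return details, approx


# Synthetic stand-in for a spectrum, decomposed into six detail levels
position = np.arange(64)
spectrum = 0.5 + 0.1 * np.sin(position / 5.0) + 0.02 * np.cos(position * 2.0)
details, smooth = haar_mra(spectrum, levels=6)
reconstructed = np.sum(details, axis=0) + smooth
print(np.allclose(reconstructed, spectrum))
```

With six levels on 64 points the smooth component is simply the mean of the signal, which is why the finer structure all ends up in the detail components.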

Second, to identify the optimal wavelet coefficients for modelling, we correlated the soil properties with the wavelet coefficients and recorded the Pearson correlation coefficients. We then ranked the wavelet coefficients according to the absolute values of the correlation coefficients (|R|). A multicollinearity analysis was then performed using the VIF, a measure of the multicollinearity of variables in a regression model, to discard wavelet coefficients that were highly correlated. Highly correlated explanatory variables can lead to unstable coefficients and a less accurate regression^64,65. From the ranked set, the wavelet coefficient with the largest |R| was selected as the first explanatory variable. Then, wavelet coefficients from the ranked list were added sequentially and a linear regression was performed to estimate the multicollinearity among them. If the VIF was smaller than 10, the added wavelet coefficient was retained; otherwise it was removed. The procedure continued through the ranked list, so that the final selected coefficients were not strongly collinear and had a VIF smaller than 10. The retained wavelet coefficients were then added one at a time to regressions assessed with ten-fold cross validation. We did this to reduce overfitting in the assessments and modelling. The average cross validation R$^2$ was compared across the numbers of wavelet coefficients, and the number with the highest average cross validation R$^2$ was taken as optimal. Thus, the final selected wavelet coefficients were the optimal combination for each of the modelled soil properties.
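The selection procedure can be sketched roughly as follows. This is an illustrative reading of the steps above, not the authors' implementation: the function names, the synthetic data and the choice to require every VIF in the enlarged set to stay below the threshold are all assumptions.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of every column of X (n_samples x n_features)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def select_by_vif(W, y, threshold=10.0):
    """Rank columns by |Pearson r| with y, then add them one at a time,
    keeping a candidate only if all VIFs of the enlarged set stay below threshold."""
    r = np.array([np.corrcoef(W[:, j], y)[0, 1] for j in range(W.shape[1])])
    order = np.argsort(-np.abs(r))
    selected = [int(order[0])]
    for j in order[1:]:
        trial = selected + [int(j)]
        if vif(W[:, trial]).max() < threshold:
            selected = trial
    return selected


def cv_r2(X, y, k=10, seed=0):
    """Mean test-fold R^2 of an ordinary least squares fit under k-fold CV."""
    idx = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        A = np.column_stack([np.ones(train.size), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(fold.size), X[fold]]) @ coef
        ss_res = ((y[fold] - pred) ** 2).sum()
        ss_tot = ((y[fold] - y[fold].mean()) ** 2).sum()
        scores.append(1.0 - ss_res / ss_tot)
    return float(np.mean(scores))


# Synthetic stand-ins for wavelet coefficients: column 1 nearly duplicates column 0
rng = np.random.default_rng(0)
a, b = rng.normal(size=100), rng.normal(size=100)
c = a + 0.01 * rng.normal(size=100)
W = np.column_stack([a, c, b])
y = 2.0 * a + b + 0.1 * rng.normal(size=100)

selected = select_by_vif(W, y)
scores = [cv_r2(W[:, selected[:m]], y) for m in range(1, len(selected) + 1)]
best_m = int(np.argmax(scores)) + 1
print(selected, best_m)
```

In this toy case one of the two near-duplicate columns is rejected by the VIF check, and the cross validation then favours using both retained predictors.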

Third, a GWR is applied to characterise geographically local relationships between each soil property and the optimal combination of wavelets derived from the reflectance spectra. Soil properties are spatially correlated^66,67. The GWR models enable locally varying estimates of coefficients for all explanatory variables in the regression. The spatial non-stationarity of soil properties is examined using the Monte Carlo technique with the randomisation variability test of local coefficients and the coefficient of variation of local coefficients^68,69,70. In the GWR model, the location-wise coefficients of the selected wavelets are estimated with distance-decay spatial weights. The GWR model for estimating the geographically local relationships is computed as:

$$s = \beta_0(\boldsymbol{u}) + \sum_{i=1}^{m} \beta_i(\boldsymbol{u})\, w_i + \epsilon \tag{1}$$

where $s$ is the observation of a soil property (e.g. organic carbon) at the location $\boldsymbol{u}$, $w_i\ (i=1,\ldots,m)$ is the $i$th selected optimal wavelet at the location $\boldsymbol{u}$, $\beta_i(\boldsymbol{u})$ is the location-wise regression coefficient, and $\epsilon$ is the normally distributed random error. The spatially adaptive Gaussian kernel function is applied in the weighting scheme, where the optimal bandwidth is determined through the adaptive process, and the number of neighbour observations is optimised by minimising the AICc of the model³².
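The local estimation in Eq. (1) can be sketched as a weighted least squares fit at each location. The code below is an illustration under stated assumptions, not the "spgwr" implementation: the function name is hypothetical, the Gaussian weight is taken as exp(-0.5 (d/h)^2) with the bandwidth h set to the distance to a fixed number of nearest neighbours, and the AICc-based choice of that number is omitted.

```python
import numpy as np


def gwr_fit(coords, X, y, n_neighbours):
    """Local coefficients at every observation using an adaptive Gaussian kernel.

    coords: (n, 2) locations; X: (n, m) predictors; y: (n,) response.
    Returns an (n, m + 1) array: the local intercept followed by the m local slopes.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    betas = np.empty((n, A.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        # Adaptive bandwidth: distance to the n_neighbours-th nearest other point
        # (index 0 of the sorted distances is the point itself, at distance 0)
        bandwidth = np.sort(d)[n_neighbours]
        weights = np.exp(-0.5 * (d / bandwidth) ** 2)
        Aw = A * weights[:, None]
        # Weighted least squares: solve (A' W A) beta = A' W y
        betas[i] = np.linalg.solve(A.T @ Aw, Aw.T @ y)
    return betas


# Synthetic check: when the true relationship is the same everywhere,
# every local fit should recover the same intercept and slope
rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(60, 2))
X = rng.normal(size=(60, 1))
y = 1.5 + 2.0 * X[:, 0]
betas = gwr_fit(coords, X, y, n_neighbours=20)
print(betas[:3].round(6))
```

The sketch assumes distinct sampling locations; coincident points could give a zero bandwidth and would need separate handling.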

Model comparison and validation

We compared the WGWR to PLSR and WLR. Our implementation of PLSR used the iterative singular value decomposition algorithm. The explanatory variables in the PLSR are the selected optimal combination of the PLS components of wavelet transformed spectra. To select the optimal number of PLS factors to use in the regression, we used cross validation and selected as many factors as necessary to produce the smallest error⁷¹. For the WLR, the selection of the optimal wavelet coefficients was the same as that for the WGWR (see above).

The methods were evaluated with an external validation process. It involved selecting, at random, 70% of the observations to train the models and the remaining 30% of the observations to test the estimates. To evaluate the performance of the methods we used the coefficient of determination (R$^2$), the mean absolute error (MAE) to assess bias and the root mean squared error (RMSE) to assess inaccuracy. Because the soil properties were transformed before modelling, their values were back-transformed for the validation. To further compare AICc values of different models, the relative likelihood of the models was computed as:

$$\eta_j = \exp\left(\frac{\mathrm{AICc}_{\min} - \mathrm{AICc}_j}{2}\right) \tag{2}$$

where $\eta_j$ and $\mathrm{AICc}_j$ are the relative likelihood and AICc value of the $j$th model, respectively, and $\mathrm{AICc}_{\min}$ is the minimum AICc value among the candidate models. The $\eta_j$ can be interpreted as the relative probability that the $j$th model minimises the information loss⁷².
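As a worked illustration of Eq. (2), the sketch below computes relative likelihoods for three AICc values. The values are invented for the example and are not taken from the study.

```python
import math


def relative_likelihoods(aicc_values):
    """Relative likelihood exp((AICc_min - AICc_j) / 2) for each candidate model."""
    best = min(aicc_values)
    return [math.exp((best - a) / 2.0) for a in aicc_values]


# Invented AICc values for three candidate models
eta = relative_likelihoods([100.0, 102.0, 110.0])
print([round(e, 4) for e in eta])
```

The model with the smallest AICc always has a relative likelihood of 1; a model whose AICc is larger by 2 has a relative likelihood of exp(-1), or about 0.37.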

All computations were performed in the R software version 4.0.3⁷³. The wavelet analysis was performed using the package “waveslim”⁶³, the PLSR was performed using the package “pls”⁷⁴, and the GWR was performed using the package “spgwr”⁷⁵.
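Although the computations were done in R, the random 70/30 split and the validation statistics described above can be sketched in a few lines of Python. The function names are hypothetical, and R$^2$ is computed here as one minus the ratio of the residual to the total sum of squares, which is one common definition and may differ from the one used in the study.

```python
import math
import random


def train_test_split(n, train_fraction=0.7, seed=0):
    """Randomly split the indices 0..n-1 into training and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train_fraction * n)
    return idx[:n_train], idx[n_train:]


def validation_stats(observed, predicted):
    """R^2, mean absolute error and root mean squared error of predictions."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Split sizes for 226 observations, and statistics for a toy set of predictions
train_idx, test_idx = train_test_split(226)
stats = validation_stats([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(len(train_idx), len(test_idx), stats)
```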

References

Schmidt, M. W. et al. Persistence of soil organic matter as an ecosystem property. Nature 478, 49–56 (2011).
Drobnik, T., Greiner, L., Keller, A. & Grêt-Regamey, A. Soil quality indicators-from soil functions to ecosystem services. Ecol. Ind. 94, 151–169 (2018).
Bradford, M. A. et al. Managing uncertainty in soil carbon feedbacks to climate change. Nat. Clim. Change 6, 751–758 (2016).
Viscarra Rossel, R. et al. A global spectral library to characterize the world’s soil. Earth Sci. Rev. 155, 198–230 (2016).
Amundson, R. & Biardeau, L. Opinion: Soil carbon sequestration is an elusive climate mitigation tool. Proc. Natl. Acad. Sci. 115, 11652–11656 (2018).
Smith, P. et al. How to measure, report and verify soil carbon change to realize the potential of soil carbon sequestration for atmospheric greenhouse gas removal. Glob. Change Biol. 26, 219–241 (2020).
Zhang, S., Yu, Z., Lin, J. & Zhu, B. Responses of soil carbon decomposition to drying-rewetting cycles: A meta-analysis. Geoderma 361, 114069 (2020).
Bot, A. & Benites, J. The Importance of Soil Organic Matter: Key to Drought-Resistant Soil and Sustained Food Production. 80 (Food & Agriculture Org., 2005).
Rawles, W. J. & Brakensiek, D. Estimating soil water retention from soil properties. J. Irrig. Drain. Div. 108, 166–171 (1982).
Zhao, D., Zhao, X., Khongnawang, T., Arshad, M. & Triantafilis, J. A Vis–NIR spectral library to predict clay in Australian cotton growing soil. Soil Sci. Soc. Am. J. 82, 1347–1357 (2018).
Demattê, J. A., Campos, R. C., Alves, M. C., Fiorio, P. R. & Nanni, M. R. Visible–NIR reflectance: A new approach on soil evaluation. Geoderma 121, 95–112 (2004).
Viscarra Rossel, R. Fine-resolution multiscale mapping of clay minerals in Australian soils measured with near infrared spectra. J. Geophys. Res. Earth Surf. 116 (2011).
Viscarra Rossel, R., Walvoort, D., McBratney, A., Janik, L. J. & Skjemstad, J. Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma 131, 59–75 (2006).
Viscarra Rossel, R. & Lark, R. Improved analysis and modelling of soil diffuse reflectance spectra using wavelets. Eur. J. Soil Sci. 60, 453–464 (2009).
Abdi, H. & Williams, L. J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2, 433–459 (2010).
Næs, T. & Martens, H. Principal component regression in NIR analysis: Viewpoints, background details and selection of components. J. Chemom. 2, 155–167 (1988).
Geladi, P. & Kowalski, B. R. Partial least-squares regression: A tutorial. Anal. Chim. Acta 185, 1–17 (1986).
Viscarra Rossel, R. Robust modelling of soil diffuse reflectance spectra by “bagging-partial least squares regression”. J. Near Infrared Spectrosc. 15, 39–47 (2007).
Viscarra Rossel, R. & Behrens, T. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 158, 46–54 (2010).
Tsakiridis, N. L., Keramaris, K. D., Theocharis, J. B. & Zalidis, G. C. Simultaneous prediction of soil properties from VNIR–SWIR spectra using a localized multi-channel 1-d convolutional neural network. Geoderma 367, 114208 (2020).
Yang, J., Wang, X., Wang, R. & Wang, H. Combination of convolutional neural networks and recurrent neural networks for predicting soil properties using Vis–NIR spectroscopy. Geoderma 380, 114616 (2020).
Shen, Z. & Viscarra Rossel, R. A. Automated spectroscopic modelling with optimised convolutional neural networks. Sci. Rep. 11, 208. https://doi.org/10.1038/s41598-020-80486-9 (2021).
Li, F., Wang, L., Liu, J., Wang, Y. & Chang, Q. Evaluation of leaf n concentration in winter wheat based on discrete wavelet transform analysis. Remote Sens. 11, 1331 (2019).
Meng, X. et al. Regional soil organic carbon prediction model based on a discrete wavelet analysis of hyperspectral satellite data. Int. J. Appl. Earth Obs. Geoinf. 89, 102111 (2020).
Jiang, B. Geospatial analysis requires a different way of thinking: The problem of spatial heterogeneity. GeoJournal 80, 1–13 (2015).
Song, Y., Wang, J., Ge, Y. & Xu, C. An optimal parameters-based geographical detector model enhances geographic characteristics of explanatory variables for spatial heterogeneity analysis: Cases with different types of spatial data. GISci. Remote Sens. 57, 593–610 (2020).
Yang, Z. et al. The effect of environmental heterogeneity on species richness depends on community position along the environmental gradient. Sci. Rep. 5, 1–7 (2015).
Jenny, H. Factors of Soil Formation (McGraw-Hill, 1941).
Ye, H. et al. Effects of different sampling densities on geographically weighted regression kriging for predicting soil organic carbon. Spat. Stat. 20, 76–91 (2017).
Viscarra Rossel, R. & Webster, R. Predicting soil properties from the Australian soil visible-near infrared spectroscopic database. Eur. J. Soil Sci. 63. https://doi.org/10.1111/j.1365-2389.2012.01495.x (2012).
Sila, A., Pokhariyal, G. & Shepherd, K. Evaluating regression-kriging for mid-infrared spectroscopy prediction of soil properties in western Kenya. Geoderma Reg. 10, 39–47 (2017).
Fotheringham, A. S., Brunsdon, C. & Charlton, M. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships (Wiley, 2003).
Brunsdon, C., Fotheringham, A. S. & Charlton, M. E. Geographically weighted regression: A method for exploring spatial nonstationarity. Geogr. Anal. 28, 281–298 (1996).
Bidanset, P. E. & Lombard, J. R. The effect of kernel and bandwidth specification in geographically weighted regression models on the accuracy and uniformity of mass real estate appraisal. J. Prop. Tax Assess. Admin. 11, 5–14 (2014).
Brunsdon, C., Fotheringham, A. & Charlton, M. Geographically weighted summary statistics? A framework for localised exploratory data analysis. Comput. Environ. Urban Syst. 26, 501–524 (2002).
Comber, A. et al. The GWR route map: A guide to the informed application of geographically weighted regression. arXiv preprint arXiv:2004.06070 (2020).
Fotheringham, A. S., Yang, W. & Kang, W. Multiscale geographically weighted regression (MGWR). Ann. Am. Assoc. Geogr. 107, 1247–1265 (2017).
Yu, H. et al. Inference in multiscale geographically weighted regression. Geogr. Anal. 52, 87–106 (2020).
Wheeler, D. & Tiefelsdorf, M. Multicollinearity and correlation among local regression coefficients in geographically weighted regression. J. Geogr. Syst. 7, 161–187 (2005).
Harris, P., Fotheringham, A. S. & Juggins, S. Robust geographically weighted regression: A technique for quantifying spatial relationships between freshwater acidification critical loads and catchment attributes. Ann. Assoc. Am. Geogr. 100, 286–306 (2010).
Harris, P., Brunsdon, C., Lu, B., Nakaya, T. & Charlton, M. Introducing bootstrap methods to investigate coefficient non-stationarity in spatial regression models. Spat. Stat. 21, 241–261 (2017).
Cho, S.-H., Lambert, D. M. & Chen, Z. Geographically weighted regression bandwidth selection and spatial autocorrelation: An empirical example using Chinese agriculture data. Appl. Econ. Lett. 17, 767–772 (2010).
Lu, B., Yang, W., Ge, Y. & Harris, P. Improvements to the calibration of a geographically weighted regression with parameter-specific distance metrics and bandwidths. Comput. Environ. Urban Syst. 71, 41–57 (2018).
Arabameri, A., Pradhan, B. & Rezaei, K. Gully erosion zonation mapping using integrated geographically weighted regression with certainty factor and random forest models in GIS. J. Environ. Manag. 232, 928–942 (2019).
Li, X. et al. Mapping soil organic carbon and total nitrogen in croplands of the Corn Belt of Northeast China based on geographically weighted regression kriging model. Comput. Geosci. 135, 104392 (2020).
Cao, K., Diao, M. & Wu, B. A big data-based geographically weighted regression model for public housing prices: A case study in Singapore. Ann. Am. Assoc. Geogr. 109, 173–186 (2019).
Ge, Y. et al. Geographically weighted regression-based determinants of malaria incidences in northern China. Trans. GIS 21, 934–953 (2017).
Viscarra Rossel, R. A. & Hicks, W. S. Soil organic carbon and its fractions estimated by visible–near infrared transfer functions. Eur. J. Soil Sci. 66, 438–450 (2015).
Wight, J. P., Ashworth, A. J. & Allen, F. L. Organic substrate, clay type, texture, and water influence on NIR carbon measurements. Geoderma 261, 36–43 (2016).
Costa, L. R., Tonoli, G. H. D., Milagres, F. R. & Hein, P. R. G. Artificial neural network and partial least square regressions for rapid estimation of cellulose pulp dryness based on near infrared spectroscopic data. Carbohyd. Polym. 224, 115186 (2019).
Murphy, R. J., Schneider, S., Taylor, Z. & Nieto, J. Mapping clay minerals in an open-pit mine using hyperspectral imagery and automated feature extraction. In Vertical Geology, From Remote Sensing to 3D Geological Modelling. Proceedings of the first Vertical Geology Conference, Lausanne, Switzerland, 5–7 (2014).
Todorova, M. H. & Atanassova, S. L. Near infrared spectra and soft independent modelling of class analogy for discrimination of chernozems, luvisols and vertisols. J. Near Infrared Spectrosc. 24, 271–280 (2016).
Stenberg, B., Viscarra Rossel, R., Mouazen, A. & Wetterlind, J. Visible and Near Infrared Spectroscopy in Soil Science, vol. 107 (Academic Press, 2010).
Harris, P., Fotheringham, A., Crespo, R. & Charlton, M. The use of geographically weighted regression for spatial prediction: An evaluation of models using simulated data sets. Math. Geosci. 42, 657–680 (2010).
Department of Primary Industries and Regional Development, Western Australia. South West Agricultural Region (dpird-008) (2020).
Australian Bureau of Statistics. Value of Agricultural Commodities Produced, Australia (2020).
Department of Primary Industries and Regional Development, Western Australia. Western Australian Wheat Industry (2019).
Rayment, G. E. & Lyons, D. J. Soil Chemical Methods—Australasia (CSIRO Publishing, 2010).
Dolui, S. et al. Structural correlation-based outlier rejection (SCORE) algorithm for arterial spin labeling time series. J. Magn. Reson. Imaging 45, 1786–1797 (2017).
Pollet, T. V. & van der Meij, L. To remove or not to remove: The impact of outlier handling on significance testing in testosterone data. Adapt. Hum. Behav. Physiol. 3, 43–60 (2017).
Leys, C., Ley, C., Klein, O., Bernard, P. & Licata, L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. J. Exp. Soc. Psychol. 49, 764–766 (2013).
Mallat, S. G. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11, 674–693 (1989).
Whitcher, B. waveslim: Basic Wavelet Routines for One-, Two-, and Three-Dimensional Signal Processing (2020).
O’Brien, R. M. A caution regarding rules of thumb for variance inflation factors. Qual. Quant. 41, 673–690 (2007).
Akinwande, M. O. et al. Variance inflation factor: As a condition for the inclusion of suppressor variable(s) in regression analysis. Open J. Stat. 5, 754 (2015).
Webster, R. & Oliver, M. A. Sample adequately to estimate variograms of soil properties. J. Soil Sci. 43, 177–192 (1992).
Atteia, O., Dubois, J.-P. & Webster, R. Geostatistical analysis of soil contamination in the Swiss Jura. Environ. Pollut. 86, 315–327 (1994).
Brunsdon, C., Fotheringham, S. & Charlton, M. Geographically weighted regression-modelling spatial non-stationarity. J. R. Stat. Soc. Ser. D (The Statistician) 47, 431–443 (1998).
Gollini, I., Lu, B., Charlton, M., Brunsdon, C. & Harris, P. GWmodel: An R package for exploring spatial heterogeneity using geographically weighted models. arXiv preprint arXiv:1306.0413 (2013).
Lu, B., Harris, P., Charlton, M. & Brunsdon, C. The GWmodel R package: Further topics for exploring spatial heterogeneity using geographically weighted models. Geo Spat. Inf. Sci. 17, 85–101 (2014).
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer Science & Business Media, 2009).
Burnham, K. P. & Anderson, D. R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach 2nd edn. (Springer, 2002).
R Core Team. R: A Language and Environment for Statistical Computing. (R Foundation for Statistical Computing, Vienna, Austria 2020).
Mevik, B.-H., Wehrens, R., Liland, K. H. & Hiemstra, P. pls: Partial Least Squares and Principal Component Regression (2020).
Bivand, R., Yu, D., Nakaya, T. & Garcia-Lopez, M. spgwr: Geographically weighted regression. R Package Version 0.6-34. http://cran.r-project.org/web/packages/spgwr/. Accessed August 30th 2020 (2020).

Acknowledgements

This work was supported by funding from Curtin University.

Author information

Authors and Affiliations

School of Design and the Built Environment, Curtin University, GPO Box U1987, Perth, WA, 6845, Australia
Yongze Song & Peng Wu
Soil and Landscape Science, School of Molecular and Life Sciences, Curtin University, GPO Box U1987, Perth, WA, 6845, Australia
Zefang Shen & R. A. Viscarra Rossel

Authors

Yongze Song
Zefang Shen
Peng Wu
R. A. Viscarra Rossel

Contributions

Y.S. and R.A.V.R. designed the experiments and data analyses. R.A.V.R. supervised the research. Y.S. and Z.S. carried out the experiments and Y.S. drafted the manuscript. R.A.V.R., Z.S., and P.W. revised the manuscript. All authors reviewed the manuscript before submitting.

Corresponding author

Correspondence to R. A. Viscarra Rossel.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Song, Y., Shen, Z., Wu, P. et al. Wavelet geographically weighted regression for spectroscopic modelling of soil properties. Sci Rep 11, 17503 (2021). https://doi.org/10.1038/s41598-021-96772-z

Received: 08 April 2021
Accepted: 09 August 2021
Published: 01 September 2021
DOI: https://doi.org/10.1038/s41598-021-96772-z
Springer Nature Limited

Introduction

Results