1 Introduction

Soil properties play a crucial role in efficient farming. Efficient farming practices are possible only when soils contain sufficient amounts of vital macro- and micronutrients. Soil nitrogen (SN) is one of the important macronutrients; it supports crop growth [1, 2] and the health of agricultural soil. Moreover, nutrient absorption by crops and plants depends on SN [1]. Therefore, the precise extraction and estimation of SN is imperative for precision farming, crop growth and yield estimation, evaluation of soil health, etc. Nevertheless, compared with water and air, soils may vary according to spatio-temporal dynamics. Extracting soil physicochemical nutrients by conventional methods is a risky task because of the harmful chemicals involved. Furthermore, these conventional methods are complicated, laborious, time-consuming and expensive, and they do not provide spatio-temporal information on soil nutrients [3,4,5].

Recently, remotely sensed hyperspectral reflectance spectra (RSHRS) have become a valuable source of information in soil science, replacing customary methods of soil study and overcoming several of their limitations. RSHRS is generally known as visible-near-infrared (VNIR) reflectance spectroscopy. It is more precise, rapid and inexpensive than conventional methods, and it produces no waste. In addition, a single soil reflectance spectrum provides information on several soil physicochemical nutrients [4, 6, 7]. Nonetheless, the physicochemical nutrients of soil may reduce the reflectance. Additionally, measuring soil reflectance spectra is difficult because controlled laboratory conditions can give inconsistent soil characteristics [8,9,10].

Therefore, the soil science research community has widely used reflectance spectra to determine soil physicochemical nutrients. Several soil scientists have focused on using RSHRS to predict SN in various land sectors. The authors of [1] used ASD FieldSpec 3 data to monitor soil nitrate nitrogen in apple orchards, with first derivative transformation (FDT) and multivariate scatter correction as preprocessing methods and a support vector machine as the model. Various soil properties, including soil nitrate nitrogen, were assessed through VNIR-MIR or combined reflectance spectroscopy using partial least squares regression (PLSR) modeling [4]. Similarly, a PLSR-based model was designed for soil parameters such as soil labile nitrogen, carbon, organic matter, silt and clay in subtropical coastal wetlands [3]. In addition, several soil properties were mapped using a hyperspectral airborne image and ASD data for the prediction of soil attributes [10]. Moreover, quantitative mapping of SN content was carried out using an ASD field spectroradiometer and a Hyperion image with multivariate statistical analysis [2]. However, the spatio-temporal inconsistencies of soil groups and the methods used for detecting and estimating SN have many constraints. Additionally, agricultural soils were often not used for rapid identification of SN. Under these constraints, an effort has been made to develop a novel model to predict and analyze nitrogen in several agricultural soils. The methodology is found to perform better than other reported techniques [1,2,3,4, 11].

The present study aims (1) to use VNIR spectroscopy for SN extraction from agricultural soils, (2) to use the Savitzky–Golay (SG) method to remove noise from soil reflectance spectra, (3) to develop PLSR models from RSHRS to extract and predict SN and (4) to verify and validate the accuracy of the developed models. This paper is organized as follows. Section 1 gives the background of the study and sets out its objectives. The study area, laboratory analysis and measurement of reflectance spectra are described in Section 2. Section 3 presents the methodology, including preprocessing, statistical analysis, development of the PLSR models and evaluation of accuracy. The results are discussed in Section 4. Section 5 concludes the study and outlines future enhancements.

2 Study area and data used

2.1 Area of interest (AOI)

The agricultural soils of Phulambri Tehsil, Aurangabad District, Maharashtra, India were selected as the area of interest (AOI). The AOI (Fig. 1) covers 62.35 km² and is located at 19°28′43.27"–20°24′52.19" N latitude and 75°13′10.75"–75°30′14.87" E longitude. The soil of the region is black cotton soil with a sandy loam texture. The environmental conditions are semiarid, with an elevation of 580 m above sea level. The temperature ranges from 17 °C to 33 °C, with 710 mm of rainfall [12].

Fig. 1 Area of interest and region of field campaign

2.2 Field campaign and laboratory analysis

The field campaign was conducted in February–March 2015, between 0800 and 1330 (IST), because of the clear environmental conditions. Soil samples were collected from the topsoil (0–20 cm) of farming sites in the AOI and stored in airtight synthetic bags. The crops had already been harvested when the soil was sampled. A total of 74 soil samples were collected from the study region. The samples were air-dried and then passed through a 2 mm sieve. Subsequently, two groups of samples were generated in the same way for predicting SN from the soil reflectance spectra. The samples were randomly divided into calibration and validation sets of 15 and 59 soil samples, respectively. A Thermo Quest EA 1112 elemental analyzer was used to analyze SN at a standard soil testing laboratory.

2.3 Measurements of reflectance spectra using ASD instrument

The reflectance spectra of the soil samples were obtained using an ASD FieldSpec 4 spectroradiometer [13] under controlled laboratory conditions. The device provides high spectral resolution over the 350–2500 nm VNIR spectrum. The spectral measurements were made immediately after soil sampling. Each sample was placed in a Petri dish of 20 cm diameter and 2 cm thickness at the bottom of the field of view (FOV) for scanning. We used an 8° off-nadir FOV and scanned the samples from a distance of 15 cm above them. The soil samples were illuminated with a 75 W tungsten quartz halogen lamp placed 45 cm from the samples at a zenith angle of 60°. Prior to scanning, a white reference panel was used to optimize and calibrate the device [14]. Each sample was scanned ten times, and the scans were averaged to obtain the reflectance using RS3 software.

3 The methodology

3.1 Spectral preprocessing

Generally, the device produces background noise or a low signal-to-noise ratio; hence, spectral pre-treatment is essential to remove this noise. These effects were reduced by removing the fringe wavebands at 350–399 nm and 2451–2500 nm. We obtained 2051 wavelengths for each spectrum, ranging from 400 to 2450 nm [14]. ASCII files were generated from the .asd files of each reflectance spectrum for further analysis. Spectral smoothing and transformation (renovation) were done using the SG method along with the FDT method [15] in Unscrambler X 10.5 software (CAMO, Norway). The polynomial order was set to 2 for the SG method. The first and second smoothing points were 2 and 5. The derivative order was set to one. The FDT method enhanced the absorption features of SN. Both methods were used jointly to predict SN.
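The edge trimming and SG first-derivative steps can be sketched in Python as follows. This is only an illustration, not the Unscrambler implementation used in the study: it assumes 1 nm band spacing, reads the smoothing settings as a 5-point window (an assumption, since the exact parameterization is not stated), and all function names are hypothetical.

```python
import math
import numpy as np

def trim_edges(wavelengths, spectra, lo=400, hi=2450):
    """Keep only bands within [lo, hi] nm (drops 350-399 and 2451-2500 nm)."""
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return wavelengths[keep], spectra[:, keep]

def savgol_coeffs(window, polyorder, deriv):
    """Savitzky-Golay weights from a local least-squares polynomial fit."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Column k of the Vandermonde matrix holds offsets**k.
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient a_deriv;
    # the deriv-th derivative at the window centre is deriv! * a_deriv.
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv)

def sg_first_derivative(spectra, window=5, polyorder=2):
    """First-derivative SG transform of each row (assumed 5-point window)."""
    w = savgol_coeffs(window, polyorder, deriv=1)
    half = window // 2
    # Repeat the edge values so the output keeps the input length;
    # the first and last `half` bands are therefore only approximate.
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    # np.convolve flips its kernel, so reverse w to apply it as written.
    return np.array([np.convolve(row, w[::-1], mode="valid") for row in padded])
```

With unit band spacing the derivative is expressed per nanometre; a different sampling interval would require dividing the result by that interval.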

3.2 Statistical analysis

The 74 soil samples were randomly divided into two separate groups for calibration (15) and validation (59). Leave-one-out cross-validation (LOOCV) was applied to all samples in the calibration set. Subsequently, a correlation study was carried out between nitrogen and its predictor variables (the reflectance spectra). PLSR-based models were generated for the rest of the processing.
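For illustration, the random split and the leave-one-out folds could be generated as in the sketch below. The seed and the function names are assumptions introduced here; the study does not report how the random assignment was drawn.

```python
import random

def split_samples(n_samples=74, n_cal=15, seed=0):
    """Randomly assign sample indices to calibration and validation sets."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return sorted(idx[:n_cal]), sorted(idx[n_cal:])

def loocv_folds(cal_idx):
    """Yield (training indices, held-out index) for leave-one-out CV."""
    for held_out in cal_idx:
        yield [i for i in cal_idx if i != held_out], held_out
```

Each calibration sample is held out exactly once, so 15 calibration samples give 15 folds of 14 training samples each.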

3.3 Development of models based on PLSR method

The PLSR-based models were developed with the LOOCV method after the spectral renovation. Subsequently, the spectral data and the reference (laboratory) data of the soil were calibrated using the developed models. PLSR is a linear multiple regression method that requires few latent variables and is used when the number of independent variables is large. The models were designed using Eqs. 1 and 2 to decompose the block of predictors (Eq. 1)

$$X = T \cdot P^{T} + {\text{Residuals}}(E)$$
(1)

and predict the response variables (Eq. 2)

$$Y = T \cdot C + {\text{Error}}(f)$$
(2)

where X and Y are the mean-centered matrices that contain the predictor and response variables, respectively; T is the matrix of factor scores; P and C are the factor loadings; and E and f are the matrices of residuals and errors, respectively [3, 4, 12, 16].
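The decomposition in Eqs. 1 and 2 can be made concrete with a small single-response PLS (PLS1) fit. The sketch below uses the classical NIPALS algorithm in NumPy; this is an assumed stand-in, since the study used the PLS routine in Unscrambler, and the names `pls1_fit` and `pls1_predict` are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns scores T, loadings P and C, residuals E and f."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean                      # mean-centred predictors, deflated below
    f = (y - y_mean).astype(float)      # mean-centred response, deflated below
    n, p = X.shape
    W = np.zeros((p, n_components))     # weight vectors
    P = np.zeros((p, n_components))     # X loadings
    T = np.zeros((n, n_components))     # scores
    C = np.zeros(n_components)          # y loadings
    for a in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p_a = E.T @ t / tt
        c_a = f @ t / tt
        E = E - np.outer(t, p_a)        # what remains is the residual of Eq. 1
        f = f - c_a * t                 # what remains is the error of Eq. 2
        W[:, a], P[:, a], T[:, a], C[a] = w, p_a, t, c_a
    # Regression vector in the original band space.
    B = W @ np.linalg.solve(P.T @ W, C)
    return {"T": T, "P": P, "C": C, "E": E, "f": f,
            "B": B, "x_mean": x_mean, "y_mean": y_mean}

def pls1_predict(model, X_new):
    """Predict the response for new spectra from a fitted PLS1 model."""
    return (X_new - model["x_mean"]) @ model["B"] + model["y_mean"]
```

By construction, the centred X equals T·Pᵀ + E and the centred Y equals T·C + f, so the returned arrays reproduce Eqs. 1 and 2 exactly for the calibration data.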

The LOOCV method was used to identify and analyze the nitrogen-sensitive bands of the spectra and to determine the number of features (PLS factors) retained in the calibration models. The PLSR method generated a series of candidate models, each with one more feature than the previous one. Among these, we tested models with up to eight features and computed the root mean square error of prediction (RMSEP) for each. The suitable cross-validated calibration model was selected using RMSEP. The final model was selected on the basis of the highest predicted R2 and the lowest cross-validation RMSEP. Finally, the optimum number of PLS features was determined. The RMSEP, R2 values and LOOCV outcomes were used to evaluate the results obtained through PLS with all the nitrogen-sensitive bands.
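A minimal sketch of this selection procedure is given below: for each candidate number of PLS features (1 to 8), every calibration sample is held out once, and the cross-validated root mean square error is computed. The compact NIPALS routine and the function names are assumptions made for this illustration, not the software used in the study.

```python
import numpy as np

def pls1_coefficients(Xc, yc, n_components):
    """Regression vector of a PLS1 model fitted to centred data (NIPALS)."""
    E, f = Xc.astype(float).copy(), yc.astype(float).copy()
    W, P, C = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        c = f @ t / tt
        E -= np.outer(t, p)
        f -= c * t
        W.append(w)
        P.append(p)
        C.append(c)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(C))

def loocv_rmse(X, y, max_components=8):
    """Cross-validated RMSE for models with 1..max_components PLS features."""
    n = len(y)
    sq_err = np.zeros((max_components, n))
    for i in range(n):
        train = np.arange(n) != i
        # Centre with training statistics only, so the held-out sample stays unseen.
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        for a in range(1, max_components + 1):
            b = pls1_coefficients(X[train] - x_mean, y[train] - y_mean, a)
            pred = (X[i] - x_mean) @ b + y_mean
            sq_err[a - 1, i] = (pred - y[i]) ** 2
    return np.sqrt(sq_err.mean(axis=1))
```

The number of features would then be taken as `int(np.argmin(rmse)) + 1`, i.e. the model with the lowest cross-validated error.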

3.4 Validation of prediction models

The performance of the developed prediction models was evaluated using R2, RMSEP and the ratio of performance to deviation (RPD), given by Eqs. (3)–(5), respectively. The model with the highest R2 and the lowest RMSEP was selected as the suitable model. High accuracy indicates the stability and correctness of the model [12].

$$R^{2} = 1 - \frac{\sum\nolimits_{i = 1}^{n} (Y_{\text{meas}} - Y_{\text{pred}})^{2}}{\sum\nolimits_{i = 1}^{n} (Y_{\text{meas}} - Y_{\text{mean}})^{2}}$$
(3)
$${\text{RMSEP}} = \sqrt{\frac{1}{n}\sum\nolimits_{i = 1}^{n} (Y_{\text{pred}} - Y_{\text{meas}})^{2}}$$
(4)
$${\text{RPD}} = \frac{{\text{SD}}_{\text{val}}}{\text{RMSEP}}$$
(5)

where \(Y_{\text{meas}}\) is the measured value (SN), \(Y_{\text{pred}}\) is the predicted value, \(Y_{\text{mean}}\) is the mean of the measured values, \(n\) is the number of measured/predicted values with i = 1, 2, 3,…,n, \({\text{SD}}_{\text{val}}\) is the standard deviation of the measured values in the validation set, and RMSEP is the root mean square error of prediction for the validation dataset [12, 17].
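As a worked illustration, Eqs. (3)–(5) translate directly into the following Python functions. The sample (n − 1) form of the standard deviation is an assumption, since the text does not state which form was used, and the function names are hypothetical.

```python
import math

def r_squared(measured, predicted):
    """Coefficient of determination, Eq. (3)."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1 - ss_res / ss_tot

def rmsep(measured, predicted):
    """Root mean square error of prediction, Eq. (4)."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)

def rpd(measured, predicted):
    """Ratio of performance to deviation, Eq. (5)."""
    n = len(measured)
    mean = sum(measured) / n
    # Sample standard deviation of the measured validation values (assumed form).
    sd_val = math.sqrt(sum((m - mean) ** 2 for m in measured) / (n - 1))
    return sd_val / rmsep(measured, predicted)
```

For measured values 1, 2, 3, 4, 5 and predictions 1.1, 1.9, 3.2, 3.8, 5.0, the residual sum of squares is 0.10 and the total sum of squares is 10, so R2 = 0.99 and RMSEP = √0.02 ≈ 0.141.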

4 Experimental results and discussion

The nitrogen statistics were derived through laboratory methods. The results of the laboratory analysis of the 74 soil samples are given in Table 1.

Table 1 Statistics of soil nitrogen

Figure 2 shows the spectra of the 74 samples after removal of the redundant noise, as discussed in Section 3.1. The edge-removed spectra were used for further processing to avoid the redundant noise. However, the fringe-removed spectra were still difficult to interpret because of the large number of spectral bands, which also complicates computation. Hence, we used the continuum-removal (CN) method to detect the bands sensitive to nitrogen (Fig. 3).

Fig. 2 The edge-band-removed spectra (400–2450 nm)

Fig. 3 The reflectance spectra of soil samples using continuum-removed method

The CN method normalizes reflectance spectra so that particular absorption characteristics can be assessed against a common baseline [18]. It was used to detect the absorption characteristics of the soils and to calculate the absorption band depth. As a result, the absorption features were identified directly from every spectrum along with its CN spectrum. The water absorption bands were identified at 1417, 1914 and 2209 nm. Moreover, soil clay was detected at 2200–2329 nm [9, 19]. Nitrogen-sensitive bands were observed at the wavelengths 451–491, 511–521, 536, 612–658, 980–997, 1209–1376, 1472–1795, 1901, 2210, 2296 and 2382 nm [9]. Hence, further analysis was done on the nitrogen-sensitive bands.
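A common way to compute a continuum-removed spectrum is to fit the upper convex hull of the reflectance curve and divide the spectrum by it. The sketch below follows that approach as an assumption; the study does not specify its CN implementation, and the function name is hypothetical.

```python
import numpy as np

def continuum_removed(wavelengths, reflectance):
    """Divide a spectrum by its upper convex hull (the continuum)."""
    x = np.asarray(wavelengths, dtype=float)
    y = np.asarray(reflectance, dtype=float)
    hull = []  # indices of hull vertices, scanned left to right
    for i in range(len(x)):
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            # Cross product of (a - o) and (i - o): a non-negative value means
            # point a lies on or below the segment o -> i, so it is not on the hull.
            cross = (x[a] - x[o]) * (y[i] - y[o]) - (y[a] - y[o]) * (x[i] - x[o])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    continuum = np.interp(x, x[hull], y[hull])
    return y / continuum
```

The band depth at each wavelength is then `1 - continuum_removed(...)`, so values near zero lie on the continuum and larger values mark absorption features.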

Consequently, the preprocessed (edge-removed and FDT-SG smoothed) spectra were used to develop the PLSR models. Overlapping bands and bands unaffected by the chemical constituents were eliminated by smoothing the spectra with the FDT-SG method. The de-noised and renovated spectra of the 74 soil samples are shown in Fig. 4. The spectral characteristics of these spectra were preserved. The FDT-SG method improved the spectral smoothness, and the nitrogen-sensitive bands were also well preserved.

Fig. 4 Renovated spectrum by FDT-SG smoothing method

Subsequently, a correlation study was done between the spectra and the nitrogen content of the samples, since correlation analysis is a reliable approach for studying the relationship between dependent and independent variables. The results showed that the renovated spectra enhanced the SN features. Suitable bands were identified from the correlation between the nitrogen features and the transformed spectra, at the wavelengths 480 nm, 511 nm, 653 nm, 997 nm, 1472 nm, 1795 nm, 2210 nm and 2296 nm. The highest correlation for SN (R2 = 0.95) was found at 997 nm.
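A band-wise correlation study of this kind can be sketched as the Pearson correlation between each spectral band and the measured nitrogen values. The function name and the array layout (samples in rows, bands in columns) are assumptions for illustration.

```python
import numpy as np

def band_correlations(spectra, nitrogen):
    """Pearson correlation between every band (column) and the nitrogen values."""
    Xc = spectra - spectra.mean(axis=0)
    yc = nitrogen - nitrogen.mean()
    # A constant band has zero variance and would produce a division by zero here.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (Xc.T @ yc) / denom
```

Squaring the result gives a per-band R2, and the candidate nitrogen-sensitive bands are those with the largest values.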

The random cross-validation method with the PLS kernel was used to develop the models. Afterward, the prediction performance of the models was evaluated for both the raw and the renovated spectra. The raw spectra were not sufficient for SN calibration: the calibration accuracy was only 72% (R2 = 0.72), with an RMSE of 3.95. The preprocessed spectra gave satisfactory SN calibrations. The calibration (R2 and RMSE) and validation (R2, RMSEP, SD and RPD) statistics for the raw and renovated spectra are summarized in Table 2.

Table 2 Accuracy evaluations of nitrogen features based on PLSR model

The scatter plots of the measured and RSHRS-predicted SN values for the raw and renovated spectra are shown in Figs. 5 and 6, respectively. The R2 values were higher for all the PLSR models in the calibration sets. However, the prediction accuracy (R2 = 0.68 and RMSEP = 4.34) was low for the raw spectra. Nitrogen was moderately calibrated and validated (Table 2) after the spectral renovation. Furthermore, the verification performance of the developed models for SN prediction was evaluated by the cross-validation R2, RMSEP and RPD values (Table 2). The R2 value expresses the percentage of variation in the dependent variable that is accounted for by the independent variables. The RMSEP estimates the variation of the predicted values around the measured values, and it approaches zero as the predictions improve [17].

Fig. 5 The predicted and measured SN values scatter plot based on PLSR modeling before preprocessing

Fig. 6 The predicted and measured SN values scatter plot based on PLSR modeling after preprocessing

The RPD values indicate the ability of RSHRS to predict soil properties [20]. RPD is grouped into three ranges, > 2.0, 1.4–2.0 and < 1.4, which indicate excellent, acceptable and poor predictions, respectively. It was suggested that predictions in the range 1.4–2.0 could be improved by using different calibration approaches [20]. Conversely, properties with RPD < 1.4 may not be reliably predicted using reflectance spectra [20]. The RPD values of this calibration model fell in the excellent range for both types of spectra: 2.13 and 2.14 for the raw and renovated spectra, respectively. The calibration and prediction accuracy was also enhanced for the renovated spectra. The prediction accuracy was 94% (R2 = 0.94) for the renovated spectra, with RMSEP = 1.56 and RPD = 2.14, which is satisfactory for agricultural areas. Hence, the verification results show that pre-treatment is essential for the estimation of soil nutrients. Our results were satisfactory for predicting SN over a large area with acceptable accuracy.
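The three RPD ranges can be expressed as a small helper, shown here as a sketch. The handling of the exact boundary values 1.4 and 2.0 is an assumption, as the ranges quoted above do not say which class they belong to, and the function name is hypothetical.

```python
def rpd_category(rpd):
    """Map an RPD value to the prediction classes quoted from [20]."""
    if rpd > 2.0:
        return "excellent"
    if rpd >= 1.4:  # boundary values are assigned to "acceptable" (assumed)
        return "acceptable"
    return "poor"
```

Applied to the values reported in this study, both 2.13 (raw spectra) and 2.14 (renovated spectra) fall in the "excellent" class.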

Thus, the model is accurate and reliable for estimating SN in farming sectors. Our results reached prediction accuracies of R2 = 0.68 and 0.94 for the raw and renovated spectra, respectively. The result (R2 = 0.95) obtained in a previous study [1] was similar to ours. However, the methods used in [1] were different, and this may be the reason for the related outcome. The prediction accuracy (R2 = 0.94) of the present study for SN is better than that of reference [3], which studied a densely covered area with coastal wetlands. The study [11] demonstrated that PLSR, a back propagation neural network (BPNN) and a genetic algorithm (GA)-BPNN were sufficient for the prediction of soil nitrogen. However, its prediction accuracies (R2) were 0.25, 0.65 and 0.82 for the PLSR, BPNN and GA-BPNN methods, respectively [11], which are inferior to our results. Similarly, total nitrogen was predicted [21] using the local correlation maximization-complementary superiority (LCMCS) method, PLSR, the local correlation maximization (LCM) de-noising method and the complementary superiority (CS) method, with prediction R2 values of 0.88, 0.74, 0.79 and 0.81, respectively. The R2 values of our study are slightly superior to the values reported in [21]. The performance of our model is also superior to the results (R2 = 0.92 and 0.93) obtained in reference [22] using PLS and extreme learning machine models. Thus, the present methodology performed better than the results reported earlier [2, 4, 23,24,25].

5 Conclusions and future scope

The RSHRS approach was successfully applied to 74 soil samples to estimate nitrogen content in agricultural areas. The contribution of the present study is a new model for estimating nitrogen at agricultural sites. The absorption bands due to nitrogen were successfully identified using the CN method, which is therefore needed to detect the absorption bands of soil nutrients, especially nitrogen. The correlation study is also needed to identify nitrogen-sensitive bands. Additionally, preprocessing is important for a better correlation between nitrogen features and models. The present approach highlights the value of reflectance spectra compared with conventional methods. The nitrogen-sensitive spectral bands were found at the wavelengths 480 nm, 511 nm, 653 nm, 997 nm, 1472 nm, 1795 nm, 2210 nm and 2296 nm. The prediction and calibration accuracy is enhanced by the FDT-SG method. The present PLSR model was robust and appropriate for the estimation of nitrogen features. SN can be estimated well over large areas. The present study is expected to be useful for healthy farming practices. In future, the model can be implemented using satellite or airborne hyperspectral images over large areas.