Evaluation of Feature Selection and Feature Extraction Techniques on Multi-Temporal Landsat-8 Images for Crop Classification
Recent advancements in remotely sensed data products and machine learning algorithms are utilized effectively for classification of crops over considerably large areas. This article proposes the use of feature extraction techniques on the multi-temporal Landsat-8 OLI sensor’s surface reflectances and derived Normalized Difference Indices datasets to classify different crop types. Numerous dimension reduction techniques, viz., feature selection (random forest and partial informational correlation (PIC) measure based), linear feature extraction (principal component analysis (PCA) and independent component analysis) and nonlinear feature extraction (kernel PCA and Autoencoder), are evaluated to detect the most favourable features for classification of crops. Subsequently, the detected features are used in a promising nonparametric classifier, support vector machine, for crop classification. It has been found that all the evaluated feature extraction techniques, employed on the multi-temporal datasets, result in better performance compared to feature selection-based approaches. PCA, being a simple and efficient feature extraction algorithm, is well-suited to this classification study, and the extracted features can classify the crops with an average overall accuracy of 94.32%. Most of the crop types achieve user and producer accuracy of more than 90%. Multi-temporal images prove to be more advantageous than single-date imagery for crop identification.
Keywords: Crop classification; Multi-temporal images; Landsat-8 OLI sensor; Normalized difference indices; Feature selection and feature extraction techniques; Support vector machine
Agriculture plays a crucial role in food demand/availability, farmers’ livelihoods, overall economic growth and sustainable development of a developing country like India, where the population is expanding rapidly. Identification and classification of crops are imperative to assess the soil, water and climatic requirements of different crops and to recognize the crops’ growing characteristics. Detailed and precise crop distribution information at global, national and regional scales has significant advantages for many applications such as irrigation scheduling, crop area estimation, crop yield forecasting, crop modelling, sustainable natural resource management, crop insurance planning and drought risk analysis. Agricultural crop management is an integral part of earth system processes. Hydrological variables (e.g. precipitation, soil moisture, evapotranspiration), which are important components of earth system processes, play a major role in the growth of vegetation or crops. An adequate supply of water is crucial for optimal growth of crops, and the irrigation water requirement varies substantially for different agricultural fields. Optimal use of irrigation water will help in reducing the wastage of the Earth’s natural water resources. Crop mapping information can help in optimizing the supply of water, estimating crop production statistics and identifying the factors influencing crop stress or damage. Cropland mapping is necessary to carry out social actions or policies as well as to estimate the yield of different types of crops in a particular area. Further, crop maps can be integrated into a variety of environmental models for better understanding of the overall agricultural response to environmental issues.
Remote sensing data contribute significantly to understanding different earth system processes (e.g. estimation of different hydrological variables, land use and land cover change and its causes and effects). Crop mapping would be a very challenging job in the absence of remotely sensed satellite data, since remote sensing offers an efficient and reliable spatial and temporal collection of information at regional, country or global scale. Remotely sensed satellite images have shown potential in producing precise cropland maps at both regional and local scales with moderate and high spatial resolution [10, 12]. The spectral signatures of different crops, monitored by remote sensing sensors, vary with changes in the phenology (i.e. growth stage) and health of crops, and this information is beneficial in irrigation management and crop yield prediction.
Spectral classification of crop types from satellite-based multispectral images can help in site-specific water supply management and yield forecasting. Single-date satellite imagery does not show satisfactory performance in discriminating crop types. Different crop types may show quite similar spectral response patterns in multispectral data of coarse spectral resolution during some specific phenological or growth stages, which limits the crop classification accuracy of single-date imagery [4, 17]. The use of satellite image time series, which provide spectral-temporal profiles of the crops at different stages of growth, has emerged as a promising approach in remote sensing-based crop type classification. The potential of vegetation indices (e.g. Normalized Difference Vegetation Index (NDVI)) time-series data from the high temporal resolution moderate resolution imaging spectroradiometer (MODIS) sensor has been evaluated for crop mapping in different regional studies [3, 27, 31, 35]. Hao et al. proposed a random forest-based approach for feature selection of time-series MODIS data for crop classification. MODIS data are not effective for mapping crop type heterogeneity at a fine spatial resolution. Hence, high spatial resolution satellite (e.g. Landsat, ASTER) data have been used in numerous studies to classify different types of crops [17, 18, 19, 21, 30]. Multi-temporal surface reflectances of the spectral bands acquired by the Landsat-8 operational land imager (OLI) sensor, phenology-based derived vegetation indices [5, 24] and combinations of both surface reflectances and vegetation indices were used in different studies to distinguish different types of croplands. Waldhoff et al. evaluated a multi-data approach for remote sensing-based regional crop rotation mapping. Landsat-8 data along with synthetic aperture radar (SAR) data (e.g. Radarsat-2, Sentinel-1) were evaluated for crop classification [12, 26].
High spatial resolution Sentinel-2 time-series data were also used in different studies for crop identification and monitoring [25, 29]. However, the spatial and temporal availability of Sentinel-2 and SAR data is limited for some locations and time periods; hence readily accessible multi-temporal Landsat-8 images are used for crop classification in this study. A multi-temporal dataset increases the dimensionality of the features, which can increase the computational complexity and time for classification. Feature selection and feature extraction techniques have the potential to identify or extract the optimal features from the high-dimensional multi-temporal dataset, and the identified optimal features can be used for crop classification to achieve better performance. Different feature extraction techniques, viz., principal component analysis (PCA), independent component analysis (ICA), kernel PCA and autoencoder (AE), are extensively used in hyperspectral image classification [1, 14], but have not been emphasized enough in multi-temporal multispectral image classification. Numerous studies reported that ICA, kernel PCA and AE can learn more efficient features from hyperspectral data compared to PCA, but this finding depends on the dataset used and may not hold for other datasets. Hence these techniques are evaluated in this study to achieve optimal classification performance.
Different parametric (e.g. linear discriminant analysis) and nonparametric classifiers (e.g. decision trees, random forest, support vector machine (SVM) and neural networks) have been successfully applied so far for different crop classification studies [9, 17, 33]. The performances of these classifiers depend on the availability of adequate reference data. Several studies confirmed that SVM classifier performed better than other conventional classification algorithms for crop classification [7, 8, 15, 25, 29, 36].
The primary objective of this paper is to classify seven types of crops utilizing multi-temporal Landsat-8 images. Features identified from the surface reflectances of the spectral bands covering the visible to shortwave infrared region and from the derived Normalised Difference Indices (NDIs) dataset are used in the classifier model for crop classification. Hence, another objective of this work is the evaluation of different feature selection and feature extraction techniques on the multi-temporal dataset for selection or extraction of salient features. Evaluation of the dimension reduction techniques is carried out by comparing the crop classification performances. The nonparametric and nonlinear supervised classifier, SVM, has been used for classification of the crops.
The remainder of this paper is structured as follows. Section 2 provides the details about the study area and the used datasets. In Section 3, the methods used in this study are discussed. In Section 4, detailed results of all the experiments along with discussions are reported. Finally, Section 5 summarizes the conclusions.
2 Study Area and Dataset
Table 1 Landsat-8 data acquisition date and path/row details
Table 2 Details of the ground-truth samples of different types of crops
3.1 Identifying Features for Classification
Four different approaches (viz., I: surface reflectances of all the bands from a single-date imagery; II: surface reflectances of all the bands along with optimal NDIs of a single-date imagery; III: optimal sets of spectral bands and NDIs selected from multi-temporal images; and IV: feature extraction from multi-temporal spectral bands and NDIs) are evaluated to identify the most appropriate features from the multi-temporal dataset in order to achieve the optimal classification performance. All the approaches consider NDIs along with surface reflectances of spectral bands except Approach-I, which is evaluated using surface reflectances only. The NDIs dataset can provide additional information about crop phenology, which is helpful in discriminating different crops. The first two approaches are evaluated using only single-date imagery, whereas the next two approaches use multi-temporal images for classification of crops. In the case of multi-temporal images, the number of features (i.e. surface reflectances of spectral bands and NDIs) increases, which introduces the issue of high dimensionality. Therefore, feature selection and feature extraction techniques are applied in Approach-III and Approach-IV respectively to obtain the salient features representing the multi-temporal dataset by reducing the dimensionality. Two of the most popular linear feature extraction techniques, viz., PCA and ICA, and two nonlinear feature extraction techniques, viz., kernel PCA and AE, are assessed to extract the optimal features from the multi-temporal dataset. The details of these approaches are explained below:
Surface reflectances of all the seven spectral bands of single-date imagery are used as features for the classification of crops. Hence, three classification experiments are performed considering the imagery of three different months.
ρ1 and ρ2 are reflectances of two different spectral bands.
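A normalised difference index of two bands is conventionally computed as (ρ1 − ρ2)/(ρ1 + ρ2). The following is a minimal sketch, not the authors' code, of how the candidate NDIs for one pixel could be enumerated from the seven band reflectances. The function names, the example reflectance values and the convention of taking the higher-numbered band as ρ1 are assumptions made for illustration; under that convention the pair 5&4 corresponds to NDVI.

```python
from itertools import combinations


def ndi(rho1, rho2):
    """Normalised difference of two reflectances; returns 0.0 when both are zero."""
    total = rho1 + rho2
    return (rho1 - rho2) / total if total != 0 else 0.0


def all_ndis(reflectances):
    """Map each band pair (hi, lo) to its NDI, with bands numbered from 1."""
    bands = range(1, len(reflectances) + 1)
    return {
        (hi, lo): ndi(reflectances[hi - 1], reflectances[lo - 1])
        for lo, hi in combinations(bands, 2)
    }


# Hypothetical surface reflectances for Landsat-8 OLI bands 1-7 of one pixel.
pixel = [0.03, 0.04, 0.06, 0.05, 0.40, 0.20, 0.10]
ndis = all_ndis(pixel)
print(len(ndis))               # number of candidate band pairs
print(round(ndis[(5, 4)], 4))  # band pair 5&4, i.e. NDVI
```

With seven bands this yields 21 candidate indices per image date, which is why a selection step is needed before the NDIs are passed to the classifier.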
Approach-IV is also evaluated with three additional experiments by replacing the feature extraction technique PCA with ICA, kernel PCA and AE to assess the impact of different feature extraction techniques on crop classification. All these experiments are performed by extracting the same number of features from each month’s dataset, so that changes in classification performance are not caused by changes in the number of features. These feature extraction techniques are briefly explained in the following sub-sections, and the details of these algorithms can be found in the respective cited literature.
3.2 Partial Informational Correlation
In this study, the PIC measure based stepwise.PIC function from the R-package ‘NPRED’ has been used for feature selection. PIC is a supervised feature selection technique, where the features (i.e. surface reflectances of spectral bands or NDIs) are selected based on their conditional dependence with the class types of the labelled samples. PIC selects features such that each feature is unique and collectively they provide sufficient information content to classify the data.
3.3 Principal Component Analysis
PCA is the most commonly used dimension reduction technique because of its simplicity and ease of application. PCA is a statistical technique that transforms correlated variables into a set of linearly uncorrelated variables, which are known as principal components (PCs). The PCs are linear combinations of the original variables. The first few PCs retain most of the variance of the data, which serves the purpose of dimension reduction and makes the classification process more efficient.
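How many PCs to keep is commonly decided from the cumulative explained variance; the results section of this study reports retaining the PCs that explain about 98% of the variability. A minimal sketch of that rule is given here, assuming the eigenvalues of the covariance matrix are already available. The eigenvalues and the function name are hypothetical and are not taken from the study.

```python
def n_components_for(eigenvalues, target=0.98):
    """Smallest number of leading PCs whose cumulative variance share reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)


# Hypothetical eigenvalues for the seven spectral bands of one month.
eigenvalues = [5.2, 1.1, 0.4, 0.2, 0.06, 0.03, 0.01]
print(n_components_for(eigenvalues))  # number of PCs kept at the 98% threshold
```

Applying the rule separately to each month's spectral bands and NDIs lets the number of retained PCs differ between months.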
3.4 Independent Component Analysis
ICA is a statistically powerful technique for revealing underlying hidden information from a set of random variables. ICA extracts uncorrelated and independent features from the set of original variables by performing a linear transformation. ICA differs from PCA in that it uses the non-Gaussian structure of the data to recover the underlying components. The reconstruction ICA algorithm of MATLAB has been used to extract the optimal features from the multi-temporal dataset.
3.5 Kernel PCA
Kernel PCA is a nonlinear form of PCA which generalizes standard PCA to nonlinear dimensionality reduction. PCA only assesses second-order statistics, whereas a nonlinear version of PCA can capture part of the higher-order statistics and thus can provide a better representation of the data. In order to capture the higher-order statistics, the variables are mapped to a nonlinear feature space using the kernel trick, and PCA is then performed on the data in that feature space to extract the nonlinear features. In this study, a Gaussian kernel is used as the kernel function; its parameter is varied from 0.002 to 200 and fixed at 2 based on the classification performance.
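The first two steps of kernel PCA, building the Gaussian kernel matrix and centring it in feature space, can be sketched as follows. This is an illustration only: the source does not state how the reported kernel parameter enters the kernel, so treating it as the width sigma in exp(−‖x − y‖² / (2σ²)) is an assumption, as are the function names and the sample values.

```python
import math


def gaussian_kernel_matrix(samples, sigma):
    """K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            kernel[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return kernel


def centre_kernel(kernel):
    """Centre the kernel matrix in feature space: K - 1K - K1 + 1K1."""
    n = len(kernel)
    row_means = [sum(row) / n for row in kernel]
    col_means = [sum(kernel[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [kernel[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


# Hypothetical feature vectors (e.g. a few reflectances per labelled pixel).
samples = [[0.05, 0.40], [0.06, 0.38], [0.20, 0.10], [0.22, 0.12]]
centred = centre_kernel(gaussian_kernel_matrix(samples, sigma=2.0))
```

The nonlinear features are then obtained by projecting onto the leading eigenvectors of the centred matrix, which is the same eigen-decomposition step that ordinary PCA applies to the covariance matrix.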
3.6 Autoencoder
AE is a deep learning-based model and can be considered a nonlinear generalization of PCA. Typically, an AE has a neural network architecture in which the original variables form the input layer and are mapped to a hidden layer using a nonlinear activation function. The hidden layer is then used to reconstruct the original variables in the output layer. In this process, the set of variables created in the hidden layer becomes a nonlinear representation of the original variables. These nonlinear features are further used in the classifier model.
3.7 Support Vector Machine Classifier
SVM is a nonparametric, linear binary classifier in its simplest form. The capability of the SVM classifier has been tested in numerous studies for crop type mapping, which is a multiclass problem. In the case of a multiclass classification problem, the error correcting output codes (ECOC) model splits the problem into a set of binary classifiers and combines their outputs to assign the final class. Further, kernel functions can be used in the SVM classifier model to deal with nonlinear datasets. In this study, the radial basis function (RBF) or Gaussian kernel, which is the most popularly applied, is used in the SVM classifier to classify the nonlinearly separable dataset. The parameters of the RBF-SVM classifier are searched in the range of 10^-3 to 10^3 and the optimal values are selected using the Bayesian optimization technique. The Bayesian optimization technique models the objective function as a Gaussian process and uses ‘expected-improvement-per-second-plus’ as the acquisition function.
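To illustrate how an ECOC model reduces a multiclass problem to binary ones, the sketch below builds a coding matrix and decodes a set of binary decisions. The source does not state which coding design was used, so the one-vs-one design shown here is an assumption, as are the function names and the simulated learner outputs; the binary learners themselves (RBF-SVMs in the study) are not implemented.

```python
from itertools import combinations


def one_vs_one_coding(n_classes):
    """Rows are classes, columns are binary learners; entries are +1, -1 or 0 (ignored)."""
    pairs = list(combinations(range(n_classes), 2))
    matrix = [[0] * len(pairs) for _ in range(n_classes)]
    for col, (pos, neg) in enumerate(pairs):
        matrix[pos][col] = 1
        matrix[neg][col] = -1
    return matrix


def decode(matrix, binary_outputs):
    """Pick the class whose code word disagrees least with the learners' +1/-1 outputs."""
    def loss(code_word):
        return sum(1 for c, out in zip(code_word, binary_outputs) if c != 0 and c != out)

    losses = [loss(row) for row in matrix]
    return losses.index(min(losses))


coding = one_vs_one_coding(7)  # seven crop classes
pairs = list(combinations(range(7), 2))
# Simulated learner outputs in which every learner involving class 3 votes for it.
outputs = [-1 if neg == 3 else 1 for pos, neg in pairs]
print(len(coding[0]), decode(coding, outputs))
```

Under this design seven classes need 21 binary learners, each trained only on the samples of its two classes.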
3.8 Experiment Setup
All the experiments are carried out in the MATLAB environment (version 9.4, 64-bit), with an Intel(R) Core(TM) i5-4460 CPU @3.20 GHz processor, 16.00 GB memory (RAM) and an NVIDIA GeForce 210 graphics card. All the labelled samples are divided into two parts, viz., a training dataset (50% of the labelled samples from each class) and a testing dataset (the rest of the labelled samples) [Table 2]. Ten trials are performed considering ten sets of random sampling for partitioning of the training and testing data. Training datasets are used for building the classification model and testing datasets are then used for performance evaluation of the classifier. The performance evaluation measures overall accuracy (OA), average accuracy (AA) and kappa coefficient (k) are estimated for each experiment. The mean and standard deviation (SD) of these measures are calculated from the results of the ten testing datasets and reported for each experiment. User accuracy (UA) and producer accuracy (PA) are also evaluated for each crop. The McNemar test is performed to verify whether the differences (i.e. increase or decrease) in classification performance between two approaches are statistically significant.
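The evaluation measures named above can be computed from a confusion matrix as in the following sketch. It assumes rows hold the true classes and columns the predicted classes, takes AA as the mean of the per-class producer accuracies (the usual definition), and uses the continuity-corrected chi-square form of the McNemar test; the source does not state which variant of the test was applied. All names and numbers are hypothetical.

```python
import math


def accuracy_measures(cm):
    """OA, AA, kappa and per-class PA and UA from a confusion matrix.

    Assumes cm[i][j] counts samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    diagonal = [cm[i][i] for i in range(n_classes)]
    row_totals = [sum(row) for row in cm]  # samples per true class
    col_totals = [sum(cm[i][j] for i in range(n_classes)) for j in range(n_classes)]

    oa = sum(diagonal) / total
    pa = [d / r if r else 0.0 for d, r in zip(diagonal, row_totals)]
    ua = [d / c if c else 0.0 for d, c in zip(diagonal, col_totals)]
    aa = sum(pa) / n_classes
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    return {"OA": oa, "AA": aa, "kappa": kappa, "PA": pa, "UA": ua}


def mcnemar(n01, n10):
    """Continuity-corrected McNemar statistic and p-value (chi-square, 1 dof).

    n01: samples misclassified only by the first classifier;
    n10: samples misclassified only by the second classifier.
    """
    if n01 + n10 == 0:
        return 0.0, 1.0
    statistic = max(abs(n01 - n10) - 1, 0) ** 2 / (n01 + n10)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical three-class confusion matrix and disagreement counts.
cm = [[50, 0, 0], [5, 40, 5], [0, 10, 40]]
measures = accuracy_measures(cm)
statistic, p_value = mcnemar(15, 5)
print(measures["OA"], measures["kappa"], statistic, p_value)
```

A p-value below the chosen significance level (commonly 0.05) indicates that the two classifiers' error rates differ significantly on the same test samples.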
4 Results and Discussion
4.1 Classification Performances of Different Approaches
Table 3 Crop classification performances from different approaches

| Approach | Feature details (spectral bands or NDIs’ band combinations) | OA (%) | Kappa | AA (%) |
|---|---|---|---|---|
| I | All the 7 spectral bands (Dec-2015) | 82.53 ± 1.73 | 0.7504 ± 0.0232 | 74.19 ± 8.33 |
| I | All the 7 spectral bands (Jan-2016) | 89.19 ± 1.62 | 0.8483 ± 0.0231 | 81.42 ± 4.19 |
| I | All the 7 spectral bands (Feb-2016) | 82.51 ± 2.30 | 0.7530 ± 0.0322 | 76.34 ± 5.31 |
| II | 7 spectral bands along with optimally selected NDIs (band 7&1, 6&3, 4&2, 7&4) (Dec-2015) | 84.19 ± 1.45 | 0.7732 ± 0.0218 | 79.25 ± 7.08 |
| II | 7 spectral bands along with optimally selected NDIs (band 6&2, 5&4, 7&1, 7&6, 6&5) (Jan-2016) | 90.41 ± 1.60 | 0.8653 ± 0.0220 | 84.09 ± 8.34 |
| II | 7 spectral bands along with optimally selected NDIs (band 6&2, 6&5, 6&4, 7&4, 6&1) (Feb-2016) | 83.47 ± 3.89 | 0.7634 ± 0.0627 | 77.79 ± 6.13 |
| III | 5 spectral bands and 5 NDIs selected from multi-temporal images using PIC | 90.02 ± 1.89 | 0.8591 ± 0.0271 | 85.58 ± 4.89 |
| IV | Total 10 and 13 PCs extracted from spectral bands and NDIs of multi-temporal images respectively | 94.32 ± 1.73 | 0.9199 ± 0.0097 | 89.94 ± 2.18 |
| IV | Total 10 and 13 features extracted from spectral bands and NDIs of multi-temporal images respectively using ICA | 93.93 ± 0.94 | 0.9143 ± 0.0135 | 88.69 ± 2.44 |
| IV | Total 10 and 13 features extracted from spectral bands and NDIs of multi-temporal images respectively using kernel PCA | 94.13 ± 0.65 | 0.9170 ± 0.0093 | 89.78 ± 1.74 |
| IV | Total 10 and 13 features extracted from spectral bands and NDIs of multi-temporal images respectively using AE | 94.69 ± 1.28 | 0.9252 ± 0.0182 | 91.06 ± 3.12 |
Approach-I considers spectral bands of each month separately, hence three experiments are performed in this approach. Only surface reflectances of spectral bands are used as features in the SVM classifier. It is evident from Fig. 2 that the spectral curves are more distinguishable in case of Jan-2016 imagery. It has been observed that spectral bands of Jan-2016 imagery provide better performance compared to the spectral bands of other 2 months. This confirms that spectral bands of Jan-2016 imagery are able to identify the crop types with better accuracy. The best average OA (i.e. mean of OAs from 10 random datasets) of 89.19% is achieved with the Approach-I.
In the case of Approach-II, PIC-based optimally selected NDIs are used as features along with the surface reflectances of spectral bands to classify the crops. This approach is also performed with single-date imagery, i.e. each month’s data are considered separately for classification, and therefore three experiments are carried out as in the previous approach. Here too, the dataset of the Jan-2016 imagery has performed better than the images of the other 2 months. The band combinations of the optimal NDIs for the Jan-2016 imagery are Band 6&2, 5&4, 7&1, 7&6 and 6&5. Band combination 5&4 is the popularly used vegetation index NDVI. Although classification performances are improved by approximately 1.2–2% compared to Approach-I (Table 3) when optimal NDIs are considered along with surface reflectances of spectral bands, the improvement is not statistically significant (only three of the ten random trials provide statistically improved performance).
Approach-III deals with the multi-temporal images to incorporate the changes in the spectral responses of crops during different growth stages into the classifier model. The nonparametric conditional dependency measure, PIC, has been used for selection of the most appropriate spectral bands and NDIs from the multi-temporal dataset conditioned on crop class types. The selected spectral bands and NDIs should have the maximum information content to distinguish different crop classes. In this approach, five spectral bands (bands 2, 3, 5 and 6 from Jan-2016 and band 7 from Feb-2016) and five NDIs (combinations 6&2, 5&4, 7&1 and 6&5 from Jan-2016 and 6&1 from Feb-2016) are selected to be used in the classifier model. In this feature selection process, no spectral bands or NDIs are selected from the Dec-2015 imagery, which indicates that the datasets of that month are less effective than those of the other months in distinguishing the crop types. The average OA achieved with this approach is 90.02%, which is almost the same as the classification performance achieved using Approach-II with the single-date imagery of Jan-2016. The similar performance of Approach-II and Approach-III suggests that the feature selection technique is not able to identify spectral bands or NDIs that improve the classification performance. Since crop classification performance is not improved by employing a feature selection technique on multi-temporal images, feature extraction techniques are applied in Approach-IV.
Finally, Approach-IV is introduced in this study, where the feature extraction technique PCA is applied to the set of spectral bands and the set of NDIs of each month’s dataset separately. The first four and five PCs are extracted from the sets of spectral bands and NDIs of the Dec-2015 imagery respectively, which explain ~ 98% of the variability of the whole dataset. Similarly, the first three and four PCs are extracted from both the Jan-2016 and Feb-2016 imagery. Then, all these 23 PCs are used as features in the SVM classifier model. This approach with PCA achieves an average OA of 94.32%, which is better than the performances of the other three approaches, and this improvement is also statistically significant. Additionally, Approach-IV is evaluated by replacing PCA with ICA, kernel PCA and AE, and their classification accuracies are reported in Table 3. It has been observed that the performance evaluation measures are statistically similar for all the feature extraction techniques. Since the nonlinear feature extraction techniques are not able to extract extra information from the dataset, it can be concluded that nonlinearity is not prominent in this dataset. Although AE-based features achieve the maximum accuracy, PCA is a simple and widely used feature extraction technique whose features achieve statistically similar performance, so its use is more justifiable.
In the case of Approach-I and Approach-II (i.e. considering only single-date imagery), the Jan-2016 image provides comparatively better performance. The feature selection-based Approach-III selects the largest number (i.e. 4) of both spectral bands and NDIs from the Jan-2016 image. Approach-III does not select any feature from the Dec-2015 imagery dataset, which might suggest that the information content of the Dec-2015 image is not important for classification. Therefore, another experiment is performed similar to Approach-IV without considering the PCs of the Dec-2015 dataset, and an average OA of 91.2% (< 94.32%) is achieved. The performance of this experiment suggests that the datasets of all months are important and essential to achieve optimal classification accuracy. A random forest-based feature selection approach is also evaluated over the multi-temporal dataset for selection of optimal spectral bands and NDIs, which are used in the SVM classifier, and achieves an average OA of 88.58%. These experiments suggest that feature extraction techniques perform better than feature selection techniques on this dataset. Another experiment is performed using a combination of several important vegetation indices, viz., NDVI, Enhanced Vegetation Index (EVI), Atmospherically Resistant Vegetation Index (ARVI), Normalized Difference Moisture Index (NDMI) and Tasselled Cap Greenness (TCG) of the multi-temporal images, as proposed in Fan et al., in the SVM classifier, and this approach achieves an average OA of 87.84%. On analysing the performances of these experiments, it is evident that Approach-IV, which is introduced in this study, provides the best classification accuracy among all the experiments.
The single-date imagery of Jan-2016 can be identified as the optimal temporal window, since it achieves the highest classification accuracy among the three single-date images. The multi-temporal images retain phenological information of different crop types, which helps in improving the classification performance compared to single-date imagery. The multi-temporal NDIs provide additional information along with the reflectances of spectral bands to distinguish different types of crops, which has proven to be advantageous in this study. The multi-temporal analysis outperforms the single-date imagery of the optimal temporal window.
4.2 Class-Specific Classification Performance Analysis
Confusion matrix prepared from performance of PCA-based Approach-IV (axes: predicted (output) class, true (target) class)
5 Conclusions
In this study, numerous feature selection and feature extraction techniques are evaluated based on their ability to reduce dimensionality and to represent the prominent features of the multi-temporal spectral bands and NDIs datasets. The representative features of the multi-temporal datasets are utilized in the SVM classifier to classify different crops. The sets of PCs extracted from each individual month’s dataset are able to classify the crops more efficiently than the combined dataset of all months. The PCA-based feature extraction approach along with the SVM classifier has performed better (OA: 94.32%) than the other approaches. PCA, being a simple and linear feature extraction technique, is more acceptable, and its extracted features also perform efficiently in this study. To the best of our knowledge, this is the first time that different feature selection and feature extraction techniques have been evaluated on the surface reflectances of different spectral bands and NDIs of multi-temporal images. The results show that multi-temporal images are more effective than single-date imagery for identifying the crops. The proposed approach has performed efficiently for all considered crops except wheat and groundnut, because a sufficient number of samples was not available to train the SVM classifier. Our results demonstrate that if the classifier model is trained with appropriate features and a sufficient number of samples, it can identify different types of crops with an approximate accuracy of 95%.
The limitation of this study is the availability of very few pixels for the wheat, cotton and groundnut crop types. The classification performance can be further improved when an adequate number of samples is available for each class. This study can be extended to state and country scales with future availability of crop information. Crop mapping over a large spatio-temporal scale will be helpful in irrigation water management, crop yield forecasting, insurance planning and food security analysis. The investigation can be further extended to a finer spatial resolution with the availability of spatially and temporally continuous Sentinel-2 datasets.
The authors gratefully acknowledge the FASAL (Forecasting Agricultural output using Space, Agro-meteorology and Land based observations) Programme of Mahalanobis National Crop Forecast Centre (MNCFC), New Delhi, an Attached Office under Dept. of Agriculture, Cooperation & Farmers Welfare, Ministry of Agriculture & Farmers Welfare, Government of India (https://www.ncfc.gov.in/fasal.html) for sharing the crop information over the Karnataka state.
- 11. Hossin M, Sulaiman M (2015) A review on evaluation metrics for data classification evaluations. Int J Data Min Knowledge Manag Process 5:1
- 13. Le QV, Karpenko A, Ngiam J, Ng AY (2011) ICA with reconstruction cost for efficient overcomplete feature learning. In: Advances in neural information processing systems, pp 1017–1025
- 19. Roy D, Yan L (2018) Robust Landsat-based crop time series modelling. Remote Sens Environ
- 26. Skakun S, Kussul N, Shelestov AY, Lavreniuk M, Kussul O (2016) Efficiency assessment of multitemporal C-band Radarsat-2 intensity and Landsat-8 surface reflectance satellite imagery for crop classification in Ukraine. IEEE J Select Topics Appl Earth Observ Remote Sens 9:3712–3719
- 34. Wang Q (2012) Kernel principal component analysis and its applications in face recognition and active shape models. arXiv preprint arXiv:1207.3538