1 Introduction

In remote sensing, hyperspectral sensors capture the electromagnetic spectrum from visible to near-infrared wavelengths. Hyperspectral imaging sensors often provide hundreds of narrow spectral bands from the same location on the Earth's surface. Each pixel in a hyperspectral image (HSI) can be thought of as a high-dimensional feature vector whose elements correspond to spectral reflectance at particular wavelengths. HSIs have been used extensively in several fields because of their ability to detect small spectral differences. The most challenging research subject in the hyperspectral imaging community is HSI classification: assigning each pixel to a certain class based on its unique spectral properties, a task that has attracted considerable interest in the remote sensing sector. There are two key obstacles in HSI classification: (1) the large spatial variability of the spectral signatures, and (2) the small number of available training samples relative to the high dimensionality of hyperspectral data. Many factors, such as variations in illumination, ambient, climatic, and temporal conditions, contribute to the first challenge. The second leads to ill-posed problems for a number of approaches and impairs the generalization ability of classifiers [1, 2]. The field of hyperspectral imagery has progressed rapidly in recent years owing to advances in optics and photonics. As a result, several satellites now carry hyperspectral sensors that produce images rich in spectral information. With the extensive information contained in hyperspectral images, it becomes possible to differentiate between very similar materials and objects. The Compact Reconnaissance Imaging Spectrometer for Mars hyperspectral imaging dataset [3] was examined, and the mineralogy of the Martian surface was subsequently determined using linear mixing of absorption-band approaches.
Reference [4] used a visible and near-infrared (VNIR) imaging spectrometer for mineral exploration: a push-broom hyperspectral scanning device sensitive to VNIR wavelengths from 400 to 1000 nm.

Earlier hyperspectral image classification research focused mainly on the image's spectral information, and numerous classification algorithms based on these spectral data were developed, including the support vector machine (SVM) [5], random forest (RF), and multinomial logistic regression [6]. The first step in simultaneously addressing both the label-scarcity and high-dimensionality problems in hyperspectral image classification is to investigate the relationships between pixels. The classification process in hyperspectral images can benefit from spatial layout information [6, 7]. Post-processing is used to combine the findings of several classification/segmentation procedures into the final spectral–spatial classification map. In most of the aforementioned HSI classification approaches, however, the spatial or contextual information is not incorporated directly into the classifier. Many computer vision methods based on sparse representation have been proposed, where the use of sparsity as a prior often leads to better results [8, 9]. Feature reduction, feature extraction, and feature selection are techniques used in machine learning and data analysis to reduce the dimensionality of data and extract relevant features. These techniques can improve the accuracy and efficiency of machine learning models by reducing the complexity of the data.

PCA is a widely used technique for reducing the dimensionality of data by transforming it into a lower-dimensional space while retaining as much of the original variation as possible. PCA works by finding a set of orthogonal linear combinations of the original variables that capture the most variability in the data. These combinations are called principal components and are ordered by their explained variance. PCA can be used to reduce the dimensionality of the data and remove noise, making it easier to visualize and analyze. Linear discriminant analysis (LDA) is a technique used for both feature reduction and classification [9, 10]. LDA works by projecting the data onto a lower-dimensional space while maximizing the separation between classes. It is commonly used for dimensionality reduction and feature extraction in supervised classification problems. LDA is based on the assumption that the data are normally distributed and that the classes have equal covariance matrices. Both PCA and LDA are powerful techniques for feature reduction, extraction, and selection: PCA is useful for reducing the dimensionality of the data while preserving the most important information, while LDA is useful for maximizing the separation between classes and improving classification accuracy. The choice between PCA and LDA depends on the specific problem and the goals of the analysis [11]. Based on the classification process, hyperspectral image classification falls into three categories: supervised [12, 13], semi-supervised [14,15,16], and unsupervised classification [17]. In supervised classification, the pixel is labelled according to the least-distance constraint based on the combined sparse coefficients and a structured dictionary; experiments on two genuine HSI datasets demonstrate the method's efficacy [10].
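The PCA/LDA contrast described above can be sketched with scikit-learn. The data, band count, class labels, and component choices below are illustrative stand-ins, not values taken from this study.

```python
# Illustrative comparison of PCA (unsupervised) and LDA (supervised)
# dimensionality reduction on synthetic "spectral" data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_bands = 50                        # stand-in for hyperspectral bands
X = rng.normal(size=(300, n_bands))
y = rng.integers(0, 3, size=300)    # three hypothetical land-cover classes

# PCA keeps the directions of maximum variance, ignoring labels.
X_pca = PCA(n_components=10).fit_transform(X)

# LDA projects onto at most (n_classes - 1) axes that separate the classes.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y).transform(X)

print(X_pca.shape)  # (300, 10)
print(X_lda.shape)  # (300, 2)
```

Note that LDA's output dimensionality is capped by the number of classes, whereas PCA's is capped only by the number of bands.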
A kernel extreme learning machine (ELM) for the supervised classification of hyperspectral images has been proposed and produced satisfactory results [13]. Reference [17] provides models for the fully automatic, unsupervised, and simultaneous solution of the clustering, feature identification, and class number estimation problems.

The rest of the article is organized as follows: Sect. 2 reviews the various research works performed in this field; Sect. 3 describes the materials and methods, including all the datasets and algorithms considered in this study; Sect. 4 presents the experimental results and discussion; finally, Sect. 5 concludes the article.

2 Literature Review

The most critical and significant step in hyperspectral image classification is selecting the appropriate bands and obtaining the appropriate result. In [1], stacked auto-encoders, deep belief networks, CNNs, RNNs, and generative adversarial networks are listed among the deep learning models frequently used to classify HSIs. Spectral-feature networks, spatial-feature networks, and other deep networks have been utilized for HSI categorization; in spectral–spatial feature networks, each branch extracts the most important features associated with it. The study also compared and assessed the results of four classical computational-intelligence-based methods and six deep-network-based approaches used in HSI classification. The classification accuracies achieved by the various methods show that deep learning-based approaches generally outperform traditional ML approaches, and the DFFN, which combines residual learning and feature fusion, achieves the best results. In [2], in contrast to traditional computer vision tasks that only consider the spatial context, the authors suggested methods that can improve hyperspectral image categorization using both spatial context and spectral correlation. For hyperspectral image classification, the work recommends four novel deep learning models: 2-D-CNN, 3-D-CNN, R-2-D-CNN, and R-3-D-CNN. Rigorous studies on six benchmark datasets demonstrate the superiority of the suggested deep learning-based models over other state-of-the-art methods. In [3], the field is grouped into six primary areas, including data fusion, classification, target identification, and fast computation; the current state of the art in each area is summarized, examples are presented, and future difficulties and research directions are explored.

In [4], the design and construction of the VNIR imaging spectrometer are summarized, and a novel absorption-band modelling method was employed for data interpretation. The authors demonstrated the utility of this methodology in distinguishing between goethite–hematite combinations and identifying non-Fe-bearing minerals. Although these data and methods are only of indirect help in the hunt for life in subsurface drill cores, they can provide essential background on the mineralogical environment; an upcoming Mars remote subsurface drilling astrobiology mission will require a hyperspectral spectrometer that covers at least the visible near-infrared. Reference [5] proposes a model aimed at understanding and evaluating the capabilities of SVM classifiers in hyper-dimensional feature spaces. The efficacy of SVMs in comparison with traditional techniques is then evaluated, as well as their performance in subspaces of various dimensionalities.

The performance of SVM is also compared with two other non-parametric models (the RBFNN and the K-nearest neighbour classifier) to support this analysis. Finally, it investigates the practically significant issue of applying binary SVMs to hyperspectral data in both two-class and multi-class settings; four alternative multiclass strategies are also evaluated and compared. References [6, 7] propose a new supervised segmentation technique for remotely sensed hyperspectral image data that uses a Bayesian framework to incorporate spectral and spatial information. To describe noise and highly diverse pixels, a multinomial logistic regression (MLR) approach is utilized to learn posterior probability distributions from spectral data using a subspace projection method. It has been demonstrated that this method accurately characterizes hyperspectral data in both the spectral and spatial domains. In [8, 9], the effectiveness of two feature extraction approaches is compared. The results may vary depending on the data source, but overall, Folded-PCA outperforms traditional PCA in extracting essential features from hyperspectral remote sensing images, because Folded-PCA takes into account both global and local structures. As a result, it has proven to be both efficient and successful in extracting the most relevant features from remote sensing images. The authors of [10,11,12,13,14,15,16,17] discussed supervised, unsupervised, and semi-supervised classification of hyperspectral image data. Although the supervised and unsupervised categorization methods mentioned there have their own advantages, each approach also has its own constraints. In [11], the implementation of MFLDA for dimension reduction of hyperspectral data addresses practical challenges related to limited training data and incomplete class knowledge. By modifying the original FLDA to require only the desired class signatures, the MFLDA is able to transform the data into a lower-dimensional space where the desired classes can be easily distinguished. The use of a sparse representation classification technique, which involves sampling on super-pixels, is another interesting approach to addressing the challenge of imprecise context information for hyperspectral images. This technique may be useful for identifying patterns or features in the data that are not immediately apparent and for improving classification accuracy. The combination of MFLDA and sparse representation classification could thus be a powerful tool for analyzing and interpreting hyperspectral data, particularly when training data are limited or class knowledge is incomplete.

Hyperspectral image classification has also been performed using a commonly used feature extraction technique, N-PCA, together with a kernel ELM learning algorithm [13]. The findings of a real-world hyperspectral data experiment show that the method suggested in that article outperforms existing supervised approaches. A multi-objective particle swarm optimization with three different statistical techniques has been developed [17] to solve the typical challenges of hyperspectral imaging data, i.e., clustering, feature reduction, and class number estimation. Both synthetic and real hyperspectral images were subjected to a thorough experimental investigation; in general, the findings suggest that, despite its fully unsupervised character, the proposed methodology can generate promising classification results. Reference [18] focuses on applying multiple computational intelligence techniques, such as SVM, RF, LR, KNN, and DT, to classify the Indian Pines hyperspectral imagery dataset. PCA and MNF were used to lessen the number of unnecessary and noisy bands present in the dataset. To determine the efficacy of the models, several performance indicators, such as the confusion matrix, total accuracy, and training duration, were taken into account; RF achieved the highest accuracy and the shortest time among all the models in the referenced research. Reference [19] presents a new RNN model for hyperspectral image classification in which hyperspectral pixels are treated as sequential data. In particular, it proposes parametric rectified tanh (PRetanh), a newly developed activation function for RNN-based hyperspectral data processing that allows relatively high learning rates without the risk of divergence.

This research work focuses on hyperspectral image classification using several machine learning approaches, namely the support vector machine (SVM), K-nearest neighbour (KNN), CNN, and LSTM. To reduce the number of superfluous and noisy bands in the datasets, principal component analysis (PCA) and minimum noise fraction (MNF) were utilized. Different performance evaluation measures, namely testing time, classification accuracy, kappa accuracy, precision, recall, specificity, F1_score, and Gmean, have been used to establish the efficacy of the models.

3 Materials and Methods

This study aims to classify different types of landscape fields in five standard hyperspectral imaging datasets: Indian Pines, Salinas, Pavia University, Kennedy Space Centre, and Botswana. These datasets contain a wide variety of land cover types, including dense vegetation, barren land, grasslands, soybean, and wood. Classification of hyperspectral images involves assigning each pixel in an image to a specific class or category based on the spectral information in the image. This can be challenging due to the high dimensionality of hyperspectral data and the presence of noise and other sources of variability. To address these challenges, machine learning approaches such as CNNs, SVMs, or KNNs can be trained on labeled data to learn the patterns and features associated with each land cover type and then applied to new, unlabeled data to predict the land cover. For the experimentation, four supervised classification models, i.e., SVM, KNN, CNN, and LSTM, are taken into consideration. Further, to reduce the huge dimensionality present in the datasets, MNF and PCA have been applied. The entire process is pictorially represented in Fig. 1.

Fig. 1
figure 1

Flow diagram of proposed model

To read the datasets and conduct the series of operations, the following hardware specification has been used: 32 GB RAM, 1 TB SSD, 1 TB HDD, a 16 GB GPU, and an Intel i7-10700 CPU at 2.90 GHz. Python 3.8.5 has been used for all the implementation.

3.1 Datasets

Each hyperspectral dataset [20] contains a series of bands, and the geographical landscapes are described by different classes, each specifying a unique kind of data present in that area. To view the complete dataset, all bands, or a certain number of them, are stacked upon one another to visualize the data. Certain classes of data can only be seen after stacking a specific set of bands together, and this set may differ from class to class. During classification, some bands do not provide any relevant information and are therefore discarded from the dataset.

3.2 Indian Pines

The Indian Pines data were captured in north-western Indiana and comprise 224 reflectance bands. The scene is a mixture of agriculture, forest, and perennial vegetation. The ground truth consists of 16 individual classes, each with a different number of samples. Sample bands of the dataset are shown in Fig. 2.

Fig. 2
figure 2

Sample bands of Indian pines

3.3 Salinas

The Salinas dataset was captured with the AVIRIS sensor, which produces 224 reflectance bands. The data were collected over Salinas Valley, California, which contains bare soils, vineyard fields, and vegetables. This land cover is distributed across the 16 classes present in the ground truth. Figure 3 shows a selection of bands from the Salinas dataset.

Fig. 3
figure 3

Sample bands of salinas

3.4 Pavia University

The Pavia University data were collected over the Pavia region in northern Italy. The data comprise 103 spectral bands, and the ground truth is classified into nine different classes. Figure 4 illustrates a sampling of bands from the Pavia University dataset.

Fig. 4
figure 4

Sample bands of Pavia University

3.5 Kennedy Space Center

The Kennedy Space Center, situated in Florida, is the geographical source of these data, which were collected with the NASA AVIRIS instrument and hence consist of 224 bands. The environmental land cover was classified into 13 classes. Figure 5 depicts a sample of bands from the KSC dataset.

Fig. 5
figure 5

Sample band of Kennedy space center

3.6 Botswana

The Botswana data were collected by the NASA EO-1 satellite over the Okavango Delta region of Botswana. The data originally comprised 242 bands, which were reduced to 145 bands after filtering. The landscape consists of swamps, seasonal swamps, and drier woodlands, classified into 14 identified classes. A sample of bands from the Botswana dataset is shown in Fig. 6.

Fig. 6
figure 6

Sample band of Botswana

For the execution of the project, the image file along with its ground truth is provided to the model, and the data are stored in matrix format. The shape of the matrix is recorded for future reference. The pixels of the image are then extracted and stored in CSV format. The CSV file helps gather information about the dataset, including the type of data, the number of rows and columns, and memory usage. The dataset details are given in Table 1.

Table 1 Dataset details
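The pixel-extraction step described above can be sketched as follows. The cube dimensions, column names, and output file name are illustrative stand-ins; a real dataset would typically be loaded with scipy.io.loadmat rather than generated.

```python
# Sketch of flattening an HSI cube of shape (rows, cols, bands) so that
# each row of the table is one pixel's spectrum plus its ground-truth label.
import numpy as np
import pandas as pd

rows, cols, bands = 10, 12, 5                     # toy dimensions
cube = np.random.rand(rows, cols, bands)           # stand-in image matrix
gt = np.random.randint(0, 3, size=(rows, cols))    # stand-in ground truth

pixels = cube.reshape(-1, bands)                   # (rows*cols, bands)
df = pd.DataFrame(pixels, columns=[f"band_{i}" for i in range(bands)])
df["label"] = gt.ravel()

print(df.shape)                        # (120, 6): one row per pixel
df.to_csv("pixels.csv", index=False)   # the CSV used for later inspection
```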

3.7 Pre-processing

Data pre-processing is the most important step before classifying the data. Pre-processing techniques are used to extract the most relevant information for classification and to reduce excess noise without tampering with the original information. This is done using two methods: PCA and minimum noise fraction (MNF).

3.8 Principal Component Analysis

PCA is a non-parametric, unsupervised statistical learning technique that is largely used in machine learning to remove redundant or unnecessary features from datasets. Here, PCA is applied to reduce the data to the required number of dimensions [21]: the model is fitted to the image matrix, and the dimensionality reduction is imposed on that matrix.
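A rough sketch of this step, assuming a scikit-learn workflow; the cube size and the 99% explained-variance threshold are illustrative choices, not settings from this study.

```python
# Flatten the cube to (pixels, bands), fit PCA, and keep enough
# components to explain (say) 99% of the variance.
import numpy as np
from sklearn.decomposition import PCA

cube = np.random.rand(50, 50, 30)            # synthetic (rows, cols, bands)
X = cube.reshape(-1, cube.shape[-1])

pca = PCA(n_components=0.99)                 # keep 99% of explained variance
X_reduced = pca.fit_transform(X)

print(X_reduced.shape[1], "components retained out of", X.shape[1])
```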

3.9 Minimum Noise Fraction

Denoising hyperspectral imagery with MNF is a well-known approach [22, 23]. It transforms noisy hyperspectral data into output images with steadily decreasing noise levels. Remote sensing data also contain many high-frequency noise patterns that lower the classification accuracy, which is why noise removal is critical in such situations. In this model, the data were first denoised with the MNF transform and then passed on for classification.
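The MNF idea can be illustrated with a plain numpy sketch: estimate the noise covariance from differences of adjacent pixels, whiten the data by that covariance, then apply a PCA-style rotation in the whitened space. This is a simplified illustration under those assumptions, not the exact implementation used in this study.

```python
import numpy as np

def mnf(cube, n_components):
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    X = X - X.mean(axis=0)

    # Noise estimate: differences between horizontally adjacent pixels.
    diff = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands)
    noise_cov = np.cov(diff, rowvar=False) / 2.0

    # Whiten the data with respect to the noise covariance.
    evals, evecs = np.linalg.eigh(noise_cov)
    whiten = evecs / np.sqrt(np.maximum(evals, 1e-12))

    # PCA of the noise-whitened data yields the MNF components,
    # ordered by decreasing signal-to-noise ratio.
    Xw = X @ whiten
    _, _, Vt = np.linalg.svd(Xw, full_matrices=False)
    return (Xw @ Vt.T)[:, :n_components].reshape(rows, cols, n_components)

cube = np.random.rand(20, 20, 8)   # synthetic stand-in cube
out = mnf(cube, 3)
print(out.shape)  # (20, 20, 3)
```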

3.10 Classification

In this sub-section, the machine learning approaches used, CNN, SVM, KNN, and LSTM, are briefly described.

3.10.1 Convolutional Neural Network

CNN is a deep neural network technique mostly applied to image classification. Compared with a simple neural network (NN) [24,25,26,27,28], a deep neural network (DNN) comprises more hidden layers. The additional hidden layers help enhance the classification accuracy [29,30,31,32,33,34,35] and reduce the processing time. The basic CNN architecture is represented in Fig. 7.

Fig. 7
figure 7

CNN diagram for proposed model

The dataset is filtered with MNF [14, 36,37,38,39,40,41,42] to reduce the noise as far as possible, which helps increase the accuracy without affecting any other details. Processing the dataset with MNF generates a noise-reduced map [43,44,45,46,47,48]. This map has limited noise, which provides more clarity and accuracy. The processed data are then used by the CNN, which passes them through multiple layers to generate the best result, as pictorially represented in Fig. 7.
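To illustrate how a convolutional layer extracts features from a pixel's spectrum, here is a minimal 1-D convolution in plain numpy; a real model would use a deep learning framework, and the kernel shown is an arbitrary example, not a learned filter from this study.

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (really cross-correlation, as in CNNs)."""
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

spectrum = np.array([1., 2., 4., 8., 4., 2., 1.])  # toy reflectance values
edge_kernel = np.array([-1., 0., 1.])              # responds to spectral slope

print(conv1d(spectrum, edge_kernel))  # slope response: [3, 6, 0, -6, -3]
```

In a CNN, many such kernels are learned from the training data, and their responses are stacked and passed through further layers.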

For classification, the data are split into two sets, one for training and one for testing, and both sets are fed into the classification models as input. This is done with train_test_split, a scikit-learn function that splits the data into four arrays (X_train, Y_train, X_test, and Y_test) [48,49,50,51,52]. The accuracy is determined on the basis of these training and testing sets.
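The split can be sketched as follows (the function is train_test_split from scikit-learn's model_selection module; the 70/30 ratio, seed, and toy data are illustrative, not the paper's exact settings).

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 10)       # toy feature matrix (pixels x bands)
y = np.random.randint(0, 3, 100)  # toy class labels

# Stratified 70/30 split into the four arrays used for training/testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)  # (70, 10) (30, 10)
```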

3.10.2 Support Vector Machine

SVM is a computational intelligence model that is primarily used for classification and sometimes for regression. The goal of SVM is to find a hyperplane in the multidimensional space, where each dimension represents an independent feature, that separates the data points on both sides. The challenge is that multiple hyperplanes can be configured for the data, of which only one must be chosen: the hyperplane with the maximum margin to the data points is selected [24,25,26]. Based on three different kernels, the SVM classifier can categorize and predict data. The SVM diagram is shown in Fig. 8.

Fig. 8
figure 8

SVM diagram

Linear Kernel: The linear kernel is SVM's most basic kernel. It categorizes data in the simplest, binary way possible. The linear kernel is the fastest of the kernels; however, it has the lowest prediction accuracy. The linear kernel is given by Eq. (1).

$$ K \, (X,Y)\, = \,X^T Y $$
(1)

Where, X and Y are the two input vectors.

Polynomial Kernel: The polynomial kernel generalizes the linear kernel; it is not restricted to binary (linear) separation and can predict data in polynomial classes of any order. Its prediction accuracy and speed are both average. The polynomial kernel is written as Eq. (2).

$$ K \, \left( {X, \, Y} \right)\, = \,(\gamma \cdot X^T Y\, + \,c)^d , \, \gamma \, > \,0 $$
(2)

Where, d = degree of the polynomial.

c = constant.

γ = free parameter.

RBF Kernel: In the radial basis function (RBF) kernel, the data points are surrounded by curved decision boundaries. The use of curves allows more flexibility in data classification, resulting in more accurate predictions. The RBF kernel is the most accurate, but also the slowest, of the kernels. The RBF kernel is given by Eq. (3).

$$ K \, \left( {X, \, Y} \right)\, = \,\exp \, ( - \left\| {X\, - \,Y} \right\|^2 /2\sigma^2 ) $$
$$ K \, \left( {X, \, Y} \right)\, = \,\exp \, ( - \,\gamma \cdot \left\| {X\, - \,Y} \right\|^2 ), \, \gamma \, > \,0 $$
(3)

Where, ||X − Y|| is the Euclidean distance between X and Y and γ = 1/(2σ²) is a free parameter.
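The three kernels above can be compared on toy data with scikit-learn's SVC; the dataset and parameters below are illustrative defaults, not the values tuned in this study.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Synthetic two-class problem standing in for the hyperspectral features.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Train one SVM per kernel and report training accuracy.
for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, gamma="scale").fit(X, y)
    print(kernel, round(clf.score(X, y), 3))
```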

3.11 K-Nearest Neighbour

The K-nearest neighbour classifier classifies objects by assuming that similar objects lie near one another. In Fig. 9, the value of k is six; thus, only the six nearest neighbouring values inside the circle are considered. Various distance measures are available in KNN, such as the Euclidean, Manhattan, Hamming, and Minkowski distances; in the current study, the Euclidean distance is used. Before classifying an object, the classifier first calculates the distances of the neighbouring points from the target object, and the classification is done accordingly. In simple words, KNN infers the identity of an object by judging its neighbours [27, 28, 47]. In Eq. (4), n is the number of input attributes, and the distance is the square root of the sum of squared differences between a new point x and an existing point y.

$$ \sqrt {{\mathop \sum \limits_{i = 1}^n (x_i - y_i )^2 }} $$
(4)
Fig. 9
figure 9

KNN diagram
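A minimal sketch of the classifier just described, with k = 6 and the Euclidean metric; the two synthetic clusters are illustrative stand-ins for land-cover classes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Two well-separated clusters centred at 0 and 5.
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

# k = 6 neighbours with the Euclidean distance, as in the study.
knn = KNeighborsClassifier(n_neighbors=6, metric="euclidean").fit(X, y)
print(knn.predict([[0, 0, 0], [5, 5, 5]]))  # [0 1]
```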

3.12 Long Short-Term Memory (LSTM)

LSTM [35, 39] is a type of RNN architecture that has a chain-like structure with multiple repeating LSTM units, each comprising several memory blocks known as cells as depicted in Fig. 10.

Fig. 10
figure 10

LSTM chain structure

LSTMs are designed to learn and retain information over long sequences of data. In an LSTM network, there are memory cells that are able to store information over time. The memory cells are connected to gates that control the flow of information into and out of the cells. The four gates are key components of the LSTM architecture:

Forget gate: This gate determines which information should be forgotten from the previous time step. It takes as input the previous memory cell value and the current input, and produces an output between 0 and 1 for each element of the memory cell. An output of 0 means "forget this information", while an output of 1 means "keep this information".

Input gate/Learn gate: This gate determines which information should be added to the memory cell. It takes as input the previous memory cell value and the current input, and produces an output between 0 and 1 for each element of the memory cell. An output of 0 means "don't add this information", while an output of 1 means "add this information completely".

Output gate/Remember gate: This gate determines which information should be output from the memory cell at the current time step. It takes as input the previous hidden state and the current input, and produces an output between 0 and 1 for each element of the memory cell. This output is multiplied by the tanh of the current cell value to produce the output of the LSTM cell.

Use gate: This gate determines how much of the current memory cell value should be used to make the prediction. It takes as input the current memory cell value and the current input, and produces an output between 0 and 1. This output is multiplied by the memory cell value to produce the prediction.

Using these gates to control the flow of information, an LSTM network is able to selectively remember or forget information over long sequences of data as given in Fig. 11. This makes them well-suited to tasks such as language modeling, speech recognition, and sentiment analysis [18, 19].

Fig. 11
figure 11

LSTM diagram for proposed model

Mathematically, an LSTM structure evaluates a mapping from an input sequence to an output sequence.

The forget gate determines the degree to which information from the previous step is remembered or discarded. It takes into account the input xit and the previous hidden state Hit−1 and outputs the degree of information to be retained. ft can be defined as:

$$ f_t \, = \,\sigma (W_f \cdot [H_{it - 1} ,x_{it} ]\, + \,b_f ) $$
(5)

where Hit−1 is the output from the previous state.

Next, the new information to be stored in the cell, which forms the new cell state, is determined. This is performed in two main steps:

A sigmoid layer, also known as the input gate layer, determines which values from the input and previous hidden state will be selected for updating the cell state. This is computed using the equation.

$$ l_t \, = \,\sigma (W_l \cdot [H_{it - 1} ,x_{it} ]\, + \,b_l ) $$
(6)

where [Hit − 1, xit] is the concatenation of the previous hidden state and the current input.

A tanh layer forms a vector of new candidate values \(\hat{C}_t\) to be added to the cell state. This is computed using the equation.

$$ \hat{C}_t \, = \,\tanh (W_C [H_{it - 1} ,x_{it} ]\, + \,b_C ) $$
(7)

where [Hit − 1, xit] is the concatenation of the previous hidden state and the current input.

The new cell state Ct is then formed from the old cell state Ct−1 by element-wise (Hadamard) multiplication of the forget gate output ft with Ct−1, plus element-wise multiplication of the input gate output lt with the candidate values Ĉt:

$$ C_t \, = \,f_t \circ C_{t - 1} \, + \,l_t \circ \hat{C}_t $$
(8)

Finally, the output of the cell is obtained by passing the updated cell state Ct through a tanh activation function and multiplying it by the output gate activation Oit, which is computed using the equation:

$$ O_{{\text{it}}} \, = \,\sigma (W_{{\text{output}}} \cdot [H_{{\text{it}} - 1} ,x_{{\text{it}}} ]\, + \,b_{{\text{output}}} ) $$
(9)

where [Hit−1, xit] is the concatenation of the previous hidden state and the current input, and Woutput is the output gate weight matrix.

The final output of the LSTM cell is the updated hidden state, computed using the equation.

$$ h_{it} \, = \,O_{it} \, \times \,\tanh (C_t ) $$
(10)
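Eqs. (5)–(10) can be traced in a single numpy step. The weights below are random stand-ins, and the output gate is assumed to use the standard sigmoid form corresponding to Eq. (9); this is an illustrative sketch, not the trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs. (5)-(10)."""
    z = np.concatenate([h_prev, x])           # [H_{it-1}, x_it]
    f = sigmoid(W["f"] @ z + b["f"])          # forget gate, Eq. (5)
    l = sigmoid(W["l"] @ z + b["l"])          # input gate,  Eq. (6)
    c_hat = np.tanh(W["c"] @ z + b["c"])      # candidates,  Eq. (7)
    c = f * c_prev + l * c_hat                # new cell,    Eq. (8)
    o = sigmoid(W["o"] @ z + b["o"])          # output gate, Eq. (9)
    h = o * np.tanh(c)                        # new hidden,  Eq. (10)
    return h, c

n_in, n_hid = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) for k in "flco"}
b = {k: np.zeros(n_hid) for k in "flco"}

h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h.shape, c.shape)  # (3,) (3,)
```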

4 Results and Discussion

To evaluate the classification of hyperspectral images, the results of the different classifiers are compared. In this study, the classifiers taken into account are SVM, KNN, CNN, and LSTM. The minimum noise fraction and the feature selection/extraction algorithm PCA have been used to minimize noise and select the most suitable bands to avoid the computational burden [48,49,50,51,52]. To assess the effectiveness of the models, various performance measures such as classification accuracy, testing time, precision, recall, specificity, F_score, and Gmean are used. According to the simulation data, LSTM + MNF outperformed the other models, achieving the maximum accuracy within the least amount of time for all the datasets.

The overall accuracy [32,33,34,35,36,37,38,39] refers to the percentage of correctly predicted sample points out of the total number of samples. Taking into account the parameters specificity, sensitivity (recall), precision, and F1_score, the accuracy, kappa accuracy, and G-mean are calculated; these measures are given by Eqs. (11)–(17).

$$ {\text{Sensitivity }}\left( {{\text{or}}} \right){\text{ Recall}}\, = \,{\text{TP/}}\left( {{\text{TP}}\, + \,{\text{FN}}} \right) $$
(11)
$$ {\text{Specificity}}\, = \,{\text{TN}}/\left( {{\text{FP}}\, + \,{\text{TN}}} \right) $$
(12)
$$ {\text{Precision}}\, = \,{\text{TP}}/\left( {{\text{TP}}\, + \,{\text{FP}}} \right) $$
(13)
$$ F{1}\_{\text{Score}}\, = \,{\text{2 TP}}/\left( {{\text{2TP}}\, + \,{\text{FP}}\, + \,{\text{FN}}} \right) $$
(14)
$$ {\text{Accuracy}}\, = \,\left( {{\text{TP}}\, + \,{\text{TN}}} \right)/\left( {{\text{TP}}\, + \,{\text{FP}}\, + \,{\text{TN}}\, + \,{\text{FN}}} \right) $$
(15)
$$ {\text{Kappa Accuracy}}\, = \,\left( {{\text{Total Accuracy}}-{\text{Random Accuracy}}} \right)/\left( {{1} - {\text{Random Accuracy}}} \right) $$
(16)
$$ {\text{G-mean }} = \sqrt[2]{{{\text{Sensitivity}} \times {\text{Specificity}}}} $$
(17)

In Tables 2 and 3, the model performance is tabulated. The data presented in Table 2 are represented graphically in Figs. 17 and 18.

Table 2 Kappa accuracy, average accuracy and time taken for all the models
Table 3 Precision, recall, specificity, F1_score and Gmean of all the models

The classification maps obtained for all the datasets are illustrated in Fig. 12 for Indian Pines, Fig. 13 for Salinas, Fig. 14 for Pavia University, Fig. 15 for Kennedy Space Centre, and Fig. 16 for Botswana.

Fig. 12
figure 12

Classification maps obtained from Indian Pines a Actual map, b Ground truth, c MNF + SVM, d PCA + SVM, e MNF + KNN, f PCA + KNN, g CNN + MNF, h CNN + PCA, i LSTM + MNF, j LSTM + PCA

Fig. 13
figure 13

Classification maps obtained from Salinas a Actual map, b Ground truth, c MNF + SVM, d PCA + SVM, e MNF + KNN, f PCA + KNN, g CNN + MNF, h CNN + PCA, i LSTM + MNF, j LSTM + PCA

Fig. 14
figure 14

Classification maps obtained from Pavia University a Actual map, b Ground truth, c MNF + SVM, d PCA + SVM, e MNF + KNN, f PCA + KNN, g CNN + MNF, h CNN + PCA, i LSTM + MNF, j LSTM + PCA

Fig. 15
figure 15

Classification maps obtained from KSC (Kennedy space center) a Actual map, b Ground truth, c MNF + SVM, d PCA + SVM, e MNF + KNN, f PCA + KNN, g CNN + MNF, h CNN + PCA, i LSTM + MNF, j LSTM + PCA

Fig. 16
figure 16

Classification maps obtained from Botswana a Actual map, b Ground truth, c MNF + SVM, d PCA + SVM, e MNF + KNN, f PCA + KNN, g CNN + MNF, h CNN + PCA

4.1 Accuracy Comparison

From Figs. 17 and 18, based on the simulation results, it can be concluded that the LSTM model combined with the MNF feature extraction technique consistently performs slightly better than the CNN model with MNF across most of the datasets. The LSTM + MNF model achieves the highest accuracy percentages and the shortest running times compared to the other models: 99.06% on Indian Pines, 99.01% on Salinas, 99.67% on Pavia University, 99.83% on KSC, and 99.92% on Botswana, its highest. Based on these results, the LSTM model outperforms the other models in terms of both accuracy and time consumption, making it the most effective model for classifying the hyperspectral imaging datasets considered. To compare accuracy, the LSTM model was also evaluated against models proposed by other researchers: Random Forest, Decision Tree [26, 27], Gaussian Naive Bayes [27], Quadratic Discriminant Analysis, Logistic Regression [27], SVM, KNN, and CNN, as shown in Tables 4, 5, 6, 7 and 8. For dimensionality reduction we used PCA and MNF, and also compared against Factor Analysis (FA) [32], Independent Component Analysis (ICA) [27], Linear Discriminant Analysis (LDA) [36], and Truncated SVD. A comparison of the classification accuracies of the proposed model with different existing models on the different datasets is shown in Fig. 19.
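A comparison of the dimensionality-reduction techniques named above can be sketched with scikit-learn, which provides PCA, FA, ICA, LDA, and Truncated SVD (but not MNF). The data here are synthetic stand-ins, and the component counts, classifier, and fold count are illustrative assumptions rather than the settings of this study.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis, FastICA, TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 100))   # synthetic stand-in: 100 spectral bands
y = rng.integers(0, 4, size=1000)  # 4 hypothetical land-cover classes

reducers = {
    "PCA": PCA(n_components=10),
    "FA": FactorAnalysis(n_components=10),
    "ICA": FastICA(n_components=10, max_iter=500, random_state=1),
    # LDA is supervised and limited to n_classes - 1 components.
    "LDA": LinearDiscriminantAnalysis(n_components=3),
    "TruncatedSVD": TruncatedSVD(n_components=10),
}

results = {}
for name, reducer in reducers.items():
    # Pipeline refits the reducer inside each fold, avoiding data leakage.
    pipe = make_pipeline(reducer, KNeighborsClassifier(5))
    results[name] = cross_val_score(pipe, X, y, cv=3).mean()
    print(f"{name}: mean CV accuracy = {results[name]:.3f}")
```

With real hyperspectral data, the same loop would be run per dataset and the mean accuracies tabulated, as in Tables 4–8.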

Fig. 17
figure 17

Graph for accuracy comparison (Kappa Accuracy, Average Accuracy and Time Taken)

Fig. 18
figure 18

Graph for accuracy comparison (Precision, Recall, Specificity, F1_Score and GMean)

Table 4 Comparison of classification accuracies of the proposed model on the Indian Pines dataset
Table 5 Comparison of classification accuracies of the proposed model on the Pavia University dataset
Table 6 Comparison of classification accuracies of the proposed model on the Salinas dataset
Table 7 Comparison of classification accuracies of the proposed model on the Kennedy Space Centre dataset
Table 8 Comparison of classification accuracies of the proposed model on the Botswana dataset
Fig. 19
figure 19

Comparison of classification accuracies of the proposed model with different existing models on the different datasets

5 Conclusions and Recommendations

The research proposes using Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models for classifying five different hyperspectral imaging datasets: Indian Pines, Salinas, Pavia University, Kennedy Space Centre, and Botswana. These datasets contain images captured in different spectral bands, stacked together to form a single data cube. To improve model performance and reduce noise, Principal Component Analysis (PCA) and Minimum Noise Fraction (MNF) techniques are used to reduce dimensionality and retain the relevant information. The simulation results show that both the LSTM- and CNN-based models outperform traditional models on all datasets. MNF consistently yields better results than PCA for both the LSTM and CNN models. Moreover, combining LSTM with MNF performs slightly better than CNN with MNF on most datasets. In terms of accuracy, the LSTM model with the MNF approach consistently outperforms the other models across all five datasets, including Random Forest, Decision Tree, Gaussian Naive Bayes, Quadratic Discriminant Analysis, Logistic Regression, SVM, KNN, and CNN. Furthermore, different dimensionality reduction techniques were compared, including PCA, MNF, Factor Analysis (FA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), and Truncated SVD. Regardless of the dimensionality reduction technique used by the competing models, the LSTM model with the MNF approach consistently showed the best accuracy.

The research recommends the use of LSTM and CNN models with the MNF feature extraction technique for hyperspectral imaging dataset classification. Both deep learning models outperform traditional methods, and MNF consistently yields better results than PCA. Combining LSTM with MNF is suggested for the best performance. Furthermore, the research emphasizes evaluating multiple performance measures, including accuracy, kappa accuracy, precision, recall, specificity, F1_score, and G-mean. To assess generalization, the applicability of the proposed models and techniques should be explored on other datasets, and alternative deep learning architectures or feature extraction methods can be investigated for potential performance gains. Computational efficiency, especially testing time, should also be considered alongside accuracy when deploying these models in real-world applications. Overall, the research provides practical guidance for achieving improved hyperspectral imaging classification results using LSTM and CNN models with the MNF feature extraction technique, and adhering to these guidelines can lead to further advancements in this field.