Two-stage multi-dimensional convolutional stacked autoencoder network model for hyperspectral images classification

Bai, Yang; Sun, Xiyan; Ji, Yuanfa; Fu, Wentao; Zhang, Jinli

doi:10.1007/s11042-023-16456-w

Open access
Published: 16 August 2023

Volume 83, pages 23489–23508, (2024)
Multimedia Tools and Applications

Abstract

Deep learning models have been widely used in hyperspectral image classification. However, the classification results are not satisfactory when the number of training samples is small. To address this problem, a novel Two-stage Multi-dimensional Convolutional Stacked Autoencoder (TMC-SAE) model is proposed for hyperspectral image classification. The proposed model is composed of two sub-models, SAE-1 and SAE-2. SAE-1 is a 1D autoencoder with an asymmetric structure based on fully connected layers and 1D convolution layers, and it reduces the spectral dimensionality. SAE-2 is a hybrid autoencoder composed of 2D and 3D convolution operations that extracts spectral-spatial features from the data whose dimensionality has been reduced by SAE-1. SAE-1 is trained on the raw data by unsupervised learning, and its encoder is then employed to reduce the spectral dimensionality of the raw data. The dimension-reduced data is used to train SAE-2 by unsupervised learning. The fine-tuning of the SAE-2 encoder and the training of the classifier are carried out simultaneously by supervised learning with a small number of samples. Comparative experiments on three widely used hyperspectral remote sensing datasets demonstrate that the proposed architecture can effectively extract deep features and maintain high classification accuracy with a small number of training samples.

1 Introduction

Hyperspectral images (HSIs), which comprise hundreds of spectral bands and provide rich spectral and spatial information, are widely used in agriculture [24], environmental monitoring [32], mineral exploration [27], military and security [31], astronomy [13], medicine [25], chemistry [34], urban planning [38], etc. For these applications, HSI classification, which assigns a class to each pixel, is an important basic task because the effectiveness of every application is directly affected by the classification accuracy. Unfortunately, the imbalance between the high dimensionality of the spectral bands and the limited number of labeled samples makes it very difficult to improve classification accuracy. On the one hand, the high dimensionality provides abundant spectral information but also contains a great deal of redundant information and noise, so the classification accuracy may decrease rather than increase. This phenomenon is known as the curse of dimensionality. On the other hand, the high cost of labeling results in a small number of labeled samples for model training. Therefore, extracting deep discriminative features from a small number of training samples becomes a key step in HSI classification tasks [21, 35].

The traditional feature extraction (FE) methods consist of band selection (BS) and dimensionality reduction (DR). The purpose of BS is to select a subset of the spectral bands that has fewer dimensions yet retains enough features of the raw data for classification [11, 30, 37]. The purpose of DR is to find a lower-dimensional representation of the raw high-dimensional data through a mapping algorithm, such as principal component analysis (PCA) [9, 18, 22, 41, 45], linear discriminant analysis (LDA) [7, 16, 19, 28, 43], morphological attribute profiles (MAPs) [2, 8, 10, 23, 40], etc. In BS algorithms, only the features of the selected bands are used for classification and the features of the other bands are discarded, which wastes valuable feature information. The DR algorithms are mainly based on handcrafted features, so only shallow features can be obtained. Because deep features cannot be obtained, it is difficult for traditional classification methods to further improve the classification accuracy.

In recent years, deep learning (DL) has shown remarkable ability in deep feature extraction and achieved great success in machine vision, which has inspired researchers to introduce DL models into HSI classification. The DL models for HSI classification mainly include the deep belief network (DBN), the convolutional neural network (CNN) and the autoencoder (AE). Chen [5] proposed a deep model architecture for HSI classification that combined PCA for dimensionality reduction, a DBN for spectral feature extraction and logistic regression as the classifier. Ghassemi [12] proposed an HSI classification framework in which the DBN was applied to extract spectral-spatial features. Because the DBN is a one-dimensional (1D) model, the two-dimensional (2D) spatial data must be flattened into 1D vectors before spatial features are extracted. This flattening causes a loss of spatial features and limits the improvement of classification accuracy.

CNN models, the most widely used DL models for HSI classification, fall mainly into three categories: 1D-CNN, 2D-CNN and 3D-CNN [1]. Hu [39] proposed a 1D-CNN model, which consisted of a convolutional layer, a max pooling layer and a fully connected layer, for HSI classification with spectral features only. Li [20] proposed a pixel-pair 1D-CNN method that combined spectral and spatial information as the model input to improve the classification accuracy. Yue [44] presented a framework that consisted of PCA for dimensionality reduction, a deep 2D-CNN for spectral-spatial feature extraction and a logistic regression classifier. Yu [42] introduced a deconvolution layer into a deep 2D-CNN model to enhance the features extracted from the raw data. Haque [14] proposed a multi-scale 2D-CNN model named PCA-MS-CNN for HSI classification. Li [21] proposed a lighter 3D-CNN framework, which consisted of 3D convolution layers and fully connected layers. Roy [29] proposed a hybrid CNN consisting of a spectral-spatial 3D-CNN followed by a spatial 2D-CNN. Zhang [46] proposed an Attention-Dense-HybridSN network based on 3D-CNN and 2D-CNN, in which a 3D-Dense block was used to extract spectral-spatial features and channel and spatial attention were introduced to refine the extracted features. Because the 1D-CNN and 2D-CNN models cannot extract the joint spectral-spatial features of HSI data, HSI classification methods based on them lose effective information. The 3D convolution kernel structurally matches the 3D data cube, so it can be used to extract joint spatial-spectral features. In addition, all of the above CNNs are supervised learning models: satisfactory classification accuracy (CA) can be obtained with sufficient labeled training samples, but the CA declines rapidly when the training samples are few.

In recent years, the AE, as an unsupervised learning model, has gained much attention. Chen [4] proposed three 1D-SAE models, which were used for HSI classification with spectral information, spatial information and spectral-spatial features respectively. Palma [17] proposed a hybrid unsupervised model based on a 1D stacked AE (SAE) by introducing a CNN into the training process of the encoder and decoder. Mei [26] proposed a 3D convolutional autoencoder (3D-CAE), which consisted of an encoder with 3D convolutional operations only, to maximally explore spatial-spectral information, and a decoder to reconstruct the raw data. Sun [33] proposed a multi-scale 3D-CAE model composed of 3D convolutional layers and deconvolutional layers. The AE is composed of an encoder, which can learn a representation of the input data without labeled samples, and a decoder, which is used to reconstruct the input data.

To address the problem that the classification accuracy of models declines significantly as the number of training samples decreases, a novel deep learning framework named Two-stage Multi-dimensional Convolutional Stacked Autoencoder (TMC-SAE) is proposed in this paper for HSI classification. The main contributions of this paper are summarized as follows.

(1) The TMC-SAE model is proposed for the classification of hyperspectral remote sensing images. With a small number of training samples, it achieves the highest classification accuracy among the compared state-of-the-art models.
(2) The TMC-SAE consists of two independent stacked autoencoders, SAE-1 and SAE-2, which are trained independently by unsupervised learning. This architecture keeps the depth of SAE-1 and SAE-2 moderate while still ensuring that the TMC-SAE can extract deep features from HSIs.
(3) The SAE-1 is designed as a 1D asymmetric SAE for spectral dimensionality reduction. The encoder of SAE-1, with 5 layers, contains more trainable parameters than the decoder, with 3 layers. As a result, the feature extraction ability of the encoder receives more attention during training.
(4) The SAE-2 is designed as a hybrid network with 3D convolution and 2D convolution operations. The deep joint spatial-spectral features extracted by SAE-2 keep the classification accuracy high when the number of training samples is small.

The remainder of this paper is organized as follows. The related theoretical basis is described in Section 2. The framework details of the TMC-SAE are presented in Sections 3 and 4. The experimental results on three benchmark hyperspectral datasets are shown in Section 5. Finally, conclusions are drawn in Section 6.

2 Related works

2.1 Stacked autoencoder

Figure 1 shows the general architecture of the autoencoder (AE), which consists of an encoder and a decoder. The function of the encoder is to extract the features of the input data and reduce the dimensionality of the data. The purpose of the decoder is to reconstruct the original data from the features extracted by the encoder.

During training, the encoder maps the input $X\in {R}^{h}$ to low dimensional representations $Y\in {R}^{i}$ through some algorithm and the decoder recovers $\widetilde{X}\in {R}^{h}$ from $Y\in {R}^{i}$ through inverse transformation. The purpose of training is to minimize the error between $X$ and $\widetilde{X}$. This stage can be formulated mathematically as

$$\begin{array}{l}Y=f(W_eX+b_e)\\\widetilde X=g(W_dY+b_d)\\\arg\;\min\lbrack loss(X,\widetilde X)\rbrack\end{array}$$

(1)

where ${W}_{e}$, ${b}_{e}$ and $f\left(\cdot \right)$ denote the weights, bias and activation function of encoder respectively, ${W}_{d}$, ${b}_{d}$ and $g\left(\cdot \right)$ denote the weights, bias and activation function of decoder respectively.
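As an illustration of Eq. (1), the following minimal sketch runs one forward pass of a single-layer autoencoder in plain Python. The choice of ReLU for $f$, a linear $g$, mean squared error as the loss, and the sizes `h = 8` and `i = 3` are assumptions made for the example; the weights are random and untrained, and no optimization step is shown.

```python
import random


def affine(W, b, x):
    """Compute W x + b for a weight matrix W (list of rows) and a vector x."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def relu(v):
    """Element-wise ReLU, assumed here as the encoder activation f."""
    return [max(0.0, t) for t in v]


def identity(v):
    """Linear output, assumed here as the decoder activation g."""
    return list(v)


def encode(x, W_e, b_e, f=relu):
    """Y = f(W_e X + b_e)."""
    return f(affine(W_e, b_e, x))


def decode(y, W_d, b_d, g=identity):
    """X~ = g(W_d Y + b_d)."""
    return g(affine(W_d, b_d, y))


def mse(x, x_rec):
    """Mean squared reconstruction error, one possible choice of loss(X, X~)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
h, i = 8, 3                      # input and code dimensions (illustrative values)
W_e, b_e = random_matrix(i, h, rng), [0.0] * i
W_d, b_d = random_matrix(h, i, rng), [0.0] * h

x = [rng.random() for _ in range(h)]
y = encode(x, W_e, b_e)          # low-dimensional representation
x_rec = decode(y, W_d, b_d)      # reconstruction from the code
loss = mse(x, x_rec)             # the quantity that training would minimize
```

Training would adjust `W_e`, `b_e`, `W_d` and `b_d` to reduce `loss`; after training only `encode` is needed for feature extraction.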

During testing, only the encoder is adopted for feature extraction, and the features extracted by the encoder are fed into the classifier for classification as shown in Fig. 2. The decoder is only used to obtain reconstructed data during the training phase. The closer the reconstructed data is to the input data, the more representative the extracted features are considered to be.

An AE whose encoder and decoder each contain more than one neural network layer is called a stacked autoencoder (SAE). In general, the encoder and decoder have the same number of layers and the operations of the decoder are the inverse of those of the encoder. In other words, the encoder and decoder are structurally symmetrical. The symmetrical structure makes an SAE easy to construct. However, it also makes the SAE difficult to deepen, because each layer added to the encoder requires a matching layer in the decoder, so the total number of layers grows by 2. In order to increase the depth of the encoder, an asymmetric SAE structure is proposed, in which the decoder has fewer layers than the encoder. The encoder therefore has more layers and trainable parameters with which to extract deep features for classification.

2.2 2D and 3D convolution

The 2D convolution and 3D convolution, whose principles are shown in Fig. 3, are the basic operations for extracting features in convolutional neural networks.

In the 2D convolution operation, input data is convolved with 2D kernels. The output data ${y}_{i,j}^{x,y}$ at spatial position $\left(x,y\right)$ in the jth feature map of the ith layer is denoted as

$${y}_{i,j}^{x,y}=f\left(\sum_{m}\sum_{p=0}^{{W}_{1}-1}\sum_{q=0}^{{W}_{2}-1}{w}_{i,j,m}^{p,q}{v}_{(i-1),m}^{(x+p)(y+q)}+{b}_{i,j}\right)$$

(2)

where $m$ is the index of the feature maps in the $\left(i-1\right)$th layer, ${w}_{i,j,m}^{p,q}$ is the weight of position $\left(p,q\right)$ connected to the mth feature map, ${W}_{1}$ and ${W}_{2}$ are the width and height of the kernel, ${b}_{i,j}$ is the bias for the jth feature map in the ith layer and $f\left(\cdot \right)$ is the activation function. Through 2D convolution operations, deep spatial features of input data can be extracted into output data.
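The sum in Eq. (2) can be sketched directly as nested loops. The function below is a plain-Python illustration for a single output feature map; the name `conv2d_valid`, the list-of-lists data layout, the default ReLU activation and the absence of padding ("valid" output region) are assumptions of this sketch and are not taken from the paper.

```python
def conv2d_valid(v, w, b=0.0, f=lambda t: max(0.0, t)):
    """Evaluate Eq. (2) for one output feature map.

    v[m][x][y] : input feature maps of layer i-1
    w[m][p][q] : kernel weights connected to the m-th input map
    b          : bias of this output map
    f          : activation function (ReLU assumed by default)
    """
    W1, W2 = len(w[0]), len(w[0][0])          # kernel extent along x and y
    rows, cols = len(v[0]), len(v[0][0])      # spatial size of each input map
    out = []
    for x in range(rows - W1 + 1):            # no padding: output shrinks
        out_row = []
        for y in range(cols - W2 + 1):
            s = b
            for m in range(len(v)):           # sum over input feature maps
                for p in range(W1):
                    for q in range(W2):
                        s += w[m][p][q] * v[m][x + p][y + q]
            out_row.append(f(s))
        out.append(out_row)
    return out


# One 3x3 input map and a 2x2 kernel that adds each pixel to its lower-right neighbor.
feature_map = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
kernel = [[[1, 0], [0, 1]]]
result = conv2d_valid(feature_map, kernel)
```

A layer with several output maps would call this once per index $j$, each with its own kernel and bias.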

In the 3D convolution operation, input data is convolved with 3D kernels. The output data ${y}_{i,j}^{x,y,z}$ at position $\left(x,y,z\right)$ of the jth feature map in the ith layer is given by

$${y}_{i,j}^{x,y,z}=f\left(\sum_{m}\sum_{p=0}^{{W}_{1}-1}\sum_{q=0}^{{W}_{2}-1}\sum_{r=0}^{{W}_{3}-1}{w}_{i,j,m}^{p,q,r}{v}_{(i-1),m}^{(x+p)(y+q)(z+r)}+{b}_{i,j}\right)$$

(3)

where ${w}_{i,j,m}^{p,q,r}$ is the weight at position $\left(p,q,r\right)$ connected to the mth feature map of the previous layer, ${W}_{3}$ is the size of the kernel along the spectral dimension, and the other parameters are the same as in Eq. (2). The structure of the 3D kernel is consistent with that of the HSI data cube, so 3D convolution operations can extract spatial and spectral features simultaneously.
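The 3D case differs from the 2D case only by the extra sum along the spectral axis. The sketch below illustrates this for one output feature map; the name `conv3d_valid`, the nested-list layout, the ReLU default and the lack of padding are again assumptions made for illustration.

```python
def conv3d_valid(v, w, b=0.0, f=lambda t: max(0.0, t)):
    """Evaluate the 3D convolution sum for one output feature map.

    v[m][x][y][z] : input feature maps (two spatial axes, one spectral axis)
    w[m][p][q][r] : 3D kernel connected to the m-th input map
    """
    W1, W2, W3 = len(w[0]), len(w[0][0]), len(w[0][0][0])
    X, Y, Z = len(v[0]), len(v[0][0]), len(v[0][0][0])
    return [
        [
            [
                f(b + sum(
                    w[m][p][q][r] * v[m][x + p][y + q][z + r]
                    for m in range(len(v))
                    for p in range(W1)
                    for q in range(W2)
                    for r in range(W3)
                ))
                for z in range(Z - W3 + 1)     # kernel slides along the spectral axis too
            ]
            for y in range(Y - W2 + 1)
        ]
        for x in range(X - W1 + 1)
    ]


# A 3x3 spatial patch with 4 spectral bands, all ones, and a 3x3x3 kernel of ones.
cube = [[[[1.0] * 4 for _ in range(3)] for _ in range(3)]]
kernel = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)]]
out = conv3d_valid(cube, kernel)
```

Because the kernel also moves along the spectral axis, the output keeps a spectral dimension (length 2 here), which a 2D convolution over the same patch would collapse.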

3 Proposed TMC-SAE

3.1 Framework of the proposed TMC-SAE

In this paper, the TMC-SAE is proposed for HSI classification. As shown in Fig. 4, the TMC-SAE is composed of two stacked autoencoders (SAEs), SAE-1 and SAE-2, and a classifier. Both SAE-1 and SAE-2 contain an encoder and a decoder. The encoders extract features and the decoders reconstruct the input data. The decoders are designed only for training the encoders and are not used for classification. The network for classification is composed of the SAE-1 encoder, the SAE-2 encoder and the classifier. The structures and training details of SAE-1, SAE-2 and the classifier are described below.

The SAE-1 is a 1D SAE with an asymmetric structure as shown in Fig. 5, in which the encoder and decoder are based on fully connected (FC) layers and 1D convolutional layers respectively. The purpose of this asymmetric structure is to give the encoder more trainable parameters than the decoder and thereby improve its feature extraction ability.

The encoder of SAE-1 consists of five FC layers, which contain k1, k2, k3, k4 and k5 neurons respectively. Each FC layer is followed by a batch normalization (BN) layer, an activation layer with the ReLU activation function and a dropout layer (rate = 0.5). The decoder of SAE-1 is composed of three 1D deconvolution (DC) layers, each of which is followed by a BN layer and an activation layer.

It is assumed that the raw HSI data is represented by $\boldsymbol X\in {\mathbb{R}}^{M\times N\times B}$, where $M$ and $N$ are the height and width of the image and $B$ is the number of spectral bands. After the spectral dimension reduction by the encoder, the pixel data vector $x\in {\mathbb{R}}^{B}$ is mapped to the feature vector $h$ of dimensionality k5. The trained encoder is used to reduce the dimension of the raw HSI data, and the encoder output of size $M\times N\times k5$ is taken as the input of SAE-2. The encoder of SAE-1 thus reduces the number of spectral bands from $B$ to k5 while maintaining the same spatial dimensions.

A hybrid network, SAE-2, is proposed to further extract spectral-spatial features from the data whose dimension has been reduced by the encoder of SAE-1. The framework of SAE-2 is shown in Fig. 6. It consists of an encoder, which stacks three 3D convolution layers and three 2D convolution layers to extract spatial-spectral features simultaneously, and a companion decoder, which is composed of three 3D deconvolution layers and three 2D deconvolution layers to reconstruct the input data from the features extracted by the encoder.

The SAE-1 encoder output $X\in {\mathbb{R}}^{M\times N\times k_5}$ is divided into 3D neighboring patches $\boldsymbol P\in {\mathbb{R}}^{S\times S\times k_5}$, which are taken as the input of SAE-2. Each patch ${P}_{x,y}\in \boldsymbol P$ is centered at the pixel at spatial location $\left(x,y\right)$ and covers an $S\times S$ window and all spectral bands. The function of the reshape layer is to combine the spectral dimension and the channel dimension of the feature maps so that they are suitable for the next 2D convolution layer. The reshape layer has no trainable parameters. The backpropagation method is used to train the SAE-2 with an MSE loss function. In both the encoder and decoder, the ReLU activation function is adopted for every convolution and deconvolution layer to improve the fitting ability of the network.

After the SAE-2 is trained, its encoder is used on its own to provide the extracted spatial-spectral features to the classifier. The classifier consists of a flatten layer, which expands the features extracted by the SAE-2 encoder into 1D vectors, and three FC layers. The first two FC layers, with the ReLU activation function, are designed to extract further features and are followed by a dropout layer to prevent overfitting. The last FC layer, which has as many neurons as there are pixel classes, uses the softmax activation function to implement the classifier.
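The patch generation and the reshape layer described above can be sketched in plain Python as follows. The function names, the nested-list layout `cube[row][col][band]`, the requirement that `S` be odd and the use of zero padding at the image border are assumptions of this sketch; the text does not state how border pixels are handled.

```python
def extract_patch(cube, x, y, S):
    """Return the S x S x k patch of cube[row][col][band] centered at (x, y).

    Positions outside the image are filled with zeros (an assumption).
    """
    M, N, k = len(cube), len(cube[0]), len(cube[0][0])
    r = S // 2                                  # half window size, S assumed odd
    patch = []
    for i in range(x - r, x + r + 1):
        row = []
        for j in range(y - r, y + r + 1):
            if 0 <= i < M and 0 <= j < N:
                row.append(list(cube[i][j]))    # all bands of this pixel
            else:
                row.append([0.0] * k)           # zero padding at the border
        patch.append(row)
    return patch


def merge_depth_and_channels(fmap):
    """Reshape fmap[h][w][d][c] to fmap[h][w][d*c] without any trainable parameter."""
    return [[[val for depth in cell for val in depth] for cell in row] for row in fmap]


# Toy cube of 4 x 5 pixels with 2 bands; each pixel stores its own coordinates.
cube = [[[float(i), float(j)] for j in range(5)] for i in range(4)]
patch = extract_patch(cube, 0, 0, 3)            # corner pixel, so padding is used

# One spatial position holding 2 spectral slices of 3 channels each.
merged = merge_depth_and_channels([[[[1, 2, 3], [4, 5, 6]]]])
```

After the merge, each spatial position carries a single feature vector, which is the layout a 2D convolution layer expects.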

3.2 Details of training

The training of the TMC-SAE is a three-phase process.

(1) Training of SAE-1 by unsupervised learning. In this phase, the encoder of SAE-1 automatically extracts features from the raw spectral data and the decoder reconstructs the raw data from the output of the encoder. The training dataset is composed of all pixel vectors. The trained encoder of SAE-1 reduces the raw HSI data $\boldsymbol X\in {\mathbb{R}}^{M\times N\times B}$ to $\boldsymbol Y\in {\mathbb{R}}^{M\times N\times {k}_{5}}$ in the spectral dimension only.

(2) Training of SAE-2 by unsupervised learning. This phase is the same as phase (1) except that the training data are the features extracted by the trained SAE-1 encoder. The 3D neighboring patch dataset $\boldsymbol Z\in {\mathbb{R}}^{P\times P\times {k}_{5}}$, which contains the information of all labeled pixels and is generated from $\boldsymbol Y\in {\mathbb{R}}^{M\times N\times {k}_{5}}$, is taken as the training dataset. The parameter P represents the patch window size of the training samples.

(3) Training of the classifier and fine-tuning of SAE-2 by supervised learning with a small number of labeled samples. In this phase, the dataset $\boldsymbol Z\in {\mathbb{R}}^{P\times P\times {k}_{5}}$ is divided into a training group and a testing group. The classifier training and the SAE-2 encoder fine-tuning are performed simultaneously on the training group. The classification performance of the TMC-SAE is then verified on the testing group.

It can be seen from the above details that the features of all pixels can be used for training SAE-1 and SAE-2. This allows the encoders of SAE-1 and SAE-2 to make maximum use of the information in the dataset instead of relying on only a small number of labeled samples. Thanks to the deep feature extraction ability of the SAE-1 and SAE-2 encoders, high classification accuracy can still be obtained with a small training group.
The detailed flowchart of TMC-SAE training and testing is shown in Fig. 7.
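The three training phases can be summarized as a small schedule that records which parts of the network are updated in each phase. This is only a descriptive sketch: the module names are illustrative labels rather than identifiers from the authors' code, and the learning rates are the values reported in Section 4.2.

```python
# Illustrative module labels; they do not come from the paper's implementation.
ALL_MODULES = {"sae1_encoder", "sae1_decoder", "sae2_encoder", "sae2_decoder", "classifier"}

PHASES = [
    {
        "name": "SAE-1 pre-training",
        "data": "all pixel spectra",
        "uses_labels": False,
        "trainable": {"sae1_encoder", "sae1_decoder"},
        "learning_rate": 1e-3,
    },
    {
        "name": "SAE-2 pre-training",
        "data": "patches cut from the SAE-1 encoder output",
        "uses_labels": False,
        "trainable": {"sae2_encoder", "sae2_decoder"},
        "learning_rate": 1e-3,
    },
    {
        "name": "classifier training and SAE-2 encoder fine-tuning",
        "data": "small labeled training group",
        "uses_labels": True,
        "trainable": {"sae2_encoder", "classifier"},
        "learning_rate": 1e-4,
    },
]

# Modules used at test time; the decoders only serve the unsupervised phases.
INFERENCE_PATH = ["sae1_encoder", "sae2_encoder", "classifier"]


def untouched_modules(phase):
    """Modules whose weights are not updated in the given phase."""
    return ALL_MODULES - phase["trainable"]


for phase in PHASES:
    print(phase["name"], "->", sorted(phase["trainable"]))
```

Only the last phase consumes labels, which is why the first two phases can draw on far more pixels than the labeled training group provides.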

4 Experimental details

4.1 Data description

In this paper, three benchmark hyperspectral datasets with different environmental settings are adopted to validate the proposed network. The first dataset was gathered by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) instrument over a mixed vegetation site in northwestern Indiana (Indian Pines, IP). It contains $145\times 145$ pixels with 220 spectral channels covering the range from 0.4 to 2.5 $\mu m$. The second dataset was acquired over Kennedy Space Center (KSC), Florida. It consists of $512\times 614$ pixels with 176 spectral bands. There are 13 different land-cover classes in the raw dataset. The third dataset was gathered over Salinas Valley (SV), California. It contains $512\times 217$ pixels and 224 bands in the range of 0.4–2.5 $\mu m$. There are 204 bands in the corrected data after 20 water absorption bands are removed. The land-cover classes and the number of labeled pixels of each class for all datasets are listed in Table 1. The ground truth images of all datasets are shown in Fig. 8. All experiments are conducted on a computer with Intel(R) Core i7- CPU, Nvidia GeForce RTX 3090 GPU and 64 GB RAM.

Table 1 The Class labels and number of training and testing samples

4.2 Network construction

Because the three datasets have different numbers of bands, the numbers of neurons (k1 ~ k5) in the FC layers of the SAE-1 encoder differ between datasets. In general, the spectral band compression ratio of the SAE-1 encoder is about 1/8. The network structure of SAE-1 is given in Table 2. It can be seen from Table 2 that the number of trainable parameters in the encoder is much larger than that in the decoder. This asymmetric structure improves the feature extraction ability of the encoder.
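To make the asymmetry concrete, the sketch below counts weights and biases for an encoder built from five FC layers and a decoder built from three 1D deconvolution layers. All layer sizes here (`B = 200`, the widths `k1..k5`, the decoder channel counts and the kernel size 3) are hypothetical values chosen for illustration and are not the entries of Table 2; batch normalization parameters are ignored.

```python
def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def deconv1d_params(c_in, c_out, kernel):
    """Weights plus biases of a 1D (transposed) convolution layer."""
    return kernel * c_in * c_out + c_out


def encoder_params(n_bands, widths):
    """Total parameters of a stack of FC layers n_bands -> k1 -> ... -> k5."""
    sizes = [n_bands] + list(widths)
    return sum(fc_params(a, b) for a, b in zip(sizes, sizes[1:]))


def decoder_params(channels, kernel):
    """Total parameters of a stack of 1D deconvolution layers."""
    return sum(deconv1d_params(a, b, kernel) for a, b in zip(channels, channels[1:]))


B = 200                                   # hypothetical number of spectral bands
widths = [160, 120, 80, 40, B // 8]       # hypothetical k1..k5, ending at a 1/8 ratio
enc = encoder_params(B, widths)
dec = decoder_params([1, 16, 8, 1], kernel=3)   # hypothetical channel counts

print(f"k5 = {widths[-1]}, encoder parameters = {enc}, decoder parameters = {dec}")
```

With these made-up sizes the encoder holds 65425 parameters against 481 in the decoder, which shows why dense encoder layers paired with small convolutional decoder layers give a strongly encoder-heavy model.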

Table 2 Network structures of SAE-1

The parameters of all layers in SAE-2 are the same for all datasets. The structure of SAE-2 and the classifier is given in Table 3. In the SAE-2, the kernel size and stride of all layers are set to 3 and 1, respectively. The purpose of this design is to reduce the number of trainable parameters and the loss of spatial-spectral information during the training process. The activation function employed in the network is ReLU except for the last layer of the classifier. The learning rate is 0.001 for training both SAE-1 and SAE-2, and 0.0001 when the classifier is trained and the encoder of SAE-2 is fine-tuned.
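The effect of a kernel size of 3 and a stride of 1 on feature map size and parameter count can be checked with the standard convolution size formula. The padding mode is not stated in this section, so both the unpadded and the padded case are shown; the input length 11 and the channel counts are hypothetical.

```python
def conv_output_size(n, kernel=3, stride=1, padding=0):
    """Output length of a convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def stack_output_size(n, layers, kernel=3, stride=1, padding=0):
    """Output length after several identical convolution layers."""
    for _ in range(layers):
        n = conv_output_size(n, kernel, stride, padding)
    return n


def conv3d_params(c_in, c_out, kernel=3):
    """Weights plus biases of a 3D convolution layer with a cubic kernel."""
    return kernel ** 3 * c_in * c_out + c_out


# Hypothetical axis length of 11 passed through three stacked layers.
print(stack_output_size(11, layers=3))              # unpadded: shrinks by 2 per layer
print(stack_output_size(11, layers=3, padding=1))   # padded by 1: size is preserved
print(conv3d_params(c_in=1, c_out=8))               # 27 weights per kernel plus biases
```

A stride of 1 avoids skipping positions, and a 3-wide kernel keeps each layer at 27 weights per input-output channel pair, which is consistent with the stated aim of limiting both parameters and information loss.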

Table 3 Network structures of SAE-2 and classifier

5 Experimental results and analysis

5.1 Analysis of parameters

In the architecture of the TMC-SAE, the depth of SAE-1 is an important parameter for the classification performance. A series of experiments were conducted to evaluate the impact of the SAE-1 depth on the classification results. In these experiments, the depth of the SAE-1 encoder was set to eight different values from 1 to 8, and the overall accuracy (OA) was used to evaluate the classification performance of the TMC-SAE at each depth on the three datasets respectively. The experimental results are shown in Fig. 9. It can be seen that the OA first increases and then decreases as the depth of SAE-1 increases. This indicates that a deeper SAE-1 can extract more representative and deeper features but eventually encounters overfitting. Based on the experimental results, the depth of the SAE-1 encoder was set to 5.

The encoder of SAE-2 consists of 2D convolution layers and 3D convolution layers. The purpose of the 3D convolution operations is to extract joint spatial-spectral features from the data whose dimension has been reduced by SAE-1. The function of the 2D convolution operations is to extract deeper features for the classification task. In order to evaluate the effectiveness of the 3D and 2D convolution operations, an incomplete SAE without the 3D convolution branch and one without the 2D branch were used separately for classification experiments. The experimental results shown in Fig. 10 indicate that removing either the 2D or the 3D operations slightly reduces the classification accuracy.

The loss and classification accuracy convergence curves of training group are portrayed in Fig. 11. It can be seen that both curves of all datasets converge at about 200 epochs.

5.2 Visualization and analysis of SAE-1

In order to gain a detailed understanding of the SAE-1, a visualization of the spectral information is provided in this section. Spectral curves are used to visualize the features before and after extraction by SAE-1. The raw spectral curves of graminoid marsh (class 8) and spartina marsh (class 9) in KSC are shown in Fig. 12a and b. The two curves are very similar and difficult to distinguish. The feature curves extracted by SAE-1 are shown in Fig. 12c and d. These two features, whose dimensions are reduced from 175 to 20, become more discriminable and abstract.

5.3 Comparison of classification results

In this experiment, the overall accuracy (OA), average accuracy (AA) and Kappa coefficient (Kappa) are used to evaluate the classification results. In addition, the results of the proposed TMC-SAE are compared with six state-of-the-art HSI classification models, which cover unsupervised and supervised learning with different dimensions: 1D-CNN [39], 2D-CNN [36], 3D-CNN-C [6], M3D-DCNN [15], 3D-CNN-H [3] and 3D-CAE [33]. The architectures and hyperparameters of these comparative models are consistent with those given in the corresponding papers. All the models are implemented in Python with the TensorFlow library. In order to verify the feature extraction ability of the proposed model with a small number of labeled samples, the training sample percentage of each class for IP, KSC and SV is set to 5%, 5% and 1% respectively.
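For reference, the three metrics can be computed from a confusion matrix as in the sketch below, which uses the standard definitions of OA, AA (mean per-class recall) and Cohen's Kappa. The function name and the toy 2-class matrix are illustrative and do not correspond to any result in this paper.

```python
def classification_metrics(conf):
    """Return (OA, AA, Kappa) for a square confusion matrix.

    conf[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = [conf[i][i] for i in range(n)]
    row_totals = [sum(row) for row in conf]                       # true counts
    col_totals = [sum(conf[i][j] for i in range(n)) for j in range(n)]  # predicted counts

    oa = sum(correct) / total                                     # overall accuracy
    recalls = [correct[i] / row_totals[i] for i in range(n) if row_totals[i] > 0]
    aa = sum(recalls) / len(recalls)                              # average accuracy
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (oa - expected) / (1 - expected)                      # chance-corrected agreement
    return oa, aa, kappa


# Toy example with two classes of 10 test samples each.
oa, aa, kappa = classification_metrics([[8, 2], [1, 9]])
```

OA weights every sample equally, whereas AA weights every class equally, so AA is the more sensitive of the two to classes with few samples.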

The quantitative results on the IP, KSC and SV datasets are listed in Tables 4, 5 and 6 respectively. It can be observed from the three tables that the OA, AA and Kappa of the proposed TMC-SAE outperform those of all other models on all datasets. The OA of TMC-SAE reaches 92.65% for IP, 94.41% for KSC and 98.50% for SV. The proposed TMC-SAE model achieves the best accuracy for classes 1–4, 10, 11, 13 and 15 of IP, classes 1, 3 and 6–13 of KSC, and classes 1–3, 5–7, 9, 13, 15 and 16 of SV. The experimental results show that none of the per-class accuracies of the proposed TMC-SAE is markedly low, even when the training samples are very few. It can be concluded that the feature extraction capability of TMC-SAE is stronger and that this capability is enhanced by the unsupervised learning of SAE-1 and SAE-2. Figure 13 illustrates the classification maps of the IP dataset for each of the above models. The quality of the classification map of TMC-SAE is much better than that of the other models, especially for the classes with a small number of samples.

Table 4 Classification accuracy of different models over the Indian Pines dataset

Table 5 Classification accuracy of different models over the KSC dataset

Table 6 Classification accuracy of different models over the Salinas dataset

5.4 Impact of the training sample size

In this part, the effect of the training sample size on all models is explored. For the IP and KSC datasets, the percentage of training samples is set to 3%, 5%, 10%, 15% and 20%, and for the SV dataset it is set to 0.5%, 1%, 3%, 5% and 7%. Figure 14 shows the OA results for the different percentages of training samples on all datasets. As can be observed in Fig. 14, all models obtain higher classification results with a larger proportion of training samples. However, as the proportion of training samples declines, the drop in classification accuracy varies greatly between models. For the IP dataset, the OA results of 2D-CNN-N, 3D-CNN-C, 3D-CAE and TMC-SAE are similar when the percentage of training samples is 20%. However, there is a difference of more than 10 percentage points between the largest OA result (proposed TMC-SAE, 85.29%) and the smallest (M3D-DCNN, 75.11%) when the percentage of training samples is reduced to 3%. The proposed TMC-SAE model achieves the highest accuracy in all experiments with a small number of training samples. Specifically, when the proportion of training samples is 3% and 5%, the decline in classification accuracy of the proposed TMC-SAE is the smallest. For the SV dataset, when the percentage of training samples is 7%, the OA results of all methods except 1D-CNN exceed 99%. This indicates that these models can extract sufficient features for classification when there are enough training samples. When the percentage of training samples decreases, especially to 1% and 0.5%, the OA of TMC-SAE remains the highest. This indicates that the TMC-SAE maintains better feature extraction ability with a small number of training samples.
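Drawing a fixed percentage of training samples from each class can be sketched as a stratified split. The function below is an illustration only: the rounding rule, the guarantee of at least one training sample per class and the fixed random seed are assumptions, since the text does not describe how the per-class counts were rounded.

```python
import random


def stratified_split(labels, fraction, seed=0, min_per_class=1):
    """Split sample indices into train and test sets with the same fraction per class.

    labels   : class label of each labeled pixel
    fraction : share of each class used for training, e.g. 0.05 for 5%
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        # Assumed rule: round to the nearest count, but keep at least one sample.
        n_train = max(min_per_class, round(fraction * len(indices)))
        n_train = min(n_train, len(indices))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


# Toy label vector: 100 pixels of class 0 and 20 pixels of class 1, split at 5%.
labels = [0] * 100 + [1] * 20
train_idx, test_idx = stratified_split(labels, 0.05)
```

At small percentages, rare classes contribute only one or two training samples, which is the regime where the differences between the compared models become most visible.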

6 Discussion and conclusion

In this paper, a new network architecture for hyperspectral remote sensing image classification is proposed. It consists of two stacked autoencoder networks, SAE-1 and SAE-2. The SAE-1, based on 1D CNN, performs feature extraction in the spectral domain only. Its asymmetric architecture improves the feature extraction ability by giving the encoder more trainable parameters than the decoder. The SAE-2, based on 2D and 3D CNN, extracts joint spatial-spectral features from the information compressed by SAE-1. Previous networks generally include only one unsupervised learning stage in their training. In this paper, the proposed TMC-SAE is divided into two independent autoencoders, SAE-1 and SAE-2. This architecture increases the number of unsupervised training stages to two, so that the information in unlabeled samples can be extracted more fully. The experimental results on real hyperspectral images demonstrate that the proposed TMC-SAE achieves better classification results with a small number of training samples.

Data availability

The datasets analysed during the current study are available from the corresponding author on reasonable request.

Funding

This research was supported by the Guangxi Key Laboratory of Precision Navigation Technology and Application, Guilin University of Electronic Technology (No. DH202208).

Author information

Authors and Affiliations

School of Information and Communication, Guilin University of Electronic Technology, Guilin, China
Yang Bai, Xiyan Sun, Yuanfa Ji, Wentao Fu & Jinli Zhang
Guangxi Key Laboratory of Precision Navigation Technology and Application, Guilin University of Electronic Technology, Guilin, China
Yang Bai, Xiyan Sun & Yuanfa Ji
National & Local Joint Engineering Research Center of Satellite Navigation and Location Service, Guilin University of Electronic Technology, Guilin, China
Xiyan Sun & Wentao Fu

Corresponding author

Correspondence to Xiyan Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bai, Y., Sun, X., Ji, Y. et al. Two-stage multi-dimensional convolutional stacked autoencoder network model for hyperspectral images classification. Multimed Tools Appl 83, 23489–23508 (2024). https://doi.org/10.1007/s11042-023-16456-w

Received: 21 December 2022
Revised: 11 May 2023
Accepted: 06 August 2023
Published: 16 August 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11042-023-16456-w
