1 Introduction

Hyperspectral imaging technology breaks through the limitations of two-dimensional imaging by acquiring fine spectral information about a target together with its spatial image [1,2,3]. Compared with traditional imaging methods, hyperspectral imaging can accurately capture the diagnostic spectral features of a target. This makes it well suited to tasks such as pixel-level classification [4,5,6], scene classification [7], and object detection [8, 9]. Consequently, hyperspectral imaging is used in both civilian and military fields, including medicine [10, 11], agriculture [12, 13], environmental monitoring [14], and camouflaged target detection [15]. Because of limitations in imaging capability, traditional hyperspectral target detection has focused mainly on quantitative analysis of spectral information [16]. The orthogonal subspace projection (OSP) algorithm [17] projects the image onto the subspace orthogonal to the background signatures, effectively suppressing background information. Harsanyi proposed the constrained energy minimization (CEM) algorithm [18], which designs a linear filter that suppresses the background while passing the target of interest. Building on this work, detection algorithms based on the generalized likelihood ratio test (GLRT), the adaptive matched filter (AMF), and the adaptive cosine estimator (ACE) were developed [19,20,21]. Kernel-based detection algorithms combine kernel functions from machine learning with hyperspectral target detection to better exploit the nonlinear features hidden in hyperspectral data. Many effective methods of this kind have been developed, such as KMF, KMSD, KASD, and KCEM [22,23,24,25,26]. However, their application is limited because there is no established rule for selecting the kernel function.
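As a concrete illustration of the CEM filter mentioned above, the following is a minimal pure-Python sketch of its standard closed-form solution \(\mathbf{w} = R^{-1}\mathbf{d} / (\mathbf{d}^{T} R^{-1} \mathbf{d})\), where \(R\) is the sample correlation matrix of the image pixels and \(\mathbf{d}\) is the target spectrum. The helper names (`solve`, `cem_filter`, `cem_scores`) and the toy spectra are hypothetical; a practical implementation would use a numerical library and usually a regularized \(R\).

```python
# Minimal CEM sketch (pure Python, no external dependencies).
# CEM minimizes the average filter output energy w^T R w subject to w^T d = 1,
# whose closed-form solution is w = R^-1 d / (d^T R^-1 d).


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def cem_filter(pixels, d):
    """Return the CEM weight vector for target spectrum d, given a list of pixel spectra."""
    n_bands, n_pix = len(d), len(pixels)
    # Sample correlation matrix R = (1/N) * sum_i x_i x_i^T (no mean removal).
    R = [[sum(p[i] * p[j] for p in pixels) / n_pix for j in range(n_bands)]
         for i in range(n_bands)]
    r_inv_d = solve(R, d)                               # R^-1 d
    denom = sum(di * vi for di, vi in zip(d, r_inv_d))  # d^T R^-1 d
    return [v / denom for v in r_inv_d]


def cem_scores(pixels, d):
    """Filter output y = w^T x for every pixel; the target spectrum itself maps to 1."""
    w = cem_filter(pixels, d)
    return [sum(wi * xi for wi, xi in zip(w, p)) for p in pixels]


# Toy 3-band scene: four background pixels plus one pixel equal to the target.
target = [0.6, 0.2, 0.1]
scene = [[0.2, 0.3, 0.4], [0.25, 0.35, 0.38], target, [0.22, 0.31, 0.41], [0.5, 0.5, 0.2]]
scores = cem_scores(scene, target)
```

Because of the constraint \(\mathbf{w}^{T}\mathbf{d} = 1\), the pixel equal to `target` scores 1 (up to rounding), while the filter keeps the average output energy over the whole scene as low as possible.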

In recent years, with the development of statistical pattern recognition and deep learning on big data, data-driven target detection algorithms have begun to emerge. Deep learning-based methods can extract deep features of targets and play an important role in many stages of hyperspectral image processing, such as denoising [27], mixed-pixel decomposition [28], and classification [29]. Although deep learning-based hyperspectral target detection has made breakthroughs, in most studies the samples used to train the neural network are drawn from the same hyperspectral image as the test data. In principle, the target spectra used in supervised target detection should come from a spectral library rather than from the test image itself. This practice therefore sidesteps, to some extent, the problem of spectral uncertainty, which limits the practical application of hyperspectral target detection technology. Some researchers have recognized this problem and have used transfer learning to address it [30, 31]. Overall, most existing research has focused on improving the structure of deep neural networks, whereas this paper focuses more on the practical application of hyperspectral target detection technology.

In the proposed method, multi-scale spectral feature extraction is adopted during preprocessing to reduce the influence of spectral uncertainty on target detection, and a 3D–2D CNN is used to extract comprehensive spatial and spectral information. In addition, because the backgrounds of targets in practical applications are complex and variable, the proposed method uses multiple spectral similarity measures in place of conventional dimensionality reduction. This converts the raw spectral features of the background into their degree of similarity to the target, reducing the impact of complex backgrounds on target detection. The proposed method offers a reference for the practical application of hyperspectral target detection and lays a foundation for real-time hyperspectral target detection technology.

2 Method and principle

2.1 Hyperspectral images under land-based imaging conditions

Land-based hyperspectral imaging systems are hyperspectral imaging systems mounted on ground or near-ground platforms. Land-based hyperspectral imaging differs greatly from traditional remote sensing imaging in its imaging platform, imaging environment, and target spectral characteristics. Li et al. analyzed the factors affecting spectral reflectance under land-based imaging conditions and, using a controlled-variable approach, studied the effects of the solar zenith angle, the detection zenith angle, and the relative azimuth angle on the spectral reflectance of ground features [32]. Figure 1 compares hyperspectral images acquired under the two imaging conditions, illustrating the characteristics of each.

Fig. 1

Remote sensing and land-based hyperspectral images after PCA. a–c First, second, and third principal component images of the remote sensing hyperspectral image; d–f first, second, and third principal component images of the land-based hyperspectral image

In fact, the principle of hyperspectral imaging is the same on any imaging platform. The reflectance of ground objects changes continually with the external environment, so the spectra of ground objects are uncertain. This spectral uncertainty means that a target cannot be characterized by a single, unique spectral curve, which makes accurate detection and recognition difficult. Compared with remote sensing images, the imaging process of land-based hyperspectral images is relatively simple, and the underlying patterns of spectral variation are easier to analyze. The bidirectional reflectance distribution function (BRDF) model is often used to analyze how ground reflectance varies [33]. Regarding real-time requirements, remote sensing hyperspectral imaging involves a relatively long chain of acquisition, storage, correction, data processing, and long-distance transmission, so its real-time performance is poor. Land-based hyperspectral imaging, by contrast, uses near-ground platforms that do not require complex processing such as atmospheric correction, which makes it better suited to applications with high real-time requirements.

2.2 Spectral similarity evaluation indicators and stacking learning

Generally speaking, spectral similarity measures are divided into distance-based, projection-based, information-measure-based, and statistics-based methods. Common measures include the spectral angle, the Euclidean distance, and the correlation coefficient. These measures are simple and computationally cheap, but each evaluates similarity from only one perspective and ignores the fact that different bands contribute differently to similarity, so on their own their practicality is very limited. Equations (1)–(3) define the spectral angle (SAM), the normalized Euclidean distance (NED), and the spectral correlation coefficient (CC), respectively [34, 35], where \(X = (x_{1}, \ldots, x_{n})\) and \(Y = (y_{1}, \ldots, y_{n})\) denote the spectral reflectance vectors of two targets.

$$\theta = \arccos \frac{{\sum\limits_{i = 1}^{n} {x_{i} \cdot y_{i} } }}{{\sqrt {\sum\limits_{i = 1}^{n} {x_{i}^{2} } } \cdot \sqrt {\sum\limits_{i = 1}^{n} {y_{i}^{2} } } }}$$
(1)

where \(\theta\) is the generalized angle between X and Y, and n is the number of spectral bands. A smaller \(\theta\) indicates greater similarity in the shape of the two spectral curves.

$$S = \sqrt {\frac{1}{n - 1}\sum\limits_{i = 1}^{n} {\left( {x_{i} - y_{i} } \right)^{2} } }$$
(2)

where \(S\) is the normalized distance between X and Y, and n is the number of spectral bands. A smaller \(S\) indicates that the two spectra are closer in magnitude and therefore more similar.

$$r = \frac{{\sum\limits_{i = 1}^{n} {\left( {x_{i} - \overline{x} } \right)\left( {y_{i} - \overline{y} } \right)} }}{{\sqrt {\sum\limits_{i = 1}^{n} {\left( {x_{i} - \overline{x} } \right)^{2} } } \cdot \sqrt {\sum\limits_{i = 1}^{n} {\left( {y_{i} - \overline{y} } \right)^{2} } } }}$$
(3)

where \(r\) is the correlation coefficient between X and Y, and \(\overline{x}\) and \(\overline{y}\) are the means of their elements over the n bands. A larger \(r\) indicates a stronger correlation between X and Y.
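Equations (1)–(3) translate directly into code. The following minimal pure-Python sketch implements the three measures for two spectra given as lists of reflectance values; the function names and the toy spectra are hypothetical, and the correlation coefficient is computed as the standard Pearson coefficient.

```python
import math


def spectral_angle(x, y):
    """Eq. (1): angle in radians between spectra x and y; smaller means more similar shape."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] so tiny floating-point overshoot cannot break acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def normalized_euclidean(x, y):
    """Eq. (2): Euclidean distance normalized by (n - 1); smaller means more similar."""
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / (n - 1))


def correlation_coefficient(x, y):
    """Eq. (3): Pearson correlation coefficient; larger means more similar."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# A spectrum and a uniformly scaled (brighter) copy of it.
x = [0.1, 0.2, 0.3, 0.4]
y = [0.2, 0.4, 0.6, 0.8]
print(spectral_angle(x, y), normalized_euclidean(x, y), correlation_coefficient(x, y))
```

For this pair the spectral angle is numerically 0 and the correlation coefficient is 1, because both ignore a uniform scaling of the spectrum, whereas the normalized Euclidean distance is about 0.316 because it is sensitive to amplitude. This is one concrete reason why the measures capture different aspects of similarity.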

Different similarity measures capture different factors, such as differences in amplitude, spectral angle, curve shape, and the internal variation of the spectral vectors. Combining multiple measures therefore performs better and discriminates more strongly than using any single measure.

Stacking learning integrates several different weak learners by training a metamodel that combines their outputs into the final prediction [36]. Figure 2 shows the basic flowchart of stacking learning. Drawing on this core idea, this article uses different similarity measures to obtain the input feature maps and then uses the 3D–2D model as the metamodel to jointly exploit spatial-spectral information.
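The stacking idea can be illustrated with a toy pure-Python sketch. Two simple similarity measures act as fixed level-0 "base learners", their per-pixel outputs are stacked into one feature vector per pixel, and a logistic-regression metamodel is trained on those vectors. The logistic regression is only a stand-in for the 3D–2D CNN used in this article, and all names and data are hypothetical.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def stack_features(pixels, target, measures):
    """Level 0: apply every base measure to every pixel -> one feature vector per pixel."""
    return [[m(p, target) for m in measures] for p in pixels]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_meta_model(features, labels, lr=0.5, epochs=2000):
    """Level 1: logistic-regression metamodel fitted by stochastic gradient descent."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, t in zip(features, labels):
            g = sigmoid(b + sum(wi * fi for wi, fi in zip(w, f))) - t  # d(log-loss)/d(logit)
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(model, f):
    w, b = model
    return sigmoid(b + sum(wi * fi for wi, fi in zip(w, f)))


target = [0.6, 0.2, 0.1]
pixels = [[0.55, 0.22, 0.12], [0.65, 0.18, 0.09],                # target-like
          [0.2, 0.3, 0.4], [0.25, 0.35, 0.38], [0.3, 0.5, 0.2]]  # background
labels = [1, 1, 0, 0, 0]
feats = stack_features(pixels, target, [euclidean, cosine])
model = train_meta_model(feats, labels)
scores = [predict(model, f) for f in feats]
```

The metamodel learns how much weight each base measure deserves, which is the role the 3D–2D CNN plays on the stacked similarity maps in the proposed framework.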

Fig. 2

Basic flowchart of stacking learning

2.3 3D convolution and multi-channel 2D convolution

3D convolution and multi-channel 2D convolution may appear to operate on the same number of channels, but they differ fundamentally. Figure 3 shows the principle of 3D convolution, in which the depth (number of channels) of the input must be greater than that of the kernel. The 3D filter moves in all three directions (height, width, and channel) and computes a sum of element-wise products at each position. Because the filter slides through a 3D space, the output values are also arranged in 3D, so the output is itself 3D data. 3D convolution can extract multidimensional features simultaneously and has been widely used in many fields [37,38,39]. In hyperspectral image processing, 3D convolution can extract the spatial and spectral information of targets simultaneously, effectively improving classification and detection performance [40].

Fig. 3

The principle of 3D convolution

In multi-channel 2D convolution, the kernel has the same number of channels as the input, and the products over all channels are summed into a single value. As shown in Fig. 4, the kernel can therefore move only along the height and width of the input, so the main function of 2D convolution is to extract spatial information.
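The difference between the two operations is easiest to see in the output shapes. The following pure-Python sketch implements both as "valid" (unpadded, stride-1) convolutions, computed as cross-correlations as in most CNN frameworks, on a volume indexed as `vol[height][width][channel]`. The function names and sizes are illustrative assumptions.

```python
def conv3d_valid(vol, k):
    """3D convolution: a KH x KW x KC kernel slides along height, width AND channel,
    so the output keeps a (shrunken) channel axis."""
    H, W, C = len(vol), len(vol[0]), len(vol[0][0])
    KH, KW, KC = len(k), len(k[0]), len(k[0][0])
    return [[[sum(vol[i + a][j + b][c + d] * k[a][b][d]
                  for a in range(KH) for b in range(KW) for d in range(KC))
              for c in range(C - KC + 1)]
             for j in range(W - KW + 1)]
            for i in range(H - KH + 1)]


def conv2d_multichannel_valid(vol, k):
    """Multi-channel 2D convolution: the kernel spans all C input channels and slides
    only along height and width; products over channels are summed into one value."""
    H, W, C = len(vol), len(vol[0]), len(vol[0][0])
    KH, KW = len(k), len(k[0])
    assert len(k[0][0]) == C, "2D kernel depth must equal the number of input channels"
    return [[sum(vol[i + a][j + b][c] * k[a][b][c]
                 for a in range(KH) for b in range(KW) for c in range(C))
             for j in range(W - KW + 1)]
            for i in range(H - KH + 1)]


def shape(t):
    """Shape of a nested list, e.g. (3, 3, 4)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


def ones(*dims):
    """Nested list of 1.0 with the given shape."""
    if len(dims) == 1:
        return [1.0] * dims[0]
    return [ones(*dims[1:]) for _ in range(dims[0])]


vol = ones(5, 5, 6)                                    # 5 x 5 spatial, 6 spectral channels
out3d = conv3d_valid(vol, ones(3, 3, 3))               # shape (3, 3, 4)
out2d = conv2d_multichannel_valid(vol, ones(3, 3, 6))  # shape (3, 3)
```

The 3D output keeps a spectral axis of length 6 - 3 + 1 = 4, so later layers can still learn spectral patterns, whereas the multi-channel 2D output has collapsed the spectral axis into a single spatial map per kernel.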

Fig. 4

Principle of multi-channel 2D convolution

2.4 Proposed method

As shown in Fig. 5, this article proposes an application-oriented hyperspectral target detection framework for detecting specific targets at different times and against different backgrounds. The method consists of a training stage and a detection stage. The training stage can be roughly divided into three steps: acquisition of hyperspectral data, image preprocessing, and model parameter training.

Fig. 5

Overall flowchart of the proposed method

Acquisition of hyperspectral data: At present, land-based hyperspectral images are mainly acquired with tripod-mounted field imaging spectrometers or with imaging spectrometers carried by unmanned aerial vehicles. For the training stage, the acquired hyperspectral images should be both diverse and representative. In practical applications, emphasis should be placed on acquiring hyperspectral data in real time.

Image preprocessing: Preprocessing converts the original hyperspectral images into primary feature maps and mainly includes two steps: multi-scale spectral feature extraction and spectral similarity calculation. Multi-scale spectral feature extraction makes full use of the spectral information in all bands and is widely used in hyperspectral image processing tasks [41, 42]. During multi-scale feature extraction, similarities are computed on spectral vectors taken at different scales and steps, yielding the feature maps.
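The text does not specify exactly how the scales and steps are defined, so the following pure-Python sketch rests on one hypothetical interpretation: each `(scale, step)` pair cuts the spectrum into contiguous band windows of length `scale`, starting every `step` bands, and a similarity measure between each pixel and the target is computed on every window, giving one feature map per window. The maps are then stacked into an H × W × C cube. The window scheme, all names, and the toy measure are assumptions.

```python
def band_windows(n_bands, scale, step):
    """Hypothetical scheme: contiguous windows of `scale` bands, starting every `step` bands."""
    return [(s, s + scale) for s in range(0, n_bands - scale + 1, step)]


def multiscale_maps(image, target, measure, configs):
    """image[row][col] is a spectrum; returns one 2D similarity map per band window."""
    maps = []
    for scale, step in configs:
        for lo, hi in band_windows(len(target), scale, step):
            t = target[lo:hi]
            maps.append([[measure(px[lo:hi], t) for px in row] for row in image])
    return maps


def stack_maps(maps):
    """Stack a list of H x W maps into an H x W x C cube (channel-last)."""
    H, W = len(maps[0]), len(maps[0][0])
    return [[[m[r][c] for m in maps] for c in range(W)] for r in range(H)]


def mean_abs_diff(x, y):
    """Toy similarity measure (smaller = more similar); stands in for SAM, NED, or CC."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


target = [0.6, 0.2, 0.1, 0.3]
image = [[target, [0.2, 0.3, 0.4, 0.5]],
         [[0.5, 0.25, 0.1, 0.3], [0.1, 0.1, 0.1, 0.1]]]  # 2 x 2 pixels, 4 bands
configs = [(4, 4), (2, 2)]                               # whole spectrum + two halves
maps = multiscale_maps(image, target, mean_abs_diff, configs)
cube = stack_maps(maps)                                  # 2 x 2 x 3
```

Applying the same procedure with three similarity measures and ten maps each would give the 30-channel input used in the experiments.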

Model parameter training: The feature extraction network used in this article is inspired by HybridSN [43]. The hyperspectral image data cube is divided into small, overlapping 3D patches, each labeled with the ground-truth label of its center pixel. The network structure is given in Table 1.
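The patch extraction described above can be sketched in a few lines of pure Python: every pixel yields one window × window × C patch centered on it and labeled with the center pixel's label, so neighboring patches overlap. Zero-padding at the image border and an odd window size are assumptions (zero-padding is a common choice, not something the text specifies), and the names are hypothetical.

```python
def extract_patches(cube, labels, window):
    """cube[r][c] is a list of C features, labels[r][c] an int.
    Returns one window x window x C patch per pixel plus the center-pixel labels."""
    assert window % 2 == 1, "odd window assumed so that a single center pixel exists"
    H, W, C = len(cube), len(cube[0]), len(cube[0][0])
    m = window // 2  # margin around the center pixel

    def pixel(r, c):
        # Zero-padding outside the image (an assumption, not specified in the text).
        return list(cube[r][c]) if 0 <= r < H and 0 <= c < W else [0.0] * C

    patches, patch_labels = [], []
    for r in range(H):
        for c in range(W):
            patches.append([[pixel(r + dr, c + dc) for dc in range(-m, m + 1)]
                            for dr in range(-m, m + 1)])
            patch_labels.append(labels[r][c])  # label of the center pixel
    return patches, patch_labels


# 4 x 4 image with 2 feature channels; one target pixel at (1, 2).
cube = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
labels = [[1 if (r, c) == (1, 2) else 0 for c in range(4)] for r in range(4)]
patches, patch_labels = extract_patches(cube, labels, window=3)
```

With a 25 × 25 window, as in Table 1, each patch would carry the local spatial context of its center pixel across all similarity channels.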

Table 1 Layer-wise summary of the proposed architecture with window size 25 × 25
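The contents of Table 1 are not reproduced in this text. As a rough guide to how feature-map sizes evolve in a 3D–2D network of this kind, the following sketch tracks layer output shapes. It assumes the layer configuration of the original HybridSN [43] (3D convolutions with 8, 16, and 32 kernels of size 3 × 3 × 7, 3 × 3 × 5, and 3 × 3 × 3, then one 2D convolution with 64 kernels of size 3 × 3, all unpadded with stride 1). It also assumes 25 × 25 patches with 30 channels, matching the 3 × 10 similarity feature maps used in the experiments. The actual layer sizes of the proposed network may differ.

```python
def hybrid_shapes(h, w, depth, conv3d, conv2d):
    """Track output shapes of 'valid' (unpadded, stride-1) 3D then 2D convolutions."""
    shapes = [("input", (h, w, depth, 1))]
    for filters, (kh, kw, kd) in conv3d:
        h, w, depth = h - kh + 1, w - kw + 1, depth - kd + 1
        shapes.append(("conv3d", (h, w, depth, filters)))
    # Fold the remaining spectral depth into channels so 2D convolutions can follow.
    channels = depth * shapes[-1][1][3]
    shapes.append(("reshape", (h, w, channels)))
    for filters, (kh, kw) in conv2d:
        h, w, channels = h - kh + 1, w - kw + 1, filters
        shapes.append(("conv2d", (h, w, channels)))
    shapes.append(("flatten", (h * w * channels,)))
    return shapes


# Assumed HybridSN-style layers (not taken from Table 1).
layers_3d = [(8, (3, 3, 7)), (16, (3, 3, 5)), (32, (3, 3, 3))]
layers_2d = [(64, (3, 3))]
for name, s in hybrid_shapes(25, 25, 30, layers_3d, layers_2d):
    print(f"{name:8s} {s}")
```

Under these assumptions the last 3D layer outputs 19 × 19 × 18 × 32, which is reshaped to 19 × 19 × 576 before the 2D convolution, and the flattened vector fed to the dense layers has 18,496 elements.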

In the detection stage, the trained model is used to obtain the detection results. To cope with the spectral uncertainty that the target exhibits under different conditions, the proposed framework employs a combination of spectral similarity measures. To make full use of the spectral information, multi-scale spectral feature extraction is performed during preprocessing. The 3D–2D CNN model extracts deep information from both the spatial and spectral dimensions of the target, which further enhances the stability and detection performance of the framework.

2.5 Transfer learning and application system design

Transfer learning improves learning on a new task by transferring knowledge from a related task that has already been learned. Here, the proposed preprocessing method is used to extract target information, and the trained model is applied to subsequent target detection tasks. This part designs the application system from both the software and hardware perspectives, as shown in Fig. 6. On the software side, a practical solution is proposed to overcome the problems encountered, and the trained model is then encapsulated for deployment on hardware. On the hardware side, the goals are faster image processing and real-time imaging and display. Classical hyperspectral target detection algorithms focus on designing better filters to extract targets, whereas the system designed here emphasizes convenience and real-time performance in use, making it more practical.

Fig. 6

Design of application system for hyperspectral target detection

3 Experiments

Both a publicly available hyperspectral image dataset and a measured hyperspectral image dataset were used in the experiments. During preprocessing, each similarity measure uses multi-scale spectral information to produce 10 feature maps; the preprocessing steps are shown in Fig. 7. The experiments first verify the effectiveness of the proposed method and then analyze the stability of different target detection algorithms under spectral uncertainty; finally, the results are discussed. The test times of the different algorithms are also compared.

Fig. 7

Steps for sample data preprocessing

3.1 Experimental data

The public dataset used in the experiments consists of two sub-images of the San Diego dataset, which is widely used in target detection tasks. The full image has 400 × 400 pixels, each with 224 bands, a spectral coverage of 0.4 to 1.8 μm, and a spatial resolution of 3.5 m. The upper-left 100 × 100 pixels were used as the training set, denoted "data1," and the central 200 × 200 pixels were used as the detection image, denoted "data2," as shown in Fig. 8. The corresponding mean target spectra are denoted "target1" and "target2." The aircraft targets in the two sub-images are regarded as the same type of target whose spectra differ because of spectral uncertainty.

Fig. 8

Publicly available hyperspectral image data

The measured data were captured in an area of Shijiazhuang City, Hebei Province, China, with targets made of the same material as the aircraft model, as shown in Fig. 9. The two datasets are denoted "data3" and "data4," and the corresponding mean target spectra are denoted "target3" and "target4." "data3" was acquired on March 10, 2023, with an image size of 400 × 170 pixels, and "data4" was acquired on March 16, 2023, with an image size of 500 × 300 pixels.

Fig. 9

Actual measured hyperspectral image data

3.2 Experiments on publicly available datasets

In the experiment on the public dataset, "data1" is used as training data and "data2" as the data to be detected. First, the three spectral similarity measures between "target1" and each pixel of "data1" are computed to characterize the target from different aspects. These pre-matching results, based on the similarity between the spectral vector at each pixel of "data1" and the target spectral vector, form the preliminary feature maps. In Fig. 10, each row shows the results of one similarity measure at different scales. The resulting 100 × 100 × 30 preprocessed data (3 similarity measures × 10 multi-scale feature maps) are used as the training input of the neural network, with the corresponding labels as the output.

Fig. 10

The images obtained by similarity matching "data1" with "target1"

The advantage of the proposed method is that it does not require prior knowledge of the spectrum of the target to be detected; instead, it uses transfer learning to detect targets. Therefore, when detecting targets in the public dataset "data2," "target1" is used as prior information for preprocessing, as shown in Fig. 11.

Fig. 11

The images obtained by similarity matching "data2" with "target1"

Through the above processing, the inputs for training and detection are obtained. Throughout the detection process, the proposed method does not use "target2"; instead, it transfers "target1" to the detection task. The detection maps obtained by the different algorithms when "target1" is used to detect targets in "data2" are shown in Fig. 12. Figure 13 shows the ROC curves of the different algorithms, Table 2 lists their AUC values, and Table 3 lists their test times.
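The AUC values reported here summarize each ROC curve in a single number. One standard way to compute the AUC, shown below as a minimal pure-Python sketch with hypothetical names and toy scores, uses its equivalence to the probability that a randomly chosen target pixel receives a higher detection score than a randomly chosen background pixel (ties count one half). This pairwise form is quadratic in the number of pixels; for full images a sort-based (rank) computation would normally be used instead.

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.
    labels: 1 for target pixels, 0 for background pixels."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one target and one background pixel")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # target ranked above background
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy detection scores: one target pixel is ranked below one background pixel.
scores = [0.9, 0.2, 0.3, 0.1]
labels = [1, 1, 0, 0]
print(auc(scores, labels))  # 0.75
```

An AUC of 0.99 or above, as reported for the proposed method, therefore means that almost every target pixel scores higher than almost every background pixel.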

Fig. 12

The detection results of different algorithms for "data2"

Fig. 13

The ROC curves obtained by detecting "data2" using different algorithms

Table 2 The AUC values obtained by detecting "data2" using different algorithms
Table 3 The test times (s) obtained by detecting "data2" using different algorithms

Even though "target1" is used to detect "data2," all of the detection methods are effective to some degree, and the proposed method reaches an AUC value of 0.99 or above. GLRT and SACE perform poorly, indicating that these two algorithms cope poorly with spectral uncertainty in target detection. OSP and KCEM remain reasonably stable in the detection task. In terms of test time, OSP is the fastest, while the proposed method takes longer.

3.3 Experiments on measured datasets

In the experiment on the measured land-based hyperspectral data, "data3" is used as training data and "data4" as the data to be detected. The preprocessing steps are the same as those for the public dataset and are not repeated here. The detection maps obtained by the different algorithms when "target3" is used to detect targets in "data4" are shown in Fig. 14. The ROC curves of the different algorithms are shown in Fig. 15, their AUC values are listed in Table 4, and their test times are listed in Table 5.

Fig. 14

The detection results of different algorithms for "data4"

Fig. 15

The ROC curves obtained by detecting "data4" using different algorithms

Table 4 The AUC values obtained by detecting "data4" using different algorithms
Table 5 The test times (s) obtained by detecting "data4" using different algorithms

The results on the measured hyperspectral data show that the proposed method still maintains a high AUC value and good detection performance. Judging by AUC alone, GLRT, SACE, and KCEM remain somewhat effective for target detection in land-based hyperspectral images, but only to a very limited extent. OSP performs worst, indicating that it copes poorly with spectral uncertainty when detecting targets in land-based hyperspectral images.

3.4 Analysis of stability of different methods

To investigate the detection capability of the various algorithms on remote sensing and land-based hyperspectral images, and to analyze their stability under spectral uncertainty, the results in all scenarios are compared. Figure 16 shows the detection maps in the various scenarios, and Fig. 17 shows the AUC values of the detection methods.

Fig. 16

The detection results of different algorithms in various situations

Fig. 17

The AUC values obtained from different algorithms in various situations

In terms of detection capability, the OSP algorithm performs well on the public remote sensing hyperspectral images and copes well with spectral uncertainty there. However, it is clearly unsuitable for target detection in land-based hyperspectral images. Among the comparison methods, SACE and GLRT perform worse overall than KCEM, possibly because KCEM exploits nonlinear features, which improves its detection performance and stability. The proposed method achieves AUC values above 0.99 on all of the hyperspectral images tested, demonstrating both high accuracy and stability.

3.5 Experimental results and discussion

The above experiments fall into two parts: one based on a publicly available hyperspectral dataset and the other based on measured hyperspectral data. Both parts adopt the idea of target spectrum transfer. Compared with the other hyperspectral target detection algorithms, the proposed method shows outstanding detection performance and stability. In summary, the experimental results support the following conclusions.

  1. Land-based and remote sensing hyperspectral images differ significantly from an image processing perspective, each exhibiting its own characteristics in both the spatial and spectral dimensions. Early target detection algorithms were all designed for remote sensing hyperspectral images, and the experiments show that they are not fully applicable to target detection in land-based hyperspectral images. For example, the OSP algorithm achieves good results on remote sensing hyperspectral images but performs poorly on land-based hyperspectral images.

  2. The proposed framework is largely unaffected by spectral uncertainty, and its detection results are more stable than those of the other supervised target detection algorithms. It combines the idea of ensemble learning with deep convolutional neural networks to effectively extract invariant features of the target spectrum, and it shows outstanding detection performance and robustness on both remote sensing and land-based hyperspectral images.

  3. Many problems in target detection for land-based hyperspectral images remain to be solved. Compared with remote sensing hyperspectral images, land-based hyperspectral images capture more detailed spatial structure of targets, but they also contain richer and more complex information. For the measured hyperspectral data, although all AUC values are very high, the detection maps show that the false alarm rate of the framework remains high. This is partly because the training samples are relatively simple in composition and partly because of the characteristics of land-based hyperspectral images themselves, such as target shadows. In addition, the test times indicate that the real-time performance of the framework still needs improvement.

4 Conclusion

An application-oriented land-based hyperspectral target detection framework based on a 3D–2D CNN and transfer learning was proposed in this study, and experiments demonstrated its detection performance and robustness. The framework combines ensemble learning, transfer learning, multi-scale spectral feature extraction, a 3D–2D CNN model, and other strategies. The experiments confirm that the proposed target detection algorithm is robust across different types of hyperspectral images. Several aspects merit further development. First, based on an analysis of the spectral characteristics of land-based hyperspectral images, more invariant spectral features should be sought to address spectral uncertainty. Second, the selection of training samples should balance quantity and representativeness. Third, there is considerable room to improve the real-time performance of the proposed method; the feature extraction network should be refined to increase detection speed and find the best trade-off between speed and effectiveness. Fourth, multiple evaluation metrics should be used to assess target detection effectiveness. Real-time imaging devices are now available, and the proposed method provides a feasible solution for future target detection applications in land-based hyperspectral images.