1 Introduction

Synthetic aperture radar (SAR) with very high-resolution images plays a crucial role in automatic target recognition (ATR) because of its ability to operate in all weather conditions, day and night, in applications such as homeland security, surveillance and military tasks [1,2,3,4,5].

Moving and stationary target acquisition and recognition (MSTAR), a standard SAR-ATR database [6], is used for testing and validating different algorithms. Because of the noisy background of SAR images, various preprocessing techniques have been introduced in the literature to extract the useful information [7,8,9]. Furthermore, studies have shown that the shadow parts have a great effect on detection accuracy alongside the target information; hence, feature fusion based on both parts is recommended in [10].

Fig. 1 Block diagram of the proposed method

Different approaches to feature extraction have been introduced for SAR image target recognition. Linear discriminant analysis (LDA), principal component analysis (PCA) and independent component analysis (ICA) have been commonly used in pattern recognition [11,12,13,14]. The problem with these techniques is that they are generally very sensitive to speckle noise [15] and are rotation variant. To overcome these problems, moment-based descriptors can be utilized as effective region-based shape descriptors. Hu's invariant moments are the simplest method for generating shape descriptors [16]. Although they are rotation invariant, they suffer from a high degree of information redundancy since their bases are not orthogonal [17]. In addition, higher-order moments are noise sensitive. To avoid these problems, the Zernike moment (ZM) was suggested as a continuous orthogonal moment method and was used in [18]. Zernike polynomials are rotation invariant, robust to speckle noise and have minimal information redundancy since the basis is orthogonal.

However, several drawbacks remain with this method. As mentioned before, Zernike moments are defined as a continuous function; hence, for a digital image, an approximation by discrete summation is required, which leads to numerical errors in the computation of the moments. Moreover, this approximation can affect properties such as orthogonality and rotational invariance. Zernike moments are also defined on the unit disc \(\left( x^2+y^2\right) \le 1\), which requires an appropriate transformation of the image coordinate space and thereby increases the computational complexity [19]. To overcome these problems, a discrete orthogonal moment method called the radial Chebyshev moment (RCM) is introduced, which eliminates both the computational complexity due to normalization and the computational error caused by approximation [20].

In this paper, three types of segmentation are applied to generate areas of interest: segmented region, segmented boundary and segmented texture, to be used in the feature extraction process. This approach is adopted for both the target and shadow areas of the input image. As feature extractors, ZM and RCM are employed to generate region feature descriptors. Finally, the region descriptors are fused by concatenating the feature vectors into longer descriptors to be used in a support vector machine (SVM) classifier. Results show that, for both feature extraction methods, the total accuracy obtained from the fused segmentations of the target and shadow parts improves significantly. A further comparison between ZM and RCM reveals that the accuracy of RCM is higher than that of ZM by 8%. In addition to the improvement gained by using RCM instead of ZM, fusing the feature descriptors obtained from the segmented areas improves the performance by another 6%. This paper makes three contributions.

The first contribution of the paper is the utilization of RCM, for the first time in the literature, for target recognition in SAR images. RCM is introduced as an alternative feature extraction technique that overcomes the drawbacks of the ZM method. Because it eliminates the computational complexity and computational errors explained earlier, the accuracy is improved significantly.

The second contribution involves adding the shadow part of the target as an extra source of features to improve SAR recognition. Shadow parts are areas on the ground that are not covered by the radar signal; as a result, no return signal is received, and these areas appear dark in SAR images. This property is utilized to improve the total accuracy.

The third and final contribution concerns feature fusion. An input SAR image segmented with different techniques can be represented by fusing the region descriptors of the segmented images, which improves the overall accuracy.

2 Methodology

The details of the proposed method are given in Fig. 1. Each SAR image contains a target, a shadow and a noisy background. Our aim is to remove the noisy background while preserving the target and shadow parts. Histogram equalization and an average filter are used to remove the background, and two different threshold values separate the target and shadow regions. Combining target and shadow yields a third part to be considered; therefore, a SAR image is categorized into three different parts: target, shadow and \(\hbox {target}+\hbox {shadow}\). Each target, shadow or \(\hbox {target}+\hbox {shadow}\) part is segmented into three different objects: the segmented region (SR), which refers to the binary shape region; the segmented boundary (SB), which indicates the boundary area; and the segmented texture (ST), which extracts the whole texture of the region of interest. SR corresponds to the mask covering the region of interest after background removal, as visually illustrated in Fig. 1. SB is obtained by applying a Sobel filter followed by dilation to SR, as can be seen in Fig. 2. ST is the multiplication of the original image by SR. RCM is introduced in addition to ZM for feature extraction from the given SAR images. For each segmented object, 100 features are extracted, and the feature vectors from the three segmented objects are merged to form a vector of 300 features. For a single SAR image, the target, shadow and \(\hbox {target}+\hbox {shadow}\) parts, each with a vector of 300 features, are fused by concatenation, resulting in a final feature vector of 900 features to be used in classification. We choose the library support vector machine (LIBSVM), a standard support vector machine (SVM) classifier, with tenfold cross validation.

Fig. 2 Segmentation method for target detection

2.1 Segmentation process

SAR images have a very noisy background, which should be removed before further processing. All SAR images in the MSTAR dataset are therefore processed with histogram equalization, an average filter, thresholding, a Sobel filter and dilation to remove the noisy background, as shown in Fig. 2.

Histogram equalization is the first step of the segmentation process. After applying it, the output pixel values are distributed approximately uniformly over the interval [0, 1]. The equalized image is then passed through an average filter, which smooths the image to reduce noise artifacts. The mask size of the average filter was chosen to be \(11\times 11\).

Fig. 3 a Original image. b Histogram equalization. c Average filter. d Threshold for target detection. e Sobel filter. f Dilation

Thresholding is next applied to the smoothed image. As discussed in the previous section, it is essential to extract the edges of both the target and shadow parts; therefore, different threshold levels must be applied to obtain each part. Shadow parts are areas on the ground that are not scanned by the radar signals due to reflections; the natural result is that no return signal is received, and these areas appear dark in the SAR images. Two thresholds are adopted in this paper, \(\tau \) and \(\xi \), for the segmentation of the target and shadow parts, respectively. A gray-level threshold is defined as a constant between 0 and 1. To detect the target, which is brighter in the image as illustrated in Fig. 3c, a constant closer to 1 is required. In [7], \(\tau \) is chosen to be 0.8, and our experimental results validate the effectiveness of \(\tau = 0.8\) for efficient segmentation. On the other hand, to detect the shadow parts, which cover the darker areas of the image, a constant \(\xi \) closer to zero should be chosen; in this paper, \(\xi \) is chosen to be 0.2 for effective shadow segmentation. The thresholded images at this stage can be considered the segmentation regions/masks corresponding to the target and shadow parts, respectively. Combining the target and shadow masks forms the region of interest (ROI) for further processing. The binary ROI image, used as a mask, is multiplied with the input image to generate the segmented texture containing the texture of the target as well as that of the shadow. In the next step, a Sobel filter [7] is applied to perform edge detection on the mask image, and dilation by a \(2\times 2\) structuring element [7] is used to connect the disconnected edges and emphasize the boundaries. In this way, the edge boundaries of the target and shadow parts are extracted for further processing. Figure 3 illustrates the adopted process of extracting the target and shadow boundaries of the image (hb03787.004) from the BTR70 group (armored personnel carrier) with serial number SN-C71.
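For illustration, the following Python sketch reproduces this segmentation chain and derives the SR, SB and ST objects. The original experiments were run in MATLAB, so the library choices here (scikit-image and SciPy) are our assumption; only the threshold values \(\tau = 0.8\) and \(\xi = 0.2\) and the filter sizes come from the text.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from skimage import exposure, filters, morphology

def segment_sar(image, tau=0.8, xi=0.2):
    """Segment a SAR chip into SR, SB and ST as described in Sect. 2.1.

    image: 2-D float array scaled to [0, 1] (e.g. a 128x128 MSTAR chip).
    tau, xi: gray-level thresholds for the target and shadow parts.
    """
    # Step 1: histogram equalization spreads pixel values over [0, 1].
    eq = exposure.equalize_hist(image)
    # Step 2: an 11x11 average filter smooths speckle artifacts.
    smooth = uniform_filter(eq, size=11)
    # Step 3: dual thresholding; bright pixels -> target, dark -> shadow.
    target_mask = smooth >= tau
    shadow_mask = smooth <= xi
    sr = target_mask | shadow_mask               # segmented region (ROI mask)
    # Step 4: Sobel edges of the mask, dilated by a 2x2 structuring
    # element to reconnect broken boundary fragments.
    edges = filters.sobel(sr.astype(float)) > 0
    sb = morphology.dilation(edges, morphology.square(2))  # segmented boundary
    st = image * sr                              # segmented texture
    return sr, sb, st
```

The same function covers the target-only and shadow-only cases by keeping only `target_mask` or `shadow_mask` as the ROI.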

Fig. 4 First column: SR of target (a), shadow (d) and \(\hbox {target}+\hbox {shadow}\) (g). Second column: SB of target (b), shadow (e) and \(\hbox {target}+\hbox {shadow}\) (h). Third column: ST of target (c), shadow (f) and \(\hbox {target}+\hbox {shadow}\) (i)

Figure 4 shows all segmented areas of a sample image (hb04000.000), including the SR, SB and ST of the target (Fig. 4a–c, respectively), the shadow (Fig. 4d–f, respectively) and both (Fig. 4g–i, respectively). For each segmented image, a feature extraction method was applied to extract a fixed number of features. In the ZM method, 34 features are extracted for each segmented image, so the total number of extracted features for a single image is 306; in the RCM method, 100 features are extracted per segmented image, i.e., 900 features for every given image.

2.2 Feature extraction technique

Feature extraction algorithms extract unique target information from each image. Identifiability; invariance to translation, rotation and scale; affine invariance; and noise resistance [21] must be considered when adopting an algorithm. Two robust shape-based feature extraction techniques are the radial Chebyshev moment [22] and the Zernike moment [23].

2.2.1 Zernike moment (ZM)

Zernike moments are orthogonal moments built from a set of complex polynomials, known as Zernike polynomials, which form a complete orthogonal set on the unit disc \(\left( x^2+y^2\right) \le 1\). A complex Zernike moment is defined as [24]:

$$\begin{aligned} Z_{pq}=\frac{(p+1)}{\pi }\int _x\int _yV^*_{pq}(\rho , \theta )f(x,y)\hbox {d}x\hbox {d}y \end{aligned}$$
(1)

For a digital image f(x, y) of size \(N \times N\), Eq. (1) can be approximated as in [25]:

$$\begin{aligned} Z_{pq}=\frac{(p+1)}{\pi }\sum _{x=1}^{N}\sum _{y=1}^{N} V^*_{pq}(\rho , \theta )f(x,y) \end{aligned}$$
(2)

where \(\rho \) and \(\theta \) in the polar coordinates are defined as:

$$\begin{aligned} \rho =\left( x^2+y^2\right) ^{\frac{1}{2}}, \quad \hbox {and} \quad \theta =\hbox {tan}^{-1}\left( \frac{y}{x}\right) \end{aligned}$$
(3)

\(p=0,1,2,\ldots ,\infty \) is the order of the Zernike polynomial, and q is the repetition of the Zernike moment, which takes positive and negative integer values subject to the following conditions:

$$\begin{aligned} p-|q|= \hbox {even}, \quad \hbox {and} \quad |q|\le p \end{aligned}$$
(4)

The symbol \(*\) indicates the complex conjugate. An orthogonal basis function for the Zernike moments is defined by:

$$\begin{aligned} V_{pq}(\rho , \theta )=R_{pq}(\rho )\hbox {e}^{jq\theta } \end{aligned}$$
(5)

where \(R_{pq}\) is defined below:

$$\begin{aligned} R_{pq}(\rho )=\sum _{s=0}^{ (p-|q|)/2 } \frac{ (-1)^s (p-s)! \rho ^{p-2s} }{ s! (\frac{p +|q|}{2} -s)! (\frac{p -|q|}{2} -s)!} \end{aligned}$$
(6)
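A direct implementation of Eqs. (2), (3) and (6) can be sketched as follows in Python/NumPy. The mapping of the \(N \times N\) pixel grid onto the unit disc is a common convention and is our assumption; the paper does not specify it.

```python
import numpy as np
from math import factorial

def radial_poly(rho, p, q):
    """Zernike radial polynomial R_pq(rho) of Eq. (6)."""
    R = np.zeros_like(rho)
    for s in range((p - abs(q)) // 2 + 1):
        c = ((-1) ** s * factorial(p - s)
             / (factorial(s)
                * factorial((p + abs(q)) // 2 - s)
                * factorial((p - abs(q)) // 2 - s)))
        R += c * rho ** (p - 2 * s)
    return R

def zernike_moment(img, p, q):
    """Discrete approximation of Z_pq (Eq. 2) for an N x N image."""
    N = img.shape[0]
    y, x = np.mgrid[0:N, 0:N]
    # Map pixel coordinates onto [-1, 1] x [-1, 1] (assumed convention).
    x = (2 * x - N + 1) / (N - 1)
    y = (2 * y - N + 1) / (N - 1)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0        # the basis is defined only on the unit disc
    V = radial_poly(rho, p, q) * np.exp(1j * q * theta)
    return (p + 1) / np.pi * np.sum(np.conj(V)[inside] * img[inside])
```

The 34 moments of Table 1 would then be collected by calling `zernike_moment` for each listed \((p,q)\) pair and stacking the magnitudes into a feature vector.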

In [18], 34 Zernike moments are calculated for each image based on Table 1, so 34 features are extracted for each segmented image. In total, the authors use 68 features in the shape descriptor of each target: 34 from the segmented mask and 34 from the segmented texture. In this paper, in addition to the segmented mask and texture, we also use the segmented boundary of the target images. Furthermore, we use the segmented mask, texture and boundary of the shadow parts, and finally the three corresponding images for the combined target and shadow. In this respect, we use nine images per object, with 34 features each, generating a vector of 306 features in the shape descriptor.

Table 1 List of Zernike moments used for each segmented image [18]

2.2.2 Radial Chebyshev moment (RCM)

The radial Chebyshev moment of order p and repetition q for an image of size \(N\times N\) with \(m=(N/2)+1\) is defined as [19]:

$$\begin{aligned} S_{pq}=\frac{1}{2\pi \rho (p,m)} \sum _{r=0}^{m-1} \sum _{k=0}^{n-1} t_p(r) \times \hbox {e}^{-jq\theta _k}\times f(r,\theta _k) \end{aligned}$$
(7)

where \(\theta _k=2\pi k/n\), with n the number of points sampled along the angular direction, and \(t_p(r)\) is an orthogonal Chebyshev polynomial basis function for an image of size \(N \times N\):

$$\begin{aligned} \displaystyle t_0(x)= & {} 1\nonumber \\ \displaystyle t_1(x)= & {} \frac{2x-N+1}{N} \nonumber \\ \displaystyle t_p(x)= & {} \frac{ (2p-1)\, t_1(x)\, t_{p-1}(x)-(p-1) \left\{ 1- \frac{(p-1)^2}{N^2} \right\} t_{p-2}(x) }{p}\nonumber \\ \end{aligned}$$
(8)

\(\rho (p,N)\) is the squared-norm:

$$\begin{aligned} \displaystyle \rho (p,N)= & {} \frac{ N \left( 1-\frac{1}{N^2}\right) \left( 1-\frac{2^2}{N^2}\right) \cdots \left( 1-\frac{p^2}{N^2}\right) }{2p+1} \nonumber \\ \displaystyle p= & {} 0,1,\ldots ,N-1, \quad m=(N/2)+1 \end{aligned}$$
(9)

As with the ZM calculation, RCM can be computed up to different orders. Here the order p and repetition q are taken as \(p,q \in \{1,2,\ldots ,10\}\), which yields 100 moment features for each segmented image, as summarized in Table 2. Therefore, the total number of features extracted for a single image is 900 after considering the target, shadow and \(\hbox {target}+\hbox {shadow}\) images under the three segmentation methods.
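Equations (7)–(9) can be implemented as sketched below in Python/NumPy. The polar resampling scheme (nearest-neighbour sampling with `n_theta` angular points) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def chebyshev_poly(p, x, N):
    """Scaled discrete Chebyshev polynomial t_p(x) via the recurrence of Eq. (8)."""
    t_prev = np.ones_like(x, dtype=float)        # t_0
    if p == 0:
        return t_prev
    base = (2.0 * x - N + 1.0) / N               # t_1
    t_cur = base.copy()
    for k in range(2, p + 1):
        t_next = ((2 * k - 1) * base * t_cur
                  - (k - 1) * (1.0 - (k - 1) ** 2 / N ** 2) * t_prev) / k
        t_prev, t_cur = t_cur, t_next
    return t_cur

def squared_norm(p, N):
    """Squared norm rho(p, N) of Eq. (9)."""
    out = N / (2 * p + 1)
    for i in range(1, p + 1):
        out *= 1.0 - i ** 2 / N ** 2
    return out

def radial_chebyshev_moment(img, p, q, n_theta=64):
    """Radial Chebyshev moment S_pq of Eq. (7) for an N x N image."""
    N = img.shape[0]
    m = N // 2 + 1
    cx = cy = (N - 1) / 2.0
    r = np.arange(m, dtype=float)
    theta = 2.0 * np.pi * np.arange(n_theta) / n_theta
    rr, tt = np.meshgrid(r, theta, indexing="ij")
    scale = (N / 2.0 - 1.0) / (m - 1)            # radius index -> pixel radius
    xs = np.clip(np.rint(cx + rr * scale * np.cos(tt)).astype(int), 0, N - 1)
    ys = np.clip(np.rint(cy + rr * scale * np.sin(tt)).astype(int), 0, N - 1)
    f = img[ys, xs]                              # f(r, theta_k), nearest neighbour
    kernel = chebyshev_poly(p, r, m)[:, None] * np.exp(-1j * q * tt)
    return np.sum(kernel * f) / (2.0 * np.pi * squared_norm(p, m))
```

Looping `p` and `q` over \(1,\ldots ,10\) and stacking the moment magnitudes yields the 100-dimensional descriptor of a segmented image.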

Features for both ZM and RCM can be computed up to any desired order. However, given the dimensionality of an image in the moment space, beyond a certain dimension the extra information gained from an additional feature is expected to approach zero.

Table 2 List of radial Chebyshev moment used for each segmented image
Fig. 5 Accuracy of segmented region of target in ZM and RCM

Figure 5 shows two graphs, both plotting the number of features against the total accuracy on the training set using tenfold cross validation. The first graph shows the accuracy of ZM based on the segmented region of the target. It is clear that after approximately the first 40 features, the accuracy does not vary significantly; based on this study of the training set, the dimension of the moment space can therefore be limited to about 40.

Hence, around 40 features are sufficient for evaluating accuracy. Based on this observation and on the 34 features used for this dataset in [18], 34 was adopted as the number of features for the ZM approach in this paper. Figure 5 also shows the accuracy of RCM based on the segmented region of the target; after 100 features, the accuracy remains essentially constant. Accordingly, 100 features are used in the shape descriptor vector of the RCM-based segmented image representation.

2.3 Classifier

In the classification stage, k-fold cross validation is applied. In this scheme, the whole dataset is divided into k equal subsets and the algorithm is repeated k times. Each time, \(k-1\) subsets are chosen randomly as the training set, and the remaining subset is used for testing. k is chosen to be 10 in all experiments. The accuracy is calculated in each fold, and at the end of the k folds the average accuracy is reported.

In this study, we use the multi-class library support vector machine (multi-LibSVM) [26], a standard library implementation of the support vector machine (SVM). All code runs under the MATLAB pattern recognition toolbox (PRTools) [27]. The radial basis function (RBF) kernel is applied in all experiments.
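The evaluation protocol can be illustrated with the following sketch. It substitutes scikit-learn's `SVC` for the MATLAB multi-LibSVM/PRTools setup actually used in the paper and random placeholder data for the MSTAR descriptors, so the numbers it prints are not the paper's results.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data standing in for the fused 900-dimensional descriptors
# of the 2987 MSTAR chips and their three vehicle-class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2987, 900))
y = rng.integers(0, 3, size=2987)

# RBF-kernel SVM evaluated with tenfold cross validation.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=10)
print(f"mean accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```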

3 Experimental results

This section provides experiments to examine the performance of the proposed method.

3.1 MSTAR database

In this paper, the MSTAR image database [6] is used. The targets consist of three different types of ground vehicles with seven serial numbers. BTR70, with a single serial number (SN-C71), is an armored personnel carrier; BMP2 is an infantry-fighting vehicle with three serial numbers (SN-9563, SN-9566 and SN-C21); and T72 is a tank with three serial numbers (SN-132, SN-812 and SN-S7). An example of each type is illustrated in Fig. 6, and the numbers of training and test samples used for each type and serial number are listed in Table 3. All images are \(128 \times 128\) pixels with one-foot resolution. The data were collected with an X-band SAR sensor by Sandia National Laboratory (SNL). In total, 1622 images collected at a \(17^{\circ }\) depression angle are used for training and 1365 images collected at \(15^\circ \) for testing.

3.2 Results and discussion

In this section, the experimental results of the proposed method, which comprises three contributions, are discussed. A comparison between ZM and RCM as feature extraction techniques is presented first; the effect of feature fusion is then discussed as the second contribution; and the consideration of the shadow parts is the last contribution examined in this work.

The number of images used over the whole experiment is 2987. Tenfold cross validation is applied in all experiments: in the first seven folds, 299 samples are used for testing, in the last three folds 298, and the remaining samples are used for training in each case. The number of features extracted in each experiment differs according to the technique applied.

Table 4 shows that, for both techniques, the accuracy is lower if segmentation is omitted before feature extraction. We extract 34 and 100 features for ZM and RCM, respectively. The accuracy results clearly show that, with or without segmentation, the proposed RCM-based approach is superior to the ZM-based method. The results also show that the segmented region, segmented boundary and segmented texture results are comparable in the RCM-based approach, which reaches 92.64% accuracy. This accuracy improves to 96.35% when the three segmentation methods are fused by concatenating their feature vectors into a single vector.

Fig. 6 Three types of ground vehicles. a An armored personnel carrier (SN-C71). b An infantry-fighting vehicle (SN-9563). c A tank (SN-132)

Table 3 SAR database
Table 4 ZM and RCM-based target recognition without and with preprocessing
Table 5 Accuracy (%) of segmentation with target and/or shadow based on RCM

With RCM established as superior to ZM-based target recognition, the RCM-based feature representation of the targets was adopted, with a feature vector dimensionality of 100 for each representation. One of the major contributions of this paper is to include the information extracted from the shadow of the vehicle to be recognized. It should be remembered that this shadowing effect arises from the electromagnetic waves and the depression angle of the aerial vehicle acquiring the images, rather than from sunlight. In this context, the results obtained from the shape descriptor vectors are given in Table 5.

It can be seen that the results obtained with the segmented boundary approach for the target and shadow parts are higher than those of SR and ST. A further improvement is provided by fusing the vectors of the target and shadow parts. The highest performance, 93.74%, is obtained by combining the segmented boundary shape descriptors of both the target and shadow parts.

In the final setup, the concatenation of the feature vectors extracted from the target, shadow and \(\hbox {target}+\hbox {shadow}\) images was investigated for the SB, SR and ST cases. In other words, feature vectors for the target (100 features), shadow (100 features) and \(\hbox {target}+\hbox {shadow}\) (100 features) are extracted for SB, SR and ST, respectively. Concatenating the SB, SR and ST vectors of the target generates 300 features; the same operation is employed for the shadow images, so 300 features describe the shadow parts. Finally, 900 features represent the target (300), shadow (300) and \(\hbox {target}+\hbox {shadow}\) (300) in a single vector.
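This concatenation amounts to the following minimal sketch; the function name and the tuple layout are illustrative.

```python
import numpy as np

def fuse_descriptors(target_parts, shadow_parts, combined_parts):
    """Concatenate the (SR, SB, ST) descriptors (100 features each) of the
    target, shadow and target+shadow images into one 900-dimensional vector."""
    return np.concatenate([np.concatenate(parts) for parts in
                           (target_parts, shadow_parts, combined_parts)])

# Example with placeholder vectors: three length-100 arrays per part.
part = tuple(np.zeros(100) for _ in range(3))
fused = fuse_descriptors(part, part, part)   # fused.shape == (900,)
```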

Table 6 shows the improvement provided by combining the SR, SB and ST objects into a single vector. For example, the 300 features extracted from \(\hbox {target}+\hbox {shadow}\) images reach 98.15%. Concatenating the shadow part with the target slightly increases the accuracy to 98.25%; however, compared with \(\hbox {target}+\hbox {shadow}\), twice as many features are extracted (600) for an improvement of only 0.1%. The last experiment shows that concatenating target, shadow and \(\hbox {target}+\hbox {shadow}\) further improves the accuracy to its highest value of 98.69%. This result confirms that the feature fusion technique improves the total accuracy. A comparison between Tables 5 and 6 indicates that, in general, segmentation based on SR, SB and ST for both target and shadow, followed by feature fusion, drastically improves the accuracy for both ZM and RCM. Table 7 verifies that the proposed method has the highest performance among the alternative methods in the literature.

Table 6 Accuracy (%) of feature fusion based on RCM
Table 7 Accuracy of the proposed method versus alternative methods in the literature

4 Conclusion

In this paper, we developed a feature extraction algorithm using radial Chebyshev moments and compared it with the commonly used Zernike moments. RCM is a discrete orthogonal moment that eliminates the numerical errors and the normalization-related computational complexity of ZM, and as a result we achieve an improvement in accuracy. Experimental results verify that RCM gives higher accuracy than ZM: without any segmentation, the accuracy of RCM is 75.33%, compared with 57.89% for ZM.

Additionally, we considered the shadow parts as a source of features in parallel to the target information and applied a feature fusion technique based on different image segmentation processes: segmented region, segmented boundary and segmented texture for the target and shadow parts. Experimental results show that the overall accuracy of the fused images improves for both feature extraction techniques. The accuracy of the fused data for the target part is 96.35%, an improvement of around 4–6% over SB, ST and SR individually. Furthermore, with the addition of the shadow features to the fused data, the accuracy reaches 98.69%.