Introduction

With the development of marine exploration, marine transportation and the exploitation of submarine resources, ship activity is becoming increasingly frequent, and safety requirements for marine transportation are rising accordingly. Accurately detecting ships in legal water areas is therefore an urgent problem and is of great practical and strategic significance to the social and economic development and national security of China.

Synthetic aperture radar (SAR) imaging is a 24-h, all-weather detection method (Margarit et al. 2009; Zhang 1989) and is important for detecting ship position. At present, the application of SAR imaging in ship detection in the sea is gaining increasing attention (Gao 2011; Cui et al. 2014), which has further promoted the research on ship detection.

The constant false alarm rate (CFAR) algorithm is the most widely used algorithm in the field of ship detection (An et al. 2014). It was proposed by Wackerman et al. (2001) in the context of ship detection work at the Ottawa Defense Research Center in 2001 (Gao et al. 2009). The well-performing two-parameter CFAR algorithm proposed by Lincoln Lab (Novak et al. 1997) is adapted from the cell-averaging CFAR technique and is based on the Gaussian distribution. Since then, much effort has gone into improving the performance of the CFAR detection algorithm from different aspects, and many CFAR-based detection algorithms have emerged (Ai et al. 2017; Bisceglie and Galdi 2005; LeCun et al. 2015). However, as the volume of satellite remote sensing data grows, the detection speeds of these CFAR algorithms can no longer keep pace with practical needs. Satellite remote sensing data have the characteristics of “big data”, so the accuracy and speed of ship detection can be improved by deep learning methods, which are suited to analysing big data.

Since Professor Hinton and others (Hinton and Salakhutdinov 2006) published a paper in 2006 on reducing high-dimensional data to low-dimensional codes with multilayer neural networks, deep learning has become more widely used. It has become a hotspot in the field of artificial intelligence, and image recognition based on deep learning is currently one of its typical successful applications (Bentes et al. 2017). The most commonly used basic structures of deep learning include convolutional neural networks (CNNs), deep belief networks (DBNs) and sparse autoencoders (SAEs) (Cho et al. 2017); among them, the CNN has made a breakthrough in static image recognition, outperforming other pattern recognition algorithms by a large margin (Garcia and Delakis 2004; Lawrence et al. 1997; LeCun et al. 1998).

Dai proposed a modified CFAR algorithm (Dai et al. 2016) that improves the detection accuracy by setting the size of the protection window according to the sizes of objects. Schwegmann proposed a ship detection algorithm based on deep learning (Schwegmann et al. 2016) that improves the detection accuracy largely by training on samples. Kang proposed an R-CNN ship detection algorithm based on the CFAR algorithm (Kang et al. 2017) that improves ship detection performance by combining pixel and depth features.

Based on the CFAR algorithm and the CNN model for image recognition, this paper implements a maritime SAR image ship detection scheme that incorporates CFAR and CNN. The main contributions of this paper are as follows.

  1. A CNN algorithm is designed for ship detection, which is based on the understanding of the CNN model and ship characteristics.

  2. A ship detection algorithm with higher performance is designed with the advantages of the CFAR algorithm and the CNN algorithm. By comparing the results of the algorithms, it is concluded that the algorithm in this paper can improve ship detection performance.

Introduction to the CNN Algorithm

The CNN is a widely used deep learning model that was developed from the artificial neural network. It is composed of multiple processing layers that learn representations of data with multiple levels of abstraction (Xu and Scott 2017). Compared with traditional artificial neural networks, the CNN has many advantages, such as strong adaptability, strong feature learning ability, fewer model training parameters and high recognition accuracy, and it has made breakthroughs in image recognition.

CNN Structure

The CNN is a multi-layer network based on traditional artificial neural networks. Arel constructed a multi-level, trainable CNN model in 2010. The model contains five layers: input layer, convolutional layer, pooling layer, fully connected layer and output layer (Zhan et al. 2016), each of which is composed of one or several two-dimensional images of the same size.

CNN Training Process

For CNN model training, a supervised learning method is used, which requires a sufficiently large labelled training sample set and additional convolutional kernel parameters. Weights are fine-tuned through back-propagation until a convolutional neural network model with high detection performance has been trained.

The training process can be described as follows: (1) Select the training samples and validation samples. (2) Set the number of model layers and the number of filters, and initialize the weights, biases and learning rate. (3) Extract features by convolution and pooling in forward propagation, and obtain the classification results. (4) Calculate the error and residuals from the output values of the classification results and the label values of the training samples, and adjust the weights and biases by back-propagation. (5) Determine whether the validation sample error rate meets the user’s requirements: if not, return to step (2) and re-adjust the parameters; otherwise, training ends, as the optimal network model has been obtained (Dahl et al. 2010).
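The control flow of steps (2) to (5) can be sketched as a loop. The sketch below is not the CNN used in this paper: it substitutes a single-layer logistic classifier so that the loop fits in a few lines, and the function name `train_until_valid`, the error threshold, the cross-entropy gradient and the rule of halving the learning rate on each retry are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_until_valid(x_tr, y_tr, x_va, y_va, max_err=0.05,
                      lr=0.5, epochs=200, max_restarts=5, seed=0):
    """Steps (2)-(5), with a logistic model standing in for the CNN."""
    rng = np.random.default_rng(seed)
    for attempt in range(max_restarts):
        # Step 2: initialise weights and bias.
        w = rng.normal(scale=0.01, size=x_tr.shape[1])
        b = 0.0
        for epoch in range(epochs):
            # Step 3: forward propagation gives the class scores.
            p = sigmoid(x_tr @ w + b)
            # Step 4: residual between outputs and labels, then update.
            residual = p - y_tr
            w -= lr * (x_tr.T @ residual) / len(y_tr)
            b -= lr * residual.mean()
        # Step 5: check the validation error rate against the requirement.
        err = float(np.mean((sigmoid(x_va @ w + b) >= 0.5) != y_va))
        if err <= max_err:
            break
        lr *= 0.5  # assumed re-adjustment rule before returning to step 2
    return w, b, err

# Synthetic, well-separated two-class data (hypothetical, not SAR samples).
rng = np.random.default_rng(1)

def make_split(n):
    x = np.vstack([rng.normal(-3, 0.5, (n, 2)), rng.normal(3, 0.5, (n, 2))])
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return x, y

x_tr, y_tr = make_split(50)   # step 1: training samples
x_va, y_va = make_split(20)   # step 1: validation samples
w, b, err = train_until_valid(x_tr, y_tr, x_va, y_va)
```

In a real CNN the inner update would back-propagate through the convolutional and pooling layers, but the outer structure (train, validate, re-adjust, retry) would stay the same.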

Combined Ship Detection Algorithm of CFAR and CNN

Specific Scheme of Algorithm

The CNN model is an architecture that performs well and is widely applied in pattern recognition by deep learning. Training deep, multi-stage architectures using supervised gradient back propagation requires many labelled samples. The CNN model extracts features from image samples in forward propagation; in back propagation, it learns to correct the templates used in forward propagation to improve ship recognition. Satellite remote sensing technology has developed rapidly, and the amount of remote sensing data and the recognition accuracy have greatly increased. Remote sensing data have characteristics of “big data”, which makes it possible to apply the CNN model to ship detection in remote sensing images.

To improve the accuracy of maritime SAR image detection, this paper uses CFAR global detection together with the CNN model for ship detection. The overall design process consists of the following steps: First, obtain the training and validation samples of the ship and construct an optimal CNN network model with a small error rate. Second, detect the entire SAR image: the histogram distribution is used to obtain simple statistics, the CFAR value is set, and the global threshold is obtained for selecting the target index values. Finally, local detection is performed with the trained optimal CNN model to classify ships and sea clutter, and the final detection results are displayed. The design process is illustrated in Fig. 1.

Fig. 1 Ship detection scheme based on CFAR and CNN

Thus, the three main blocks of the ship detection scheme that combines CFAR with CNN are training the CNN network model, CFAR global detection, and CNN local detection; details are as follows.

Training the CNN Network Model

To build a complete CNN model, it is necessary to determine the training and validation samples, the number of model layers and the pooling methods according to the nature and properties of the SAR imagery.

Obtain Training and Validation Samples

The CNN model can extract potential information on the spatial distribution of the ship pixel grey values in remote sensing images to identify ships accurately (Chen et al. 2012; Park et al. 2013; Wang and Liu 2012; Yin et al. 2012). Remote sensing image data are composed of pixel grey values, and the ship characteristics are mainly reflected by the spatial distribution information of grey values of different pixels, such as the distribution relationship between ship pixels and the regional distribution relationships of ships and clutter. The method for extracting these features is to capture a remote sensing image that includes a complete ship and use it as an input-layer feature image.

In general, ships fall into three types by length: a large ship measures 200–300 m, a medium-sized ship measures 30–100 m, and a small fishing boat measures approximately 20 m. Assuming that the maximum length of a ship is 300 m and the resolution of an experimental SAR image is R metres, the maximum pixel length Lmax of the ships in the experimental images is given by:

$$ L_{\max } = \frac{300}{R} $$
(1)

An SAR image contains pixels of sea clutter and ships, and the purpose of this paper is to detect the ships. Therefore, the size of the selected input layer feature image must be larger than the maximum pixel length of a ship, so that all ship information can be fully contained.

If the resolution of the SAR image is 3 m, as calculated by Eq. (1), the maximum pixel length of the ship is 100; if the resolution is 4 m, the maximum pixel length of the ship is 75. Thus, the pixel lengths of the ship at 3 and 4 m are similar. The SAR images with the two resolutions can be sampled simultaneously to build the same CNN model. Therefore, it is most appropriate to extract an area with a pixel size of 90 × 90 in the image as a training and validation sample so that the ship can be kept within the sample.
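As a quick check of the arithmetic in Eq. (1), the following minimal sketch evaluates the maximum pixel length for the two resolutions used here. The function name `max_ship_pixel_length` and the rounding up to whole pixels are assumptions made for illustration.

```python
import math

MAX_SHIP_LENGTH_M = 300.0  # maximum ship length assumed in the text

def max_ship_pixel_length(resolution_m: float) -> int:
    """Eq. (1): L_max = 300 / R, rounded up to a whole number of pixels."""
    if resolution_m <= 0:
        raise ValueError("resolution must be positive")
    return math.ceil(MAX_SHIP_LENGTH_M / resolution_m)

for resolution in (3.0, 4.0):
    print(resolution, max_ship_pixel_length(resolution))
```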

In essence, the purpose of this paper is to classify sea clutter and ships in SAR images. Thus, we need to extract two kinds of samples from the SAR images: negative samples and ship samples, as shown in Fig. 2.

Fig. 2 Training and test samples

In this paper, four large-sized maritime SAR images with resolutions of 3 and 4 m were selected. In addition, 180 training and validation samples were extracted from these SAR images, including 120 sea clutter samples and 60 ship samples.

Set the Number of Model Layers

The setting of the number of hidden layers in the model depends on the size of the input feature map: the larger the input layer is, the higher the number of hidden layers that can be used. Theoretically, when there are more hidden layers, more features are extracted; however, the error rate of the model is not always smaller, so it is necessary to train the CNN model many times on selected samples to determine the most suitable number of hidden layers.

During CNN model training, forward propagation, back propagation and weight updating are repeated several times to obtain the mean square error curve of the training samples. By comparing the mean square error curves for different numbers of hidden layers, the number of hidden layers with the smallest error rate can be determined. In this paper, the size of the selected training samples is 90 × 90, and CNN models with 5, 6 and 7 hidden layers are trained on them. The training results are shown in Fig. 3.

Fig. 3 Error curves for different numbers of hidden layers

According to Fig. 3, when six hidden layers are used, the mean square error curve declines most rapidly and the final error rate is the smallest, so six is the optimal number of hidden layers for training the CNN model.

Select the Pooling Method

Ship detection using the CNN model is based on the distribution of ship sample pixel grey values. The pixels of a ship target should be clustered and highlighted; in other words, the ship can be extracted more accurately by taking the average, so it is appropriate to use average pooling to train the CNN model for ship detection.
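Average pooling replaces each non-overlapping block of a feature map with its mean. The sketch below shows one way to do this in NumPy; the function name `average_pool` and the choice to trim rows and columns that do not fill a whole block are assumptions, not details taken from this paper.

```python
import numpy as np

def average_pool(feature_map: np.ndarray, size: int) -> np.ndarray:
    """Mean over non-overlapping size x size blocks of a 2-D feature map."""
    h, w = feature_map.shape
    h_out, w_out = h // size, w // size
    # Drop any trailing rows/columns that do not fill a complete block.
    trimmed = feature_map[:h_out * size, :w_out * size]
    # Split each axis into (block index, offset within block), then average
    # over the two offset axes.
    return trimmed.reshape(h_out, size, w_out, size).mean(axis=(1, 3))

demo = np.arange(16.0).reshape(4, 4)
pooled = average_pool(demo, 2)  # 2 x 2 result, one mean per block
```

Because each output value is a mean, an isolated bright clutter pixel is attenuated, whereas a cluster of bright ship pixels keeps a high response.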

CFAR Global Detection

Global detection mainly focuses on the whole SAR image, and the proposed approach can obtain the index matrix, including all ship target points. The implementation process is as follows.

First, the histogram probability distribution function f(x) is obtained from the histogram statistics of all clutter points in the SAR image. The first-level CFAR value (denoted as Pfa) is set, and the global threshold Tlow is obtained by solving Eq. (2), whose discrete form is given in Eq. (3). Then, the grey value T of each pixel in the SAR image is compared with the global threshold: if T is greater than Tlow, the pixel is considered to be different from the surrounding clutter points, classified as a ship target, and set to 1; otherwise, it is classified as marine clutter and set to 0. The decision rule is presented in Eq. (4).

$$ P_{fa} = \int\limits_{T_{low} }^{\infty } {f(x)dx} . $$
(2)

The discrete form, summed over the n histogram bins above Tlow, is:

$$ \int\limits_{T_{low} }^{\infty } {f\left( x \right) dx} = \mathop \sum \limits_{i = 1}^{n} f\left( {x_{i} } \right)\Delta x_{i} . $$
(3)

The decision rule is:

$$ T\left\{ {\begin{array}{*{20}l} { > T_{low} } \hfill & {\text{ship target}} \hfill \\ { \le T_{low} } \hfill & {\text{marine clutter}} \hfill \\ \end{array} } \right.. $$
(4)

When all pixels are compared, all target pixels that are different from the surrounding clutter points in the SAR image are detected, and the index matrix is obtained.
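The global detection step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this paper: the function names, the 256-bin histogram, the snapping of the threshold to a bin edge and the synthetic Rayleigh-distributed test image are all hypothetical.

```python
import numpy as np

def global_cfar_threshold(image, pfa, bins=256):
    """Smallest histogram bin edge whose upper-tail mass is at most pfa."""
    hist, edges = np.histogram(image, bins=bins)
    mass = hist / hist.sum()              # f(x_i) * dx_i for each bin
    tail = np.cumsum(mass[::-1])[::-1]    # tail[i] = mass of bins i..end
    ok = tail <= pfa
    if not ok.any():
        return edges[-1]                  # even the last bin exceeds pfa
    return edges[np.argmax(ok)]           # first bin that meets the target

def global_detect(image, pfa):
    """Return the 0/1 index matrix and the global threshold."""
    t_low = global_cfar_threshold(image, pfa)
    index_matrix = (image > t_low).astype(np.uint8)
    return index_matrix, t_low

# Synthetic scene: Rayleigh "clutter" plus one bright rectangular "ship".
rng = np.random.default_rng(0)
sea = rng.rayleigh(scale=20.0, size=(200, 200))
sea[50:53, 60:70] = 1000.0
index_matrix, t_low = global_detect(sea, pfa=4e-3)
```

The fraction of pixels set to 1 cannot exceed the chosen `pfa`, because only pixels in the histogram bins above the threshold are retained.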

CNN Local Detection

After training the CNN network model and performing global detection, the optimal network model and a target pixel index matrix that is suitable for ship detection in SAR images are obtained. The next step is to obtain the detection areas from the SAR image so that CNN local detection can classify the detected objects. The detection area is obtained as follows: for each ship target point in the index matrix, the 90 × 90-pixel window centred on that point is taken as a detection image to determine whether the target point is a ship. A schematic diagram is shown in Fig. 4.

Fig. 4 Acquiring the detection area by the target index
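Cutting a detection window around a target index can be sketched as below. The function name `extract_window` and the use of reflect padding at the image border are assumptions, since the handling of targets near the edge is not described here. With an even window size such as 90, the target pixel lands at offset (45, 45) inside the window.

```python
import numpy as np

def extract_window(image: np.ndarray, row: int, col: int,
                   size: int = 90) -> np.ndarray:
    """Return the size x size window around the target pixel (row, col)."""
    half = size // 2
    # Pad so that windows around border pixels are still full-sized.
    padded = np.pad(image, half, mode="reflect")
    # Pixel (row, col) of the image sits at (row + half, col + half) in the
    # padded array, so this slice starts `half` pixels before the target.
    return padded[row:row + size, col:col + size]

scene = np.arange(200 * 300).reshape(200, 300)
window = extract_window(scene, 100, 150)
corner = extract_window(scene, 0, 0)  # still 90 x 90 thanks to padding
```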

The detection feature images are input into the trained optimal CNN model, which distinguishes ships from non-ship objects such as sea clutter, side lobes, RFI noise and wakes. The classification results are integrated into a single image, the ships are detected according to the areas of the 4-connected regions, and the final ship results are highlighted in the image.
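Grouping the classified target pixels into 4-connected regions and filtering them by area can be sketched with a breadth-first search. The function names and the `min_area` parameter are hypothetical; the area threshold used in this paper is not stated in this section.

```python
from collections import deque
import numpy as np

def four_connected_regions(mask):
    """Return one list of (row, col) pixels per 4-connected region."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros(mask.shape, dtype=bool)
    rows, cols = mask.shape
    regions = []
    for r0, c0 in zip(*np.nonzero(mask)):
        r0, c0 = int(r0), int(c0)
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue = deque([(r0, c0)])
        region = []
        while queue:
            r, c = queue.popleft()
            region.append((r, c))
            # Only up, down, left and right neighbours count (no diagonals).
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and mask[nr, nc] and not seen[nr, nc]):
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        regions.append(region)
    return regions

def filter_by_area(regions, min_area):
    """Keep regions with at least min_area pixels."""
    return [region for region in regions if len(region) >= min_area]

demo_mask = np.zeros((6, 6), dtype=np.uint8)
demo_mask[0:2, 0:2] = 1   # area 4
demo_mask[2, 2] = 1       # touches the block only diagonally: separate region
demo_mask[4, 3:6] = 1     # area 3
demo_regions = four_connected_regions(demo_mask)
demo_ships = filter_by_area(demo_regions, min_area=3)
```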

Results and Analysis

Results of CNN Model Training

Image to be Detected

Figure 5 shows the SAR image to be detected. Its size is 1637 × 1776 pixels and its resolution is 3 m. It was captured by the high-resolution radar satellite COSMO-SkyMed in HH polarization in 2007. It uses the coordinate system GCS_WGS_1984, with geographical coordinates of 38.84° for latitude and 121.87° for longitude. There are 17 ship targets, according to manual interpretation. The sea surface in the image has a complex distribution, and the angular reflections of the ship trajectories cause some relatively bright sea clutter that is easily mistaken for ships, which lowers the detection accuracy.

Fig. 5 Maritime SAR image

Results of Sample Selection

Many training samples are needed in CNN model training to obtain a high-performing detection model. Because resources are limited and very few samples are available, this paper rotates each sample in 60° increments to increase the number of samples; in this way, one sample is turned into six different samples. The numbers of training and validation samples are listed in Table 1.
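The rotation-based augmentation can be sketched as follows. This is an illustration under assumptions, not the procedure used in this paper: the function names are hypothetical, rotation uses nearest-neighbour resampling about the sample centre, and pixels that rotate in from outside the sample are filled with zeros.

```python
import numpy as np

def rotate_nearest(sample: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate a 2-D sample about its centre with nearest-neighbour lookup."""
    h, w = sample.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    ys, xs = np.indices((h, w))
    # Inverse mapping: for each output pixel, find the source pixel.
    src_x = np.rint(cos_t * (xs - cx) + sin_t * (ys - cy) + cx).astype(int)
    src_y = np.rint(-sin_t * (xs - cx) + cos_t * (ys - cy) + cy).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(sample)  # assumed fill value for uncovered corners
    out[inside] = sample[src_y[inside], src_x[inside]]
    return out

def augment_by_rotation(sample: np.ndarray, step_deg: int = 60):
    """One sample becomes 360 / step_deg samples (six for 60 degrees)."""
    return [rotate_nearest(sample, angle) for angle in range(0, 360, step_deg)]

original = np.arange(90 * 90, dtype=float).reshape(90, 90)
augmented = augment_by_rotation(original)
```

The first element is the unrotated sample, and the fourth (180°) is the sample flipped along both axes.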

Table 1 Information on the training and test samples

Results of Model Construction

Using the training samples, the validation samples, and the layers and related parameters of the CNN model, the CNN model with the minimum error rate is obtained. The model construction and training processes are shown in Fig. 6 (in which c-layer represents a convolutional layer, s-layer represents a sampling layer, and f–c-layer represents a fully connected layer). The model consists of eight layers: an input layer, an output layer and six hidden layers. The input layer takes a 90 × 90-pixel image, which is either a ship sample or a negative sample, and the six hidden layers consist of three convolutional layers, two pooling layers and a fully connected layer. The three convolutional layers have 4, 8 and 8 convolutional kernels of sizes 11 × 11, 7 × 7 and 4 × 4, respectively; the pooling sizes of the two pooling layers are 4 × 4 and 2 × 2; the batch size is 2; the number of iterations is 1; and the learning rate is 1.

Fig. 6 CNN optimal model for ship detection
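The kernel and pooling sizes listed above can be checked by tracing the feature-map size through the network. The sketch below assumes unpadded, stride-1 convolutions, non-overlapping pooling, and an alternating order of convolution, pooling, convolution, pooling, convolution; none of these details is stated explicitly in the text, and the helper names are hypothetical.

```python
def conv_out(size: int, kernel: int) -> int:
    """Output width of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size: int, pool: int) -> int:
    """Output width of non-overlapping pooling; requires an exact fit."""
    if size % pool:
        raise ValueError(f"{size} is not divisible by {pool}")
    return size // pool

# Assumed layer order with the sizes given in the text.
LAYERS = [("conv", 11), ("pool", 4), ("conv", 7), ("pool", 2), ("conv", 4)]

def trace_sizes(input_size: int, layers=LAYERS):
    """Feature-map width after the input and after each layer."""
    sizes = [input_size]
    for kind, k in layers:
        step = conv_out if kind == "conv" else pool_out
        sizes.append(step(sizes[-1], k))
    return sizes

sizes = trace_sizes(90)
features_to_fc = 8 * sizes[-1] ** 2  # 8 kernels in the last conv layer
```

Under these assumptions a 90 × 90 input shrinks to 80, 20, 14, 7 and finally 4 pixels per side, so every pooling step divides evenly and the fully connected layer would receive 8 × 4 × 4 = 128 values.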

Result of CFAR Global Detection

The COSMO-SkyMed satellite image shown in Fig. 5 is used as the detection object. Global detection is carried out as the first stage of the detection algorithm that combines CNN and CFAR. Pfa is set to 4e−3, and 11,591 target points are selected. The global detection result is shown in Fig. 7. Finally, the target index matrix is obtained.

Fig. 7 Result of CFAR global detection

CFAR global detection has a low computational cost and quickly removes most sea clutter, so that all ship pixels and a small amount of bright sea clutter are retained. As shown in Fig. 7, the result contains all ship pixels, along with some sea clutter and ship trajectories. Thus, CNN local detection is needed for further filtering. Moreover, CFAR global detection allows the ship pixels to be located more accurately.

Final Detection Result

In this section, a maritime SAR image is detected by the multilevel CFAR algorithm based on multithreading and by an algorithm that is based on CFAR and CNN. The quality factors and running times of the two algorithms are compared and analysed according to the final detection result, and the performance of the algorithm is judged.

The multi-level CFAR detection algorithm based on multithreading (Wang et al. 2017) combines global detection and local detection and uses multi-threaded parallel computing. The specific process is as follows: First, global detection is performed on the whole SAR image using histogram statistics; the first-level CFAR value is set, the global threshold is calculated, and the target index matrix is obtained. Next, local detection is performed with a sliding window: possible ship target points that would affect the background statistics are eliminated from the background clutter window, the K statistical distribution is used, the second-level CFAR value is set, and the local threshold is calculated. The detection speed is improved by using multithreading. Finally, the 4-connected regions are computed for the target points obtained in the local detection process, and potential ships are selected according to the smallest ship area.

Criteria for Detection Performance

The merits of each algorithm are determined by the accuracy and efficiency of the final detection results. For the target detection, the false-alarm rate, the detection rate and the quality factor figure of merit (FOM) are the key indices that are used to judge the accuracy of the algorithm. In addition, the time that the algorithm takes to detect the image is an important index of the algorithm’s detection efficiency.

False-alarm rate equation for target detection:

$$ P_{fa} = \frac{{N_{FP} }}{{N_{TN} + N_{FP} }} \times 100\% . $$
(5)

Equation for the target detection rate:

$$ P_{T} = \frac{{N_{TP} }}{{N_{TN} }} \times 100\% . $$
(6)

Equation for the quality factor FOM:

$$ FOM = \frac{{N_{TP} }}{{N_{FP} + N_{TN} }} \times 100\% . $$
(7)

where NTP represents the number of correctly detected targets, NFP represents the number of false-alarm targets, and NTN represents the number of actual targets, which is determined by visual interpretation.

The smaller the false-alarm rate is, the higher the detection accuracy is; the closer the target detection rate and the quality factor are to 1, the higher the detection accuracy is; and the less time that it takes for the algorithm to detect the image, the higher the efficiency of the algorithm is. Only when the detection accuracy and efficiency of the algorithm are high can the detection performance be optimal.
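Eqs. (5) to (7) can be evaluated directly from the three counts. In the sketch below, the function name `detection_metrics` is an assumption, and the counts in the example call are hypothetical values chosen only to show the arithmetic (apart from the 17 actual targets reported for the test image); they are not results from this paper.

```python
def detection_metrics(n_tp: int, n_fp: int, n_actual: int):
    """Return (P_fa, P_T, FOM) in percent, following Eqs. (5)-(7).

    n_tp:     correctly detected targets (N_TP)
    n_fp:     false-alarm targets (N_FP)
    n_actual: actual targets from visual interpretation (N_TN in the text)
    """
    if n_actual <= 0:
        raise ValueError("n_actual must be positive")
    p_fa = n_fp / (n_actual + n_fp) * 100.0   # Eq. (5)
    p_t = n_tp / n_actual * 100.0             # Eq. (6)
    fom = n_tp / (n_fp + n_actual) * 100.0    # Eq. (7)
    return p_fa, p_t, fom

# Hypothetical outcome: 16 of 17 ships found, with 1 false alarm.
p_fa, p_t, fom = detection_metrics(n_tp=16, n_fp=1, n_actual=17)
```

A perfect detector (all targets found, no false alarms) gives a false-alarm rate of 0% and a detection rate and FOM of 100%.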

Comparison and Analysis of Detection Results

The global detection mode of the multi-level CFAR algorithm based on multithreading is the same as that of the ship detection algorithm combining CNN and CFAR. For local detection, the global matrix index is used as the centre, a sliding window is established, Pfa2 is set to 1e−5 when the threshold is solved, and the detection results are integrated together.

Finally, for the detection results of both algorithms, the ships are identified according to the statistics of the 4-connected regions. The final detection results are shown in Fig. 8, in which the red boxes indicate the correctly detected ships, and the blue circles indicate the falsely detected or missed ships.

Fig. 8 Detection results of two algorithms. a Detection results of the multithreaded and multi-level CFAR algorithm, b detection results of the algorithm that combines CFAR and CNN

The detection results of the algorithm that combines CFAR and CNN, and those of the multithreaded and multi-level CFAR algorithm are compared in Table 2.

Table 2 Comparison of the results of two detection algorithms

For the SAR image with complex distribution, both algorithms produce false alarms and false detections, so the quality factor does not reach 1 for either algorithm. The multi-threaded multilevel CFAR algorithm is fast but not particularly accurate; the algorithm that combines CFAR and CNN achieves significantly higher detection speed and precision than the multilevel CFAR algorithm, so it is more suitable for application to the ship detection system.

Conclusion

With the development of satellite remote sensing, satellite remote sensing data have taken on the characteristics of “big data”, and deep learning is currently the most effective method for dealing with such data. To explore ways to improve the accuracy and speed of ship detection with big data, this paper focuses on the typical CFAR detection algorithm and deep learning techniques, and an algorithm that combines CFAR and CNN is realized. The final detection results show that, on a marine satellite remote sensing image of complex distribution, this algorithm is more accurate and faster than the multi-threaded multilevel CFAR algorithm and obtains an ideal detection result. Thus, it is more suitable for the ship detection system.