1 Introduction

Ship detection is very important to marine transportation [5]. Spaceborne Synthetic Aperture Radar (SAR) has been one of the most critical data sources for ship detection because it can penetrate clouds and track objects in all kinds of weather [28]. In marine applications, ship recognition from SAR imagery has long been a hotspot [4, 9, 19]. With the advancement of image analysis technology, SAR images can be used to derive more detailed ship information [8]. The size of a ship provides basic information for ship classification [11]. Estimating such geometric parameters is also part of SAR image interpretation. A method for extracting ship size that is both efficient and precise would therefore open a new avenue for SAR image interpretation.

Ships are generally metallic objects that reflect the electromagnetic radiation of the SAR sensor significantly more strongly than the surrounding ocean. On SAR images, a ship appears as a bright target with high backscattering intensity, that is, high normalized radar cross-section (NRCS) values. The minimum bounding rectangle (MBR) is a geometric characteristic of the ship’s NRCS signature that offers a preliminary (initial) size for determining a ship’s ground size. However, the ship’s superstructure, sea-ship interaction, and imaging conditions all affect the NRCS (Li et al. [11]). These factors lead to a large gap between the initial size and the ground size. Figure 1 shows several examples of ship signatures on SAR images, the size of the MBR, and the ground size of the ship. The MBRs are labeled by visual interpretation. The difference between the MBR sizes and the ground sizes is clear. As a result, precisely extracting ship size from SAR images is difficult.

Fig. 1 Examples of ships on SAR images. a/d/g/j Ship signature on the SAR image; b/e/h/k labeled MBR of the ship signature and the MBR’s size; c/f/i/l the ship’s ground size

2 Traditional Methods

2.1 Typical Procedure of Traditional Methods

The majority of classic techniques for extracting ship size from SAR images have three stages (Fig. 2): (1) binarization, (2) initial size extraction, and (3) accurate size estimation. Binarization divides the pixels in the SAR image into two groups: ship signatures and non-ship signatures. In the second stage, the binary result is converted into an MBR, and the length and width of this MBR are taken as the ship’s initial size. Finally, a regression model estimates the accurate ship size from the initial size and other relevant factors, such as the maximum and minimum NRCS of the ship signature. Statistical/machine learning (ML) methods, such as linear regression, non-linear regression, and kernel-based methods, are commonly used as regression models.

Fig. 2 The procedure of the traditional algorithm for ship size extraction from SAR images
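The first two stages of the procedure can be illustrated with a minimal sketch. It is not the implementation of any cited method: it assumes a fixed global threshold for binarization, an axis-aligned rectangle in place of a true (possibly rotated) MBR, and a pixel spacing of 10 m. The function names, the threshold, and the toy image are hypothetical, and the third stage (regression) is omitted.

```python
PIXEL_SPACING_M = 10.0  # assumption: ground distance covered by one pixel


def binarize(image, threshold):
    """Stage 1: label pixels brighter than the threshold as ship (1), others as background (0)."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


def bounding_rectangle(mask):
    """Stage 2: axis-aligned rectangle (first_row, last_row, first_col, last_col) around ship pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask contains no ship pixels")
    return rows[0], rows[-1], cols[0], cols[-1]


def initial_size_m(mask, spacing=PIXEL_SPACING_M):
    """Initial (length, width) in metres: the longer rectangle side is taken as the length."""
    r0, r1, c0, c1 = bounding_rectangle(mask)
    height = (r1 - r0 + 1) * spacing
    width = (c1 - c0 + 1) * spacing
    return max(height, width), min(height, width)


# Toy 4 x 6 image with a bright 2 x 4 block standing in for a ship signature.
toy_image = [
    [1, 1, 1, 1, 1, 1],
    [1, 9, 9, 9, 9, 1],
    [1, 9, 9, 9, 9, 1],
    [1, 1, 1, 1, 1, 1],
]
toy_mask = binarize(toy_image, threshold=5)
toy_length_m, toy_width_m = initial_size_m(toy_mask)
```

In a full pipeline, the initial length and width would then be passed, together with other features, to the regression model of the third stage.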

2.2 Representative Traditional Methods

Stasolla and Greidanus [26] used the Constant False Alarm Rate (CFAR) method to binarize the SAR image. CFAR is a common method [21, 29, 30] for separating ship signatures from the background. To extract the ship’s MBR, they then refined the signature with mathematical morphology. They adopted the MBR’s length and width as the ship’s final length and width, without a third step. They tested their model with 127 available ship samples from Sentinel-1 images. The mean absolute error (MAE) of length was 30 m (relative error 16%), and the MAE of width was 11 m (relative error 37%). In 2018, Li et al. [11] estimated the sizes of ships in the OpenSARShip dataset [7]. The ship signature was obtained using a threshold-based approach. An image segmentation procedure was then used to refine the ship signature and determine the initial ship size. Finally, a gradient boosting model was employed to estimate the accurate ship size. According to their experiments, the MAEs of the length and width are 8.80 m (relative error 4.66%) and 2.17 m (relative error 7.01%), respectively.

2.3 Issue to be Further Addressed

The accuracy of ship size extraction has improved over the years. However, the standard three-step procedure is quite complicated. Binarization and initial size extraction require advanced image processing to meet the needs of the subsequent estimation stage [11]. The third stage is similarly difficult [20]. The inaccuracies introduced in each stage accumulate and eventually compromise the accuracy of the final size extraction. In the era of big data, it is possible to build new approaches that increase the accuracy and efficiency of ship size extraction.

Deep learning (DL), as a cutting-edge AI technology, has made great achievements in computer vision [10]. A typical DL model is made up of multiple neural network layers. It accepts raw data as input and automatically learns the essential characteristics needed to perform classification or prediction [25]. This process is called end-to-end learning. Compared with traditional machine learning, DL simplifies feature engineering and is well suited to modeling massive data and complex interactions. DL has been successfully employed in oceanography, geography, and remote sensing in recent years [12, 13, 22, 24, 31, 32]. DL therefore offers novel approaches to the problem of estimating the size of a ship.

3 Deep Learning Method

3.1 Ship Detection Based on DL

A deep convolutional neural network (CNN) is a subtype of deep neural network (DNN) that is made up of convolutional layers. CNN-based models have had a lot of success in target detection. Researchers have proposed CNN-based ship detection models, such as models based on the faster region-based convolutional network (Faster-RCNN) [23], the single-shot multi-box detector (SSD) [15], and you only look once (YOLO) [2]. Orientation is an important characteristic of a ship. Several researchers have suggested a rotatable bounding box (RBB) to replace the usual non-rotating bounding box, such as DRBox [14] and DRBox-v2 [1].

For the ship detection task, DL has become the first choice. DL-based models achieve end-to-end detection with higher accuracy and robustness than conventional models. However, deep learning has hardly been applied to ship size extraction. Therefore, developing an end-to-end DL model for this task is necessary.

3.2 SSENet: A Deep Learning Model to Extract Ship Size from SAR Images

SSENet is a new end-to-end DL model that replaces the previous three-step process for extracting ship size from SAR data. The model uses DRBox-v2 to create the ship’s RBB from the SAR image and a DNN-based regression model to estimate the accurate ship size. The DNN-based regression model uses a hybrid input and a loss function termed mean scaled square error (MSSE), which considerably increases ship size estimation accuracy.

3.2.1 Overall Structure of SSENet

SSENet’s overall structure consists of three stages (Fig. 3): (1) RBB generation; (2) accurate ship size estimation; (3) MSSE loss calculation and overall model optimization. The first stage takes the SAR chip as input and uses a deep CNN model called DRBox-v2 to automatically detect the ship’s RBBs. The RBB with the highest confidence is chosen as the initial RBB. A DNN model is used in the second stage to estimate ship size. The DNN model takes two types of data as inputs: (1) the initial length, width, and orientation angle, and (2) the SSD feature map. The DNN model outputs the accurate length and width of the ship.

Fig. 3 Structure of SSENet

3.2.2 Generating RBB for the Ship

DRBox-v2 is used to generate the RBB for the ship [1]. Its input is a \(300\times 300\) pixels SAR image, and its output is a series of RBBs. DRBox-v2 contains two sub-modules: a feature extraction module and an output module. The feature extraction module extracts abstract features. Here, VGG16 is employed as the feature extraction module. VGG16 consists of five feature extraction units. Two stacked CNN layers make up the first feature extraction unit, while a max-pooling layer and two stacked CNN layers make up each of the others. Each feature extraction unit produces a three-dimensional feature map as its output. Five feature maps named F1, F2, ..., F5 are generated. The numbers of channels in the F1-F5 feature maps are 64, 128, 256, 512, and 512. The pooling kernel is 2 \(\times \) 2. After one max-pooling layer, the spatial size of a feature map is reduced to half of its original size (rounded up). As the input SAR image is 300 \(\times \) 300 pixels, the spatial sizes of the F1-F5 feature maps are 300 \(\times \) 300, 150 \(\times \) 150, \(75\times 75\), \(38\times 38\), and \(19\times 19\) pixels.
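The spatial sizes listed above follow from repeated halving of the 300-pixel input. The short sketch below reproduces them under the assumption that each 2 \(\times \) 2 pooling rounds odd sizes up, which is inferred from the 75 to 38 step rather than stated in the text; the function name is hypothetical.

```python
import math


def feature_map_sizes(input_size=300, n_units=5):
    """Spatial side length of F1..F5; every unit after the first starts with 2 x 2 max-pooling."""
    sizes = [input_size]
    for _ in range(n_units - 1):
        # Assumption: pooling rounds up when the side length is odd (75 -> 38).
        sizes.append(math.ceil(sizes[-1] / 2))
    return sizes


sizes = feature_map_sizes()            # side lengths of F1, F2, F3, F4, F5
f5_elements = 512 * sizes[-1] ** 2     # number of values in a 512-channel F5
```

With these assumptions the side lengths are 300, 150, 75, 38, and 19, and a 512-channel F5 holds 184,832 values.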

The output module generates output maps by applying convolutions to the fused feature map Of, Fig. 3b. There are two outputs for one SAR image: the confidence of being a ship and the location offsets of the prior RBBs. A softmax function activates Of to obtain the confidence output. A sigmoid function activates Of to obtain the location offsets. Three feature maps (F2, F3, and F4) are fused to generate Of. A feature pyramid network (FPN) is used to combine the different feature maps. The cross-entropy and the smooth L1 loss [15] are used as the confidence loss and the location loss for DRBox-v2.

After the first stage, a ship’s candidate RBBs are collected, providing initial references for the subsequent accurate size estimation.

3.2.3 Estimating Ship Size Based on a DNN Model

The DNN model’s inputs have two parts, as shown in Fig. 3c. The first part is the initial ship size and orientation angle, which are determined from the best RBB and give primary and direct information for accurate ship size regression. DRBox-v2 generates a series of ship RBBs, and the RBB with the highest confidence value is chosen as the best RBB. The initial ship size is the length and width of the best RBB. Furthermore, the best RBB’s orientation angle is taken as the ship’s orientation angle, as shown in Fig. 4. The orientation has an impact on the ship signature in the SAR image [7, 11]. As the orientation does not distinguish between the bow and the stern of a ship, we transform the angle’s range to (\(-90^\circ \), 90\(^\circ \)].
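Because bow and stern are not distinguished, angles that differ by 180\(^\circ \) describe the same orientation. One way to fold an arbitrary angle into (\(-90^\circ \), 90\(^\circ \)] is sketched below; the function name is hypothetical and the chapter does not state how its own implementation performs this step.

```python
def normalize_orientation(angle_deg):
    """Fold an angle in degrees into the half-open interval (-90, 90]."""
    folded = angle_deg % 180.0   # Python's % maps any real angle into [0, 180)
    if folded > 90.0:
        folded -= 180.0          # shift (90, 180) down to (-90, 0)
    return folded
```

For example, 135\(^\circ \) and \(-45^\circ \) both map to \(-45^\circ \), while \(-90^\circ \) maps to 90\(^\circ \), the closed end of the interval.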

Fig. 4 Illustration of the ship orientation. a Coordinate system; b An example of a ship chip

The other component of the inputs is the feature map derived from the input SAR image. Under typical environmental conditions, the ship’s signature in the SAR image reflects the sea clutter and indicates whether the ship is moving or stationary. During the SAR integration time, a moving target often spreads over several resolution cells, and the resulting dispersion of backscattered energy causes smearing and brightness loss in the SAR image. A moving ship’s signature also shows an azimuth displacement. The SAR system receives the Doppler signal from the scatterer in the azimuth direction. A stationary ship’s azimuth position is identical to the azimuth position of the SAR platform. For a moving ship, on the other hand, the Doppler shift has an extra component, resulting in an azimuth shift of the ship signature. The environmental conditions during satellite imaging, such as wind fronts, ocean waves, and rain cells, also alter the ship’s signature on the SAR image. Under typical conditions, the sea-ship interaction produces a complicated ship motion in the real world and a polarimetric scattering signature with a wide range of polarimetric scattering processes [14, 16, 17]. The relationship between the status of the ship, the surroundings, and the ship’s size has been demonstrated in [11]. The abstract feature map derived from the input SAR image contains the factors stated above. Therefore, the feature map F5 in Fig. 3b is employed as the other component of the input.

F5 is a three-dimensional feature map with 512 channels of 19 \(\times \) 19 pixels. Flattened directly, it would give an input vector of 184,832 (512 \(\times \) 19 \(\times \) 19) elements, which makes the fully connected DNN regression model difficult to train. It is therefore necessary to reduce the dimension of F5.

Fig. 5 Transforming the feature map F5 into inputs. a F5 feature map. b Compressing F5 in the channel dimension to obtain F5M. c Compressing F5M in the spatial dimension to obtain F6. d Flattening F6 into a one-dimensional input vector

As shown in Fig. 5a, b, we transform F5 by a CNN layer with 1 \(\times \) 1 \(\times \) N convolutional kernels, obtaining F5M. Compared with F5, the channel number of F5M is reduced from 512 to N, Fig. 5b. In this way, F5 is compressed in the channel dimension. Then, max-pooling of size S is performed on the new feature map F5M, and a new feature map F6 is obtained, Fig. 5c. The spatial size of F6 is \(\lceil 19/S\rceil \times \lceil 19/S\rceil \). The values of N and S are determined by experiments. Finally, F6 is flattened into a one-dimensional feature vector. The flattened vector is concatenated with the initial width, length, and orientation to form the inputs of the DNN model, Fig. 3c.
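The effect of the two compression steps on the input length can be checked with a few lines of arithmetic. In the sketch below, the values N = 16 and S = 4 are purely illustrative (the chapter determines N and S experimentally and does not report them here), and the function name is hypothetical.

```python
import math


def dnn_input_length(n_channels, pool_size, spatial=19, n_scalars=3):
    """Length of the DNN input: flattened F6 plus initial length, width, and orientation."""
    side = math.ceil(spatial / pool_size)   # spatial side of F6 after S-size max-pooling
    return n_channels * side * side + n_scalars


raw_length = 512 * 19 * 19                          # flattening F5 directly
compressed_length = dnn_input_length(n_channels=16, pool_size=4)  # illustrative N and S
```

With these illustrative values, the input shrinks from 184,832 elements to 16 \(\times \) 5 \(\times \) 5 + 3 = 403 elements.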

As shown in Fig. 3c, three hidden NN layers with 256 neurons each are used to perform the regression. The number of hidden NN layers and the number of neurons are determined by a parameter-tuning experiment. The rectified linear unit is the activation function of each layer. Two neurons are stacked on the last hidden NN layer to form the output layer. A sigmoid function is applied to the output layer to map the estimated values to 0–1 and output the estimated width Wp and the estimated length Lp, Fig. 3c.

3.2.4 Calculating MSSE Loss and Optimizing SSENet

The MSSE loss function is used in the DNN regression model. For most regression problems, the mean square error (MSE) is a commonly used loss function. The definition of MSE is shown in Eq. (1), where \(y_{i}\) represents the ground truth, \(y_{i}^{'}\) represents the predicted value, and N is the number of values to be predicted. The loss value calculated by MSE is independent of the magnitude of the ground truth. Assume a ship’s ground length and width are 100 and 50 m, respectively, and the predicted length and width are 80 and 30 m, respectively. Both the length and the width squared errors are 400. Because the model is optimized based on loss values, the length and width losses contribute equally to the model’s optimization. In practice, a ship’s length is much greater than its width, and in most cases the length is of more concern than the width. To increase the accuracy of the length estimate, we want the length loss to contribute more to the model’s optimization than the width loss.

$$\begin{aligned} MSE = \frac{1}{N}\sum _{i = 1}^{N}\left( y_{i} - y_{i}^{'} \right) ^{2} \end{aligned}$$
(1)
$$\begin{aligned} MSSE = \frac{1}{N}\sum _{i = 1}^{N} y_{i} \cdot \left( y_{i} - y_{i}^{'} \right) ^{2} \end{aligned}$$
(2)
$$\begin{aligned} Size_{Loss} = MSSE_L + MSSE_W \end{aligned}$$
(3)

The MSSE loss function addresses this issue. MSSE incorporates the ground truth of the ship length and width into the traditional MSE: the ground truth is used as a dynamic parameter to scale the square error. The definition of MSSE is shown in Eq. (2), where \(y_{i}\) and \(y_{i}^{'}\) are defined as in Eq. (1) and N is the number of samples. The MSSE length and width losses in the example above are 40,000 and 20,000, respectively, so the length loss is substantially greater than the width loss. As a result, length errors are penalized more heavily during training, and the optimization procedure is more conducive to length estimation. Based on Eq. (2), the loss of length MSSEL and the loss of width MSSEW are calculated. The size loss (SizeLoss) is the sum of MSSEL and MSSEW, Eq. (3).
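The worked example can be reproduced with a direct transcription of Eqs. (1) to (3). This is a plain-Python sketch rather than the TensorFlow code used for training, and it works in metres for readability, whereas the trained model operates on values scaled to 0–1; the function names are hypothetical.

```python
def mse(y_true, y_pred):
    """Eq. (1): mean of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def msse(y_true, y_pred):
    """Eq. (2): squared errors scaled by the ground truth, then averaged."""
    return sum(t * (t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def size_loss(len_true, len_pred, wid_true, wid_pred):
    """Eq. (3): sum of the length and width MSSE terms."""
    return msse(len_true, len_pred) + msse(wid_true, wid_pred)


# The single-ship example from the text: ground size 100 m x 50 m, predicted 80 m x 30 m.
mse_length, mse_width = mse([100.0], [80.0]), mse([50.0], [30.0])
msse_length, msse_width = msse([100.0], [80.0]), msse([50.0], [30.0])
total_size_loss = size_loss([100.0], [80.0], [50.0], [30.0])
```

Both MSE terms equal 400, whereas the MSSE terms are 40,000 for the length and 20,000 for the width, so the same 20 m error weighs twice as much on the length.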

Besides SizeLoss, two other losses are calculated in the first stage, Fig. 3b: the confidence loss (ConfLoss) and the location loss (LocaLoss). ConfLoss is the cross-entropy loss, and LocaLoss is the smooth L1 loss [1, 23]. Their definitions are as follows:

$$\begin{aligned} Conf_{Loss} = -\sum _{i = 1}^{N}\left[ c_{i}\log c_{i}^{'} + (1 - c_{i})\log (1 - c_{i}^{'}) \right] \end{aligned}$$
(4)
$$\begin{aligned} Loca_{Loss} = \frac{1}{N}\sum _{i = 1}^{N}\text {smooth}_{L1}(x_{i}),\quad \text {smooth}_{L1}(x) = \left\{ \begin{matrix} 0.5x^{2},\ \ \text {if}\ |x| < 1 \\ |x| - 0.5,\ \ \text {otherwise} \\ \end{matrix} \right. \end{aligned}$$
(5)

where N is the number of predicted targets, ci is the ground confidence of a sample, \(c_{i}^{'}\) is the predicted confidence of a sample, and xi is the element-wise difference between the ground RBB and the predicted RBB. The three losses, SizeLoss, ConfLoss, and LocaLoss, are added to form the final loss that optimizes SSENet integrally.
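As an illustration of Eqs. (4) and (5), the two detection losses can be written in a few lines of plain Python. The sketch assumes the conventional leading minus sign for the cross-entropy, so that the loss is non-negative, and clips predicted confidences away from 0 and 1 for numerical safety; the clipping constant and the function names are hypothetical and not taken from DRBox-v2.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def loca_loss(diffs):
    """Eq. (5): mean smooth L1 over the element-wise RBB differences."""
    return sum(smooth_l1(x) for x in diffs) / len(diffs)


def conf_loss(c_true, c_pred, eps=1e-7):
    """Eq. (4): binary cross-entropy summed over samples (with the usual minus sign)."""
    total = 0.0
    for c, p in zip(c_true, c_pred):
        p = min(max(p, eps), 1.0 - eps)   # assumption: clip to avoid log(0)
        total += c * math.log(p) + (1.0 - c) * math.log(1.0 - p)
    return -total


def total_loss(size_loss_value, conf_loss_value, loca_loss_value):
    """Final loss: plain sum of the three terms, as described in the text."""
    return size_loss_value + conf_loss_value + loca_loss_value
```

The quadratic branch keeps gradients small for nearly correct boxes, while the linear branch limits the influence of large outliers.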

3.3 Experiments on SSENet

3.3.1 Experiments Data

The OpenSARShip dataset (http://opensar.sjtu.edu.cn/) is a Sentinel-1 ship interpretation dataset that includes 11,346 SAR ship chips and the corresponding automatic identification system (AIS) messages. The ground size of each ship is provided via the AIS. The Sentinel-1 images are ground range detected (GRD) products acquired in the interferometric wide swath (IW) mode. The spatial resolution of the SAR image is around 20 m, with a pixel spacing of 10 m. Radiometric calibration and terrain correction are performed with SNAP 3.0. Each SAR chip contains one ship and two channels, which store the amplitude values of pixels for the VH (vertical emitting and horizontal receiving) and VV (vertical emitting and vertical receiving) polarizations. The experiment set for SSENet includes 1,890 samples in the VV mode. Figure 6 shows the distributions of the ships’ ground length and width. The length ranges from 28 to 399 m, and the width ranges from 6 to 65 m. Each SAR chip is \(300 \times 300\) pixels in size. We transform the values of the SAR images to [0, 255]. The training set consists of 1,500 SAR chips chosen at random. The remaining 390 chips are used for testing.

The ground truths for the experimental set include two parts: the ground ship size and the RBB for each ship. The ground size is obtained from OpenSARShip. The RBB for each ship is labeled manually with a MATLAB tool shared with DRBox-v2. DRBox-v2 is trained to generate accurate RBBs based on the ground RBBs.

Fig. 6 The range of length and width of the testing set

3.3.2 Experiments Setting

A workstation with one GeForce RTX 2070 8 GB GPU is used in the experiment. The model is implemented in Python 3.6 with the TensorFlow deep learning package. For training, the batch size is six and the initial learning rate is 0.0002. The learning rate is halved every 5,000 training epochs. The training procedure stops when the SizeLoss < 0.001, the LocaLoss < 0.005, and the composite loss < 0.01.
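The learning-rate schedule and the stopping rule can be summarized in a short sketch. It assumes a staircase schedule (the rate changes only at multiples of 5,000) and treats the three thresholds as conditions that must hold simultaneously; the function names are hypothetical and this is not the chapter’s TensorFlow code.

```python
def learning_rate(epoch, initial=0.0002, halve_every=5000):
    """Staircase decay: the rate is halved once per completed block of `halve_every` epochs."""
    return initial * 0.5 ** (epoch // halve_every)


def should_stop(size_loss, loca_loss, composite_loss):
    """Stop only when all three losses are below their thresholds."""
    return size_loss < 0.001 and loca_loss < 0.005 and composite_loss < 0.01
```

Under these assumptions the rate stays at 0.0002 up to epoch 4,999, drops to 0.0001 at epoch 5,000, and to 0.00005 at epoch 10,000.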

MAE and the mean absolute percentage error (MAPE) are employed as metrics. MAE is a typical absolute error, and MAPE is a widely used relative error. Assuming \(y_{i}\) is the ground truth, \(y_{i}^{'}\) is the estimated value, and N is the number of samples, the definitions of MAE and MAPE are as follows:

$$\begin{aligned} MAE = \frac{1}{N}\sum _{i = 1}^{N}\left| y_{i} - y_{i}^{'} \right| \end{aligned}$$
(6)
$$\begin{aligned} MAPE\ (\%) = \frac{100}{N}\sum _{i = 1}^{N}\left| \frac{y_{i} - y_{i}^{'}}{y_{i}} \right| \end{aligned}$$
(7)
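Equations (6) and (7) translate directly into code. The following plain-Python sketch uses hypothetical function names and a made-up two-ship example; it is not part of the chapter’s evaluation scripts.

```python
def mae(y_true, y_pred):
    """Eq. (6): mean absolute error, in the same unit as the inputs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Eq. (7): mean absolute percentage error; ground-truth values must be non-zero."""
    return 100.0 / len(y_true) * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))


# Made-up example: two ships with ground lengths of 100 m and 200 m.
example_mae = mae([100.0, 200.0], [90.0, 220.0])
example_mape = mape([100.0, 200.0], [90.0, 220.0])
```

In this example the absolute errors are 10 m and 20 m, giving an MAE of 15 m, while both relative errors are 10%, giving a MAPE of 10%. The two metrics therefore rank errors differently when ship sizes vary widely.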

3.3.3 Performance of SSENet

The hyper-parameters of SSENet are determined by parameter tuning, and a well-trained model is selected for evaluation. The 390 samples of the testing set are fed into the well-trained SSENet. The outputs are the scaled lengths and widths estimated by the model. The scaled values are then rescaled to values in meters.

The estimated ship sizes are shown in Fig. 7a, b. The length and width MAEs are 7.88 and 2.23 m, respectively. With a pixel spacing of 10 m, both MAEs are below 0.8 pixels. The MAPEs of the estimated length and width are 5.53 and 8.93%, respectively. The \(R^{2}\) scores are 0.9773 and 0.9093, respectively. This indicates that the estimated ship length/width is quite close to the ground length/width. The \(R^{2}\) score of the width is smaller than that of the length, which means that the width is more difficult to estimate than the length. Two factors contribute to this phenomenon. First, a ship’s width is far smaller than its length, and the width of the ship’s signature on the SAR image is more ambiguous than the length [26], which causes random errors in the width of the labeled RBB. Second, the MSSE loss function makes the model fit the length better.
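The chapter does not define the R2 score. The sketch below assumes the usual coefficient of determination and also shows how an MAE in meters converts into pixels at the 10 m pixel spacing of the data; the function names are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def error_in_pixels(error_m, pixel_spacing_m=10.0):
    """Express an error in metres as a number of pixels."""
    return error_m / pixel_spacing_m


length_mae_px = error_in_pixels(7.88)   # reported length MAE
width_mae_px = error_in_pixels(2.23)    # reported width MAE
```

With the reported MAEs, the length error corresponds to about 0.79 pixels and the width error to about 0.22 pixels.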

Fig. 7 Relationships between the ground size and the sizes given by SSENet and by the labeled RBB. a and b The relationships between the ground size and the SSENet’s size. c and d The relationships between the ground size and the labeled RBB’s size

We plot the relationship between the labeled RBB’s size and the ship’s ground size in Fig. 7c, d. The labeled RBB is treated as the RBB that fits the ship’s signature most closely by visual interpretation. As shown in Fig. 7c, d, the MAE of length is nearly 40 m, and the MAE of width is more than 50 m. The gap between the labeled RBB’s size and the ground size is large. By adding the regression model, SSENet pushes the MAEs under 8 m. Therefore, the proposed DNN-based regression model is necessary and effective. Figure 8 shows some examples of SSENet’s results. The outputs for one sample include the detected RBB, the confidence score of being a ship, and the estimated ship size. For most ship samples, the estimated sizes are consistent with the ground sizes.

Fig. 8 Some examples of SSENet results. The outputs include the detected RBB, the confidence score of being a ship, and the estimated ship size

3.3.4 Effectiveness of the Inputs

The effectiveness of the inputs for the DNN regression model is tested. The results are shown in Table 1. The three compared models employ different inputs. The inputs of SSENet1 are the initial ship size only, without the feature map F6. For SSENet2, the inputs are the initial ship size and F6. On top of these inputs, SSENet3 adds the initial orientation as another input.

Table 1 Model performance with different inputs

SSENet1 obtains the largest MAE and MAPE among the three models. By adding F6, SSENet2 reduces the length’s MAE by about 2 m compared with SSENet1. This finding illustrates that the feature map of a SAR image is an important input for estimating ship size: adding it improves the accuracy of size estimation. Finally, by explicitly including the ship’s initial orientation as another input, the estimation errors are reduced further. Therefore, each element of the inputs for SSENet contributes to the final size estimation. Figure 8 shows several results of SSENet3, where the red/green rectangle is the labeled/detected RBB. The confidence score of being a ship and the size estimated by SSENet are also displayed.

3.3.5 Effectiveness of MSSE Loss

An experiment is conducted to test the effectiveness of the new loss function, MSSE. The results are shown in Table 2. SSENetMSE is the model with MSE loss. SSENetMSSE is the model with MSSE loss. The other parts of the two models are the same.

Table 2 Performance of SSENet with MSE or MSSE loss

The length MAE of SSENetMSSE is nearly 1 m less than that of SSENetMSE, a reduction of 11%. For the width, SSENetMSSE performs slightly worse than SSENetMSE. The reason is that MSSE emphasizes the larger loss and drives the model to focus on length rather than width. The difference between the two width errors, however, is only a few centimeters. This drawback does not overshadow the advantages of MSSE. As a result, our MSSE loss is helpful, particularly for estimating the ship’s length.

4 Discussions

4.1 ML versus DL

SSENet’s regression model is a DNN model. To compare it with traditional ML, we choose three typical ML models: Gradient Boosting Regression (GBR) [6], Support Vector Regression (SVR) [3], and Linear Regression (LR) [18]. GBR and SVR have been applied to ship size extraction [8, 11]. LR is a baseline model [27]. Because these three ML models are not NN-based, they cannot be combined with the SSD to create an end-to-end model, and the SAR images cannot be fed into them. The inputs for these three models are the initial ship size and orientation of the labeled RBB. The parameters of GBR, SVR, and LR are tuned, and the estimation results with the best metrics are recorded. The DNN model is the one used by SSENet.

The results are shown in Table 3. The result of SSENet is in the last row. GBR performs the best among the four models (LR, SVR, GBR, and DNN). GBR is an ensemble learning model with good performance in the three-stage procedure [26]. However, GBR is unable to extract features from SAR images automatically, and it cannot be combined with a DL-based ship detection model, such as DRBox-v2, to create an integrated ship size extraction model. Using GBR presupposes that the SAR image is binarized accurately and that the initial RBB is well extracted by traditional methods. As stated in Sect. 2, the traditional method faces big challenges in these steps. In practice, GBR is not an end-to-end model that takes the SAR image as input and outputs the ship size.

Table 3 MAE and MAPE of different models

The error of the DNN model is large. However, a DNN model can be combined with any deep learning model based on CNN or NN to extract size from the SAR image from beginning to end. In contrast to traditional techniques, the DL model optimizes all parameters globally. The DNN regression model can use the feature maps extracted by the DL model to increase the accuracy of the estimated ship size. As shown in Table 3, SSENet reduces the MAE of length by nearly 2 m compared with GBR, about 18.68%. Therefore, compared with traditional methods, the ship size extraction model based on deep learning is more practical.

4.2 Sources of Errors

This section delves into the details of estimation errors and attempts to determine what causes large inaccuracies. The ship’s direction and transit speed are two elements that need to be investigated, according to previous research [10, 26].

4.2.1 Ship Orientation

The estimation errors with respect to the ship’s orientation angle are shown in Fig. 9. Figure 9a and b show the results for the length, and Fig. 9c and d show the results for the width. The MAEs vary with the ship orientation. Large MAEs occur when the orientation angles are closer to \(0^{\circ }\) (\(0^{\circ }\) means the azimuth direction), in the range of (\(-45^{\circ }, 45^{\circ }\)]. There are two reasons for this phenomenon. The first one is the ship motion. When the ship moves in a direction close to the azimuth direction, the speed component in the azimuth direction is large. Because of this large component, the ship signature appears smeared, which increases the estimation error. The other reason is the unequal resolution during imaging, 5 m \(\times \) 20 m for the range and azimuth directions, respectively. The low resolution in the azimuth direction enlarges the errors [26].

Fig. 9 The trend of errors with respect to the orientation angle. a Trend of length’s MAE; b Trend of length’s MAPE; c Trend of width’s MAE; d Trend of width’s MAPE

As shown in Fig. 9, when the initial orientation angle (\(\cos \theta \)) is added to the DNN model, the errors are reduced. This finding also supports using the initial orientation angle as an input.
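The motion argument can be made concrete by splitting a ship’s speed into azimuth and range components. The sketch assumes that the ship moves along its RBB orientation and that the angle \(\theta \) is measured from the azimuth direction, so the azimuth component scales with \(\cos \theta \); the function name and the 10 kn example speed are hypothetical.

```python
import math


def velocity_components(speed, theta_deg):
    """Return (azimuth, range) speed components for a heading theta measured from azimuth."""
    theta = math.radians(theta_deg)
    return speed * math.cos(theta), speed * math.sin(theta)


# Hypothetical 10 kn ship at three orientations.
along_azimuth = velocity_components(10.0, 0.0)
diagonal = velocity_components(10.0, 45.0)
along_range = velocity_components(10.0, 90.0)
```

At \(0^{\circ }\) the whole speed lies in the azimuth direction, at \(45^{\circ }\) about 71% of it does, and at \(90^{\circ }\) the azimuth component vanishes, which is consistent with the larger errors observed near \(0^{\circ }\).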

4.2.2 Ship Speed

Figure 10 shows the errors with respect to the ship’s speed. Because the SAR images in OpenSARShip are mostly from ports, around 83% of the ships are stationary. Figure 10a shows that the MAEs are small in the speed range of 0 to 1 kn (1 kn = 1.852 km/h). As the ship speed increases, the MAE fluctuates slightly. When the speed is greater than 15 kn (27.780 km/h), the MAEs increase markedly, to 19.04 and 4.71 m. These two values are far greater than those of the other speed intervals. The ship’s speed cannot be derived from the SAR image signature. Therefore, it is difficult to refine the estimated sizes of ships by supplying the ship’s speed as an additional input.

Fig. 10 The trend of errors with respect to the travel speed. a The trend of MAE; b Trend of MAPE
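An analysis of this kind amounts to grouping the absolute errors by speed interval and averaging within each group. The sketch below shows one way to do this; the bin edges and the sample values are made up for illustration and are not the chapter’s data, and the function names are hypothetical.

```python
from bisect import bisect_right

KMH_PER_KNOT = 1.852


def knots_to_kmh(speed_kn):
    """Convert a speed from knots to km/h."""
    return speed_kn * KMH_PER_KNOT


def mae_by_speed_bin(speeds_kn, abs_errors_m, edges):
    """Average absolute error per speed bin; the last bin is open-ended.

    `edges` are ascending lower bin edges; results are keyed by lower edge.
    """
    groups = {}
    for speed, error in zip(speeds_kn, abs_errors_m):
        index = bisect_right(edges, speed) - 1   # bin whose lower edge is <= speed
        if index < 0:
            raise ValueError("speed below the first bin edge")
        groups.setdefault(index, []).append(error)
    return {edges[i]: sum(v) / len(v) for i, v in sorted(groups.items())}


# Made-up speeds (kn) and absolute length errors (m), for illustration only.
binned = mae_by_speed_bin(
    speeds_kn=[0.0, 0.5, 3.0, 16.0, 20.0],
    abs_errors_m=[4.0, 6.0, 8.0, 18.0, 20.0],
    edges=[0.0, 1.0, 5.0, 10.0, 15.0],
)
```

Bins that receive no samples are simply absent from the result, which matters for a dataset in which most ships are stationary.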

4.2.3 Ship Size

Figure 11 shows the absolute error (AE) of each estimated size against the ground size. The AE of an estimated size is the absolute value of the difference between the predicted value and the true value. As shown in Fig. 11a and b, there is no obvious relationship between AE and ground size. Therefore, the ship size is not a source of errors.

Fig. 11 The distribution of AE with respect to the ground size. a The AE of length; b The AE of width

5 Conclusions

SSENet, a DL-based model for extracting ship size from SAR data, is proposed in this chapter. SSENet is made up of an SSD-based model and a DNN-based regression model. The DNN model is fed the initial ship size and orientation angle derived from the RBB, as well as the high-level features extracted from the input SAR image. SSENet is trained and validated on the OpenSARShip dataset. Experiments show that: (1) SSENet can directly extract ship size from SAR images with an MAE of less than 0.8 pixels; (2) the new MSSE loss reduces the length’s MAE by nearly 1 m compared with the conventional MSE loss; (3) SSENet shows an obvious advantage over the GBR model; (4) SSENet exhibits robustness over four separate data sets.