1 Introduction

The tropical cyclone (TC), one of the most violent phenomena of air-sea interaction, often brings disastrous storm surges and flooding and causes significant damage to human life, agriculture, forestry, fisheries, and infrastructure. Knowledge of TC track, intensity, structure, and evolution is therefore required to guide severe weather forecasting and risk assessment. Generally, the formation of TCs requires both dynamically and thermodynamically favorable environmental conditions [19]. Because only a small percentage of convective disturbances develop into TCs, accurately predicting TC formation remains challenging.

Since the Dvorak Technique (DT) was proposed and developed [8, 21, 28], it has been widely used in TC intensity estimation [14, 22, 25, 29] and TC formation prediction [6, 20, 32, 33]. However, DT is based on infrared observations, which may be obscured by significant convection or cirrus clouds. In contrast, microwave radiation images can capture strong convective areas and cloud organization. Microwave remote sensing data therefore have the potential to predict the formation of TCs.

With the advancements in high-performance computing, machine learning methods based on big datasets are widely used in tropical cyclogenesis detection. Based on the decision tree method, a series of classification rules was constructed to predict future TC genesis events, with an overall prediction accuracy of 81.72% [35]. Using a dataset established from WindSat wind products, [23] built a classification model for tropical cyclogenesis detection. The validation shows that the model produced a positive detection rate of approximately 95.3% and a false alarm rate of 28.5%, confirming the potential of microwave remote sensing observations for detecting typhoon formation [23]. Recently, based on the internal structure information of TCs obtained by satellite remote sensing, [27] used machine learning to improve the prediction accuracy of the rapid intensification of TCs and to reduce the false alarm rate. Moreover, [13] compared the TC formation detection performance of different machine learning algorithms. Their results show that machine learning methods outperform traditional linear discriminant analysis.

However, with the continuous accumulation of remote sensing data, traditional machine learning methods struggle to handle massive datasets. Fortunately, deep learning has demonstrated significant superiority over traditional physics-based or statistics-based algorithms for image information extraction [18]. In ocean remote sensing applications, deep learning methods have been used in hurricane intensity estimation [7, 24], sea ice concentration prediction [5, 9, 11], sea surface temperature estimation [1, 30], and other fields [10, 26, 37]. A deep learning approach has been proposed to identify TCs and their precursors based on twenty years of simulated outgoing longwave radiation (OLR) calculated with a cloud-resolving global atmospheric simulation [19]. In the Northwest Pacific from July to November, the probability of detection (POD) of that model is 79.9-89.1%, and the false alarm ratio (FAR) is 32.8-53.4%. In addition, that study reveals that detection performance is correlated with the amount of training data and with TC lifetimes.

Although deep learning is increasingly used in ocean remote sensing [18], its disadvantages are also evident: it requires high computing power and long training time. Moreover, most deep learning models lack incremental learning capacity, so the model must be retrained whenever the dataset is updated. In ocean remote sensing, satellite-based data grow every day, and datasets are expected to expand further to improve the generalization ability and identification accuracy of models. The lack of incremental learning is therefore unfriendly to storage resources and model update times. Fortunately, the incremental learning capacity of the Broad Learning System (BLS) [3] gives it the potential to be applied in ocean remote sensing. Meanwhile, the BLS is a time-cost-friendly learning strategy due to its flattened network. These advantages can compensate for its lower accuracy compared with deep learning, and the BLS has been widely adopted since its proposal. Recently, it has been successfully applied in seismic attenuation modeling [16], model updating [17], hyperspectral imagery classification [34], and crack detection [36].

In this chapter, we propose a tropical cyclogenesis detection algorithm based on Special Sensor Microwave Imager (SSM/I) brightness temperature data. The proposed BLS-based model has three distinctive features: low hardware requirements, fast computation speed, and incremental learning ability. In Sect. 2, the dataset used in this study is presented. In Sect. 3, the details of BLS are introduced. The experimental results are shown in Sect. 4, and the conclusion is given in Sect. 5.

2 Data Description

The dataset used in this chapter is extracted from the brightness temperature (TB) observations acquired by SSM/I. This series of instruments is carried onboard Defense Meteorological Satellite Program (DMSP) near-polar orbiting satellites. The SSM/I is a conically scanning sensor that measures the natural microwave emission from the Earth in the spectral band from 19 to 85 GHz with different polarizations (see Table 1). The parameters derived from these radiometer observations include surface wind speed, atmospheric water vapor, cloud liquid water, and rain rate [31]. Comparing the features of TB images across channels and polarizations, the 37 GHz H-polarization (37H) channel is selected because it most clearly depicts the features of disturbances and tropical cyclones.

Table 1 Channel characteristics of SSM/I
Fig. 1 Qualified samples: a valid TC samples and b valid non-TC samples

Fig. 2 Unqualified samples: a invalid TC samples and b invalid non-TC samples

To collect the sample images covering TCs or non-developed disturbances (non-TC), the TC best tracks and tropical cloud cluster (TCC) tracks during 2005-2009 are used as auxiliary data. This information is obtained from the International Best Track Archive for Climate Stewardship dataset (IBTrACS) [15] and the Global Tropical Cloud Cluster dataset [12], respectively. The time resolution of both datasets is three hours. Note that not all best track records in the TC evolution period are used; only those during the TC formation period are selected. Specifically, the time when the TC maximum wind speed reaches 25 knots for the first time is defined as the starting time, and the 72 h after this time are defined as the TC formation period [23]. For the TCC tracks, only the records that did not develop into TCs are selected. The preprocessing steps for extracting the TC and non-TC images are as follows (a minimal code sketch is given after the list):

1. For each TC/non-TC track record, determine the matching SSM/I TB data within an absolute time difference of 1.5 h.

2. Take the track record as the image center position and extract sub-images of size \(8^{\circ } \times 8^{\circ }\) from the SSM/I TB observations.

3. Retain the sub-images with more than 60% non-empty pixels as qualified samples (see Fig. 1); otherwise, exclude the invalid data (see Fig. 2).
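The following minimal Python sketch illustrates the three steps. It is only an illustration under stated assumptions: the granule object with its `time`, `lat`, `lon`, and `tb37h` fields, and the `extract_sample` helper itself, are hypothetical and not the actual preprocessing code used in this study.

```python
import numpy as np

def extract_sample(track_time, track_lat, track_lon, granules,
                   max_dt_hours=1.5, box_deg=8.0, min_valid_frac=0.6):
    """Return qualified 37H TB pixels around a track record, or None."""
    # Step 1: match an SSM/I granule within +/- 1.5 h of the track record
    matches = [g for g in granules
               if abs((g.time - track_time).total_seconds()) / 3600.0
               <= max_dt_hours]
    if not matches:
        return None
    g = matches[0]

    # Step 2: extract the 8-degree x 8-degree box centered on the record
    half = box_deg / 2.0
    mask = ((np.abs(g.lat - track_lat) <= half) &
            (np.abs(g.lon - track_lon) <= half))
    sub = g.tb37h[mask]   # box pixels, flattened for this sketch

    # Step 3: keep the sample only if more than 60% of pixels are non-empty
    if sub.size == 0 or np.isfinite(sub).mean() <= min_valid_frac:
        return None
    return sub
```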

Following the above steps, 880 TC samples and 6268 non-TC samples were obtained from the SSM/I observations in 2005-2009. Because of this significant class imbalance, only the 2506 non-TC samples from 2005-2006 and all 880 TC samples from 2005-2009 were selected to form the final dataset. Each sample is \(224\,\times \,224\) pixels with RGB channels. Finally, these two sample sets are randomly divided into a training set and a testing set in the ratio of 4:1, respectively.

3 Broad Learning System for Tropical Cyclogenesis Detection

Once the dataset is established, tropical cyclogenesis detection can be executed with the broad learning system. In contrast to deep learning methods, the BLS provides a time-cost-friendly learning strategy due to its flattened network. The main structure of BLS consists of the input layer, the node layer, and the output layer. Specifically, the node layer includes the feature nodes and the enhancement nodes. Generally, the input data is mapped to feature nodes with random weights, and the feature nodes are further mapped to enhancement nodes with new random weights. Finally, the output weights of BLS are trained by regressing the output data on these feature nodes and enhancement nodes. Figure 3 shows the architecture used in this study; the variables are defined as follows: X is the input data, F is the feature node, E is the enhancement node, and Y is the classification label of the input data X. The details of BLS are presented as follows.

Fig. 3 The architecture of BLS

3.1 Broad Learning Model

Assume that the input data is X; the feature vector F mapped with random weights can then be described as

$$\begin{aligned} F_{i} = \phi (XW_{e_{i}} + b_{e_{i}}),i = 1,\ldots ,n \end{aligned}$$
(1)

where \(F_{i}\) is the i-th group of feature nodes, and \(W_{e_{i}}\) and \(b_{e_{i}}\) are random weights and biases with the proper dimensions, respectively. Denote \(F^{n} \equiv [F_{1},\ldots ,F_{n}]\), the concatenation of the first n groups of mapped features. Then, the enhancement nodes can be given by:

$$\begin{aligned} E_{m} = \xi \left( F^{n}W_{h_{m}} + b_{h_{m}} \right) \end{aligned}$$
(2)

where \(E_{m}\) is the m-th group of enhancement nodes, and \(W_{h_{m}}\) and \(b_{h_{m}}\) are random weights and biases with the proper dimensions, respectively. Similarly, the concatenation of the first m groups of enhancement nodes is denoted as \(E^{m} \equiv [E_{1},\ldots ,E_{m}]\). Therefore, the broad model can be represented as the equation of the form

$$\begin{aligned} \begin{aligned} Y&= \left[F_{1},\ldots ,F_{n}\ | \ \xi \left( F^{n}W_{h_{1}} + b_{h_{1}} \right) ,\ldots ,\xi \left( F^{n}W_{h_{m}} + b_{h_{m}} \right) \right]W^{m}\\&= \left[F_{1},\ldots ,F_{n}\ | \ E_{1},\ldots ,E_{m} \right]W^{m}\\&= [F^{n}|E^{m}]W^{m} \end{aligned} \end{aligned}$$
(3)

where \(W^{m} = {[F^{n}\ |{\ E}^{m}]}^{+}Y\) are the connecting weights of the broad structure to be computed, and \({[F^{n}\ |{\ E}^{m}]}^{+}\) is the pseudo-inverse of \([F^{n}\ |{\ E}^{m}]\). In a flattened network, the pseudo-inverse is a convenient way to solve for the output-layer weights of a neural network. However, the straightforward solution is too expensive when the training samples and input patterns exhibit high volume, high velocity, and/or high variety [4]. In this situation, the solution can be approximated by ridge regression:

$$\begin{aligned} A^{+} = {[F^{n}\ |{\ E}^{m}]}^{+} = {(\lambda I + {[F^{n}\ |{\ E}^{m}]}^{T}[F^{n}\ |{\ E}^{m}])}^{- 1}{[F^{n}\ |{\ E}^{m}]}^{T} \end{aligned}$$
(4)

where \(\lambda \) is the regularization parameter. Finally, the model weights are given by

$$\begin{aligned} W = A^{+}Y \end{aligned}$$
(5)
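As a concrete illustration, the following NumPy sketch implements Eqs. 1-5. The tanh activations and the plain random weights are simplifying assumptions (the original BLS additionally fine-tunes the feature weights with a sparse autoencoder); this is a minimal sketch, not the chapter's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_bls(X, Y, n1=5, n2=24, n3=2332, lam=1e-8):
    """Minimal BLS training sketch (Eqs. 1-5). X: (N, d) inputs,
    Y: (N, c) one-hot labels. Returns mappings, A, A^+, and W."""
    d = X.shape[1]
    We = [rng.standard_normal((d, n2)) for _ in range(n1)]
    be = [rng.standard_normal(n2) for _ in range(n1)]
    F = np.hstack([np.tanh(X @ w + b) for w, b in zip(We, be)])   # Eq. 1
    Wh = rng.standard_normal((n1 * n2, n3))
    bh = rng.standard_normal(n3)
    E = np.tanh(F @ Wh + bh)                                      # Eq. 2
    A = np.hstack([F, E])                                         # [F^n | E^m]
    # Ridge-regression pseudo-inverse (Eq. 4) and output weights (Eq. 5)
    A_pinv = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T)
    W = A_pinv @ Y
    return (We, be, Wh, bh), A, A_pinv, W
```

Prediction then reduces to pushing a new image through the same frozen random mappings and taking the arg-max of \(AW\).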

During the BLS computation, note that the number of enhancement nodes is a hyperparameter (N3), while the number of feature nodes is determined by two hyperparameters: the number of feature windows (N1) and the number of nodes in each feature window (N2). Here, the Bayesian optimization method is used to find the optimal model hyperparameters, which can be easily done with the Hyperopt package [2].
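A sketch of such a search with Hyperopt is shown below. The search ranges and the `train_and_score` helper (assumed to return the hit rate on a validation split, since the chapter states that HR is the selection criterion) are illustrative assumptions.

```python
from hyperopt import fmin, tpe, hp

# Hypothetical search space for the three BLS hyperparameters
space = {
    "n1": hp.quniform("n1", 2, 20, 1),       # number of feature windows
    "n2": hp.quniform("n2", 5, 50, 1),       # nodes per feature window
    "n3": hp.quniform("n3", 100, 4000, 1),   # number of enhancement nodes
}

def objective(params):
    hr = train_and_score(int(params["n1"]), int(params["n2"]),
                         int(params["n3"]))
    return -hr   # Hyperopt minimizes, so negate the hit rate

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=100)
```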

3.2 Incremental Learning of BLS

When a deep learning model does not perform well, the number of convolutional kernels or convolutional layers is usually increased, which leads to expensive computation and long training times. In contrast, BLS uses incremental learning to address low model accuracy caused by insufficient mapping nodes, and the incremental learning step generally improves model performance. There are two ways to expand the broad structure: (1) adding enhancement nodes and feature nodes, and (2) adding input data.

3.2.1 Increment of the Feature Nodes and Enhancement Nodes

Assume that the initial BLS has n groups of feature nodes and m groups of enhancement nodes, that is, \(A = [F^{n}\ |{\ E}^{m}]\). In the adding process, the (n+1)-th feature node is given by:

$$\begin{aligned} F_{n + 1} = \phi \left( XW_{e_{n + 1}} + b_{e_{n + 1}} \right) \end{aligned}$$
(6)

Then the enhancement nodes corresponding to this new feature node are given by:

$$\begin{aligned} E_{\text {ex}_{m}} = [\xi \left( F_{n + 1}W_{\text {ex}_{1}} + b_{\text {ex}_{1}} \right) ,\ldots ,\xi \left( F_{n + 1}W_{\text {ex}_{m}} + b_{\text {ex}_{m}} \right) ]\end{aligned}$$
(7)

Furthermore, additional enhancement nodes can be appended to the BLS structure; the (m+1)-th enhancement node is given by

$$\begin{aligned} E_{m + 1} = \xi \left( F^{n}W_{h_{m + 1}} + b_{h_{m + 1}} \right) \end{aligned}$$
(8)

Therefore, the final node layer matrix is combined as

$$\begin{aligned} A^{'} = [A \left| {\ F}_{n + 1\ } \right| E_{\text {ex}_{m}}\ |\ E_{m + 1}]\end{aligned}$$
(9)

Then, the pseudo-inverse of \(A^{'}\) is computed with

$$\begin{aligned} (A^{'})^{+} = \begin{bmatrix} A^{+} - DB^{T} \\ B^{T} \\ \end{bmatrix} \end{aligned}$$
(10)

where \(D = A^{+} [{\ F}_{n + 1\ } \left| \ E_{\text {ex}_{m}}\right| \ E_{m + 1}]\),

$$\begin{aligned} B^{T}= {\left\{ \begin{array}{ll} C^{+} &{} C \ne 0 \\ (1+D^{T}D )^{- 1}D^{T}A^{+} &{} C = 0 \end{array}\right. } \end{aligned}$$
(11)

and \(C = [{\ F}_{n + 1\ } \left| \ E_{\text {ex}_{m}}\ \right| \ E_{m + 1} ]- AD\). Finally, the new weights are

$$\begin{aligned} W^{'} = \begin{bmatrix} W - DB^{T}Y \\ B^{T}Y \\ \end{bmatrix} \end{aligned}$$
(12)

As seen in Eq. 12, the updated weights consist of the initial part and the new part. There is no need to re-calculate the pseudo-inverse over all nodes; only the terms associated with the added nodes are computed.
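A sketch of this node-increment update is given below, following Eqs. 9-12. The numerical tolerance used to test \(C \ne 0\) is an implementation assumption.

```python
import numpy as np

def add_nodes(A, A_pinv, W, Y, H, tol=1e-10):
    """Append new node columns H = [F_{n+1} | E_ex | E_{m+1}] to the node
    matrix A without recomputing the full pseudo-inverse (Eqs. 9-12)."""
    D = A_pinv @ H                       # D = A^+ H
    C = H - A @ D                        # residual of the new columns
    if np.linalg.norm(C) > tol:          # C != 0 branch of Eq. 11
        B_T = np.linalg.pinv(C)
    else:                                # C == 0 branch of Eq. 11
        B_T = np.linalg.solve(np.eye(D.shape[1]) + D.T @ D, D.T @ A_pinv)
    A_new = np.hstack([A, H])                           # Eq. 9
    A_pinv_new = np.vstack([A_pinv - D @ B_T, B_T])     # Eq. 10
    W_new = np.vstack([W - D @ (B_T @ Y), B_T @ Y])     # Eq. 12
    return A_new, A_pinv_new, W_new
```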

3.2.2 Increment of the Input Data

The increments of feature nodes and enhancement nodes described above apply to a fixed dataset. However, in most learning models, the input dataset is the core factor influencing prediction accuracy. In deep learning, once new data is added to the training dataset, the existing model must be retrained, which is time-consuming and reduces the timeliness of model updating and application. Fortunately, the BLS does not need to be retrained after adding input data; only the parts associated with the added data are trained.

Denote \(X_{a}\) as the new inputs; the respective increments of the mapped feature nodes and enhancement nodes are:

$$\begin{aligned} F_{x}^{n} = [\phi \left( X_{a}W_{e_{1}} + b_{e_{1}} \right) ,\ldots ,\phi \left( X_{a}W_{e_{n}} + b_{e_{n}} \right) ]\end{aligned}$$
(13)
$$\begin{aligned} E_{m}^{x} = [\xi \left( F_{x}^{n}W_{h_{1}} + b_{h_{1}} \right) ,\ldots ,\xi \left( F_{x}^{n}W_{h_{m}} + b_{h_{m}} \right) ]\end{aligned}$$
(14)

where \(W_{e_{i}}\), \(W_{h_{i}}\), \(b_{e_{i}}\), and \(b_{h_{i}}\) are the random weights and biases generated when the BLS was initialized. Hence, the updated node matrix is

$$\begin{aligned} A^{'} = \begin{bmatrix} A \\ A_{x}^{T} \\ \end{bmatrix} \end{aligned}$$
(15)

where \(A_{x}^{T} = [F_{x}^{n}\ |\ E_{m}^{x}]\). The associated updated weights can be deduced as follows:

$$\begin{aligned} W_{n}^{x} = W + B\left( Y_{a} - A_{x}^{T}W\right) \end{aligned}$$
(16)

where \(Y_{a}\) are the labels of the additional inputs \(X_{a}\), and \(B\) comes from the pseudo-inverse update of the enlarged matrix \(A^{'}\), analogous to Eqs. 10 and 11. As in the node increment process above, only the pseudo-inverse terms associated with the new inputs are calculated, which greatly improves the update speed of the BLS model.
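The corresponding input-increment step can be sketched as follows, reusing the frozen random mappings from the `train_bls` sketch above. As before, the zero-test tolerance and variable names are assumptions, not the chapter's actual code.

```python
import numpy as np

def add_inputs(A, A_pinv, W, Xa, Ya, mappings, tol=1e-10):
    """Fold new samples (Xa, Ya) into a trained BLS (Eqs. 13-16)."""
    We, be, Wh, bh = mappings            # frozen weights from initial training
    Fx = np.hstack([np.tanh(Xa @ w + b) for w, b in zip(We, be)])   # Eq. 13
    Ex = np.tanh(Fx @ Wh + bh)                                      # Eq. 14
    Ax_T = np.hstack([Fx, Ex])           # new rows of the node matrix
    D_T = Ax_T @ A_pinv
    C = Ax_T - D_T @ A
    if np.linalg.norm(C) > tol:
        B = np.linalg.pinv(C)
    else:
        B = A_pinv @ D_T.T @ np.linalg.inv(np.eye(D_T.shape[0])
                                           + D_T @ D_T.T)
    A_new = np.vstack([A, Ax_T])                      # Eq. 15
    A_pinv_new = np.hstack([A_pinv - B @ D_T, B])
    W_new = W + B @ (Ya - Ax_T @ W)                   # Eq. 16
    return A_new, A_pinv_new, W_new
```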

4 Results

Before training the model, the hyperparameters should be determined first. For the basic BLS, the hyperparameters include the number of feature windows (N1), the number of nodes in each feature window (N2), and the number of enhancement nodes (N3). To compare training time and model accuracy, the classical ResNet50 model is also applied to the same dataset. For this ResNet50 model, the initial learning rate is \(10^{-3}\), the multiplicative factor of learning rate decay is 0.5, the batch size is 16, and the number of epochs is 20. Before training the ResNet50 network, all input images were resized to \(224\times 224\). To fairly compare the training time of the two networks, the training tasks were run on a computer with an Intel(R) Core(TM) i7-8700K CPU @ 3.70 GHz and 64 GB RAM.
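A sketch of this ResNet50 baseline in PyTorch is given below. Beyond the stated settings (learning rate, decay factor, batch size, epochs, input size), the optimizer choice, the LR-decay step size, and the `train_loader` pipeline are assumptions.

```python
import torch
import torchvision

# Baseline sketch with the stated settings; train_loader (batch size 16,
# images resized to 224 x 224) is assumed to be defined elsewhere.
model = torchvision.models.resnet50(num_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5,
                                            gamma=0.5)  # decay factor 0.5
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(20):                      # 20 epochs
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```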

4.1 Basic BLS Results

Using Hyperopt, the hyperparameters are optimized as \(N_{1} = 5\), \(N_{2} = 24\), \(N_{3} = 2332\). The training and testing accuracies and training times of the two methods are shown in Table 2. The testing accuracy of BLS is 86.83%, slightly lower than the 91.88% of ResNet50. On the other hand, even though ResNet50 was run with GPU acceleration, its training time of 2090.45 s is still more than 30 times the 60.52 s of BLS. Therefore, BLS has an obvious advantage in computational efficiency, but it is not as accurate as the deep network because it is less sensitive to image features. Furthermore, we compared the hit rate (HR) and false alarm rate (FAR) of the two models. Table 3 lists the HR and FAR for the training and testing processes. The testing HR and FAR are 81.14% and 11.18%, respectively. Compared with the 79.9-89.1% HR and 32.8-53.4% FAR of existing deep learning research [19], our results are competitive. Note, however, that the dataset used in [19] contains 50,000 TCs and 500,000 non-TCs, which is significantly larger than ours.

Table 2 The results for BLS and ResNet50
Table 3 HR and FAR of BLS and results of [19]

In contrast to the massive number of hyperparameters in deep networks, the BLS has only three primary hyperparameters influencing prediction accuracy. To examine how different combinations of these three hyperparameters affect the prediction performance of BLS, we trained on the same dataset several times; the results are listed in Table 4. The combination of the numbers of feature nodes and enhancement nodes is essential for model accuracy. More nodes mean longer training time, but the trends in testing accuracy, HR, and FAR are inconsistent. The optimized hyperparameter combination has the highest HR because the optimization takes HR as the selection criterion. It is therefore indispensable to determine these parameters with an optimization algorithm.

Table 4 Prediction accuracy with different hyperparameters

4.2 Incremental Learning Results

The capacity for incremental learning is the most prominent advantage of BLS over most traditional deep learning models. For most deep networks, the structure is fixed once the training process is finished. In contrast, the BLS can be updated by adding new nodes or updating the dataset, with no need to retrain the whole network, which significantly reduces the time cost of model updating. The dataset size is 3386; we set the initial size to 1386 and add input patterns in batches of 400. First, we trained the initial model with the initial samples. Then, incremental learning was applied to add a batch of input patterns each time until all training samples were used (a loop sketch is given below). During these incremental learning steps, the hyperparameters are kept at \(N_{1} = 5\), \(N_{2} = 24\), \(N_{3} = 2332\). The results are shown in Table 5; note that the testing dataset is unchanged during incremental learning. The results show that as the size of the inputs increases, the training accuracy grows. For the initial process, the small dataset leads to unreliable accuracies. Because the testing dataset for each incremental step is the same, the testing accuracy tends to be stable.
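This experiment can be sketched with the `train_bls` and `add_inputs` functions from Sect. 3; the `evaluate` helper and the exact slicing are illustrative assumptions.

```python
# Incremental experiment sketch: initial fit on 1386 samples, then fold in
# the remaining training data 400 samples at a time (testing set fixed).
mappings, A, A_pinv, W = train_bls(X_train[:1386], Y_train[:1386])
for start in range(1386, X_train.shape[0], 400):
    Xa = X_train[start:start + 400]
    Ya = Y_train[start:start + 400]
    A, A_pinv, W = add_inputs(A, A_pinv, W, Xa, Ya, mappings)
    acc = evaluate(W, mappings, X_test, Y_test)   # hypothetical helper
```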

Table 5 Prediction accuracy and CPU time of incremental learning

4.3 Case Study: Hurricane Wilma (2005)

Once the model training and testing processes are finished, the model hyperparameters (N1, N2, and N3) and the weights of the nodes (feature nodes and enhancement nodes) are fixed. Based on the trained BLS, predictions for specific TC cases can be executed. Here, we select Hurricane Wilma (2005) as the study case to validate the effectiveness of the proposed model. Wilma was an extremely intense hurricane over the northwestern Caribbean Sea, with the all-time lowest central pressure for an Atlantic basin hurricane. According to the statistics, twenty-three deaths were directly attributed to Wilma, and the total economic losses reached 16 to 20 billion dollars. As the best track in Table 6 shows, Wilma developed from a tropical depression (TD) into a tropical storm (TS) at 06:00 UTC 17 October and then strengthened into a hurricane (HU) at 12:00 UTC 18 October. We planned to collect samples from 18:00 UTC 15 October to 15:00 UTC 18 October during the data preparation; however, only four qualified samples were retained (see Fig. 4). The corresponding best track records of Fig. 4a-d are 00:00 UTC 17 October, 12:00 UTC 17 October, 00:00 UTC 18 October, and 12:00 UTC 18 October, respectively.

Table 6 The best track information during the formation time of Wilma (2005)
Fig. 4 The brightness temperature images during the formation of Wilma (2005)

Table 7 lists the prediction results for these four samples. Figure 4b is incorrectly classified as non-TC, while the remaining three samples are correctly identified. Figure 4a is the first sample captured by SSM/I during Wilma's formation; its correct classification means the proposed model can detect tropical cyclogenesis at an early stage. Figure 4c, d are observations from Wilma's mature period, when the TC structure is relatively stable and complete, so correct classification is expected. However, the negative result for Fig. 4b shows that sample quality introduces uncertainty and error into the model prediction. Specifically, compared with the other three samples, Fig. 4b is missing nearly half of the TC system information, which reduces the number of effective pixels and damages the spiral and TB distribution characteristics of the TC. This is likely the main reason for the poor prediction.

Table 7 Prediction results for the four samples of Wilma (2005)

However, not all samples with missing information are incorrectly identified; it is the loss of core TC system information that leads to misclassification. To verify this conclusion, we select tropical storm Hilda (2009) as a test case. Figure 5 shows Hilda's four qualified TC samples, and Table 8 lists the corresponding prediction results. All the samples are correctly classified even though parts of the information are missing in Fig. 5. In particular, the sizes of the missing parts in Fig. 5a, b are similar to that in Fig. 4b, but less TC system information lies in the missing regions, so the TC structure and TB distribution pattern are not significantly contaminated. Therefore, these two samples can still be correctly identified. All in all, the proposed model can detect tropical cyclogenesis well when given high-quality data.

Fig. 5 The brightness temperature images during the formation of Hilda (2009)

Table 8 Prediction results for the four samples of Hilda (2009)

5 Conclusion

In this study, a tropical cyclogenesis detection model is proposed using BLS. In contrast to deep network methods, the new model is a lightweight flattened network, leading to lower computation cost and shorter training time. Meanwhile, the incremental learning capacity of BLS suits the continuously updated and accumulated remote sensing data: adding new input data does not require retraining on the whole updated dataset. Based on a dataset of 3386 TB images, the testing accuracy, HR, and FAR of BLS are 86.83%, 81.14%, and 11.18%, respectively. This study confirms the applicability of BLS to binary classification problems in ocean remote sensing and demonstrates the possibility of detecting TC formation from satellite microwave TB data.

Although the BLS has shown great power in this classification problem, two defects need to be addressed: (1) the BLS is insensitive to image features, which leads to poor accuracy when the image features are complicated. Inspired by the powerful ability of convolutional neural networks (CNNs) to capture and learn image features, we will add a feature extraction module before the BLS to improve its image processing ability. (2) The size of the dataset is too small to fully support the learning requirements. There are three ways to expand the dataset: add TB data from other channels (e.g., 19H/V), collect samples over a longer period, or utilize TB observations from other microwave radiometers (e.g., the Microwave Imager onboard the FengYun series satellites).