Identifying Developing Cloud Clusters Using Predictive Features
Forecasters need better data-driven techniques, based on feature extraction, to determine whether a tropical cyclone will develop from a loosely organized cluster of clouds. Prior studies have attempted to predict the formation of tropical cyclones using numerical weather prediction models together with satellite and radar data. However, refined observational data and forecasting techniques are not always available or accurate in areas such as the North Atlantic Ocean, where data are sparse. In response, this research investigates the predictive features that contribute to a cloud cluster developing into a tropical cyclone without using dynamic models. Instead, it uses only global gridded satellite data, which are readily available. The classification of cloud clusters is generally imbalanced, since non-developing cloud clusters outnumber developing ones, and imbalanced data are a major source of poor performance when learning about rare events. To address this issue, the cloud cluster feature dataset is balanced by applying the Selective Clustering based Oversampling Technique (SCOT), which addresses data imbalance in a selective manner and can be used in many applications. The predictive features are then identified by how well a standard classifier separates developing from non-developing cloud clusters in the balanced feature dataset. A set of features is accepted as predictive only if the classification yields a geometric mean of at least 80% and a Heidke Skill Score of at least 0.8.
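The two acceptance criteria can be made concrete with a small sketch. The code below is illustrative only and is not taken from the study: it assumes the common two-class definitions, with the geometric mean as the square root of sensitivity times specificity and the Heidke Skill Score in its standard 2x2 contingency-table form. The function names, argument names, and example counts are hypothetical, and developing cloud clusters are treated as the positive class.

```python
import math


def g_mean(tp, fn, fp, tn):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn)  # fraction of developing clusters detected
    specificity = tn / (tn + fp)  # fraction of non-developing clusters rejected
    return math.sqrt(sensitivity * specificity)


def heidke_skill_score(tp, fn, fp, tn):
    """Heidke Skill Score for a 2x2 contingency table (standard form)."""
    numerator = 2 * (tp * tn - fp * fn)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


def meets_thresholds(tp, fn, fp, tn, min_g_mean=0.80, min_hss=0.8):
    """Check the two thresholds stated in the abstract (80% and 0.8)."""
    return (g_mean(tp, fn, fp, tn) >= min_g_mean
            and heidke_skill_score(tp, fn, fp, tn) >= min_hss)


# Hypothetical counts from a balanced test set of 200 cloud clusters.
print(meets_thresholds(tp=95, fn=5, fp=5, tn=95))
```

Both functions assume that each class has at least one sample, so that no denominator is zero. Requiring both scores at once is a stricter test than overall accuracy, because a classifier that neglects the rare developing class scores poorly on each of them.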
Keywords: Feature extraction; Imbalanced data; Oversampling; Tropical cyclone
This work is partially supported by the Expeditions in Computing by the National Science Foundation under Award CCF-1029731.