Abstract
A probabilistic clustering technique is developed for classification of wintertime extratropical cyclone (ETC) tracks over the North Atlantic. We use a regression mixture model to describe the longitude-time and latitude-time propagation of the ETCs. A simple tracking algorithm is applied to 6-hourly mean sea-level pressure fields to obtain the tracks from either a general circulation model (GCM) or a reanalysis data set. Quadratic curves are found to provide the best description of the data. We select a three-cluster classification for both data sets, based on a mix of objective and subjective criteria. The track orientations in each of the clusters are broadly similar for the GCM and reanalyzed data; they are characterized by predominantly south-to-north (S–N), west-to-east (W–E), and southwest-to-northeast (SW–NE) tracking cyclones, respectively. The reanalysis cyclone tracks, however, are found to be much more tightly clustered geographically than those of the GCM. For the reanalysis data, a link is found between the occurrence of cyclones belonging to different clusters of trajectory-shape, and the phase of the North Atlantic Oscillation (NAO). The positive phase of the NAO is associated with the SW–NE oriented cluster, whose tracks are relatively straight and smooth (with cyclones that are typically faster, more intense, and of longer duration). The negative NAO phase is associated with more-erratic W–E tracks, with typically weaker and slower-moving cyclones. The S–N cluster is accompanied by a more transient geopotential trough over the western North Atlantic. No clear associations are found in the case of the GCM composites. The GCM is able to capture cyclone tracks of quite realistic orientation, as well as subtle associated features of cyclone intensity, speed and lifetimes. The clustering clearly highlights, though, the presence of serious systematic errors in the GCM’s simulation of ETC behavior.
References
Anderson D, Hodges KI, Hoskins BJ (2003) Sensitivity of feature-based analysis methods of storm tracks to the form of background field. Mon Wea Rev 131(3):565–573
Bernardo JM, Smith AFM (1994) Bayesian theory. Wiley, New York
Blackmon ML, Wallace JM, Lau N-C, Mullen SL (1977) An observational study of the Northern Hemisphere wintertime circulation. J Atmos Sci 34:1040–1053
Blender R, Fraedrich K, Lunkeit F (1997) Identification of cyclone-track regimes in the North Atlantic. Quart J Royal Meteor Soc 123:727–741
Camargo SJ, Robertson AW, Gaffney SJ, Smyth P (2007a) Cluster analysis of typhoon tracks. Part I: General properties. J Climate (in press)
Camargo SJ, Robertson AW, Gaffney SJ, Smyth P (2007b) Cluster analysis of typhoon tracks. Part II: Large-scale circulation and ENSO. J Climate (in press)
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38
DeSarbo WS, Cron WL (1988) A maximum likelihood methodology for clusterwise linear regression. J Classificat 5(1):249–282
Draper NR, Smith H (1981) Applied regression analysis, 2nd edn. Wiley, New York
Elsner JB (2003) Tracking hurricanes. Bull Amer Meteor Soc 84(3):353–356
Elsner JB, Liu Kb, Kocher B (2000) Spatial variations in major US hurricane activity: statistics and a physical mechanism. J Climate 13:2293–2305
Fraley C, Raftery AE (1998) How many clusters? Which clustering methods? Answers via model-based cluster analysis. Comput J 41(8):578–588
Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Amer Stat Assoc 97(458):611–631
Fyfe JC (2003) Extratropical southern hemisphere cyclones: Harbingers of climate change. J Climate 16:2802–2805
Gaffney SJ (2004) Probabilistic curve-aligned clustering and prediction with regression mixture models. Ph.D. Dissertation, Department of Computer Science, University of California, Irvine
Gaffney S, Smyth P (1999) Trajectory clustering with mixtures of regression models. In: Chaudhuri S, Madigan D (eds) Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining, August 15–18, 1999. ACM Press, New York, pp 63–72
Gaffney SJ, Smyth P (2003) Curve clustering with random effects regression mixtures. In: Bishop CM, Frey BJ (eds) Proceedings of the 9th international workshop on artificial intelligence and statistics, Key West, FL, January 3–6, 2003
Gneiting T, Raftery AE (2004) Strictly proper scoring rules, prediction, and estimation. Technical Report 463, Department of Statistics, University of Washington
Hack JJ, Kiehl JT, Hurrell JW (1998) The hydrologic and thermodynamic characteristics of the NCAR CCM3. J Climate 11:1179–1206
Hannachi A, O’Neill A (2001) Atmospheric multiple equilibria and non-Gaussian behaviour in model simulations. Quart J Royal Meteor Soc 127(573):939–958
Hartigan JA, Wong MA (1978) Algorithm AS 136: A K-means clustering algorithm. Appl Stat 28:100–108
Hodges KI (1994) A general method for tracking analysis and its applications to meteorological data. Mon Wea Rev 122(11):2573–2586
Hodges KI (1995) Feature tracking on the unit sphere. Mon Wea Rev 123(12):3458–3465
Hodges KI (1998) Feature-point detection using distance transforms: Application to tracking tropical convective complexes. Mon Wea Rev 126(3):785–795
Hoskins BJ, Hodges KI (2002) New perspectives on the Northern Hemisphere winter storm tracks. J Atmos Sci 59(6):1041–1061
Hurrell JW, Kushnir Y, Ottersen G, Visbeck M (2003) An overview of the North Atlantic Oscillation. Geophys Monogr 134:2217–2231
Kalnay E, Kanamitsu M, Kistler R, Collins W, Deaven D, Gandin L, Iredell M, Saha S, White G, Woollen J, Zhu Y, Chelliah M, Ebisuzaki W, Higgins W, Janowiak J, Mo KC, Ropelewski C, Wang J, Leetmaa A, Reynolds R, Jenne R, Joseph D (1996) The NCEP/NCAR 40-year reanalysis project. Bull Amer Meteor Soc 77:437–441
Kimoto M, Ghil M (1993) Multiple flow regimes in the northern hemisphere winter. Part II: Sectorial regimes and preferred transitions. J Atmos Sci 16:2645–2673
König W, Sausen R, Sielman F (1993) Objective identification of cyclones in GCM simulations. J Climate 6(12):2217–2231
Lau N-C (1988) Variability of the observed midlatitude storm tracks in relation to low-frequency changes in the circulation pattern. J Atmos Sci 45:2718–2743
Le Treut H, Kalnay E (1990) Comparison of observed and simulated cyclone frequency distribution as determined by an objective method. Atmosfera 3:57–71
Lenk PJ, DeSarbo WS (2000) Bayesian inference for finite mixtures of generalized linear models with random effects. Psychometrika 65(1):93–119
Lwin T, Martin PJ (1989) Probits of mixtures. Biometrics 45:721–732
Mailier PJ, Stephenson DB, Ferro CAT, Hodges KI (2006) Serial clustering of extratropical cyclones. Mon Wea Rev 134:2224–2240
McLachlan GJ, Basford KE (1988) Mixture models: inference and applications to clustering. Marcel Dekker, New York
McLachlan GJ, Krishnan T (1997) The EM algorithm and extensions. Wiley, New York
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Mesrobian E, Muntz R, Shek E, Mechoso CR, Farrara J, Spahr J, Stolorz P (1995) Real time data mining, management, and visualization of GCM output. Supercomputing ’94, IEEE Computer Society, pp 81–87
Mo K, Ghil M (1988) Cluster analysis of multiple planetary flow regimes. J Geophys Res 93D:10927–10952
MunichRe (2002) Winter storms in Europe (II). Analysis of 1999 losses and loss potentials. Technical Report 302-03109, 72 pp [Available from Münchner Rückversicherungs-Gesellschaft, Königinstr. 107, 80802 München, Germany]
Preisendorfer RW (1988) Principal component analysis in meteorology and oceanography. Elsevier, Amsterdam
Ramsay JO, Silverman BW (1997) Functional data analysis. Springer, New York
Ramsay JO, Silverman BW (2002) Applied functional data analysis: methods and case studies. Springer, New York
Robertson AW, Metz W (1990) Transient-eddy feedbacks derived from linear theory and observations. J Atmos Sci 47:2743–2764
Murray RJ, Simmonds I (1991) A numerical scheme for tracking cyclone centres from digital data. Part I: Development and operation of the scheme. Aust Meteor Mag 39:155–166
Saunders MA (1999) Earth’s future climate. Philos Trans Roy Soc Lond A 357:3459–3480
Simmons AJ, Hoskins BJ (1978) The life cycles of some nonlinear baroclinic waves. J Atmos Sci 35(3):414–432
Smyth P, Ide K, Ghil M (1999) Multiple regimes in northern hemisphere height fields via mixture model clustering. J Atmos Sci 56(21):3704–3723
Smyth P (2000) Model selection for probabilistic clustering using cross-validated likelihood. Stat Comput 10(1):63–72
von Storch H, Zwiers FW (1999) Statistical analysis in climate research. Cambridge University Press, Cambridge
Terry J, Atlas R (1996) Objective cyclone tracking and its applications to ERS-1 scatterometer forecast impact studies. In: 15th conference on weather analysis and forecasting, Norfolk, VA. American Meteorological Society
Trenberth KE (1986) An assessment of the impact of transient eddies on the zonal flow during a blocking episode using localized Eliassen-Palm flux diagnostics. J Atmos Sci 43:2070–2087
Vautard R (1990) Multiple weather regimes over the North Atlantic: analysis of precursors and successors. Mon Wea Rev 45:2845–2867
Vrac M, Chedin A, Diday E (2005) Clustering a global field of atmospheric profiles by mixture decomposition of copulas. J Atmos Ocean Technol 22(10):1445–1459
Wang K, Gasser T (1997) Alignment of curves by dynamic time warping. Annal Stat 25:1251–1276
Yiou P, Nogaj M (2004) Extreme climatic events and weather regimes over the North Atlantic: when and where? Geophys Res Lett 31:L07202. doi:10.1029/2003GL019119
Acknowledgments
We wish to thank Kevin Hodges for helpful discussions, and Jim Boyle and Peter Glecker for help in obtaining the NCAR CCM3 data. We are grateful to Kevin Hodges and two anonymous referees for their constructive reviews which substantially improved the paper. The NCEP–NCAR Reanalysis data were provided by the NOAA CIRES Climate Diagnostics Center, Boulder, Colorado, from their Web site available online at http://www.cdc.noaa.go. This work was supported in part by a Department of Energy grant DE-FG02-02ER63413 (MG and AWR), by NOAA through a block grant to the International Research Institute for Climate and Society (SJC and AWR), and by the National Science Foundation under grants No. SCI-0225642, IIS-0431085, and ATM-0530926 (SJG and PS).
Appendix A: Expectation maximization algorithm
The EM algorithm is an iterative maximum likelihood (ML) procedure that provides a general and efficient framework for parameter estimation. At a base level, EM is an approximate root-finding procedure used to seek the root of the likelihood equation by iteratively searching for a set of parameters that maximize the probability of the observed data. EM is primarily used for finding ML parameter estimates in missing- or hidden-data problems. Parameter estimation in hidden-data problems is difficult because the likelihood equation takes on a complex form, often involving an integral or a sum over the hidden data itself.
For example, Eq. (5) in Sect. 3.3 gives the likelihood of ϕ given both Z and T (repeated here):
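In standard regression-mixture notation, a log-likelihood of this kind can be written as sketched below. The symbols are assumptions made for this sketch rather than definitions taken from the text above: \(n\) trajectories \(z_i\) with regression matrices \(T_i\), mixture weights \(\alpha_k\), and component densities \(f_k\) with parameters \(\theta_k\).

```latex
\mathcal{L}(\phi \mid Z, T) = \log p(Z \mid T, \phi)
  = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \alpha_k \, f_k(z_i \mid T_i, \theta_k)
```

The inner sum over \(k\) is the sum over the hidden cluster memberships; because it sits inside the logarithm, the log-likelihood does not separate into per-cluster terms.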
Notice that the hidden data in this case are the unknown cluster memberships, which must be summed out of the likelihood to arrive at L(ϕ|Z,T). In hidden-data problems this operation generally cannot be carried out easily. The EM algorithm is an iterative two-step procedure that circumvents this integration (or sum) by (1) indirectly estimating values for the unobserved data, and (2) finding the ML parameter estimates that correspond to the now completely observed data. The new ML estimates from step (2) are then used to re-estimate the hidden data in step (1), and the iterations continue until some stopping criterion is reached (typically, when the change in log-likelihood falls below a chosen threshold, indicating that the iterations have stabilized).
In the first step, the E-step, we estimate the hidden cluster memberships by forming the ratio of the likelihood of trajectory i under cluster k, to the sum-total likelihood of trajectory i under all clusters:
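A ratio of this kind is usually written as a posterior membership probability. In the sketch below, the mixture weights \(\alpha_k\), the component densities \(f_k\), and their parameters \(\theta_k\) are assumed notation, with \(z_i\) the ith trajectory and \(T_i\) its regression matrix.

```latex
w_{ik} = \frac{\alpha_k \, f_k(z_i \mid T_i, \theta_k)}
              {\sum_{j=1}^{K} \alpha_j \, f_j(z_i \mid T_i, \theta_j)}
```

By construction \(0 \le w_{ik} \le 1\) and \(\sum_{k} w_{ik} = 1\) for every trajectory.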
These \(w_{ik}\) give the probabilities that the ith trajectory was generated from cluster k. They represent a posterior expectation for the value of the actual binary cluster memberships (i.e., the ith trajectory was either generated by the kth cluster or it was not).
In the second step, the M-step, the expected cluster memberships from the E-step are used to form the weighted log-likelihood function:
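A weighted log-likelihood of this kind typically takes the form sketched below. The notation is assumed for illustration: \(w_{ik}\) are the membership probabilities from the E-step, \(\alpha_k\) the mixture weights, and \(f_k\) the component densities with parameters \(\theta_k\).

```latex
\mathcal{L}_w(\phi) = \sum_{i=1}^{n} \sum_{k=1}^{K}
    w_{ik} \, \log \big[ \alpha_k \, f_k(z_i \mid T_i, \theta_k) \big]
```

Here the logarithm acts on a single component at a time, so the terms for each cluster can be maximized separately.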
The membership probabilities weight the contribution that the kth density component adds to the overall likelihood. In the case where the \(w_{ik}\) are binary, and thus cluster membership is perfectly known, this reduces to the usual fully-observed log-likelihood. This weighted log-likelihood is then maximized with respect to the parameter set ϕ.
For the sake of completeness, we give each of the re-estimation equations below. Let \({\bf w}_{ik} = w_{ik}{\bf I}_{n_i}\), where \({\bf I}_{n_i}\) is an \(n_i\)-vector of ones, and let \(W_k = \mathrm{diag}({\bf w}'_{1k}, \ldots, {\bf w}'_{nk})\) be an \(N \times N\) diagonal matrix. Then, in the M-step we use \(W_k\) to calculate the mixture parameters
and the mixture weights
for k = 1,...,K. These update equations are equivalent to the well-known weighted least-squares solution in regression (Draper and Smith 1981). The diagonal elements of \(W_k\) represent the weights to be applied to Z and T during the weighted regression.
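The weighted least-squares updates referred to above can be sketched as follows. The symbols \(\beta_k\) (regression coefficients), \(\Sigma_k\) (noise covariance), and \(\alpha_k\) (mixture weights) are assumed notation, with Z and T taken to be the stacked trajectory values and regression matrices of all n trajectories, and \(n_i\) the number of points in trajectory i.

```latex
\begin{aligned}
\hat{\beta}_k  &= (T' W_k T)^{-1} \, T' W_k Z, \\
\hat{\Sigma}_k &= \frac{(Z - T\hat{\beta}_k)' \, W_k \, (Z - T\hat{\beta}_k)}
                      {\sum_{i=1}^{n} n_i \, w_{ik}}, \\
\hat{\alpha}_k &= \frac{1}{n} \sum_{i=1}^{n} w_{ik}.
\end{aligned}
```

The denominator of the covariance update is the trace of \(W_k\), that is, the total weight that cluster k places on the individual trajectory points.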
Because most of the difficult work is carried out in estimating the cluster memberships, the maximization in the M-step is straightforward; this is a common attribute of the EM algorithm. Dempster et al. (1977) showed that, under fairly general conditions, the likelihood never decreases during the E- and M-step iterations. Because the likelihood surface can have local maxima, however, the solution is not guaranteed to correspond to the global maximum. We can increase the chances of finding the global maximum by running the EM algorithm multiple times from different starting points in parameter space and selecting the parameters that yield the highest overall likelihood.
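The procedure described in this appendix can be illustrated with a minimal Python sketch. It is not the implementation used for the results in the paper, and it makes several simplifying assumptions: a single response variable with a scalar noise variance per cluster (rather than longitude and latitude modelled jointly), random soft memberships as starting points, and synthetic stand-in data. All function and variable names (`design`, `em_once`, `em_with_restarts`, `make_tracks`) are hypothetical.

```python
import numpy as np


def design(t, order):
    """Polynomial regression matrix with columns 1, t, ..., t**order."""
    return np.vander(t, order + 1, increasing=True)


def em_once(tracks, K, order, rng, max_iter=200, tol=1e-6):
    """One EM run from a random start; returns (params, log-likelihood trace)."""
    n = len(tracks)
    lengths = [len(t) for t, _ in tracks]
    T = np.vstack([design(t, order) for t, _ in tracks])  # stacked N x (order+1)
    Z = np.concatenate([z for _, z in tracks])            # stacked N values
    owner = np.repeat(np.arange(n), lengths)              # track index of each point
    w = rng.dirichlet(np.ones(K), size=n)                 # assumed random start
    trace = []
    for _ in range(max_iter):
        # M-step: weighted least squares per cluster, then mixture weights.
        alpha = np.clip(w.mean(axis=0), 1e-300, None)
        beta = np.empty((K, order + 1))
        var = np.empty(K)
        for k in range(K):
            wk = w[owner, k]                              # diagonal of W_k
            sw = np.sqrt(wk)
            beta[k] = np.linalg.lstsq(T * sw[:, None], Z * sw, rcond=None)[0]
            resid = Z - T @ beta[k]
            var[k] = max((wk * resid**2).sum() / max(wk.sum(), 1e-300), 1e-12)
        # E-step: log-likelihood of each track under each cluster.
        resid = Z[:, None] - T @ beta.T                   # N x K
        log_pt = -0.5 * (np.log(2 * np.pi * var) + resid**2 / var)
        log_f = np.zeros((n, K))
        np.add.at(log_f, owner, log_pt)                   # sum points within tracks
        log_joint = np.log(alpha) + log_f
        m = log_joint.max(axis=1, keepdims=True)          # log-sum-exp for stability
        log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
        w = np.exp(log_joint - log_norm)                  # membership probabilities
        trace.append(float(log_norm.sum()))               # observed-data log-likelihood
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
    return (alpha, beta, var), trace


def em_with_restarts(tracks, K, order, n_restarts=5, seed=0):
    """Run EM from several random starts and keep the highest-likelihood run."""
    rng = np.random.default_rng(seed)
    runs = [em_once(tracks, K, order, rng) for _ in range(n_restarts)]
    best = max(runs, key=lambda run: run[1][-1])
    return best, runs


def make_tracks(rng, n_per_cluster=20):
    """Synthetic stand-in data: noisy quadratic curves from two known clusters."""
    true_beta = np.array([[0.0, 1.0, 0.5], [3.0, -1.0, 0.0]])
    tracks = []
    for b in true_beta:
        for _ in range(n_per_cluster):
            n_i = rng.integers(8, 16)                     # tracks differ in length
            t = np.arange(n_i) / 10.0
            z = design(t, 2) @ b + rng.normal(0.0, 0.1, size=n_i)
            tracks.append((t, z))
    return tracks


tracks = make_tracks(np.random.default_rng(1))
(best_params, best_trace), runs = em_with_restarts(tracks, K=2, order=2)
print("final log-likelihood of each restart:", [tr[-1] for _, tr in runs])
print("selected log-likelihood:", best_trace[-1])
```

Each restart records the log-likelihood after every iteration, so the non-decreasing behaviour noted above can be checked directly from `trace`. The small floors on the mixture weights and variances are only numerical guards for degenerate clusters and are not part of the algorithm as described.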
Cite this article
Gaffney, S.J., Robertson, A.W., Smyth, P. et al. Probabilistic clustering of extratropical cyclones using regression mixture models. Clim Dyn 29, 423–440 (2007). https://doi.org/10.1007/s00382-007-0235-z
DOI: https://doi.org/10.1007/s00382-007-0235-z