Detecting Extreme Events from Climate Time Series via Topic Modeling

  • Cheng Tang
  • Claire Monteleoni


We propose a topic-model-based approach for defining and detecting patterns that correspond to extreme climate events in different regions of the globe, using time series of multiple climate variables. Although topic models are widely used in natural language processing, bioinformatics, and computer vision, we are not aware of any prior application to modeling climate extremes. Inference from our model can be used to construct climate extreme indices, to predict disastrous extreme events such as droughts and floods, and to understand how climate change influences climate extremes.


Keywords: Climate extremes · Extreme events · Topic modeling · Latent Dirichlet allocation · Unsupervised learning
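The abstract does not say how climate time series are turned into the documents and words a topic model needs. The following is a minimal sketch that assumes one hypothetical encoding. Each non-overlapping time window is treated as a document. Each observation becomes a word naming its variable and whether its z-score, computed against the whole record, is low, normal, or high. Latent Dirichlet allocation is then fitted with a plain collapsed Gibbs sampler. The helpers `tokenize` and `lda_gibbs`, the 1.5 cut-off, the hyperparameters, and the synthetic data are all illustrative assumptions, not the authors' construction.

```python
import random
from statistics import mean, pstdev


def tokenize(name, series, window, cut=1.5):
    """Hypothetical encoding (not the paper's): split `series` into non-overlapping
    windows (documents) and map each value to a word like 'precip:low' using its
    z-score over the full record."""
    mu, sd = mean(series), pstdev(series) or 1.0

    def level(x):
        z = (x - mu) / sd
        return "low" if z < -cut else "high" if z > cut else "normal"

    return [[f"{name}:{level(x)}" for x in series[s:s + window]]
            for s in range(0, len(series) - window + 1, window)]


def lda_gibbs(docs, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return per-document topic
    proportions (theta) and per-topic word distributions (phi)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * K for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(K)]    # word counts per topic
    nk = [0] * K                         # total words per topic
    z = []                               # topic assignment of every token
    for d, doc in enumerate(docs):       # random initialization
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # remove the token's current assignment from the counts
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # unnormalized full conditional p(z = j | all other assignments)
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [{vocab[v]: (nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)}
           for k in range(K)]
    return theta, phi


if __name__ == "__main__":
    gen = random.Random(1)
    n, window = 360, 30                  # 12 synthetic windows of 30 steps each
    precip = [gen.gauss(0, 1) for _ in range(n)]
    temp = [gen.gauss(0, 1) for _ in range(n)]
    for t in range(120, 150):            # inject a synthetic hot, dry spell (window 4)
        precip[t] -= 3.0
        temp[t] += 3.0
    docs = [p + q for p, q in zip(tokenize("precip", precip, window),
                                  tokenize("temp", temp, window))]
    theta, phi = lda_gibbs(docs, K=2)
    for k, topic in enumerate(phi):
        print(k, sorted(topic, key=topic.get, reverse=True)[:3])
    print([round(th[0], 2) for th in theta])
```

In this toy setup, a topic that places most of its mass on `precip:low` and `temp:high` acts as a hot, dry "extreme" pattern. That topic's proportion in each window (one column of `theta`) is one possible index-like score. Whether such a score tracks real extremes would have to be checked against observed events.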


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. George Washington University, Washington, DC, USA
