Comparing Predictive Power in Climate Data: Clustering Matters

  • Karsten Steinhaeuser
  • Nitesh V. Chawla
  • Auroop R. Ganguly
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6849)


Various clustering methods have been applied to climate, ecological, and other environmental datasets, for example to define climate zones or automate land-use classification. Measuring the “goodness” of such clusters is generally application-dependent and highly subjective, often requiring domain expertise and/or validation with field data (which can be costly or even impossible to acquire). Here we focus on one particular task: the extraction of ocean climate indices from observed climatological data. In this case, it is possible to quantify the relative performance of different methods. Specifically, we propose to extract indices with complex networks constructed from climate data, which have been shown to effectively capture the dynamical behavior of the global climate system, and compare their predictive power to that of candidate indices obtained using other popular clustering methods. Our results demonstrate that network-based clusters are statistically significantly better predictors of land climate than the clusters produced by any other clustering method, which could lead to a deeper understanding of climate processes and complement physics-based climate models.


Keywords: Root Mean Square Error; Climate Data; Support Vector Regression; Spectral Cluster; Community Detection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Karsten Steinhaeuser (1, 2)
  • Nitesh V. Chawla (1)
  • Auroop R. Ganguly (2)
  1. Department of Computer Science and Engineering, Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, USA
  2. Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, USA
