Data Mining and Knowledge Discovery, Volume 27, Issue 2, pp 225–258

Discovery of extreme events-related communities in contrasting groups of physical system networks

  • Zhengzhang Chen
  • William Hendrix
  • Hang Guan
  • Isaac K. Tetteh
  • Alok Choudhary
  • Fredrick Semazzi
  • Nagiza F. Samatova
Open Access


The latent behavior of a physical system that can exhibit extreme events, such as hurricanes or rainfall, is complex. Recently, the concept of complex networks has emerged as a very promising means for studying complex systems. Networks representing relationships between individual objects usually exhibit community dynamics. Conventional community detection methods mainly focus on either mining frequent subgraphs in a network or detecting stable communities in time-varying networks. In this paper, we formulate a novel problem, the detection of predictive and phase-biased communities in contrasting groups of networks, and propose an efficient and effective machine learning solution for finding such anomalous communities. We build different groups of networks corresponding to different phases of the system, such as high or low hurricane activity; discover phase-related system components as seeds to help bound the search space of community generation in each network; and use the proposed contrast-based technique to identify the communities that change across the groups. The detected anomalous communities are hypothesized (1) to play an important role in defining the target system’s state(s) and (2) to improve the predictive skill for the system’s states when used collectively in an ensemble of predictive models. When tested on two important extreme event problems (identification of tropical cyclone-related and of African Sahel rainfall-related climate indices), our algorithm demonstrated superior performance in terms of various skill and robustness metrics, including an 8–16% accuracy increase, as well as physical interpretability of the detected communities. The experimental results also show the efficiency of our algorithm on synthetic datasets.
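The contrast idea summarized above can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify how networks are built, how seeds are chosen, or how the contrast is scored. The sketch assumes one network per phase (rather than a group of networks), edges defined by a hypothetical absolute Pearson correlation threshold, and "phase-biased" groups taken as connected components of the edges that occur in one phase only. All names and data values are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def build_network(series, steps, threshold=0.8):
    """Edge set linking variables whose |correlation| over `steps` meets the threshold."""
    edges = set()
    for u, v in combinations(sorted(series), 2):
        r = pearson([series[u][t] for t in steps], [series[v][t] for t in steps])
        if abs(r) >= threshold:
            edges.add((u, v))
    return edges


def components(edges):
    """Connected components of an undirected edge set, as sorted lists of nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node] - seen:
                seen.add(nb)
                stack.append(nb)
        comps.append(sorted(comp))
    return comps


def phase_biased_groups(series, phase_steps, threshold=0.8):
    """For each phase, group the edges that appear in that phase's network only."""
    nets = {p: build_network(series, s, threshold) for p, s in phase_steps.items()}
    result = {}
    for p, edges in nets.items():
        others = set().union(*(e for q, e in nets.items() if q != p))
        result[p] = components(edges - others)
    return result


# Invented toy data: four variables observed at eight time steps.
series = {
    "A": [1, 2, 3, 4, 1, 2, 3, 4],
    "B": [2, 4, 6, 8, 4, 1, 3, 2],       # tracks A in the first four steps only
    "C": [5, 3, 6, 2, 7, 1, 4, 9],
    "D": [10, 6, 12, 4, 14, 2, 8, 18],   # tracks C in every step
}
# Hypothetical phase labels: steps 0-3 are "high" activity, steps 4-7 are "low".
phase_steps = {"high": [0, 1, 2, 3], "low": [4, 5, 6, 7]}

groups = phase_biased_groups(series, phase_steps)
print(groups)
```

On this toy input, the C–D link is present in both phases and is therefore discarded by the contrast, while the A–B link appears only in the high phase and is reported as the single phase-specific group. A realistic pipeline would also need the seeding and predictive-ensemble steps that the abstract mentions, which this sketch omits.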


Spatio-temporal data mining; Complex network analysis; Community detection; Comparative analysis; Network motif detection; Extreme event prediction



The authors would like to thank the editor and the anonymous reviewers for their valuable comments and suggestions to improve the paper. This work was supported in part by the U.S. Department of Energy, Office of Science, the Office of Advanced Scientific Computing Research (ASCR) and the Office of Biological and Environmental Research (BER), and by the U.S. National Science Foundation (Expeditions in Computing). Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. D.O.E. under contract no. DE-AC05-00OR22725.

Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.



Copyright information

© The Author(s) 2012

Authors and Affiliations

  • Zhengzhang Chen (1, 2)
  • William Hendrix (1)
  • Hang Guan (3)
  • Isaac K. Tetteh (1)
  • Alok Choudhary (4)
  • Fredrick Semazzi (1)
  • Nagiza F. Samatova (1, 2)

  1. North Carolina State University, Raleigh, USA
  2. Oak Ridge National Laboratory, Oak Ridge, USA
  3. Zhejiang University, Hangzhou, China
  4. Northwestern University, Evanston, USA
