Abstract
The latent behavior of a physical system that can exhibit extreme events, such as hurricanes or rainfall, is complex. Recently, complex networks have emerged as a very promising means of studying complex systems. Networks representing relationships between individual objects usually exhibit community dynamics. Conventional community detection methods mainly focus on either mining frequent subgraphs in a network or detecting stable communities in time-varying networks. In this paper, we formulate a novel problem, the detection of predictive and phase-biased communities in contrasting groups of networks, and propose an efficient and effective machine learning solution for finding such anomalous communities. We build different groups of networks corresponding to different phases of the system, such as high or low hurricane activity; discover phase-related system components as seeds to help bound the search space of community generation in each network; and use the proposed contrast-based technique to identify the communities that change across the groups. The detected anomalous communities are hypothesized (1) to play an important role in defining the target system’s state(s) and (2) to improve the predictive skill for the system’s states when used collectively in an ensemble of predictive models. When tested on two important extreme event problems, the identification of tropical cyclone-related and of African Sahel rainfall-related climate indices, our algorithm demonstrated superior performance in terms of various skill and robustness metrics, including an 8–16% accuracy increase, as well as physical interpretability of the detected communities. The experimental results also show the efficiency of our algorithm on synthetic datasets.
Acknowledgments
The authors would like to thank the editor and the anonymous reviewers for their valuable comments and suggestions to improve the paper. This work was supported in part by the U.S. Department of Energy, Office of Science, the Office of Advanced Scientific Computing Research (ASCR) and the Office of Biological and Environmental Research (BER), and by the U.S. National Science Foundation (Expeditions in Computing). Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. D.O.E. under contract no. DEAC05-00OR22725.
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
Additional information
Responsible editor: Eamonn Keogh.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Chen, Z., Hendrix, W., Guan, H. et al. Discovery of extreme events-related communities in contrasting groups of physical system networks. Data Min Knowl Disc 27, 225–258 (2013). https://doi.org/10.1007/s10618-012-0289-3
DOI: https://doi.org/10.1007/s10618-012-0289-3