Multi-source models for civil unrest forecasting

  • Gizem Korkmaz
  • Jose Cadena
  • Chris J. Kuhlman
  • Achla Marathe
  • Anil Vullikanti
  • Naren Ramakrishnan
Original Article


Civil unrest events (protests, strikes, and “occupy” events) range from small, nonviolent protests that address specific issues to events that turn into large-scale riots. Detecting and forecasting these events is of key interest to social scientists and policy makers because they can lead to significant societal and cultural changes. We forecast civil unrest events in six countries in Latin America on a daily basis, from November 2012 through August 2014, using multiple data sources that capture the social, political, and economic contexts within which civil unrest occurs. The models contain predictors extracted from social media sites (Twitter and blogs) and news sources, in addition to the volume of requests to Tor, a widely used anonymity network. Two political event databases and country-specific exchange rates are also used. Our forecasting models are evaluated against a Gold Standard Report, which is compiled by an independent group of social scientists and subject matter experts. We use logistic regression models with Lasso to select a sparse feature set from our diverse datasets. The experimental results, measured by F1-scores, are in the range 0.68–0.95 and demonstrate the efficacy of a multi-source approach for predicting civil unrest. Case studies illustrate the insights into unrest events that our method provides. An ablation study demonstrates the relative value of each data source for prediction. We find that social media and news are more informative than the other data sources, including the political event databases, and improve prediction performance. However, social media also increases the variation in the performance metrics.
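
To make the modeling step concrete, the following is a minimal sketch of logistic regression with a Lasso (L1) penalty, scored with F1. It is not the authors' implementation. The data are synthetic, the function names and hyperparameters (`lam`, `step`, `iters`) are hypothetical, and proximal gradient descent is used here only because it keeps the example dependency-free.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam=0.1, step=0.5, iters=300):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    lam, step and iters are illustrative defaults, not values from the paper.
    The intercept is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= step * grad_b / n
        # Gradient step on the log-loss, then shrink each weight (L1 prox).
        w = [soft_threshold(w[j] - step * grad_w[j] / n, step * lam)
             for j in range(d)]
    return w, b


def predict(X, w, b):
    """Label an example 1 when its predicted probability is at least 0.5."""
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0
            for xi in X]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def make_synthetic_data(n=300, seed=0):
    """Synthetic stand-in data: 6 features, only the first two carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(6)]
        signal = 2.0 * x[0] - 1.5 * x[1] + rng.gauss(0, 0.5)
        X.append(x)
        y.append(1 if signal > 0 else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    w, b = fit_lasso_logistic(X, y)
    print("weights:", [round(v, 3) for v in w])
    print("training F1:", round(f1_score(y, predict(X, w, b)), 3))
```

In this toy setting the L1 penalty should drive the weights of the uninformative features toward zero, which mirrors the idea of selecting a sparse feature set from many heterogeneous predictors. A real evaluation would score held-out days rather than the training data.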





This work has been partially supported by the following grants: DTRA Grant HDTRA1-11-1-0016, DTRA CNIMS Contract HDTRA1-11-D-0016-0010, NSF ICES CCF-1216000, NSF NETSE Grant CNS-1011769, and NIH 1R01GM109718. It was also supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) Contract No. D12PC000337. The US Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.



Copyright information

© Springer-Verlag Wien 2016

Authors and Affiliations

  1. Biocomplexity Institute of Virginia Tech, Arlington, USA
  2. Biocomplexity Institute of Virginia Tech, Blacksburg, USA
  3. Discovery Analytics Center, Virginia Tech, Arlington, USA
