Predicting Collective Action from Micro-Blog Data

  • Christos Charitonidis
  • Awais Rashid
  • Paul J. Taylor
Part of the Lecture Notes in Social Networks book series (LNSN)


Global and national events in recent years have shown that social media, and particularly micro-blogging services such as Twitter, can be a force for good (e.g., Arab Spring) and harm (e.g., London riots). In both of these examples, social media played a key role in group formation and organisation, and in the coordination of the group’s subsequent collective actions (i.e., the move from rhetoric to action). Surprisingly, despite its clear importance, little is understood about the factors that lead to this kind of group development and the transition to collective action. This paper focuses on an approach to the analysis of data from social media to detect weak signals, i.e., indicators that initially appear at the fringes, but are, in fact, early indicators of such large-scale real-world phenomena. Our approach is in contrast to existing research which focuses on analysing major themes, i.e., the strong signals, prevalent in a social network at a particular point in time. Analysis of weak signals can provide interesting possibilities for forecasting, with online user-generated content being used to identify and anticipate possible offline future events. We demonstrate our approach through analysis of tweets collected during the London riots in 2011 and use of our weak signals to predict tipping points in that context.


Social media, Micro-blogs, Twitter, Weak signals, Forecasting, Content analysis, London riots, Civil unrest, Event detection, Machine learning, Predictive modelling, Crisis informatics



We would like to express our thanks to Dr. Paul Rayson for providing us access to Wmatrix’s web interface and API, and Prof. Mike Thelwall for providing us with the SentiStrength Java version.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Christos Charitonidis (1)
  • Awais Rashid (1)
  • Paul J. Taylor (2)
  1. Security Lancaster Research Centre, Infolab21, Lancaster University, Lancaster, UK
  2. Department of Psychology, Centre for Research and Evidence on Security Threats (CREST), Lancaster University, Lancaster, UK
