A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11438)


We propose a coherent set of tasks for protest information collection in the context of generalizable natural language processing. The tasks are news article classification, event sentence detection, and event extraction. Tools that collect event information from data produced in multiple countries enable comparative studies in sociology and politics. We have annotated English-language news articles from a source country and a target country so that we can measure how tools developed on data from one country perform on data from another. Our preliminary experiments show that the performance of tools developed on English texts from India drops to an unusable level when they are applied to English texts from China. We believe that this setting addresses the challenge of building generalizable NLP tools that perform well regardless of the source of the text, and that it will accelerate progress toward generalizable NLP systems.
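The cross-country evaluation setting described above (develop a tool on source-country data, then measure it on both source-country and target-country data) can be sketched for the news article classification task as follows. This is a minimal illustration under our own assumptions, not the tools, data, or metric used in this work: the multinomial Naive Bayes baseline, the function names (`f1_score`, `evaluate`, `cross_country_eval`), F1 on the protest class as the metric, and the toy sentences are all hypothetical stand-ins.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical data format: (document text, 1 = protest article, 0 = other).
Dataset = List[Tuple[str, int]]


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in baseline)."""

    def fit(self, data: Dataset) -> "NaiveBayesClassifier":
        self.doc_counts = Counter(label for _, label in data)
        self.token_counts = {0: Counter(), 1: Counter()}
        for text, label in data:
            self.token_counts[label].update(tokenize(text))
        self.vocab = set(self.token_counts[0]) | set(self.token_counts[1])
        return self

    def predict(self, text: str) -> int:
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = 0, -math.inf
        for label in (0, 1):
            total = sum(self.token_counts[label].values())
            # Smoothed log prior plus smoothed log likelihood of each token;
            # the extra +1 in the denominator reserves mass for unseen tokens.
            score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
            for tok in tokenize(text):
                count = self.token_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f1_score(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Binary F1 for the positive (protest) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(model: NaiveBayesClassifier, data: Dataset) -> float:
    gold = [label for _, label in data]
    pred = [model.predict(text) for text, _ in data]
    return f1_score(gold, pred)


def cross_country_eval(
    source_train: Dataset, source_test: Dataset, target_test: Dataset
) -> Dict[str, float]:
    """Train on source-country data only; score on source and target test sets."""
    model = NaiveBayesClassifier().fit(source_train)
    return {
        "source_f1": evaluate(model, source_test),
        "target_f1": evaluate(model, target_test),
    }


if __name__ == "__main__":
    # Hypothetical toy documents, for illustration only (not the annotated corpus).
    source_train = [
        ("workers held a strike demanding higher wages", 1),
        ("farmers marched to protest the new law", 1),
        ("the team won the cricket final", 0),
        ("the ministry released quarterly trade figures", 0),
    ]
    source_test = [("teachers held a strike over pay", 1), ("the cricket season opened", 0)]
    target_test = [
        ("villagers staged a sit-in at the township office", 1),
        ("the ministry announced new trade rules", 0),
    ]
    print(cross_country_eval(source_train, source_test, target_test))
```

Comparing `source_f1` with `target_f1` is the point of the setting: a gap between the two indicates that a tool built on one country's texts does not transfer to another's. The toy numbers printed here say nothing about the actual experiments.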


Natural language processing, Information retrieval, Machine learning, Text classification, Information extraction, Event extraction, Domain adaptation, Transfer learning, Computational social science, Contentious politics, Protest information



This work is funded by the European Research Council (ERC) Starting Grant 714868 awarded to Dr. Erdem Yörük for his project Emerging Welfare. (, accessed January 19) We are grateful to the members of our steering committee for the CLEF 2019 lab: Sophia Ananiadou, Antal van den Bosch, Kemal Oflazer, Arzucan Özgür, Aline Villavicencio, and Hristo Tanev. Finally, we thank Theresa Gessler and Peter Makarov for their contributions to organizing the CLEF lab, by reviewing the annotation manuals and by sharing their work with us, respectively.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Koç University, İstanbul, Turkey
