A Joint Human/Machine Process for Coding Events and Conflict Drivers

  • Bradford Heap
  • Alfred Krzywicki
  • Susanne Schmeidl
  • Wayne Wobcke
  • Michael Bain
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10604)

Abstract

Constructing datasets to analyse the progression of conflicts has been a longstanding objective of peace and conflict studies research. In essence, the problem is to reliably extract relevant text snippets and code (annotate) them using an ontology that is meaningful to social scientists. Such an ontology usually characterizes types of violent events (killing, bombing, etc.) and/or the underlying drivers of conflict, themselves hierarchically structured, for example security, governance and economics, subdivided into conflict-specific indicators. Numerous coding approaches have been proposed in the social science literature, ranging from fully automated "machine" coding to human coding. Machine coding is highly error-prone, especially for labelling complex drivers, and suffers from the extraction of duplicated events; human coding is expensive and suffers from inconsistency between annotators; hybrid approaches are therefore required. In this paper, we analyse experimentally how human input can most effectively be used in a hybrid system to complement machine coding. Using two newly created real-world datasets, we show that machine learning methods improve on rule-based automated coding for filtering large volumes of input, while human verification of relevant/irrelevant text leads to improved performance of machine learning for predicting multiple labels in the ontology.
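The two-stage process described in the abstract lends itself to a simple pipeline: a binary classifier filters relevant from irrelevant text snippets (with humans verifying its output), and a multi-label classifier then assigns ontology indicators to the snippets that survive. The sketch below, in Python with scikit-learn, is only an illustration of that architecture and not the authors' implementation; the toy training snippets and the choice of TF-IDF features with multinomial Naive Bayes are assumptions made for the example, and only the security/governance/economics labels are taken from the abstract.

```python
# Minimal sketch of a hybrid coding pipeline: relevance filtering followed by
# multi-label coding against a conflict-driver ontology. Illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Stage 1: relevance filter trained on human-verified relevant/irrelevant snippets.
relevance_filter = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
relevance_filter.fit(
    ["clashes killed twelve civilians", "the market reopened after the holiday"],
    ["relevant", "irrelevant"],  # hypothetical human labels
)

# Stage 2: multi-label classifier over ontology indicators, trained only on
# snippets that humans confirmed as relevant (toy examples below).
mlb = MultiLabelBinarizer()
y = mlb.fit_transform([{"security"}, {"governance", "economics"}])
multilabel = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(MultinomialNB()),
)
multilabel.fit(
    ["armed group attacked a checkpoint", "officials accused of diverting aid funds"],
    y,
)

# At run time: filter first, then code the surviving snippets.
snippet = "a roadside bomb struck a patrol near the border"
label = relevance_filter.predict([snippet])[0]
print("relevance:", label)
if label == "relevant":
    print("drivers:", mlb.inverse_transform(multilabel.predict([snippet])))
```

In a real system the two stages would be trained on the human-verified corpora the paper describes, and the filter's errors would be corrected by annotators before the multi-label stage is retrained.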

Acknowledgements

This work was supported by the Data to Decisions Cooperative Research Centre. We are grateful to Josie Gardner for labelling the ICG DRC dataset, and to Michael Burnside and Kaitlyn Hedditch for coding the AfPak event data.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Bradford Heap (1)
  • Alfred Krzywicki (1)
  • Susanne Schmeidl (2)
  • Wayne Wobcke (1)
  • Michael Bain (1)

  1. School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
  2. School of Social Sciences, University of New South Wales, Sydney, Australia
