Knowledge Extraction from a Small Corpus of Unstructured Safeguarding Reports

  • Aleksandra Edwards (email author)
  • Alun Preece
  • Hélène de Ribaupierre
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11762)


This paper presents results on the performance of a range of analysis tools for extracting entities and sentiments from a small corpus of unstructured safeguarding reports. We use entity extraction to identify key entities, and sentiment analysis to identify strongly positive and strongly negative segments, in an attempt to attribute patterns in the extracted sentiments to specific entities. We evaluate tool performance against non-specialist human annotators. An initial study comparing inter-human agreement with inter-machine agreement shows higher overall scores for the human annotators than for the software tools. However, the degree of consensus between the human annotators for entity extraction is lower than expected, which suggests a need for trained annotators. For sentiment analysis, the annotators reached higher agreement on descriptive sentences than on reflective sentences, while inter-tool agreement was similarly low for both sentence types. The poor performance of the entity extraction and sentiment analysis approaches points to the need for domain-specific approaches to knowledge extraction on this kind of document. However, there is currently a lack of pre-existing ontologies in the safeguarding domain. Our future work will therefore focus on developing such a domain-specific ontology.
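
To make the agreement comparison concrete, here is a minimal Python sketch of one common way to score agreement between raters, whether human annotators or tools. The abstract does not name the agreement measure used, so Cohen's kappa is an assumption here, and the function names and example labels are hypothetical illustrations rather than the authors' method or data.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences (assumed metric)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(raters):
    """Average kappa over every pair of raters (dict: rater name -> labels)."""
    pairs = list(combinations(sorted(raters), 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(cohen_kappa(raters[a], raters[b]) for a, b in pairs) / len(pairs)


# Hypothetical sentence-level sentiment labels, for illustration only.
example_raters = {
    "rater_1": ["pos", "neg", "pos", "neg"],
    "rater_2": ["pos", "neg", "neg", "neg"],
    "rater_3": ["pos", "neg", "pos", "neg"],
}
example_score = mean_pairwise_kappa(example_raters)
```

Running the same function once over the human annotators' labels and once over the tools' outputs would give two scores that can be compared directly, for example separately for descriptive and reflective sentences.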


Keywords: Text mining, Sentiment analysis, Entity extraction



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Aleksandra Edwards (1, 2), email author
  • Alun Preece (1, 2)
  • Hélène de Ribaupierre (1)
  1. School of Computer Science and Informatics, Cardiff University, Cardiff, UK
  2. Crime and Security Research Institute, Cardiff University, Cardiff, UK
