Automatic Construction of a Semantic Knowledge Base from CEUR Workshop Proceedings

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 548)


We present an automatic workflow that performs text segmentation and entity extraction on scientific literature, primarily to address Task 2 of the Semantic Publishing Challenge 2015. The goal of Task 2 is to extract various kinds of information from full-text papers that represent the context in which a document was written, such as the affiliations of its authors and the corresponding funding bodies. Our proposed solution is composed of two subsystems: (i) a text mining pipeline, developed on the GATE framework, which extracts structural and semantic entities, such as authors’ information and references, and produces semantic (typed) annotations; and (ii) a flexible exporting module, the LODeXporter, which translates the document annotations into RDF triples according to custom mapping rules. Additionally, we leverage existing Named Entity Recognition (NER) tools to extract named entities from the text and ground them to their corresponding resources on the Linked Open Data cloud, thus briefly covering the objectives of Task 3, which involves linking detected entities to resources in existing open datasets. The output of our system is an RDF graph stored in a scalable TDB-based store with a public SPARQL endpoint for the task’s queries.
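The abstract says only that the LODeXporter translates typed annotations into RDF triples according to custom mapping rules; it does not describe the rule format. The following is a minimal Python sketch of that general idea under stated assumptions: the `Annotation` shape, the `MAPPING_RULES` dictionary, the `export_triples` function, the example URIs, and the use of FOAF terms are all hypothetical and are not the LODeXporter's actual interface or vocabulary. It emits N-Triples lines and skips annotation types that have no rule.

```python
from dataclasses import dataclass, field

# Hypothetical annotation shape: a type plus a bag of features, loosely
# resembling the typed annotations a text mining pipeline produces.
@dataclass
class Annotation:
    ann_id: int
    ann_type: str                      # e.g. "Author", "Affiliation"
    features: dict = field(default_factory=dict)

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping rules: annotation type -> RDF class,
# annotation feature -> RDF property. Vocabulary choice is an assumption.
MAPPING_RULES = {
    "Author": {
        "class": "http://xmlns.com/foaf/0.1/Person",
        "features": {"name": "http://xmlns.com/foaf/0.1/name"},
    },
    "Affiliation": {
        "class": "http://xmlns.com/foaf/0.1/Organization",
        "features": {"name": "http://xmlns.com/foaf/0.1/name"},
    },
}

def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def export_triples(doc_uri, annotations, rules):
    """Translate annotations into N-Triples lines using the given rules."""
    lines = []
    for ann in annotations:
        rule = rules.get(ann.ann_type)
        if rule is None:
            continue  # no mapping rule for this type: it is not exported
        # Mint a subject URI from the document URI, type and annotation id.
        subject = f"<{doc_uri}#{ann.ann_type}_{ann.ann_id}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{rule['class']}> .")
        # Export only the features that the rule maps to a property.
        for feat, prop in rule["features"].items():
            if feat in ann.features:
                value = escape_literal(str(ann.features[feat]))
                lines.append(f'{subject} <{prop}> "{value}" .')
    return lines

if __name__ == "__main__":
    doc = "http://example.org/paper/1"  # hypothetical document URI
    anns = [
        Annotation(1, "Author", {"name": "Bahar Sateli"}),
        Annotation(2, "Affiliation", {"name": "Concordia University"}),
        Annotation(3, "Sentence"),  # no rule, so it is skipped
    ]
    for triple in export_triples(doc, anns, MAPPING_RULES):
        print(triple)
```

Keeping the rules as data rather than code reflects the abstract's point that the mapping is customisable: in this sketch, supporting a new annotation type means adding a rule entry, not changing the exporter.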


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Semantic Software Lab, Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada
