Semantify CEUR-WS Proceedings: Towards the Automatic Generation of Highly Descriptive Scholarly Publishing Linked Datasets

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 475)


Rich and fine-grained semantic information describing varied aspects of scientific productions is essential both to support their dissemination and to properly assess the quality of their output. To this end, in the context of the ESWC2014 Semantic Publishing Challenge, we present a system that automatically generates rich RDF datasets from CEUR-WS workshop proceedings. Proceedings are analyzed through a sequence of processing phases. SVM classifiers complemented by heuristics are used to add the annotations missing from the CEUR-WS markup. Annotations are then linked to external datasets such as DBpedia and Bibsonomy. Finally, the data is modeled and published as an RDF graph. Our system is provided as an online Web service to support on-the-fly RDF generation. In this paper we describe the system and present its evaluation following the procedure set by the organizers of the challenge.
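
As a rough illustration of the final phase mentioned above (modeling extracted data as an RDF graph), the following minimal sketch serializes one paper's metadata as N-Triples. It is not the system described in this paper: the `http://example.org/ceur-ws/` namespace, the function names, and the sample record are all hypothetical, and the choice of Dublin Core terms (`title`, `creator`, `isPartOf`) is an assumption made for the example only.

```python
from typing import Iterable, List, Tuple

# Dublin Core terms namespace (real vocabulary); its use here is an assumption.
DCT = "http://purl.org/dc/terms/"
# Hypothetical base namespace for proceedings and papers, for illustration only.
BASE = "http://example.org/ceur-ws/"

Triple = Tuple[str, str, str]


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def paper_to_triples(volume: str, paper_id: str, title: str,
                     authors: Iterable[str]) -> List[Triple]:
    """Build triples describing one paper and its containing volume."""
    subject = f"<{BASE}{volume}/{paper_id}>"
    triples: List[Triple] = [
        (subject, f"<{DCT}title>", f'"{escape_literal(title)}"'),
        (subject, f"<{DCT}isPartOf>", f"<{BASE}{volume}>"),
    ]
    for name in authors:
        # Authors are kept as plain literals; linking them to external
        # datasets would be a separate step not sketched here.
        triples.append((subject, f"<{DCT}creator>", f'"{escape_literal(name)}"'))
    return triples


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Made-up sample record, not taken from any real proceedings volume.
    sample = paper_to_triples("Vol-0000", "paper1", 'A "quoted" title',
                              ["Ada Example", "Bob Example"])
    print(to_ntriples(sample))
```

Keeping the triples as plain tuples until serialization makes each processing phase easy to test in isolation. A real pipeline would more likely rely on an RDF library and an established bibliographic ontology.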


Keywords: Semantic Web, Information extraction, Scholarly publishing, Open Linked Data



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. TALN Research Group, Universitat Pompeu Fabra, Barcelona, Spain
