On the Automated Generation of Scholarly Publishing Linked Datasets: The Case of CEUR-WS Proceedings

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 548)


The availability of highly informative semantic descriptions of scholarly publishing content enables easier sharing and reuse of research findings, as well as better assessment of the quality of scientific output. In the context of the ESWC2015 Semantic Publishing Challenge, we present a system that automatically generates rich RDF datasets from CEUR-WS workshop proceedings and exposes them as Linked Data. The web pages of the proceedings and the textual content of the papers are analyzed by dedicated text processing pipelines. Semantic annotations are added by a set of SVM classifiers and refined by heuristics, gazetteers and rule-based grammars. Web services are used to link annotations to external datasets such as DBpedia, CrossRef, FundRef and Bibsonomy. Finally, the data is modelled and published as an RDF graph.
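
The final modelling step can be illustrated with a minimal sketch. It is not the system described above, only an assumption-laden example of how metadata extracted from one workshop paper might be serialized as RDF in N-Triples syntax using the Python standard library. The base URI `http://example.org/ceur-ws/`, the function `paper_to_ntriples`, and the choice of the FOAF and Dublin Core Terms vocabularies are all hypothetical and are not taken from the paper.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
BASE = "http://example.org/ceur-ws/"  # hypothetical base URI, not the real dataset


def iri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Serialize a plain string literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def paper_to_ntriples(volume, paper_id, title, authors):
    """Return N-Triples lines describing one paper and its authors.

    The URI scheme (volume/paper, person/slug) is an assumption made
    for illustration only.
    """
    paper = f"{BASE}{quote(volume)}/{quote(paper_id)}"
    lines = [
        f"{iri(paper)} {iri(RDF_TYPE)} {iri(FOAF + 'Document')} .",
        f"{iri(paper)} {iri(DCTERMS + 'title')} {literal(title)} .",
    ]
    for name in authors:
        slug = quote(name.lower().replace(" ", "-"))
        person = f"{BASE}person/{slug}"
        lines.append(f"{iri(paper)} {iri(DCTERMS + 'creator')} {iri(person)} .")
        lines.append(f"{iri(person)} {iri(RDF_TYPE)} {iri(FOAF + 'Person')} .")
        lines.append(f"{iri(person)} {iri(FOAF + 'name')} {literal(name)} .")
    return lines


if __name__ == "__main__":
    # Made-up input values, used only to show the output shape.
    for line in paper_to_ntriples("Vol-1", "paper1", "An example title", ["Ada Lovelace"]):
        print(line)
```

Each paper yields two triples plus three per author. In a pipeline like the one outlined in the abstract, the input values would presumably come from the annotation steps rather than being hard-coded.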


Keywords: Semantic Web, Information extraction, Scholarly publishing, Open Linked Data



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. TALN Research Group, Universitat Pompeu Fabra, Barcelona, Spain
