On the Automated Generation of Scholarly Publishing Linked Datasets: The Case of CEUR-WS Proceedings
- Cite this paper as:
- Ronzano F., Fisas B., del Bosque G.C., Saggion H. (2015) On the Automated Generation of Scholarly Publishing Linked Datasets: The Case of CEUR-WS Proceedings. In: Gandon F., Cabrio E., Stankovic M., Zimmermann A. (eds) Semantic Web Evaluation Challenges. Communications in Computer and Information Science, vol 548. Springer, Cham
The availability of highly informative semantic descriptions of scholarly publishing content enables easier sharing and reuse of research findings, as well as better assessment of the quality of scientific output. In the context of the ESWC2015 Semantic Publishing Challenge, we present a system that automatically generates rich RDF datasets from CEUR-WS workshop proceedings and exposes them as Linked Data. The Web pages of the proceedings and the textual content of the papers are analyzed by dedicated text processing pipelines. Semantic annotations are added by a set of SVM classifiers and refined by heuristics, gazetteers and rule-based grammars. Web services are used to link annotations to external datasets such as DBpedia, CrossRef, FundRef and Bibsonomy. Finally, the data is modelled and published as an RDF graph.
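As a rough illustration of the final step (modelling extracted metadata as an RDF graph), the sketch below serializes one paper's annotations as N-Triples using only the Python standard library. It is not the system described in the paper: the base IRI `http://example.org/ceur-ws/`, the function `paper_triples`, the naive author slugs and the choice of Dublin Core Terms and FOAF as vocabularies are all assumptions made for the example.

```python
# Minimal sketch: turn extracted paper metadata into N-Triples lines.
# BASE and the vocabulary choices are illustrative assumptions, not the
# data model used by the system described in the abstract.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
BASE = "http://example.org/ceur-ws/"  # hypothetical namespace


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text):
    """Serialize a plain string literal, escaping backslash, quote, newline."""
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def triple(subject, predicate, obj):
    """Format one N-Triples statement."""
    return subject + " " + predicate + " " + obj + " ."


def paper_triples(volume, paper_id, title, authors):
    """Describe one paper, its volume, and its authors as N-Triples lines."""
    paper = iri(BASE + volume + "/" + paper_id)
    lines = [
        triple(paper, iri(RDF_TYPE), iri(FOAF + "Document")),
        triple(paper, iri(DCT + "title"), literal(title)),
        triple(paper, iri(DCT + "isPartOf"), iri(BASE + volume)),
    ]
    for name in authors:
        # Naive slug for the person IRI; a real system would disambiguate.
        person = iri(BASE + "person/" + name.lower().replace(" ", "-"))
        lines.append(triple(paper, iri(DCT + "creator"), person))
        lines.append(triple(person, iri(RDF_TYPE), iri(FOAF + "Person")))
        lines.append(triple(person, iri(FOAF + "name"), literal(name)))
    return lines


if __name__ == "__main__":
    # Placeholder values, not taken from any real CEUR-WS volume.
    for line in paper_triples(
        "Vol-0000", "paper1", "A Sample Paper", ["Ada Example"]
    ):
        print(line)
```

Links to external datasets such as DBpedia could be added in the same way, for instance as `owl:sameAs` statements, once an entity linking service has returned a matching IRI.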