On the Automated Generation of Scholarly Publishing Linked Datasets: The Case of CEUR-WS Proceedings

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 548)

Abstract

The availability of highly informative semantic descriptions of scholarly publishing contents enables easier sharing and reuse of research findings as well as a better assessment of the quality of scientific output. In the context of the ESWC2015 Semantic Publishing Challenge, we present a system that automatically generates rich RDF datasets from CEUR-WS workshop proceedings and exposes them as Linked Data. Web pages of proceedings and the textual contents of papers are analyzed through dedicated text processing pipelines. Semantic annotations are added by a set of SVM classifiers and refined by heuristics, gazetteers and rule-based grammars. Web services are exploited to link annotations to external datasets like DBpedia, CrossRef, FundRef and Bibsonomy. Finally, the data is modelled and published as an RDF graph.
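As a rough illustration of the final step mentioned in the abstract (modelling extracted metadata as an RDF graph), the following minimal Python sketch serializes a paper's title and authors as N-Triples. It is not the authors' implementation: the `http://example.org/ceur-ws/` namespace, the helper names `paper_triples` and `to_ntriples`, and the choice of Dublin Core Terms and FOAF properties are all assumptions made here for illustration, and the actual vocabulary used by the system may differ.

```python
# Illustrative sketch only: namespace, helper names and vocabulary choice
# are assumptions, not the data model described in the paper.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
BASE = "http://example.org/ceur-ws/"  # hypothetical namespace


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Render a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def paper_triples(volume, paper_id, title, authors):
    """Build (subject, predicate, object) triples for one paper of a volume."""
    paper = iri(f"{BASE}{volume}/{paper_id}")
    triples = [
        (paper, iri(RDF_TYPE), iri(FOAF + "Document")),
        (paper, iri(DCT + "title"), literal(title)),
        (paper, iri(DCT + "isPartOf"), iri(f"{BASE}{volume}")),
    ]
    for index, name in enumerate(authors, start=1):
        author = iri(f"{BASE}{volume}/{paper_id}/author{index}")
        triples.append((paper, iri(DCT + "creator"), author))
        triples.append((author, iri(FOAF + "name"), literal(name)))
    return triples


def to_ntriples(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Placeholder values, not taken from any real proceedings volume.
    print(to_ntriples(paper_triples("Vol-0000", "paper-01",
                                    "An Example Paper", ["Ada Example"])))
```

In a full pipeline, the title and author strings would come from the annotation stage rather than being passed in by hand, and links to external datasets could be added as further triples.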

The work described in this paper has been funded by the European Project Dr. Inventor (FP7-ICT-2013.8.1 - Grant no: 611383).

Notes

  1. http://dblp.l3s.de/d2r/

  2. http://acm.rkbexplorer.com/

  3. http://ieee.rkbexplorer.com/

  4. For a detailed description of how workshop related data are modeled as an RDF graph, refer to Sect. 4.

  5. A semantic markup approach that conveys metadata and other attributes in Web pages by existing HTML/XHTML tags.

  6. A semantic markup useful to embed RDF triples within XHTML documents.

  7. http://www.bibsonomy.org/

  8. http://dblp.uni-trier.de/

  9. http://www.wikicfp.com/cfp/

  10. http://crossref.org/

  11. http://www.crossref.org/fundref/

  12. http://dbpedia.org/

  13. http://spotlight.dbpedia.org/

  14. https://gate.ac.uk/

  15. http://gate.ac.uk/sale/tao/splitch6.html

  16. https://open-data.europa.eu/en/data

  17. http://pdfx.cs.man.ac.uk/

  18. http://poppler.freedesktop.org/

  19. http://gate.ac.uk/sale/tao/splitch6.html

  20. http://search.crossref.org/help/api

  21. http://www.bibsonomy.org/help/doc/api.html

  22. http://freecite.library.brown.edu/

  23. https://gate.ac.uk/sale/tao/splitch8.html

Author information

Corresponding author

Correspondence to Francesco Ronzano.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ronzano, F., Fisas, B., del Bosque, G.C., Saggion, H. (2015). On the Automated Generation of Scholarly Publishing Linked Datasets: The Case of CEUR-WS Proceedings. In: Gandon, F., Cabrio, E., Stankovic, M., Zimmermann, A. (eds) Semantic Web Evaluation Challenges. SemWebEval 2015. Communications in Computer and Information Science, vol 548. Springer, Cham. https://doi.org/10.1007/978-3-319-25518-7_15

  • DOI: https://doi.org/10.1007/978-3-319-25518-7_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25517-0

  • Online ISBN: 978-3-319-25518-7

  • eBook Packages: Computer Science, Computer Science (R0)
