Analysing and Discovering Semantic Relations in Scholarly Data

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 733)


Scholarly publishing has seen an ever-increasing interest in Linked Open Data (LOD). However, most existing datasets are designed as flat translations of legacy data sources into RDF. Although that translation is a crucial step, a lot of useful information is still not expressed in RDF, and humans are still required to infer relevant knowledge by reading and making sense of texts. Examples include the reasons why authors cite other papers, the rhetorical structure of scientific discourse, bibliometric measures, and provenance information. In this paper we introduce the Semantic Lancet Project, whose goal is to make available a LOD dataset that includes the formalisation of some of the useful knowledge hidden within the textual content of papers. We have developed a toolchain for reengineering and enhancing data extracted from some publishers' legacy repositories. Finally, we show how these data are immediately useful in helping humans address relevant tasks, such as data browsing, expert finding, related-work finding, and the identification of data inconsistencies.



This paper was supported by the MIUR PRIN 2016 GAUSS Project. We would like to thank Elsevier for granting access to the Scopus and ScienceDirect APIs.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. DASPLab, Department of Computer Science and Engineering, University of Bologna, Bologna, Italy
  2. STLab, Institute of Cognitive Science and Technologies, National Research Council, Rome, Italy
