Covid-on-the-Web: Knowledge Graph and Services to Advance COVID-19 Research

Michel, Franck; Gandon, Fabien; Ah-Kane, Valentin; Bobasheva, Anna; Cabrio, Elena; Corby, Olivier; Gazzotti, Raphaël; Giboin, Alain; Marro, Santiago; Mayer, Tobias; Simon, Mathieu; Villata, Serena; Winckler, Marco

doi:10.1007/978-3-030-62466-8_19


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12507)

Included in the following conference series:

International Semantic Web Conference


Abstract

Scientists are harnessing their multidisciplinary expertise and resources to fight the COVID-19 pandemic. In line with this mindset, the Covid-on-the-Web project aims to allow biomedical researchers to access, query and make sense of the COVID-19-related literature. To do so, it adapts, combines and extends tools to process, analyze and enrich the “COVID-19 Open Research Dataset” (CORD-19), which gathers 50,000+ full-text scientific articles related to coronaviruses. We report on the RDF dataset and software resources produced in this project by leveraging skills in knowledge representation, text, data and argument mining, as well as data visualization and exploration. The dataset comprises two main knowledge graphs describing (1) named entities mentioned in the CORD-19 corpus and linked to DBpedia, Wikidata and BioPortal vocabularies, and (2) arguments extracted using ACTA, a tool that automates the extraction and visualization of argumentative graphs and is meant to help clinicians analyze clinical trials and make decisions. On top of this dataset, we provide several visualization and exploration tools based on the Corese Semantic Web platform, the MGExplorer visualization library and Jupyter notebooks. Throughout this initiative, we have engaged in discussions with healthcare and medical research institutes to align our approach with the actual needs of the biomedical community, and we have paid particular attention to complying with open and reproducible science goals and the FAIR principles.
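The first knowledge graph can be pictured as a set of annotations, each linking a span of text in an article to an external resource such as a DBpedia or Wikidata entity. As a minimal sketch, assuming the W3C Web Annotation vocabulary (one of the vocabularies listed in the notes), the code below serializes one such mention as Turtle. The `Mention` structure, the `annotation_to_turtle` and `turtle_literal` helpers and the `example.org` URIs are hypothetical; the actual Covid-on-the-Web data model, documented in the project repository, may differ in its details.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """One named-entity mention in an article part (hypothetical structure)."""
    annotation_uri: str  # URI minted for the annotation itself
    source_uri: str      # URI of the annotated text, e.g. an article's abstract
    exact: str           # surface form of the mention
    start: int           # offset of the first character of the mention
    end: int             # offset just past the last character of the mention
    entity_uri: str      # linked resource, e.g. a DBpedia or Wikidata URI


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def annotation_to_turtle(m: Mention) -> str:
    """Serialize one mention as a Web Annotation (oa:) resource in Turtle.

    The body is the linked entity; the target is the source text, narrowed
    down by a selector that records both the quoted text and its offsets.
    """
    return "\n".join([
        "@prefix oa: <http://www.w3.org/ns/oa#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{m.annotation_uri}> a oa:Annotation ;",
        f"    oa:hasBody <{m.entity_uri}> ;",
        "    oa:hasTarget [",
        f"        oa:hasSource <{m.source_uri}> ;",
        "        oa:hasSelector [",
        "            a oa:TextPositionSelector, oa:TextQuoteSelector ;",
        f"            oa:exact {turtle_literal(m.exact)} ;",
        f'            oa:start "{m.start}"^^xsd:nonNegativeInteger ;',
        f'            oa:end "{m.end}"^^xsd:nonNegativeInteger',
        "        ]",
        "    ] .",
    ])


# Usage with hypothetical URIs and offsets.
example = Mention(
    annotation_uri="http://example.org/annotation/1",
    source_uri="http://example.org/article/1#abstract",
    exact="hydroxychloroquine",
    start=42,
    end=60,
    entity_uri="http://dbpedia.org/resource/Hydroxychloroquine",
)
print(annotation_to_turtle(example))
```

Combining a position selector with a quote selector keeps the annotation usable even if offsets shift slightly, since the exact text is stored alongside them.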


Notes

1. https://team.inria.fr/wimmics/.
2. http://ns.inria.fr/acta/.
3. https://github.com/kermitt2/entity-fishing.
4. https://github.com/frmichel/morph-xr2rml/.
5. https://project.inria.fr/corese/.
6. https://github.com/kermitt2/entity-fishing.
7. https://github.com/Wimmics/covidontheweb/dataset.
8. https://www.dublincore.org/specifications/dublin-core/dcmi-terms/.
9. https://sparontologies.github.io/fabio/current/fabio.html.
10. http://bibliontology.com/specification.html.
11. http://xmlns.com/foaf/spec/.
12. https://schema.org/.
13. https://www.w3.org/TR/annotation-vocab/.
14. https://www.w3.org/TR/prov-o/.
15. PICO is a framework for answering health-care questions in evidence-based practice; it comprises patients/population (P), intervention (I), control/comparison (C) and outcome (O).
16. BERT is a self-attentive transformer model that uses language model (LM) pre-training to learn a task-independent understanding of language from vast amounts of text in an unsupervised fashion.
17. The intervention and comparison labels are treated as one joint class.
18. http://purl.org/spar/amo/.
19. http://rdfs.org/sioc/argument#.
20. http://www.arg.dundee.ac.uk/aif#.
21. https://github.com/frmichel/morph-xr2rml/.
22. https://sourceforge.net/projects/dbpedia-spotlight/files/2016-10/en/.
23. http://data.bioontology.org/documentation.
24. Inputs were tokenized with the BERT tokenizer, in which one sub-word token has a length of one to three characters.
25. https://www.w3.org/TR/vocab-dcat/.
26. https://www.w3.org/TR/void/.
27. Covid-on-the-Web dataset URI: http://ns.inria.fr/covid19/covidontheweb-1-1.
28. CORD-19 license: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/.
29. ODC-By license: http://opendatacommons.org/licenses/by/1.0/.
30. Covid Linked Data Visualizer can be tested at: http://covid19.i3s.unice.fr:8080.
31. https://github.com/Wimmics/covidontheweb/tree/master/notebooks.
32. Dataframes are tabular data structures widely used in Python and R for data analysis.
33. https://twitter.com/kidehen/status/1250530568955138048.
34. https://github.com/Wimmics/covidontheweb/blob/master/doc/01-data-modeling.md.
35. https://github.com/fhircat/CORD-19-on-FHIR.
36. https://github.com/Knowledge-Graph-Hub/kg-covid-19/.
37. https://covid19.pubannotation.org/.
38. https://github.com/kingfish777/COVID19.
39. https://github.com/usc-isi-i2/CKG-COVID-19.
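Notes 31 and 32 mention Jupyter notebooks in which query results are handled as dataframes. As a hedged sketch of that step, the function below flattens a SPARQL SELECT response in the standard W3C SPARQL 1.1 Query Results JSON Format into a list of dictionaries, one per solution. The `bindings_to_rows` helper and the sample response are made up for illustration; no endpoint address or query from the project is assumed, and ASK results (which carry a `boolean` instead of bindings) are not handled.

```python
import json
from typing import Dict, List, Optional


def bindings_to_rows(response_text: str) -> List[Dict[str, Optional[str]]]:
    """Flatten a SPARQL 1.1 SELECT result (JSON format) into one dict per solution.

    Each variable listed in head.vars becomes a key; variables left unbound
    in a solution map to None, so all rows share the same keys.
    """
    data = json.loads(response_text)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A bound variable maps to an object such as {"type": "uri", "value": "..."};
        # only the lexical value is kept (type, datatype and language are dropped).
        rows.append({v: binding[v]["value"] if v in binding else None for v in variables})
    return rows


# Hypothetical response: the second solution leaves ?label unbound.
sample_response = """{
  "head": {"vars": ["entity", "label"]},
  "results": {"bindings": [
    {"entity": {"type": "uri", "value": "http://dbpedia.org/resource/Hydroxychloroquine"},
     "label": {"type": "literal", "value": "Hydroxychloroquine", "xml:lang": "en"}},
    {"entity": {"type": "uri", "value": "http://dbpedia.org/resource/Remdesivir"}}
  ]}
}"""

for row in bindings_to_rows(sample_response):
    print(row)
```

Mapping unbound variables to `None` gives every row the same columns, so the list can be passed directly to `pandas.DataFrame(rows)`, where the gaps become missing values.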


Author information

Authors and Affiliations

University Côte d’Azur, Inria, CNRS, I3S (UMR 7271), Biot, France
Franck Michel, Fabien Gandon, Valentin Ah-Kane, Anna Bobasheva, Elena Cabrio, Olivier Corby, Raphaël Gazzotti, Alain Giboin, Santiago Marro, Tobias Mayer, Mathieu Simon, Serena Villata & Marco Winckler


Corresponding author

Correspondence to Franck Michel.

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, UK
Jeff Z. Pan
University of Liverpool, Liverpool, UK
Valentina Tamma
University of Bari, Bari, Italy
Claudia d’Amato
University of California, Santa Barbara, Santa Barbara, CA, USA
Krzysztof Janowicz
California State University, Long Beach, Long Beach, CA, USA
Bo Fu
Vienna University of Economics and Business, Vienna, Austria
Axel Polleres
Rensselaer Polytechnic Institute, Troy, NY, USA
Oshani Seneviratne
Massachusetts Institute of Technology, Cambridge, MA, USA
Lalana Kagal


About this paper

Cite this paper

Michel, F. et al. (2020). Covid-on-the-Web: Knowledge Graph and Services to Advance COVID-19 Research. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12507. Springer, Cham. https://doi.org/10.1007/978-3-030-62466-8_19


DOI: https://doi.org/10.1007/978-3-030-62466-8_19
Published: 01 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62465-1
Online ISBN: 978-3-030-62466-8
eBook Packages: Computer Science, Computer Science (R0)
