Covid-on-the-Web: Knowledge Graph and Services to Advance COVID-19 Research

  • Conference paper
The Semantic Web – ISWC 2020 (ISWC 2020)

Abstract

Scientists are harnessing their multi-disciplinary expertise and resources to fight the COVID-19 pandemic. In this spirit, the Covid-on-the-Web project aims to allow biomedical researchers to access, query and make sense of COVID-19 related literature. To do so, it adapts, combines and extends tools to process, analyze and enrich the “COVID-19 Open Research Dataset” (CORD-19), which gathers 50,000+ full-text scientific articles related to coronaviruses. We report on the RDF dataset and software resources produced in this project by leveraging skills in knowledge representation, text, data and argument mining, as well as data visualization and exploration. The dataset comprises two main knowledge graphs describing (1) named entities mentioned in the CORD-19 corpus and linked to DBpedia, Wikidata and other BioPortal vocabularies, and (2) arguments extracted using ACTA, a tool automating the extraction and visualization of argumentative graphs, meant to help clinicians analyze clinical trials and make decisions. On top of this dataset, we provide several visualization and exploration tools based on the Corese Semantic Web platform, the MGExplorer visualization library, and Jupyter Notebook. Throughout this initiative, we have been engaged in discussions with healthcare and medical research institutes to align our approach with the actual needs of the biomedical community, and we have paid particular attention to complying with the goals of open and reproducible science and with the FAIR principles.
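To make the first knowledge graph more concrete, the following is a minimal sketch of how named-entity annotations might be stored as triples and looked up. It is not the project's actual data model or tooling: the `http://example.org/` namespace, the annotation, entity and article URIs, and the helper `articles_mentioning` are all hypothetical, and the entity URIs merely stand in for links to DBpedia or Wikidata. Only the `oa:hasBody` and `oa:hasTarget` property names come from the W3C Web Annotation vocabulary.

```python
from collections import defaultdict

OA = "http://www.w3.org/ns/oa#"   # W3C Web Annotation vocabulary namespace
EX = "http://example.org/"        # hypothetical namespace, for illustration only

# Toy (subject, predicate, object) triples: each annotation links a
# named entity (its body) to the article in which it is mentioned (its target).
TRIPLES = [
    (EX + "annot1", OA + "hasBody", EX + "entity/coronavirus"),
    (EX + "annot1", OA + "hasTarget", EX + "article/A"),
    (EX + "annot2", OA + "hasBody", EX + "entity/coronavirus"),
    (EX + "annot2", OA + "hasTarget", EX + "article/B"),
    (EX + "annot3", OA + "hasBody", EX + "entity/vaccine"),
    (EX + "annot3", OA + "hasTarget", EX + "article/B"),
]


def articles_mentioning(triples, entity_uri):
    """Return the sorted URIs of articles annotated with the given entity."""
    bodies = defaultdict(set)   # annotation URI -> entity URIs
    targets = {}                # annotation URI -> article URI
    for subject, predicate, obj in triples:
        if predicate == OA + "hasBody":
            bodies[subject].add(obj)
        elif predicate == OA + "hasTarget":
            targets[subject] = obj
    return sorted(
        {
            targets[annotation]
            for annotation, entities in bodies.items()
            if entity_uri in entities and annotation in targets
        }
    )


if __name__ == "__main__":
    for article in articles_mentioning(TRIPLES, EX + "entity/coronavirus"):
        print(article)
```

In practice such a lookup would be expressed as a SPARQL query against the published RDF dataset; the in-memory version above only illustrates the shape of the annotation data.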

Notes

  1. https://team.inria.fr/wimmics/.
  2. http://ns.inria.fr/acta/.
  3. https://github.com/kermitt2/entity-fishing.
  4. https://github.com/frmichel/morph-xr2rml/.
  5. https://project.inria.fr/corese/.
  6. https://github.com/kermitt2/entity-fishing.
  7. https://github.com/Wimmics/covidontheweb/dataset.
  8. https://www.dublincore.org/specifications/dublin-core/dcmi-terms/.
  9. https://sparontologies.github.io/fabio/current/fabio.html.
  10. http://bibliontology.com/specification.html.
  11. http://xmlns.com/foaf/spec/.
  12. https://schema.org/.
  13. https://www.w3.org/TR/annotation-vocab/.
  14. https://www.w3.org/TR/prov-o/.
  15. PICO is a framework for answering health-care questions in evidence-based practice. It comprises patients/population (P), intervention (I), control/comparison (C) and outcome (O).
  16. BERT is a self-attentive transformer model that uses language model (LM) pre-training to learn a task-independent understanding from vast amounts of text in an unsupervised fashion.
  17. The intervention and comparison labels are treated as one joint class.
  18. http://purl.org/spar/amo/.
  19. http://rdfs.org/sioc/argument#.
  20. http://www.arg.dundee.ac.uk/aif#.
  21. https://github.com/frmichel/morph-xr2rml/.
  22. https://sourceforge.net/projects/dbpedia-spotlight/files/2016-10/en/.
  23. http://data.bioontology.org/documentation.
  24. Inputs were tokenized with the BERT tokenizer, where one sub-word token has a length of one to three characters.
  25. https://www.w3.org/TR/vocab-dcat/.
  26. https://www.w3.org/TR/void/.
  27. Covid-on-the-Web dataset URI: http://ns.inria.fr/covid19/covidontheweb-1-1.
  28. CORD-19 license: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/.
  29. ODC-By license: http://opendatacommons.org/licenses/by/1.0/.
  30. Covid Linked Data Visualizer can be tested at: http://covid19.i3s.unice.fr:8080.
  31. https://github.com/Wimmics/covidontheweb/tree/master/notebooks.
  32. Dataframes are tabular data structures widely used in Python and R for data analysis.
  33. https://twitter.com/kidehen/status/1250530568955138048.
  34. https://github.com/Wimmics/covidontheweb/blob/master/doc/01-data-modeling.md.
  35. https://github.com/fhircat/CORD-19-on-FHIR.
  36. https://github.com/Knowledge-Graph-Hub/kg-covid-19/.
  37. https://covid19.pubannotation.org/.
  38. https://github.com/kingfish777/COVID19.
  39. https://github.com/usc-isi-i2/CKG-COVID-19.

Author information

Corresponding author

Correspondence to Franck Michel.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Michel, F. et al. (2020). Covid-on-the-Web: Knowledge Graph and Services to Advance COVID-19 Research. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12507. Springer, Cham. https://doi.org/10.1007/978-3-030-62466-8_19

  • DOI: https://doi.org/10.1007/978-3-030-62466-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62465-1

  • Online ISBN: 978-3-030-62466-8

  • eBook Packages: Computer Science, Computer Science (R0)
