An Extensible Ontology Modeling Approach Using Post Coordinated Expressions for Semantic Provenance in Biomedical Research
Abstract
Provenance metadata, which describes the source or origin of data, is critical for verifying and validating the results of scientific experiments. Reproducibility of scientific studies is gaining significant attention in the research community, particularly in biomedical and healthcare research. To address this challenge in the biomedical research domain, we have developed the Provenance for Clinical and Healthcare Research (ProvCaRe) framework using the World Wide Web Consortium (W3C) PROV specifications, including the PROV Ontology (PROV-O). In the ProvCaRe project, we are extending PROV-O to create a formal model of the provenance information that is necessary for scientific reproducibility and replication in biomedical research. However, developing the ProvCaRe ontology raises several challenges: (1) Ontology engineering: modeling all biomedical provenance-related terms in an ontology has an undefined scope and is not feasible before the release of the ontology; (2) Redundancy: a large number of existing biomedical ontologies already model relevant biomedical terms; and (3) Ontology maintenance: adding or deleting terms in a large ontology is error-prone, which makes the ontology difficult to maintain over time. Therefore, instead of modeling all classes and properties in the ontology before deployment (also called precoordination), we propose the “ProvCaRe Compositional Grammar Syntax” to model ontology classes on demand (also called postcoordination). The compositional grammar syntax allows us to reuse existing biomedical ontology classes and to compose provenance-specific terms that extend PROV-O classes and properties. We demonstrate the application of this approach in the ProvCaRe ontology, and the use of the ontology in developing the ProvCaRe knowledgebase, which consists of more than 38 million provenance triples automatically extracted from 384,802 published research articles using a text processing workflow.
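The difference between precoordination and postcoordination described in the abstract can be illustrated with a minimal sketch. The abstract does not reproduce the ProvCaRe Compositional Grammar Syntax itself, so the `base : attribute = value` notation, the `ex:` prefix, and the names `Refinement`, `PostCoordinatedExpression`, and `compose` below are all hypothetical and chosen only for illustration; `prov:Activity` and `prov:Entity` are real PROV-O classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Refinement:
    """One attribute/value pair that narrows a base class (hypothetical model)."""
    attribute: str  # property linking the base class to a reused class
    value: str      # identifier of a class reused from an existing ontology


@dataclass(frozen=True)
class PostCoordinatedExpression:
    """A term composed on demand instead of being defined in advance."""
    base: str            # PROV-O class being specialized, e.g. "prov:Activity"
    refinements: tuple   # tuple of Refinement objects, possibly empty

    def render(self) -> str:
        # Illustrative notation only: "base : attr = value, attr = value".
        if not self.refinements:
            return self.base
        parts = ", ".join(f"{r.attribute} = {r.value}" for r in self.refinements)
        return f"{self.base} : {parts}"


def compose(base: str, refinements: dict) -> PostCoordinatedExpression:
    """Build an expression from a base class and an attribute -> value mapping."""
    pairs = tuple(Refinement(a, v) for a, v in refinements.items())
    return PostCoordinatedExpression(base, pairs)


if __name__ == "__main__":
    # "ex:" identifiers are placeholders, not terms from any real ontology.
    expr = compose("prov:Activity", {"ex:hasMethod": "ex:Polysomnography"})
    print(expr.render())
```

In a precoordinated design, a class for this specific activity would have to exist in the ontology before deployment. Here the term is assembled at the point of use from a PROV-O class and a class borrowed from another ontology, so the core ontology does not have to grow with every new term.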
Keywords
Precoordinated and postcoordinated expression; Ontology engineering; Provenance metadata; W3C PROV specification; ProvCaRe semantic provenance
Acknowledgement
This work is supported in part by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) Big Data to Knowledge (BD2K) grant (1U01EB020955) and by NSF grant #1636850.