Ontology-Driven Provenance Management in eScience: An Application in Parasite Research

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2009 (OTM 2009)

Abstract

Provenance, from the French word “provenir”, describes the lineage or history of a data entity. Provenance is critical information in scientific applications for verifying experimental processes, validating data quality, and associating trust values with scientific results. Current industrial-scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics so that software applications can analyze large-scale provenance information. Further, effective analysis of provenance information requires well-defined query mechanisms that support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, developed as part of the Semantic Problem Solving Environment (SPSE) for Trypanosoma cruzi (T. cruzi). This provenance infrastructure, called the T. cruzi Provenance Management System (PMS), is underpinned by (a) a domain-specific provenance ontology called the Parasite Experiment ontology, (b) specialized query operators for provenance analysis, and (c) a provenance query engine. The query engine uses a novel optimization technique based on materialized views, called materialized provenance views (MPV), to scale with increasing data size and query complexity. This comprehensive ontology-driven provenance infrastructure not only allows effective tracking and management of ongoing experiments in the Tarleton Research Group at the Center for Tropical and Emerging Global Diseases (CTEGD), but also enables researchers to retrieve the complete provenance information of scientific results for publication in the literature.
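
To make the idea of a provenance query backed by a materialized view more concrete, the following is a minimal Python sketch. It is not the PMS implementation, and it does not use the Parasite Experiment ontology or the paper's query operators: the entity names, the `derived_from` relation, and the functions `build_index`, `provenance`, `materialize`, and `query` are all hypothetical, chosen only for illustration. It assumes provenance is stored as simple triples and that a "view" is nothing more than a precomputed lineage set.

```python
from collections import defaultdict

# Hypothetical provenance triples: (entity, relation, source).
# Entity and relation names are invented for illustration only.
TRIPLES = [
    ("published_result", "derived_from", "processed_data"),
    ("processed_data", "derived_from", "raw_data"),
    ("raw_data", "derived_from", "sample"),
    ("raw_data", "derived_from", "protocol"),
]


def build_index(triples):
    """Map each entity to the entities it was directly derived from."""
    index = defaultdict(set)
    for entity, relation, source in triples:
        if relation == "derived_from":
            index[entity].add(source)
    return index


def provenance(entity, index):
    """Return every direct and indirect ancestor of `entity` (its lineage)."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for source in index.get(current, ()):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


def materialize(entities, index):
    """Precompute lineage for selected entities so repeated queries become lookups."""
    return {entity: frozenset(provenance(entity, index)) for entity in entities}


INDEX = build_index(TRIPLES)
# Assumption: only frequently queried entities are materialized.
VIEW = materialize(["published_result"], INDEX)


def query(entity):
    """Answer from the precomputed view when possible, else traverse the graph."""
    if entity in VIEW:
        return set(VIEW[entity])
    return provenance(entity, INDEX)
```

In this sketch, `query("published_result")` is answered from the precomputed view, while `query("processed_data")` falls back to a graph traversal. The trade-off it illustrates is the general one behind materialized views: extra storage and upkeep in exchange for cheaper repeated queries.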

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sahoo, S.S., Weatherly, D.B., Mutharaju, R., Anantharam, P., Sheth, A., Tarleton, R.L. (2009). Ontology-Driven Provenance Management in eScience: An Application in Parasite Research. In: Meersman, R., Dillon, T., Herrero, P. (eds) On the Move to Meaningful Internet Systems: OTM 2009. OTM 2009. Lecture Notes in Computer Science, vol 5871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05151-7_18

  • DOI: https://doi.org/10.1007/978-3-642-05151-7_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05150-0

  • Online ISBN: 978-3-642-05151-7

  • eBook Packages: Computer Science, Computer Science (R0)
