Exploiting Provenance to Make Sense of Automated Decisions in Scientific Workflows

  • Paolo Missier
  • Suzanne Embury
  • Richard Stapenhurst
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5272)


Scientific workflows may include automated decision steps, for instance to accept/reject certain data products during the course of an in silico experiment, based on an assessment of their quality. The trustworthiness of these workflows can be enhanced by providing the users with a trace and explanation of the outcome of these decisions. In this paper we present a provenance model that is designed specifically to support this task. The model applies to a particular type of sub-workflow that is compiled automatically from a high-level specification of user-defined, quality-based data acceptance criteria. The keys to the effectiveness of the approach are that (i) these sub-workflows follow a predictable pattern structure, (ii) the purpose of their component services is defined using an ontology of Information Quality concepts, and (iii) the conceptual model for provenance is consistent with the ontology structure.
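As a rough illustration of the idea in the abstract, the sketch below shows an automated accept/reject step that records, for every data item, the quality evidence and the condition behind its outcome, so that the decision can later be explained to the user. It is a minimal sketch under stated assumptions, not the provenance model presented in the paper: the names `DecisionRecord`, `AcceptanceStep`, `hit_ratio`, and the item identifiers are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    """One trace entry: what was decided about a data item, and on what basis."""
    item_id: str
    evidence: dict   # quality evidence values the decision was based on
    condition: str   # human-readable form of the acceptance condition
    accepted: bool


@dataclass
class AcceptanceStep:
    """Hypothetical decision step: accept items whose metric meets a threshold."""
    metric: str
    threshold: float
    trace: list = field(default_factory=list)

    def decide(self, item_id: str, evidence: dict) -> bool:
        """Evaluate the condition and append a record of the outcome to the trace."""
        accepted = evidence[self.metric] >= self.threshold
        self.trace.append(DecisionRecord(
            item_id=item_id,
            evidence=dict(evidence),  # copy, so later changes do not alter the trace
            condition=f"{self.metric} >= {self.threshold}",
            accepted=accepted,
        ))
        return accepted

    def explain(self, item_id: str) -> str:
        """Return a plain-text explanation of the first recorded decision for an item."""
        for rec in self.trace:
            if rec.item_id == item_id:
                outcome = "accepted" if rec.accepted else "rejected"
                return (f"{item_id} was {outcome}: condition '{rec.condition}' "
                        f"evaluated on evidence {rec.evidence}")
        raise KeyError(item_id)


# Assumed example data: two items with a made-up quality metric.
step = AcceptanceStep(metric="hit_ratio", threshold=0.5)
step.decide("protein_A", {"hit_ratio": 0.8})
step.decide("protein_B", {"hit_ratio": 0.2})
print(step.explain("protein_B"))
```

Keeping the condition and the evidence together in each record is what allows an explanation to be produced after the fact without re-running the decision step.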


Keywords (added by machine, not by the authors): Semantic Space; Ontology Class; Automate Decision; Metadata Element; Quality View



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Paolo Missier
    • 1
  • Suzanne Embury
    • 1
  • Richard Stapenhurst
    • 1
  1. School of Computer Science, University of Manchester, UK
