Analyzing Provenance Across Heterogeneous Provenance Graphs

  • Wellington Oliveira
  • Paolo Missier
  • Kary Ocaña
  • Daniel de Oliveira
  • Vanessa Braganholo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9672)


Provenance generated by different workflow systems is generally expressed using different formats. This is not an issue when scientists analyze provenance graphs in isolation, or when they use the same workflow system. However, analyzing heterogeneous provenance graphs from multiple systems poses a challenge. To address this problem, we adopt ProvONE as an integration model and show how different provenance databases can be converted to a global ProvONE schema. Scientists can then query this integrated database, exploring and linking provenance across several workflows that may represent different implementations of the same experiment. To illustrate the feasibility of our approach, we developed conceptual mappings between the provenance databases of two workflow systems (e-Science Central and SciCumulus). We provide cartridges that implement these mappings and generate an integrated provenance database expressed as Prolog facts. To demonstrate its use, we have developed Prolog rules that enable scientists to query the integrated database.
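The integration idea can be illustrated with a minimal Python sketch. It is not the authors' cartridge implementation: the two source record layouts, the identifiers, and the function names (`map_system_a`, `map_system_b`, `to_prolog`, `upstream_data`) are hypothetical, and the real e-Science Central and SciCumulus schemas are not reproduced here. The sketch only assumes that both sources can be mapped onto the `used` and `wasGeneratedBy` relations that ProvONE inherits from PROV, and that the two traces share data identifiers so they can be linked.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    predicate: str  # "used" or "wasGeneratedBy"
    subject: str
    obj: str


# Hypothetical source rows; the real systems' schemas differ.
SYSTEM_A_ROWS = [
    {"invocation": "a:run1", "reads": ["data:seqs"], "writes": ["data:aligned"]},
]
SYSTEM_B_ROWS = [
    ("b:task7", "in", "data:aligned"),
    ("b:task7", "out", "data:tree"),
]


def map_system_a(rows) -> list[Fact]:
    """Map dictionary-style records to the shared relations."""
    facts = []
    for row in rows:
        for data in row["reads"]:
            facts.append(Fact("used", row["invocation"], data))
        for data in row["writes"]:
            facts.append(Fact("wasGeneratedBy", data, row["invocation"]))
    return facts


def map_system_b(rows) -> list[Fact]:
    """Map (task, direction, data) tuples to the shared relations."""
    facts = []
    for task, direction, data in rows:
        if direction == "in":
            facts.append(Fact("used", task, data))
        else:
            facts.append(Fact("wasGeneratedBy", data, task))
    return facts


def to_prolog(fact: Fact) -> str:
    """Render one fact in Prolog syntax."""
    return f"{fact.predicate}('{fact.subject}', '{fact.obj}')."


def upstream_data(facts, data_id: str) -> set[str]:
    """Return every data item that data_id transitively depends on."""
    generated_by = {f.subject: f.obj for f in facts if f.predicate == "wasGeneratedBy"}
    used: dict[str, set[str]] = {}
    for f in facts:
        if f.predicate == "used":
            used.setdefault(f.subject, set()).add(f.obj)
    seen: set[str] = set()
    stack = [data_id]
    while stack:
        execution = generated_by.get(stack.pop())
        if execution is None:
            continue  # no recorded producer: a source input
        for source in used.get(execution, ()):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


# The integrated database is the union of both mapped traces.
INTEGRATED = map_system_a(SYSTEM_A_ROWS) + map_system_b(SYSTEM_B_ROWS)
```

Calling `upstream_data(INTEGRATED, "data:tree")` follows the lineage from the second trace back into the first, which is the kind of cross-system question the integrated database is meant to answer; `to_prolog` shows how each fact could be serialized for a Prolog engine.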


Provenance Data; Provenance Model; Prolog Rule; Open Provenance Model; Provenance Graph



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Wellington Oliveira (1, 2)
  • Paolo Missier (3)
  • Kary Ocaña (4)
  • Daniel de Oliveira (1)
  • Vanessa Braganholo (1)
  1. Instituto de Computação, Universidade Federal Fluminense (UFF), Niterói, Brazil
  2. DACC, Instituto Federal do Sudeste de Minas Gerais, Rio Pomba, Brazil
  3. School of Computing Science, Newcastle University, Newcastle upon Tyne, UK
  4. Laboratório Nacional de Computação Científica (LNCC), Petrópolis, Brazil
