Performance Evaluation of the Karma Provenance Framework for Scientific Workflows

  • Yogesh L. Simmhan
  • Beth Plale
  • Dennis Gannon
  • Suresh Marru
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4145)


Provenance about workflow executions and data derivations in scientific applications helps estimate data quality, track resources, and validate in silico experiments. The Karma provenance framework provides a means to collect workflow, process, and data provenance from data-driven scientific workflows and is used in the Linked Environments for Atmospheric Discovery (LEAD) project. This article presents a performance analysis of the Karma service compared against the contemporary PReServ provenance service. Our study finds that Karma scales exceedingly well for collecting and querying provenance records, showing linear or sub-linear scaling with an increasing number of provenance records and clients when tested against workloads on the order of 10,000 application-service invocations and over 36 concurrent clients.


Keywords: Data Product; Query Response Time; Service Invocation; Process Provenance; Provenance Activity



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yogesh L. Simmhan (1)
  • Beth Plale (1)
  • Dennis Gannon (1)
  • Suresh Marru (1)
  1. Indiana University, Bloomington, USA
