A Scientific Workflow Framework Integrated with Object Deputy Model for Data Provenance

  • Liwei Wang
  • Zhiyong Peng
  • Min Luo
  • Wenhao Ji
  • Zeqian Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4016)


There is a critical need to automatically manage the large volumes of scientific data and applications involved in scientific workflows. Database technologies appear well suited to handling such complex data management. However, most workflow management systems (WFMSs) use database technologies only to a limited extent. In this paper, we present a DB-integrated scientific workflow framework that adopts the object deputy model to describe the execution of a series of scientific tasks. The framework allows WFMS management operations to be performed in a way analogous to traditional data management operations. Most importantly, the framework's data provenance method can provide much higher performance than other methods. Three kinds of schemas for data provenance are proposed, and the performance of each schema is analyzed.


Keywords: Storage Cost, Scientific Object, Event File, Data Provenance, Source Object





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Liwei Wang (1)
  • Zhiyong Peng (1)
  • Min Luo (2)
  • Wenhao Ji (2)
  • Zeqian Huang (1)

  1. State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
  2. Computer School, Wuhan University, Wuhan, China
