Semantic Metadata Generation for Large Scientific Workflows

  • Jihie Kim
  • Yolanda Gil
  • Varun Ratnakar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4273)


In recent years, workflows have been increasingly used in scientific applications. This paper presents novel metadata reasoning capabilities that we have developed to support the creation of large workflows. They include 1) use of semantic web technologies in handling metadata constraints on file collections and nested file collections, 2) propagation and validation of metadata constraints from inputs to outputs in a workflow component, and through the links among components in a workflow, and 3) sub-workflows that generate metadata needed for workflow creation. We show how we used these capabilities to support the creation of large executable workflows in an earthquake science application with more than 7,000 jobs, generating metadata for more than 100,000 new files.
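The second capability, propagating and validating metadata constraints from inputs to outputs and across the links between components, can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract says relies on semantic web technologies. The `Component` class, the `validate` and `propagate` functions, and the metadata keys `type` and `site` are all hypothetical names chosen for illustration, and plain Python dictionaries stand in for semantic metadata descriptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a metadata description: attribute name -> value.
Metadata = Dict[str, object]


@dataclass
class Component:
    """A workflow step (illustrative only, not the paper's data model)."""
    name: str
    requires: Metadata                       # constraints the input must satisfy
    derive: Callable[[Metadata], Metadata]   # rule producing output metadata


def validate(component: Component, metadata: Metadata) -> List[str]:
    """Return one message per input constraint that the metadata violates."""
    return [
        f"{component.name}: expected {key}={value!r}, got {metadata.get(key)!r}"
        for key, value in component.requires.items()
        if metadata.get(key) != value
    ]


def propagate(pipeline: List[Component], source: Metadata) -> Metadata:
    """Validate and propagate metadata through a linear chain of components."""
    current = dict(source)
    for component in pipeline:
        problems = validate(component, current)
        if problems:
            raise ValueError("; ".join(problems))
        # The output metadata of this step becomes the input of the next one.
        current = component.derive(current)
    return current


def propagate_collection(pipeline: List[Component],
                         files: List[Metadata]) -> List[Metadata]:
    """Apply the same propagation to every file in a (flat) file collection."""
    return [propagate(pipeline, metadata) for metadata in files]


# Usage with two hypothetical components linked in sequence.
extract = Component("extract", {"type": "raw"},
                    lambda m: {**m, "type": "filtered"})
summarize = Component("summarize", {"type": "filtered"},
                      lambda m: {**m, "type": "summary"})

result = propagate([extract, summarize], {"type": "raw", "site": "A"})
collection = propagate_collection(
    [extract, summarize],
    [{"type": "raw", "site": "A"}, {"type": "raw", "site": "B"}],
)
```

In this sketch, attributes that a component does not touch (here `site`) carry over unchanged to the generated outputs, and a mismatch between one component's output and the next component's input constraints is reported before any job would run.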


Keywords: metadata reasoning, workflow generation, grid workflows


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jihie Kim (1)
  • Yolanda Gil (1)
  • Varun Ratnakar (1)
  1. Information Sciences Institute, University of Southern California, Marina del Rey, United States
