Towards Automatic Generation of Semantic Types in Scientific Workflows

  • Shawn Bowers
  • Bertram Ludäscher
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3807)


Scientific workflow systems are problem-solving environments that allow scientists to automate and reproduce data management and analysis tasks. Workflow components include actors (e.g., queries, transformations, analyses, simulations, visualizations), and datasets which are produced and consumed by actors. The increasing number of such components creates the problem of discovering suitable components and of composing them to form the desired scientific workflow. In previous work we proposed the use of semantic types (annotations relative to an ontology) to solve these problems. Since creating semantic types can be complex and time-consuming, scalability of the approach becomes an issue. In this paper we propose a framework to automatically derive semantic types from a (possibly small) number of initial types. Our approach propagates the given semantic types through workflow steps whose input and output data structures are related via query expressions. By propagating semantic types, we can significantly reduce the effort required to annotate datasets and components and even derive new “candidate axioms” for inclusion in annotation ontologies.
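The propagation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' framework: it assumes the simplest possible case, in which each workflow step is a projection or renaming query, so every output column copies exactly one input column. The names `ProjectionStep`, `propagate`, and `propagate_through` are hypothetical, and a semantic type is reduced here to a mapping from column names to ontology concept names.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: a semantic type is simplified to {column name: ontology concept}.
SemanticType = Dict[str, str]


@dataclass
class ProjectionStep:
    """Hypothetical workflow step whose query only projects or renames columns.

    `mapping` sends each output column to the input column it copies.
    """
    name: str
    mapping: Dict[str, str]


def propagate(step: ProjectionStep, input_types: SemanticType) -> SemanticType:
    """Derive output annotations from input annotations for one step.

    An output column inherits the concept of the input column it copies.
    Output columns whose source column has no annotation stay unannotated.
    """
    return {
        out_col: input_types[in_col]
        for out_col, in_col in step.mapping.items()
        if in_col in input_types
    }


def propagate_through(steps: List[ProjectionStep],
                      initial: SemanticType) -> SemanticType:
    """Push a small set of initial annotations through a linear chain of steps."""
    current = dict(initial)
    for step in steps:
        current = propagate(step, current)
    return current
```

Under these assumptions, only the first dataset in a chain needs a manual annotation, and the annotations of every later dataset are derived from it. Queries with joins, selections, or computed columns would need a richer treatment than this column-copy rule.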





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Shawn Bowers (1)
  • Bertram Ludäscher (2)
  1. UC Davis Genome Center
  2. Department of Computer Science, University of California, Davis
