Athena: Text Mining Based Discovery of Scientific Workflows in Disperse Repositories

  • Flavio Costa
  • Daniel de Oliveira
  • Eduardo Ogasawara
  • Alexandre A. B. Lima
  • Marta Mattoso
Conference paper

DOI: 10.1007/978-3-642-27392-6_8

Part of the Lecture Notes in Computer Science book series (LNCS, volume 6799)
Cite this paper as:
Costa F., de Oliveira D., Ogasawara E., Lima A.A.B., Mattoso M. (2012) Athena: Text Mining Based Discovery of Scientific Workflows in Disperse Repositories. In: Lacroix Z., Vidal M.E. (eds) Resource Discovery. RED 2010. Lecture Notes in Computer Science, vol 6799. Springer, Berlin, Heidelberg

Abstract

Scientific workflows are abstractions used to model and execute in silico scientific experiments. They represent key resources for scientists and are enacted and managed by engines called Scientific Workflow Management Systems (SWfMS). Each SWfMS has its own workflow language. This heterogeneity of languages and formats makes it difficult for scientists to search for or discover workflows in distributed repositories for reuse. The existing workflows in these repositories can be used to identify and construct families of workflows (clusters) that aim at a particular goal. However, it is hard to compare the structure of these workflows, since they are modeled in different formats. An alternative is to compare workflow metadata, such as the natural language descriptions usually found in workflow repositories, instead of comparing workflow structure. In this scenario, we expect that the effective use of classical text mining techniques can cluster a set of workflows into families, giving scientists the possibility of finding and reusing existing workflows, which may decrease the complexity of modeling a new experiment. This paper presents Athena, a cloud-based approach to support workflow clustering from disperse repositories using their natural language descriptions, thus integrating these repositories and providing an easier way to search and reuse workflows.
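The idea of grouping workflows by their natural language descriptions rather than by their structure can be illustrated with a minimal sketch. It is not Athena's implementation: the abstract only mentions "classical text mining techniques", so the TF-IDF weighting, the cosine similarity, the single-link threshold grouping, the stop-word list, and every function name below are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "from", "with", "on"}


def tokenize(text):
    """Lowercase the description and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(descriptions):
    """Return one sparse TF-IDF vector (dict term -> weight) per description."""
    docs = [Counter(tokenize(d)) for d in descriptions]
    n = len(docs)
    # Document frequency: number of descriptions containing each term.
    df = Counter(term for doc in docs for term in doc)
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in doc.items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(descriptions, threshold=0.2):
    """Group descriptions into families: two workflows land in the same family
    when their similarity reaches the threshold, directly or transitively."""
    vectors = tfidf_vectors(descriptions)
    n = len(vectors)
    parent = list(range(n))  # union-find forest over description indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    families = {}
    for i in range(n):
        families.setdefault(find(i), []).append(i)
    return sorted(families.values())
```

Because only the descriptions are compared, workflows written for different SWfMS can fall into the same family regardless of their native format. The threshold value is arbitrary and would need tuning on real repository data.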

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Flavio Costa (1)
  • Daniel de Oliveira (1)
  • Eduardo Ogasawara (1, 2)
  • Alexandre A. B. Lima (1)
  • Marta Mattoso (1)
  1. COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
  2. Federal Center of Technological Education (CEFET/RJ), Rio de Janeiro, Brazil
