The VLDB Journal, Volume 14, Issue 3, pp 318–329

Rule-based workflow management for bioinformatics

  • John S. Conery
  • Julian M. Catchen
  • Michael Lynch
Regular Paper

Abstract

We describe a data-centric software architecture for bioinformatics workflows and a rule-based workflow enactment system that uses declarative specifications of data dependences between steps to automatically order the execution of those steps. A data-centric view allows researchers to develop abstract descriptions of workflow products and provides mechanisms for describing workflow steps as objects. The rule-based approach supports an iterative design methodology for creating new workflows, where steps can be developed in small, incremental updates, and the object orientation allows workflow steps developed for one project to be reused in other projects.
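The idea of ordering steps from declared data dependences can be illustrated with a minimal Python sketch. This is not the authors' system: the rule format, the step names (`fetch`, `search`, `align`, `build_tree`), and the product names are all hypothetical, and a plain topological sort stands in for whatever enactment strategy the paper actually uses.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each step declares the data products it needs and makes.
# Step and product names are illustrative only, not taken from the paper.
RULES = {
    "align":      {"needs": {"hits"},      "makes": {"alignment"}},
    "fetch":      {"needs": set(),         "makes": {"sequences"}},
    "build_tree": {"needs": {"alignment"}, "makes": {"tree"}},
    "search":     {"needs": {"sequences"}, "makes": {"hits"}},
}


def execution_order(rules):
    """Return step names ordered so every input is produced before it is used."""
    # Map each data product to the step that produces it.
    producer = {}
    for step, rule in rules.items():
        for product in rule["makes"]:
            producer[product] = step

    # A step depends on the producers of all the products it needs.
    graph = {}
    for step, rule in rules.items():
        missing = rule["needs"] - set(producer)
        if missing:
            raise ValueError(f"no step produces {sorted(missing)} for {step!r}")
        graph[step] = {producer[p] for p in rule["needs"]}

    # Topological sort yields an order consistent with the dependences.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(RULES))
```

The rules are listed in arbitrary order, and the execution order is derived from the declarations alone. Adding a step means adding one more rule, which fits the incremental style of workflow development described in the abstract.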

Keywords

Workflow, Rule-based system, Bioinformatics

Copyright information

© Springer-Verlag 2005

Authors and Affiliations

  • John S. Conery (1)
  • Julian M. Catchen (1)
  • Michael Lynch (2)
  1. Department of Computer and Information Science, University of Oregon, Eugene, USA
  2. Department of Biology, Indiana University, Bloomington, USA
