Rule-based workflow management for bioinformatics

  • Regular Paper
The VLDB Journal

Abstract

We describe a data-centric software architecture for bioinformatics workflows and a rule-based workflow enactment system that uses declarative specifications of data dependences between steps to automatically order the execution of those steps. A data-centric view allows researchers to develop abstract descriptions of workflow products and provides mechanisms for describing workflow steps as objects. The rule-based approach supports an iterative design methodology for creating new workflows, where steps can be developed in small, incremental updates, and the object orientation allows workflow steps developed for one project to be reused in other projects.
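
The ordering idea can be illustrated with a minimal sketch. This is not the authors' system: the `Rule` class, the `enact` function, and the step names (`blast`, `align`, `tree`) are hypothetical and chosen only for illustration. Each step declares the data items it needs and the items it produces, and a small loop fires any step whose inputs are already available, so the execution order follows from the declared dependences rather than from the order in which the steps were written.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    """A workflow step described by the data it needs and the data it produces.

    All names here are hypothetical; they are not taken from the paper.
    """
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    action: Callable[[Dict[str, str]], Dict[str, str]]


def enact(rules: List[Rule], data: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Fire each rule once, as soon as all of its declared inputs exist."""
    store = dict(data)
    pending = list(rules)
    order: List[str] = []
    while pending:
        # A rule is ready when every input it declares is already in the store.
        ready = [r for r in pending if r.inputs.issubset(store)]
        if not ready:
            missing = sorted(set().union(*(r.inputs for r in pending)) - set(store))
            raise RuntimeError(f"no rule can fire; missing data: {missing}")
        for rule in ready:
            store.update(rule.action(store))
            order.append(rule.name)
            pending.remove(rule)
    return order, store


# Toy steps, deliberately listed out of order. The actions only build strings.
rules = [
    Rule("tree", frozenset({"alignment"}), frozenset({"tree"}),
         lambda s: {"tree": f"tree({s['alignment']})"}),
    Rule("align", frozenset({"hits"}), frozenset({"alignment"}),
         lambda s: {"alignment": f"align({s['hits']})"}),
    Rule("blast", frozenset({"sequences"}), frozenset({"hits"}),
         lambda s: {"hits": f"blast({s['sequences']})"}),
]

order, store = enact(rules, {"sequences": "seqs.fasta"})
print(order)
print(store["tree"])
```

In this sketch, adding a step means adding one more `Rule` with its own inputs and outputs; no existing step has to be edited to fit it into the sequence, which is the kind of incremental development the abstract describes.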

Author information

Corresponding author

Correspondence to John S. Conery.

About this article

Cite this article

Conery, J.S., Catchen, J.M. & Lynch, M. Rule-based workflow management for bioinformatics. The VLDB Journal 14, 318–329 (2005). https://doi.org/10.1007/s00778-005-0153-9

  • DOI: https://doi.org/10.1007/s00778-005-0153-9
