Abstract
We describe a data-centric software architecture for bioinformatics workflows and a rule-based workflow enactment system that uses declarative specifications of data dependences between steps to automatically order the execution of those steps. A data-centric view allows researchers to develop abstract descriptions of workflow products and provides mechanisms for describing workflow steps as objects. The rule-based approach supports an iterative design methodology for creating new workflows, where steps can be developed in small, incremental updates, and the object orientation allows workflow steps developed for one project to be reused in other projects.
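The abstract describes an enactment system that orders steps automatically from declared data dependences. The paper's own engine is not shown in this section. What follows is a minimal Python sketch of the general idea, assuming a make-style backward-chaining strategy: each step is an object that declares the product it creates and the products it needs, and asking for a final product recursively builds its prerequisites first. The `Step` and `RuleEngine` classes, the product names (`sequences`, `hits`, `alignment`, `tree`) and the step actions are all hypothetical stand-ins, not the authors' API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Step:
    """Hypothetical workflow step: declares the product it makes and what it needs."""
    name: str
    produces: str
    requires: List[str]
    action: Callable[[Dict[str, object]], object]


class RuleEngine:
    """Backward-chaining sketch: to obtain a product, first obtain every product
    it depends on, then run the step whose rule produces it."""

    def __init__(self, steps: List[Step]):
        self.rules = {s.produces: s for s in steps}  # one rule per product
        self.store: Dict[str, object] = {}           # products built so far
        self.log: List[str] = []                     # order in which steps ran

    def make(self, product: str, _active: Optional[Set[str]] = None) -> object:
        if product in self.store:                    # already built: reuse it
            return self.store[product]
        active = _active if _active is not None else set()
        if product in active:                        # product depends on itself
            raise ValueError(f"cyclic dependence on {product!r}")
        if product not in self.rules:
            raise KeyError(f"no rule produces {product!r}")
        active.add(product)
        step = self.rules[product]
        # Build prerequisites first; their order comes from the declarations.
        inputs = {r: self.make(r, active) for r in step.requires}
        active.discard(product)
        self.store[product] = step.action(inputs)
        self.log.append(step.name)
        return self.store[product]


# Rules are listed in no particular order; the engine derives the execution order.
steps = [
    Step("build_tree", "tree", ["alignment"], lambda d: f"tree({d['alignment']})"),
    Step("align", "alignment", ["sequences", "hits"],
         lambda d: f"aln({d['sequences']},{d['hits']})"),
    Step("search", "hits", ["sequences"], lambda d: f"hits({d['sequences']})"),
    Step("fetch", "sequences", [], lambda d: "seqs"),
]
engine = RuleEngine(steps)
print(engine.make("tree"))  # tree(aln(seqs,hits(seqs)))
print(engine.log)           # ['fetch', 'search', 'align', 'build_tree']
```

Because finished products are kept in `store`, a step added later (say, a new analysis that requires `alignment`) can be registered and requested without re-running the steps it depends on, which loosely mirrors the incremental, step-by-step development style the abstract describes.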
Cite this article
Conery, J.S., Catchen, J.M. & Lynch, M. Rule-based workflow management for bioinformatics. The VLDB Journal 14, 318–329 (2005). https://doi.org/10.1007/s00778-005-0153-9
DOI: https://doi.org/10.1007/s00778-005-0153-9