Abstract
Transparent information integration across distributed and heterogeneous data sources and computational tools is a prime concern for bioinformatics. Recently, there have been proposals for a semantic web addressing these requirements. A promising approach for such a semantic web is the integration of rules, to specify and implement workflows, and object-orientation, to cater for computational aspects.
We present PROVA, a Java-based rule-engine, which realises this integration. It enables one to separate a declarative description of information workflows from any implementation details and thus easily create and maintain code. We show how PROVA is used to compose an information and computation workflow involving
- rules for specifying the workflow,
- rules for reasoning over the data,
- rules for accessing flat files, databases, and other services, and
- rules involving heavy-duty computations.
The resulting code is very compact and re-usable.
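The separation of a declarative workflow description from its implementation can be illustrated with a minimal sketch. This is plain Python, not PROVA syntax and not a rule engine: the workflow is only a list of step names, and the step functions (`load_records`, `keep_multi_domain`, `count_domains`) and their toy data are hypothetical stand-ins for data access, reasoning, and computation.

```python
from typing import Any, Callable, Dict, List

# Hypothetical step implementations (assumptions for illustration only).
def load_records(_: Any) -> List[dict]:
    # Stand-in for a rule that reads a flat file or queries a database.
    return [
        {"protein": "p1", "domains": ["a.1", "b.2"]},
        {"protein": "p2", "domains": ["c.3"]},
    ]

def keep_multi_domain(records: List[dict]) -> List[dict]:
    # Stand-in for a rule that reasons over the data.
    return [r for r in records if len(r["domains"]) > 1]

def count_domains(records: List[dict]) -> Dict[str, int]:
    # Stand-in for a rule that delegates to a computation.
    return {r["protein"]: len(r["domains"]) for r in records}

# Implementation details live here, keyed by step name.
STEPS: Dict[str, Callable[[Any], Any]] = {
    "load": load_records,
    "filter": keep_multi_domain,
    "compute": count_domains,
}

# The declarative part: the workflow names its steps and nothing else.
WORKFLOW: List[str] = ["load", "filter", "compute"]

def run(workflow: List[str], steps: Dict[str, Callable[[Any], Any]]) -> Any:
    """Feed the output of each named step into the next one."""
    value: Any = None
    for name in workflow:
        value = steps[name](value)
    return value

result = run(WORKFLOW, STEPS)
print(result)
```

Changing the order of steps, or swapping one implementation for another, touches only `WORKFLOW` or `STEPS`, which is the kind of maintainability the abstract attributes to the rule-based description.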
We give a detailed account of PROVA and document its use with an example system, PSIMAP, which derives domain-domain interactions from multi-domain structures in the PDB using the SCOP domain and superfamily definitions. PSIMAP is a typical bioinformatics application in that it integrates disparate information resources in different formats, namely flat files (PDB) and a database (SCOP), and requires additional computations.
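The kind of computation PSIMAP performs can be sketched as follows. The abstract does not state the contact criterion, so the distance cutoff and the minimum number of close atom pairs below are assumptions chosen for illustration, as are the function names and the toy coordinates. The sketch takes the domains of one structure, tests each pair of domains for spatial proximity, and reports interactions at the level of their superfamilies.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]

# Assumed contact criterion (not given in the abstract).
DISTANCE_CUTOFF = 5.0   # maximum distance between two atoms counted as close
MIN_CLOSE_PAIRS = 2     # minimum number of close atom pairs for an interaction

def close_pairs(coords_a: List[Coord], coords_b: List[Coord], cutoff: float) -> int:
    """Count atom pairs, one atom from each domain, within the cutoff."""
    return sum(1 for a in coords_a for b in coords_b if dist(a, b) <= cutoff)

def domain_interactions(
    structure: Dict[str, List[Coord]],
    superfamily_of: Dict[str, str],
    cutoff: float = DISTANCE_CUTOFF,
    min_pairs: int = MIN_CLOSE_PAIRS,
) -> Set[Tuple[str, str]]:
    """Return the superfamily pairs whose domains are in contact in this structure."""
    interactions: Set[Tuple[str, str]] = set()
    for (dom_a, coords_a), (dom_b, coords_b) in combinations(sorted(structure.items()), 2):
        if close_pairs(coords_a, coords_b, cutoff) >= min_pairs:
            # Store each pair in sorted order so (x, y) and (y, x) coincide.
            pair = tuple(sorted((superfamily_of[dom_a], superfamily_of[dom_b])))
            interactions.add(pair)
    return interactions

# Toy multi-domain structure: domain id -> atom coordinates (hypothetical data).
structure = {
    "d1": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "d2": [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    "d3": [(50.0, 0.0, 0.0), (51.0, 0.0, 0.0)],
}
# Toy domain-to-superfamily mapping (hypothetical identifiers).
superfamily_of = {"d1": "a.1.1", "d2": "b.1.1", "d3": "c.1.1"}

interactions = domain_interactions(structure, superfamily_of)
print(interactions)
```

In a real setting the coordinates would come from PDB flat files and the domain and superfamily assignments from SCOP, which is where the integration of differently formatted sources described above comes in.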
PROVA is available at comas.soi.city.ac.uk/prova
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kozlenkov, A., Schroeder, M. (2004). PROVA: Rule-Based Java-Scripting for a Bioinformatics Semantic Web. In: Rahm, E. (ed.) Data Integration in the Life Sciences. DILS 2004. Lecture Notes in Computer Science, vol 2994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24745-6_2
DOI: https://doi.org/10.1007/978-3-540-24745-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21300-0
Online ISBN: 978-3-540-24745-6
eBook Packages: Springer Book Archive