Abstract
Transparent information integration across distributed and heterogeneous data sources and computational tools is a prime concern for bioinformatics. Recently, there have been proposals for a semantic web addressing these requirements. A promising approach for such a semantic web is the integration of rules, to specify and implement workflows, with object-orientation, to cater for computational aspects.
We present PROVA, a Java-based rule-engine, which realises this integration. It enables one to separate a declarative description of information workflows from any implementation details and thus easily create and maintain code. We show how PROVA is used to compose an information and computation workflow involving
- rules for specifying the workflow,
- rules for reasoning over the data,
- rules for accessing flat files, databases, and other services, and
- rules involving heavy-duty computations.
The resulting code is very compact and re-usable.
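The separation of a declarative workflow description from its implementation can be illustrated with a minimal sketch in plain Java. This is not PROVA syntax and not the authors' code. The class name `WorkflowSketch`, the step names, and the toy data are all hypothetical and chosen only for illustration. The workflow is declared as an ordered list of step names, and each name is bound to an implementation separately.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class WorkflowSketch {
    // Declarative part (hypothetical): the workflow is only an ordered list of step names.
    static final List<String> WORKFLOW = List.of("load", "filter", "compute");

    // Implementation part (hypothetical): each step name is bound to Java code separately.
    static final Map<String, Function<List<Integer>, List<Integer>>> STEPS = new LinkedHashMap<>();

    static {
        // Stand-in for reading a flat file or database; returns fixed toy data.
        STEPS.put("load", in -> List.of(3, 8, 12, 5));
        // Stand-in for reasoning over the data; keeps values of at least 5.
        STEPS.put("filter", in -> {
            List<Integer> out = new ArrayList<>();
            for (int x : in) {
                if (x >= 5) {
                    out.add(x);
                }
            }
            return out;
        });
        // Stand-in for a heavier computation; sums the remaining values.
        STEPS.put("compute", in -> {
            int sum = 0;
            for (int x : in) {
                sum += x;
            }
            return List.of(sum);
        });
    }

    // Runs the named steps in order, feeding each step's output to the next.
    static List<Integer> run(List<String> workflow) {
        List<Integer> data = List.of();
        for (String step : workflow) {
            data = STEPS.get(step).apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(run(WORKFLOW)); // prints [25]
    }
}
```

In this sketch, changing the order of steps or swapping one implementation for another touches only one of the two parts, which is the kind of maintainability the abstract attributes to a rule-based description.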
We give a detailed account of PROVA and document its use with an example system, PSIMAP, which derives domain-domain interactions from multi-domain structures in the PDB using the SCOP domain and superfamily definitions. PSIMAP is a typical bioinformatics application in that it integrates disparate information resources in different formats, namely flat files (PDB) and a database (SCOP), and requires additional computations.
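To make the idea of deriving a domain-domain interaction from structural data concrete, here is a minimal Java sketch. It assumes, purely for illustration, that each domain is given as an array of 3D residue coordinates and that two domains count as interacting when enough residue pairs lie within a distance cutoff. The abstract does not state the criterion PSIMAP uses, so the class name `DomainInteractionSketch`, the cutoff, the minimum number of contacts, and the coordinates are all assumptions.

```java
public class DomainInteractionSketch {
    // Euclidean distance between two points given as {x, y, z}.
    static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    // Counts residue pairs (one from each domain) that lie within maxDistance.
    static int countContacts(double[][] domainA, double[][] domainB, double maxDistance) {
        int contacts = 0;
        for (double[] ra : domainA) {
            for (double[] rb : domainB) {
                if (distance(ra, rb) <= maxDistance) {
                    contacts++;
                }
            }
        }
        return contacts;
    }

    // Assumed criterion: the domains interact if at least minContacts pairs are close.
    static boolean interacts(double[][] domainA, double[][] domainB,
                             double maxDistance, int minContacts) {
        return countContacts(domainA, domainB, maxDistance) >= minContacts;
    }

    public static void main(String[] args) {
        // Toy coordinates, not taken from any PDB entry.
        double[][] domainA = {{0, 0, 0}, {1, 0, 0}};
        double[][] domainB = {{0, 0, 3}, {10, 10, 10}};
        System.out.println(countContacts(domainA, domainB, 5.0)); // prints 2
        System.out.println(interacts(domainA, domainB, 5.0, 2));  // prints true
    }
}
```

The pairwise loop is quadratic in the number of residues. A real pipeline would also need to parse PDB files and look up SCOP domain boundaries, both of which are omitted here.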
PROVA is available at comas.soi.city.ac.uk/prova
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kozlenkov, A., Schroeder, M. (2004). PROVA: Rule-Based Java-Scripting for a Bioinformatics Semantic Web. In: Rahm, E. (ed.) Data Integration in the Life Sciences. DILS 2004. Lecture Notes in Computer Science, vol 2994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24745-6_2
DOI: https://doi.org/10.1007/978-3-540-24745-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21300-0
Online ISBN: 978-3-540-24745-6
eBook Packages: Springer Book Archive