An Automated ILP Server in the Field of Bioinformatics
The identification of evolutionarily related (homologous) proteins is a key problem in molecular biology. Here we present an inductive logic programming based method, Homology Induction (HI), which acts as a filter for existing sequence similarity searches to improve their performance in the detection of remote protein homologies. HI performs a PSI-BLAST search to generate positive, negative, and uncertain examples, and collects descriptions of these examples. It then learns rules that discriminate the positive examples from the negative ones. These rules are used to filter the uncertain examples in the “twilight zone”. HI uses a multitable database of 51,430,710 pre-fabricated facts from a variety of biological sources, and the inductive logic programming system Aleph to induce rules. HI was tested on an independent set of protein sequences with at most 40 per cent sequence similarity (PDB40D). ROC analysis shows that HI can significantly improve existing similarity searches. The method is automated and can be used via a web/mail interface.
Keywords: Receiver Operating Characteristic, Receiver Operating Characteristic Curve, Receiver Operating Characteristic Analysis, Inductive Logic Programming, Deductive Database
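The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The `Hit` record, the two E-value cut-offs, and the function names are hypothetical, introduced only for illustration, since the abstract does not say how the three example sets are delimited. The learned rule is modelled as a plain Python predicate and stands in for a rule induced by Aleph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hit:
    """One similarity-search hit (hypothetical record, not the HI data model)."""
    protein_id: str
    e_value: float


# Hypothetical cut-offs chosen only for illustration; the source does not state them.
POSITIVE_MAX_E = 1e-3   # hits at or below this are treated as confident homologues
NEGATIVE_MIN_E = 10.0   # hits at or above this are treated as non-homologues


def partition_hits(hits: List[Hit]) -> Dict[str, List[Hit]]:
    """Split hits into positive, uncertain ("twilight zone"), and negative sets."""
    groups: Dict[str, List[Hit]] = {"positive": [], "uncertain": [], "negative": []}
    for hit in hits:
        if hit.e_value <= POSITIVE_MAX_E:
            groups["positive"].append(hit)
        elif hit.e_value >= NEGATIVE_MIN_E:
            groups["negative"].append(hit)
        else:
            groups["uncertain"].append(hit)
    return groups


def filter_uncertain(uncertain: List[Hit], rule: Callable[[Hit], bool]) -> List[Hit]:
    """Keep only the uncertain hits that the learned rule accepts."""
    return [hit for hit in uncertain if rule(hit)]
```

In this sketch, the positive and negative sets would serve as training examples for the rule learner, and `filter_uncertain` would then be applied to the remaining hits with whatever predicate the learner produces.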