The Effect of Relational Background Knowledge on Learning of Protein Three-Dimensional Fold Signatures

Turcotte, Marcel; Muggleton, Stephen H.; Sternberg, Michael J.E.

doi:10.1023/A:1007672817406

Published: April 2001

Machine Learning, volume 43, pages 81–95 (2001)

Abstract

As a form of Machine Learning, the study of Inductive Logic Programming (ILP) is motivated by a central belief: relational description languages are better (in terms of accuracy and understandability) than propositional ones for certain real-world applications. This claim is investigated here for a particular application in structural molecular biology: constructing readable descriptions of the major protein folds. To the authors' knowledge, Machine Learning has not previously been applied systematically to this task. In this application, the domain expert (the third author) identified a natural divide between essentially propositional features and more structurally oriented relational ones. The following null hypotheses are tested: 1) for a given ILP system (Progol), provision of relational background knowledge does not increase predictive accuracy; 2) a good propositional learning system (C5.0) without relational background knowledge will outperform Progol with relational background knowledge; 3) relational background knowledge does not produce improved explanatory insight. Null hypotheses 1) and 2) are both refuted by cross-validation results over 20 of the most populated protein folds. Null hypothesis 3) is refuted by demonstrating various insightful rules found only among the relationally oriented learned rules.
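
The divide between propositional and relational descriptions mentioned in the abstract can be illustrated with a toy sketch. Everything in it is a hypothetical assumption made for illustration: the `Element` class, the three whole-domain features, the `adjacent` relation and the strand-helix-strand test are not the authors' background knowledge, and the two example domains are invented rather than taken from any dataset. The sketch only shows why a fixed attribute vector can fail to separate two domains that a relation over ordered secondary-structure elements tells apart.

```python
from dataclasses import dataclass

# Hypothetical toy encoding: names and values are illustrative only,
# not the representation used with Progol or C5.0 in the paper.


@dataclass(frozen=True)
class Element:
    """One secondary-structure element of a protein domain."""
    position: int  # order along the chain, starting at 1
    kind: str      # "helix" or "strand"
    length: int    # number of residues


def propositional_features(elements):
    """Flatten a domain into fixed whole-domain attributes (one row per domain)."""
    return {
        "n_helices": sum(e.kind == "helix" for e in elements),
        "n_strands": sum(e.kind == "strand" for e in elements),
        "total_length": sum(e.length for e in elements),
    }


def adjacent(a, b):
    """Relational predicate: element b immediately follows element a."""
    return b.position == a.position + 1


def has_strand_helix_strand(elements):
    """Relational test: a strand followed by a helix followed by a strand."""
    ordered = sorted(elements, key=lambda e: e.position)
    for a, b, c in zip(ordered, ordered[1:], ordered[2:]):
        kinds = (a.kind, b.kind, c.kind)
        if kinds == ("strand", "helix", "strand") and adjacent(a, b) and adjacent(b, c):
            return True
    return False


# Two invented domains with the same composition but a different element order.
domain_a = [
    Element(1, "strand", 5), Element(2, "helix", 10),
    Element(3, "strand", 5), Element(4, "helix", 10),
]
domain_b = [
    Element(1, "strand", 5), Element(2, "strand", 5),
    Element(3, "helix", 10), Element(4, "helix", 10),
]

same_row = propositional_features(domain_a) == propositional_features(domain_b)
print(same_row, has_strand_helix_strand(domain_a), has_strand_helix_strand(domain_b))
```

In this toy case the two domains yield identical attribute rows, so a learner restricted to those three attributes cannot separate them, whereas the relational test distinguishes them by the order of their elements.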

Author information

Authors and Affiliations

Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, P.O. Box 123, London, WC2A 3PX, UK
Marcel Turcotte & Michael J.E. Sternberg
Department of Computer Science, University of York, Heslington, York, YO1 5DD, UK
Stephen H. Muggleton

About this article

Cite this article

Turcotte, M., Muggleton, S.H. & Sternberg, M.J. The Effect of Relational Background Knowledge on Learning of Protein Three-Dimensional Fold Signatures. Machine Learning 43, 81–95 (2001). https://doi.org/10.1023/A:1007672817406
