Pertinent Background Knowledge for Learning Protein Grammars

  • Christopher H. Bryant
  • Daniel C. Fredouille
  • Alex Wilson
  • Channa K. Jayawickreme
  • Steven Jupe
  • Simon Topp
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)

Abstract

We are interested in using Inductive Logic Programming (ILP) to infer grammars representing sets of protein sequences. ILP takes as input both examples and background knowledge predicates. This work is a first step in optimising the choice of background knowledge predicates for predicting the function of proteins. We propose methods to obtain different sets of background knowledge. We then study the impact of these sets on inference results through a hard protein function inference task: the prediction of the coupling preference of GPCR proteins. All but one of the proposed sets of background knowledge are statistically shown to have positive impacts on the predictive power of inferred rules, either directly or through interactions with other sets. In addition, this work provides further confirmation, after the work of Muggleton et al. (2001), that ILP can help to predict protein functions.


References

  1. Muggleton, S., et al.: Protein secondary structure prediction using logic-based machine learning. Protein Eng. 5, 647–657 (1992)
  2. Mozetic, I.: Secondary structure prediction by inductive logic programming. In: Proc. 3rd Meeting on the Critical Assessment of Techniques for Protein Structure Prediction, CASP3, pp. A–26 (1998)
  3. Srinivasan, A., et al.: Theories for mutagenicity: A study in first-order and feature-based induction. Artificial Intelligence 85(1-2), 277–299 (1996)
  4. King, R.D.: Applying inductive logic programming to predicting gene function. AI Mag. 25(1), 57–68 (2004)
  5. Clare, A., et al.: The ILP 2005 challenge (2005), http://www.protein-logic.com/index.html
  6. Muggleton, S.H., et al.: Are grammatical representations useful for learning from biological sequence data? – a case study. Jour. Comp. Biol. 5(8), 493–522 (2001)
  7. Taguchi, G.: Introduction to quality engineering. In: Asian Productivity Organization, Tokyo (distributed by American Supplier Institute, Inc. Dearborn, MI.) (1986)
  8. Searls, D.B.: String variable grammar: A logic grammar formalism for the biological language of DNA. Jour. of Log. Prog. 12 (1993)
  9. Dsouza, M., et al.: Searching for patterns in genomic data. Trends in Genetics 13(12), 497–498 (1997)
  10. Brazma, A., et al.: Discovering patterns and subfamilies in biosequences. In: Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology, pp. 34–43. AAAI Press, Menlo Park (1996)
  11. Falquet, L., et al.: Protein data bank. Nucleic Acids Research 30, 235–238 (2002)
  12. Leung, S.W., et al.: Basic Gene Grammars and DNA-ChartParser for language processing of Escherichia coli promoter DNA sequences. Bioinformatics 17(3), 226–236 (2001)
  13. Bateman, A., et al.: The Pfam protein families database. Nucleic Acids Research 32, D138–D141 (2004)
  14. Sakakibara, Y., et al.: Stochastic context-free grammars for tRNA modeling. Nucleic Acids Research 22, 5112–5120 (1994)
  15. Cussens, J., Pulman, S.: Experiments in inductive chart parsing. In: Cussens, J., Džeroski, S. (eds.) LLL 1999. LNCS, vol. 1925, pp. 72–83. Springer, Heidelberg (2000)
  16. Pereira, F., Warren, D.H.D.: Definite clause grammars for language analysis – a survey of the formalism and a comparison with augmented transition networks. Artificial Intelligence 13(3), 231–278 (1980)
  17. Apostolico, A., et al.: Verbumculus and the discovery of unusual words. Jour. Comp. Sci. and Tech. 19(1), 22–41 (2003)
  18. Pierce, K., et al.: Seven-transmembrane receptors. Nat. Rev. Mol. Cell Biol. 3(9), 639–650 (2002)
  19. Sgourakis, N., et al.: Prediction of the coupling specificity of GPCRs to four families of G-proteins using hidden Markov models and artificial neural networks. Bioinformatics 21(22), 4101–4106 (2005)
  20. Altschul, S., et al.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389–3402 (1997)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Christopher H. Bryant (1)
  • Daniel C. Fredouille (1)
  • Alex Wilson (2)
  • Channa K. Jayawickreme (3)
  • Steven Jupe (4)
  • Simon Topp (5)

  1. School of Computing, The Robert Gordon University, Aberdeen, UK
  2. School of Computing, Division of Mathematics and Statistics, The Robert Gordon University, Aberdeen, UK
  3. Discovery Research Biology, Durham, USA
  4. Department of Bioinformatics, Stevenage, UK
  5. Department of Bioinformatics, Harlow, UK
