An ILP Refinement Operator for Biological Grammar Learning

  • Daniel C. Fredouille
  • Christopher H. Bryant
  • Channa K. Jayawickreme
  • Steven Jupe
  • Simon Topp
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4455)

Abstract

We are interested in using Inductive Logic Programming (ILP) to infer grammars representing sets of biological sequences. We call these biological grammars. ILP systems are well suited to this task because biological grammars have already been represented as logic programs, using the Definite Clause Grammar or String Variable Grammar formalisms. However, the speed at which ILP systems can generate biological grammars has been shown to be a bottleneck. This paper presents a novel refinement operator implementation, specialised to infer biological grammars with ILP techniques. This implementation is shown to speed up inference significantly compared with the classical refinement operator: time gains larger than 5-fold were observed in four fifths of the experiments, and the maximum observed gain was over 300-fold.
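The abstract contrasts the new operator with the classical refinement operator. As a rough, hypothetical illustration of what a classical top-down operator does to a grammar rule (and not the specialised operator proposed in the paper), the Python sketch below encodes a rule body as a chain of literals. It tests whether a rule covers a sequence by threading a position through the chain, much as a DCG threads its difference-list arguments, and it refines a rule by appending one literal from a fixed vocabulary. The literal encoding and the names `end_positions`, `covers` and `refine` are assumptions made for this example.

```python
from typing import Iterator, List, Tuple

# Hypothetical encoding of one rule-body literal (an assumption, not the paper's):
#   ("aa", "DE")   consumes one residue drawn from the given set
#   ("gap", 1, 3)  consumes between 1 and 3 arbitrary residues
BodyLit = Tuple
Clause = List[BodyLit]


def end_positions(clause: Clause, seq: str, start: int) -> Iterator[int]:
    """Yield each position where the body can finish parsing seq from start.

    The position is passed from literal to literal, much as a DCG threads
    its difference-list arguments S0 -> S1 -> ... -> S.
    """
    if not clause:
        yield start
        return
    lit, rest = clause[0], clause[1:]
    if lit[0] == "aa":
        if start < len(seq) and seq[start] in lit[1]:
            yield from end_positions(rest, seq, start + 1)
    else:  # "gap" literal: try every allowed length
        _, lo, hi = lit
        for k in range(lo, hi + 1):
            if start + k <= len(seq):
                yield from end_positions(rest, seq, start + k)


def covers(clause: Clause, seq: str) -> bool:
    """True if the rule matches seq at some offset (motif-style matching)."""
    return any(
        next(end_positions(clause, seq, s), None) is not None
        for s in range(len(seq) + 1)
    )


def refine(clause: Clause, vocabulary: List[BodyLit]) -> List[Clause]:
    """Naive top-down refinement: one specialisation per appended literal."""
    return [clause + [lit] for lit in vocabulary]


if __name__ == "__main__":
    vocabulary = [("aa", "C"), ("aa", "H"), ("aa", "DE"), ("gap", 1, 3)]
    rule = [("aa", "C"), ("gap", 1, 3), ("aa", "H")]
    print(len(refine(rule, vocabulary)))                 # 4
    print(covers(rule, "ACGGHD"), covers(rule, "CCCC"))  # True False
```

Because an appended literal can only constrain a match further, every refinement covers a subset of the sequences its parent covers. A search built on a naive operator like this must generate and test every candidate, which is one reason the speed of refinement can matter.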

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Daniel C. Fredouille (1)
  • Christopher H. Bryant (1)
  • Channa K. Jayawickreme (2)
  • Steven Jupe (3)
  • Simon Topp (4)

  1. School of Computing, The Robert Gordon University, Aberdeen, UK
  2. Discovery Research Biology, GlaxoSmithKline, Durham, USA
  3. Department of Bioinformatics, GlaxoSmithKline, Stevenage, UK
  4. Department of Bioinformatics, GlaxoSmithKline, Harlow, UK