New Methods for Splice Site Recognition

  • Conference paper
Artificial Neural Networks — ICANN 2002 (ICANN 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2415)

Abstract

Splice sites are locations in DNA which separate protein-coding regions (exons) from noncoding regions (introns). Accurate splice site detectors thus form important components of computational gene finders. We pose splice site recognition as a classification problem with the classifier learnt from a labeled data set consisting of only local information around the potential splice site. Note that finding the correct position of splice sites without using global information is a rather hard task. We analyze the genomes of the nematode Caenorhabditis elegans and of humans using specially designed support vector kernels. One of the kernels is adapted from our previous work on detecting translation initiation sites in vertebrates and another uses an extension to the well-known Fisher kernel. We find excellent performance on both data sets.
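
As a rough illustration of the local-information setup described in the abstract, the sketch below cuts fixed-length windows around candidate sites and compares two windows with a simple kernel. It is not the method of the paper: the function names, the window sizes, the use of the canonical acceptor dinucleotide AG to pick candidates, and the plain polynomial kernel on position-wise matches are all assumptions chosen for brevity. The specially designed kernels mentioned in the abstract are more elaborate.

```python
from typing import List, Tuple


def candidate_windows(seq: str, motif: str = "AG",
                      left: int = 5, right: int = 5) -> List[Tuple[int, str]]:
    """Return (position, window) for every occurrence of `motif` in `seq`
    that has `left` bases of context before it and `right` bases after it.
    Occurrences too close to either end of the sequence are skipped."""
    windows = []
    start = seq.find(motif)
    while start != -1:
        lo = start - left
        hi = start + len(motif) + right
        if lo >= 0 and hi <= len(seq):
            windows.append((start, seq[lo:hi]))
        # Continue searching one base further so overlapping hits are found.
        start = seq.find(motif, start + 1)
    return windows


def match_kernel(x: str, y: str, degree: int = 2) -> float:
    """Polynomial kernel on the fraction of positions where x and y agree.
    The match count equals the inner product of one-hot encoded windows,
    so raising it to a positive integer power keeps the kernel valid."""
    if len(x) != len(y):
        raise ValueError("windows must have equal length")
    matches = sum(a == b for a, b in zip(x, y))
    return (matches / len(x)) ** degree


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any data set in the paper.
    for position, window in candidate_windows("CCCCCAGTTTTTAGCC"):
        print(position, window)
    print(match_kernel("ACGT", "ACGA"))
```

A kernel of this kind could be passed as a precomputed Gram matrix to any support vector machine implementation; each window would carry a label saying whether the candidate is a true splice site.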

We thank A. Zien, K. Karplus and T. Furey for valuable discussions. G.R. would like to thank UC Santa Cruz for warm hospitality. This work was partially funded by DFG under contracts JA 379/9-2, JA 379/7-2, MU 987/1-1, and NSF grant CCR-9821087. This work was supported by an award under the Merit Allocation Scheme on the National Facility of the Australian Partnership for Advanced Computing.

To our knowledge, on the splice site recognition problem, only the work of [13] explicitly documented the care it exercised in the design of the experiments.

References

  1. Genome sequence of the Nematode Caenorhabditis elegans. Science, 282:2012–2018, 1998.

  2. P. Baldi, S. Brunak, Y. Chauvin, C.A.F. Andersen, and H. Nielsen. Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics, 16(5):412–424, 2000.

  3. C.L. Blake and C.J. Merz. UCI repository of machine learning databases, 1998.

  4. C. Burge and S. Karlin. Prediction of complete gene structures. J. Mol. Biol., 268:78–94, 1997.

  5. A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. Improved microbial gene identification with GLIMMER. Nucleic Acids Research, 27(23):4636–4641, 1999.

  6. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, 1998.

  7. D. Cai et al. Modeling splice sites with Bayes networks. Bioinformatics, 16(2):152–158, 2000.

  8. M.P.S. Brown et al. Knowledge-based analysis by using SVMs. PNAS, 97:262–267, 2000.

  9. T.S. Jaakkola, M. Diekhans, and D. Haussler. J. Comp. Biol., 7:95–114, 2000.

  10. T.S. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In M.S. Kearns et al., editors, Adv. in Neural Inf. Proc. Systems, volume 11, pages 487–493, 1999.

  11. K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2):181–201, 2001.

  12. S. Rampone. Recognition of splice junctions on DNA. Bioinformatics, 14(8):676–684, 1998.

  13. M.G. Reese, F.H. Eeckman, D. Kulp, and D. Haussler. J. Comp. Biol., 4:311–323, 1997.

  14. S. Salzberg, A.L. Delcher, K.H. Fasman, and J. Henderson. J. Comp. Biol., 5(4):667–680, 1998.

  15. B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

  16. A.J. Smola and J. MacNicol. Scalable kernel methods. Unpublished Manuscript, 2002.

  17. S. Sonnenburg. Hidden Markov Model for Genome Analysis. Humboldt University, 2001. Proj. Rep.

  18. S. Sonnenburg. New methods for splice site recognition. Master’s thesis, 2002. Forthcoming.

  19. K. Tsuda, M. Kawanabe, G. Rätsch, S. Sonnenburg, and K.-R. Müller. A new discriminative kernel from probabilistic models. In Adv. in Neural Inf. Proc. Systems, volume 14, 2002. In press.

  20. V.N. Vapnik. The nature of statistical learning theory. Springer Verlag, New York, 1995.

  21. Y. Xu and E. Uberbacher. Automated gene identification. J. Comp. Biol., 4:325–338, 1997.

  22. A. Zien, G. Rätsch, S. Mika, B. Schölkopf, T. Lengauer, and K.-R. Müller. Engineering SVM kernels that recognize translation initiation sites. Bioinformatics, 16(9):799–807, 2000.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sonnenburg, S., Rätsch, G., Jagota, A., Müller, K.-R. (2002). New Methods for Splice Site Recognition. In: Dorronsoro, J.R. (ed.) Artificial Neural Networks — ICANN 2002. ICANN 2002. Lecture Notes in Computer Science, vol 2415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46084-5_54

  • DOI: https://doi.org/10.1007/3-540-46084-5_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44074-1

  • Online ISBN: 978-3-540-46084-8

  • eBook Packages: Springer Book Archive
