New Methods for Splice Site Recognition
Splice sites are locations in DNA that separate protein-coding regions (exons) from noncoding regions (introns). Accurate splice site detectors are therefore important components of computational gene finders. We pose splice site recognition as a classification problem in which the classifier is learned from a labeled data set consisting only of local information around each potential splice site. Note that finding the correct position of splice sites without global information is a rather hard task. We analyze the genomes of the nematode Caenorhabditis elegans and of humans using specially designed support vector kernels. One of the kernels is adapted from our previous work on detecting translation initiation sites in vertebrates, and another uses an extension of the well-known Fisher kernel. We find excellent performance on both data sets.
Keywords: Splice Site, Acceptor Site, Nematode Caenorhabditis Elegans, Fisher Kernel, Biological Prior Knowledge
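The classification setup described in the abstract, in which each candidate site is represented only by a local sequence window, can be illustrated with a minimal sketch. It is not the kernels used in the paper: the function names `candidate_windows` and `positional_match_kernel`, the flank size, and the choice of the AG dinucleotide (typical of acceptor sites) as the candidate marker are all assumptions made for illustration. The kernel shown simply counts agreeing positions, which equals the inner product of one-hot encoded windows and is therefore a valid kernel for a support vector machine.

```python
from typing import List, Tuple


def candidate_windows(seq: str, motif: str = "AG", flank: int = 5) -> List[Tuple[int, str]]:
    """Return (position, window) for each motif occurrence that has full flanks.

    Hypothetical helper: 'motif' and 'flank' are illustrative assumptions,
    not parameters taken from the paper.
    """
    windows = []
    start = seq.find(motif)
    while start != -1:
        lo = start - flank
        hi = start + len(motif) + flank
        # Keep only candidates whose window lies entirely inside the sequence.
        if lo >= 0 and hi <= len(seq):
            windows.append((start, seq[lo:hi]))
        start = seq.find(motif, start + 1)
    return windows


def positional_match_kernel(x: str, y: str) -> int:
    """Count the positions at which two equal-length windows agree.

    This is the dot product of the one-hot encodings of x and y, a simple
    stand-in for the specially designed kernels mentioned in the abstract.
    """
    if len(x) != len(y):
        raise ValueError("windows must have equal length")
    return sum(a == b for a, b in zip(x, y))
```

A kernel matrix computed this way over labeled windows (true versus decoy sites) could be passed to any SVM implementation that accepts precomputed kernels; the labels themselves would have to come from an annotated genome.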