Abstract
Splice site prediction benefits from algorithms that combine the sequence patterns of splicing regulatory elements, such as enhancers and silencers, with the patterns of the splicing signals themselves. In this paper, a statistical model of splicing signals was built from the entropy density profile (EDP) method, the weight array method (WAM) and the κ test; in addition, a model of splicing regulatory elements was developed with an unsupervised self-learning method that detects motifs associated with regulatory elements. With the two models incorporated, a multi-level support vector machine (SVM) system was devised to perform ab initio prediction of splice sites from DNA sequence in eukaryotic genomes. Large-scale tests on human genomic splice sites show that the new method achieves comparatively high performance in splice site prediction. The method performs at least as well as, and usually better than, the existing SpliceScan method, which is based on modeling regulatory elements, and it is more accurate than traditional methods that model splicing signals, such as GeneSplicer. In particular, the method has a clear advantage in splice site prediction for genes with lower GC content.
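As a rough illustration of one ingredient named in the abstract, the weight array method (WAM) can be sketched as a position-specific first-order Markov model scored as a log-odds ratio against a background model. This is a minimal sketch and not the authors' implementation: the function names, the pseudocount, and the eight-base toy sequences are assumptions made for illustration only, and the EDP features, κ test and SVM layers of the paper are not reproduced here.

```python
import math

ALPHABET = "ACGT"


def train_wam(seqs, pseudocount=1.0):
    """Fit a position-specific first-order Markov model to aligned sequences.

    Returns (first, trans) where first[b] = P(x_0 = b) and
    trans[i - 1][a][b] = P(x_i = b | x_{i-1} = a) for i >= 1.
    The pseudocount is an illustrative smoothing choice (assumption).
    """
    length = len(seqs[0])
    first = {b: pseudocount for b in ALPHABET}
    for s in seqs:
        first[s[0]] += 1
    total = sum(first.values())
    first = {b: c / total for b, c in first.items()}

    trans = []
    for i in range(1, length):
        counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
        for s in seqs:
            counts[s[i - 1]][s[i]] += 1
        # Normalize each row so it is a conditional distribution over x_i.
        trans.append({
            a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()
        })
    return first, trans


def log_prob(model, seq):
    """Log-probability of seq under a model returned by train_wam."""
    first, trans = model
    lp = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        lp += math.log(trans[i - 1][seq[i - 1]][seq[i]])
    return lp


def wam_score(signal_model, background_model, seq):
    """Log-odds score: positive values favor the signal model."""
    return log_prob(signal_model, seq) - log_prob(background_model, seq)


# Toy aligned windows (hypothetical data, not taken from the paper).
donors = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
decoys = ["ACGTTCCA", "TTGTCCGA", "GAGTCATT", "CCGTTTAC"]

signal_model = train_wam(donors)
background_model = train_wam(decoys)

for candidate in ("AGGTAAGT", "TTGTCCGA"):
    print(candidate, round(wam_score(signal_model, background_model, candidate), 3))
```

In a pipeline like the one the abstract describes, a score of this kind would be only one input feature; how the paper combines it with the other features is not specified in this section.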
Additional information
Supported by the State Basic Research Program of China (Grant No. 2003CB715905), the National Natural Science Foundation of China (Grant Nos. 30300071, 30770499 and 10721403) and the Youth Foundation of the College of Engineering of Peking University.
About this article
Cite this article
Sun, Z., Sang, L., Ju, L. et al. A new method for splice site prediction based on the sequence patterns of splicing signals and regulatory elements. Chin. Sci. Bull. 53, 3331–3340 (2008). https://doi.org/10.1007/s11434-008-0448-5
DOI: https://doi.org/10.1007/s11434-008-0448-5