A new method for splice site prediction based on the sequence patterns of splicing signals and regulatory elements

  • Articles/Bioinformatics
Chinese Science Bulletin

Abstract

Splice site prediction benefits from algorithms that combine the sequence patterns of splicing regulatory elements, such as enhancers and silencers, with the patterns of the splicing signals themselves. In this paper, a statistical model of splicing signals is built from the entropy density profile (EDP) method, the weight array method (WAM) and the κ test, and a model of splicing regulatory elements is developed with an unsupervised self-learning method that detects motifs associated with those elements. The two models are combined in a multi-level support vector machine (SVM) system that performs ab initio splice site prediction on DNA sequences from eukaryotic genomes. Large-scale tests on human genomic splice sites show that the new method achieves comparatively high performance. It performs at least as well as, and usually better than, the existing SpliceScan method, which is based on modeling regulatory elements, and it is more accurate than traditional methods that model splicing signals, such as GeneSplicer. In particular, the method shows a clear advantage in predicting splice sites for genes with lower GC content.
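
To make the weight array method named in the abstract more concrete, the following is a minimal sketch of a generic first-order WAM, in which each position of an aligned window has its own probabilities conditioned on the preceding base, and a candidate is scored by the log-likelihood ratio between a true-site model and a decoy model. It is not the authors' implementation: the function names, the pseudocount, and the toy training windows are all assumptions made up for illustration, and the EDP, κ test, regulatory-element and SVM stages are not shown.

```python
import math

ALPHABET = "ACGT"

def train_wam(seqs, pseudocount=1.0):
    """Estimate a first-order weight array model from aligned, equal-length windows.

    Returns (first, trans) where first[b] = P(b at position 0) and
    trans[i][(a, b)] = P(b at position i | a at position i - 1) for i >= 1.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all training windows must have the same length")
    first_counts = {b: pseudocount for b in ALPHABET}
    pair_counts = [
        {(a, b): pseudocount for a in ALPHABET for b in ALPHABET}
        for _ in range(length)
    ]
    for s in seqs:
        first_counts[s[0]] += 1
        for i in range(1, length):
            pair_counts[i][(s[i - 1], s[i])] += 1
    total = sum(first_counts.values())
    first = {b: c / total for b, c in first_counts.items()}
    trans = [None]  # position 0 has no preceding base
    for i in range(1, length):
        probs = {}
        for a in ALPHABET:
            row_total = sum(pair_counts[i][(a, b)] for b in ALPHABET)
            for b in ALPHABET:
                probs[(a, b)] = pair_counts[i][(a, b)] / row_total
        trans.append(probs)
    return first, trans

def log_likelihood(seq, model):
    """Log-probability of a window under a WAM returned by train_wam."""
    first, trans = model
    ll = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        ll += math.log(trans[i][(seq[i - 1], seq[i])])
    return ll

def wam_score(seq, site_model, decoy_model):
    """Log-likelihood ratio: positive values favor the true-site model."""
    return log_likelihood(seq, site_model) - log_likelihood(seq, decoy_model)

# Invented toy windows around a GT dinucleotide (hypothetical, not data from the paper).
true_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT"]
decoys = ["CAGGTCCTA", "TTGGTTTCA", "ACGGTCATC", "GCGGTGCCT"]

site_model = train_wam(true_sites)
decoy_model = train_wam(decoys)

print(wam_score("CAGGTAAGT", site_model, decoy_model))
print(wam_score("ACGGTCATC", site_model, decoy_model))
```

In a pipeline of the kind the abstract outlines, a score like this would be one feature among several passed to a downstream classifier rather than a final prediction on its own.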

Author information

Corresponding author

Correspondence to HuaiQiu Zhu.

Additional information

Supported by the State Basic Research Program of China (Grant No. 2003CB715905), the National Natural Science Foundation of China (Grant Nos. 30300071, 30770499 and 10721403) and the Youth Foundation of the College of Engineering of Peking University.

About this article

Cite this article

Sun, Z., Sang, L., Ju, L. et al. A new method for splice site prediction based on the sequence patterns of splicing signals and regulatory elements. Chin. Sci. Bull. 53, 3331–3340 (2008). https://doi.org/10.1007/s11434-008-0448-5

  • DOI: https://doi.org/10.1007/s11434-008-0448-5
