Determination of window size and identification of suitable method for prediction of donor splice sites in rice (Oryza sativa) genome

Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, A. R.; Wahi, S. D.

doi:10.1007/s13562-014-0286-2

Original Article
Published: 18 November 2014

Volume 24, pages 385–392, (2015)
Journal of Plant Biochemistry and Biotechnology

Prabina Kumar Meher¹, Tanmaya Kumar Sahu², A. R. Rao² & S. D. Wahi¹

124 Accesses
2 Citations

Abstract

Accurate gene structure prediction depends on accurate prediction of splice sites. Conserved features at splice junctions have been used successfully to predict eukaryotic splice sites. Although the di-nucleotide GT is conserved at 5′ splice sites in eukaryotes, the pattern surrounding it varies from species to species. Most splice site analyses have been carried out in Homo sapiens and Arabidopsis thaliana; comparable work in Oryza sativa and other species of the grass family remains limited. In this study, statistical techniques were applied to discriminate real splice sites from pseudo splice sites in the rice, maize and barley genomes, and a suitable window size for predicting donor splice sites was determined from the results. Using this window size, candidate methods for predicting donor splice sites in rice were compared in terms of prediction accuracy. The results show that a window of 9 base pairs (3 bp at the exon end and 6 bp at the intron start, including the conserved di-nucleotide GT at the beginning of the intron) is effective for predicting donor splice sites in all three grass species. Further, the Maximum Entropy Model based method was found to be the best of the short sequence based prediction methods for donor splice sites with the 9 bp window.
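The 9 bp window described in the abstract can be illustrated with a minimal Python sketch. It scans a sequence for GT di-nucleotides and cuts out 3 bases upstream plus 6 bases starting at the GT. The function name `donor_windows`, its parameters and the example sequence are hypothetical and chosen for illustration only; this is not the authors' code, and it makes no attempt to separate real from pseudo splice sites.

```python
def donor_windows(seq, exon_len=3, intron_len=6):
    """Yield (gt_index, window) for every GT with enough flanking sequence.

    Each window holds exon_len bases before the GT and intron_len bases
    starting at the GT, so the defaults give the 9 bp window from the abstract.
    """
    seq = seq.upper()
    # Start at exon_len so the upstream flank exists; stop early enough
    # that intron_len bases are available from the GT onwards.
    for i in range(exon_len, len(seq) - intron_len + 1):
        if seq[i:i + 2] == "GT":
            yield i, seq[i - exon_len:i + intron_len]


# Made-up sequence, for illustration only.
example = "ACCAGGTAAGTCTGTTTGA"
for gt_index, window in donor_windows(example):
    print(gt_index, window)
```

Every GT in the sequence is a candidate, so most windows produced this way would be pseudo splice sites; labelling them requires annotated exon-intron boundaries, which this sketch does not cover.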

Abbreviations

MLAs: Machine Learning Approaches
MEM: Maximum Entropy Modeling
MDD: Maximal Dependency Decomposition
MM1: Markov Model of 1st order
WMM: Weighted Matrix Method
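
Of the methods abbreviated above, the weighted matrix method is the simplest to sketch: it treats positions as independent and scores a window by summing per-position log-odds. The Python below is a generic illustration under stated assumptions (a pseudocount of 1, log base 2, and tiny made-up training sets); the function names are hypothetical, and the exact formulation used in the article is not given in this excerpt.

```python
import math
from collections import Counter


def position_weights(sites, alphabet="ACGT", pseudocount=1.0):
    """Estimate per-position base probabilities from aligned, equal-length sites."""
    weights = []
    for pos in range(len(sites[0])):
        counts = Counter(s[pos] for s in sites)
        total = sum(counts[b] for b in alphabet) + pseudocount * len(alphabet)
        weights.append({b: (counts[b] + pseudocount) / total for b in alphabet})
    return weights


def wmm_score(window, true_weights, false_weights):
    """Sum of per-position log-odds; positive values favour the true-site model."""
    return sum(
        math.log2(t[base] / f[base])
        for base, t, f in zip(window, true_weights, false_weights)
    )


# Made-up 9 bp windows, for illustration only.
true_sites = ["CAGGTAAGT", "AAGGTAAGT", "CAGGTGAGT"]
false_sites = ["TTTGTCCCA", "GCAGTTTTC", "ACTGTCGCA"]
tw = position_weights(true_sites)
fw = position_weights(false_sites)
print(wmm_score("CAGGTAAGT", tw, fw))
```

The conserved GT positions contribute zero to the score because both models assign them the same probability, which is why the discriminating signal has to come from the flanking positions.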

Acknowledgements

This study is part of the Ph.D. thesis of P. K. Meher, PG School, IARI, New Delhi. The authors acknowledge the INSPIRE fellowship of the Department of Science and Technology, New Delhi, and the IARI Fellowship. The authors also acknowledge the computational facilities of SCGL, developed under NAIP grant NAIP/Comp-4/C4/C-30033/2008-09.

Author information

Authors and Affiliations

Division of Statistical Genetics, Indian Agricultural Statistics Research Institute, New Delhi, 110012, India
Prabina Kumar Meher & S. D. Wahi
Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, 110012, India
Tanmaya Kumar Sahu & A. R. Rao

Corresponding author

Correspondence to A. R. Rao.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Fig. 1

Confusion matrix. TP is the number of true splice sites (TSS) predicted as TSS, TN is the number of false splice sites (FSS) predicted as FSS, FN is the number of TSS incorrectly predicted as FSS, and FP is the number of FSS incorrectly predicted as TSS. (TIFF 252 kb)
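
The four counts in this confusion matrix are the inputs to the usual summary measures. The sketch below computes sensitivity, specificity, accuracy and the Matthews correlation coefficient from them; which of these measures the article actually reports is not stated in this excerpt, so the selection is an assumption, and the function name and counts are hypothetical.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Standard summary measures from the four confusion matrix counts.

    Assumes at least one true and one false site, so the sensitivity and
    specificity denominators are non-zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),   # fraction of true sites recovered
        "specificity": tn / (tn + fp),   # fraction of false sites rejected
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # MCC is set to 0 when a marginal total is zero and it is undefined.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Made-up counts, for illustration only.
print(confusion_metrics(tp=90, tn=80, fp=20, fn=10))
```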

Supplementary Fig. 2

Bar diagram of the Pearson chi-square values calculated from the TSS and FSS sequence data for the three species. The X-axis represents motif positions, and the height of each bar corresponds to the chi-square value at that position. (TIFF 748 kb)
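
A per-position chi-square value of this kind can be sketched as a Pearson test on a 2 × 4 table (site class by nucleotide) built at each motif position. The Python below is one plausible reading of the caption, not the authors' procedure; the function name and the toy sequences are hypothetical.

```python
from collections import Counter


def chi_square_by_position(true_sites, false_sites, alphabet="ACGT"):
    """Pearson chi-square of class (true/false) versus base, one value per position.

    Assumes both lists are non-empty and all sequences share one length.
    """
    stats = []
    for pos in range(len(true_sites[0])):
        rows = [Counter(s[pos] for s in true_sites),
                Counter(s[pos] for s in false_sites)]
        row_totals = [sum(r[b] for b in alphabet) for r in rows]
        grand = sum(row_totals)
        chi2 = 0.0
        for b in alphabet:
            col_total = rows[0][b] + rows[1][b]
            for r, row_total in zip(rows, row_totals):
                expected = row_total * col_total / grand
                if expected > 0:  # skip bases never observed at this position
                    chi2 += (r[b] - expected) ** 2 / expected
        stats.append(chi2)
    return stats


# Made-up 3 bp sequences, for illustration only.
true_sites = ["AGT", "AGT", "CGT", "AGT"]
false_sites = ["CGT", "GGT", "TGT", "CGT"]
print(chi_square_by_position(true_sites, false_sites))
```

Positions where both classes show the same base composition, such as the conserved GT, give a value of zero, so large bars mark the positions that help separate the two classes.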

Supplementary Fig. 3

Graphical representation of the Kullback-Leibler divergence at different positions of the splice site motifs. The height of each bar represents the distance between the true and false splice sites at the corresponding position. (TIFF 187 kb)
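
The per-position divergence in this figure can be sketched by comparing the base frequencies of true and false sites at each position. In the Python below, the direction of the divergence (true relative to false), the base-2 logarithm and the pseudocount of 1 are all assumptions, since the excerpt does not specify them; the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter


def position_frequencies(sites, pos, alphabet="ACGT", pseudocount=1.0):
    """Smoothed base frequencies at one position; the pseudocount avoids log(0)."""
    counts = Counter(s[pos] for s in sites)
    total = sum(counts[b] for b in alphabet) + pseudocount * len(alphabet)
    return {b: (counts[b] + pseudocount) / total for b in alphabet}


def kl_by_position(true_sites, false_sites, alphabet="ACGT", pseudocount=1.0):
    """KL divergence D(true || false) in bits, one value per motif position."""
    result = []
    for pos in range(len(true_sites[0])):
        p = position_frequencies(true_sites, pos, alphabet, pseudocount)
        q = position_frequencies(false_sites, pos, alphabet, pseudocount)
        result.append(sum(p[b] * math.log2(p[b] / q[b]) for b in alphabet))
    return result


# Made-up 3 bp sequences, for illustration only.
true_sites = ["AGT", "AGT", "CGT", "AGT"]
false_sites = ["CGT", "GGT", "TGT", "CGT"]
print(kl_by_position(true_sites, false_sites))
```

Kullback-Leibler divergence is not symmetric, so swapping the two arguments generally changes the values; it is a distance only in a loose sense.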

About this article

Cite this article

Meher, P.K., Sahu, T.K., Rao, A.R. et al. Determination of window size and identification of suitable method for prediction of donor splice sites in rice (Oryza sativa) genome. J. Plant Biochem. Biotechnol. 24, 385–392 (2015). https://doi.org/10.1007/s13562-014-0286-2

Received: 24 February 2014
Accepted: 28 July 2014
Published: 18 November 2014
Issue Date: October 2015
DOI: https://doi.org/10.1007/s13562-014-0286-2
