Inferring an organism-specific optimal threshold for predicting protein coding regions in eukaryotes based on a bootstrapping algorithm

Xu, Shanglei; Rao, Nini; Chen, Xi; Zhou, Bo

doi:10.1007/s10529-011-0525-8

Original Research Paper
Published: 14 January 2011

Volume 33, pages 889–896 (2011)

Biotechnology Letters

Abstract

The accuracy of prediction methods based on power spectrum analysis depends on the threshold used to discriminate between protein coding and non-coding sequences in eukaryotic genomes. Because gene structure varies among eukaryotes, it is difficult to determine the best prediction threshold for a given eukaryote from prior biological knowledge alone. To improve the accuracy of prediction methods based on power spectrum analysis, we developed a novel method based on a bootstrap algorithm to infer organism-specific optimal thresholds for eukaryotes. As prior information, our method requires only a few annotated protein coding regions from the organism being studied. With the calculated optimal thresholds, the average prediction accuracy on our test datasets is 81%, an increase of 19% over that obtained with the single empirical threshold P = 4 for all datasets. The proposed method is simple, convenient, and easily applied to infer optimal thresholds that can be used to predict coding regions in the genomes of most organisms.
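
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that the score P is the ratio of the Fourier power at frequency N/3 to the mean power of the summed base-indicator spectra (with the zero-frequency term excluded), and that a threshold can be estimated as the bootstrap average of a low quantile of P over a few annotated coding regions. The function names `period3_snr` and `bootstrap_threshold`, the `quantile` parameter, and the choice of statistic are all hypothetical.

```python
import numpy as np


def period3_snr(seq: str) -> float:
    """Period-3 signal-to-noise ratio of a DNA sequence (assumed definition).

    Sums the power spectra of the four binary base-indicator sequences and
    divides the value at frequency N/3 by the mean non-DC power.
    """
    seq = seq.upper()
    n = len(seq) - len(seq) % 3  # trim so that N/3 is an integer bin
    if n < 3:
        raise ValueError("sequence must contain at least 3 bases")
    spectrum = np.zeros(n)
    for base in "ACGT":
        indicator = np.array([1.0 if c == base else 0.0 for c in seq[:n]])
        spectrum += np.abs(np.fft.fft(indicator)) ** 2
    mean_power = spectrum[1:].mean()  # assumption: exclude the DC term
    if np.isclose(mean_power, 0.0):
        return 0.0  # degenerate input such as a single repeated base
    return float(spectrum[n // 3] / mean_power)


def bootstrap_threshold(coding_seqs, n_boot=1000, quantile=0.1, seed=0):
    """Bootstrap estimate of a threshold from annotated coding regions.

    Assumption: the threshold is the average, over bootstrap resamples,
    of a low quantile of the period-3 scores of the known coding regions.
    """
    rng = np.random.default_rng(seed)
    scores = np.array([period3_snr(s) for s in coding_seqs])
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample the scores with replacement, same size as the original set.
        sample = rng.choice(scores, size=scores.size, replace=True)
        estimates[b] = np.quantile(sample, quantile)
    return float(estimates.mean())


def is_coding(seq: str, threshold: float = 4.0) -> bool:
    """Call a window coding when its score reaches the threshold."""
    return period3_snr(seq) >= threshold
```

In this sketch, calling `is_coding(window)` with the default applies the fixed empirical threshold P = 4, while `is_coding(window, bootstrap_threshold(known_exons))` applies a threshold estimated from the organism's own annotated regions; `known_exons` stands for whatever small set of annotated coding sequences is available.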

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant No. 60571047).

Author information

Authors and Affiliations

School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, People’s Republic of China
Shanglei Xu, Nini Rao, Xi Chen & Bo Zhou

Corresponding author

Correspondence to Nini Rao.

About this article

Cite this article

Xu, S., Rao, N., Chen, X. et al. Inferring an organism-specific optimal threshold for predicting protein coding regions in eukaryotes based on a bootstrapping algorithm. Biotechnol Lett 33, 889–896 (2011). https://doi.org/10.1007/s10529-011-0525-8

Received: 29 October 2010
Accepted: 06 January 2011
Published: 14 January 2011
Issue Date: May 2011
DOI: https://doi.org/10.1007/s10529-011-0525-8
