A hybrid strategy for comprehensive annotation of the protein coding genes in prokaryotic genome

Yu, Jia-Feng; Guo, Jing; Liu, Qing-Bin; Hou, Yue; Xiao, Ke; Chen, Qing-Li; Wang, Ji-Hua; Sun, Xiao

doi:10.1007/s13258-014-0263-0

Research Article
Published: 08 January 2015

Volume 37, pages 347–355, (2015)

Genes & Genomics

Jia-Feng Yu^1,2,3,
Jing Guo^4,
Qing-Bin Liu^1,5,
Yue Hou^2,
Ke Xiao^2,
Qing-Li Chen^1,5,
Ji-Hua Wang^1,3 &
Xiao Sun^2

Abstract

Annotation errors in the protein-coding genes of prokaryotic genomes continue to accumulate in bioinformatics databases, and in most cases the rate at which genome annotations are updated cannot keep pace with the explosive growth in genome sequences. It is therefore critical to rectify genome annotation errors manually. In this paper, a hybrid strategy that combines ab initio gene prediction programs with programs for re-annotating over-annotated genes is proposed for the re-annotation of protein-coding genes in prokaryotic genomes. Based on this strategy, the protein-coding genes of Geobacter sulfurreducens PCA are comprehensively re-annotated. As a result, 16 hypothetical genes are re-annotated as non-coding sequences and 104 missing genes are recovered as protein-coding genes. Subsequent function and sequence analyses show that the predictions are reliable and robust. Further application to other genomes shows that this work can provide alternative tools for the post-processing of prokaryotic genome annotations.
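The combination logic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the decision rules, so the rules below (matching genes by strand and 3' end, requiring agreement between at least two ab initio tools, and requiring that a flagged hypothetical gene has no ab initio support) are assumptions made for illustration. The names `Gene`, `three_prime_key`, `tool_support`, `candidate_missing_genes`, `candidate_over_annotated`, `tool_a`, and `tool_b` are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Gene(NamedTuple):
    left: int    # leftmost coordinate on the forward strand
    right: int   # rightmost coordinate on the forward strand
    strand: str  # "+" or "-"


Key = Tuple[str, int]


def three_prime_key(gene: Gene) -> Key:
    """Identify a gene by strand and 3' end (assumed matching rule),
    so that calls differing only in the start codon count as the same gene."""
    end = gene.right if gene.strand == "+" else gene.left
    return (gene.strand, end)


def tool_support(predictions_by_tool: Dict[str, Iterable[Gene]]) -> Dict[Key, int]:
    """Count how many ab initio tools predict each gene key."""
    support: Dict[Key, int] = {}
    for genes in predictions_by_tool.values():
        # A set ensures each tool votes at most once per key.
        for key in {three_prime_key(g) for g in genes}:
            support[key] = support.get(key, 0) + 1
    return support


def candidate_missing_genes(
    annotated: Iterable[Gene],
    predictions_by_tool: Dict[str, Iterable[Gene]],
    min_support: int = 2,  # assumed threshold, not taken from the paper
) -> List[Key]:
    """Keys predicted by at least `min_support` tools but absent from the annotation."""
    annotated_keys = {three_prime_key(g) for g in annotated}
    support = tool_support(predictions_by_tool)
    return sorted(
        key for key, n in support.items()
        if n >= min_support and key not in annotated_keys
    )


def candidate_over_annotated(
    hypothetical: Iterable[Gene],
    flagged_noncoding: Set[Key],
    predictions_by_tool: Dict[str, Iterable[Gene]],
) -> List[Gene]:
    """Hypothetical genes flagged as non-coding by a re-annotation program
    and predicted by none of the ab initio tools."""
    support = tool_support(predictions_by_tool)
    return [
        g for g in hypothetical
        if three_prime_key(g) in flagged_noncoding
        and support.get(three_prime_key(g), 0) == 0
    ]


# Toy data, invented purely for illustration.
annotated = [Gene(100, 400, "+"), Gene(500, 800, "-")]
predictions = {
    "tool_a": [Gene(100, 400, "+"), Gene(900, 1200, "+")],
    "tool_b": [Gene(130, 400, "+"), Gene(900, 1200, "+"), Gene(1500, 1700, "-")],
}
missing = candidate_missing_genes(annotated, predictions)
over = candidate_over_annotated([Gene(500, 800, "-")], {("-", 500)}, predictions)
```

In this toy input, the ORF ending at position 1200 on the forward strand is supported by both tools and absent from the annotation, so it is reported as a candidate missing gene; the annotated reverse-strand gene is flagged and unsupported, so it is reported as a candidate over-annotated gene. Candidates produced this way would still need the function and sequence analyses the abstract mentions before any re-annotation is accepted.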

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Projects No. 61302186 and No. 61271378), the Shandong Natural Science Foundation (Project No. ZR2010CQ041), and funding from the State Key Laboratory of Bioelectronics of Southeast University.

Conflict of interest

The authors declare that there is no conflict of interest.

Author information

Authors and Affiliations

Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou, 253023, People’s Republic of China
Jia-Feng Yu, Qing-Bin Liu, Qing-Li Chen & Ji-Hua Wang
State Key Laboratory of Bioelectronics, Southeast University, Nanjing, 210096, People’s Republic of China
Jia-Feng Yu, Yue Hou, Ke Xiao & Xiao Sun
College of Physics and Electronic Information, Dezhou University, Dezhou, 253023, People’s Republic of China
Jia-Feng Yu & Ji-Hua Wang
School of Computer Engineering, Nanyang Technological University, Singapore, 639798, Singapore
Jing Guo
College of Life Science, Shandong Normal University, Jinan, 250014, People’s Republic of China
Qing-Bin Liu & Qing-Li Chen

Corresponding authors

Correspondence to Jia-Feng Yu or Xiao Sun.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 169 kb)

Supplementary material 1 (DOCX 29 kb)

Supplementary material 1 (DOCX 15 kb)

Supplementary material 1 (FASTA 67 kb)

Supplementary material 1 (FASTA 130 kb)

Supplementary material 1 (FASTA 9 kb)

About this article

Cite this article

Yu, JF., Guo, J., Liu, QB. et al. A hybrid strategy for comprehensive annotation of the protein coding genes in prokaryotic genome. Genes Genom 37, 347–355 (2015). https://doi.org/10.1007/s13258-014-0263-0

Received: 20 July 2014
Accepted: 23 December 2014
Published: 08 January 2015
Issue Date: April 2015
DOI: https://doi.org/10.1007/s13258-014-0263-0
