Three Computational Tools for Predicting Bacterial Essential Genes

  • Feng-Biao Guo (corresponding author)
  • Yuan-Nong Ye
  • Lu-Wen Ning
  • Wen Wei
Part of the Methods in Molecular Biology book series (MIMB, volume 1279)


Essential genes are those indispensable for the survival of a living cell. Bacterial essential genes constitute the cornerstones of synthetic biology and are often attractive targets in the development of antibiotics and vaccines. Because identifying essential genes experimentally is costly and labor-intensive, researchers have turned to computational prediction as an alternative. To help address this need, our research group (CEFG: group of Computational, Comparative, Evolutionary and Functional Genomics) has constructed three online services to predict essential genes in bacterial genomes. These freely available tools are applicable to single gene sequences without annotated functions, single genes with definite names, and complete genomes of bacterial strains. To ensure reliable predictions, the investigated species should belong to the same family (for EGP) or the same phylum (for CEG_Match and Geptop) as one of the reference species. The prediction accuracy of these pilot tools has been assessed and compared with that of existing algorithms; notably, none of the other published algorithms has been made available as an online service. We hope these services at CEFG will help scientists and researchers in the field of essential genes.
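
The taxonomic requirement stated above can be illustrated with a short sketch. It is not the services' own code: the `Taxon` type, the `applicable_tools` function, and the single-entry reference list are hypothetical names and placeholder data introduced here for illustration, and the sketch assumes, for simplicity, that all three tools share one reference set. Only the rank required by each tool (family for EGP, phylum for CEG_Match and Geptop) is taken from the text.

```python
from dataclasses import dataclass

# Taxonomic rank the query species must share with a reference species,
# per tool, as stated in the abstract.
REQUIRED_RANK = {"EGP": "family", "CEG_Match": "phylum", "Geptop": "phylum"}


@dataclass(frozen=True)
class Taxon:
    """Hypothetical minimal taxonomy record for one species."""
    name: str
    family: str
    phylum: str


def applicable_tools(query, references):
    """Return the tools whose rank requirement the query meets
    for at least one reference species."""
    tools = []
    for tool, rank in REQUIRED_RANK.items():
        if any(getattr(query, rank) == getattr(ref, rank) for ref in references):
            tools.append(tool)
    return tools


# Placeholder reference list: NOT the actual reference set of any tool.
references = [Taxon("Escherichia coli", "Enterobacteriaceae", "Proteobacteria")]

query = Taxon("Salmonella enterica", "Enterobacteriaceae", "Proteobacteria")
print(applicable_tools(query, references))
```

Under these assumptions, a query that shares only the phylum with a reference species would qualify for CEG_Match and Geptop but not for EGP, and a query from a different phylum would qualify for none of the three.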

Key words

Essential genes; Predicting bacterial essential genes; EGP; CEG_Match; Geptop



We thank the book editor for his encouragement and advice. This work was supported by the National Natural Science Foundation of China (grant number 31470068), Sichuan Youth Science and Technology Foundation of China (grant number 2014JQ0051) and the Fundamental Research Funds for the Central Universities of China (grant number ZYGX2013J101).



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Feng-Biao Guo (1), corresponding author
  • Yuan-Nong Ye (1)
  • Lu-Wen Ning (1)
  • Wen Wei (1)

  1. Computational, Comparative, Evolutionary and Functional Genomics Group (CEFG), School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
