Abstract
Background
Rapid identification of new essential genes is necessary to understand biological mechanisms and identify potential targets for antimicrobial drugs. Many computational methods have been proposed.
Objectives
To construct an essential-gene classifier that applies to a wider range of organisms, and to study the redundancy of the features used to predict essential genes.
Methods
We designed a 57-12-1 artificial neural network model to predict the essential genes of 31 prokaryotic genomes. Predictive performance was assessed in four ways: self-prediction within each organism, the leave-one-genome-out method, prediction of all organisms from a model trained on a single organism, and self-prediction of all organisms. Additionally, the 57 features used in the artificial neural network model were analyzed by weighted principal component analysis to screen for the key features most strongly related to gene essentiality.
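As a rough illustration of the setup described above, the following sketch builds a network with 57 inputs, 12 hidden units and 1 output, and generates leave-one-genome-out splits. It is not the authors' implementation: the sigmoid activations, the weight initialization, the function names and the toy data are all assumptions made for the example, and training is omitted.

```python
import numpy as np

def init_network(rng, n_in=57, n_hidden=12, n_out=1):
    """Small random weights for a 57-12-1 feedforward network (assumed init)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    """Return one essentiality score in (0, 1) per gene (row of X)."""
    hidden = sigmoid(X @ params["W1"] + params["b1"])  # assumed activation
    return sigmoid(hidden @ params["W2"] + params["b2"]).ravel()

def leave_one_genome_out(genome_ids):
    """Yield (held_out_genome, train_idx, test_idx) for each distinct genome."""
    genome_ids = np.asarray(genome_ids)
    for g in np.unique(genome_ids):
        test_idx = np.flatnonzero(genome_ids == g)
        train_idx = np.flatnonzero(genome_ids != g)
        yield g, train_idx, test_idx

# Toy data: 10 genes with 57 features each, drawn from 3 hypothetical genomes.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 57))
genomes = np.array(["A"] * 4 + ["B"] * 3 + ["C"] * 3)

params = init_network(rng)
scores = forward(params, X)
splits = list(leave_one_genome_out(genomes))
n_params = sum(p.size for p in params.values())
```

In the leave-one-genome-out scheme each genome is held out once as the test set while the model is fitted on the genes of all remaining genomes, so the number of splits equals the number of genomes (31 in the study, 3 in this toy example).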
Results
Compared with previous studies, our results indicate that our models generalize better. Furthermore, weighted principal component analysis reduced the feature set from 57 to 29 while prediction performance remained stable overall, suggesting that some features are redundant for predicting gene essentiality and that the screened features carry the more important biological information.
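The feature screening summarized above can be sketched as follows. The abstract does not specify how the principal components are weighted, so this example assumes one common formulation: each feature is scored by the sum of its absolute loadings, weighted by the proportion of variance each component explains. The function names, the random toy matrix and the choice to use all components are assumptions for illustration only.

```python
import numpy as np

def weighted_pc_scores(X, n_components=None):
    """Score each feature by variance-weighted absolute PCA loadings (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant features
    Z = (X - X.mean(axis=0)) / std           # standardize each feature
    cov = np.cov(Z, rowvar=False)            # correlation matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    k = n_components or len(eigvals)
    weights = eigvals[:k] / eigvals.sum()    # proportion of variance explained
    return np.abs(eigvecs[:, :k]) @ weights  # one score per feature

def top_features(scores, n_keep):
    """Return the sorted indices of the n_keep highest-scoring features."""
    return np.sort(np.argsort(scores)[::-1][:n_keep])

# Toy data: 200 genes with 57 features; keep 29 as in the reported result.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 57))
feature_scores = weighted_pc_scores(X)
kept = top_features(feature_scores, 29)
```

The retained feature indices would then be used to refit the classifier, so that performance with the reduced set can be compared against the full 57-feature model.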
Conclusion
This study showed the effectiveness and generalizability of our artificial neural network model. In addition, the screened features could be used as key features in computational analysis and biological experiments.
Acknowledgements
This work was supported by the China Postdoctoral Science Foundation funded project (Grant No. 2012M521673), and the Fundamental Research Funds for the Central Universities (Project No. 106112014CDJZR165503, CDJRC10160011, CDJZR12160007).
Ethics declarations
Conflict of interest
Luo Xu, Zhirui Guo and Xiao Liu declare that they have no conflict of interest.
About this article
Cite this article
Xu, L., Guo, Z. & Liu, X. Prediction of essential genes in prokaryote based on artificial neural network. Genes Genom 42, 97–106 (2020). https://doi.org/10.1007/s13258-019-00884-w
DOI: https://doi.org/10.1007/s13258-019-00884-w