Abstract
Essential genes are indispensable for the survival of an organism, so identifying and studying them is of great significance. We use a machine learning method, K-Nearest Neighbor, to predict essential bacterial genes. Homology features of the bacterial genomes, covering both sequence homology and functional homology, are extracted for determining essential genes. Based on these features, we apply the K-Nearest Neighbor algorithm to determine gene function. We tune the minimum matching parameter (K) of the essential gene prediction model to build an optimal Escherichia coli-specific model, and then apply the resulting optimal K to the essential gene prediction models of other bacteria. Under cross-validation, the highest accuracy is 0.89, obtained when K is between 5 and 7. The extracted features can therefore increase the accuracy of bacterial essential gene prediction. On this basis, we found that the prediction accuracy of the K-Nearest Neighbor model did not differ significantly across evolutionary distances between the organisms in the database and the investigated species. This means the machine learning model can be extended to more distant species, where it should predict essential genes better than the usual sequence-based methods.
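The tuning procedure described in the abstract, choosing K by cross-validation, can be illustrated with a minimal sketch. It is not the authors' pipeline: the two-dimensional feature vectors stand in for hypothetical sequence-homology and functional-homology scores, the data are synthetic, and the helper names (`knn_predict`, `cross_val_accuracy`, `make_toy_genes`) are invented for illustration.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, folds=5):
    """Mean accuracy over contiguous folds; the data are assumed to be in random order."""
    n = len(X)
    scores = []
    for f in range(folds):
        test_idx = set(range(f * n // folds, (f + 1) * n // folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / folds


def make_toy_genes(n=200, seed=0):
    """Synthetic stand-in data: two hypothetical homology scores per gene, clipped to [0, 1]."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        essential = rng.random() < 0.5
        center = 0.7 if essential else 0.3  # assumed: essential genes score higher
        X.append([min(1.0, max(0.0, rng.gauss(center, 0.15))) for _ in range(2)])
        y.append(1 if essential else 0)
    return X, y


X, y = make_toy_genes()
# Odd values of K avoid tied votes in a two-class problem.
results = {k: cross_val_accuracy(X, y, k) for k in (1, 3, 5, 7, 9)}
best_k = max(results, key=results.get)
for k, acc in results.items():
    print(f"K={k}: mean CV accuracy = {acc:.3f}")
print("best K on toy data:", best_k)
```

The accuracies printed here describe only the toy data and say nothing about the 0.89 reported above. With real features, the same loop would run over a labelled gene set, for example one drawn from a database of essential genes, in place of `make_toy_genes`.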
Acknowledgement
This study was jointly funded by the National Natural Science Foundation of China (61803112), the Science and Technology Foundation of Guizhou Province (2018–1133, 2019–2811), the Science and Technology Foundation of Guiyang (2017–30-15), the Science and Technology Fund project of Guizhou Health Commission (gzwjkj2019–1-40), and the Cell and Gene Engineering Innovative Research Groups of Guizhou Province (KY-2016–031).
Ethics declarations
Conflicts of Interest
The authors declare that they have no conflicts of interest to report regarding the present study.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ye, Y., Liang, D., Zeng, Z. (2022). The Algorithms of Predicting Bacterial Essential Genes and NcRNAs by Machine Learning. In: Liu, Q., Liu, X., Chen, B., Zhang, Y., Peng, J. (eds) Proceedings of the 11th International Conference on Computer Engineering and Networks. Lecture Notes in Electrical Engineering, vol 808. Springer, Singapore. https://doi.org/10.1007/978-981-16-6554-7_54
DOI: https://doi.org/10.1007/978-981-16-6554-7_54
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6553-0
Online ISBN: 978-981-16-6554-7
eBook Packages: Intelligent Technologies and Robotics (R0)