Abstract
Essential genes are commonly, if reductively, defined as those indispensable for an organism’s growth and reproductive success. Essentiality, however, is a dynamic, context-dependent attribute that can vary across cells, tissues, and pathological conditions. Identifying essential genes genome-wide is a challenging problem in basic and applied biomedical research, notably in synthetic biology, drug targeting, and disease gene identification. Wet-lab procedures that test whether a gene is essential are costly and time-consuming, especially for complex organisms such as humans. Computational approaches therefore offer a valuable alternative, although the complexity of the underlying biology makes the task demanding. The most commonly explored methods classify nodes in protein-protein interaction networks, that is, they predict an unknown node property from defined node attributes, but they have had limited success, especially for human genes. Here, we provide an overview of the biological background, methodologies, and applications related to identifying essential genes, with the aim of offering a short guide to their potential and open issues. We further present an experimental approach that examines the entire workflow, from node labeling to attribute selection to model learning. To this end, we exploit a tissue-specific integrated network enriched with pre-computed biological features and embedding-derived topological features to develop a deep learning model.
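As a rough illustration of the node classification idea mentioned in the abstract, the following minimal sketch derives two topological attributes for each node of a toy interaction network and predicts the label of unlabeled nodes from the labeled ones. It is not the chapter's model: a nearest-centroid rule stands in for the deep learning approach, and the gene names (`G1` to `G8`), the edges, and the essentiality labels are all hypothetical values chosen for the example.

```python
from statistics import mean

# Toy undirected interaction network (hypothetical gene names and edges).
edges = [
    ("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G1", "G5"),
    ("G2", "G3"), ("G2", "G4"), ("G5", "G6"), ("G6", "G7"),
    ("G7", "G8"),
]

# Known labels: 1 = essential, 0 = non-essential (made up for illustration).
labels = {"G1": 1, "G3": 1, "G6": 0, "G8": 0}


def build_adjacency(edge_list):
    """Map each node to the set of its neighbours."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def node_features(adjacency):
    """Two simple node attributes: degree and mean neighbour degree."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    return {
        node: (float(degree[node]), float(mean(degree[m] for m in nbrs)))
        for node, nbrs in adjacency.items()
    }


def centroid(vectors):
    """Component-wise mean of a list of equally sized feature vectors."""
    return tuple(mean(column) for column in zip(*vectors))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_unlabeled(features, known):
    """Assign each unlabeled node to the class with the nearest centroid."""
    centroids = {
        cls: centroid([features[n] for n, y in known.items() if y == cls])
        for cls in set(known.values())
    }
    return {
        node: min(centroids, key=lambda c: squared_distance(vec, centroids[c]))
        for node, vec in features.items()
        if node not in known
    }


adjacency = build_adjacency(edges)
features = node_features(adjacency)
predictions = classify_unlabeled(features, labels)
print(predictions)
```

In a realistic setting the feature vectors would also include biological attributes and learned embeddings, and the classifier would be trained and evaluated on held-out labeled genes rather than on a handful of nodes.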
Notes
1. The UCSC TFBS Conserved Track Settings identifies motifs that are conserved across humans, mice, and rats and scores these sites based on the motif match.
Acknowledgements
This work has been partially funded by the BiBiNet project (H35F21000430002) within POR-Lazio FESR 2014–2020. It was also carried out within the activities of the authors as members of the INdAM Research group GNCS and the ICAR-CNR INdAM Research Unit, and it was partially supported by the INdAM research project “Computational Intelligence methods for Digital Health.” The work of Mario R. Guarracino was conducted within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE). Mario Manzo thanks Prof. Alfredo Petrosino for his guidance and supervision during their years of working together.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Granata, I., Giordano, M., Maddalena, L., Manzo, M., Guarracino, M.R. (2023). Network-Based Computational Modeling to Unravel Gene Essentiality. In: Mondaini, R.P. (eds) Trends in Biomathematics: Modeling Epidemiological, Neuronal, and Social Dynamics. BIOMAT 2022. Springer, Cham. https://doi.org/10.1007/978-3-031-33050-6_3
DOI: https://doi.org/10.1007/978-3-031-33050-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33049-0
Online ISBN: 978-3-031-33050-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)