Improving Identification of Essential Proteins by a Novel Ensemble Method

Dai, Wei; Li, Xia; Peng, Wei; Song, Jurong; Zhong, Jiancheng; Wang, Jianxin

doi:10.1007/978-3-030-20242-2_13

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11490)

Included in the following conference series:

International Symposium on Bioinformatics Research and Applications

Abstract

Essential proteins are indispensable for cell survival, and their identification plays a critical role in biological and pharmaceutical design research. Recently, several machine learning methods have been proposed that introduce effective protein features or employ powerful classifiers. Few of them have focused on improving prediction accuracy by designing efficient strategies to ensemble different classifiers. In this work, a novel ensemble learning framework called Tri-ensemble is proposed to integrate different classifiers. It selects three weak classifiers and trains each of them by continually adding the samples that the other two classifiers predict to have abnormally high or abnormally low properties. We applied Tri-ensemble to predicting the essential proteins of yeast and E. coli. The results show that our approach achieves better performance than both the individual classifiers and the other ensemble learning methods.
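
The training loop described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the toy weak learner (`CentroidScorer`), the thresholds `high` and `low`, the unlabeled candidate `pool`, and the score-averaging vote in `predict` are all assumptions made for illustration, and "abnormally high or low" is read here as a score above or below those thresholds.

```python
class CentroidScorer:
    """Hypothetical weak classifier: scores one feature by distance to class means."""

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [x[self.feature] for x, label in zip(X, y) if label == 1]
        neg = [x[self.feature] for x, label in zip(X, y) if label == 0]
        self.mu_pos = sum(pos) / len(pos)
        self.mu_neg = sum(neg) / len(neg)
        return self

    def score(self, x):
        """Return a value in [0, 1]; higher means closer to the essential class."""
        d_pos = abs(x[self.feature] - self.mu_pos)
        d_neg = abs(x[self.feature] - self.mu_neg)
        if d_pos + d_neg == 0:
            return 0.5
        return d_neg / (d_pos + d_neg)


def tri_ensemble(X, y, pool, features=(0, 1, 2), rounds=5, high=0.9, low=0.1):
    """Train three weak classifiers; each one gains the samples that the
    other two both score very high or very low (assumed thresholds)."""
    models = [CentroidScorer(f).fit(X, y) for f in features]
    train_sets = [(list(X), list(y)) for _ in features]
    remaining = list(pool)
    for _ in range(rounds):
        still_unused, added = [], False
        for x in remaining:
            used = False
            for i in range(3):
                a, b = [models[j].score(x) for j in range(3) if j != i]
                if a >= high and b >= high:
                    label = 1
                elif a <= low and b <= low:
                    label = 0
                else:
                    continue
                train_sets[i][0].append(x)
                train_sets[i][1].append(label)
                used = True
            if used:
                added = True
            else:
                still_unused.append(x)
        if not added:
            break  # no classifier pair was confident about any sample
        remaining = still_unused
        # Refit each classifier on its enlarged training set.
        models = [CentroidScorer(f).fit(*train_sets[i])
                  for i, f in enumerate(features)]
    return models


def predict(models, x):
    """Combine the three classifiers by averaging their scores (an assumption)."""
    return int(sum(m.score(x) for m in models) / len(models) >= 0.5)
```

In practice the three weak classifiers would be real learners trained on protein features, and the thresholds would need tuning; the sketch only shows how each classifier's training set grows from the agreement of the other two.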

Acknowledgment

This work is supported in part by the National Natural Science Foundation of China under grants No. 31560317, No. 61502214, No. 61502166, No. 61702122 and No. 81560221, and by the Natural Science Foundation of Yunnan Province of China (No. 2016FB107).

Author information

Authors and Affiliations

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China
Wei Dai, Xia Li & Wei Peng
Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050, China
Wei Peng & Jurong Song
College of Engineering and Design, Hunan Normal University, Changsha, 410081, China
Jiancheng Zhong
Computer Science, Central South University, Changsha, 410081, China
Jianxin Wang

Corresponding author

Correspondence to Wei Peng.

Editor information

Editors and Affiliations

Georgia State University, Atlanta, GA, USA
Zhipeng Cai
Georgia State University, Atlanta, GA, USA
Pavel Skums
Central South University, Changsha, China
Min Li

Cite this paper

Dai, W., Li, X., Peng, W., Song, J., Zhong, J., Wang, J. (2019). Improving Identification of Essential Proteins by a Novel Ensemble Method. In: Cai, Z., Skums, P., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2019. Lecture Notes in Computer Science, vol 11490. Springer, Cham. https://doi.org/10.1007/978-3-030-20242-2_13

DOI: https://doi.org/10.1007/978-3-030-20242-2_13
Published: 09 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20241-5
Online ISBN: 978-3-030-20242-2
eBook Packages: Computer Science, Computer Science (R0)