Improving Identification of Essential Proteins by a Novel Ensemble Method

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2019)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11490)

Abstract

Essential proteins are indispensable for cell survival, and identifying them plays a critical role in biological and pharmaceutical design research. Recently, several machine learning methods have been proposed that introduce effective protein features or employ powerful classifiers. Few of them have focused on improving prediction accuracy by designing efficient strategies to ensemble different classifiers. In this work, a novel ensemble learning framework called Tri-ensemble is proposed to integrate different classifiers. It selects three weak classifiers and trains each of them by continually adding the samples that the other two classifiers predict to have abnormally high or abnormally low properties. We applied Tri-ensemble to predicting the essential proteins of yeast and E. coli. The results show that our approach achieves better performance than both the individual classifiers and the other ensemble learning methods.
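
The training loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the full text is not reproduced here, so the weak classifier (`CentroidScorer`), the thresholds `hi` and `lo` that stand in for "abnormally high" and "abnormally low", the stopping rule, and the averaging used for the final prediction are all assumptions made for illustration. The overall shape resembles tri-training, where each of three classifiers is retrained on samples that the other two agree on.

```python
from statistics import mean


class CentroidScorer:
    """Hypothetical weak classifier: looks at a single feature only."""

    def __init__(self, feature):
        self.feature = feature
        self.pos = 0.0  # mean feature value of positive (essential) samples
        self.neg = 0.0  # mean feature value of negative samples

    def fit(self, X, y):
        f = self.feature
        self.pos = mean(x[f] for x, label in zip(X, y) if label == 1)
        self.neg = mean(x[f] for x, label in zip(X, y) if label == 0)
        return self

    def score(self, x):
        """Return a value in [0, 1]; higher means closer to the positive mean."""
        d_pos = abs(x[self.feature] - self.pos)
        d_neg = abs(x[self.feature] - self.neg)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total


def tri_ensemble(X, y, pool, rounds=3, hi=0.8, lo=0.2):
    """Train three weak classifiers; each one receives the pool samples
    that the OTHER two both score very high (>= hi) or very low (<= lo).
    The thresholds and the round limit are illustrative assumptions."""
    models = [CentroidScorer(f) for f in range(3)]
    # One private training set (features, labels) per classifier.
    train = [(list(X), list(y)) for _ in models]
    remaining = list(pool)
    for _ in range(rounds):
        for model, (Xi, yi) in zip(models, train):
            model.fit(Xi, yi)
        still_unlabeled = []
        for x in remaining:
            used = False
            for i in range(3):
                a, b = (models[j].score(x) for j in range(3) if j != i)
                if a >= hi and b >= hi:
                    train[i][0].append(x)
                    train[i][1].append(1)
                    used = True
                elif a <= lo and b <= lo:
                    train[i][0].append(x)
                    train[i][1].append(0)
                    used = True
            if not used:
                still_unlabeled.append(x)
        if len(still_unlabeled) == len(remaining):
            break  # nothing new was added in this round
        remaining = still_unlabeled
    # Refit so every model reflects its final training set.
    for model, (Xi, yi) in zip(models, train):
        model.fit(Xi, yi)
    return models


def predict(models, x):
    """Combine the three classifiers by averaging their scores."""
    return int(mean(m.score(x) for m in models) >= 0.5)


# Toy data: three made-up features per protein, label 1 = essential.
X = [[0.9, 0.8, 0.9], [0.8, 0.9, 0.7], [0.1, 0.2, 0.1], [0.2, 0.1, 0.3]]
y = [1, 1, 0, 0]
pool = [[0.85, 0.9, 0.8], [0.15, 0.1, 0.2], [0.5, 0.5, 0.5]]
models = tri_ensemble(X, y, pool)
```

In this sketch an ambiguous pool sample (one the other two classifiers do not both score as extreme) is never given a label, which keeps low-confidence predictions out of the training sets. In practice the three weak learners would be real classifiers trained on actual protein features rather than single-feature scorers.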

Acknowledgment

This work is supported in part by the National Natural Science Foundation of China under grants No. 31560317, No. 61502214, No. 61502166, No. 61702122 and No. 81560221, and by the Natural Science Foundation of Yunnan Province of China (No. 2016FB107).

Author information

Corresponding author

Correspondence to Wei Peng.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Dai, W., Li, X., Peng, W., Song, J., Zhong, J., Wang, J. (2019). Improving Identification of Essential Proteins by a Novel Ensemble Method. In: Cai, Z., Skums, P., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2019. Lecture Notes in Computer Science, vol 11490. Springer, Cham. https://doi.org/10.1007/978-3-030-20242-2_13

  • DOI: https://doi.org/10.1007/978-3-030-20242-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20241-5

  • Online ISBN: 978-3-030-20242-2

  • eBook Packages: Computer Science, Computer Science (R0)
