Identifying essential proteins based on dynamic protein-protein interaction networks and RNA-Seq datasets

Abstract

The identification of essential proteins is important not only for understanding organism structure at the molecular level, but also for drug-target detection and genetic disease prevention. Traditional methods typically apply various centrality indices to static protein-protein interaction (PPI) networks, to gene expression profiles, or to both in order to predict essential proteins. However, the prediction accuracy of most of these methods leaves room for improvement. In this study, we propose a strategy that increases the prediction accuracy of essential protein identification in three ways. First, RNA-Seq datasets are used to construct integrated dynamic PPI networks; an RNA-Seq dataset is expected to give more accurate predictions than microarray gene expression profiles. Second, a novel integrated dynamic PPI network is constructed by considering both the co-expression pattern and the co-expression level of the RNA-Seq data. Third, a novel two-step strategy is proposed to identify essential proteins from two known centrality indices. Numerical experiments show that the proposed strategy increases the prediction accuracy dramatically and that it can be generalized to many existing methods and centrality indices.
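The abstract does not specify how the dynamic networks are built or which two centrality indices are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that the "co-expression pattern" can be approximated by the Pearson correlation of two expression profiles, that the "co-expression level" can be approximated by requiring both proteins to exceed an expression cutoff at a given time point, and that the two-step strategy can be read as shortlisting by one score and re-ranking by a second. All function names, parameters, and cutoffs are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def dynamic_networks(static_edges, expr, active_cutoff, corr_cutoff):
    """Return one edge set per time point (assumed construction, see lead-in).

    An edge of the static PPI network is kept at time t only if
    (1) the two expression profiles are correlated (assumed "pattern" test) and
    (2) both proteins are expressed above active_cutoff at t (assumed "level" test).
    """
    n_points = len(next(iter(expr.values())))
    snapshots = [set() for _ in range(n_points)]
    for u, v in static_edges:
        if u not in expr or v not in expr:
            continue  # no expression data for this interaction
        if pearson(expr[u], expr[v]) < corr_cutoff:
            continue
        for t in range(n_points):
            if expr[u][t] >= active_cutoff and expr[v][t] >= active_cutoff:
                snapshots[t].add((u, v))
    return snapshots


def degree_scores(snapshots):
    """Sum each protein's degree over all time-point networks."""
    scores = {}
    for edges in snapshots:
        for u, v in edges:
            scores[u] = scores.get(u, 0) + 1
            scores[v] = scores.get(v, 0) + 1
    return scores


def two_step_rank(primary, secondary, shortlist_size, top_k):
    """Hypothetical two-step ranking: shortlist by primary, re-rank by secondary."""
    shortlist = sorted(primary, key=lambda p: (-primary[p], p))[:shortlist_size]
    reranked = sorted(shortlist, key=lambda p: (-secondary.get(p, 0.0), p))
    return reranked[:top_k]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    expr = {"A": [1.0, 5.0, 9.0], "B": [2.0, 6.0, 10.0],
            "C": [9.0, 5.0, 1.0], "D": [1.0, 6.0, 8.0]}
    static_edges = [("A", "B"), ("A", "C"), ("B", "D")]
    snaps = dynamic_networks(static_edges, expr, active_cutoff=4.0, corr_cutoff=0.8)
    primary = degree_scores(snaps)
    secondary = {"A": 0.1, "B": 0.5, "D": 0.9}  # stand-in for a second centrality index
    print(two_step_rank(primary, secondary, shortlist_size=3, top_k=2))
```

In this toy example the anti-correlated pair A and C never forms an edge, and no edge is active at the first time point because neither endpoint reaches the cutoff there. In practice the second score would come from another centrality index computed on the same networks rather than from a hand-written dictionary.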

Author information

Correspondence to Bolin Chen.

About this article

Cite this article

Shang, X., Wang, Y. & Chen, B. Identifying essential proteins based on dynamic protein-protein interaction networks and RNA-Seq datasets. Sci. China Inf. Sci. 59, 070106 (2016). https://doi.org/10.1007/s11432-016-5583-z

Keywords

  • essential protein
  • dynamic protein network
  • RNA-Seq data
  • gene co-expression pattern
  • M2 measure