Quantitative Biology, Volume 6, Issue 4, pp 313–320

On the statistical significance of protein complex

  • Youfu Su
  • Can Zhao
  • Zheng Chen
  • Bo Tian
  • Zengyou He
Research Article



Statistical validation of predicted complexes is a fundamental issue in proteomics and bioinformatics. The goal is to measure the statistical significance of each predicted complex in terms of a p-value. Surprisingly, this issue has received little attention in the literature; to our knowledge, only a few research efforts have addressed it.


In this article, we propose a novel method for calculating the p-value of a predicted complex. The null hypothesis is that the number of edges in the target protein complex does not differ from the number expected under the random null model. In addition, we assume that a true protein complex must be a connected subgraph. Based on this null hypothesis, we present an algorithm to compute the p-value of a given predicted complex.
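To make the hypothesis test concrete, the following is a minimal Monte Carlo sketch and not the algorithm proposed in this article. It assumes an Erdős–Rényi-style null model in which each pair of proteins in the complex is linked independently with probability `edge_prob`, keeps only connected random subgraphs (mirroring the connectivity assumption), and reports the fraction of kept samples with at least as many edges as observed. The function names, the parameter `edge_prob`, and the choice of null model are illustrative assumptions.

```python
import random
from itertools import combinations


def is_connected(n_nodes, edges):
    """Return True if the undirected graph on nodes 0..n_nodes-1 is connected."""
    adjacency = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, stack = {0}, [0]
    while stack:  # depth-first search from node 0
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n_nodes


def empirical_p_value(n_nodes, n_edges_observed, edge_prob,
                      n_samples=2000, seed=0):
    """Estimate P(edges >= observed | connected) under the assumed null model.

    Hypothetical helper: samples random graphs on n_nodes nodes with
    independent edge probability edge_prob, discards disconnected samples,
    and counts how often the edge count reaches n_edges_observed.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(n_nodes), 2))
    kept = hits = attempts = 0
    max_attempts = 200 * n_samples  # guard against very sparse null models
    while kept < n_samples and attempts < max_attempts:
        attempts += 1
        edges = [pair for pair in pairs if rng.random() < edge_prob]
        if not is_connected(n_nodes, edges):
            continue  # the null model is restricted to connected subgraphs
        kept += 1
        if len(edges) >= n_edges_observed:
            hits += 1
    if kept == 0:
        raise ValueError("no connected sample was drawn; edge_prob is too low")
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (kept + 1)


if __name__ == "__main__":
    # A 5-protein candidate complex with all 10 possible edges,
    # tested against a sparse background (edge_prob = 0.2).
    print(empirical_p_value(5, 10, 0.2, n_samples=200))
```

A small p-value from this sketch indicates that the candidate is denser than connected random subgraphs of the same size under the assumed background density. An exact or analytical computation would replace the sampling loop.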


We test our method on five benchmark data sets to evaluate its effectiveness.


The experimental results show that our method is superior to state-of-the-art algorithms in assessing the statistical significance of candidate protein complexes.


predicted complex; statistical significance testing; subgraph mining; community detection



This work was partially supported by the National Natural Science Foundation of China (No. 61572094) and the Fundamental Research Funds for the Central Universities of China (Nos. DUT2017TB02 and DUT14QY07). We also thank Mr. Ben Teng and Dr. Xiuli Ma for their academic support.



Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Youfu Su (1)
  • Can Zhao (1)
  • Zheng Chen (1)
  • Bo Tian (1)
  • Zengyou He (1, 2)
  1. School of Software, Dalian University of Technology, Dalian, China
  2. Key Laboratory for Ubiquitous Network and Service Software of Liaoning, Dalian, China
