Detecting protein complexes from DPINs by density based clustering with Pigeon-Inspired Optimization Algorithm

Abstract

Detecting protein complexes is crucial to understanding the principles of cellular organization. Ample evidence indicates that dense subgraphs in protein-protein interaction (PPI) networks, especially dynamic PPI networks (DPINs), usually correspond to protein complexes. Density-Based Spatial Clustering of Applications with Noise (DBSCAN), a well-known density-based clustering algorithm, has been used in many areas because of its simplicity and its ability to detect clusters of different sizes and shapes. However, its performance depends on two user-specified parameters, ε and MinPts: ε is the maximum radius of the neighborhood around a given point, and MinPts is the minimum number of data points that such a neighborhood must contain to be considered dense. In this article, we develop a new method, named P-DBSCAN, that detects protein complexes in DPINs by using the Pigeon-Inspired Optimization (PIO) algorithm to optimize the DBSCAN parameters ε and MinPts. Experiments on the DIP and MIPS datasets show that P-DBSCAN outperforms state-of-the-art methods for protein complex detection in terms of several criteria, such as precision, recall, and f-measure.
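To make the roles of ε and MinPts concrete, the following is a minimal sketch of plain DBSCAN, not the authors' P-DBSCAN. It assumes Euclidean points in the plane rather than a PPI network, because the distance measure used on DPINs is not given here, and it omits the PIO parameter search entirely. The names `dbscan`, `region_query`, `eps`, `min_pts`, and the toy `points` list are illustrative assumptions.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks unclustered points."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not dense enough to be a core point; may still be claimed
            # later as a border point of some cluster.
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is itself a core point, so the cluster grows through it.
                queue.extend(j_neighbors)
    return labels

# Toy data (hypothetical): two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (10, 0)]

well_chosen = dbscan(points, eps=1.5, min_pts=3)
too_small = dbscan(points, eps=0.5, min_pts=3)
too_large = dbscan(points, eps=8.0, min_pts=3)
```

On this toy data, a well-chosen ε separates the two groups and marks the isolated point as noise, a too-small ε labels every point as noise, and a too-large ε merges everything into a single cluster. This sensitivity is the limitation that motivates tuning ε and MinPts automatically.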

Author information

Corresponding author

Correspondence to Fang-Xiang Wu.

About this article

Cite this article

Lei, X., Ding, Y. & Wu, FX. Detecting protein complexes from DPINs by density based clustering with Pigeon-Inspired Optimization Algorithm. Sci. China Inf. Sci. 59, 070103 (2016). https://doi.org/10.1007/s11432-016-5578-9

Keywords

  • dynamic protein-protein interaction network (DPIN)
  • pigeon-inspired optimization (PIO)
  • protein complex
  • density based clustering
  • gene expression