A MapReduce-Based Parallel Clustering Algorithm for Large Protein-Protein Interaction Networks

  • Li Liu
  • Dangping Fan
  • Ming Liu
  • Guandong Xu
  • Shiping Chen
  • Yuan Zhou
  • Xiwei Chen
  • Qianru Wang
  • Yufeng Wei
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7713)

Abstract

Clustering proteins, or identifying functionally related proteins, in Protein-Protein Interaction (PPI) networks is one of the most computation-intensive problems in the proteomics community. Most research has focused on improving the accuracy of clustering algorithms. However, the high computational cost of these algorithms, such as Girvan and Newman's clustering algorithm, has been an obstacle to their use on large-scale PPI networks. In this paper, we propose an algorithm, called Clustering-MR, to address this problem. Our solution uses MapReduce to parallelize Girvan and Newman's clustering algorithm, which is based on edge betweenness. We evaluated the performance of Clustering-MR in a cloud environment with test datasets of different sizes and different numbers of worker nodes. The experimental results show that Clustering-MR can achieve high performance for large-scale PPI networks with more than 1000 proteins or 5000 interactions.
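
To make the idea in the abstract concrete, the following is a minimal single-process sketch of how edge betweenness can be split into a map step (one breadth-first search per source protein) and a reduce step (summing the partial scores per edge). It is not the authors' Clustering-MR implementation, which is not shown on this page. The function names `map_source`, `reduce_edges`, `edge_betweenness`, and `remove_top_edge`, the adjacency-set graph representation, and the toy network are all assumptions made for illustration. The per-source accumulation follows Brandes-style dependency accumulation, which is one common way to compute the scores that Girvan and Newman's algorithm needs.

```python
from collections import defaultdict, deque


def map_source(graph, source):
    """Map step (hypothetical): emit (edge, partial score) pairs for one source."""
    dist = {source: 0}
    sigma = defaultdict(float)  # number of shortest paths from source to each node
    sigma[source] = 1.0
    preds = defaultdict(list)   # predecessors on shortest paths
    order = []                  # nodes in non-decreasing distance from source
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = defaultdict(float)  # dependency of the source on each node
    for w in reversed(order):
        for v in preds[w]:
            credit = sigma[v] / sigma[w] * (1.0 + delta[w])
            delta[v] += credit
            yield tuple(sorted((v, w))), credit


def reduce_edges(pairs):
    """Reduce step (hypothetical): sum the partial scores for each edge."""
    totals = defaultdict(float)
    for edge, credit in pairs:
        totals[edge] += credit
    # In an undirected graph every shortest path is seen from both endpoints.
    return {edge: total / 2.0 for edge, total in totals.items()}


def edge_betweenness(graph):
    """Run the map step for every source, then reduce. Sequential stand-in
    for what a MapReduce framework would distribute across worker nodes."""
    pairs = (pair for source in graph for pair in map_source(graph, source))
    return reduce_edges(pairs)


def remove_top_edge(graph):
    """One Girvan-Newman iteration: delete the edge with the highest score."""
    scores = edge_betweenness(graph)
    u, v = max(scores, key=scores.get)
    graph[u].discard(v)
    graph[v].discard(u)
    return (u, v)


# Toy network (assumed data): two triangles joined by the bridge C-D.
graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
```

On this toy network the bridge C-D carries all nine shortest paths between the two triangles, so a call to `remove_top_edge(graph)` removes it and leaves two clusters. Because each `map_source` call depends only on the graph and one source node, the calls are independent of one another, which is what makes the computation a natural fit for the map phase.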

Keywords

PPI, Clustering, MapReduce, Edge-betweenness

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Li Liu (1)
  • Dangping Fan (1)
  • Ming Liu (2)
  • Guandong Xu (3)
  • Shiping Chen (2, 4)
  • Yuan Zhou (1)
  • Xiwei Chen (1)
  • Qianru Wang (1)
  • Yufeng Wei (5)

  1. School of Information Science and Engineering, Lanzhou University, Gansu, P.R. China
  2. School of Electrical and Information Engineering, The University of Sydney, Australia
  3. Advanced Analytics Institute, University of Technology Sydney, Australia
  4. CSIRO ICT Centre, Australia
  5. The Third People's Hospital of Lanzhou, Gansu, P.R. China
