Missing link prediction using path and community information

  • Regular Paper
  • Computing

Abstract

Because complex networks evolve over time, link prediction plays a crucial role in estimating the likelihood of potential relationships among nodes. Many strategies apply similarity-based metrics to estimate the proximity of nodes in complex networks. In this paper, we propose three new variants of the well-known Common Neighbor and Centrality-based Parameterized Algorithm (CCPA), namely CCPAL3, LPCPA, and GPCPA, which take into account 3-hop paths, quasi-local paths, and global paths, respectively. In addition, we propose four novel link prediction strategies based on community detection information: CCPA_CD, CCPAL3_CD, LPCPA_CD, and GPCPA_CD. The Jaccard index is likewise extended to three new metrics: Jaccard_L3, Jaccard_QuasiLoc, and Jaccard_Global. Extensive experiments are conducted on thirteen real-world networks. The results indicate that the proposed metrics improve prediction accuracy as measured by AUC and are more competitive on Precision than state-of-the-art link prediction methods.
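
For orientation, the baseline quantities named in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. It covers the classical Jaccard index, the original CCPA score in the form it is usually stated (alpha times the number of common neighbors plus (1 - alpha) times N divided by the shortest-path distance, which is an assumption here since the abstract does not give the formula), and the sampling-free form of AUC in which ties count as one half. The new variants (CCPAL3, LPCPA, GPCPA, the _CD strategies, and the Jaccard extensions) are not defined in the abstract and are deliberately not reproduced. All function names and the toy graph are hypothetical.

```python
from collections import deque


def neighbors(edges):
    """Build an adjacency map {node: set(neighbors)} from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, x, y):
    """Classical Jaccard index: shared neighbors divided by all neighbors of x or y."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def shortest_distance(adj, src, dst):
    """Hop distance found by breadth-first search; None if dst is unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


def ccpa(adj, x, y, alpha=0.5):
    """Assumed CCPA form: alpha * |common neighbors| + (1 - alpha) * N / d(x, y)."""
    common = len(adj[x] & adj[y])
    d = shortest_distance(adj, x, y)
    # Unreachable pairs (or x == y) contribute no closeness term.
    closeness = len(adj) / d if d else 0.0
    return alpha * common + (1 - alpha) * closeness


def auc(scores_missing, scores_nonexistent):
    """Share of (missing, nonexistent) pairs ranked correctly; ties count 0.5."""
    total = len(scores_missing) * len(scores_nonexistent)
    wins = sum(
        1.0 if m > n else 0.5 if m == n else 0.0
        for m in scores_missing
        for n in scores_nonexistent
    )
    return wins / total


# Hypothetical toy graph with five nodes.
toy = neighbors([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)])
print(jaccard(toy, 0, 3), ccpa(toy, 0, 3, alpha=0.5))
```

On this toy graph the pair (0, 3) shares two of three neighbors, so its Jaccard index is 2/3, and with alpha = 0.5 and distance 2 the assumed CCPA score is 0.5 · 2 + 0.5 · 5/2 = 2.25.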

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (Nos. 61977016 and 61572010), the Natural Science Foundation of Fujian Province (Nos. 2020J01164 and 2017J01738), and the Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province (No. JAT191119). It was also partly supported by the China Scholarship Council (CSC No. 202108350054).

Author information

Corresponding author

Correspondence to Shuming Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, M., Zhou, S., Wang, D. et al. Missing link prediction using path and community information. Computing 106, 521–555 (2024). https://doi.org/10.1007/s00607-023-01229-y

  • DOI: https://doi.org/10.1007/s00607-023-01229-y
