A Survey of Link Prediction in Social Networks

  • Mohammad Al Hasan
  • Mohammed J. Zaki


Link prediction is an important task for analyzing social networks, and it also has applications in other domains such as information retrieval, bioinformatics, and e-commerce. A variety of techniques exist for link prediction, ranging from feature-based classification and kernel-based methods to matrix factorization and probabilistic graphical models. These methods differ from one another in model complexity, prediction performance, scalability, and generalization ability. In this article, we survey some representative link prediction methods, categorizing them by model type. We consider three broad types of models: first, traditional (non-Bayesian) models, which extract a set of features to train a binary classification model; second, probabilistic approaches, which model the joint probability among the entities in a network using Bayesian graphical models; and finally, linear algebraic approaches, which compute the similarity between nodes in a network using rank-reduced similarity matrices. We discuss various existing link prediction models that fall into these broad categories and analyze their strengths and weaknesses. We conclude the survey with a discussion of recent developments and future research directions.
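To make the first (feature-based) category more concrete, here is a minimal sketch of the kind of topological features such models typically extract for an unconnected node pair: common neighbors, the Jaccard coefficient, the Adamic/Adar score, and a truncated Katz score that counts walks up to a fixed length. It is an illustrative assumption, not the feature set of any particular method in the survey; the function names, the toy graph, and the Katz parameters (`beta=0.05`, `max_len=3`) are all hypothetical choices.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Shared neighbors weighted by 1 / log(degree), so rarer neighbors count more.

    A neighbor shared by two distinct nodes has degree >= 2, so log(degree) > 0.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def katz_truncated(adj, u, v, beta=0.05, max_len=3):
    """Truncated Katz score: sum over l = 1..max_len of beta**l * (walks of length l from u to v)."""
    score = 0.0
    walks = {u: 1}  # walks[x] = number of walks of the current length from u to x
    for length in range(1, max_len + 1):
        nxt = defaultdict(int)
        for x, count in walks.items():
            for y in adj[x]:
                nxt[y] += count
        walks = nxt
        score += beta ** length * walks.get(v, 0)
    return score


def pair_features(adj, u, v):
    """Feature vector for one candidate pair; any binary classifier could consume it."""
    return [
        common_neighbors(adj, u, v),
        jaccard(adj, u, v),
        adamic_adar(adj, u, v),
        katz_truncated(adj, u, v),
    ]


def candidate_pairs(adj):
    """All currently unconnected node pairs, i.e. the link-prediction candidates."""
    nodes = sorted(adj)
    return [(u, v) for u, v in combinations(nodes, 2) if v not in adj[u]]


if __name__ == "__main__":
    # Toy snapshot of a social network (hypothetical data).
    edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
    adj = build_adjacency(edges)
    for u, v in candidate_pairs(adj):
        print(u, v, pair_features(adj, u, v))
```

In a supervised setup along the lines the abstract describes, each pair's feature vector would be labelled by whether the pair becomes linked in a later snapshot of the network, and those labelled vectors would then train the binary classifier. On the toy graph above, the pair (a, d), which shares two neighbors, scores higher on every feature than the pairs involving e.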


Keywords: Link prediction; network evolution model; social network analysis; probabilistic model; local probabilistic model




  1. [1]
    Acar, Evrim, and Dunlavy, Daniel M., Kolda, Tamara G. (2009). Link Prediction on Evolving Data Using Matrix and Tensor Factorizations. In Pro ceedings of the Workshop on Large Scale Data Mining Theory and Applications. ICDM Workshops:262-269Google Scholar
  2. [2]
    Adamic, Lada A. and Adar, Eytan. (2003). Friends and neighbors on the web. Social Networks, 25(3):211-230.Google Scholar
  3. [3]
    Adafre, Sisay F., and Rijke, Maarten de. (2005). Discovering missing links in Wikipedia. LINK-KDD ’05: Proceedings of the Third International Workshop on Link Discovery.Google Scholar
  4. [4]
    Ahmed, Elmagarmid, and Ipeirotis, Panagiotis G., and Verykios, Vassilios. (2007) Duplicate Record Detection: A Survey. In IEEE Transactions on Knowledge and Data Engineering 19 (1):1âĂŞ16Google Scholar
  5. [5]
    Ahmed, Amr, and Xing, Eric P. (2009). Recovering time-varying network of dependencies in Social and biological studies. PNAS 106(29):11878-11883.CrossRefGoogle Scholar
  6. [6]
    Airodi, Edoardo M., and Blei, David M., and Xing, Eric P., and Fienberg, Stephen E. (2006). Mixed Membership stochastic block models for relational data, with applications to protein-protein interactions. Proceedings of Ineterational Biometric Society-ENAR Annual Meetings.Google Scholar
  7. [7]
    Barabasi, Albert-Laszlo, and Albert, Reka. (1999) Emergence of Scaling in Random Networks, Science, 286(5439):509.Google Scholar
  8. [8]
    Barabasi, Albert-Laszlo, and Jeong, H., and Neda, Z. and Ravasz, E. (2002) Evolution of the social network of scientific collaboration. Physics A, 311(3-4):590-614.MathSciNetzbMATHCrossRefGoogle Scholar
  9. [9]
    Basilico, J., and Hofmann, T. (2004) Unifying Collaborative and Contentbased filtering. In Proceedings of European Conference on Machine Learning.Google Scholar
  10. [10]
    Bilgic, Mustafa, and Namata, Galileo M., and Getoor, Lise. (2007). Combining collective classification and link prediction. In Proceedings of the Workshop on Mining Graphs and Complex Structures at ICDM Conference.Google Scholar
  11. [11]
    Brin, Sergey, and Page, Lawrence. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7):107-117.CrossRefGoogle Scholar
  12. [12]
    Chawla, Nitesh V, and Bowyer, Kevin W., and Hall, Lawrence O., and W. Kegelmeyer, Philip. (2002) SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16(1):321-357.zbMATHGoogle Scholar
  13. [13]
    Chung, Fan, and Zhao, Wenbo, (2010). PageRank and random walks on graphs. Proceedings of the "Fete of Combinatorics" conference in honor of Lovasz.Google Scholar
  14. [14]
    Clause, Aaron, and Moore, Christopher, and Newman, M. E. J. (2008). Hierarchical structure and the prediction of missing links in network. Nature 453:98-101.CrossRefGoogle Scholar
  15. [15]
    Doppa, Janardhan R., and Yu, Jun, and Tadepalli, Prasad, and Getoor, Lise. (2009). Chance-Constrained Programs for Link Prediction. In Proceedings of Workshop on Analyzing Networks and Learning with Graphs at NIPS Conference.Google Scholar
  16. [16]
    Erketin, Syeda, and Huang, Jian, and Giles, Lee. (2007). Active learning for Class imbalance problem. In Proceedings of the 30th ACM SIGIR Conference.Google Scholar
  17. [17]
    Fabrikant, Alex, and Luthra, Ankur, and Maneva, Elitza, and Papadimitriou, Christos H., and Shenker, Scott. (2003). On a Network Creation Game. In Proc. of the twenty-second annual symposium on principles of distributed computing, pp:347-351.Google Scholar
  18. [18]
    Freschi, Valerio. (2009). A Graph-based Semi-Supervised Algorithm for Protein Function Prediction from Interaction Maps. In Learning and Intelligent Optimization, Lecture Notes in Computer Science, 5851:249-258Google Scholar
  19. [19]
    Frieze, A, and Kannan, R., and Vempala, S. (1998) Fast monte-carlo algorithms for finding low-rank approximations. in Journal of the ACM (JACM), 51(6):1025âĂŞ1041.MathSciNetGoogle Scholar
  20. [20]
    Fu, Wenjie, and Song, Le, and Xing, Eric P. (2009) . In Proc. of the 26th International Conference on Machine Learning.Google Scholar
  21. [21]
    Getoor, Lise, and Friedman, Nir, and Koller, Dephne, and Taskar, Benjamin. (2002) Learning Probabilistic Models of Link structure. Journal of Machine Learning Research, 3:679-707.MathSciNetCrossRefGoogle Scholar
  22. [22]
    Hasan, Mohammad A., and Chaoji, Vineet, and Salem, Saeed and Zaki, Mohammed. (2006) Link Prediction using Supervised Learning. In Proceedings of SDM Workshop of Link Analysis, Counterterrorism and Security.Google Scholar
  23. [23]
    Heckerman, David, and Chickering, David M., and Meek, Christopher, and Rounthwaite, Robert, and Kadie, Carl M. (2000) Dependency Networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research, 1:49-75.CrossRefGoogle Scholar
  24. [24]
    Heckerman, David, and Meek, Christopher, and Koller, Daphne. (2004) Probabilistic models for relational data. Technical Report, Microsoft.Google Scholar
  25. [25]
    Huang, Zan, and Li, Xin, and Chen Hsinchun. (2005) Link Prediction approach to collaborative filtering. Proceedings of the fifth ACM/IEEE Joint Conference on Digital Libraries.Google Scholar
  26. [26]
    Imrich, W., Klavzar, S. (2000). Product Graphs: Structure and Recognition. Wiley.Google Scholar
  27. [27]
    Jeh, Glen, and Widom, Jennifer. (2002) SimRank: A measure of structural-context similarity. In Proceedings of ACM SIGKDD International Conference of Knowledge Discovery and Data Mining.Google Scholar
  28. [28]
    Karakoulas, Grigoris, and Shawe-Taylor, John. (1999). Optimizing classifiers for imbalanced training sets. Proceedings of NIPS, 253-259.Google Scholar
  29. [29]
    Kashima, Hisashi, and Abe, Naoke. (2006) A Parameterized Probabilistic Model of Network Evolution for Supervised Link Prediction. ICDM ’06: Proceedings of the Sixth IEEE International Conference on Data Mining. 340-349.Google Scholar
  30. [30]
    Kashima, Hisashi, and Oyama, Satoshi, and Yamanishi, Yoshihiro, and Tsuda, Koji. (2009). On Pairwise Kernels: An Efficient Alternative and Generalization Analysis, Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, pp.1030-1037.Google Scholar
  31. [31]
    Katz, Leo. (1953) A new status index derived from sociometric analysis. Psychometrika, 18(1):39-43.zbMATHCrossRefGoogle Scholar
  32. [32]
    Kleinberg, Jon M. (2000). Navigation in a small world. Nature 406, (845).Google Scholar
  33. [33]
    Kunegis, Jerome, and Lommatzsch, Andreas. (2009) Learning Spectral Graph Transformations for Link Prediction. In Proceedings of the International Conference on Machine Learning, pp 561-568.Google Scholar
  34. [34]
    Leskovec, Jure, and Kleinberg, Jon M, and Faloutsos, Christos. (2005). Graphs over time:densification laws, shrinking diameters and possible explanations. KDD ’05: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining.Google Scholar
  35. [35]
    Li, Xin, Chen Hsinchun. (2009). Recommendation as link prediction: a graph kernel-based machine learning approach. Proceedings of the ninth ACM/IEEE Joint Conference on Digital Libraries.Google Scholar
  36. [36]
    Liben-Nowell, David, and Kleinberg, Jon. (2007). The Link Prediction Problem for Social Networks. Journal of the American Society for Information Science and Technology, 58(7):1019-1031.CrossRefGoogle Scholar
  37. [37]
    Liu, Yan and Kou, Zhenzhen. (2007). Predicting who rated what in largescale datasets. SIGKDD Exploration Newsletter, 9 (2).Google Scholar
  38. [38]
    Madadhai, J., and Hutchins, J., and Smyth, P. (2005). Prediction and Ranking algorithms for event-based Network Data. SIGKDD Explorations Newsletter, 7(2):23-30.CrossRefGoogle Scholar
  39. [39]
    Malin, Bradley, and Airoldi, Edoardo, and Carley, Kathlee M. (2005). A Network Analysis Model for Disambiguation of Names in Lists. In Journal of Computational and Mathematical Organization Theory, 11(2):119-139.zbMATHCrossRefGoogle Scholar
  40. [40]
    Nallapati, Ramesh, and Ahmed, Amr, and Xing, Eric P., and Cohen, William W. (2008). Joint Latent Topic Models for Text and Citations. In Proc. of The Fourteen ACMSIGKDDInternational Conference on Knowledge Discovery and Data Mining.Google Scholar
  41. [41]
    Newman, M. E. J. (2001). Clustering and Preferential attachment in growing networks. PHysical Review Letters E, 64(025102).Google Scholar
  42. [42]
    Niculescu-Mizil, and Alexandru, and Caruana, Rich. (2005). Predicting Good Probabilities with Supervised Learning. International Conference on Machine Learning.Google Scholar
  43. [43]
    Oyama, Satoshi, and Manning, Christopher D., (2004). Using feature conjunctions across examples for learning pairwise classifiers, In The Proc. of European Conference on Machine Learning, pp. 323-333.Google Scholar
  44. [44]
    Pavlov, Dmitry, and Mannila, Heikki, and Smyth, Phadraic. (2009) Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data. University of California, Irvine Technical Report UCI-ICS-TR-01-09.Google Scholar
  45. [45]
    Pearl, Judea. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Fransisco.Google Scholar
  46. [46]
    Popescul, Alexandrin and Ungar, Lyle H. (2003). Statistical Relational Learning for Link Prediction. In Proceedings of Workshop on Learning Statistical Models from Relational Data at IJCAI Conference.Google Scholar
  47. [47]
    Popescul, Alexandrin and Ungar, Lyle H. (2003). Structural Logistic Regression for Link Analysis. In Proceedings of Workshop on Multi-Relational Data Mining at KDD Conference.Google Scholar
  48. [48]
    Provost, Foster, and Fawcell, Tom. (2001). Robust Classification for Imprecise Environments. Machine Learning, 42(3):203-231.zbMATHCrossRefGoogle Scholar
  49. [49]
    Rattigan, Matthew J., and Jensen, David. (2005). The case for anomalous link discovery. SIGKDD Explorations Newsletter, 7 (2):41-47.CrossRefGoogle Scholar
  50. [50]
    Sarukkai, Ramesh R. (2000). Link Prediction and Path Analysis using Markov Chain. WWW ’00: Proceedings of the Ninth World Wide Web Conference, 377-386.Google Scholar
  51. [51]
    Shawe-taylor, J., and Cristianini, Nelo. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press, NY.zbMATHGoogle Scholar
  52. [52]
    Song, Han H., and Cho Tae W., and Dave, Vacha, and Zhang, Yin, and Qiu, Lili. (2009). Scalable proximity Estimation and Link Prediction in Online Social Networks, IMC ’09: In Proceedings of the Internet Measurement Conference.Google Scholar
  53. [53]
    Tasker, Benjamin, and Wong, Ming F., and Abbeel, Pieter, and Koller, Daphne. (2003). Link Prediction in Relational Data. NIPS ’03: In Proceedings of Neural Information Processing Systems.Google Scholar
  54. [54]
    Tasker, Benjamin, and Abbeel, Pieter, and Koller, Daphne. (2002). Discriminative Probabilistic Models for Relational Data. In Proceedings of Uncertainty in Artificial Intelligence Conference.Google Scholar
  55. [55]
    Taskar, Benjamin, and Abbeel, Pieter, and Wong, M.-F, and Koller, Daphne (2007). Relational Markov Networks. In L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning.Google Scholar
  56. [56]
    Tylenda, Tomasz, and Angelova, Ralitsa, and Bahadur, Srikanta. (2009). Towards time-aware link prediction in evolving social network. SNA-KDD ’09: Proceedings of the third Workshop on Social Network Mining and Analysis.Google Scholar
  57. [57]
    Campbell, Veropoulos, and Campbell, C.K., and Cristianini, N., Controlling the sensitivity of support vector machines. In: Dean, T. (Ed.), IJCAI: Proceedings of International Joint Conference on Artificial Intelligence. pp. 55-60.Google Scholar
  58. [58]
    Wang, Chao, and Satuluri, Venu, and Parthasarathy, Srinivasan. (2007). Local Probabilistic Models for Link Prediction. ICDM’07: In Proceedings of International Conference on Data Mining.Google Scholar
  59. [59]
    Watts, D, and Stogatz, S. (1998). Small world. Nature, 393:440-442.CrossRefGoogle Scholar
  60. [60]
    Weiss, Gary M. (2004) Mining with rarity: a unifying framework, In SIGKDD Explorations Newsletter, 6(1):7-19.CrossRefGoogle Scholar
  61. [61]
    Xu, Zhao, and Tresp, Volker, and Yu, Shipeng, and Yu, Kai. (2005). Nonparametric Relational Learning for Social Network Analysis. SNA-KDD ’08: In Proceedings of the Second Workshop on Social Network Mining and Analysis.Google Scholar
  62. [62]
    Xu, Zhao, and Tresp, Volker, and Yu, Kai and Kriegel, Hans-Peter. (2005). Dirichlet Enhanced Relational Learning. In Proceedings of International Conference on Machine Learning, pp 1004-1011.Google Scholar
  63. [63]
    Yang, Chan-Yun, and Yang, Jr-Syu, and Wang Jian-Jun. (2009). Margin Calibration in SVM class-imbalanced learning, Neurocomputing, 73(1-3):397-411.CrossRefGoogle Scholar
  64. [64]
    Yu, Kai, and Chu,Wei, and Yu, Shipeng, and Tresp, Volker, and Xu, Zhao. (2006). Stochastic relational models for discriminative link prediction. In Proceedings of NIPS, pp-1553-1560Google Scholar
  65. [65]
    Zhu, Jianhan, and Hong, Jun, and Hughes G. (2002). Using Markov models for web site link prediction. HYPERTEXT’02: Proceedings of the Thirteenth ACM Conference on Hypertext and Hypermedia.Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Computer and Information Science, Indiana University-Purdue University, Indianapolis, USA
  2. Department of Computer Science, Rensselaer Polytechnic Institute, Troy, USA
