A Survey of Link Prediction in Social Networks

Social Network Data Analytics

Abstract

Link prediction is an important task in analyzing social networks, and it also has applications in other domains such as information retrieval, bioinformatics, and e-commerce. A variety of techniques exist for link prediction, ranging from feature-based classification and kernel-based methods to matrix factorization and probabilistic graphical models. These methods differ in model complexity, prediction performance, scalability, and generalization ability. In this article, we survey some representative link prediction methods, categorizing them by the type of model. We largely consider three types of models: first, the traditional (non-Bayesian) models, which extract a set of features to train a binary classification model; second, the probabilistic approaches, which model the joint probability among the entities in a network using Bayesian graphical models; and finally, the linear algebraic approach, which computes the similarity between the nodes in a network using rank-reduced similarity matrices. We discuss various existing link prediction models that fall into these broad categories and analyze their strengths and weaknesses. We conclude the survey with a discussion of recent developments and future research directions.
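
As a rough illustration of the linear algebraic family mentioned above, the following minimal sketch scores unlinked node pairs with a rank-reduced approximation of the adjacency matrix. It assumes a truncated SVD as the rank-reduction step, which is one common choice and not necessarily the formulation used in the chapter; the function names `low_rank_scores` and `rank_candidate_links` and the toy graph are hypothetical.

```python
import numpy as np


def low_rank_scores(adj, rank):
    """Return the rank-`rank` SVD approximation of an adjacency matrix.

    Entry (i, j) of the result is read as a similarity score for nodes i and j.
    Assumption: truncated SVD stands in for "rank-reduced similarity matrix".
    """
    u, s, vt = np.linalg.svd(adj.astype(float))
    # Keep only the `rank` largest singular values and their vectors.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def rank_candidate_links(adj, rank):
    """List currently unlinked pairs (i < j), highest low-rank score first."""
    scores = low_rank_scores(adj, rank)
    n = adj.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i, j] == 0]
    return sorted(pairs, key=lambda p: scores[p], reverse=True)


# Toy undirected graph (hypothetical): a triangle 0-1-2 with a tail 2-3-4.
A = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1

for pair in rank_candidate_links(A, rank=2):
    print(pair)
```

The choice of `rank` controls how much of the network's structure is smoothed away: a rank equal to the number of nodes reproduces the original matrix and assigns every unlinked pair a score of zero, so a useful predictor needs a smaller rank.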

Author information

Corresponding author

Correspondence to Mohammad Al Hasan.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Hasan, M.A., Zaki, M.J. (2011). A Survey of Link Prediction in Social Networks. In: Aggarwal, C. (eds) Social Network Data Analytics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-8462-3_9

  • DOI: https://doi.org/10.1007/978-1-4419-8462-3_9

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-8461-6

  • Online ISBN: 978-1-4419-8462-3

  • eBook Packages: Computer Science, Computer Science (R0)