Abstract
Link prediction is an important task in the analysis of social networks, and it also has applications in other domains such as information retrieval, bioinformatics, and e-commerce. A variety of techniques exist for link prediction, ranging from feature-based classification and kernel-based methods to matrix factorization and probabilistic graphical models. These methods differ in model complexity, prediction performance, scalability, and generalization ability. In this article, we survey some representative link prediction methods, categorizing them by model type. We consider three broad types of models: first, traditional (non-Bayesian) models, which extract a set of features to train a binary classification model; second, probabilistic approaches, which model the joint probability among the entities in a network using Bayesian graphical models; and finally, linear algebraic approaches, which compute the similarity between the nodes in a network using rank-reduced similarity matrices. We discuss various existing link prediction models that fall into these broad categories and analyze their strengths and weaknesses. We conclude the survey with a discussion of recent developments and future research directions.
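As a rough illustration of the feature-based category mentioned in the abstract, the sketch below computes a few neighborhood-based features for a candidate node pair. The choice of features (common neighbors, Jaccard coefficient, Adamic-Adar score), the toy graph, and the helper name `pair_features` are assumptions made for this example and are not taken from the chapter. In a supervised setting, feature vectors like these would be the input to a binary classifier.

```python
import math

def pair_features(adj, u, v):
    """Return neighborhood-based features for the candidate link (u, v).

    `adj` maps each node to the set of its neighbors (undirected graph).
    This helper is hypothetical and exists only for illustration.
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: shared neighbors with fewer links contribute more.
        # Degree-1 nodes are skipped to avoid dividing by log(1) = 0.
        "adamic_adar": sum(
            1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1
        ),
    }

# Toy undirected graph (made up for this example).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

# Features for the currently unlinked pair (a, d).
features = pair_features(adj, "a", "d")
print(features)
```

Here the pair (a, d) shares the neighbors b and c, so a classifier trained on such features would tend to score it higher than a pair with no shared neighbors.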
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Hasan, M.A., Zaki, M.J. (2011). A Survey of Link Prediction in Social Networks. In: Aggarwal, C. (eds) Social Network Data Analytics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-8462-3_9
DOI: https://doi.org/10.1007/978-1-4419-8462-3_9
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-8461-6
Online ISBN: 978-1-4419-8462-3
eBook Packages: Computer Science (R0)