Abstract
Many important real-world systems, modeled naturally as complex networks, have heterogeneous interactions and complicated dependency structures. Link prediction in such networks must model the influences between heterogenous relationships and distinguish the formation mechanisms of each link type, a task which is beyond the simple topological features commonly used to score potential links. In this paper, we introduce a novel probabilistically weighted extension of the Adamic/Adar measure for heterogenous information networks, which we use to demonstrate the potential benefits of diverse evidence, particularly in cases where homogeneous relationships are very sparse. However, we also expose some fundamental flaws of traditional unsupervised link prediction. We develop supervised learning approaches for relationship (link) prediction in multi-relational networks, and demonstrate that a supervised approach to link prediction can enhance performance. We present results on three diverse, real-world heterogeneous information networks and discuss the trends and tradeoffs of supervised and unsupervised link prediction in a multi-relational setting.
Similar content being viewed by others
References
Adamic L, Adar E (2003) Friends and neighbors on the web. Soc Netw 25(3):211–230
Barabási A (2003) Linked: how everything is connected to everything else and what it means. Penguin Group, New York
Barabási A, Jeong H, Néda Z, Ravasz E, Schubert A, Vicsek T (2002) Evolution of the social network of scientific collaborations. Phys A Stat Mech Appl 311(3-4):590–614
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1–7):107–117
Christakis N, Fowler J (2007) The spread of obesity in a large social network over 32 years. New Engl J Med 357(4):370–379
Christakis N, Fowler J (2008) The collective dynamics of smoking in a large social network. New Engl J Med 358(21):2249–2258
Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd international conference on machine learning ACM, pp 233–240
Friedkin N, Johnsen E (1990) Social influence and opinions. J Math Sociol 15(3):193–206
Goadrich M, Oliphant L, Shavlik J (2004) Learning ensembles of first-order clauses for recall-precision curves: a case study in biomedical information extraction. In: Inductive logic programming, pp 421–456
Han J (2009) Mining heterogeneous information networks by exploring the power of links. In: Discovery science, Springer, pp 13–30
Holland P, Leinhardt S (1970) A method for detecting structure in sociometric data. Am J Sociol 76(3):492–513
Huang Z, Li X, Chen H (2005) Link prediction approach to collaborative filtering. In: Proceedings of the 5th ACM/IEEE-CS joint conference on digital libraries, ACM, pp 141–142
Kas M, Carley K, Carley L (2012) Trends in science networks: understanding structures and statistics of scientific networks. In: Social network analysis and mining, pp 1–19
Kautz H, Selman B, Shah M (1997) Referral Web: combining social networks and collaborative filtering. Commun ACM 40(3):63–65
Liben-Nowell D, Kleinberg J (2003) The link-prediction problem for social networks. In: Proceedings of the 12th international conference on information and knowledge management, pp 556–559
Lichtenwalter R, Lussier J, Chawla N (2010) New perspectives and methods in link prediction. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 243–252
McPherson M, Smith-Lovin L, Cook J (2001) Birds of a feather: Homophily in social networks. Ann Rev Sociol 27:415–444
Newman M (2001) Clustering and preferential attachment in growing networks. Phys Rev E 64(2):025102
Newman M (2001) The structure of scientific collaboration networks. Proc Natl Acad Sci 98(2):404
O’Madadhain J, Hutchins J, Smyth P (2005) Prediction and ranking algorithms for event-based network data. ACM SIGKDD Explor Newslett 7(2):23–30
Pelan A, Steinhaeuser K, Chawla NV, de Alwis Pitts DA, Ganguly AR (2011) Empirical comparison of correlation measures and pruning levels in complex networks representing the global climate system. In: IEEE symposium series on computational intelligence and data mining
Pržulj N, Corneil D, Jurisica I (2004) Modeling interactome: scale-free or geometric? Bioinformatics 20(18):3508
Radivojac P, Peng K, Clark W, Peters B, Mohan A, Boyle S, Mooney S (2008) An integrated approach to inferring gene–disease associations in humans. Proteins Struct Funct Bioinform 72(3):1030–1037
Raeder T, Chawla N (2011) Market basket analysis with networks. In: Social network analysis and mining, pp 1–17
Raghavan V, Bollmann P, Jung G (1989) A critical investigation of recall and precision as measures of retrieval system performance. ACM Trans Inform Syst (TOIS) 7(3):205–229
Rual J, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz G, Gibbons F, Dreze M, Ayivi-Guedehoussou N et al (2005) Towards a proteome-scale map of the human protein–protein interaction network. Nature 437(7062):1173–1178
Salton G, McGill M (1983) Introduction to modern information retrieval. McGraw-Hill, New York
Scott J (2011) Social network analysis: developments, advances, and prospects. In: Social network analysis and mining, pp 1–6
Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck F, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S et al (2005) A human protein-protein interaction network: a resource for annotating the proteome. Cell 122(6):957–968
Tang L, Wang X, Liu H (2009) Uncovering groups via heterogeneous interaction analysis. In: Proceedings of the 9th IEEE international conference on data mining, pp 503–512
Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge university press, Cambridge
Acknowledgments
Research was sponsored in part by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-09-2-0053 and in part by the National Science Foundation (NSF) Grant BCS-0826958. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Davis, D., Lichtenwalter, R. & Chawla, N.V. Supervised methods for multi-relational link prediction. Soc. Netw. Anal. Min. 3, 127–141 (2013). https://doi.org/10.1007/s13278-012-0068-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13278-012-0068-6