Integrated anchor and social link predictions across multiple social networks

  • Regular Paper
Knowledge and Information Systems

Abstract

In recent years, various online social networks offering specific services have gained great popularity and success. To enjoy more online social services, some users are involved in multiple social networks simultaneously. A challenging problem in social network studies is to identify the common users across networks in order to gain a better understanding of user behavior. This is referred to as the anchor link prediction problem. Meanwhile, across these partially aligned social networks, users can be connected by different kinds of links, e.g., social links among users within a single network and anchor links between accounts of the shared users in different networks. Many link prediction methods have been proposed to predict each type of link separately. In this paper, we want to predict the formation of social links among users in the target network as well as anchor links aligning the target network with other external social networks. The problem is formally defined as the “collective link identification” problem. Predicting the formation of links in social networks with traditional link prediction methods, e.g., classification-based methods, can be very challenging. The reason is that, from the network, we can only obtain the formed links (i.e., positive links) but no information about the links that will never be formed (i.e., negative links). To solve the collective link identification problem, a unified link prediction framework, collective link fusion (CLF), is proposed in this paper. It consists of two phases: (1) collective link prediction of anchor and social links with positive and unlabeled learning techniques, and (2) propagation of predicted links across the partially aligned “probabilistic networks” with collective random walk. Extensive experiments conducted on two real-world partially aligned networks demonstrate that CLF can perform very well in predicting social and anchor links concurrently.

Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities under grant JUSRP11852. This work was partially supported by Florida State University Council on Research and Creativity (CRC) via the Project ID 041776. This work is also supported in part by NSF through Grants IIS-1526499, IIS-1763325, CNS-1626432 and NSFC 61672313. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agencies or the government.

Author information

Corresponding author

Correspondence to Qianyi Zhan.

Additional information

A preliminary version of this work appeared in: Proceedings of International Joint Conferences on Artificial Intelligence (IJCAI ’15), 2015.

Appendix

The social features of anchor links were introduced in the previous part. In this part, we introduce the social features of social links, as well as the spatial distribution, temporal distribution, and text usage features of both anchor links and social links.

6.1. Social features

See Table 3.

Table 3 Social features defined for social link (u, v)

\(\Gamma (u)\) is the set of neighbors of user u.
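The contents of Table 3 are not reproduced here, so the following is only a minimal sketch of neighbor-based features that could be computed from \(\Gamma (u)\) and \(\Gamma (v)\). The choice of common neighbors, Jaccard's coefficient and Adamic/Adar is an assumption (only the Adamic/Adar measure is named in the text), and the function name `social_features` and the toy graph are hypothetical.

```python
import math


def social_features(u, v, neighbors):
    """Neighbor-based features for a candidate social link (u, v).

    `neighbors` maps each user to the set Gamma(user) of its neighbors.
    The three measures below are assumed, not taken from Table 3.
    """
    gu, gv = neighbors[u], neighbors[v]
    common = gu & gv
    union = gu | gv
    # Adamic/Adar: shared neighbors with few connections count more.
    # Neighbors with degree <= 1 are skipped to avoid dividing by log(1) = 0.
    adamic_adar = sum(
        1.0 / math.log(len(neighbors[w]))
        for w in common
        if len(neighbors[w]) > 1
    )
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "adamic_adar": adamic_adar,
    }


# Toy undirected network (hypothetical data).
neighbors = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
features = social_features("a", "b", neighbors)
```

Here users a and b share the neighbors c and d, so the common-neighbor count is 2 and the Adamic/Adar score sums the inverse log-degrees of c and d.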

In addition to social information, we also extract features from users’ location check-ins. For a given anchor/social link (u, v), we can get the sets of locations that u and v have visited, \(\Phi (u)\) and \(\Phi (v)\), respectively. Since each user can visit a location many times, we construct vectors l(u) and l(v) for u and v, respectively, in which each cell records the number of times that u or v has visited a certain location in \(\Phi (u) \cup \Phi (v)\).
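As an illustration of the vector construction described above, the sketch below builds l(u) and l(v) over the union of visited locations. The function name `visit_vectors`, the list-of-check-ins input format and the sample data are assumptions made for this example.

```python
from collections import Counter


def visit_vectors(checkins_u, checkins_v):
    """Build visit-count vectors l(u) and l(v) over Phi(u) | Phi(v).

    Each argument is a list of location identifiers, one entry per check-in.
    """
    counts_u, counts_v = Counter(checkins_u), Counter(checkins_v)
    # Fix one shared ordering of the locations so the two vectors align.
    locations = sorted(set(counts_u) | set(counts_v))
    l_u = [counts_u[loc] for loc in locations]
    l_v = [counts_v[loc] for loc in locations]
    return locations, l_u, l_v


# Hypothetical check-in histories for two users.
locations, l_u, l_v = visit_vectors(["cafe", "gym", "cafe"], ["gym", "park"])
```

A location visited by only one of the two users gets a zero in the other user's vector, which keeps both vectors the same length.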

6.2. Spatial distribution features

See Table 4.

Table 4 Spatial distribution features for link (u, v)

Similarly, we can get the set of locations that u has visited from the networks, \(\Phi (u)\). For a given anchor/social link (u, v), we can extract its spatial distribution features by applying the measures summarized in Table 3, except the “Adamic/Adar” measure, to \(\Phi (u)\) and \(\Phi (v)\).

6.3. Temporal distribution features

See Table 5.

Table 5 Other frequently used features for link (u, v)

Users’ temporal activity information is also used to extract features for link (u, v). Each day is divided into 24 one-hour slots, and the number of online posts published in each slot is stored in vectors \({\mathbf {x}}(u)\) and \({\mathbf {x}}(v)\), from which we can extract \(IP({\mathbf {x}}(u), {\mathbf {x}}(v))\), \(ED({\mathbf {x}}(u), {\mathbf {x}}(v))\) and \(CS({\mathbf {x}}(u), {\mathbf {x}}(v))\), summarized in Table 5, as the temporal distribution features of link (u, v).
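The sketch below shows one way the hourly vectors and the three measures could be computed. Table 5 is not reproduced here, so reading IP, ED and CS as inner product, Euclidean distance and cosine similarity is an assumption, as are the function names and the sample posting hours.

```python
import math


def hourly_histogram(post_hours):
    """Count posts per hour of day; `post_hours` holds integers in 0..23."""
    x = [0] * 24
    for hour in post_hours:
        x[hour] += 1
    return x


def inner_product(x, y):
    return sum(a * b for a, b in zip(x, y))


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    norm = math.sqrt(inner_product(x, x)) * math.sqrt(inner_product(y, y))
    # Define the similarity as 0 when either user has no posts at all.
    return inner_product(x, y) / norm if norm else 0.0


# Hypothetical posting hours for users u and v.
x_u = hourly_histogram([9, 9, 21])
x_v = hourly_histogram([9, 22])
temporal_features = (
    inner_product(x_u, x_v),
    euclidean_distance(x_u, x_v),
    cosine_similarity(x_u, x_v),
)
```

The two users overlap only in the 9 o'clock slot, so the inner product is driven entirely by that slot, while the Euclidean distance also reflects the slots where only one of them posts.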

6.4. Text usage features

For a given link (u, v), we can get the words that u and v have used in the past and represent them as two bag-of-words vectors, \({\mathbf {x}}(u)\) and \({\mathbf {x}}(v)\), weighted by TF-IDF. From \({\mathbf {x}}(u)\) and \({\mathbf {x}}(v)\), we also extract \(IP({\mathbf {x}}(u), {\mathbf {x}}(v))\), \(ED({\mathbf {x}}(u), {\mathbf {x}}(v))\) and \(CS({\mathbf {x}}(u), {\mathbf {x}}(v))\), summarized in Table 5, as the text usage features of link (u, v).
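To make the TF-IDF weighting concrete, here is a minimal pure-Python sketch. The text does not specify the TF-IDF variant, so raw term counts multiplied by \(\log (N / df)\) are an assumption, with each user's collected words treated as one document. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, {user: TF-IDF vector}) for tokenized documents.

    `docs` maps each user to the list of words that user has posted.
    """
    vocab = sorted({w for words in docs.values() for w in words})
    n_docs = len(docs)
    # Document frequency: number of users who used each word at least once.
    df = Counter(w for words in docs.values() for w in set(words))
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    vectors = {}
    for user, words in docs.items():
        tf = Counter(words)
        vectors[user] = [tf[w] * idf[w] for w in vocab]
    return vocab, vectors


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


# Hypothetical word histories for three users.
docs = {
    "u": ["coffee", "code", "coffee"],
    "v": ["coffee", "travel"],
    "w": ["travel", "music"],
}
vocab, vectors = tfidf_vectors(docs)
cs_uv = cosine_similarity(vectors["u"], vectors["v"])
```

With this weighting, a word used by every user would receive weight zero, so the similarity between two users is dominated by the less common words they share.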

Cite this article

Zhan, Q., Zhang, J. & Yu, P.S. Integrated anchor and social link predictions across multiple social networks. Knowl Inf Syst 60, 303–326 (2019). https://doi.org/10.1007/s10115-018-1210-1

  • DOI: https://doi.org/10.1007/s10115-018-1210-1
