World Wide Web, Volume 16, Issue 5–6, pp 701–727

SocialSearch+: enriching social network with web evidences

  • Gae-won You
  • Jin-woo Park
  • Seung-won Hwang
  • Zaiqing Nie
  • Ji-Rong Wen
Article

Abstract

This paper introduces the problem of searching for social network accounts, e.g., Twitter accounts, using the rich information available on the Web, e.g., people's names, attributes, and relationships to other people. For this purpose, we need to map Twitter accounts to Web entities. However, existing solutions built upon naive textual matching inevitably suffer from low precision due to false positives (e.g., fake impersonator accounts) and false negatives (e.g., accounts using nicknames). To overcome these limitations, we leverage “relational” evidence extracted from the Web corpus. We consider two types of evidence resources. The first is web-scale entity relationship graphs, extracted from name co-occurrences crawled from the Web; this co-occurrence relationship can be interpreted as an “implicit” counterpart of Twitter follower relationships. The second is web-scale relational repositories, such as Freebase, which offer complementary strengths. Using both textual and relational features obtained from these resources, we learn a ranking function that aggregates these features to order candidate matches accurately. Another key contribution of this paper is to formulate confidence scoring as a problem separate from relevance ranking. A baseline approach is to use the relevance of the top match itself as the confidence score. In contrast, we train a separate classifier, using not only the top relevance score but also various statistical features extracted from the relevance scores of all candidates, and empirically validate that our approach outperforms the baseline. We evaluate our proposed system using real-life internet-scale entity-relationship and social network graphs.
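
To make the distinction between the baseline confidence score and the classifier-based one more concrete, the following is a minimal Python sketch of how statistical features might be derived from the relevance scores of all candidates. The function names and the particular features (top score, gap to the runner-up, mean, standard deviation, candidate count) are illustrative assumptions; the abstract does not list the features actually used.

```python
from statistics import mean, pstdev


def baseline_confidence(scores):
    """Baseline from the abstract: the top relevance score itself."""
    return max(scores)


def confidence_features(scores):
    """Summarize all candidates' relevance scores as classifier inputs.

    The feature set here is a hypothetical example, not the paper's.
    """
    if not scores:
        raise ValueError("need at least one candidate score")
    ranked = sorted(scores, reverse=True)
    top = ranked[0]
    # Assumption: with a single candidate, treat the runner-up score as 0.
    second = ranked[1] if len(ranked) > 1 else 0.0
    return {
        "top": top,
        "gap": top - second,   # how clearly the top match stands out
        "mean": mean(ranked),
        "std": pstdev(ranked),
        "count": len(ranked),
    }


# Two candidate lists with the same top score but different ambiguity.
clear_match = confidence_features([0.9, 0.2, 0.1])
ambiguous_match = confidence_features([0.9, 0.88, 0.85])
```

Both lists receive the same baseline confidence, while the gap feature separates them; a separately trained classifier could exploit that kind of signal.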

Keywords

entity search, graph matching, ranking, social network

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Gae-won You (1)
  • Jin-woo Park (1)
  • Seung-won Hwang (1)
  • Zaiqing Nie (2)
  • Ji-Rong Wen (2)

  1. Pohang University of Science and Technology, Pohang, Republic of Korea
  2. Microsoft Research Asia, Beijing, People’s Republic of China
