World Wide Web, Volume 17, Issue 5, pp 1051–1079

Integration of scientific and social networks

  • Mahmood Neshati
  • Djoerd Hiemstra
  • Ehsaneddin Asgari
  • Hamid Beigy
Article

Abstract

In this paper, we address the problem of integrating scientific and social networks: finding a matching relationship between members of two such networks (here, the DBLP publication network and the Twitter social network). This task is a crucial step toward building a multi-environment expert finding system, a topic that has recently attracted much attention in the Information Retrieval community. We divide the problem into two subproblems. The first is to find those profiles in one network that presumably have a corresponding profile in the other network. The second is name disambiguation: finding the true matching profiles among a set of candidate profiles. Utilizing several name similarity patterns and contextual properties of these networks, we design a focused crawler to find highly probable matching pairs; the name disambiguation problem is then reduced to predicting the label of each candidate pair as either a true or a false match. Because the labels of these candidate pairs are not independent, state-of-the-art classification methods such as logistic regression and decision trees, which classify each instance separately, are unsuitable for this task. By defining a matching dependency graph, we propose a joint label prediction model that determines the labels of all candidate pairs simultaneously. The model is designed around two main types of dependencies among candidate pairs, both of which are intuitive and general. Following a discriminative approach, we use various feature sets to train the proposed classifiers. An extensive set of experiments has been conducted on six test collections gathered from the DBLP and Twitter networks to show the effectiveness of the proposed joint label prediction model.
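
To make the contrast between per-pair classification and joint label prediction concrete, the following is a toy sketch only, not the model proposed in the paper. It assumes, purely for illustration, a single dependency: candidate pairs that share the same DBLP author compete, so at most one of them can be a true match. The abstract does not name the paper's two dependency types, and the function names, scores, threshold, and profile handles below are all hypothetical.

```python
from collections import defaultdict


def independent_labels(scores, threshold=0.5):
    """Label each candidate pair using only its own score."""
    return {pair: score >= threshold for pair, score in scores.items()}


def joint_labels(scores, threshold=0.5):
    """Label pairs jointly under one assumed dependency.

    Assumption (illustrative only): pairs sharing a DBLP author compete,
    so only the best-scoring pair per author may be labeled a true match.
    """
    by_author = defaultdict(list)
    for (author, profile), score in scores.items():
        by_author[author].append(((author, profile), score))

    labels = {pair: False for pair in scores}
    for candidates in by_author.values():
        best_pair, best_score = max(candidates, key=lambda item: item[1])
        if best_score >= threshold:
            labels[best_pair] = True
    return labels


# Hypothetical classifier scores for (DBLP author, Twitter profile) pairs.
scores = {
    ("J. Smith", "@jsmith"): 0.9,
    ("J. Smith", "@john_smith_ml"): 0.7,
    ("A. Lee", "@alee"): 0.4,
}

print(independent_labels(scores))
print(joint_labels(scores))
```

With these made-up scores, the independent labeler accepts both Twitter profiles for "J. Smith", while the joint labeler keeps only the higher-scoring one, which is the kind of inconsistency that motivates predicting all labels together.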

Keywords

Social network integration, Twitter, DBLP, Collective classification

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Mahmood Neshati (1)
  • Djoerd Hiemstra (2)
  • Ehsaneddin Asgari (3)
  • Hamid Beigy (1)
  1. Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
  2. Database Research Group, Electrical Engineering, Mathematics and Computer Science (EEMCS) Department, University of Twente, Enschede, The Netherlands
  3. School of Computer and Communication Science (IC), Ecole Polytechnique Fédérale de Lausanne - EPFL, Lausanne, Switzerland
