Machine Learning, Volume 82, Issue 2, pp 211–237

Topic level expertise search over heterogeneous networks

  • Jie Tang
  • Jing Zhang
  • Ruoming Jin
  • Zi Yang
  • Keke Cai
  • Li Zhang
  • Zhong Su

Abstract

In this paper, we present a topic level expertise search framework for heterogeneous networks. Whereas traditional Web search engines perform retrieval and ranking at the document level (or at the object level), we investigate the problem of expertise search at the topic level over heterogeneous networks. In particular, we study this problem in an academic search and mining system that extracts and integrates academic data from the distributed Web. We present a unified topic model to simultaneously model the topical aspects of different objects in the academic network. Based on the learned topic models, we investigate the expertise search problem along three dimensions: ranking, citation tracing analysis, and topical graph search. Specifically, we propose a topic level random walk method for ranking the different objects. In citation tracing analysis, we aim to uncover how a piece of work influences its follow-up work. Finally, we develop a topical graph search function based on the topic modeling and citation tracing analysis. Experimental results show that various expertise search and mining tasks can indeed benefit from the proposed topic level analysis approach.

Keywords

Social network, Information extraction, Name disambiguation, Topic modeling, Expertise search, Association search

Copyright information

© The Author(s) 2010

Authors and Affiliations

  • Jie Tang (1)
  • Jing Zhang (1)
  • Ruoming Jin (2)
  • Zi Yang (1)
  • Keke Cai (3)
  • Li Zhang (3)
  • Zhong Su (3)

  1. Department of Computer Science and Technology, Tsinghua University, Beijing, China
  2. Department of Computer Science, Kent State University, Kent, USA
  3. IBM, China Research Lab, Beijing, China
