Domain-Specific Term Rankings Using Topic Models

  • Zhiyuan Liu
  • Maosong Sun
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6458)

Abstract

A widely used approach to keyword extraction and content-based tag recommendation is to rank terms according to statistical criteria. In many cases, documents such as news articles and product reviews belong to specific domains, and domain knowledge may be important information for term rankings. In this paper, we propose to model domain knowledge using a latent topic model, referred to as the Domain-Topic Model (DTM). Using DTM, we perform domain-specific term rankings according to the relatedness between terms and domains. Experimental results on both keyword extraction and tag recommendation show the advantages of DTM for domain-specific term rankings.

Keywords

Domain-Topic Model, term ranking, keyword extraction, social tag recommendation

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Zhiyuan Liu (1)
  • Maosong Sun (1)
  1. Department of Computer Science and Technology, State Key Lab on Intelligent Technology and Systems, National Lab for Information Science and Technology, Tsinghua University, Beijing, China
