Entity Ranking from Annotated Text Collections Using Multitype Topic Models

  • Hitohiro Shiozaki
  • Koji Eguchi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4862)

Abstract

Very recently, topic model-based retrieval methods have produced good results using the Latent Dirichlet Allocation (LDA) model or its variants in a language modeling framework. However, when LDA-based methods are used to retrieve annotated documents, some post-processing is required outside the model to make use of the multiple word types specified by the annotations. In this paper, we explore new retrieval methods using a ‘multitype topic model’ that can directly handle multiple word types, such as annotated entities, category labels and other words that are typically used in Wikipedia. We investigate how to effectively apply the multitype topic model to retrieve documents from an annotated collection, and show the effectiveness of our methods through experiments on entity ranking using a Wikipedia collection.
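
As a rough illustration of what topic model-based retrieval in a language modeling framework can look like, the sketch below ranks documents by query likelihood, mixing a maximum-likelihood word estimate with a topic-based estimate. It is a minimal single-word-type sketch under stated assumptions, not the multitype method of this paper: the abstract gives no formulas, so the interpolation weight `lam`, the function names, and the toy distributions `phi` (word given topic) and `theta` (topic given document) are all hypothetical.

```python
import math

def topic_word_prob(word, theta_d, phi):
    """P(w|d) under a topic model: sum over topics of P(z|d) * P(w|z)."""
    return sum(p_z * phi_z.get(word, 0.0) for p_z, phi_z in zip(theta_d, phi))

def query_log_likelihood(query, doc_tokens, theta_d, phi, lam=0.7, eps=1e-12):
    """log P(query|d), interpolating the document's maximum-likelihood
    estimate with the topic-model estimate (lam is an assumed weight)."""
    n = len(doc_tokens)
    score = 0.0
    for w in query:
        p_ml = doc_tokens.count(w) / n if n else 0.0
        p_topic = topic_word_prob(w, theta_d, phi)
        # eps guards against log(0) when both estimates are zero.
        score += math.log(lam * p_ml + (1.0 - lam) * p_topic + eps)
    return score

def rank(query, docs, theta, phi, lam=0.7):
    """Return document ids sorted by descending query log-likelihood."""
    scores = {
        d: query_log_likelihood(query, tokens, theta[d], phi, lam)
        for d, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy model: two topics, two documents.
phi = [
    {"kobe": 0.5, "university": 0.3, "japan": 0.2},  # P(w | topic 0)
    {"topic": 0.6, "model": 0.4},                    # P(w | topic 1)
]
docs = {
    "d1": ["kobe", "university"],
    "d2": ["topic", "model", "model"],
}
theta = {"d1": [0.9, 0.1], "d2": [0.1, 0.9]}         # P(topic | document)

print(rank(["model"], docs, theta, phi))
print(rank(["japan"], docs, theta, phi))
```

In the second query, "japan" occurs in neither toy document, so the ranking is decided entirely by the topic-based term. Handling several word types (entities, category labels, ordinary words) inside the model, as the multitype topic model does, is not shown here.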

Keywords

Information Retrieval; Topic Model; Latent Dirichlet Allocation; Retrieval Model; Word Type
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Hitohiro Shiozaki (1)
  • Koji Eguchi (2)
  1. Graduate School of Science and Technology, Kobe University, Kobe, Japan
  2. Graduate School of Engineering, Kobe University, Kobe, Japan
