Chinese document re-ranking based on automatically acquired term resource

Abstract

In this paper, we address the problem of document re-ranking in information retrieval, which is usually performed after initial retrieval to improve the ranking of relevant documents. We propose a method that automatically constructs a term resource specific to the document collection and then applies that resource to document re-ranking. The term resource consists of a list of terms extracted from the documents, together with their weights and correlations, which are computed after initial retrieval. Term weighting based on local and global distribution makes the re-ranking insensitive to different choices of pseudo-relevance, while term correlation helps avoid bias toward any specific concept embedded in the query. Experiments on NTCIR3 data show that the approach not only improves the performance of initial retrieval but also contributes significantly to standard query expansion.
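
The ideas named above (term weighting from local and global distribution, term correlation, and re-ranking) can be illustrated with a minimal sketch. It is not the authors' method: the weighting formula, the Jaccard-based correlation, the maximal-marginal-relevance style term selection, and every name and parameter (term_weights, correlation, select_terms_mmr, rerank, lam, top_n, k) are assumptions chosen for illustration. Documents are assumed to be already segmented into lists of terms.

```python
import math
from collections import Counter

def term_weights(top_docs, all_docs):
    """Assumed weighting: local document frequency in the top-ranked
    documents times a smoothed global IDF, scaled to [0, 1]."""
    n = len(all_docs)
    df = Counter(t for d in all_docs for t in set(d))      # global distribution
    local = Counter(t for d in top_docs for t in set(d))   # local distribution
    raw = {t: (local[t] / len(top_docs)) * math.log((n + 1) / (df[t] + 1))
           for t in local}
    top = max(raw.values(), default=0.0) or 1.0            # avoid division by zero
    return {t: w / top for t, w in raw.items()}

def correlation(a, b, docs):
    """Assumed correlation: Jaccard overlap of the sets of documents
    that contain term a and term b."""
    da = {i for i, d in enumerate(docs) if a in d}
    db = {i for i, d in enumerate(docs) if b in d}
    union = da | db
    return len(da & db) / len(union) if union else 0.0

def select_terms_mmr(weights, docs, k, lam=0.7):
    """Greedy MMR-style selection: prefer heavy terms that are weakly
    correlated with the terms already chosen."""
    chosen, candidates = [], set(weights)
    while candidates and len(chosen) < k:
        def mmr(t):
            redundancy = max((correlation(t, c, docs) for c in chosen),
                             default=0.0)
            return lam * weights[t] - (1 - lam) * redundancy
        best = max(sorted(candidates), key=mmr)  # sorted() makes ties deterministic
        chosen.append(best)
        candidates.remove(best)
    return chosen

def rerank(ranked_docs, all_docs, top_n=2, k=3, lam=0.7):
    """Re-order an initial ranking by the summed weights of the selected
    terms each document contains; ties keep their initial order."""
    weights = term_weights(ranked_docs[:top_n], all_docs)
    terms = select_terms_mmr(weights, all_docs, k, lam)
    def score(doc):
        return sum(weights[t] for t in terms if t in doc)
    return sorted(ranked_docs, key=score, reverse=True)

# Toy collection; the first list is treated as the initial retrieval order.
collection = [
    ["bank", "loan", "rate"],
    ["loan", "rate", "credit"],
    ["bank", "river", "water"],
    ["river", "fish"],
    ["sport", "ball"],
]
reranked = rerank(collection, collection, top_n=2, k=2)
```

In this sketch the initial retrieval score is discarded once the terms are selected; a real system would more likely combine the two scores, and the value of lam would need tuning on held-out queries.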

Keywords

Term extraction, Term weighting, Maximal marginal relevance, Document re-ranking, Information retrieval

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  1. Department of Computer Science, Center for Study of Language Information, Wuhan University, Wuhan, China
  2. Department of Chinese Language and Literature, Wuhan University, Wuhan, China
  3. Center for Study of Language Information, Wuhan University, Wuhan, China
