Chinese document re-ranking based on automatically acquired term resource

Language Resources and Evaluation

Abstract

In this paper, we address the problem of document re-ranking in information retrieval, which is usually performed after the initial retrieval to improve the ranking of relevant documents. We propose a method that automatically constructs a term resource specific to the document collection and then applies this resource to document re-ranking. The term resource comprises a list of terms extracted from the documents, together with their weights and correlations, which are computed after the initial retrieval. Term weighting based on local and global distribution makes the re-ranking insensitive to different choices of pseudo-relevance, while term correlation helps avoid bias toward any specific concept embedded in the queries. Experiments with NTCIR3 data show that the approach not only improves the performance of the initial retrieval but also contributes significantly to standard query expansion.
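
The general idea of re-ranking with terms weighted by local and global distribution can be sketched as follows. This is only an illustrative sketch, not the method of the paper: the abstract does not give the weighting formula, so the weight used here (frequency in the top-ranked documents times an IDF-style collection factor) is an assumption, the function names term_weights and rerank are hypothetical, the toy documents are invented for illustration, and term correlation is left out entirely.

```python
import math
from collections import Counter

def term_weights(top_docs, all_docs):
    """Assumed weighting (not the paper's formula): local frequency in the
    top-ranked documents times an IDF-style factor over the collection."""
    n_docs = len(all_docs)
    doc_freq = Counter(t for d in all_docs for t in set(d))   # global distribution
    local_freq = Counter(t for d in top_docs for t in d)      # local distribution
    return {t: f * math.log(n_docs / doc_freq[t]) for t, f in local_freq.items()}

def rerank(initial_ranking, all_docs, k=2):
    """Treat the top-k documents as pseudo-relevant, weight their terms,
    then re-score each retrieved document by the weights of its distinct terms."""
    top_docs = [all_docs[i] for i in initial_ranking[:k]]
    weights = term_weights(top_docs, all_docs)
    scores = {
        i: sum(weights.get(t, 0.0) for t in set(all_docs[i]))
        for i in initial_ranking
    }
    return sorted(initial_ranking, key=lambda i: scores[i], reverse=True)

# Toy collection: each document is a list of already-segmented terms.
docs = [
    ["stock", "market", "fall"],          # 0
    ["stock", "market", "rise"],          # 1
    ["fish", "market", "price"],          # 2
    ["stock", "market", "fall", "rise"],  # 3
    ["fish", "soup", "recipe"],           # 4
]
initial = [0, 1, 2, 3, 4]  # document ids in initial retrieval order
reranked = rerank(initial, docs, k=2)
print(reranked)
```

In this toy run, document 3 moves from fourth place to first because it shares the most highly weighted terms with the two pseudo-relevant documents, while document 4, which shares none of them, stays last.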

Notes

  1. http://research.nii.ac.jp/ntcir-ws3/work-en.html.

Author information

Corresponding author

Correspondence to Donghong Ji.

Additional information

The first author is supported by NSF (60773011) and NSF (90820005), and the first two authors are supported by the Wuhan University 985 Project (985yk004).

About this article

Cite this article

Ji, D., Zhao, S. & Xiao, G. Chinese document re-ranking based on automatically acquired term resource. Lang Resources & Evaluation 43, 385–406 (2009). https://doi.org/10.1007/s10579-009-9106-z

  • DOI: https://doi.org/10.1007/s10579-009-9106-z
