
Information Retrieval for short documents

Journal of Electronics (China)

Abstract

The major problem with most current information retrieval models is that individual words provide unreliable evidence about the content of texts. When documents are short (for example, when only the abstract is available), this word-use variability has a substantial impact on Information Retrieval (IR) performance. To address the problem, this letter proposes a new approach to short-document retrieval, the Reference Document Model (RDM). RDM captures the statistical semantics of the query and of the document through pseudo feedback from reference documents, applied to both the query and the document. The contributions of this model are threefold: (1) pseudo feedback for both the query and the document; (2) query and document models built from reference documents; and (3) flexible indexing units, which can be any linguistic element, such as documents, paragraphs, sentences, n-grams, terms, or characters. For short-document retrieval, RDM achieves significant improvements over classical probabilistic models on the ad hoc retrieval task using Text REtrieval Conference (TREC) test sets. The results also show that the shorter the documents, the better RDM performs.
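The abstract does not give RDM's formulas. The following is only a minimal sketch of the general idea under stated assumptions: pseudo feedback applied symmetrically to both the query and the document from a shared set of reference documents, with a switchable indexing unit. All of the following are hypothetical illustrative choices, not the authors' method:

* the function names;
* cosine similarity for picking the top `k` reference documents;
* linear interpolation with weight `alpha`;
* character n-grams as the alternative unit.

```python
from collections import Counter
import math


def units(text, n=None):
    """Indexing units: whitespace words if n is None, else character n-grams.
    (Hypothetical choice; RDM allows other units such as sentences.)"""
    text = text.lower()
    if n is None:
        return text.split()
    s = text.replace(" ", "_")
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def distribution(tokens):
    """Maximum-likelihood unit distribution of a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * q.get(t, 0.0) for t, w in p.items())
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def reference_model(text, references, k=2, alpha=0.5, n=None):
    """Pseudo feedback (assumed form): interpolate the text's own distribution
    with the average distribution of its k most similar reference documents."""
    own = distribution(units(text, n))
    ref_models = [distribution(units(r, n)) for r in references]
    ranked = sorted(ref_models, key=lambda m: cosine(own, m), reverse=True)[:k]
    model = {t: alpha * w for t, w in own.items()}
    for m in ranked:
        for t, w in m.items():
            model[t] = model.get(t, 0.0) + (1 - alpha) * w / len(ranked)
    return model


def rdm_score(query, document, references, **kw):
    """Compare the expanded query model with the expanded document model."""
    return cosine(reference_model(query, references, **kw),
                  reference_model(document, references, **kw))


# Toy reference collection (invented for illustration only).
refs = [
    "car engine repair and automobile maintenance",
    "automobile insurance costs for new car owners",
    "fruit salad recipes with apple and banana",
]
query, doc_a, doc_b = "automobile repair", "car engine", "banana recipes"

# The query shares no word with doc_a, yet expansion links them via refs.
print(cosine(distribution(units(query)), distribution(units(doc_a))))  # 0.0
print(rdm_score(query, doc_a, refs) > rdm_score(query, doc_b, refs))   # True
print(rdm_score(query, doc_a, refs, n=3))  # same pipeline, character 3-grams
```

The toy run illustrates the word-use variability problem the abstract describes. "automobile repair" and "car engine" have zero direct overlap, but both are expanded through the same reference documents, so they end up with a positive similarity. Passing `n=3` swaps the indexing unit without changing the rest of the pipeline, which loosely mirrors the "flexible indexing units" contribution.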



Author information

Corresponding author: Qi Haoliang, Ph.D.

Additional information

Supported by the Heilongjiang Outstanding Young Teacher Fund (1151G037).

About this article

Cite this article

Qi, H., Li, M., Gao, J. et al. Information Retrieval for short documents. J. of Electron.(China) 23, 933–936 (2006). https://doi.org/10.1007/s11767-006-0044-2


  • DOI: https://doi.org/10.1007/s11767-006-0044-2
