Abstract
One of the most important research topics in Information Retrieval is term weighting for document ranking and retrieval, as in TF*IDF and BM25. We propose a term weighting method that uses past retrieval results: the queries that contain a particular term, the retrieved documents, and their relevance judgments. A term’s Discrimination Power (DP) is based on the difference between the term’s average weights in the relevant and non-relevant retrieved document sets. The difference-based DP performs better than the ratio-based DP introduced in previous research. Our experimental results show that a term weighting scheme based on discrimination power outperforms a TF*IDF-based scheme.
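The difference-based and ratio-based notions of DP described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the function names, the dictionary representation of per-document term weights, the `eps` smoothing constant, and the toy data are all assumptions made for illustration. This is not the authors' implementation.

```python
from statistics import mean


def average_weight(term, docs):
    """Mean weight of `term` over a document set (0.0 if the set is empty).

    Each document is a dict mapping terms to weights; a term missing
    from a document is treated as having weight 0.0 (an assumption).
    """
    if not docs:
        return 0.0
    return mean(doc.get(term, 0.0) for doc in docs)


def difference_dp(term, relevant_docs, non_relevant_docs):
    """Difference-based DP: average weight in relevant minus non-relevant docs."""
    return average_weight(term, relevant_docs) - average_weight(term, non_relevant_docs)


def ratio_dp(term, relevant_docs, non_relevant_docs, eps=1e-9):
    """Ratio-based DP for comparison; `eps` (hypothetical) avoids division by zero."""
    return average_weight(term, relevant_docs) / (
        average_weight(term, non_relevant_docs) + eps
    )


# Toy past retrieval results for queries containing the term of interest.
relevant = [{"retrieval": 0.5, "term": 0.25}, {"retrieval": 1.0}]
non_relevant = [{"retrieval": 0.25, "term": 0.25}, {"term": 0.75}]

for t in ("retrieval", "term"):
    print(t, difference_dp(t, relevant, non_relevant))
```

In this toy data, "retrieval" is weighted more heavily in relevant documents and receives a positive difference-based DP, while "term" is weighted more heavily in non-relevant documents and receives a negative one. How such a score is combined with a base weight for ranking is not stated in the abstract and is left out here.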
Cite this article
Li, Q., Lee, S., Jung, H. et al. Term weighting for information retrieval based on term’s discrimination power. Multimed Tools Appl 71, 769–781 (2014). https://doi.org/10.1007/s11042-013-1420-1
DOI: https://doi.org/10.1007/s11042-013-1420-1