Abstract
Over the past decades, many researchers have studied ontology-based query expansion as a way to improve health and biomedical information retrieval, but the reported results are inconsistent. Query expansion with domain ontologies can introduce noise that degrades retrieval performance, so noise control is key to its success. In this paper, we explore three noise control strategies for UMLS-based query expansion. The first is the adoption of a word-phrase hybrid retrieval model; the other two are expansion term weighting and expansion term filtering. All three strategies are implemented on the Indri search engine and evaluated on two standard datasets, OHSUMED and the TREC 2006 Genomics Track. The experimental results indicate that the word-phrase hybrid retrieval model outperforms both the word-based model and the pure phrase-based model, and that it benefits not only baseline retrieval but also query expansion. Expansion term weighting is an effective strategy for suppressing term noise and improving retrieval performance. Expansion term filtering also has a positive effect in most cases but is less effective than the other two strategies. Combining the three strategies yields the best retrieval performance on both datasets.
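As a rough illustration of how the three strategies could fit together in a single Indri query, the sketch below builds a query string using Indri's `#weight`, `#combine`, and `#1` (exact phrase) operators. It is only one plausible reading of the abstract, not the authors' implementation: the function names, the 0.3 expansion weight, the 0.5 filtering threshold, and the per-term relevance scores are all assumptions made for the example.

```python
def hybrid_clause(terms):
    """Word-phrase hybrid clause: each multi-word term contributes an
    exact-phrase operator #1(...) plus its individual words."""
    parts = []
    for term in terms:
        words = term.lower().split()
        if len(words) > 1:
            parts.append("#1(" + " ".join(words) + ")")
        parts.extend(words)
    # Drop duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(parts))
    return "#combine(" + " ".join(unique) + ")"


def build_query(original, expansions, expansion_weight=0.3, min_score=0.5):
    """Combine original and expansion terms into one Indri query.

    original         -- list of query terms as typed by the user
    expansions       -- list of (term, score) pairs; the score is a
                        hypothetical relevance estimate for the term
    expansion_weight -- assumed weight given to the expansion clause
    min_score        -- assumed threshold used for term filtering
    """
    # Term filtering: discard low-scoring expansion terms.
    kept = [term for term, score in expansions if score >= min_score]
    if not kept:
        return hybrid_clause(original)
    # Term weighting: expansion terms count less than the original query.
    return "#weight({:.2f} {} {:.2f} {})".format(
        1.0 - expansion_weight,
        hybrid_clause(original),
        expansion_weight,
        hybrid_clause(kept),
    )


query = build_query(
    ["heart attack"],
    [("myocardial infarction", 0.9), ("attack", 0.1)],
)
print(query)
```

Here the low-scoring expansion term is filtered out, the surviving synonym is down-weighted relative to the original query, and both clauses keep phrases alongside their constituent words.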
Cite this article
Wu, H., Li, J., Kang, Y. et al. Exploring noise control strategies for UMLS-based query expansion in health and biomedical information retrieval. J Ambient Intell Human Comput 15, 1825–1836 (2024). https://doi.org/10.1007/s12652-018-0836-x
DOI: https://doi.org/10.1007/s12652-018-0836-x