Exploring noise control strategies for UMLS-based query expansion in health and biomedical information retrieval

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Ontology-based query expansion has been studied for decades as a way to improve health and biomedical information retrieval, but previously reported results are inconsistent. Query expansion with domain ontologies can introduce noise that degrades retrieval performance, so noise control is key to its success. In this paper, we explore three noise control strategies for UMLS-based query expansion. The first is the adoption of a word-phrase hybrid retrieval model; the other two are expansion term weighting and expansion term filtering. All three strategies are implemented on the Indri search engine and evaluated on two standard datasets, OHSUMED and TREC Genomics Track 2006. The experimental results indicate that the word-phrase hybrid retrieval model outperforms both the word-based model and the pure phrase-based model, and that it benefits not only baseline retrieval but also query expansion. Expansion term weighting is an effective strategy for suppressing term noise and improving retrieval performance. Expansion term filtering also has a positive effect in most cases but is less effective than the other two strategies. Combining the three strategies achieves the best retrieval performance on both datasets.
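
As a rough illustration of how the three strategies could fit together in a single Indri query, the following minimal sketch builds a query string using the Indri operators #combine, #weight, and #1 (exact ordered phrase). It is not the authors' implementation: the function name build_indri_query, the interpolation weight, the filtering threshold, and the expansion scores are all hypothetical values chosen for illustration.

```python
from typing import Dict, List


def build_indri_query(
    words: List[str],
    phrases: List[List[str]],
    expansions: Dict[str, float],
    original_weight: float = 0.7,  # hypothetical interpolation weight
    min_score: float = 0.2,        # hypothetical filtering threshold
) -> str:
    """Build an Indri query mixing words, phrases, and weighted expansion terms."""
    # Word-phrase hybrid: single words plus exact phrases via #1(...).
    parts = list(words) + ["#1(" + " ".join(p) + ")" for p in phrases]
    original = "#combine(" + " ".join(parts) + ")"

    # Term filtering: drop expansion candidates scored below the threshold.
    kept = {t: s for t, s in expansions.items() if s >= min_score}
    if not kept:
        return original

    def as_indri(term: str) -> str:
        # Multi-word expansion terms are matched as exact phrases.
        tokens = term.split()
        return tokens[0] if len(tokens) == 1 else "#1(" + " ".join(tokens) + ")"

    # Term weighting: each surviving expansion term carries its own score.
    weighted = " ".join(f"{s:.2f} {as_indri(t)}" for t, s in sorted(kept.items()))
    expansion = "#weight(" + weighted + ")"

    # Interpolate the original query with the expansion part.
    return (
        f"#weight({original_weight:.2f} {original} "
        f"{1 - original_weight:.2f} {expansion})"
    )


query = build_indri_query(
    words=["heart", "attack"],
    phrases=[["heart", "attack"]],
    expansions={"myocardial infarction": 0.8, "cardiac arrest": 0.1},
)
print(query)
```

In this sketch the low-scoring candidate "cardiac arrest" is filtered out, while "myocardial infarction" is kept as a down-weighted phrase so that it cannot dominate the original query terms.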

Notes

  1. http://umlsinfo.nlm.nih.gov.

  2. http://metamap.nlm.nih.gov.

  3. http://www.lemurproject.org.

Author information

Corresponding author

Correspondence to Jianqiang Li.

About this article

Cite this article

Wu, H., Li, J., Kang, Y. et al. Exploring noise control strategies for UMLS-based query expansion in health and biomedical information retrieval. J Ambient Intell Human Comput 15, 1825–1836 (2024). https://doi.org/10.1007/s12652-018-0836-x

  • DOI: https://doi.org/10.1007/s12652-018-0836-x
