Consumer health search (CHS) is a challenging domain: vocabulary mismatch and the considerable domain expertise required hamper people's ability to formulate effective queries. We posit that using knowledge bases for query reformulation may help alleviate this problem. How to exploit knowledge bases for effective CHS is nontrivial, involving a swathe of key choices and design decisions, many of which are not explored in the literature. Here we rigorously and empirically evaluate the impact these choices have on retrieval effectiveness. A state-of-the-art knowledge-base retrieval model, the Entity Query Feature Expansion model, was used to evaluate these choices, which include: which knowledge base to use (specialised vs. general purpose), how to construct the knowledge base, how to extract entities from queries and map them to entities in the knowledge base, what part of the knowledge base to use for query expansion, and whether to augment the knowledge base search process with relevance feedback. While knowledge base retrieval has been proposed as a solution for CHS, this paper delves into the finer details of doing this effectively, highlighting both payoffs and pitfalls. It aims to provide some lessons to others in advancing the state-of-the-art in CHS.
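To make the chain of choices in the abstract concrete, the following is a minimal sketch of knowledge-base query expansion: mention extraction, entity mapping by complete string match, and selection of a source of expansion. It is not the Entity Query Feature Expansion model itself. The toy knowledge base, its entries, and the names `TOY_KB`, `extract_mentions`, `map_to_entities` and `expand_query` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical toy knowledge base: entity title -> aliases and related terms.
# The entries are illustrative assumptions, not data from the paper.
TOY_KB: Dict[str, Dict[str, List[str]]] = {
    "insomnia": {
        "aliases": ["sleeplessness", "trouble sleeping"],
        "related": ["sleep disorder", "melatonin"],
    },
}


def extract_mentions(query: str, max_len: int = 3) -> List[str]:
    """Return every word n-gram of the query (up to max_len words) as a candidate mention."""
    words = query.lower().split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    ]


def map_to_entities(mentions: Sequence[str], kb: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Map mentions to entities using complete string matches on title or alias."""
    entities = []
    for title, entry in kb.items():
        surface_forms = {title, *entry["aliases"]}
        if any(mention in surface_forms for mention in mentions):
            entities.append(title)
    return entities


def expand_query(
    query: str,
    kb: Dict[str, Dict[str, List[str]]],
    sources: Sequence[str] = ("title", "aliases"),
) -> List[str]:
    """Return expansion phrases drawn from the chosen parts of the knowledge base."""
    expansions: List[str] = []
    for title in map_to_entities(extract_mentions(query), kb):
        for source in sources:
            candidates = [title] if source == "title" else kb[title][source]
            for phrase in candidates:
                # Skip phrases the user already typed or that were already added.
                if phrase not in query.lower() and phrase not in expansions:
                    expansions.append(phrase)
    return expansions


print(expand_query("trouble sleeping at night", TOY_KB))
```

Changing `sources` to include `"related"` widens the expansion, which mirrors the "what part of the knowledge base to use" choice. A real system would also weight the expansion terms rather than simply appending them.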
The Unified Medical Language System (UMLS) is a compendium of many controlled vocabularies in the biomedical sciences.
http://conceptnet.io/c/en/insomnia. Last visited 30/04/2018.
https://sleepfoundation.org/insomnia/content/what-causes-insomnia. Last visited 30/04/2018.
A Wikipedia Infobox is used to summarise important aspects of an entity and its relation with other articles.
Only complete string matches were considered.
ECNU-2 had the highest effectiveness, but it used the Google query suggestion service to obtain expansions.
Jimmy is sponsored by the Indonesia Endowment Fund for Education (Lembaga Pengelola Dana Pendidikan/LPDP). Guido Zuccon is the recipient of an Australian Research Council DECRA Research Fellowship (DE180101579) and a Google Faculty Research Award.
Appendix 1: Statistical significance analysis
Appendix 2: List of abbreviations
Consumer health search
Consumer health vocabulary
Entity query feature expansion
CHV entity mapping
CHV mention extraction
CHV source of expansion
Pseudo relevance feedback
Pseudo relevance feedback health term
Relevance feedback health term
Source of expansion
UMLS entity mapping
UMLS mention extraction
Unified medical language system
UMLS source of expansion
Wikipedia entity mapping
Wikipedia mention extraction
Wikipedia source of expansion
<Number of expanded queries, queries with gain, queries with loss>
The average number of terms added in the expanded query
Mean average precision
Normalised discounted cumulative gain at rank 10
Precision at rank 10
Rank-biased precision at rank 10
Residual of the rank-biased precision
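As an illustration of two of the measures listed above, the following sketch computes precision at rank k and rank-biased precision at rank k together with its residual, using the standard definitions of these measures. The persistence value `p`, the use of `None` to mark an unjudged document, and the function names are assumptions made for this example; they are not taken from the article.

```python
from typing import List, Optional, Tuple


def precision_at_k(rels: List[Optional[int]], k: int = 10) -> float:
    """Fraction of the top-k results judged relevant (unjudged counts as non-relevant)."""
    return sum(1 for r in rels[:k] if r) / k


def rbp_at_k(rels: List[Optional[int]], p: float = 0.8, k: int = 10) -> Tuple[float, float]:
    """Return (RBP, residual) for the top-k results with binary relevance.

    rels[i] is 1 (relevant), 0 (non-relevant) or None (unjudged).
    p is the persistence parameter; 0.8 is an assumed default.
    """
    score = 0.0
    unjudged_weight = 0.0
    for i, rel in enumerate(rels[:k]):
        weight = (1 - p) * p ** i  # weight of rank i + 1
        if rel is None:
            unjudged_weight += weight
        elif rel > 0:
            score += weight
    # The residual is the weight of unjudged documents in the top k
    # plus the total weight of all ranks beyond k, which sums to p ** k.
    residual = unjudged_weight + p ** k
    return score, residual


ranking = [1, None, 0, 1]
print(precision_at_k(ranking, k=4))
print(rbp_at_k(ranking, p=0.5, k=4))
```

The residual bounds how much the score could rise if every unjudged or unseen document turned out to be relevant, so a large residual signals that the reported score is uncertain.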
Jimmy, Zuccon, G. & Koopman, B. Payoffs and pitfalls in using knowledge-bases for consumer health search. Inf Retrieval J 22, 350–394 (2019). https://doi.org/10.1007/s10791-018-9344-z