Information Retrieval Journal, Volume 19, Issue 1–2, pp 189–224

How users search and what they search for in the medical domain

Understanding laypeople and experts through query logs
  • João Palotti
  • Allan Hanbury
  • Henning Müller
  • Charles E. Kahn Jr.
Medical Information Retrieval

Abstract

The Internet is an important source of medical knowledge for everyone, from laypeople to medical professionals. We investigate how these two extremes, in terms of user groups, have distinct needs and exhibit significantly different search behaviour. We use query logs to study various aspects of these two kinds of users. The logs from America Online, Health on the Net, Turning Research Into Practice and American Roentgen Ray Society (ARRS) GoldMiner were divided into three sets: (1) laypeople, (2) medical professionals (such as physicians or nurses) searching for health content and (3) users not seeking health advice. We perform several analyses focusing on how users search and what they are most interested in. As one outcome of this analysis, we built a classifier to infer user expertise; we report its results and analyse the feature set it uses. We conclude that medical experts are more persistent, interacting more with the search engine. Our study also reveals that, contrary to what is stated in much of the literature, the main focus of users, both laypeople and professionals, is on diseases rather than symptoms. The results of this article, especially the classifier, could be used to detect specific user groups and then adapt search results to each group.

Keywords

Query log analysis · Health search · User behavior

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • João Palotti (1)
  • Allan Hanbury (1)
  • Henning Müller (2)
  • Charles E. Kahn Jr. (3)

  1. Vienna University of Technology, Vienna, Austria
  2. University of Applied Sciences and Arts Western Switzerland (HES-SO), Delémont, Switzerland
  3. University of Pennsylvania, Philadelphia, USA
