Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which makes it possible to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only a few terms and are therefore not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have, since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and can thus be used to automatically modify a query to avoid empty result sets for a user. The semantic analysis and the data on reformulations made by users in the past can aid the development of better search systems, particularly to improve results for novice users. This paper therefore offers insights into how people search and how this knowledge can be used to improve the performance of specialized medical search engines.
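As an illustration of the kind of preprocessing described above, the following minimal Python sketch groups log records by masked IP address, splits each user's queries into sessions, and derives a few simple characteristics from a query string. It is not the authors' implementation: the record layout `(masked_ip, timestamp, query)`, the 30-minute session timeout, and the function names `split_sessions` and `query_features` are assumptions made for this example, and the features shown are hypothetical stand-ins for the query-term characteristics used in the prediction model.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_sessions(records, timeout=timedelta(minutes=30)):
    """Group (masked_ip, timestamp, query) records into sessions.

    The 30-minute timeout is an assumed value for this sketch,
    not a parameter taken from the article.
    """
    by_ip = defaultdict(list)
    for ip, ts, query in records:
        by_ip[ip].append((ts, query))

    sessions = []
    for ip, items in by_ip.items():
        items.sort(key=lambda item: item[0])  # chronological order per user
        current = [items[0][1]]
        for (prev_ts, _), (ts, query) in zip(items, items[1:]):
            if ts - prev_ts > timeout:
                # A long pause closes the current session.
                sessions.append((ip, current))
                current = []
            current.append(query)
        sessions.append((ip, current))
    return sessions


def query_features(query):
    """Hypothetical surface features computed from the query text alone."""
    terms = query.split()
    return {
        "n_terms": len(terms),
        "n_chars": len(query),
        "has_digit": any(ch.isdigit() for ch in query),
    }


records = [
    ("ip_a", datetime(2014, 1, 1, 10, 0), "pneumothorax"),
    ("ip_a", datetime(2014, 1, 1, 10, 5), "pneumothorax ct"),
    ("ip_a", datetime(2014, 1, 1, 12, 0), "knee mri"),
    ("ip_b", datetime(2014, 1, 1, 10, 1), "lung"),
]
sessions = split_sessions(records)
```

Consecutive queries within one session (here, "pneumothorax" followed by "pneumothorax ct") are candidate reformulations, and feature dictionaries of this kind could serve as input to a classifier that predicts whether a query returns an empty result list.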
CF: clinical findings, O: object, AE: anatomical entity, NS: non-anatomical substance, RD: RadLex descriptor, PP: property, P: procedure, PS: procedure step, IO: imaging observation, IM: imaging modality, RC: report component, R: report, PC: process.
De-Arteaga, M., Eggel, I., Kahn, C.E. et al. Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results. J Digit Imaging 28, 537–546 (2015). https://doi.org/10.1007/s10278-015-9792-6
Keywords
- Image retrieval
- Human-computer interaction
- Machine learning
- Statistical analysis
- Information storage and retrieval
- Medical image search
- Log file analysis