  • Emily Watt
  • Corey Arnold
  • James Sayre


Evaluation is a cornerstone of informatics, allowing us to objectively assess the strengths and weaknesses of a given tool. These assessments ultimately provide feedback for improving a system and its approach. This final chapter therefore provides an overview of the fundamental techniques used in informatics evaluations. Any quantitative evaluation rests on statistics and formal study design. A review of inferential statistical concepts is provided from the perspective of biostatistics (confidence intervals; hypothesis testing; error assessment, including sensitivity/specificity and receiver operating characteristics). Under study design, differences between observational investigations and controlled experiments are covered, and issues pertaining to population selection and study errors are briefly introduced. With these general tools, we then turn to more specific informatics evaluations, using information retrieval (IR) systems and usability studies as examples to motivate further discussion. Methods for designing both types of evaluations, together with their endpoint metrics, are described in detail.
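
As a rough illustration of the kind of endpoint metrics named above, the sketch below computes sensitivity and specificity from confusion-matrix counts, and mean average precision over ranked IR result lists. It is a minimal example under stated assumptions, not the chapter's own procedure: the function names and the sample counts are hypothetical, relevance is treated as binary, and average precision is normalized by the number of relevant items actually retrieved.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true positives that were detected
    specificity = tn / (tn + fp)  # fraction of true negatives that were rejected
    return sensitivity, specificity


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one ranked result list.

    relevance[i] is 1 if the result at rank i + 1 is relevant, else 0.
    Assumption: every relevant item appears somewhere in the list, so the
    sum is normalized by the number of relevant items retrieved.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """Mean of average precision over several queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical counts and relevance judgments, for illustration only.
    print(sensitivity_specificity(tp=80, fp=10, tn=90, fn=20))
    print(mean_average_precision([[1, 0, 1, 0], [0, 1]]))
```

With the hypothetical counts shown, 80 of 100 positives are detected and 90 of 100 negatives are rejected; varying a decision threshold and recomputing these two quantities is what traces out a receiver operating characteristic curve.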


Keywords: Receiver Operating Characteristic Curve; Receiver Operating Characteristic Analysis; Mean Average Precision; Relevance Judgment; CBIR System
These keywords were added by machine and not by the authors.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Emily Watt (1)
  • Corey Arnold (2)
  • James Sayre (3)

  1. Medical Imaging Informatics, UCLA Biomedical Engineering IDP, Los Angeles, USA
  2. Medical Imaging Informatics & Department of Information Studies, University of California, Los Angeles, USA
  3. Departments of Biostatistics & Radiological Sciences, UCLA David Geffen School of Medicine, Los Angeles, USA
