Journal of General Internal Medicine, Volume 27, Issue 2, pp 213–219

Differential Diagnosis Generators: an Evaluation of Currently Available Computer Programs

  • William F. Bond
  • Linda M. Schwartz
  • Kevin R. Weaver
  • Donald Levick
  • Michael Giuliano
  • Mark L. Graber

DOI: 10.1007/s11606-011-1804-8

Cite this article as:
Bond, W.F., Schwartz, L.M., Weaver, K.R. et al. J GEN INTERN MED (2012) 27: 213. doi:10.1007/s11606-011-1804-8



Background

Differential diagnosis (DDX) generators are computer programs that generate a DDX based on various clinical data.


Objective

We identified evaluation criteria through consensus, applied these criteria to describe the features of DDX generators, and tested performance using cases from the New England Journal of Medicine (NEJM©) and the Medical Knowledge Self-Assessment Program (MKSAP©).


Methods

We first identified evaluation criteria by consensus. We then performed Google® and PubMed searches to identify DDX generators. To be included, a DDX generator had to do the following: generate a list of potential diagnoses rather than text or article references; rank or flag critical diagnoses that need to be considered or eliminated; accept at least two signs, symptoms, or disease characteristics; provide the ability to compare the clinical presentations of diagnoses; and provide diagnoses in general medicine. The evaluation criteria were then applied to the included DDX generators. Lastly, the performance of the DDX generators was tested with findings from 20 test cases. Each case performance was scored from one to five, with a score of five indicating presence of the exact diagnosis. Mean scores and confidence intervals were calculated.

Key Results

Twenty-three programs were initially identified, and four met the inclusion criteria. These four programs were evaluated using the consensus criteria, which included the following: input method; mobile access; filtering and refinement; lab values, medications, and geography as diagnostic factors; evidence-based medicine (EBM) content; references; and drug information content source. The mean scores (95% confidence interval) from performance testing on a five-point scale were Isabel© 3.45 (2.53–4.37), DxPlain® 3.45 (2.63–4.27), Diagnosis Pro® 2.65 (1.75–3.55), and PEPID™ 1.70 (0.71–2.69). The number of exact matches paralleled the mean score findings.


Conclusions

Consensus criteria for DDX generator evaluation were developed. Application of these criteria, as well as performance testing, supports the use of DxPlain® and Isabel© over the other currently available DDX generators.


Key Words

differential diagnosis; clinical decision support systems; diagnostic errors; evidence-based medicine; computer-assisted diagnosis

Supplementary material

Appendix 1: Search Criteria for Finding DDX Programs (DOCX, 11 kb)
Appendix 2: Cases Used for Testing the DDX Generator (DOCX, 13 kb)
Appendix 3: Excluded Programs (DOCX, 12 kb)

Copyright information

© Society of General Internal Medicine 2011

Authors and Affiliations

  • William F. Bond (1, 2)
  • Linda M. Schwartz (2)
  • Kevin R. Weaver (1)
  • Donald Levick (3, 4)
  • Michael Giuliano (5)
  • Mark L. Graber (6)

  1. Department of Emergency Medicine, Lehigh Valley Health Network, Allentown, USA
  2. Division of Education, Lehigh Valley Health Network, Allentown, USA
  3. Department of Information Services, Lehigh Valley Health Network, Allentown, USA
  4. Department of Pediatrics, Lehigh Valley Health Network, Allentown, USA
  5. Department of Pediatrics, Hackensack University Medical Center, Hackensack, USA
  6. Department of Medicine, Veterans Administration Medical Center, Northport, USA