Original Research

Journal of General Internal Medicine, Volume 28, Issue 12, pp 1565-1572


Using Patients Like My Patient for Clinical Decision Support: Institution-Specific Probability of Celiac Disease Diagnosis Using Simplified Near-Neighbor Classification

  • Brian H. Shirts (Department of Pathology, University of Utah School of Medicine; Department of Laboratory Medicine, University of Washington)
  • Sterling T. Bennett (Department of Pathology, University of Utah School of Medicine; Department of Pathology, Intermountain Medical Center)
  • Brian R. Jackson (Department of Pathology, University of Utah School of Medicine; ARUP Institute for Clinical and Experimental Pathology)




Interpretation of a diagnostic test result requires knowing what proportion of patients with a “similar” result has the condition in question. This information is often not readily available from the medical literature, or it may be based on different clinical populations, which makes it inapplicable. In certain settings, where correlated screening parameters and diagnostic data are available in electronic medical records, a representation of diagnostic test performance on “patients like my patient” can be obtained.


We sought to integrate patient demographic and physician practice information using a simplified nearest neighbor algorithm. We used this method to illustrate the relationship between tTG IgA test result and duodenal biopsy for celiac disease in a local diagnostic context.


We used a data set of 1,461 paired tissue transglutaminase (tTG) IgA and definitive duodenal biopsy results from Intermountain Healthcare with data on patient age and ordering physician specialty. This was split into a discovery set of 1,000 and a validation set of 461 paired results.
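The idea of estimating biopsy probability from "patients like my patient" can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity rule, so the filter on specialty and age window, the ranking by tTG distance, the choice of `k`, the Wilson interval, and all record values below are assumptions made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    ttg: float             # tTG IgA result (units are hypothetical here)
    age: float             # patient age in years
    specialty: str         # ordering physician specialty
    biopsy_positive: bool  # duodenal biopsy outcome


def wilson_interval(positives, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n == 0:
        return (0.0, 1.0)
    p = positives / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def near_neighbor_estimate(records, ttg, age, specialty, k=5, age_window=20.0):
    """Estimate P(positive biopsy) from the k records most similar to the query."""
    # Assumed similarity rule: same specialty and age within +/- age_window years.
    pool = [r for r in records
            if r.specialty == specialty and abs(r.age - age) <= age_window]
    # Among those, keep the k records whose tTG result is closest to the query.
    neighbors = sorted(pool, key=lambda r: abs(r.ttg - ttg))[:k]
    n = len(neighbors)
    positives = sum(r.biopsy_positive for r in neighbors)
    prob = positives / n if n else float("nan")
    return prob, wilson_interval(positives, n), n


# Made-up records for illustration only; not data from the study.
records = [
    Record(2, 30, "GI", False),
    Record(5, 35, "GI", False),
    Record(18, 40, "GI", False),
    Record(25, 28, "GI", True),
    Record(60, 33, "GI", True),
    Record(110, 45, "GI", True),
    Record(120, 50, "GI", True),
    Record(100, 8, "pediatrics", True),
    Record(3, 6, "pediatrics", False),
]

prob, (low, high), n = near_neighbor_estimate(records, ttg=90, age=38,
                                              specialty="GI", k=3)
print(f"estimated probability {prob:.2f} from {n} neighbors, "
      f"interval {low:.2f} to {high:.2f}")
```

In a sketch like this, the interval widens when few similar patients exist, which is one way an estimate could reflect how well the local data cover a given clinical situation.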


We measured how accurately the local discovery data set predicted the probability of positive duodenal biopsy in the test data, along with the confidence intervals around the predicted probability, and compared these with probabilities of positive biopsy implied by published logistic regression and by published sensitivity and specificity studies.
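For context on the comparison method, a probability of disease can be derived from published sensitivity and specificity using Bayes' theorem once a prevalence (pre-test probability) is assumed. The sketch below shows that standard calculation; the function name and the numeric inputs are illustrative assumptions, not values from the study or from any published tTG IgA evaluation.

```python
def post_test_probability(sensitivity, specificity, prevalence, test_positive=True):
    """Probability of disease after a test result, via Bayes' theorem."""
    if test_positive:
        true_pos = sensitivity * prevalence
        false_pos = (1 - specificity) * (1 - prevalence)
        return true_pos / (true_pos + false_pos)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    return false_neg / (false_neg + true_neg)


# Illustrative inputs only: 90% sensitivity, 90% specificity, 10% prevalence.
after_positive = post_test_probability(0.9, 0.9, 0.1, test_positive=True)
after_negative = post_test_probability(0.9, 0.9, 0.1, test_positive=False)
print(f"after a positive result: {after_positive:.3f}")
print(f"after a negative result: {after_negative:.3f}")
```

This calculation treats the test as a binary positive or negative result and depends heavily on the assumed prevalence, which is one reason estimates implied by published studies may not transfer to a different local population.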


The near-neighbor method could estimate probability of clinical outcomes with predictive performance equivalent to other methods while adjusting probability estimates and confidence intervals to fit specific clinical situations.


Data from clinical encounters obtained from electronic medical records can yield prediction estimates that are tailored to the individual patient, local population, and healthcare delivery processes. Local analysis of diagnostic probability may be more clinically meaningful than probabilities inferred from published studies. This local utility may come at the expense of external validity and generalizability.


personalized medicine; multifactorial analysis; gluten sensitive enteropathy; laboratory test information content; evidence based medicine; nearest neighbor