Modeling Perceptual Similarity Measures in CT Images of Focal Liver Lesions
- 215 Downloads
Motivation: A gold standard for perceptual similarity in medical images is vital to content-based image retrieval, but inter-reader variability complicates development. Our objective was to develop a statistical model that predicts the number of readers (N) necessary to achieve acceptable levels of variability. Materials and Methods: We collected 3 radiologists’ ratings of the perceptual similarity of 171 pairs of CT images of focal liver lesions rated on a 9-point scale. We modeled the readers’ scores as bimodal distributions in additive Gaussian noise and estimated the distribution parameters from the scores using an expectation maximization algorithm. We (a) sampled 171 similarity scores to simulate a ground truth and (b) simulated readers by adding noise, with standard deviation between 0 and 5 for each reader. We computed the mean values of 2–50 readers’ scores and calculated the agreement (AGT) between these means and the simulated ground truth, and the inter-reader agreement (IRA), using Cohen’s Kappa metric. Results: IRA for the empirical data ranged from =0.41 to 0.66. For between 1.5 and 2.5, IRA between three simulated readers was comparable to agreement in the empirical data. For these values , AGT ranged from =0.81 to 0.91. As expected, AGT increased with N, ranging from =0.83 to 0.92 for N = 2 to 50, respectively, with =2. Conclusion: Our simulations demonstrated that for moderate to good IRA, excellent AGT could nonetheless be obtained. This model may be used to predict the required N to accurately evaluate similarity in arbitrary size datasets.
KeywordsContent-based image retrieval Decision support Image perception Observer variation Observer performance Simulation Inter-observer variation Liver tumor
We are grateful to the following people for participating in our study: Aya Kamaya MD, Grace Tye MD, and Ronald Summers MD, PhD. We would like to acknowledge these following funding sources for supporting this project: SIIM 2011–2012 Research Grant, NIH Training Grant T32 GM063495.
- 15.Faruque J, Rubin D, Beaulieu C, Rosenberg J, Kamaya A, Tye G, Summers R, Napel S: A Scalable Reference Standard of Visual Similarity for a Content-Based Image Retrieval System. IEEE Symposium on Healthcare, Informatics, and Systems Biology, San Jose, 2011 158–165Google Scholar
- 17.Gwet K: Statistical Tables for Inter-Rater Agreement. StatAxis Publishing, Gaithersburg, 2001Google Scholar
- 19.Fisher R: Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh, 1925Google Scholar