Journal of Digital Imaging, Volume 26, Issue 4, pp 714–720

Modeling Perceptual Similarity Measures in CT Images of Focal Liver Lesions

  • Jessica Faruque
  • Daniel L. Rubin
  • Christopher F. Beaulieu
  • Sandy Napel


Motivation: A gold standard for perceptual similarity in medical images is vital to content-based image retrieval, but inter-reader variability complicates its development. Our objective was to develop a statistical model that predicts the number of readers (N) necessary to achieve acceptable levels of variability.

Materials and Methods: Three radiologists rated the perceptual similarity of 171 pairs of CT images of focal liver lesions on a 9-point scale. We modeled the readers’ scores as bimodal distributions in additive Gaussian noise and estimated the distribution parameters from the scores using an expectation maximization algorithm. We (a) sampled 171 similarity scores to simulate a ground truth and (b) simulated readers by adding noise, with standard deviation σ between 0 and 5 for each reader. We computed the mean values of 2–50 readers’ scores and calculated the agreement (AGT) between these means and the simulated ground truth, and the inter-reader agreement (IRA), using Cohen’s kappa (κ).

Results: IRA for the empirical data ranged from κ = 0.41 to 0.66. For σ between 1.5 and 2.5, IRA between three simulated readers was comparable to agreement in the empirical data. For these values of σ, AGT ranged from κ = 0.81 to 0.91. As expected, AGT increased with N, rising from κ = 0.83 at N = 2 to κ = 0.92 at N = 50, with σ = 2.

Conclusion: Our simulations demonstrated that even with only moderate to good IRA, excellent AGT can be obtained. This model may be used to predict the N required to accurately evaluate similarity in datasets of arbitrary size.
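The simulation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the mixture parameters (`means`, `sds`, `weight`) are hypothetical placeholders rather than the values fitted by expectation maximization, the function names are invented for this sketch, and the use of linearly weighted kappa is an assumption, since the abstract only says "Cohen's Kappa". The noise standard deviation is called `sigma` in the code.

```python
import numpy as np


def sample_truth(rng, n_pairs=171, means=(3.0, 7.0), sds=(1.0, 1.0), weight=0.5):
    """Draw simulated ground-truth scores from a two-component Gaussian
    mixture, rounded and clipped to the 9-point scale.
    The mixture parameters are hypothetical, not fitted values."""
    first = rng.random(n_pairs) < weight
    raw = np.where(first,
                   rng.normal(means[0], sds[0], n_pairs),
                   rng.normal(means[1], sds[1], n_pairs))
    return np.clip(np.rint(raw), 1, 9).astype(int)


def simulate_readers(rng, truth, n_readers, sigma):
    """Simulate readers by adding zero-mean Gaussian noise (std sigma)
    to the ground truth, then rounding and clipping to 1..9."""
    noise = rng.normal(0.0, sigma, size=(n_readers, truth.size))
    return np.clip(np.rint(truth + noise), 1, 9).astype(int)


def cohen_kappa(a, b, n_cat=9, weighted=True):
    """Cohen's kappa for integer scores in 1..n_cat.
    weighted=True uses linear disagreement weights (an assumption);
    weighted=False gives the plain unweighted statistic."""
    obs = np.zeros((n_cat, n_cat))
    np.add.at(obs, (a - 1, b - 1), 1)          # joint score counts
    obs /= obs.sum()
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))  # chance agreement
    i, j = np.indices((n_cat, n_cat))
    w = np.abs(i - j) / (n_cat - 1) if weighted else (i != j).astype(float)
    return 1.0 - (w * obs).sum() / (w * exp).sum()


def agreement_with_truth(rng, truth, n_readers, sigma):
    """AGT: kappa between the rounded mean of n_readers and the truth."""
    readers = simulate_readers(rng, truth, n_readers, sigma)
    mean_score = np.clip(np.rint(readers.mean(axis=0)), 1, 9).astype(int)
    return cohen_kappa(mean_score, truth)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = sample_truth(rng)
    for n in (2, 5, 10, 50):
        print(n, round(agreement_with_truth(rng, truth, n, sigma=2.0), 3))
```

Because the placeholder mixture and the weighting scheme differ from whatever the study actually used, the printed values should not be expected to match the reported κ figures; the sketch only shows how averaging more simulated readers tends to pull the mean score back toward the simulated ground truth.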


Keywords: Content-based image retrieval; Decision support; Image perception; Observer variation; Observer performance; Simulation; Inter-observer variation; Liver tumor



We are grateful to the following people for participating in our study: Aya Kamaya MD, Grace Tye MD, and Ronald Summers MD, PhD. We would like to acknowledge the following funding sources for supporting this project: SIIM 2011–2012 Research Grant and NIH Training Grant T32 GM063495.



Copyright information

© Society for Imaging Informatics in Medicine 2012

Authors and Affiliations

  • Jessica Faruque (1)
  • Daniel L. Rubin (2)
  • Christopher F. Beaulieu (3)
  • Sandy Napel (4)

  1. Electrical Engineering Department, Stanford University, Stanford, USA
  2. Departments of Radiology and Medicine (Biomedical Informatics Research), Stanford University, Stanford, USA
  3. Department of Radiology and, by courtesy, Orthopedic Surgery, Stanford University, Stanford, USA
  4. Departments of Radiology and, by courtesy, Medicine (Biomedical Informatics Research) and Electrical Engineering, Stanford University, Stanford, USA
