
Information Retrieval, Volume 17, Issue 3, pp 229–264

Multimodal biomedical image indexing and retrieval using descriptive text and global feature mapping

  • Matthew S. Simpson (Email author)
  • Dina Demner-Fushman
  • Sameer K. Antani
  • George R. Thoma
Article

Abstract

The images found within biomedical articles are sources of essential information useful for a variety of tasks. Due to the rapid growth of biomedical knowledge, image retrieval systems are increasingly becoming necessary tools for quickly accessing the most relevant images from the literature for a given information need. Unfortunately, article text can be a poor substitute for image content, limiting the effectiveness of existing text-based retrieval methods. Additionally, the use of visual similarity by content-based retrieval methods as the sole indicator of image relevance is problematic since the importance of an image can depend on its context rather than its appearance. For biomedical image retrieval, multimodal approaches are often desirable. We describe in this work a practical multimodal solution for indexing and retrieving the images contained in biomedical articles. Recognizing the importance of text in determining image relevance, our method combines a predominately text-based image representation with a limited amount of visual information, in the form of quantized content-based visual features, through a process called global feature mapping. The resulting multimodal image surrogates are easily indexed and searched using existing text-based retrieval systems. Our experimental results demonstrate that our multimodal strategy significantly improves upon the retrieval accuracy of existing approaches. In addition, unlike many retrieval methods that utilize content-based visual features, the response time of our approach is negligible, making it suitable for use with large collections.
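To make the idea of quantized visual features concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how features are quantized or how the resulting tokens are named, so everything here is an assumption: plain k-means with a deterministic farthest-point seeding as the quantizer, a hypothetical feature called `color`, toy two-dimensional vectors, and a hypothetical token format such as `color000`. It only illustrates how a cluster index can become a word-like token that is appended to an image's descriptive text, so that an ordinary text index can store it.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[List[float]], v: Vector) -> int:
    """Return the index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: squared_distance(centroids[i], v))


def farthest_point_seeds(vectors: List[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding (an assumption): start from the first vector,
    then repeatedly add the vector farthest from all seeds chosen so far."""
    seeds = [list(vectors[0])]
    while len(seeds) < k:
        far = max(vectors, key=lambda v: min(squared_distance(v, s) for s in seeds))
        seeds.append(list(far))
    return seeds


def kmeans(vectors: List[Vector], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain Lloyd-style k-means; returns the k centroids (the codebook)."""
    centroids = farthest_point_seeds(vectors, k)
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        for i, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_surrogate(caption: str,
                    features: Dict[str, Vector],
                    codebooks: Dict[str, List[List[float]]]) -> str:
    """Append one word-like token per visual feature to the image's text."""
    words = []
    for name, vec in sorted(features.items()):
        cluster_id = nearest(codebooks[name], vec)
        words.append(f"{name}{cluster_id:03d}")  # hypothetical token format
    return " ".join([caption, *words])


# Toy training vectors for a hypothetical global feature named "color".
training = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
codebooks = {"color": kmeans(training, k=2)}

# The surrogate is ordinary text and could be passed to any text indexer.
print(build_surrogate("chest CT scan", {"color": [0.05, 0.0]}, codebooks))
```

Because the surrogate is a plain string, searching it needs no nearest-neighbor computation at query time; a query image would be quantized with the same codebooks and its tokens added to the text query.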

Keywords

Multimodal image retrieval, Image indexing, Clustering

Notes

Acknowledgments

The authors would like to thank Dr. Md. Mahmudur Rahman and Srinivas Phadnis for extracting and preparing the content-based and text-based features of the images used in this work. This work is supported by the intramural research program of the U.S. National Library of Medicine, National Institutes of Health, and by an appointment to the NLM Research Participation Program administered by the Oak Ridge Institute for Science and Education.


Copyright information

© Springer Science+Business Media New York (outside the USA) 2013

Authors and Affiliations

  • Matthew S. Simpson (1), Email author
  • Dina Demner-Fushman (1)
  • Sameer K. Antani (1)
  • George R. Thoma (1)

  1. Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, National Institutes of Health, Bethesda, USA
