Figure Retrieval from Collections of Research Articles

  • Saar Kuzi
  • ChengXiang Zhai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11437)


In this paper, we introduce and study a new task of figure retrieval, in which the retrieval units are figures of research articles and the task is to rank figures in response to a query. As a first step toward addressing this task, we focus on textual queries and represent a figure using text extracted from its article. We propose several retrieval methods for the task and study their effectiveness. We build a test collection by using research articles from the ACL Anthology corpus and treating figure captions as queries. Although this data set has some limitations, it allowed us to obtain interesting preliminary results on the relative effectiveness of different representations of a figure and different retrieval methods. These results also shed some light on possible types of information need and on potential challenges in figure retrieval.
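To make the task formulation concrete, the following is a minimal sketch of ranking figures by a text representation in response to a textual query. It is an illustration under stated assumptions, not the methods evaluated in the paper: the abstract does not specify the retrieval functions, so BM25 is used here as a stand-in, and the function names, figure identifiers, and parameter values (`k1`, `b`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_figures(query, figure_texts, k1=1.2, b=0.75):
    """Return figure ids sorted by BM25 score, best match first.

    figure_texts maps a (hypothetical) figure id to the text chosen to
    represent that figure, e.g. text extracted from its article.
    BM25 is an assumed stand-in for the retrieval method.
    """
    docs = {fid: tokenize(text) for fid, text in figure_texts.items()}
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: how many figure representations contain each term.
    df = Counter(term for toks in docs.values() for term in set(toks))
    query_terms = set(tokenize(query))

    scores = {}
    for fid, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[fid] = score
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy figure representations; in the setting described above, the query
    # would be a figure caption and the representation would come from the article.
    figures = {
        "fig1": "Parse tree of an example sentence produced by the dependency parser",
        "fig2": "Accuracy of the translation model as a function of training set size",
        "fig3": "Architecture of the neural network encoder",
    }
    print(rank_figures("dependency parse tree", figures))
```

Different choices for the text placed in `figure_texts` correspond to different figure representations, which is one of the dimensions the abstract says is compared.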



We thank the reviewers for their useful comments. This material is based upon work supported by the National Science Foundation under Grant No. 1801652.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Illinois at Urbana-Champaign, Urbana, USA
