FigureSeer: Parsing Result-Figures in Research Papers

  • Noah Siegel
  • Zachary Horvitz
  • Roie Levin
  • Santosh Divvala
  • Ali Farhadi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9911)


‘Which pedestrian detectors yield a precision above 95% at 25% recall?’ Answering such a complex query involves identifying and analyzing the results reported in figures across several research papers. Despite the availability of excellent academic search engines, retrieving such information remains cumbersome today, as these systems have primarily focused on understanding the text content of scholarly documents. In this paper, we introduce FigureSeer, an end-to-end framework for parsing result-figures that enables powerful search and retrieval of results in research papers. Our proposed approach automatically localizes figures in research papers, classifies them, and analyzes the content of the result-figures. The key challenge in analyzing the figure content is the extraction of the plotted data and its association with the legend entries. We address this challenge by formulating a novel graph-based reasoning approach using a CNN-based similarity metric. We present a thorough evaluation on a real-world annotated dataset to demonstrate the efficacy of our approach.
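The paper itself details the model; as a loose illustration of the graph-based reasoning step described above, the sketch below traces one curve through a plot by picking, for each x-column, the y-position that best matches a legend entry under a CNN-derived similarity score, with a smoothness penalty on vertical jumps. The Viterbi-style dynamic program, the `trace_curve` name, and the scalar `smoothness` penalty are illustrative assumptions, not the authors' implementation; the `similarity` matrix stands in for whatever patch-vs-legend scores the CNN produces.

```python
import numpy as np


def trace_curve(similarity, smoothness=1.0):
    """Assign one y-position per x-column to a single legend entry.

    similarity[y, x] is an (assumed) CNN score for how well the plot
    patch at (y, x) matches the legend entry's appearance.  A Viterbi-
    style DP maximizes total similarity minus a penalty on vertical
    jumps between neighboring columns, yielding a smooth curve trace.
    """
    H, W = similarity.shape
    score = np.empty((H, W))
    back = np.zeros((H, W), dtype=int)
    score[:, 0] = similarity[:, 0]
    ys = np.arange(H)
    for x in range(1, W):
        # trans[y, yp]: score of ending at yp in column x-1, then
        # jumping to y, paying |y - yp| * smoothness for the jump.
        trans = score[:, x - 1][None, :] - smoothness * np.abs(ys[:, None] - ys[None, :])
        back[:, x] = np.argmax(trans, axis=1)
        score[:, x] = trans[np.arange(H), back[:, x]] + similarity[:, x]
    # Backtrace the best path from the last column.
    path = np.zeros(W, dtype=int)
    path[-1] = int(np.argmax(score[:, -1]))
    for x in range(W - 1, 0, -1):
        path[x - 1] = back[path[x], x]
    return path
```

Running one such trace per legend entry (each with its own similarity map) gives the data-to-legend association; the smoothness term is what makes this a reasoning problem over the whole figure rather than an independent per-column classification.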





This work was in part supported by ONR N00014-13-1-0720, NSF IIS-1338054, and an Allen Distinguished Investigator Award. We thank Isaac Cowhey, Rodney Kinney, Christopher Clark, Eric Kolve, and Jake Mannix for their help in this work.

Supplementary material

Supplementary material 1 (PDF, 1.5 MB): 419982_1_En_41_MOESM1_ESM.pdf



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Noah Siegel 1, 2 (email author)
  • Zachary Horvitz 1, 2
  • Roie Levin 1, 2
  • Santosh Divvala 1, 2
  • Ali Farhadi 1, 2
  1. Allen Institute for Artificial Intelligence, Seattle, USA
  2. University of Washington, Seattle, USA
