Finding Captions in PDF-Documents for Semantic Annotations of Images

  • Gerd Maderlechner
  • Jiri Panyr
  • Peter Suda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4109)


The Portable Document Format (PDF) is widely used on the Web and is indexed by search engines, but only for its text content. The goal of this work is to extract and annotate the images in PDF documents, so that they become searchable and can be annotated semantically. The first step extracts the images, converts them into a standard format such as JPEG, and recognizes the corresponding image captions using the layout structure and geometric relationships. The second step applies linguistic-semantic analysis to the caption text in the context of the document domain. On a PDF document collection of about 3300 pages containing 6500 images, the method identifies the correct image captions with a precision of 95.5% and a recall of 88.8%.
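The abstract does not spell out the geometric rules used to pair an image with its caption. As an illustration only, the following minimal Python sketch assumes that each image and each text block already has a bounding box in page coordinates, and picks as caption the nearest text block directly below or above the image that overlaps it horizontally. The names `Box`, `TextBlock`, `find_caption`, and the thresholds `max_gap` and `min_overlap` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class TextBlock:
    """A text block from layout analysis (hypothetical structure)."""
    box: Box
    text: str


def horizontal_overlap(a: Box, b: Box) -> float:
    """Fraction of the narrower box's width that is covered by the other box."""
    overlap = min(a.x1, b.x1) - max(a.x0, b.x0)
    narrower = min(a.x1 - a.x0, b.x1 - b.x0)
    return max(0.0, overlap) / narrower if narrower > 0 else 0.0


def vertical_gap(image: Box, block: Box) -> Optional[float]:
    """Distance to a block lying entirely below or above the image, else None."""
    if block.y0 >= image.y1:      # block starts below the image
        return block.y0 - image.y1
    if block.y1 <= image.y0:      # block ends above the image
        return image.y0 - block.y1
    return None                   # block is beside or overlapping the image


def find_caption(
    image: Box,
    blocks: Iterable[TextBlock],
    max_gap: float = 30.0,        # assumed threshold, in page units
    min_overlap: float = 0.5,     # assumed threshold, fraction of width
) -> Optional[TextBlock]:
    """Return the closest horizontally aligned block within max_gap, if any."""
    best: Optional[TextBlock] = None
    best_gap = max_gap
    for block in blocks:
        gap = vertical_gap(image, block.box)
        if gap is None or gap > best_gap:
            continue
        if horizontal_overlap(image, block.box) < min_overlap:
            continue
        best, best_gap = block, gap
    return best
```

A real system would likely combine such geometric cues with further layout evidence (font, block type, cue words such as "Fig."); this sketch only shows the proximity-and-alignment idea.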


  • Text Line
  • Semantic Annotation
  • Text Block
  • Portable Document Format
  • Layout Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Gerd Maderlechner (1)
  • Jiri Panyr (1)
  • Peter Suda (1)
  1. Corporate Technology, Siemens AG, München, Germany
