Footnote-Based Document Image Classification

  • Sara Zhalehpour
  • Andrew Piper
  • Chad Wellmon
  • Mohamed Cheriet
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10317)

Abstract

Analyzing historical document images is a challenging task due to their complex and unusual layouts, and automatically locating footnotes in them is harder still. Detecting footnotes is essential for scholars seeking to analyze and answer key questions about historical documents. In this work, we present a new framework for footnote detection in historical documents. To this end, we use the most salient feature of footnotes: their smaller font size relative to the rest of the page content. We propose three types of features that track font-size changes and feed them to two classifiers, SVM and AdaBoost. On our dataset, the framework achieves promising results, with accuracies above 80% for both classifiers.
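
The pipeline the abstract describes, page-level font-size features fed to SVM and AdaBoost classifiers, can be sketched with scikit-learn. The feature definitions and synthetic data below are hypothetical stand-ins for illustration only; they are not the paper's actual three feature types or dataset:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical per-page feature vectors (illustrative, not the paper's
# features): a page containing footnotes exhibits a second, smaller font
# mode, so we simulate [mean text-line height, std of line heights,
# fraction of small connected components] for each page.
n = 200
with_footnotes = rng.normal([12.0, 3.5, 0.30], [1.0, 0.5, 0.05], size=(n, 3))
no_footnotes = rng.normal([12.0, 1.0, 0.05], [1.0, 0.3, 0.03], size=(n, 3))
X = np.vstack([with_footnotes, no_footnotes])
y = np.array([1] * n + [0] * n)  # 1 = page has footnotes

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train both classifiers named in the abstract and compare accuracies.
results = {}
for clf in (SVC(kernel="rbf"), AdaBoostClassifier(n_estimators=50)):
    clf.fit(X_tr, y_tr)
    results[type(clf).__name__] = accuracy_score(y_te, clf.predict(X_te))

print(results)
```

On cleanly separable synthetic features like these, both classifiers exceed the 80% mark reported in the abstract; real historical pages, with noisy segmentation and varied typography, are considerably harder.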

Keywords

Visual information retrieval · Footnote detection · Historical documents classification

Notes

Acknowledgments

This publication was made possible by a grant from SSHRC Canada for “The Visibility of Knowledge” project. We would also like to express our gratitude to Ehsan Arabnejad for his detailed and valuable comments.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Sara Zhalehpour
    • 1
  • Andrew Piper
    • 2
  • Chad Wellmon
    • 3
  • Mohamed Cheriet
    • 1
  1. École de Technologie Supérieure, University of Quebec, Montreal, Canada
  2. McGill University, Montreal, Canada
  3. University of Virginia, Charlottesville, USA