Attribute CNNs for word spotting in handwritten documents

Abstract

Word spotting has become a field of strong research interest in document image analysis in recent years. Recently, AttributeSVMs were proposed, which predict a binary attribute representation (Almazán et al. in IEEE Trans Pattern Anal Mach Intell 36(12):2552–2566, 2014). At the time, this influential method defined the state of the art in segmentation-based word spotting. In this work, we present an approach for learning attribute representations with convolutional neural networks (CNNs). By taking a probabilistic perspective on training CNNs, we derive two different loss functions for binary and real-valued word string embeddings. In addition, we propose two different CNN architectures specifically designed for word spotting, both of which can be trained in an end-to-end fashion. In a number of experiments, we investigate the influence of different word string embeddings and optimization strategies. We show that our attribute CNNs achieve state-of-the-art results for segmentation-based word spotting on a large variety of data sets.
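
To make the notion of a binary attribute representation of a word string more concrete, the following is a minimal sketch under stated assumptions, not the embedding or the loss derived in this work. The helper `char_occurrence_embedding`, the alphabet, and the pyramid levels are hypothetical choices for illustration, and binary cross-entropy is used as one plausible loss for binary targets if each attribute is modeled as an independent Bernoulli variable.

```python
import math
import string

# Assumed alphabet for illustration: lowercase Latin letters and digits.
ALPHABET = string.ascii_lowercase + string.digits


def char_occurrence_embedding(word, levels=(1, 2)):
    """Return a binary attribute vector for a word string.

    For every pyramid level the word is split into `level` equal regions;
    each region contributes one bit per alphabet character, set to 1 if
    that character occurs with at least half of its span inside the region.
    """
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            # Integer coordinates scaled by n * level avoid rounding issues.
            r_start, r_end = region * n, (region + 1) * n
            bits = [0] * len(ALPHABET)
            for i, ch in enumerate(word):
                if ch not in ALPHABET:
                    continue  # characters outside the alphabet are ignored
                c_start, c_end = i * level, (i + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                if 2 * overlap >= level:
                    bits[ALPHABET.index(ch)] = 1
            vec.extend(bits)
    return vec


def binary_cross_entropy(targets, probs, eps=1e-12):
    """Sum of per-attribute binary cross-entropy terms.

    `probs` stands in for the sigmoid outputs of a network; here they are
    plain numbers supplied by the caller.
    """
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total


if __name__ == "__main__":
    target = char_occurrence_embedding("word")
    uninformed = [0.5] * len(target)  # a predictor that knows nothing
    print(len(target), sum(target))
    print(binary_cross_entropy(target, uninformed))
```

In a setup like this, a network that predicts such vectors from word images would place images and query strings in the same attribute space, so retrieval could reduce to a nearest-neighbor search between a query embedding and the predicted vectors.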

Notes

  1. https://memory.loc.gov/ammem/gwhtml/.
  2. http://www.fki.inf.unibe.ch/databases/iam-historical-document-database/washington-database.
  3. http://ciir.cs.umass.edu/downloads/old/data_sets.html.
  4. Cross validation partitions available at https://github.com/almazan/watts/tree/master/data.
  5. https://www.prhlt.upv.es/contests/icfhr2016-kws/data.html.
  6. https://github.com/ssudholt/phocnet.
  7. We denote the classic stochastic gradient descent optimization as SGD and the Adam optimization [21] as Adam although technically Adam is a form of stochastic gradient descent as well.
  8. http://scikit-learn.org/.

References

  1. Aggarwal, C.C., Hinneburg, A., Keim, D.A.: On the surprising behavior of distance metrics in high dimensional spaces. In: International Conference on Database Theory, pp. 420–434 (2001)
  2. Aldavert, D., Rusiñol, M., Toledo, R., Lladós, J.: Integrating visual and textual cues for query-by-string word spotting. In: Proceedings of the International Conference on Document Analysis and Recognition, pp. 511–515 (2013)
  3. Almazán, J., Gordo, A., Fornés, A., Valveny, E.: Word spotting and recognition with embedded attributes. IEEE Trans. Pattern Anal. Mach. Intell. 36(12), 2552–2566 (2014)
  4. Balntas, V., Johns, E., Tang, L., Mikolajczyk, K.: PN-Net: conjoined triple deep network for learning local image descriptors. arXiv (2016)
  5. Chollet, F.: Information-theoretical label embeddings for large-scale image classification. arXiv (2016)
  6. Dai, B., Ding, S., Wahba, G.: Multivariate Bernoulli distribution. Bernoulli 19(4), 1465–1483 (2013)
  7. Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78 (2012)
  8. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
  9. Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: Computer Vision and Pattern Recognition, pp. 1778–1785. Miami (2009)
  10. Fischer, A., Keller, A., Frinken, V., Bunke, H.: HMM-based word spotting in handwritten documents using subword models. In: Proceedings of the International Conference on Pattern Recognition, pp. 3416–3419 (2010)
  11. Frinken, V., Fischer, A., Manmatha, R., Bunke, H.: A novel word spotting method based on recurrent neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 34, 211–224 (2012)
  12. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: Proceedings of the International Conference on Machine Learning, pp. 1050–1059. New York City (2016)
  13. Giotis, A.P., Sfikas, G., Gatos, B., Nikou, C.: A survey of document image word spotting techniques. Pattern Recogn. 68, 310–332 (2017)
  14. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. 15, pp. 315–323 (2011)
  15. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Proceedings of the European Conference on Computer Vision, pp. 346–361 (2014)
  16. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the International Conference on Computer Vision, pp. 1026–1034 (2015)
  17. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 770–778. Las Vegas (2016)
  18. Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Synthetic data and artificial neural networks for natural scene text recognition. In: Neural Information Processing Systems. Montreal (2014)
  19. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: ACM Conference on Multimedia, pp. 675–678. Orlando (2014)
  20. Johnson, J., Karpathy, A., Fei-Fei, L.: DenseCap: fully convolutional localization networks for dense captioning. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 4565–4574. Las Vegas (2016)
  21. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations. San Diego (2015)
  22. Kleber, F., Fiel, S., Diem, M., Sablatnig, R.: CVL-database: an off-line database for writer retrieval, writer identification and word spotting. In: International Conference on Document Analysis and Recognition, pp. 560–564. Washington (2013)
  23. Kołcz, A., Alspector, J., Augusteijn, M., Carlson, R., Viorel Popescu, G.: A line-oriented approach to word spotting in handwritten documents. Pattern Anal. Appl. 3(2), 154–168 (2000)
  24. Krishnan, P., Dutta, K., Jawahar, C.: Deep feature embedding for accurate recognition and retrieval of handwritten text. In: Proceedings of the International Conference on Frontiers in Handwriting Recognition, pp. 289–294 (2016)
  25. Krishnan, P., Jawahar, C.: Matching handwritten document images. In: European Conference on Computer Vision. Amsterdam (2016)
  26. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105. Montreal (2012)
  27. Lampert, C.H., Nickisch, H., Harmeling, S.: Learning to detect unseen object classes by between-class attribute transfer. In: Computer Vision and Pattern Recognition, pp. 951–958. Miami (2009)
  28. Lampert, C.H., Nickisch, H., Harmeling, S.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Pattern Anal. Mach. Intell. 36(3), 453–465 (2014)
  29. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Computer Vision and Pattern Recognition, vol. 2, pp. 2169–2178. New York City (2006)
  30. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, pp. 396–404. Denver (1990)
  31. Manmatha, R., Han, C., Riseman, E.: Word spotting: a new approach to indexing handwriting. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1–29 (1996)
  32. Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recogn. 5(1), 39–46 (2002)
  33. Nielsen, M.A.: Neural Networks and Deep Learning. Determination Press (2015)
  34. Ojala, M., Garriga, G.C.: Permutation tests for studying classifier performance. J. Mach. Learn. Res. 11, 1833–1863 (2010)
  35. Pechwitz, M., Maddouri, S., Märgner, V.: IFN/ENIT-database of handwritten Arabic words. Colloque International Francophone sur l’Ecrit et le Document, pp. 1–8 (2002)
  36. Poznanski, A., Wolf, L.: CNN-N-gram for handwriting word recognition. In: Computer Vision and Pattern Recognition, pp. 2305–2314. Las Vegas (2016)
  37. Pratikakis, I., Zagoris, K., Gatos, B., Puigcerver, J., Toselli, A.H., Vidal, E.: ICFHR2016 handwritten keyword spotting competition (H-KWS 2016). In: International Conference on Frontiers in Handwriting Recognition, pp. 613–618. Shenzhen (2016)
  38. Rath, T.M., Manmatha, R.: Word spotting for historical documents. Int. J. Doc. Anal. Recogn. 9, 139–152 (2007)
  39. Retsinas, G., Sfikas, G., Gatos, B.: Transferable deep features for keyword spotting. In: Proceedings of the European Signal Processing Conference. Kos Island (2017)
  40. Rodríguez-Serrano, J.A., Perronnin, F.: A model-based sequence similarity with application to handwritten word spotting. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2108–2120 (2012)
  41. Rodríguez-Serrano, J.A., Perronnin, F.: Label embedding for text recognition. In: British Machine Vision Conference (2013)
  42. Romero, V., Fornés, A., Serrano, N., Sánchez, J.A., Toselli, A.H., Frinken, V., Vidal, E., Lladós, J.: The ESPOSALLES database: an ancient marriage license corpus for off-line handwriting recognition. Pattern Recogn. 46(6), 1658–1669 (2013)
  43. Rothacker, L., Fink, G.A.: Segmentation-free query-by-string word spotting with bag-of-features HMMs. In: International Conference on Document Analysis and Recognition, pp. 661–665. Nancy (2015)
  44. Rothacker, L., Rusiñol, M., Fink, G.A.: Bag-of-features HMMs for segmentation-free word spotting in handwritten documents. In: International Conference on Document Analysis and Recognition, pp. 1305–1309 (2013)
  45. Rothacker, L., Sudholt, S., Rusakov, E., Kasperidus, M., Fink, G.A.: Word hypotheses for segmentation-free word spotting in historic document images. In: Proceedings of the International Conference on Document Analysis and Recognition. Kyoto (2017)
  46. Rusiñol, M., Aldavert, D., Toledo, R., Lladós, J.: Browsing heterogeneous document collections by a segmentation-free word spotting method. In: International Conference on Document Analysis and Recognition, pp. 63–67. Beijing (2011)
  47. Rusiñol, M., Aldavert, D., Toledo, R., Lladós, J.: Efficient segmentation-free keyword spotting in historical document collections. Pattern Recogn. 48(2), 545–555 (2015)
  48. Rusiñol, M., Aldavert, D., Toledo, R., Lladós, J.: Towards query-by-speech handwritten keyword spotting. In: International Conference on Document Analysis and Recognition, pp. 501–505. Nancy (2015)
  49. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
  50. Shalizi, C.R.: Advanced Data Analysis from an Elementary Point of View. Cambridge University Press, Cambridge (2013)
  51. Sharma, A., Pramod, S.K.: Adapting off-the-shelf CNNs for word spotting & recognition. In: Proceedings of the International Conference on Document Analysis and Recognition, pp. 986–990 (2015)
  52. Silberpfennig, A., Wolf, L., Dershowitz, N., Bhagesh, S., Chaudhuri, B.B.: Improving OCR for an under-resourced script using unsupervised word-spotting. In: International Conference on Document Analysis and Recognition, pp. 706–710. Nancy (2015)
  53. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations (2015)
  54. Smucker, M.D., Allan, J., Carterette, B.: A comparison of statistical significance tests for information retrieval evaluation. In: Conference on Information and Knowledge Management, pp. 623–632. Lisbon (2007)
  55. Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. In: Proceedings of the International Conference on Learning Representations (2015)
  56. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
  57. Sudholt, S., Fink, G.A.: A modified isomap approach to manifold learning in word spotting. In: Proceedings of the German Conference on Pattern Recognition, pp. 529–539 (2015)
  58. Sudholt, S., Fink, G.A.: PHOCNet: a deep convolutional neural network for word spotting in handwritten documents. In: Proceedings of the International Conference on Frontiers in Handwriting Recognition, pp. 277–282 (2016)
  59. Sudholt, S., Fink, G.A.: Evaluating word string embeddings and loss functions for CNN-based word spotting. In: Proceedings of the International Conference on Document Analysis and Recognition (2017)
  60. Sudholt, S., Gurjar, N., Fink, G.A.: Learning deep representations for word spotting under weak supervision. arXiv (2017)
  61. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2014)
  62. Tieleman, T., Hinton, G.: Lecture 6.5–RMSprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw. Mach. Learn. 4, 26–31 (2012)
  63. Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: HMM word graph based keyword spotting in handwritten document images. Inf. Sci. 370, 497–518 (2016)
  64. Wilkinson, T., Brun, A.: Semantic and verbatim word spotting using deep neural networks. In: Proceedings of the International Conference on Frontiers in Handwriting Recognition, pp. 307–312 (2016)

Acknowledgements

We would like to thank Irfan Ahmad for supplying the IFN/ENIT character mapping.

Author information

Corresponding author

Correspondence to Sebastian Sudholt.

About this article

Cite this article

Sudholt, S., Fink, G.A. Attribute CNNs for word spotting in handwritten documents. IJDAR 21, 199–218 (2018). https://doi.org/10.1007/s10032-018-0295-0

Keywords

  • Attribute CNN
  • PHOCNet
  • TPP layer
  • Word spotting
  • Deep learning
  • Handwritten documents
  • Historical documents