Self-supervised Vision Transformers with Data Augmentation Strategies Using Morphological Operations for Writer Retrieval

  • Conference paper
  • First Online:
Frontiers in Handwriting Recognition (ICFHR 2022)

Abstract

This paper introduces a self-supervised approach using vision transformers for writer retrieval based on knowledge distillation. We propose morphological operations as a general data augmentation method for handwriting images to learn discriminative features independent of the pen. Our method operates on binarized \(224\times 224\)-sized patches extracted from the documents’ writing region, and we generate two different views based on randomly sampled kernels for erosion and dilation to learn a representative embedding space invariant to different pens. Our evaluation shows that morphological operations outperform data augmentations commonly used in retrieval tasks, e.g., flipping, rotation, and translation, by up to 8%. Additionally, we compare our data augmentation strategy with existing approaches such as networks trained with a triplet loss. We achieve a mean average precision of 66.4% on the Historical-WI dataset, competing with methods that rely on algorithms like SIFT for patch extraction or on computationally expensive encodings, e.g., mVLAD, NetVLAD, or E-SVM. Finally, by visualizing the attention mechanism, we show that the heads of the vision transformer focus on different parts of the handwriting, e.g., loops or specific characters, which enhances the explainability of our writer retrieval.
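The two-view augmentation described in the abstract can be sketched as follows. This is a minimal illustration only: the kernel sizes, the square kernel shape, and the helper names `binary_erode`, `binary_dilate`, and `two_views` are assumptions, not the authors' exact implementation.

```python
import numpy as np


def binary_dilate(img: np.ndarray, k: int) -> np.ndarray:
    """Binary dilation with a k x k square kernel (ink pixels == 1)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="constant", constant_values=0)
    h, w = img.shape
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out


def binary_erode(img: np.ndarray, k: int) -> np.ndarray:
    """Binary erosion with a k x k square kernel."""
    pad = k // 2
    padded = np.pad(img, pad, mode="constant", constant_values=1)
    h, w = img.shape
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out


def two_views(patch: np.ndarray, rng: np.random.Generator):
    """Generate two views of a binarized patch: erosion thins the
    strokes, dilation thickens them, emulating different pens.
    The kernel-size set {1, 3, 5} is an assumption for illustration."""
    k_erode = int(rng.choice([1, 3, 5]))
    k_dilate = int(rng.choice([1, 3, 5]))
    return binary_erode(patch, k_erode), binary_dilate(patch, k_dilate)
```

In a DINO-style knowledge-distillation setup, the two views of the same patch would then be passed to the student and teacher networks so that the learned embedding becomes invariant to stroke width.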



Acknowledgment

The project has been funded by the Austrian security research programme KIRAS of the Federal Ministry of Agriculture, Regions and Tourism (BMLRT) under the Grant Agreement 879687.

Author information

Corresponding author

Correspondence to Marco Peer.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Peer, M., Kleber, F., Sablatnig, R. (2022). Self-supervised Vision Transformers with Data Augmentation Strategies Using Morphological Operations for Writer Retrieval. In: Porwal, U., Fornés, A., Shafait, F. (eds) Frontiers in Handwriting Recognition. ICFHR 2022. Lecture Notes in Computer Science, vol 13639. Springer, Cham. https://doi.org/10.1007/978-3-031-21648-0_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-21648-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21647-3

  • Online ISBN: 978-3-031-21648-0

  • eBook Packages: Computer Science, Computer Science (R0)
