Knowledge Integration Inside Multitask Network for Analysis of Unseen ID Types

  • Conference paper
Document Analysis and Recognition – ICDAR 2023 Workshops (ICDAR 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14194))

Abstract

Identity document recognition is a key step in Know Your Customer applications, where identity documents (IDs) are verified. IDs of the same type share a common field structure, called a template. Traditional ID pipelines use this template to guide field localisation and then text recognition. However, they must be tuned to each template to perform correctly, so they cannot be applied directly to new types of IDs. In this work, we address text localisation and recognition for new document types, where only the template is available and no labeled samples of the new ID type exist. To that end, we propose Context Blocks (CB), which perform template self-attention so that the template guides the features of the network. We propose three ways to leverage CB in a multitask architecture. To evaluate our approach, we design a new public task on the MIDV2020 database, built from rectified in-the-wild photos. Our method achieves the best results on two datasets, including an industrial one composed of real examples.
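The abstract does not specify how a Context Block is built, so the following is only an illustrative sketch of one way template-guided self-attention could work, not the authors' implementation. It assumes the image features and the template are both given as lists of tokens of the same size, and that the block attends over their concatenation and then keeps only the image positions. The names `context_block`, `features`, `template`, `w_q`, `w_k` and `w_v` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def context_block(features, template, w_q, w_k, w_v):
    """Hypothetical context block (an assumption, not the paper's design).

    features: n image tokens of size d (for example a flattened feature map).
    template: m template tokens of size d (for example embedded field slots).
    w_q, w_k, w_v: d x d projection matrices.
    Returns n updated image tokens, with a residual connection.
    """
    tokens = features + template  # joint sequence of n + m tokens
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    attended = matmul(weights, v)
    n = len(features)
    # Keep only the image positions and add them back to the input features.
    return [[f + a for f, a in zip(features[i], attended[i])] for i in range(n)]
```

Called with 6 image tokens and 3 template tokens of size 4, the function returns 6 tokens of size 4, each mixing information from the image and the template. A real system would more likely use a deep learning framework with learned projections and multiple heads.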

Notes

  1. https://gitlab.inria.fr/tneittho/midv2020-rectified-photo

Author information

Corresponding author

Correspondence to Timothée Neitthoffer.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Neitthoffer, T., Lemaitre, A., Coüasnon, B., Soullard, Y., Awal, A.M. (2023). Knowledge Integration Inside Multitask Network for Analysis of Unseen ID Types. In: Coustaty, M., Fornés, A. (eds) Document Analysis and Recognition – ICDAR 2023 Workshops. ICDAR 2023. Lecture Notes in Computer Science, vol 14194. Springer, Cham. https://doi.org/10.1007/978-3-031-41501-2_21

  • DOI: https://doi.org/10.1007/978-3-031-41501-2_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41500-5

  • Online ISBN: 978-3-031-41501-2

  • eBook Packages: Computer Science, Computer Science (R0)
