Abstract
Users of OCR systems, from different institutions and scientific disciplines, prefer and produce different transcription styles. This complicates training consistent text recognition neural networks on real-world data. We propose to extend existing text recognition networks with a Transcription Style Block (TSB), which can learn from data to switch between multiple transcription styles without any explicit knowledge of transcription rules. TSB is an adaptive instance normalization conditioned on identifiers representing consistently transcribed documents (e.g. a single document, documents by a single transcriber, or an institution). We show that TSB is able to learn completely different transcription styles in controlled experiments on artificial data, that it improves text recognition accuracy on large-scale real-world data, and that it learns semantically meaningful transcription style embeddings. We also show how TSB can efficiently adapt to the transcription styles of new documents from transcriptions of only a few text lines.
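The core mechanism the abstract describes, adaptive instance normalization whose per-channel scale and shift are selected by a transcription-style identifier, can be sketched as follows. This is a minimal illustration under our own naming, not the paper's implementation: the feature-map shape, the identity-initialized style tables, and the function name are assumptions for the example.

```python
import numpy as np

def transcription_style_block(x, style_id, style_gammas, style_betas, eps=1e-5):
    """Sketch of a TSB-like layer: instance-normalize a feature map,
    then apply a per-channel scale (gamma) and shift (beta) looked up
    by the transcription-style identifier. In a trained network the
    gamma/beta tables would be learned embeddings, one row per style."""
    # x: (channels, height, width) feature map of one text-line image
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mean) / (std + eps)           # instance normalization
    gamma = style_gammas[style_id][:, None, None]
    beta = style_betas[style_id][:, None, None]
    return gamma * x_norm + beta                # style-conditioned affine

# Two hypothetical styles over 4 channels: style 0 is the identity
# transform, style 1 rescales two of the channels.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 32))
gammas = np.array([[1.0, 1.0, 1.0, 1.0],
                   [2.0, 0.5, 1.0, 1.0]])
betas = np.zeros((2, 4))
out_style0 = transcription_style_block(features, 0, gammas, betas)
out_style1 = transcription_style_block(features, 1, gammas, betas)
```

Because each style only contributes a per-channel affine transform after a shared normalization, switching styles is a cheap table lookup, which is consistent with the abstract's claim that new styles can be adapted from only a few transcribed text lines.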
Keywords
- Transcription styles
- Adaptive instance normalization
- Text recognition
- Neural networks
- CTC
Acknowledgment
This work has been supported by the Ministry of Culture Czech Republic in NAKI II project PERO (DG18P02OVV055).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Kohút, J., Hradiš, M. (2021). TS-Net: OCR Trained to Switch Between Text Transcription Styles. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science(), vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_32
Print ISBN: 978-3-030-86336-4
Online ISBN: 978-3-030-86337-1
Published in cooperation with the IAPR: http://www.iapr.org/