Deep learning, the design of models and learning algorithms for deep neural networks, has achieved great success in many areas of artificial intelligence. Owing to advances in theory and learning algorithms, the availability of large training datasets, and GPU computing, deep learning has yielded superior performance in many pattern recognition applications. Examples include speech recognition, character and text recognition, image segmentation, object detection and recognition, traffic sign recognition, and face recognition. The exploration of new deep-learning models and algorithms, as well as their potential applications, continues to attract great interest and attention.

Deep learning has significantly reshaped research in Document Analysis and Recognition (DAR), the field that analyzes the digital contents of document images and handwriting. Its adoption has greatly improved the performance of character and text recognition (particularly handwritten and scene-text recognition), text localization, and document segmentation. Among the most successful deep-learning models are the convolutional neural network (CNN), the recurrent neural network with long short-term memory (RNN–LSTM), and the fully convolutional network (FCN). Researchers are now applying deep learning to further document analysis problems, including layout analysis, writer identification, and document retrieval.

This special issue presents new advances in DAR achieved with deep learning methods. We received 15 full submissions by the September 2017 deadline, on topics ranging from document image segmentation, layout analysis, and text and object localization to character and text recognition, language modeling, handwritten mathematics recognition, signature verification, document retrieval, and document understanding. The guest editors followed a strict peer-review process and invited guest reviewers to assess all submissions. Each paper was reviewed by at least two reviewers, and most accepted papers underwent a second round of review. In the end, five papers were accepted for publication in this special issue. Their contents are outlined below.

In “Learning to Detect, Localize and Recognize Many Text Objects in Document Images from Few Examples,” Moysset et al. propose a new neural model that directly predicts object coordinates for text detection in document images. Key components of the model are spatial 2D-LSTM recurrent layers, which convey contextual information between the regions of the image, and a new form of local parameter sharing that keeps the overall number of trainable parameters low. This makes the model more data-efficient than the state of the art when training data are not abundant. The model also facilitates the detection of many objects in a single image and handles inputs of variable size without resizing. The authors propose two regression strategies that limit the amount of information the local model components must produce and enhance the precision of the coordinate regressor: (1) separately predict the lower-left and upper-right corners of each object bounding box, followed by combinatorial pairing; (2) predict only the left side of each object and estimate the right position jointly with text recognition. Experiments on the text-line localization task of the Maurdor dataset show that the approach yields good full-page text recognition results on heterogeneous documents.
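As a rough illustration of such shared, per-position coordinate regression (not the authors' exact model: the 2D-LSTM context layers are replaced here by plain convolutions, and all layer sizes are invented), a minimal PyTorch sketch could look like this:

    import torch
    import torch.nn as nn

    class CornerRegressor(nn.Module):
        """Toy localizer in the spirit of the paper: every position of the
        feature map emits K candidate corner coordinates plus a confidence
        score, with parameters shared across positions. The paper's 2D-LSTM
        layers are replaced by plain convolutions for brevity (an assumption)."""
        def __init__(self, k=4):
            super().__init__()
            self.k = k
            self.backbone = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            )
            # Shared per-position head: K * (x, y, confidence) channels.
            self.head = nn.Conv2d(32, k * 3, 1)

        def forward(self, x):
            f = self.head(self.backbone(x))        # (B, K*3, H', W')
            b, _, h, w = f.shape
            f = f.view(b, self.k, 3, h, w)
            coords = torch.sigmoid(f[:, :, :2])    # relative (x, y) in [0, 1]
            conf = torch.sigmoid(f[:, :, 2])       # objectness per candidate
            return coords, conf

    model = CornerRegressor()
    coords, conf = model(torch.randn(1, 1, 128, 96))  # variable sizes accepted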

In “Fully Convolutional Network for Handwritten Text Line Segmentation,” Renton et al. present a learning-based method for handwritten text-line segmentation in document images. They use a variant of the deep fully convolutional network (FCN) with dilated convolutions, which preserves the input resolution throughout and produces a pixel-level labeling. The FCN is trained to predict an X-height labeling as the text-line representation, which has many advantages for text recognition. In experiments on a public dataset, they show that the proposed method outperforms the most popular FCN variants based on deconvolution or unpooling layers. They also report results for various settings and conclude with a comparison against recent approaches from the cBAD international competition.
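A minimal PyTorch sketch of this idea, with illustrative layer sizes rather than those of the paper, shows how dilated convolutions keep the output at the full input resolution:

    import torch
    import torch.nn as nn

    class DilatedFCN(nn.Module):
        """Sketch of an FCN built from dilated convolutions: the spatial
        resolution is never reduced, so the output is a pixel-level
        labeling of the input (e.g. X-height text-line masks). Layer
        sizes are illustrative."""
        def __init__(self, num_classes=2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1, dilation=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=2, dilation=2), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=4, dilation=4), nn.ReLU(),
                nn.Conv2d(32, num_classes, 1),     # per-pixel class scores
            )

        def forward(self, x):
            return self.net(x)                     # same H x W as the input

    logits = DilatedFCN()(torch.randn(1, 1, 256, 256))
    assert logits.shape[-2:] == (256, 256)         # resolution preserved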

In “Integrating Scattering Feature Maps with Convolutional Neural Networks for Malayalam Handwritten Character Recognition,” Manjusha et al. use scattering-transform-based wavelet filters as the first-layer convolutional filters of a CNN. A scattering network is built from a cascade of scattering-transform operations, and the scattering coefficients generated in its first few layers effectively capture the dominant energy of the input patterns; the resulting architecture is equivalent to using scattering wavelet filters as the first-layer receptive fields of a CNN. The authors apply the proposed hybrid CNN architecture to Malayalam handwritten character recognition. The experimental results confirm that the hybrid architecture based on scattering feature maps outperforms an equivalent CNN whose first-layer filters are self-learned.
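The hybrid idea can be sketched as follows, assuming the third-party kymatio library for the scattering transform; the image size, channel counts, and number of character classes are illustrative assumptions, not the authors' configuration:

    import torch
    import torch.nn as nn
    from kymatio.torch import Scattering2D   # third-party scattering library

    # Fixed scattering coefficients stand in for the self-learned first
    # convolutional layer; only the classifier on top is trained.
    scattering = Scattering2D(J=2, shape=(32, 32))   # 81 maps at 8x8

    classifier = nn.Sequential(
        nn.Conv2d(81, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, 44),   # assumed number of character classes
    )

    x = torch.randn(8, 1, 32, 32)      # batch of character images
    s = scattering(x).squeeze(1)       # (8, 81, 8, 8) scattering feature maps
    logits = classifier(s)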

In “Attribute CNNs for Word Spotting in Handwritten Documents,” Sudholt and Fink present an approach for word spotting in document images that learns attribute representations with convolutional neural networks (CNNs). Taking a probabilistic perspective on training CNNs, they derive two different loss functions, one for binary and one for real-valued word string embeddings. They also propose two CNN architectures specifically designed for word spotting, both trainable end-to-end. In a number of experiments, they investigate the influence of different word string embeddings and optimization strategies and show that the proposed Attribute CNNs achieve state-of-the-art results for segmentation-based word spotting on a large variety of datasets.
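As a sketch of the two kinds of objectives (with a stand-in linear model in place of an actual word-spotting CNN, and the commonly used 604-dimensional PHOC as an assumed embedding size):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    d = 604                            # common PHOC dimensionality (assumed)
    phi = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, d))  # stand-in CNN

    img = torch.randn(16, 1, 32, 32)   # batch of word images
    pred = phi(img)                    # predicted attribute vectors

    # Binary embeddings (e.g. PHOC): per-attribute cross-entropy.
    binary_target = torch.randint(0, 2, (16, d)).float()
    bce_loss = F.binary_cross_entropy_with_logits(pred, binary_target)

    # Real-valued embeddings: a cosine distance loss.
    real_target = torch.randn(16, d)
    cos_loss = (1 - F.cosine_similarity(pred, real_target)).mean()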

In “Fixed-Sized Representation Learning from Offline Handwritten Signatures of Different Sizes,” Hafemann et al. modify a deep convolutional neural network architecture with spatial pyramid pooling to learn fixed-sized representations from variable-sized signature images. They also investigate the impact of the image resolution used for training and the impact of adapting (fine-tuning) the representations to new operating conditions, such as different writing instruments and scan resolutions. On the GPDS dataset, their results are comparable to the state of the art while imposing no maximum size on the signatures to be processed. They further show that higher resolutions (300 or 600 dpi) improve performance when skilled forgeries from a subset of users are available for feature learning, whereas lower resolutions (around 100 dpi) suffice when only genuine signatures are used. Finally, they show that fine-tuning improves performance when the operating conditions change.
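Spatial pyramid pooling itself can be sketched in a few lines of PyTorch; the pyramid levels and feature extractor below are illustrative, not those used in the paper:

    import torch
    import torch.nn as nn

    class SPP(nn.Module):
        """Spatial pyramid pooling: max-pool a variable-sized feature map
        onto fixed grids and concatenate the results, yielding a
        fixed-length vector whatever the input size. Grid sizes here
        are illustrative."""
        def __init__(self, levels=(4, 2, 1)):
            super().__init__()
            self.pools = nn.ModuleList(nn.AdaptiveMaxPool2d(n) for n in levels)

        def forward(self, x):                  # x: (B, C, H, W), any H, W
            return torch.cat([p(x).flatten(1) for p in self.pools], dim=1)

    features = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
    spp = SPP()
    for h, w in [(150, 220), (300, 400)]:      # signatures of different sizes
        out = spp(features(torch.randn(1, 1, h, w)))
        print(out.shape)                       # always (1, 32 * (16 + 4 + 1))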

We thank all the authors who shared their invaluable work and all the reviewers for their insightful comments on the submissions to this special issue. We also thank the Editors-in-Chief of the International Journal on Document Analysis and Recognition for giving us the opportunity to guest-edit this special issue on exciting advances in the field. Finally, we recognize the hard work of journal assistant Saranya Karunakaran for her ongoing support and assistance throughout this effort.