Self-training Strategies for Handwriting Word Recognition
Handwriting recognition is an emerging subfield of human-computer interaction with many potential industrial applications, e.g. in postal automation, bank check processing, and automatic form reading. Training a recognizer, however, requires a substantial number of training examples together with their corresponding ground truth, which must be created by humans. A promising way to significantly reduce this effort, and hence cut system development costs, is semi-supervised learning, in which both transcribed and untranscribed text is used for training. To date, however, there is no straightforward and established approach to semi-supervised learning, particularly not for handwriting recognition. In the self-training approach, an initially trained recognition system creates a new training set from unlabeled data by selecting elements according to their recognition confidence; a new recognizer is then trained on this set. The success of self-training depends crucially on the data selected. In this paper, we test and compare different rules for selecting new training data for single word recognition, with and without additional language information in the form of a dictionary. We demonstrate that the recognition accuracy of both systems can be increased substantially.
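The self-training loop described above can be sketched as follows. This is a minimal illustration only: the recognizer here is a toy nearest-centroid classifier on one-dimensional features, and the margin-based confidence measure and the fixed threshold are illustrative assumptions, not the paper's actual neural-network word recognizer or its selection rules.

```python
def train(labeled):
    """Fit class centroids from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_with_confidence(model, x):
    """Return (label, confidence). Confidence is the margin between the
    two nearest centroids, squashed into (0, 1) -- an assumed stand-in
    for a real recognizer's confidence score."""
    dists = sorted((abs(x - c), y) for y, c in model.items())
    best_dist, best_label = dists[0]
    margin = (dists[1][0] - best_dist) if len(dists) > 1 else 1.0
    return best_label, margin / (1.0 + margin)

def self_train(labeled, unlabeled, threshold=0.3, rounds=3):
    """Iteratively move confidently recognized samples, with their
    hypothesized labels, from the unlabeled pool into the training set."""
    model = train(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        confident, rest = [], []
        for x in pool:
            y, conf = predict_with_confidence(model, x)
            (confident if conf >= threshold else rest).append((x, y))
        if not confident:          # nothing passed the selection rule
            break
        labeled = labeled + confident   # add pseudo-labeled elements
        pool = [x for x, _ in rest]
        model = train(labeled)          # retrain on the enlarged set
    return model

labeled = [(0.0, "a"), (1.0, "a"), (10.0, "b"), (11.0, "b")]
unlabeled = [0.5, 0.7, 10.5, 9.8, 5.4]   # 5.4 is ambiguous and stays unselected
model = self_train(labeled, unlabeled)
label, conf = predict_with_confidence(model, 0.6)
print(label)  # prints "a"
```

The selection rule (here a simple confidence threshold) is exactly the component the paper varies and compares; a poor rule pollutes the new training set with wrongly labeled samples, which is why the selected data matters so much.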