Data extraction from form images
In this paper, we describe a system capable of extracting textual information from images of structured documents. In particular, the model and the algorithms we describe are used to process forms in which the information fields cannot be located by their position on the page alone, but can be identified after locating the corresponding instruction fields. The proposed model is based on attributed relational graphs and performs form registration and location of information fields using algorithms based on the hypothesize-and-verify paradigm. The location of instruction fields is carried out in a holistic way, using connectionist models.
Keywords: Attributed Relational Graphs, Document Registration, Form Processing, Layout Description
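The hypothesize-and-verify idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the system described in the paper: it assumes the attributed relational graph is reduced to labelled nodes with page coordinates, that registration is a pure translation, and that instruction fields have already been detected (the paper uses connectionist models for that step). The names `Node`, `register`, `locate_fields`, and the tolerance `tol` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Node:
    label: str  # attribute: which instruction field this node represents
    x: float    # position on the page (model or image coordinates)
    y: float


def register(model, detected, tol=5.0):
    """Return (translation, support) for the best-supported translation.

    Assumption: the form in the image differs from the model only by a shift.
    """
    best = (None, 0)
    for m in model:
        for d in detected:
            if m.label != d.label:
                continue
            # Hypothesize: model node m corresponds to detection d.
            dx, dy = d.x - m.x, d.y - m.y
            # Verify: count model nodes that land near a same-label detection.
            support = sum(
                any(
                    n.label == c.label
                    and hypot(n.x + dx - c.x, n.y + dy - c.y) <= tol
                    for c in detected
                )
                for n in model
            )
            if support > best[1]:
                best = ((dx, dy), support)
    return best


def locate_fields(info_fields, translation):
    """Map model positions of information fields into image coordinates."""
    dx, dy = translation
    return {name: (x + dx, y + dy) for name, (x, y) in info_fields.items()}


# Hypothetical model of a form and detections in a scanned image.
model = [Node("name", 10, 10), Node("date", 10, 40), Node("total", 80, 90)]
detected = [
    Node("name", 13, 8),
    Node("date", 13, 38),
    Node("total", 83, 88),
    Node("date", 50, 50),  # spurious detection
]

translation, support = register(model, detected)
fields = locate_fields({"name_value": (40, 10)}, translation)
```

In this sketch each pairing of a model node with a same-label detection yields one hypothesis, and the hypothesis explained by the most instruction fields wins, so the spurious detection is outvoted. A fuller treatment would also compare attributes on the graph's edges and allow rotation or scaling.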