Abstract
The goal of the international Memorial project is automatic information retrieval from machine-typed paper documents belonging to several classes. This paper considers the pre-processing and segmentation of scanned archival documents. The purpose of these processes is to determine the exact text regions in a document for subsequent OCR processing. Text regions are initially defined for each document class in XML templates. A region-matching algorithm is then used to locate the regions precisely in the current document.
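The template-then-match idea summarized above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the template schema or the matching criterion, so the XML layout (`template`, `region`, and the `x`, `y`, `w`, `h` attributes), the function names, and the "shift the box to cover the most ink" scoring rule are all assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: the real Memorial template schema is not given in the abstract.
TEMPLATE_XML = """
<template class="example-form">
  <region name="surname" x="2" y="1" w="4" h="2"/>
</template>
"""

def load_regions(xml_text):
    """Read nominal region boxes (x, y, w, h) from the assumed template format."""
    root = ET.fromstring(xml_text)
    return {
        r.get("name"): tuple(int(r.get(k)) for k in ("x", "y", "w", "h"))
        for r in root.iter("region")
    }

def ink_in_box(image, x, y, w, h):
    """Count ink pixels (value 1) inside the box of a binarized image."""
    return sum(sum(row[x:x + w]) for row in image[y:y + h])

def match_region(image, box, max_shift=2):
    """Shift the nominal box by up to max_shift pixels and keep the position
    covering the most ink. Assumed criterion; ties favor the smaller shift."""
    x0, y0, w, h = box
    rows, cols = len(image), len(image[0])
    shifts = sorted(
        ((dx, dy)
         for dy in range(-max_shift, max_shift + 1)
         for dx in range(-max_shift, max_shift + 1)),
        key=lambda s: abs(s[0]) + abs(s[1]),
    )
    best_box, best_score = box, -1
    for dx, dy in shifts:
        x, y = x0 + dx, y0 + dy
        if x < 0 or y < 0 or x + w > cols or y + h > rows:
            continue  # candidate box would fall outside the page
        score = ink_in_box(image, x, y, w, h)
        if score > best_score:
            best_box, best_score = (x, y, w, h), score
    return best_box

# Toy binarized scan: a 4x2 block of ink whose top-left corner is at x=3, y=2,
# i.e. displaced by one pixel in each direction from the template position.
image = [[0] * 10 for _ in range(6)]
for yy in (2, 3):
    for xx in range(3, 7):
        image[yy][xx] = 1

regions = load_regions(TEMPLATE_XML)
print(match_region(image, regions["surname"]))  # (3, 2, 4, 2)
```

On real scans a matcher of this kind would have to cope with noise, skew, and scale differences, which this toy scoring rule ignores.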
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Szwoch, M., Szwoch, W. (2004). Preprocessing and Segmentation of Bad Quality Machine Typed Documents. In: Marinai, S., Dengel, A.R. (eds) Document Analysis Systems VI. DAS 2004. Lecture Notes in Computer Science, vol 3163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28640-0_26
DOI: https://doi.org/10.1007/978-3-540-28640-0_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23060-1
Online ISBN: 978-3-540-28640-0
eBook Packages: Springer Book Archive