Preprocessing and Segmentation of Bad Quality Machine Typed Documents
The goal of the international project Memorial is automatic information retrieval from machine-typed paper documents belonging to several classes. This paper considers the problem of preprocessing and segmenting scanned archival documents. These steps aim to determine the text regions of a document precisely, so that they can be passed on to OCR. For each document class, the text regions are first described in an XML template; a region-matching algorithm then locates those regions precisely in the current document.
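The abstract does not specify the template format, the preprocessing operations or the region-matching algorithm, so the following is only a minimal Python sketch of the overall flow under stated assumptions. The XML schema (`<region name x y w h>`) and the names `load_template`, `otsu_threshold`, `binarize`, `match_region` and `segment` are hypothetical. Otsu's global thresholding stands in for whatever binarization the project actually uses, and "matching" is approximated by trying small shifts of each template box and keeping the position that covers the most ink pixels.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical template for one document class: one <region> per text field,
# with coordinates given in pixels of the reference layout.
TEMPLATE_XML = """
<template class="example-form">
  <region name="surname" x="2" y="1" w="4" h="2"/>
</template>
"""


@dataclass
class Region:
    name: str
    x: int  # left column
    y: int  # top row
    w: int  # width in pixels
    h: int  # height in pixels


def load_template(xml_text: str) -> list[Region]:
    """Read all <region> elements of a class template."""
    root = ET.fromstring(xml_text.strip())
    return [
        Region(r.get("name"), int(r.get("x")), int(r.get("y")),
               int(r.get("w")), int(r.get("h")))
        for r in root.iter("region")
    ]


def otsu_threshold(gray: list[list[int]]) -> int:
    """Global Otsu threshold for 8-bit grey levels (0 = black, 255 = white)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w_b = sum_b = 0  # pixel count and grey-level sum of the dark class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b, m_f = sum_b / w_b, (sum_all - sum_b) / w_f
        # Unnormalised between-class variance; Otsu picks the t maximising it.
        between = w_b * w_f * (m_b - m_f) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(gray: list[list[int]]) -> list[list[int]]:
    """Return a binary image where 1 marks ink (dark) pixels and 0 marks paper."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


def match_region(ink: list[list[int]], region: Region, radius: int = 3) -> Region:
    """Shift the template box by up to `radius` pixels to cover the most ink."""
    height, width = len(ink), len(ink[0])

    def ink_in(x0: int, y0: int) -> int:
        # Count ink pixels inside the box, clipped to the image bounds.
        return sum(ink[y][x]
                   for y in range(max(y0, 0), min(y0 + region.h, height))
                   for x in range(max(x0, 0), min(x0 + region.w, width)))

    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Prefer more ink; on ties prefer the smaller displacement.
            key = (ink_in(region.x + dx, region.y + dy), -(abs(dx) + abs(dy)))
            if best is None or key > best[0]:
                best = (key, dx, dy)
    _, dx, dy = best
    return Region(region.name, region.x + dx, region.y + dy, region.w, region.h)


def segment(gray: list[list[int]], template_xml: str, radius: int = 3) -> dict[str, Region]:
    """Binarize the page, then locate every template region in it."""
    ink = binarize(gray)
    return {r.name: match_region(ink, r, radius) for r in load_template(template_xml)}
```

The exhaustive shift search is chosen only because it is easy to read; its cost grows with the square of the search radius, and the radius would have to reflect how far scanning actually displaces a page. The returned boxes are what would be cropped and handed to OCR.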