Word Spotting in Cursive Handwritten Documents Using Modified Character Shape Codes
There are large collections of handwritten English paper documents of historical and scientific importance. Because a computer cannot process paper directly, the most practical way to index such documents is to store a digital image of each one.
A large database of document images can thus replace the paper originals. However, the text contained in each image is still not directly recognisable by the computer.
This paper applies word spotting based on a Modified Character Shape Code to handwritten English document images, enabling quick and efficient query search for words in a database of document images. It differs from other word spotting techniques in that it applies two levels of selection to the word segments that are matched against the search query: the first based on word size, the second based on the character shape code of the query. This makes the process faster and more efficient and reduces the need for multiple pre-processing steps.
Keywords: Word Spot; Handwritten Documents; Character Shape Code; Word Shape Token; Modified Character Shape Code; Levenshtein Distance; Query Search; Segmentation
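The two-level selection described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not define the Modified Character Shape Code, so the mapping below is an assumed, simplified ascender/descender scheme in the spirit of character shape coding, and it works on text strings as a stand-in for shape codes that would in practice be extracted from word images. The names `shape_code`, `spot`, `size_tolerance` and `max_distance` are hypothetical.

```python
# Assumed, simplified shape classes; NOT the paper's Modified Character Shape Code.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")


def shape_code(word):
    """Map each character to a coarse shape class (illustrative scheme)."""
    codes = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            codes.append("A")   # tall characters
        elif ch in DESCENDERS:
            codes.append("g")   # characters reaching below the baseline
        elif ch == "i":
            codes.append("i")   # x-height character with a dot
        elif ch == "j":
            codes.append("j")   # descender with a dot
        else:
            codes.append("x")   # plain x-height characters
    return "".join(codes)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def spot(query, candidates, size_tolerance=1, max_distance=1):
    """Two-level selection: filter by word size, then by shape-code distance."""
    q_code = shape_code(query)
    # Level 1: keep only candidates whose size is close to the query's size.
    sized = [w for w in candidates if abs(len(w) - len(query)) <= size_tolerance]
    # Level 2: compare shape codes using Levenshtein distance.
    hits = []
    for w in sized:
        d = levenshtein(q_code, shape_code(w))
        if d <= max_distance:
            hits.append((d, w))
    hits.sort(key=lambda pair: pair[0])  # best matches first, stable for ties
    return [w for _, w in hits]


matches = spot("word", ["ward", "the", "world", "handwriting", "page"])
```

The size filter is cheap and discards most candidates before any shape-code comparison takes place, which is presumably where the speed-up claimed in the abstract comes from. Because shape codes are coarse, words with the same outline (here "word" and "ward") are indistinguishable at this stage.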