Document Analysis Systems VI

Volume 3163 of the Lecture Notes in Computer Science series, pp. 298-309

Information Retrieval System for Handwritten Documents

  • Sargur Srihari
  • Anantharaman Ganesh
  • Catalin Tomai
  • Yong-Chul Shin
  • Chen Huang

All authors are affiliated with the Center of Excellence for Document Analysis and Recognition (CEDAR), University at Buffalo, State University of New York.

Abstract

The design and performance of a content-based information retrieval system for handwritten documents are described. Indexing and retrieval are based on writer characteristics and textual content, as well as on document metadata such as the writer profile. Documents are indexed using global image features (e.g., stroke width, slant, and word gaps) as well as local features that describe the shapes of characters and words. Image indexing is performed automatically through page analysis, page segmentation, line separation, word segmentation, and recognition of characters and words. Several types of queries are permitted: (i) an entire document image; (ii) a region of interest (ROI) of a document; (iii) a word image; and (iv) text. Retrieval is based on a probabilistic model of information retrieval. The system has been implemented using Microsoft Visual C++ and a relational database system. This paper reports on the performance of the system in retrieving documents based on the same content and on different content.
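The abstract states only that retrieval follows a probabilistic model and does not specify the model itself. As a rough illustration, the following sketch ranks indexed documents by a naive-Bayes log-likelihood ratio over the global features named above (stroke width, slant, word gap). For each feature it asks how much more likely the observed difference between query and document is under a "same writer" distribution than under a "different writer" distribution. This is an assumed technique and not necessarily the authors' method. The class `GlobalFeatures`, the helper functions, and all distribution parameters are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GlobalFeatures:
    """Hypothetical global features extracted from one document image."""
    stroke_width: float  # mean stroke width (pixels)
    slant: float         # dominant slant angle (degrees)
    word_gap: float      # mean inter-word gap (pixels)


FEATURES = ("stroke_width", "slant", "word_gap")

# Hypothetical (mean, std) of the absolute feature difference between two
# documents written by the SAME writer and by DIFFERENT writers. A real system
# would estimate these values from labelled document pairs.
SAME_WRITER = {"stroke_width": (0.3, 0.3), "slant": (2.0, 2.0), "word_gap": (3.0, 3.0)}
DIFF_WRITER = {"stroke_width": (1.5, 1.0), "slant": (10.0, 6.0), "word_gap": (12.0, 8.0)}


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log density of the normal distribution N(mean, std^2) at x."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def log_likelihood_ratio(query: GlobalFeatures, doc: GlobalFeatures) -> float:
    """Sum over features of log P(diff | same) - log P(diff | different),
    assuming features are conditionally independent (naive Bayes)."""
    score = 0.0
    for name in FEATURES:
        delta = abs(getattr(query, name) - getattr(doc, name))
        score += gaussian_logpdf(delta, *SAME_WRITER[name])
        score -= gaussian_logpdf(delta, *DIFF_WRITER[name])
    return score


def rank(query: GlobalFeatures, index: list[tuple[str, GlobalFeatures]]) -> list[str]:
    """Return document ids ordered from most to least likely same-writer match."""
    ranked = sorted(index, key=lambda item: log_likelihood_ratio(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked]


# Usage: two indexed documents and a query image's features (all values invented).
index = [
    ("doc_b", GlobalFeatures(4.0, 30.0, 50.0)),
    ("doc_a", GlobalFeatures(2.1, 11.0, 21.0)),
]
query = GlobalFeatures(2.0, 10.0, 20.0)
print(rank(query, index))  # ['doc_a', 'doc_b']
```

Summing per-feature log ratios is the simplest way to combine evidence, but it treats the features as independent given the writer. A fuller sketch would also fold in the local character and word features and the textual content the abstract mentions.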