Abstract
Images and signals can be represented by forms that are invariant to time shifts, spatial shifts, frequency shifts, and scale changes. Advances in time-frequency analysis and scale transform techniques have made this possible. However, factors such as noise contamination and “style” differences complicate the use of such invariant representations. Text is an example: letters and words may vary in size and position, while complicating variations include the font used, corruption during facsimile (fax) transmission, and printer characteristics. The solution advanced in this paper is to cast the desired invariants into separate subspaces for each extraneous factor or group of factors. The first goal is to have minimal overlap between these subspaces, and the second is to identify each subspace accurately. Concepts borrowed from high-resolution spectral analysis, but adapted uniquely to this problem, have been found to be useful in this context. Once the pertinent subspace is identified, recognizing a particular invariant form within it is relatively simple using well-known singular value decomposition (SVD) techniques. The basic elements of the approach can be applied to a variety of pattern recognition problems. The specific application covered in this paper is word spotting in bitmapped fax documents.
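The subspace idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes each extraneous factor (here two hypothetical style groups, `font_a` and `font_b`) occupies its own low-dimensional subspace, fits an orthonormal basis to each group with a plain SVD, and identifies the pertinent subspace by the smallest projection residual. The synthetic data, the names, the rank, and the residual criterion are all assumptions made for illustration.

```python
import numpy as np

def fit_subspace(samples, rank):
    """Orthonormal basis (columns) for the dominant subspace of the samples.

    samples has shape (n_features, n_samples): one feature vector per column.
    """
    u, _, _ = np.linalg.svd(samples, full_matrices=False)
    return u[:, :rank]

def residual(basis, x):
    """Norm of the component of x that lies outside span(basis)."""
    return float(np.linalg.norm(x - basis @ (basis.T @ x)))

def identify_subspace(bases, x):
    """Return the label whose subspace leaves the smallest residual."""
    return min(bases, key=lambda label: residual(bases[label], x))

# Synthetic stand-in data (assumption): each hypothetical style group
# spans its own random 3-dimensional subspace of a 32-dimensional space.
rng = np.random.default_rng(0)
n_features, rank = 32, 3
true_bases = {
    name: np.linalg.qr(rng.standard_normal((n_features, rank)))[0]
    for name in ("font_a", "font_b")
}

def draw(name, count, noise=0.01):
    """Noisy training vectors drawn from one group's subspace."""
    coeffs = rng.standard_normal((rank, count))
    clean = true_bases[name] @ coeffs
    return clean + noise * rng.standard_normal((n_features, count))

# Fit one basis per group from 40 noisy training vectors each.
bases = {name: fit_subspace(draw(name, 40), rank) for name in true_bases}

# A probe vector built from the second group's subspace plus small noise.
probe = true_bases["font_b"] @ np.array([1.0, -0.5, 0.25])
probe = probe + 0.01 * rng.standard_normal(n_features)

label = identify_subspace(bases, probe)
print(label)
```

The less the group subspaces overlap, the larger the gap between the smallest residual and the others, which is why minimal overlap is stated as the first goal.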
Cite this article
Williams, W.J., Zalubas, E.J. & Hero, A.O. Word Spotting in Bitmapped Fax Documents. Information Retrieval 2, 207–226 (2000). https://doi.org/10.1023/A:1009958827317
DOI: https://doi.org/10.1023/A:1009958827317