Learning to Segment Document Images
A hierarchical framework for document segmentation is proposed, in which segmentation is posed as an optimization problem. Unlike traditional document segmentation algorithms, the model incorporates the dependencies between the various levels of the hierarchy. The framework is used to learn the parameters of a document segmentation algorithm with optimization methods such as gradient descent and Q-learning. The novelty of our approach lies in learning the segmentation parameters in the absence of ground truth.
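The abstract does not state the objective function, the parameters being tuned, or how the learning problem is set up, so the following is only a minimal sketch of the two optimizers it names, under loudly stated assumptions. It assumes a single segmentation parameter and a hypothetical unsupervised quality score `quality(theta)`, a stand-in for whatever groundtruth-free criterion evaluates a segmentation; here it is simply a concave function peaking at 7. Gradient ascent uses a finite-difference gradient estimate, and tabular Q-learning treats integer parameter values as states, "decrement / keep / increment" as actions, and the change in quality as the reward. All names and settings are illustrative, not the authors' method.

```python
import random


# HYPOTHETICAL stand-in for an unsupervised segmentation-quality score.
# A real system would segment the page with parameter `theta` and score the
# result without groundtruth; here it is a concave function peaking at 7.
def quality(theta: float) -> float:
    return -(theta - 7.0) ** 2


def gradient_ascent(theta: float, lr: float = 0.1, steps: int = 200,
                    h: float = 1e-3) -> float:
    """Maximise quality() using a central finite-difference gradient."""
    for _ in range(steps):
        grad = (quality(theta + h) - quality(theta - h)) / (2 * h)
        theta += lr * grad
    return theta


def q_learning(n_states: int = 16, episodes: int = 500, horizon: int = 30,
               alpha: float = 0.5, gamma: float = 0.9, epsilon: float = 0.2,
               seed: int = 0) -> int:
    """Tabular Q-learning over integer parameter values 0..n_states-1.

    State  = current parameter value (an assumption for this sketch).
    Action = decrement, keep, or increment the parameter.
    Reward = change in quality() caused by the move.
    Returns the parameter reached by following the learned greedy policy.
    """
    rng = random.Random(seed)
    actions = (-1, 0, 1)
    q = [[0.0] * len(actions) for _ in range(n_states)]

    def greedy(s: int) -> int:
        return max(range(len(actions)), key=lambda i: q[s][i])

    for _ in range(episodes):
        s = rng.randrange(n_states)  # random starting parameter value
        for _ in range(horizon):
            # Epsilon-greedy action selection.
            a = rng.randrange(len(actions)) if rng.random() < epsilon else greedy(s)
            s_next = min(max(s + actions[a], 0), n_states - 1)
            r = quality(s_next) - quality(s)
            # Q-learning update towards r + gamma * max_a' Q(s', a').
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next

    # Follow the greedy policy from parameter value 0.
    s = 0
    for _ in range(n_states):
        s = min(max(s + actions[greedy(s)], 0), n_states - 1)
    return s


if __name__ == "__main__":
    print(gradient_ascent(0.0))  # approaches 7.0
    print(q_learning())          # parameter value reached by the greedy policy
```

Gradient ascent needs a (numerically) differentiable score over a continuous parameter, whereas the Q-learning variant only needs to compare scores after discrete parameter moves, which may suit segmentation parameters that are naturally integers, such as pixel distances.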
Keywords: Segmentation Algorithm; Document Image; Text Line; Foreground Pixel; Text Block