Abstract
This chapter addresses the segmentation of ancient Arabic documents. Because such documents have neither a real physical structure nor a logical one, segmentation is limited to locating textual areas or extracting the text lines within them. Although this type of segmentation appears simple, its implementation remains challenging because of the state of many old documents: the images are of low quality, and the lines are not straight but sinuous and connected. Given the failure of traditional methods, we propose a method for line extraction in multi-oriented documents. The method is based on an image meshing that allows orientations to be detected locally and reliably. These orientations are then extended to larger areas. The orientation estimation uses the energy distribution of Cohen’s class, which is more accurate than the projection method. The method then exploits the projection peaks to follow the connected components forming text lines. The approach ends with a final separation of connected lines, based on the morphology of terminal letters.
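The idea of estimating orientation locally on a mesh can be illustrated with a minimal sketch. It is not the chapter's estimator: instead of a Cohen's class energy distribution, it scores each candidate angle by the energy (sum of squares) of a plain projection profile, the simpler baseline the abstract compares against. The function names, the 64-pixel cell size, and the angle range of -45 to 45 degrees are assumptions made for illustration only.

```python
import numpy as np


def projection_energy(tile, angle_deg):
    """Energy of the projection profile of a binary tile for one candidate angle.

    Foreground pixels are projected onto the axis perpendicular to a line of
    direction (cos a, sin a) in image coordinates (x to the right, y downward).
    """
    ys, xs = np.nonzero(tile)
    if ys.size == 0:
        return 0.0
    theta = np.deg2rad(angle_deg)
    # Perpendicular offset: constant for all pixels lying on the same line.
    offsets = ys * np.cos(theta) - xs * np.sin(theta)
    bins = np.round(offsets).astype(int)
    profile = np.bincount(bins - bins.min()).astype(float)
    # A sharply peaked profile (text aligned with the angle) has high energy.
    return float(np.sum(profile ** 2))


def estimate_tile_orientation(tile, angles=None):
    """Return the candidate angle (degrees) with maximal profile energy,
    or NaN for a tile without foreground pixels."""
    if angles is None:
        angles = np.arange(-45, 46)  # assumed search range, 1-degree steps
    if not np.any(tile):
        return float("nan")
    scores = [projection_energy(tile, a) for a in angles]
    return float(angles[int(np.argmax(scores))])


def estimate_mesh_orientations(image, cell=64):
    """Split a binary image into square cells and estimate one angle per cell."""
    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    result = np.full((rows, cols), np.nan)
    for r in range(rows):
        for c in range(cols):
            tile = image[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell]
            result[r, c] = estimate_tile_orientation(tile)
    return result
```

In this sketch, empty cells are reported as NaN so that a later step could fill them from neighbouring cells, in the spirit of extending local orientations to larger areas. That extension step, the following of connected components, and the separation of touching lines are not shown.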
Keywords
Minimal Span Tree, Text Line, Active Contour Model, Orientation Estimation, Gradient Vector Flow