Abstract
Text-line segmentation is one of the key factors affecting the performance of handwriting recognition systems. Segmenting touching text-lines is therefore an important step toward more effective and accurate recognition. One of the main difficulties of this task is the presence of touching components (TCs), which are connections between letters of words on consecutive text-lines or between words on the same text-line. The proposed method aims to segment TCs. It consists of two main steps: (1) for a localized TC, finding a similar model, stored in a dictionary together with its correct segmentation, using the shape context descriptor and an interpolation function, the thin plate spline transformation; (2) segmenting the TC based on the central point of the parts of the similar model found. TCs are assumed to have already been extracted from Arabic manuscript images. Experiments are carried out on a common TC database using two metrics: Manhattan and Euclidean distances. The results obtained outperform the state of the art, given the different types, variability and complexity of the TC data set, and show the effectiveness of the proposed TC segmentation method.
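To make step (1) more concrete, the following is a minimal sketch of a dictionary lookup with a shape context descriptor, compared under the Manhattan and Euclidean distances. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy shapes, and the bin settings (5 radial by 12 angular bins, radii from 0.125 to 2 after scale normalisation) are all hypothetical. As a simplification, the per-point histograms are pooled into a single descriptor per shape, so the point matching and the thin plate spline alignment are omitted here.

```python
import math

def shape_context(points, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of the other points, one histogram per point.
    Assumes at least two distinct points. Bin settings are assumed values."""
    n = len(points)
    dists = [[math.dist(p, q) for q in points] for p in points]
    mean_d = sum(map(sum, dists)) / (n * (n - 1))  # scale normalisation
    log_lo, log_hi = math.log(r_min), math.log(r_max)
    hists = []
    for i, (xi, yi) in enumerate(points):
        h = [0] * (n_r * n_theta)
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            r = dists[i][j] / mean_d
            if not (r_min <= r < r_max):
                continue  # outside the log-polar grid
            r_bin = int((math.log(r) - log_lo) / (log_hi - log_lo) * n_r)
            r_bin = min(r_bin, n_r - 1)
            theta = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
            t_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
            h[r_bin * n_theta + t_bin] += 1
        hists.append(h)
    return hists

def pooled_descriptor(points):
    """Simplification: sum the per-point histograms and normalise to sum 1."""
    total = [sum(col) for col in zip(*shape_context(points))]
    s = sum(total) or 1
    return [v / s for v in total]

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_model(tc_points, dictionary, metric=manhattan):
    """Return the name of the dictionary model nearest to the query TC."""
    query_desc = pooled_descriptor(tc_points)
    return min(dictionary,
               key=lambda name: metric(query_desc,
                                       pooled_descriptor(dictionary[name])))

# Toy dictionary (hypothetical shapes, not taken from the TC database).
cross = [(0, -2), (0, -1), (0, 0), (0, 1), (0, 2),
         (-2, 0), (-1, 0), (1, 0), (2, 0)]
line = [(x, 0) for x in range(9)]
dictionary = {"cross": cross, "line": line}

# Query: the cross, scaled by 2 and shifted; the descriptor ignores both.
query = [(2 * x + 10, 2 * y + 5) for x, y in cross]
best_l1 = closest_model(query, dictionary, manhattan)
best_l2 = closest_model(query, dictionary, euclidean)
```

In a fuller pipeline, the retrieved model's stored segmentation would then be transferred to the TC, which is where the point correspondences and the thin plate spline transformation mentioned above come in.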
Cite this article
Aouadi, N., Kacem, A. A proposal for touching component segmentation in Arabic manuscripts. Pattern Anal Applic 20, 1005–1027 (2017). https://doi.org/10.1007/s10044-016-0543-1
DOI: https://doi.org/10.1007/s10044-016-0543-1