Bag-of-visual-words for signature-based multi-script document retrieval

Mandal, Ranju; Roy, Partha Pratim; Pal, Umapada; Blumenstein, Michael

doi:10.1007/s00521-018-3444-y

Original Article
Published: 22 March 2018

Volume 31, pages 6223–6247, (2019)

Neural Computing and Applications

Ranju Mandal¹ (ORCID: orcid.org/0000-0002-3669-2446), Partha Pratim Roy², Umapada Pal³ & Michael Blumenstein⁴


Abstract

This paper proposes an end-to-end architecture for multi-script document retrieval using handwritten signatures. The user supplies a query signature sample, and the system returns only those documents that contain the query signature. In the first stage, a component-wise classification technique separates potential signature components from all other components. Features are computed with a bag-of-visual-words model built on SIFT descriptors in a patch-based framework, and a support vector machine (SVM) classifier separates signatures from the rest of the document. In the second stage, features from the foreground (i.e., signature strokes) are combined with background spatial information (i.e., background loops, reservoirs, etc.) to characterize the signature object for matching against the query signature. Finally, three distance measures are used to match the query signature with the signatures present in the target documents for retrieval. Performance was evaluated on the ‘Tobacco’ document database (The Legacy Tobacco Document Library (LTDL), University of California, San Francisco, 2007, http://legacy.library.ucsf.edu/) and on an Indian script database containing 560 documents in Devanagari (Hindi) and Bangla scripts. The proposed system was also tested on noisy documents, and promising results were obtained. A comparative study shows that the proposed method outperforms state-of-the-art approaches.
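The bag-of-visual-words encoding mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a codebook of visual words is already available (in practice it would typically be learned by clustering local descriptors such as 128-dimensional SIFT vectors), and it uses toy 2-D vectors in place of real descriptors. The function names `nearest_word`, `bovw_histogram`, and `euclidean` are hypothetical, and the Euclidean distance is only one plausible choice, since the abstract does not name its three distance measures.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry (visual word) closest to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(descriptor, codebook[i]),
    )


def bovw_histogram(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Quantize each descriptor to its nearest visual word and return
    the L1-normalized histogram of word counts for one patch."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # A patch with no descriptors maps to an all-zero histogram.
        return [0.0] * len(codebook)
    return [c / total for c in counts]


def euclidean(h1: Vector, h2: Vector) -> float:
    """Euclidean distance between two histograms (an assumed measure)."""
    return math.sqrt(squared_distance(h1, h2))


# Toy data: three visual words and four descriptors from one patch.
toy_codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
toy_patch = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (1.0, 9.0)]
toy_histogram = bovw_histogram(toy_patch, toy_codebook)
```

In a pipeline of the kind the abstract describes, a histogram like `toy_histogram` would be the feature vector passed to a classifier such as an SVM, and a smaller distance between two histograms would indicate more similar patches.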

Author information

Authors and Affiliations

School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
Ranju Mandal
Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, India
Partha Pratim Roy
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India
Umapada Pal
School of Software, University of Technology Sydney, Sydney, Australia
Michael Blumenstein

Corresponding author

Correspondence to Ranju Mandal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Mandal, R., Roy, P.P., Pal, U. et al. Bag-of-visual-words for signature-based multi-script document retrieval. Neural Comput & Applic 31, 6223–6247 (2019). https://doi.org/10.1007/s00521-018-3444-y

Received: 27 February 2017
Accepted: 16 March 2018
Published: 22 March 2018
Issue Date: October 2019
DOI: https://doi.org/10.1007/s00521-018-3444-y
