Abstract
In many public and private institutions, the digitization of handwritten documents has progressed greatly in recent decades. As a consequence, the number of handwritten documents available in digital form is constantly increasing. However, access to these documents in terms of browsing and searching remains an issue, as automatic full transcriptions are often not feasible. To bridge this gap, Keyword Spotting (KWS) has been proposed as a flexible and error-tolerant alternative to full transcription. KWS allows unconstrained retrieval of keywords in handwritten documents that are acquired either online or offline. In general, offline KWS is regarded as the more difficult task, because online KWS can additionally draw on temporal information about the writing process. The focus of this chapter is on handwritten historical documents and thus on offline KWS. In particular, we review and compare different state-of-the-art as well as novel approaches for template-based KWS. In contrast to learning-based KWS, template-based KWS can be applied to documents without any a priori learning of a model and is thus regarded as the more flexible approach.
Notes
1. George Washington Papers at the Library of Congress, 1741–1799: Series 2, Letterbook 1, pp. 270–279 & 300–309, http://memory.loc.gov/ammem/gwhtml/gwseries2.html.
2. Parzival at IAM historical document database, http://www.fki.inf.unibe.ch/databases/iam-historical-document-database/parzival-database.
Acknowledgements
This work has been supported by the Hasler Foundation Switzerland.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Stauffer, M., Fischer, A., Riesen, K. (2018). Searching and Browsing in Historical Documents—State of the Art and Novel Approaches for Template-Based Keyword Spotting. In: Dornberger, R. (eds) Business Information Systems and Technology 4.0. Studies in Systems, Decision and Control, vol 141. Springer, Cham. https://doi.org/10.1007/978-3-319-74322-6_13
DOI: https://doi.org/10.1007/978-3-319-74322-6_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74321-9
Online ISBN: 978-3-319-74322-6
eBook Packages: Engineering, Engineering (R0)