
Simple and Effective Multi-word Query Spotting in Handwritten Text Images

  • Ernesto Noya-García
  • Alejandro H. Toselli
  • Enrique Vidal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10255)

Abstract

Keyword spotting techniques are becoming cost-effective solutions for information retrieval in handwritten documents. We explore extending the single-word, line-level probabilistic indexing approach described in [1, 2] to allow page-level Boolean combinations of several single-keyword queries. We propose heuristic rules that combine the single-word relevance probabilities into probabilistically consistent confidence scores for the multi-word Boolean combinations. As a preliminary study, this paper focuses on evaluating the search performance of word-pair queries involving just one OR or AND Boolean operation. The empirical results support the proposed approach and show its effectiveness.
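The abstract does not state the heuristic rules themselves, so the following is only a minimal sketch of how two single-word relevance probabilities might be combined into a score for one AND or OR query. The function names (`page_relevance`, `and_score`, `or_score`), the use of the maximum over lines as a page-level probability, and the two rule families shown (bound-based min/max and independence-based product) are all assumptions chosen for illustration, not the rules proposed in the paper.

```python
from typing import Iterable


def page_relevance(line_probs: Iterable[float]) -> float:
    """Hypothetical page-level relevance of one keyword: the largest
    line-level relevance probability on the page (0.0 if no lines)."""
    return max(line_probs, default=0.0)


def and_score(p: float, q: float, rule: str = "min") -> float:
    """Score for 'a AND b' from the single-word probabilities p and q.

    "min"     -> upper bound on any joint probability, min(p, q)
    "product" -> joint probability if the two events were independent
    """
    if rule == "min":
        return min(p, q)
    if rule == "product":
        return p * q
    raise ValueError(f"unknown rule: {rule}")


def or_score(p: float, q: float, rule: str = "max") -> float:
    """Score for 'a OR b' from the single-word probabilities p and q.

    "max"     -> lower bound on any union probability, max(p, q)
    "product" -> union probability under independence, p + q - p*q
    """
    if rule == "max":
        return max(p, q)
    if rule == "product":
        return p + q - p * q
    raise ValueError(f"unknown rule: {rule}")


# Illustrative (made-up) line-level probabilities for two keywords on one page.
p_a = page_relevance([0.1, 0.8])
p_b = page_relevance([0.5, 0.2])
print(and_score(p_a, p_b), or_score(p_a, p_b))
```

Under either assumed rule family the AND score never exceeds the smaller single-word probability and the OR score never falls below the larger one, which is one way to read "probabilistically consistent" here.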

Keywords

Text Line · Line Image · Boolean Combination · Handwritten Document · Page Image
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: HMM word-graph based keyword spotting in handwritten document images. Int. J. Inf. Sci. 370, 497–518 (2015)
  2. Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: Word-graph based keyword spotting and indexing of handwritten document images. Technical report, Universitat Politècnica de València (2013)
  3. Sánchez, J., Mühlberger, G., Gatos, B., Schofield, P., Depuydt, K., Davis, R., Vidal, E., de Does, J.: tranScriptorium: an European project on handwritten text recognition. In: DocEng, pp. 227–228 (2013)
  4. Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1998)
  5. Causer, T., Wallace, V.: Building a volunteer community: results and findings from Transcribe Bentham. Digit. Humanit. Q. 6(2) (2012)
  6. Sánchez, J.A., Romero, V., Toselli, A., Vidal, E.: ICFHR2014 Competition on Handwritten Text Recognition on Transcriptorium Datasets (HTRtS). In: 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 785–790, September 2014
  7. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
  8. Zhu, M.: Recall, Precision and Average Precision. Working Paper 2004–09, Department of Statistics & Actuarial Science, University of Waterloo, 26 August 2004
  9. Robertson, S.: A new interpretation of average precision. In: Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008), pp. 689–690. ACM, New York (2008)
  10. Kozielski, M., Forster, J., Ney, H.: Moment-based image normalization for handwritten text recognition. In: Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition, ICFHR 2012, pp. 256–261. IEEE Computer Society, Washington, DC (2012)
  11. Young, S., Odell, J., Ollason, D., Valtchev, V., Woodland, P.: The HTK Book: Hidden Markov Models Toolkit V2.1. Cambridge Research Laboratory Ltd. (1997)
  12. Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D.: The HTK Book: Hidden Markov Models Toolkit V3.4. Microsoft Corporation & Cambridge Research Laboratory Ltd., March 2009
  13. Toselli, A., Vidal, E.: Handwritten text recognition results on the Bentham collection with improved classical N-gram-HMM methods. In: 3rd International Workshop on Historical Document Imaging and Processing (HIP 2015), pp. 15–22, August 2015
  14. Kneser, R., Ney, H.: Improved backing-off for N-gram language modeling. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP 1995), vol. 1, Los Alamitos, CA, USA, pp. 181–184. IEEE Computer Society (1995)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Ernesto Noya-García (1)
  • Alejandro H. Toselli (1)
  • Enrique Vidal (1)
  1. PRHLT Research Centre, Universitat Politècnica de València, Valencia, Spain
