Guide to OCR for Arabic Scripts

  • Volker Märgner
  • Haikal El Abed

Table of contents

  1. Front Matter
    Pages I-XX
  2. Pre-processing

    1. Front Matter
      Pages 1-1
    2. Sargur N. Srihari, Gregory Ball
      Pages 3-34
    3. Syed Saqib Bukhari, Faisal Shafait, Thomas M. Breuel
      Pages 35-53
    4. Eugene Borovikov, Ilya Zavorin
      Pages 55-78
    5. Zhixin Shi, Srirangaraj Setlur, Venu Govindaraju
      Pages 79-102
    6. Abdel Belaïd, Nazih Ouwayed
      Pages 103-122
    7. Laurence Likforman-Sulem, Ramy Al Hajj Mohammad, Chafic Mokbel, Fares Menasri, Anne-Laure Bianne-Bernard, Christopher Kermorvant
      Pages 123-143
  3. Recognition

    1. Front Matter
      Pages 145-145
    2. Irfan Ahmed, Sabri A. Mahmoud, Mohammed Tanvir Parvez
      Pages 147-168
    3. Mario Pechwitz, Haikal El Abed, Volker Märgner
      Pages 169-213
    4. Philippe Dreuw, David Rybach, Georg Heigold, Hermann Ney
      Pages 215-254
    5. Ihab Alkhoury, Adrià Giménez, Alfons Juan
      Pages 255-272
    6. Puntis Jifroodian Haghighi, Ching Y. Suen
      Pages 273-295
    7. Yousri Kessentini, Thierry Paquet, AbdelMajid Ben Hamadou
      Pages 335-350
  4. Evaluation

    1. Front Matter
      Pages 373-373
    2. Ilya Zavorin, Eugene Borovikov
      Pages 375-394

About this book

Introduction

Optical Character Recognition (OCR) is a key technology enabling access to digital text data. This technique is especially valuable for Arabic scripts, for which there has been very little digital access.

Arabic script is widely used today. An estimated 200 million people use Arabic as a first language, and the Arabic script is shared by an additional 13 languages, making it the second most widespread script in the world. However, Arabic scripts pose unique challenges for OCR: systems for them cannot simply be adapted from existing processing techniques designed for Latin characters.

This comprehensive Guide to OCR for Arabic Scripts is the first book of its kind, specifically devoted to this emerging field. Presenting state-of-the-art research from an international selection of pre-eminent authorities, the book reviews techniques and algorithms for the recognition of both handwritten and printed Arabic scripts. Many of these techniques can also be applied to other scripts, serving as an inspiration to all groups working in the area of OCR.

Topics and features:

  • Contains contributions from the leading researchers in the field
  • With a Foreword by Professor Bente Maegaard of the University of Copenhagen
  • Presents a detailed overview of Arabic character recognition technology, covering a range of different aspects of pre-processing and feature extraction
  • Reviews a broad selection of varying approaches, including HMM-based methods and a recognition system based on multidimensional recurrent neural networks
  • Examines the evaluation of Arabic script recognition systems, discussing data collection and annotation, benchmarking strategies, and handwriting recognition competitions
  • Describes numerous applications of Arabic script recognition technology, from historical Arabic manuscripts to online Arabic recognition

This authoritative work is an essential reference for all researchers and graduate students interested in OCR technology and methodology in general, and in Arabic scripts in particular.

Keywords

  • Document Analysis
  • Hidden Markov Models
  • Neural Networks
  • Pattern Recognition
  • Text/Word Recognition

Editors and affiliations

  • Volker Märgner
    • 1
  • Haikal El Abed
    • 2
  1. Institute for Communications Technology, Braunschweig Technical University, Braunschweig, Germany
  2. Institute for Communications Technology, Braunschweig Technical University, Braunschweig, Germany

Bibliographic information

  • DOI: https://doi.org/10.1007/978-1-4471-4072-6
  • Copyright Information: Springer-Verlag London 2012
  • Publisher Name: Springer, London
  • eBook Packages: Computer Science
  • Print ISBN: 978-1-4471-4071-9
  • Online ISBN: 978-1-4471-4072-6