
Binarization, character extraction, and writer identification of historical Hebrew calligraphy documents

  • Itay Bar-Yosef (email author)
  • Isaac Beckman
  • Klara Kedem
  • Itshak Dinstein
Original Paper

Abstract

We present a paleographic analysis and recognition system for processing historical Hebrew calligraphy documents. The main goal is to analyze documents written in different styles in order to identify the locations, dates, and writers of test documents. Using interactive software tools, we have established a database of extracted characters. It currently contains about 20,000 characters from 34 different writers and will be substantially expanded in the near future. We present preliminary results on the automatic extraction of pre-specified letters using the erosion operator. We further propose and test topological features for handwriting style classification based on a selected subset of the Hebrew alphabet. A writer identification experiment with 34 writers yielded 100% correct classification.
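To illustrate how an erosion operator can locate a pre-specified letter, the following is a minimal sketch and not the procedure used in the paper, whose details the abstract does not give. It assumes a binarized page in which ink pixels are True and uses the letter template itself as the structuring element. The function name `binary_erosion`, the toy `template`, and the toy `page` are all hypothetical. A pixel survives the erosion only where every ink pixel of the template lands on ink in the page, so the surviving pixels mark candidate letter positions.

```python
import numpy as np


def binary_erosion(image, selem):
    """Erode a binary image by a binary structuring element.

    The anchor is the top-left corner of the element, and only
    positions where the element fits entirely inside the image
    are evaluated (hypothetical helper, for illustration only).
    """
    image = np.asarray(image, dtype=bool)
    selem = np.asarray(selem, dtype=bool)
    h, w = selem.shape
    rows = image.shape[0] - h + 1
    cols = image.shape[1] - w + 1
    out = np.zeros((rows, cols), dtype=bool)
    for i in range(rows):
        for j in range(cols):
            window = image[i:i + h, j:j + w]
            # Keep (i, j) only if every ink pixel of the element lands on ink.
            out[i, j] = np.all(window[selem])
    return out


# Toy letter shape: a horizontal bar with a vertical stroke on the right.
template = np.array([[1, 1, 1],
                     [0, 0, 1],
                     [0, 0, 1]], dtype=bool)

# Toy binarized "page" containing two copies of the shape.
page = np.zeros((6, 10), dtype=bool)
page[1:4, 2:5] |= template
page[2:5, 6:9] |= template

# Top-left corners of the places where the template fits.
hits = np.argwhere(binary_erosion(page, template))
print(hits.tolist())
```

Erosion alone only tests that the template's ink is present, so on real manuscripts it would also fire inside thicker or larger strokes. A practical system would likely need further checks on the candidates, which this sketch leaves out.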

Keywords

Binarization, Character extraction, Writer identification, Document analysis, Historical documents

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Itay Bar-Yosef (1), email author
  • Isaac Beckman (1)
  • Klara Kedem (1)
  • Itshak Dinstein (2)
  1. Computer Science Department, Ben Gurion University, Beer-Sheva, Israel
  2. Electrical and Computer Engineering Department, Ben Gurion University, Beer-Sheva, Israel
