Creating a Handwriting Recognition Corpus for Bushman Languages

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 7008)

Abstract

Handwriting recognition systems rely on a corpus both for training recognition models and for evaluating their accuracy. Creating such a corpus for the Bushman languages of southern Africa is difficult because the script used to represent these languages is complex and cannot be represented in Unicode. To address this problem, a semi-automatic Web-based tool was developed to segment, capture and encode the Bushman text. A case study demonstrated how the tool can be used to create a Bushman handwriting corpus with few errors.
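The abstract does not say how the tool segments manuscript pages. The following is a purely illustrative sketch of one common approach, line segmentation by horizontal projection profile, and not the authors' method. It assumes a grayscale page stored as a list of pixel rows (0 = black, 255 = white) and a fixed global threshold of 128. The function names `binarize` and `segment_lines` are hypothetical.

```python
from typing import List, Optional, Tuple

def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Global threshold (assumed value): 1 marks an ink pixel, 0 marks background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def segment_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) pairs, end exclusive, for each run of rows whose
    ink count (the horizontal projection profile) is at least min_ink."""
    profile = [sum(row) for row in binary]  # ink pixels per row
    lines: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y  # a text line begins
        elif count < min_ink and start is not None:
            lines.append((start, y))  # a blank row closes the current line
            start = None
    if start is not None:  # a line that runs to the bottom edge of the page
        lines.append((start, len(profile)))
    return lines

# Tiny synthetic "page": two inked bands separated by blank rows.
page = [
    [255, 255, 255, 255],
    [0, 40, 255, 255],
    [30, 255, 255, 10],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
    [20, 20, 20, 255],
    [255, 255, 255, 255],
]
print(segment_lines(binarize(page)))  # [(1, 3), (5, 6)]
```

Real notebook pages are noisier than this example. A semi-automatic tool would likely let a user correct the proposed boundaries instead of trusting a fixed threshold and a zero-ink gap.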

Keywords

  • Corpus creation
  • Transcription
  • Digital libraries

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Williams, K., Suleman, H. (2011). Creating a Handwriting Recognition Corpus for Bushman Languages. In: Xing, C., Crestani, F., Rauber, A. (eds) Digital Libraries: For Cultural Heritage, Knowledge Dissemination, and Future Creation. ICADL 2011. Lecture Notes in Computer Science, vol 7008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24826-9_28

  • DOI: https://doi.org/10.1007/978-3-642-24826-9_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24825-2

  • Online ISBN: 978-3-642-24826-9

  • eBook Packages: Computer Science, Computer Science (R0)