OTTO: A Tool for Diplomatic Transcription of Historical Texts

  • Stefanie Dipper
  • Martin Schnurrenberger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6562)


In this paper, we present OTTO, a web-based tool designed for the diplomatic transcription of historical language data. The tool supports fast and accurate typing through user-defined special characters while simultaneously providing a view of the manuscript that is as close to the original as possible. It also allows transcriptions to be annotated with rich, user-defined header information. Users can log in and operate OTTO from anywhere through a standard web browser.


Keywords: transcription tool, historical corpora, diplomatic transcription





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Stefanie Dipper (1)
  • Martin Schnurrenberger (1)
  1. Linguistics Department, Ruhr University Bochum, Germany
