Adding Words to Manuscripts: From PagesXML to TEITOK

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11057)


This article describes a two-step method for transcribing historical manuscripts. The first step uses a page-based representation, which makes it easy to transcribe the document page by page and line by line. The second step converts this representation to a text-based TEI/XML format so that the document becomes fully searchable.
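The conversion in the second step can be illustrated with a minimal sketch. It is not the article's implementation: the input below is a hypothetical, simplified page/line format (not the actual page-based schema), and the output uses a bare TEI-like skeleton in which page and line boundaries become empty `<pb/>` and `<lb/>` milestone elements inside running text. The function name `page_to_text_based` is likewise an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page-based input: pages contain lines, lines contain text.
# This is an illustrative stand-in, not the real page-based schema.
PAGE_BASED = """
<document>
  <page n="1">
    <line>In the beginning of</line>
    <line>the manuscript</line>
  </page>
  <page n="2">
    <line>the text continues</line>
  </page>
</document>
"""


def page_to_text_based(page_xml: str) -> ET.Element:
    """Flatten the page/line hierarchy into running text with milestones."""
    source = ET.fromstring(page_xml)
    text = ET.Element("text")
    body = ET.SubElement(text, "body")
    para = ET.SubElement(body, "p")
    for page in source.iter("page"):
        # A page boundary becomes an empty <pb/> element.
        ET.SubElement(para, "pb", {"n": page.get("n", "")})
        for line in page.iter("line"):
            # A line boundary becomes an empty <lb/>; the line's text
            # follows it as the element's tail, so the text stays continuous.
            lb = ET.SubElement(para, "lb")
            lb.tail = (line.text or "").strip() + " "
    return text


tei = page_to_text_based(PAGE_BASED)
running_text = "".join(tei.itertext()).strip()
print(ET.tostring(tei, encoding="unicode"))
print(running_text)
```

Because the line and page boundaries are kept as empty elements rather than containers, a phrase such as "of the manuscript", which is split across two lines in the page-based input, is contiguous in `running_text` and can be matched by a plain text search.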


Manuscript transcription; TEI/XML; Linguistic annotation



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. CELGA-ILTEC, Universidade de Coimbra, Coimbra, Portugal
