Abstract
In this paper, we demonstrate how TEITOK provides a full online interface for speech and even video corpora. These corpora are fully searchable using the CQL query language and can contain all speech-related annotation, such as repetitions, gaps, and mispronunciations. The interface shows time-aligned annotations scrolling below the waveform, along with the video if there is one. Corpora are stored in the TEI/XML standard, with import and export functions for other established formats such as ELAN, Praat, or Transcriber. It is also possible to annotate corpora directly in TEITOK.
Keywords
- TEI
- Spoken corpus
- Multimedia corpora
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Janssen, M. (2021). A Corpus with Wavesurfer and TEI: Speech and Video in TEITOK. In: Ekštein, K., Pártl, F., Konopík, M. (eds) Text, Speech, and Dialogue. TSD 2021. Lecture Notes in Computer Science, vol 12848. Springer, Cham. https://doi.org/10.1007/978-3-030-83527-9_22
DOI: https://doi.org/10.1007/978-3-030-83527-9_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-83526-2
Online ISBN: 978-3-030-83527-9
eBook Packages: Computer Science, Computer Science (R0)