A Corpus with Wavesurfer and TEI: Speech and Video in TEITOK

Part of the Lecture Notes in Computer Science book series (LNAI, volume 12848)

Abstract

In this paper, we demonstrate how TEITOK provides a full online interface for speech and video corpora. These corpora are fully searchable using the CQL query language and can contain all speech-related annotation, such as repetitions, gaps, and mispronunciations. The interface displays time-aligned annotations that scroll below the waveform, and shows the video if there is one. Corpora are stored in the TEI/XML standard, with import and export functions for other established standards such as ELAN, Praat, and Transcriber. Corpora can also be annotated directly in TEITOK.
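To make the idea of time-aligned annotation in TEI/XML concrete, the following is a minimal sketch and not TEITOK's own code. It assumes that each TEI `<u>` (utterance) element carries `start` and `end` attributes holding seconds, which is a simplification chosen for illustration; the sample transcript and the helper names `load_utterances` and `utterance_at` are hypothetical. It shows how a player synchronised with a waveform could look up the utterance that belongs to the current playback position.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; the sample document below is invented for illustration.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <u who="#A" start="0.00" end="2.35">so we went to the market</u>
    <u who="#B" start="2.35" end="4.10">the the old one <gap reason="inaudible"/></u>
    <u who="#A" start="4.10" end="6.80">yes near the river</u>
  </body></text>
</TEI>"""


def load_utterances(xml_text):
    """Collect utterances with their time span (assumed to be in seconds)."""
    root = ET.fromstring(xml_text)
    utterances = []
    for u in root.iterfind(".//tei:u", TEI_NS):
        utterances.append({
            "who": u.get("who"),
            "start": float(u.get("start")),
            "end": float(u.get("end")),
            # Join all text inside <u>, skipping empty elements such as <gap/>,
            # and normalise whitespace.
            "text": " ".join("".join(u.itertext()).split()),
        })
    return sorted(utterances, key=lambda item: item["start"])


def utterance_at(utterances, seconds):
    """Return the utterance whose span contains the playback time, or None."""
    for item in utterances:
        if item["start"] <= seconds < item["end"]:
            return item
    return None


if __name__ == "__main__":
    utts = load_utterances(SAMPLE)
    current = utterance_at(utts, 3.0)
    print(current["who"], current["text"])
```

In a browser interface the same lookup would be driven by the audio player's time updates; the linear scan is kept for readability and could be replaced by a binary search over the sorted start times for long recordings.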

Keywords

  • TEI
  • Spoken corpus
  • Multimedia corpora

Notes

  1. https://www.tei-c.org/release/doc/tei-p5-doc/es/html/TS.html

  2. https://wavesurfer-js.org/

  3. https://github.com/czcorpus/kontext

  4. http://teitok.clul.ul.pt/madison/

  5. https://github.com/ufal/teitok-tools

Author information

Corresponding author

Correspondence to Maarten Janssen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Janssen, M. (2021). A Corpus with Wavesurfer and TEI: Speech and Video in TEITOK. In: Ekštein, K., Pártl, F., Konopík, M. (eds) Text, Speech, and Dialogue. TSD 2021. Lecture Notes in Computer Science, vol 12848. Springer, Cham. https://doi.org/10.1007/978-3-030-83527-9_22

  • DOI: https://doi.org/10.1007/978-3-030-83527-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-83526-2

  • Online ISBN: 978-3-030-83527-9

  • eBook Packages: Computer Science, Computer Science (R0)