Speech-Based Real-Time Subtitling Services

Abstract

Recent advances in technology have led to the availability of powerful speech recognizers at low cost and to the possibility of using speech interaction in a variety of new and exciting practical applications. The purpose of this research was to investigate and develop the use of speech recognition in live television subtitling. This paper describes how the “SpeakTitle” project met the challenges of real-time speech recognition and live subtitling through the development of a customisable speaker interface and the use of ‘Topics’ for specific subject domains. In the prototype system (described in Hewitt et al., 2000; Bateman et al., 2001) output from the speech recognition system (the IBM ViaVoice® engine) is passed into a custom-built editor, from where it can be corrected and passed on to an existing subtitling system. The system was developed to the extent that it was acceptable for the production of subtitles for live television broadcasts, and it has been adopted by three subtitle production facilities in the UK.

The evolution of the product and the experiences of users in developing the system in a live subtitling environment are considered, and the system is analysed against industry standards. Ease of use and accuracy are also discussed, and further research areas are identified.
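
The abstract outlines a recognise, correct, transmit pipeline. The following Python fragment is a minimal illustrative sketch of that flow, not the SpeakTitle implementation: a stub stands in for the recognition engine, a substitution table stands in for the domain-specific ‘Topics’, and word-wrapped rows stand in for hand-off to the subtitling system. The RecognizedChunk type, the function names, and the 37-character row width (a teletext-style convention) are all assumptions made for illustration.

```python
import textwrap
import time
from dataclasses import dataclass


@dataclass
class RecognizedChunk:
    """One phrase emitted by the speech engine, with its arrival time."""
    text: str
    timestamp: float


def recognizer_stub(phrases):
    """Stand-in for the live recognition stream (e.g. engine output)."""
    for phrase in phrases:
        yield RecognizedChunk(text=phrase, timestamp=time.monotonic())


def correct(chunk, substitutions):
    """Apply domain-specific substitutions before transmission.

    A crude stand-in for the paper's 'Topics': a per-domain table of
    likely misrecognitions mapped to their intended forms.
    """
    words = [substitutions.get(w.lower(), w) for w in chunk.text.split()]
    return RecognizedChunk(text=" ".join(words), timestamp=chunk.timestamp)


def emit_subtitle(chunk, width=37):
    """Hand corrected text to the downstream subtitling system.

    Teletext subtitle rows are short, so long phrases are wrapped;
    37 characters is an assumed row width, not a SpeakTitle constant.
    """
    for row in textwrap.wrap(chunk.text, width=width):
        print(f"[{chunk.timestamp:10.3f}] {row}")


if __name__ == "__main__":
    # Hypothetical topic table for a football broadcast.
    topic = {"utd": "United", "keeper": "goalkeeper"}
    for chunk in recognizer_stub(["the utd keeper makes a fine save"]):
        emit_subtitle(correct(chunk, topic))
```

In the real system the correction step was interactive (a human editor), which is what makes the latency and interface design questions discussed in the paper interesting; the table lookup above only illustrates where that step sits in the pipeline.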

References

  • Bateman, A., Hewitt, J., and Lambourne, A. (2001). Subtitles from simultaneous transdiction: Multi-modal interfaces for generating and correcting real-time subtitles. HCII 2001, New Orleans.

  • Clarkson, P. and Robinson, T. (1998). The applicability of adaptive language modelling for the broadcast news task. Proceedings of ICSLP. Sydney, Australia, pp. 1699–1702.

  • Damper, R.I., Lambourne, A.D., and Guy, D.P. (1985). Speech input as an adjunct to keyboard entry in television subtitling. In B. Shackel (Ed.), Proceedings of Human-Computer Interaction: INTERACT '84, pp. 203–208.

  • Gibbon, D., Moore, R., and Winski, R. (Eds.) (1997). Handbook of Standards and Resources for Spoken Language Systems. Mouton de Gruyter, Chapter 7.

  • Hewitt, J., Bateman, A., Lambourne, A., Ariyaeeinia, A., and Sivakumaran, P. (2000). Real-time speech-generated subtitles: Problems and solutions. Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP 2000), Vol. III.

  • ITC guidance on standards for subtitling (amended February 1999): http://www.itc.org.uk/itc_publications/codes_guidance/standards_for_subtitling/index.asp

  • LINK. (1998). The Use of Speech Recognition in Live TV Subtitling, LINK Project No. GR/M15958/01, 1/10/1998–30/9/2001. Overview of LINK Project: http://homepages.feis.herts.ac.uk/~nehaniv/idmf/abstracts/hewitt.doc

  • National Captioning Institute: http://www.ncicap.org/acapintro.asp

  • Ney, H., Martin, S., and Wessel, F. (1997). Statistical language modelling using leaving-one-out. In S. Young and G. Bloothooft (Eds.), Corpus-Based Methods in Language and Speech Processing. Kluwer Academic.

  • NHK. (2002). http://www.nhk.or.jp/strl/open2002/en/tenji/id03/03.html

  • Pallett, D.S., et al. (1997). Broadcast news benchmark test results: English and non-English. Proc. DARPA Speech Recognition Workshop 1997.

  • Seymore, K. and Rosenfeld, R. (1997). Using story topics for language model adaptation. Proceedings of Eurospeech 1997.

  • Sivakumaran, P., Fortuna, J., and Ariyaeeinia, A.M. (2001). On the use of the Bayesian information criterion in multiple speaker detection. Proceedings of Eurospeech 2001.

  • Sivakumaran, P., Ariyaeeinia, A., and Fortuna, J. (2002). An effective unsupervised scheme for multiple speaker detection. ICSLP 2002. Denver, Colorado, Topic 16.

  • Stenograph: http://www.stenograph.com

  • UK legislation: Broadcasting Act 1990 (c. 42), Section 35, HM Stationery Office UK; Broadcasting Act 1996 (c. 55), Section 20(3)(a), HM Stationery Office UK; Statutory Instrument 2000 No. 2378: Broadcast (Subtitling) Order 2001, HM Stationery Office UK.

  • UK standards: Unified Standard April 1974, BBC Engineering Sheet 4008(5), Oct. 1975. Joint IBA/BBC/BREMA Publication: Broadcast Teletext Specification, September 1976.

  • US legislation: Television Decoder Circuitry Act of 1990, US Congress. Telecommunications Act of 1996, US Congress. Federal Communications Commission Rule 79-Closed Captioning of Video Programming, updated 2001.

  • IBM ViaVoice®: http://www.ibm.com/software/speech

  • WinCAPS. (2003). SysMedia Ltd.: http://www.sysmedia.com/subtitling/pdfs/wincaps_multimedia.pdf

Cite this article

Lambourne, A., Hewitt, J., Lyon, C. et al. Speech-Based Real-Time Subtitling Services. International Journal of Speech Technology 7, 269–279 (2004). https://doi.org/10.1023/B:IJST.0000037071.39044.cc
