Speech-Based Real-Time Subtitling Services

Abstract

Recent advances in technology have led to the availability of powerful speech recognizers at low cost and to the possibility of using speech interaction in a variety of new and exciting practical applications. The purpose of this research was to investigate and develop the use of speech recognition in live television subtitling. This paper describes how the “SpeakTitle” project met the challenges of real-time speech recognition and live subtitling through the development of a customisable speaker interface and the use of ‘Topics’ for specific subject domains. In the prototype system (described in Hewitt et al., 2000; Bateman et al., 2001) output from the speech recognition system (the IBM ViaVoice® engine) is passed into a custom-built editor, from where it can be corrected and passed on to an existing subtitling system. The system was developed to the extent that it was acceptable for the production of subtitles for live television broadcasts, and it has been adopted by three subtitle production facilities in the UK.

The evolution of the product and the experiences of users in developing the system in a live subtitling environment are considered, and the system is analysed against industry standards. Ease of use and accuracy are also discussed, and further research areas are identified.
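
The abstract outlines a recognise, correct, transmit pipeline. The following Python fragment is a minimal illustrative sketch of that flow, not the SpeakTitle implementation: a stub stands in for the recognition engine, a substitution table stands in for the domain-specific ‘Topics’, and word-wrapped rows stand in for hand-off to the subtitling system. The RecognizedChunk type, the function names, and the 37-character row width (a teletext-style convention) are all assumptions made for illustration.

```python
import textwrap
import time
from dataclasses import dataclass


@dataclass
class RecognizedChunk:
    """One phrase emitted by the speech engine, with its arrival time."""
    text: str
    timestamp: float


def recognizer_stub(phrases):
    """Stand-in for the live recognition stream (e.g. engine output)."""
    for phrase in phrases:
        yield RecognizedChunk(text=phrase, timestamp=time.monotonic())


def correct(chunk, substitutions):
    """Apply domain-specific substitutions before transmission.

    A crude stand-in for the paper's 'Topics': a per-domain table of
    likely misrecognitions mapped to their intended forms.
    """
    words = [substitutions.get(w.lower(), w) for w in chunk.text.split()]
    return RecognizedChunk(text=" ".join(words), timestamp=chunk.timestamp)


def emit_subtitle(chunk, width=37):
    """Hand corrected text to the downstream subtitling system.

    Teletext subtitle rows are short, so long phrases are wrapped;
    37 characters is an assumed row width, not a SpeakTitle constant.
    """
    for row in textwrap.wrap(chunk.text, width=width):
        print(f"[{chunk.timestamp:10.3f}] {row}")


if __name__ == "__main__":
    # Hypothetical topic table for a football broadcast.
    topic = {"utd": "United", "keeper": "goalkeeper"}
    for chunk in recognizer_stub(["the utd keeper makes a fine save"]):
        emit_subtitle(correct(chunk, topic))
```

In the real system the correction step was interactive (a human editor), which is what makes the latency and interface design questions discussed in the paper interesting; the table lookup above only illustrates where that step sits in the pipeline.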

References

  • Bateman, A., Hewitt, J., and Lambourne, A. (2001). Subtitles from simultaneous transdiction: Multi-modal interfaces for generating and correcting real-time subtitles. HCII 2001, New Orleans.

  • Clarkson, P. and Robinson, T. (1998). The applicability of adaptive language modelling for the broadcast news task. Proceedings of ICSLP. Sydney, Australia, pp. 1699–1702.

  • Damper, R.I., Lambourne, A.D., and Guy, D.P. (1985). Speech input as an adjunct to keyboard entry in television subtitling. In B. Shackel (Ed.), Proceedings of Human-Computer Interaction: INTERACT '84, pp. 203–208.

  • Gibbon, D., Moore, R., and Winski, R. (Eds.) (1997). Handbook of Standards and Resources for Spoken Language Systems. Mouton de Gruyter, Chapter 7.

  • Hewitt, J., Bateman, A., Lambourne, A., Ariyaeeinia, A., and Sivakumaran, P. (2000). Real-time speech-generated subtitles: Problems and solutions. Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP 2000), Vol. III.

  • ITC guidance on standards for subtitling (amended February 1999): http://www.itc.org.uk/itc_publications/codes_guidance/standards_for_subtitling/index.asp

  • LINK. (1998). The Use of Speech Recognition in Live TV Subtitling, LINK Project No. GR/M15958/01, 1/10/1998–30/9/2001. Overview of LINK Project: http://homepages.feis.herts.ac.uk/~nehaniv/idmf/abstracts/hewitt.doc

  • National Captioning Institute: http://www.ncicap.org/acapintro.asp

  • Ney, H., Martin, S., and Wessel, F. (1997). Statistical language modelling using leaving-one-out. In S. Young and G. Bloothooft (Eds.), Corpus-Based Methods in Language and Speech Processing. Kluwer Academic.

  • NHK. (2002). http://www.nhk.or.jp/strl/open2002/en/tenji/id03/03.html

  • Pallett, D.S., et al. (1997). Broadcast news benchmark test results: English and non-English. Proc. DARPA Speech Recognition Workshop 1997.

  • Seymore, K. and Rosenfeld, R. (1997). Using story topics for language model adaptation. Proceedings of Eurospeech 1997.

  • Sivakumaran, P., Fortuna, J., and Ariyaeeinia, A.M. (2001). On the use of the Bayesian information criterion in multiple speaker detection. Proceedings of Eurospeech 2001.

  • Sivakumaran, P., Ariyaeeinia, A., and Fortuna, J. (2002). An effective unsupervised scheme for multiple speaker detection. ICSLP 2002. Denver, Colorado, Topic 16.

  • Stenograph: http://www.stenograph.com

  • UK legislation: Broadcasting Act 1990 (c. 42), Section 35, HM Stationery Office UK; Broadcasting Act 1996 (c. 55), Section 20(3)(a), HM Stationery Office UK; Statutory Instrument 2000 No. 2378: Broadcast (Subtitling) Order 2001, HM Stationery Office UK.

  • UK standards: Unified Standard April 1974, BBC Engineering Sheet 4008(5), Oct. 1975. Joint IBA/BBC/BREMA Publication: Broadcast Teletext Specification, September 1976.

  • US legislation: Television Decoder Circuitry Act of 1990, US Congress. Telecommunications Act of 1996, US Congress. Federal Communications Commission Rule 79-Closed Captioning of Video Programming, updated 2001.

  • IBM ViaVoice®: http://www.ibm.com/software/speech

  • WinCAPS. (2003). SysMedia Ltd.: http://www.sysmedia.com/subtitling/pdfs/wincaps_multimedia.pdf

Cite this article

Lambourne, A., Hewitt, J., Lyon, C. et al. Speech-Based Real-Time Subtitling Services. International Journal of Speech Technology 7, 269–279 (2004). https://doi.org/10.1023/B:IJST.0000037071.39044.cc
