
Novel software for producing audio description based on speech synthesis enables cost reduction without sacrificing quality

  • Long Paper
Universal Access in the Information Society


Audio description (AD) is a narration technique that allows visually impaired people to experience movies. However, high-quality AD production is expensive, which prevents its widespread adoption. Synthesized-voice AD is lower in quality than human-narrated AD, but it may be a practical alternative to costly human narration, provided that its actual production cost is verified. An empirical experiment compared standard synthesized-voice AD and human-narrated AD for two-hour-long Japanese narrative films, using general-purpose movie-player software and Excel spreadsheets. The results indicated that the synthesized-voice AD scripts required approximately 5 h 20 min of modification work by the film producer and sound engineer. Furthermore, although the production cost was the same, the quality of the finished synthesized-voice AD was only 75% of that of human-narrated AD. To improve on these figures, we developed a prototype of AD production software based on speech synthesis that targets higher quality at lower cost. The prototype includes dedicated interfaces for AD-script entry, frame-by-frame insertion-point settings, speech rate settings, and volume settings with audio waveforms. It also integrates with Excel, which film production teams often use. In evaluations conducted using trailers in the same genre as the Japanese film used in the empirical experiment, our AD production software reduced cost by eliminating the modification time conventionally required of the film producer and sound engineer; additionally, the film producer’s quality assessment increased by 5% relative to existing standard synthesized-voice ADs. Finally, challenges and potential areas of improvement for the software were elicited from the participating audio describers.





This study was supported by a Grant-in-Aid for Scientific Research (C) (JP17K01553). We would like to express our gratitude for the support.

Author information


Corresponding author

Correspondence to Sawako Nakajima.


Cite this article

Nakajima, S., Mitobe, K. Novel software for producing audio description based on speech synthesis enables cost reduction without sacrificing quality. Univ Access Inf Soc 21, 405–418 (2022).
