Novel software for producing audio description based on speech synthesis enables cost reduction without sacrificing quality

  • Long Paper
Audio description (AD) is a narration technique that allows visually impaired people to experience movies. However, high-quality AD production is expensive, which prevents its widespread adoption. Synthesized-voice AD is lower in quality than human-narrated AD, but it may be the more practical choice if its actual production cost is verified. An empirical experiment compared standard synthesized-voice AD with human-narrated AD for two-hour-long Japanese narrative films, using general-purpose movie-player software and Excel spreadsheets. The results indicated that the synthesized-voice AD scripts required approximately 5 h 20 min of modification work by the film producer and sound engineer. Furthermore, although the production cost was the same, the quality of the finished synthesized-voice AD was only 75% of that of human-narrated AD. To improve on these figures, we developed a prototype of higher-quality, less expensive AD production software based on speech synthesis. The prototype includes dedicated interfaces for AD-script entry, frame-by-frame insertion-point settings, speech-rate settings, and volume settings with audio waveforms. It also integrates with Excel, which film production teams often use. In evaluations using trailers in the same genre as the Japanese film used in the empirical experiment, our AD production software reduced cost by eliminating the modification time conventionally required of the film producer and sound engineer; in addition, the film producer’s quality assessment increased by 5% relative to existing standard synthesized-voice AD. Finally, challenges and potential areas of improvement for the software were elicited from the participating audio describers.


This study was supported by Grant-in-Aid for Scientific Research (C) JP17K01553. We would like to express our gratitude for the support.

