Captioning for Deaf and Hard of Hearing People by Editing Automatic Speech Recognition in Real Time

  • Mike Wald
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4061)


Abstract

Deaf and hard of hearing people can find it difficult to follow speech through hearing alone or to take notes while lip-reading or watching a sign-language interpreter. Notetakers can only summarise what is being said, and qualified sign language interpreters with a good understanding of the relevant higher education subject content are in very scarce supply. Real-time captioning/transcription is not normally available in UK higher education because of the shortage of real-time stenographers. Lectures can be digitally recorded and replayed to provide multimedia revision material for students who attended the class and a substitute learning experience for students unable to attend. Automatic Speech Recognition can provide real-time captioning directly from lecturers’ speech in classrooms, but it is difficult to obtain accuracy comparable to stenography. This paper describes the development of a system that enables editors to correct errors in the captions as they are created by Automatic Speech Recognition.
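The core idea, correcting ASR output in real time before it reaches the viewer's display, can be illustrated with a minimal sketch. The code below is not the system described in the paper: the segment format, the fixed editing window, and the `CaptionSegment`, `editor_pass` and `run_pipeline` names are all assumptions invented for illustration.

```python
# Illustrative sketch only: a toy pipeline in which ASR output is held in a
# short editing window so a human editor can correct it before display.
# CaptionSegment, editor_pass and run_pipeline are invented for this sketch
# and do not correspond to the system described in the paper.

import time
from dataclasses import dataclass
from queue import Empty, Queue


@dataclass
class CaptionSegment:
    seq: int   # position in the caption stream
    text: str  # raw ASR hypothesis for this segment


def editor_pass(segment: CaptionSegment, corrections: dict[int, str]) -> str:
    """Return the editor's correction if one exists, else the raw ASR text."""
    return corrections.get(segment.seq, segment.text)


def run_pipeline(asr_queue: Queue, corrections: dict[int, str],
                 edit_window_s: float = 2.0) -> list[str]:
    """Hold each segment for a fixed editing window, then emit final text."""
    displayed = []
    while True:
        try:
            segment = asr_queue.get(timeout=edit_window_s)
        except Empty:
            break  # no more ASR output
        time.sleep(edit_window_s)  # window in which the editor may intervene
        displayed.append(editor_pass(segment, corrections))
    return displayed


if __name__ == "__main__":
    q: Queue = Queue()
    q.put(CaptionSegment(0, "welcome to the lecture"))
    q.put(CaptionSegment(1, "today we discuss speech wreck ignition"))
    # The editor catches the recognition error in segment 1 before display.
    fixes = {1: "today we discuss speech recognition"}
    for line in run_pipeline(q, fixes, edit_window_s=0.1):
        print(line)
```

In a real deployment the editor would interact through a live interface rather than a pre-filled dictionary, and the delay would trade off against caption latency; the sketch only shows the edit-before-display data flow.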


Keywords: Speech Recognition, Automatic Speech Recognition, Speech Recognition System, Liberated Learning, Automatic Speech Recognition System


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Mike Wald
  1. Learning Technologies Group, School of Electronics and Computer Science, University of Southampton, United Kingdom