A Study of Automatic Speech Recognition in Noisy Classroom Environments for Automated Dialog Analysis

  • Nathaniel Blanchard
  • Michael Brady
  • Andrew M. Olney
  • Marci Glaus
  • Xiaoyi Sun
  • Martin Nystrand
  • Borhan Samei
  • Sean Kelly
  • Sidney D’Mello
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9112)

Abstract

The development of large-scale automatic classroom dialog analysis systems requires accurate speech-to-text transcription. A variety of automatic speech recognition (ASR) engines were evaluated for this purpose, using recordings of teachers in noisy classrooms as test data. Google Speech and Bing Speech were the most accurate, with word accuracy scores of 0.56 and 0.52, respectively, compared with 0.41 for AT&T Watson, 0.08 for Microsoft Speech, 0.14 for Sphinx with the HUB4 model, and 0.00 for Sphinx with the WSJ model. Further analysis revealed that both the Google and Bing engines were largely unaffected by speaker, class session, and speech characteristics. The Bing results were validated across speakers in a laboratory study, and a method of improving Bing results is presented. The results provide a useful picture of the capabilities of contemporary ASR engines in noisy classroom environments and highlight a set of issues to be aware of when selecting an ASR engine for difficult speech recognition tasks.
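The scores above are word accuracy values. As a hedged illustration only, the sketch below uses one common definition, word accuracy = 1 − (substitutions + deletions + insertions) / (number of reference words), computed from a word-level Levenshtein alignment. The exact scoring procedure used in the paper (text normalization, punctuation handling, whether negative values are clipped at 0) is not stated in this summary, so the lowercasing, whitespace tokenization, and the function names `word_edit_distance` and `word_accuracy` are assumptions for illustration.

```python
def word_edit_distance(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (minimum word edits, number of reference words).

    Assumption: text is lowercased and split on whitespace; punctuation is
    not stripped. The paper's own normalization may differ.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = fewest substitutions + deletions + insertions needed to turn
    # the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)], len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - errors / reference length (can be negative)."""
    errors, n = word_edit_distance(reference, hypothesis)
    if n == 0:
        raise ValueError("reference transcript must contain at least one word")
    return 1 - errors / n


if __name__ == "__main__":
    print(word_accuracy("open your books to page ten", "open the books to page"))
```

For the usage example, the hypothesis has one substitution ("your" → "the") and one deletion ("ten") against six reference words, giving 1 − 2/6 ≈ 0.67. Under this definition, an engine that inserts many spurious words can score below zero, which is one reason a reported score might be floored at 0.00; whether the paper does this is not stated here.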

Keywords

Google Speech, Bing Speech, Sphinx 4, Microsoft Speech, ASR engine evaluation

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

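  • Nathaniel Blanchard, University of Notre Dame, Notre Dame, USA
  • Michael Brady, University of Notre Dame, Notre Dame, USA
  • Andrew M. Olney, University of Memphis, Memphis, USA
  • Marci Glaus, University of Wisconsin-Madison, Madison, USA
  • Xiaoyi Sun, University of Wisconsin-Madison, Madison, USA
  • Martin Nystrand, University of Wisconsin-Madison, Madison, USA
  • Borhan Samei, University of Memphis, Memphis, USA
  • Sean Kelly, University of Pittsburgh, Pittsburgh, USA
  • Sidney D’Mello, University of Notre Dame, Notre Dame, USA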

Personalised recommendations