Abstract
Previous research has shown that learners of English as a second language (ESL) have difficulty understanding connected speech produced by native English speakers. Extending past research, which was limited to quiet listening conditions, this study examined the perception of English connected speech presented under five adverse conditions: multi-talker babble noise, speech-shaped noise, factory noise, whispering, and sad emotional tones. We tested 64 Chinese ESL undergraduate students using a battery of listening tasks. Results confirmed that recognizing native English speech was more challenging for Chinese ESL learners under adverse listening conditions than under a noise-free condition. These findings underscore the importance of training and assessment in connected speech perception across varied listening environments.
Acknowledgements
We thank all the students who participated in this study. We are also grateful to Lauren Couillard, Marnie Evans, and Marianne Katherine Hewitt, who helped with the preparation of speech stimuli.
Ethics declarations
Funding
This study was funded by the Research Grants Council (RGC) of the University Grants Committee (UGC), Hong Kong (ECS 846212) and by an Internal Research Grant of the Education University of Hong Kong (RG72/2015-2016).
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Wong, S.W.L., Tsui, J.K.Y., Chow, B.WY. et al. Perception of Native English Reduced Forms in Adverse Environments by Chinese Undergraduate Students. J Psycholinguist Res 46, 1149–1165 (2017). https://doi.org/10.1007/s10936-017-9486-y
DOI: https://doi.org/10.1007/s10936-017-9486-y