
Perception of Native English Reduced Forms in Adverse Environments by Chinese Undergraduate Students

Journal of Psycholinguistic Research


Previous research has shown that learners of English as a second language (ESL) have difficulty understanding connected speech produced by native English speakers. Extending past research, which was limited to quiet listening conditions, this study examined the perception of English connected speech presented under five adverse conditions: multi-talker babble noise, speech-shaped noise, factory noise, whispering, and sad emotional tones. We tested a total of 64 Chinese ESL undergraduate students using a battery of listening tasks. Results confirmed that recognizing native English speech was more challenging for Chinese ESL learners under unfavorable listening conditions than under a noise-free listening condition. These findings underscore the importance of training and assessment in connected speech perception across various listening environments.






We thank all the students who participated in this study. We are also grateful to Lauren Couillard, Marnie Evans, and Marianne Katherine Hewitt, who helped with the preparation of speech stimuli.

Author information

Corresponding author

Correspondence to Simpson W. L. Wong.

Ethics declarations


This study was funded by Research Grants Council (RGC) of the University Grants Committee (UGC), Hong Kong (ECS 846212) and Internal Research Grant of the Education University of Hong Kong (RG72/2015-2016).

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Wong, S.W.L., Tsui, J.K.Y., Chow, B.WY. et al. Perception of Native English Reduced Forms in Adverse Environments by Chinese Undergraduate Students. J Psycholinguist Res 46, 1149–1165 (2017).
