Predicting Speech Errors in Mandarin Based on Word Frequency

  • Marc Tang
  • I-Ping Wan
Part of the Frontiers in Chinese Linguistics book series (FiCL, volume 9)


This paper investigates the effect of word frequency on the occurrence of speech errors in Mandarin. We gathered a corpus of 390 speech errors together with their surrounding linguistic context, and extracted word frequency information from the Academia Sinica Corpus. Our analysis, which uses a computational classifier based on conditional inference trees, shows that intended words with a lower frequency than the words of the surrounding context are more likely to generate speech errors.


Keywords: Speech errors, Mandarin, Frequency, Random forests



We thank the two anonymous reviewers for their constructive comments, which led to significant improvements to the paper. The second author would like to thank Dr. Chain-wu Lee for his continuous, cutting-edge programming support in constructing all the corpora in the Phonetics and Psycholinguistics Lab at National Chengchi University. All remaining errors are our own. The research reported in this paper was funded by a three-year MOST grant in Taiwan (MOST 98-2410-H-004-103-MY2) awarded to the second author.



Copyright information

© Peking University Press 2020

Authors and Affiliations

  1. Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
  2. Research Center for Mind, Brain and Learning, Graduate Institute of Linguistics, National Chengchi University, Taipei, Republic of China
