Predicting Speech Errors in Mandarin Based on Word Frequency
This paper investigates the effect of word frequency on the occurrence of speech errors in Mandarin. A corpus of 390 speech errors, together with their surrounding linguistic context, was gathered. Word frequency information was extracted from the Academia Sinica Corpus. Our analysis, using a computational classifier based on conditional inference trees, shows that intended words with a lower frequency than the words in the surrounding context are more likely to give rise to speech errors.
Keywords: Speech errors; Mandarin; Frequency; Random forests
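The comparison at the heart of the abstract, whether an intended word is less frequent than the words around it, can be illustrated with a small sketch. This is not the authors' classifier: the conditional inference trees are not reproduced here, and the function names, the add-one smoothing, the use of log counts, and the toy counts are all assumptions made for illustration. It only shows one plausible way to turn raw corpus counts into a single "lower than context" feature that a classifier could then use.

```python
import math


def log_freq_difference(intended_count, context_counts):
    """Log frequency of the intended word minus the mean log frequency
    of its context words (hypothetical feature, not from the paper).

    A negative value means the intended word is rarer than its context.
    Add-one smoothing (an assumption) avoids log(0) for unseen words.
    context_counts must be a non-empty sequence of raw counts.
    """
    intended = math.log(intended_count + 1)
    context = sum(math.log(c + 1) for c in context_counts) / len(context_counts)
    return intended - context


def is_lower_than_context(intended_count, context_counts):
    """True when the intended word is rarer than its context on average."""
    return log_freq_difference(intended_count, context_counts) < 0


# Toy counts invented for illustration only; they are NOT taken from
# the Academia Sinica Corpus or from the 390-error corpus.
rare_target = is_lower_than_context(12, [5400, 880, 23000])
common_target = is_lower_than_context(9100, [40, 310, 75])
```

Under the paper's finding, items like `rare_target` (intended word rarer than its context) would be the ones expected to be more error-prone; how the authors actually encoded frequency for their classifier is not stated in this section.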
We thank the two anonymous reviewers for their constructive comments, which led to significant improvements to the paper. The second author would like to thank Dr. Chain-wu Lee for his continuous cutting-edge high-tech programming support in constructing all the corpora in the Phonetics and Psycholinguistics Lab at National Chengchi University. All remaining errors are our own. The research reported in this paper was funded by a three-year MOST grant to the second author (MOST 98-2410-H-004-103-MY2) in Taiwan.