
Using Corpus Statistics to Evaluate Nonce Words

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 8607)

Abstract

Nonce words are widely used in linguistic research to evaluate areas such as the acquisition of vowel harmony and consonant voicing, naturalness judgments of loanwords, and children’s acquisition of morphemes. Researchers usually create lists of nonce words intuitively by considering the phonotactic features of the target languages. In this study, a corpus of Turkish orthographic representations is used to propose a measure of nonce word appropriateness for linearly concatenative languages. The conditional probabilities of orthographic co-occurrences and pairwise vowel collocations within the same word boundaries are used to evaluate a list of nonce words in terms of whether they would be rejected, moderately accepted or fully accepted as novel words. A group of 50 Turkish native speakers was asked to judge the same list of nonce words on how native-like the words sound. The model and the participants produced similar results.
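To make the idea of scoring nonce words with conditional probabilities concrete, here is a minimal Python sketch. It is not the paper's model: the abstract does not give the exact formula, corpus, or acceptance thresholds, so the toy word list, the use of the minimum transition probability as a score, the `#` boundary symbol, and the cut-offs in `classify` are all assumptions made for illustration.

```python
from collections import Counter

VOWELS = set("aeıioöuü")  # Turkish vowels; input is assumed to be lowercase

def bigram_counts(sequences):
    """Count adjacent pairs (a, b) and how often a occurs as the first member."""
    pair_counts, first_counts = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return pair_counts, first_counts

def cond_prob(pair_counts, first_counts, a, b):
    """Estimate P(b | a) by relative frequency; 0.0 if a was never seen."""
    if first_counts[a] == 0:
        return 0.0
    return pair_counts[(a, b)] / first_counts[a]

def letters(word):
    """Letter sequence with '#' marking the word boundaries (assumed symbol)."""
    return ["#"] + list(word) + ["#"]

def vowel_tier(word):
    """Only the vowels of the word, in order, for pairwise vowel collocations."""
    return [ch for ch in word if ch in VOWELS]

def train(corpus):
    """Build letter-bigram and vowel-pair counts from a list of words."""
    return (bigram_counts(letters(w) for w in corpus),
            bigram_counts(vowel_tier(w) for w in corpus))

def score(word, model):
    """Hypothetical score: the weakest conditional probability in the word."""
    (lp, lf), (vp, vf) = model
    tiers = [(letters(word), lp, lf), (vowel_tier(word), vp, vf)]
    probs = [cond_prob(p, f, a, b)
             for seq, p, f in tiers
             for a, b in zip(seq, seq[1:])]
    return min(probs) if probs else 0.0

def classify(s, low=0.0, high=0.2):
    """Map a score to three bands; the thresholds are illustrative only."""
    if s <= low:
        return "rejected"
    return "moderately accepted" if s < high else "fully accepted"

# Toy corpus standing in for a real Turkish corpus (assumption).
TOY_CORPUS = ["kitap", "kalem", "evler", "okul", "kapılar", "gözler", "masa", "kedi"]
model = train(TOY_CORPUS)

for candidate in ["kalem", "kzal"]:
    print(candidate, classify(score(candidate, model)))
```

Taking the minimum means a single unattested letter or vowel transition is enough to reject a candidate. A product or average of the probabilities would be a softer alternative, and a realistic corpus would also call for smoothing of unseen pairs.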

Keywords

  • Nonce words
  • Orthographic representations
  • Conditional probabilities




Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kılıç, Ö. (2014). Using Corpus Statistics to Evaluate Nonce Words. In: Colinet, M., Katrenko, S., Rendsvig, R.K. (eds) Pristine Perspectives on Logic, Language, and Computation. ESSLLI 2012, ESSLLI 2013. Lecture Notes in Computer Science, vol 8607. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44116-9_3


  • DOI: https://doi.org/10.1007/978-3-662-44116-9_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-44115-2

  • Online ISBN: 978-3-662-44116-9

  • eBook Packages: Computer Science, Computer Science (R0)