From Cues to Categories: A Computational Study of Children’s Early Word Categorization

  • Fatemeh Torabi Asr
  • Afsaneh Fazly
  • Zohreh Azimifar
Part of the Theory and Applications of Natural Language Processing book series (NLP)


Young children exhibit knowledge of abstract syntactic categories of words, such as noun and verb. A key research question concerns the type of information that children might use to form such categories. We use a computational model to provide insights into the (differential and cooperative) role of various information sources (namely, distributional, morphological, phonological, and semantic properties of words) in children’s early word categorization. Specifically, we use an unsupervised incremental clustering algorithm to learn categories of words from different combinations of these information sources, and determine the role of each type of cue by evaluating the quality of the resulting categories. We conduct two types of experiments. First, we compare the categories learned by our model to a set of gold-standard part-of-speech (PoS) tags, such as verb and noun. Second, we simulate a language task similar to one performed by children, as reported in a psycholinguistic study by Brown (J Abnor Soc Psychol 55(1):1–5, 1957). Our results suggest that different categories of words may be recognized by relying on different types of cues. The results also indicate the importance of knowledge of word meanings for syntactic categorization, and vice versa: adding semantic information leads to categories that better match the gold-standard parts of speech. Conversely, our model (like children) can predict the semantic class of a word (e.g., action or object) by drawing on its learned knowledge of the word’s syntactic category.
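The general idea of incremental clustering over mixed cues, followed by a comparison against gold-standard tags, can be illustrated with a minimal sketch. This is not the authors' model: the cosine similarity measure, the fixed threshold, the purity score, the feature names (prefixed ctx:, morph:, sem:), and the toy words are all assumptions chosen only to make the example self-contained.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(items, threshold=0.5):
    """Process items one at a time: join the most similar cluster,
    or open a new cluster if no similarity reaches the threshold."""
    centroids = []  # one summed feature dict per cluster
    labels = []
    for feats in items:
        sims = [cosine(feats, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is None or sims[best] < threshold:
            centroids.append(dict(feats))
            labels.append(len(centroids) - 1)
        else:
            for k, v in feats.items():
                centroids[best][k] = centroids[best].get(k, 0.0) + v
            labels.append(best)
    return labels


def purity(labels, gold):
    """Share of items covered by the majority gold tag of their cluster."""
    by_cluster = {}
    for label, tag in zip(labels, gold):
        by_cluster.setdefault(label, Counter())[tag] += 1
    return sum(c.most_common(1)[0][1] for c in by_cluster.values()) / len(gold)


# Hypothetical toy input: each word usage is a bag of cue features.
toy_items = [
    {"ctx:prev=the": 1.0, "sem:object": 1.0},                   # dog
    {"ctx:prev=the": 1.0, "sem:object": 1.0},                   # cat
    {"ctx:prev=to": 1.0, "morph:-ing": 1.0, "sem:action": 1.0}, # run
    {"ctx:prev=to": 1.0, "sem:action": 1.0},                    # jump
]
toy_gold = ["N", "N", "V", "V"]

if __name__ == "__main__":
    toy_labels = incremental_cluster(toy_items)
    print(toy_labels, purity(toy_labels, toy_gold))
```

Dropping one feature family from the input (for example, every key starting with sem:) and re-running the clustering is one simple way to probe how much a given cue type contributes, in the spirit of the cue combinations described above.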


Semantic Feature, Test Word, Syntactic Category, Word Categorization, Countable Noun


References

  1. Alishahi, A., & Chrupała, G. (2009). Lexical category acquisition as an incremental process. In CogSci-2009 Workshop on Psychocomputational Models of Human Language Acquisition, Amsterdam.
  2. Asr, F. T., Fazly, A., & Azimifar, Z. (2010). The effect of word-internal properties on syntactic categorization: A computational modeling approach. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, Portland, USA.
  3. Berko, G. J. (1958). The child’s learning of English morphology. Word, 14, 150–177.
  4. Brown, R. (1957). Linguistic determinism and the part of speech. Journal of Abnormal and Social Psychology, 55(1), 1–5.
  5. Cartwright, T., & Brent, M. (1997). Syntactic categorization in early language acquisition: Formalizing the role of distributional analysis. Cognition, 63(2), 121–170.
  6. Chang, F., Lieven, E., & Tomasello, M. (2008). Automatic evaluation of syntactic learners in typologically-different languages. Cognitive Systems Research, 9(3), 198–213.
  7. Chrupała, G., & Alishahi, A. (2010). Online entropy-based model of lexical category acquisition. In Proceedings of 14th Conference on Computational Natural Language Learning (CoNLL) (pp. 182–191), Uppsala, Sweden.
  8. Clark, A. (2000). Inducing syntactic categories by context distribution clustering. In Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning (Vol. 7, pp. 91–94). Morristown: Association for Computational Linguistics.
  9. Fazly, A., Alishahi, A., & Stevenson, S. (2008). A probabilistic incremental model of word learning in the presence of referential uncertainty. In Proceedings of the 30th Annual Conference of the Cognitive Science Society, Washington, DC.
  10. Fellbaum, C. (1998). WordNet: An electronic lexical database. Cambridge: The MIT Press. ISBN 026206197X.
  11. Gelman, S., & Taylor, M. (1984). How two-year-old children interpret proper and common names for unfamiliar objects. Child Development, 55(4), 1535–1540.
  12. Gerken, L., Wilson, R., & Lewis, W. (2005). Infants can use distributional cues to form syntactic categories. Journal of Child Language, 32(02), 249–268.
  13. Goldwater, S., Griffiths, T. L., & Johnson, M. (2009). A Bayesian framework for word segmentation: Exploring the effects of context. Cognition, 112(1), 21–54.
  14. Harm, M. (2002). Building large scale distributed semantic feature sets with WordNet (Tech. Rep. No. PDP. CNS. 02.01). Carnegie Mellon University, Center for the Neural Basis of Cognition, Pittsburgh, PA.
  15. Kaplan, F., Oudeyer, P., & Bergen, B. (2008). Computational models in the debate over language learnability. Infant and Child Development, 17(1), 55–80.
  16. Kemp, N., Lieven, E., & Tomasello, M. (2005). Young children’s knowledge of the “determiner” and “adjective” categories. Journal of Speech, Language, and Hearing Research, 48(3), 592–602.
  17. Kipper-Schuler, K. (2005). VerbNet: A broad-coverage, comprehensive verb lexicon. Ph.D. thesis, University of Pennsylvania, Philadelphia.
  18. MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk, volume 2: The database (3rd ed.). Mahwah: Lawrence Erlbaum Associates.
  19. Mintz, T. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90(1), 91–117.
  20. Monaghan, P., Christiansen, M., & Chater, N. (2007). The phonological-distributional coherence hypothesis: Cross-linguistic evidence in language acquisition. Cognitive Psychology, 55(4), 259–305.
  21. Naigles, L. (1990). Children use syntax to learn verb meanings. Journal of Child Language, 17, 357–374.
  22. Onnis, L., & Christiansen, M. (2008). Lexical categories at the edge of the word. Cognitive Science, 32(1), 184–221.
  23. Parisien, C., Fazly, A., & Stevenson, S. (2008). An incremental Bayesian model for learning syntactic categories. In Proceedings of the Twelfth Conference on Computational Natural Language Learning (pp. 89–96). New York: Association for Computational Linguistics.
  24. Pearl, L. (2009). Using computational modeling in language acquisition research. Experimental Methods in Language Acquisition Research, 163–184.
  25. Redington, M., Chater, N., & Finch, S. (1998). Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science, 22(4), 425–469.
  26. Samuelson, L., & Smith, L. (1999). Early noun vocabularies: Do ontology, category structure and syntax correspond? Cognition, 73(1), 1–33.
  27. Schütze, H. (1995). Distributional part-of-speech tagging. In Proceedings of the Seventh Conference on European Chapter of the Association for Computational Linguistics (pp. 141–148). San Francisco: Morgan Kaufmann Publishers Inc.
  28. Theakston, A. L., Lieven, E. V., Pine, J. M., & Rowland, C. F. (2001). The role of performance limitations in the acquisition of verb–argument structure: An alternative account. Journal of Child Language, 28, 127–152.
  29. Wilson, M. (1988). MRC psycholinguistic database: Machine-usable dictionary, version 2.00. Behavior Research Methods, 20(1), 6–10.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Fatemeh Torabi Asr (1)
  • Afsaneh Fazly (2)
  • Zohreh Azimifar (1)
  1. Computer Science and Engineering Department, Shiraz University, Shiraz, Iran
  2. School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
