Abstract
Child language acquisition, one of Nature’s most fascinating phenomena, is to a large extent still a puzzle. Experimental evidence seems to support the view that early language is highly formulaic, consisting for the most part of frozen items with limited productivity. Fairly quickly, however, children find patterns in the ambient language and generalize them to larger structures, in a process that is not yet well understood. Computational models of language acquisition can shed interesting light on this process. This paper surveys various works that address language learning from data; such works are conducted in different fields, including psycholinguistics, cognitive science and computer science, and we maintain that knowledge from all these domains must be consolidated in order for a well-informed model to emerge. We identify the commonalities and differences between the various existing approaches to language learning, and specify desiderata for future research that must be considered by any plausible solution to this puzzle.
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Wintner, S. (2010). Computational Models of Language Acquisition. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_8
DOI: https://doi.org/10.1007/978-3-642-12116-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science (R0)