Cognitive Computation, Volume 2, Issue 4, pp 280–284

A Connectionist Study on the Interplay of Nouns and Pronouns in Personal Pronoun Acquisition

Authors

    • A. Kaznatcheev, Department of Physics and School of Computer Science, McGill University

DOI: 10.1007/s12559-010-9050-7

Cite this article as:
Kaznatcheev, A. Cogn Comput (2010) 2: 280. doi:10.1007/s12559-010-9050-7

Abstract

Cascade-correlation learning is used to model pronoun acquisition in children. Cascade correlation is a learning algorithm in which a feed-forward neural network builds its own topology, starting from only input and output units. Personal pronoun acquisition is an interesting non-linear problem in psychology. A mother will refer to her son as you and herself as me, but the son must infer for himself that when he speaks to his mother, she becomes you and he becomes me. Learning the shifting reference of these pronouns is a difficult task that most children master. We show that learning of two different noun-and-pronoun addressee patterns is consistent with naturalistic studies. We observe a surprising factor in pronoun reversal: increasing the amount of exposure to noun patterns can decrease or eliminate reversal errors in children.

Keywords

Computational linguistics; Cascade-correlation neural networks; Personal pronoun acquisition; Computational model

Introduction

The ability to learn quickly and effectively is one of humankind’s greatest strengths. It allows us to adapt, create, communicate, and hypothesize in a changing world. However, the processes of learning are not well understood. From child to elder, we are constantly expanding our minds, but no one has a full understanding of how. One of the earliest and arguably most important endeavors of the developing brain is the acquisition of language. In a matter of years, a child transforms from cries and coos to full sentences. An exciting stage along the voyage is the acquisition of personal pronouns.

Learning the proper usage of pronouns provides a myriad of interesting problems. Unlike the static subjects of proper (and common) nouns, the referent of pronouns is not fixed; it shifts with conversation roles. For nouns (both proper and common), a child can simply imitate her parent, but for pronouns, a child must realize that she needs to reverse the pronouns in order to communicate correctly. A father will refer to his daughter as you and himself as me, but the daughter must infer for herself that when she speaks to her father, he becomes you and she becomes me. If a child simply imitates what she hears, she will refer to herself as you and to everyone else as me—a reversal error.

Amazingly, most children learn the correct use of personal pronouns by the age of 3 years [4], with few reversal errors [2, 3]. Those few that do succumb to reversal errors are often plagued by them for months [2, 11, 16]. Many early researchers [7, 15] assumed that speech not addressed to children is unimportant in language acquisition, but more recently, Oshima-Takane [10] proposed that children can learn the rules of pronouns from speech not addressed to them. In support of her theory, Oshima-Takane observed that second-born children acquired personal pronouns earlier than first-born children [12], presumably because of more exposure to pronoun use in speech between their parents and older siblings [20]. This provided two clear training patterns: addressee and non-addressee. The former refers to a parent talking to the child, referring to himself or herself as me and to the child as you. The latter refers to the mother and father communicating with each other while the child simply overhears. The two patterns were studied through connectionist simulations [20, 22, 14], which built support for Oshima-Takane’s theory.

To better understand the learning involved in the acquisition of personal pronouns, this paper extends the connectionist models to deal with the mixed use of personal pronouns and proper (or common) nouns. Children often avoid making pronoun errors by replacing pronouns with proper names [13]. Although previous work [20] hinted at the involvement of proper nouns during times of confusion (not knowing whether to use me or you), a simulation or in-depth analysis of confusion was not conducted. This study trained neural networks in conditions that involved exposure to both personal pronouns and proper nouns in varying proportions and then tested how well the networks used personal pronouns correctly.

Method

The cascade-correlation algorithm [8] serves as the foundation for this study. More conventional neural-network algorithms, such as back-propagation, require a full specification of the network. Cascade correlation begins with a minimal graph—just the input and output units—and constructs its own topology by recruiting new hidden units as it learns. The learning stops when all output activations are within a score threshold of their targets for all training patterns. Due to the novel approach to construction and the error minimization techniques in cascade correlation, the algorithm can typically learn 10–50 times faster than back propagation [20]. Due to the biologically plausible hidden-unit recruitment, cascade correlation and its derivatives have found applications in learning everything from mathematical structures such as primes [6] and abstract groups [9, 17] to psychological phenomena like the balance-scale [18, 21, 5], false-belief [1], and transitivity tasks [19].
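The stopping rule and the unit-recruitment step described above can be sketched in a few lines of Python. This is a minimal illustration, not the LISP code used in the study. The function names are hypothetical, and the candidate score follows the covariance-based score of the standard formulation of cascade correlation, which this paper does not spell out.

```python
def all_within_threshold(outputs, targets, score_threshold):
    """Stopping rule: every output activation is within score_threshold of its target.

    outputs and targets hold one row per training pattern and one value per output unit.
    """
    return all(
        abs(o - t) <= score_threshold
        for out_row, tgt_row in zip(outputs, targets)
        for o, t in zip(out_row, tgt_row)
    )


def candidate_score(candidate_activations, residual_errors):
    """Summed magnitude of the covariance between a candidate hidden unit's
    activation and the residual error at each output unit.

    candidate_activations: one value per training pattern.
    residual_errors: one row per training pattern, one column per output unit.
    Assumption: the candidate with the highest score is the one recruited.
    """
    n_patterns = len(candidate_activations)
    mean_activation = sum(candidate_activations) / n_patterns
    n_outputs = len(residual_errors[0])
    score = 0.0
    for o in range(n_outputs):
        mean_error = sum(row[o] for row in residual_errors) / n_patterns
        covariance = sum(
            (v - mean_activation) * (row[o] - mean_error)
            for v, row in zip(candidate_activations, residual_errors)
        )
        score += abs(covariance)
    return score
```

A candidate whose activation does not vary with the residual error scores 0 and is never worth recruiting; a candidate that tracks the error closely scores high.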

The key enhancement over previous research [20, 22, 14] is the explicit incorporation of proper nouns into the parent–parent and parent–child interactions. A traditional approach is to represent the output of the neural net with a single output node that reads either +0.5 for me or −0.5 for you. Values near 0 are generally considered a state of ‘confusion’ for the network, in which it supplies neither answer [20]. In our model, this confusion is made concrete by designating the neighborhood of 0 as the noun region (see footnote 1). When the network is confused about which pronoun to use, it behaves like a child [13] and refers to the addressee by a noun such as mommy or daddy. Because of the third choice available in the output, the usual score threshold of 0.4 has to be reduced to 0.3 to lower the amount of overlap between the three areas and provide more reliable results. We interpret readings of the output node in the 0.2 to 0.5 range as me, −0.2 to −0.5 as you, and −0.15 to 0.15 as a noun.
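The interpretation of the single output node can be written as a small decoding function. The sketch below uses the three ranges given above; the function name and the "undecided" label for activations that fall in the gaps between ranges (for example 0.18) are assumptions made for illustration, since the text does not say how such readings are treated.

```python
def decode_output(activation):
    """Map the activation of the single output node to a phrase.

    Ranges follow the text: 0.2 to 0.5 is "me", -0.5 to -0.2 is "you",
    and -0.15 to 0.15 is a noun. Anything else is labeled "undecided",
    which is a hypothetical label not used in the paper.
    """
    if 0.2 <= activation <= 0.5:
        return "me"
    if -0.5 <= activation <= -0.2:
        return "you"
    if -0.15 <= activation <= 0.15:
        return "noun"
    return "undecided"


# Example: an early reading near zero decodes as a noun, a later one as "me".
readings = [0.02, 0.35]
decoded = [decode_output(a) for a in readings]
```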

The availability of an alternative to you and me as forms of addressing individuals provides new training patterns for the network. These learning patterns are presented in Table 1. The classic approach [20] to addressee and non-addressee conditions is augmented with two different severities of proper or common noun use. The standard extension (from now on, the standard learning patterns) comprises the first six patterns of the addressee and non-addressee conditions and involves referring to the third family member, in a discussion between the other two, by that person’s name or noun-title (such as mommy or daddy). An example might be the mother talking to her son, pointing to the father, and saying daddy. The extended learning patterns add the remaining four patterns of each condition, which use nouns in conversations involving only two people. A common example of such noun use is baby-talk; the mother might refer to her son as baby-boy or to herself as mommy. In the non-addressee extended case, parents referring to each other by name (instead of you and me) are a good example of noun use. The standard pronoun uses introduced by Shultz et al. [20] are maintained as part of the training patterns. In summary, the standard learning patterns are the first six patterns of each condition, and the extended learning patterns are all 10 patterns of each condition.
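Table 1

The learning patterns in phase 1

| Row | Condition | Pattern set | Speaker | Addressee | Referent | Phrase |
|---|---|---|---|---|---|---|
| 1 | Addressee | Standard | Father | Child | Father | Me |
| 2 | Addressee | Standard | Father | Child | Mother | Other |
| 3 | Addressee | Standard | Father | Child | Child | You |
| 4 | Addressee | Standard | Mother | Child | Mother | Me |
| 5 | Addressee | Standard | Mother | Child | Father | Other |
| 6 | Addressee | Standard | Mother | Child | Child | You |
| 7 | Addressee | Extended | Father | Child | Father | Other |
| 8 | Addressee | Extended | Father | Child | Child | Other |
| 9 | Addressee | Extended | Mother | Child | Mother | Other |
| 10 | Addressee | Extended | Mother | Child | Child | Other |
| 11 | Non-addressee | Standard | Father | Mother | Father | Me |
| 12 | Non-addressee | Standard | Father | Mother | Child | Other |
| 13 | Non-addressee | Standard | Father | Mother | Mother | You |
| 14 | Non-addressee | Standard | Mother | Father | Mother | Me |
| 15 | Non-addressee | Standard | Mother | Father | Child | Other |
| 16 | Non-addressee | Standard | Mother | Father | Father | You |
| 17 | Non-addressee | Extended | Father | Mother | Father | Other |
| 18 | Non-addressee | Extended | Father | Mother | Mother | Other |
| 19 | Non-addressee | Extended | Mother | Father | Mother | Other |
| 20 | Non-addressee | Extended | Mother | Father | Father | Other |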
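The learning patterns follow a simple rule: in the standard set the speaker is me, the addressee is you, and the third family member is other; in the extended set both the speaker and the addressee are also referred to by a noun (other). The sketch below generates the patterns of each condition from that rule. The names `Pattern`, `CONDITIONS`, and `learning_patterns` are hypothetical, and the symbolic tuples stand in for whatever input encoding the networks actually used, which is not specified here.

```python
from collections import namedtuple

Pattern = namedtuple("Pattern", "speaker addressee referent phrase")

FAMILY = ("Father", "Mother", "Child")

# (speaker, addressee) pairs for each condition.
CONDITIONS = {
    "addressee": [("Father", "Child"), ("Mother", "Child")],
    "non-addressee": [("Father", "Mother"), ("Mother", "Father")],
}


def standard_rows(speaker, addressee):
    """Speaker is 'me', the third family member is 'other', addressee is 'you'."""
    third = next(p for p in FAMILY if p not in (speaker, addressee))
    return [
        Pattern(speaker, addressee, speaker, "me"),
        Pattern(speaker, addressee, third, "other"),
        Pattern(speaker, addressee, addressee, "you"),
    ]


def extended_rows(speaker, addressee):
    """Noun use inside a two-person conversation: both referents become 'other'."""
    return [
        Pattern(speaker, addressee, speaker, "other"),
        Pattern(speaker, addressee, addressee, "other"),
    ]


def learning_patterns(condition, extended=False):
    """Six standard patterns per condition, or all ten when extended is True."""
    pairs = CONDITIONS[condition]
    rows = [row for pair in pairs for row in standard_rows(*pair)]
    if extended:
        rows += [row for pair in pairs for row in extended_rows(*pair)]
    return rows
```

Calling `learning_patterns("addressee")` yields the six standard addressee patterns, and passing `extended=True` appends the four noun-only patterns.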

The training procedure follows a two-phase method. The first phase is supervised learning; a combination of addressee and non-addressee patterns is used to initially train the network. This is analogous to what a child might learn while he listens to his parents speak to each other and to him. In the second phase, we combine reinforcement and supervised learning by adding the test patterns from Table 2 (see footnote 2). The test patterns represent cases in which the child speaks and the accuracy of his pronoun and noun use is tested. The procedure is repeated in separate trials for the standard and extended learning patterns and for two different addressee to non-addressee proportions that correspond to a usual first-born’s environment (9:1, mostly being addressed by the parents and sometimes overhearing parent-to-parent speech) and a second-born’s (5:5, being addressed as often as overhearing parent-to-parent or parent-to-sibling speech) [20]. The output activations are averaged over 100 networks to account for variation in network topology and initial weights.
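Table 2

The additional test patterns in phase 2

| Row | Speaker | Addressee | Referent | Correct phrase |
|---|---|---|---|---|
| 1 | Child | Father | Child | Me |
| 2 | Child | Mother | Child | Me |
| 3 | Child | Father | Mother | Other |
| 4 | Child | Mother | Father | Other |
| 5 | Child | Father | Father | You |
| 6 | Child | Mother | Mother | You |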
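The two-phase procedure can be sketched as the construction of two training sets. The sketch assumes that the 9:1 and 5:5 proportions are realized by repeating each addressee and non-addressee pattern the corresponding number of times; the text does not state how the proportions were implemented, so this is only one plausible reading. The function names are hypothetical.

```python
# The six child-speaker test patterns of phase 2:
# (speaker, addressee, referent, correct phrase).
TEST_PATTERNS = [
    ("Child", "Father", "Child", "me"),
    ("Child", "Mother", "Child", "me"),
    ("Child", "Father", "Mother", "other"),
    ("Child", "Mother", "Father", "other"),
    ("Child", "Father", "Father", "you"),
    ("Child", "Mother", "Mother", "you"),
]


def phase1_set(addressee_patterns, non_addressee_patterns, ratio):
    """Mix the two conditions in the given addressee : non-addressee ratio.

    Assumption: the ratio is realized by repeating each pattern, for example
    (9, 1) for a first-born and (5, 5) for a second-born.
    """
    n_addressee, n_non_addressee = ratio
    return (list(addressee_patterns) * n_addressee
            + list(non_addressee_patterns) * n_non_addressee)


def phase2_set(phase1_patterns, test_patterns=TEST_PATTERNS):
    """Phase 2 keeps the phase 1 patterns and adds the child-speaker test patterns."""
    return list(phase1_patterns) + list(test_patterns)


# Toy example with placeholder pattern labels.
first_born = phase1_set(["A1", "A2"], ["N1"], (9, 1))
second_born = phase1_set(["A1", "A2"], ["N1"], (5, 5))
```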

Results

Oshima-Takane’s [12] observation of faster learning in second-born (5:5) children is reflected by the networks in both the standard (see footnote 3) and extended cases. The mean results from 100 networks are presented (along with their standard errors) in Table 3 for the standard learning patterns and in Table 4 for the extended learning patterns. The faster learning is reflected by fewer epochs required to learn the training patterns. The difference in learning rate, especially in phase 2, is more evident in the extended training pattern, where the first child learns in 111 ± 3 epochs and the second in 70 ± 2.
Table 3

Results for the standard learning patterns

| | Required epochs, phase 1 | Required epochs, phase 2 | Recruited units, phase 1 | Recruited units, phase 2 |
|---|---|---|---|---|
| First child | 176 ± 3 | 198 ± 6 | 2.30 ± 0.05 | 4.40 ± 0.09 |
| Second child | 88 ± 2 | 202 ± 7 | 1.86 ± 0.03 | 4.0 ± 0.1 |

Table 4

Results for the extended learning patterns

| | Required epochs, phase 1 | Required epochs, phase 2 | Recruited units, phase 1 | Recruited units, phase 2 |
|---|---|---|---|---|
| First child | 105 ± 1 | 111 ± 3 | 5.0 ± 0.1 | 9.3 ± 0.2 |
| Second child | 88.9 ± 0.9 | 70 ± 2 | 2.93 ± 0.04 | 5.2 ± 0.1 |
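Each entry in Tables 3 and 4 is a mean over 100 networks together with its standard error. A minimal sketch of that summary is given below. It assumes the standard error is the sample standard deviation divided by the square root of the number of networks, which the text does not state explicitly; the function name and the example values are hypothetical.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for per-network measurements.

    Assumption: the standard error uses the sample standard deviation
    (n - 1 in the denominator) divided by sqrt(n).
    """
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error


# Hypothetical epoch counts from three networks, for illustration only.
mean_epochs, se_epochs = mean_and_standard_error([1, 2, 3])
```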

A surprising result is the improved performance on the extended pattern compared to the standard one. For the extended pattern, more hidden units are recruited but fewer epochs are needed to train the network. A potential explanation for the increased speed is the presence of 67% more patterns, effectively making each epoch 67% longer. However, cascade correlation uses batch learning, so the number of weight adjustments per epoch is the same regardless of the number of patterns in a batch. Further, even with the epoch time normalized by the number of training patterns, the extended pattern networks seem to learn phase 2 faster. The larger number of hidden units and the shorter learning time also indicate a much more aggressive recruitment of hidden units. This suggests that the extended training patterns allow the network to quickly realize the non-linearity of the problem and adjust accordingly.
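The normalization argument can be checked with a short calculation. The sketch below scales the phase 2 epoch counts from Tables 3 and 4 by the number of patterns per condition (10 extended versus 6 standard, which is the 67% figure). This is a rough check under a stated assumption: it ignores the six test patterns added in phase 2 and the 9:1 or 5:5 weighting, so the exact normalization behind the statement may differ.

```python
PATTERNS_PER_CONDITION = {"standard": 6, "extended": 10}

# Mean phase 2 epochs, taken from Tables 3 and 4.
PHASE2_EPOCHS = {
    ("standard", "first"): 198,
    ("standard", "second"): 202,
    ("extended", "first"): 111,
    ("extended", "second"): 70,
}


def standard_equivalent_epochs(pattern_set, child):
    """Epochs rescaled so that one unit equals one pass over six patterns."""
    epochs = PHASE2_EPOCHS[(pattern_set, child)]
    return (epochs * PATTERNS_PER_CONDITION[pattern_set]
            / PATTERNS_PER_CONDITION["standard"])


# Relative increase in pattern count: (10 - 6) / 6, about 67%.
extra_patterns = (10 - 6) / 6

first_extended = standard_equivalent_epochs("extended", "first")    # 185.0
second_extended = standard_equivalent_epochs("extended", "second")  # about 116.7
```

Under this scaling the extended networks still need less training in phase 2 than the standard ones (185 versus 198 for the first child, and about 117 versus 202 for the second).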

The most peculiar result is the phase 2 activation patterns of the neural networks. In Figs. 1 and 2, patterns where the network should use me (Table 2: rows 1 and 2) are presented in red, other (Table 2: rows 3 and 4) in black, and you (Table 2: rows 5 and 6) in blue; the line thickness corresponds to the standard error from averaging the networks. What the network actually outputs in a trial is read from the y-axis; the network says me in the 0.2 to 0.5 range, a noun in the −0.15 to 0.15 range and you in the −0.2 to −0.5 range. The yellow vertical line marks the average number of epochs required to be within the score threshold of the correct answers (as listed in Tables 3 and 4). With the standard patterns, an interesting type of reversal error can be seen. For the first 50 epochs in 9:1 and 25 epochs in 5:5, the networks tend to make the standard pronoun reversal mistake and refer to themselves as you. However, instead of the usual flip-side of referring to you as me, the networks instead refer to a person outside of the conversation (someone who should be referred to with a noun) as me until epoch 30 for the 9:1 patterns and epoch 175 for the 5:5 patterns. This is a very counterintuitive result.
https://static-content.springer.com/image/art%3A10.1007%2Fs12559-010-9050-7/MediaObjects/12559_2010_9050_Fig1_HTML.gif
Fig. 1

Output activations for standard pattern. Learning patterns are a combination of Table 1: rows 1–6, Table 1: rows 11–16, and Table 2. Test patterns where the network should use me (Table 2: rows 1 and 2) are presented in red, other (Table 2: rows 3 and 4) in black, and you (Table 2: rows 5 and 6) in blue—the line thickness corresponds to the standard error from averaging 100 networks. (Color figure online)

https://static-content.springer.com/image/art%3A10.1007%2Fs12559-010-9050-7/MediaObjects/12559_2010_9050_Fig2_HTML.gif
Fig. 2

Output activations for extended pattern. Learning patterns are a combination of all of Tables 1 and 2. Test patterns where the network should use me (Table 2: rows 1 and 2) are presented in red, other (Table 2: rows 3 and 4) in black, and you (Table 2: rows 5 and 6) in blue—the line thickness corresponds to the standard error from averaging 100 networks. (Color figure online)

The activation patterns for the extended patterns are much less surprising and match theory [10] well. In the initial part of phase 2, the networks tend to favor using nouns in order to avoid reversal errors (the outputs remain in the −0.15 to 0.15 range for all patterns). After around 70 epochs, both the 9:1 and 5:5 networks start using personal pronouns correctly, in agreement with naturalistic studies.

Discussion

Cascade-correlation learning of two different noun-and-pronoun addressee patterns was consistent with naturalistic studies of pronoun acquisition in children and illuminated a surprising factor in pronoun reversal. The neural nets were consistent with Oshima-Takane’s [12] observation that second-born children learn pronoun use faster than first-borns. In both noun-to-pronoun combinations, the networks modeling the second child (5:5) learnt proper pronoun use faster than those modeling the first child (9:1). The surprising result was that an increase in the amount and variety of noun use can lead to a decrease in reversal errors. With nouns used only to refer to a person outside of the conversation, me-reversal and noun-reversal were prevalent. When patterns that used nouns in conversations to refer to the speaker or addressee were added, the reversal was eliminated and the networks displayed a behavior consistent with real children [13]; the networks avoided pronoun use when they were unsure of which pronoun was the correct choice. Paradoxically, baby-talk might help children to learn proper pronoun use.

These results suggest future directions for both experiments and computational models. Experimental research could test if baby-talk contributes to a decrease in pronoun reversal errors. Further, it is important to discover which is the more salient factor in the decrease of reversal errors: (a) the ratio of noun to pronoun use or (b) the specific use of nouns to refer to the speaker or addressee. Our computational model predicts that baby-talk will decrease pronoun reversal errors and the specific use of nouns to refer to the speaker or addressee will be important to this effect. From a computational modeling perspective, the ambiguity of the other training patterns allows the incorporation of third-person pronouns or other parts of language development. The presented model only simulated pronoun acquisition, and modeled nouns as confusion. A more holistic approach and future direction is to consider a full model of both proper noun and pronoun acquisition.

This paper was a testament to an interesting correlation between seemingly unrelated parts of learning. By understanding how young children learn, we can start to better understand learning in general. With further research in the field of learning, both in humans and machines, we can one day hope to unravel the mystery of learning and maybe even thought itself.

Footnotes
1. A possible concern is that placing the noun in the unstable 0 region of the sigmoid makes nouns artificially difficult to learn, a contradiction to the intuition that nouns should be much easier to learn than personal pronouns. However, it should be noted that we are simulating pronoun acquisition and not noun acquisition. Thus, the primary goal of noun use is to model confusing inputs with respect to pronoun acquisition. Nouns do not tell the learner which pronoun to use in a given setting and can be viewed as confusion. Since a noun is no more similar to me than to you, it is neutral, hence the 0.

2. These patterns are a way to test the network’s learning thus far, and the feedback from the patterns provides more learning for the network, producing the reinforcement learning effect.

3. Technically, in phase 2 of standard learning, both the 9:1 and 5:5 networks learn at the same rate. However, due to the significant increase in learning of phase 1 by the 5:5 networks, second-borns learn faster overall.

Acknowledgments

We are grateful to Thomas R. Shultz for providing LISP code for the cascade-correlation algorithm, a thorough introduction to the subject, and comments on an earlier draft, and to Victoria Ly and Vincent G. Berthiaume for helpful comments.

Copyright information

© Springer Science+Business Media, LLC 2010