1 Introduction

I seize little Isabel and swing her around, thereby making her laugh and breaking a vase. Asked about this I might say, ‘The goal of my actions was not to break the vase but only to make her laugh, of course.’ In talking about goals in this way I am not talking about mental representations: I am talking about actual and possible outcomes of an action, things which did or might have happened.

In general, a goal is an outcome to which an action is directed. For a process to track a goal of an action is for how that process unfolds to nonaccidentally depend in some way on whether that outcome is indeed a goal of the action. (And for a subject to track a goal is for some goal-tracking process to occur in her.)

My focus in this paper is pure goal tracking, that is, goal tracking which does not depend on any information about mental states. Although widely ignored, pure goal tracking is a fundamental capacity. It plausibly anchors abilities to understand others’ minds and underpins capacities to perform joint actions.

My aim in this paper is to introduce a developmental puzzle about goal tracking. I will also tentatively introduce a conjecture which, if correct, might solve the puzzle. My priority in discussing the puzzle, though, is to highlight reasons for thinking we may need a new theoretical approach if we are to understand goal tracking in humans.

2 Pure goal tracking in the first months of life

When can infants first track goals to which actions are directed? Consider an elegant and much-replicated experiment by Woodward (1998 study 3). In Woodward’s paradigm, infants are presented with a scenario involving two objects, a ball and a teddy (say). In the habituation phase, a hand enters the scene and grasps one of the two objects, as depicted in Fig. 1. In the main phase, infants are then shown one of two new events. The idea is to pit sameness of movement trajectory against sameness of goal. In both new events, the locations of the ball and teddy are reversed. In one event, the hand follows the same trajectory as before and so grasps a new object. In the other event, the hand grasps the same object as before, which requires following a new trajectory. If infants ignore goals and track trajectories only, then they should find the event with the new trajectory more interesting. But if infants track the goals of actions, then they may well be more interested in changes in the goals of actions than in changes in movement trajectories. In that case they should find the event with the new goal more interesting and so dishabituate more strongly to it. And this is just what Woodward observed in both 9- and 6-month-olds. She concluded that ‘early in life, infants begin to set up a system of knowledge of human action that has features in common with more mature understandings’ (Woodward 1998, p. 31).

Fig. 1

Source: Woodward et al. (2001, fig. 1)

Habituation and test stimuli from an experiment by Woodward (1998) on 6-month-olds’ abilities to track the goals of actions.
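The logic of the paradigm, which pits sameness of trajectory against sameness of goal, can be made explicit in a few lines. The following Python sketch is illustrative only: the `Event` type, the ‘left’/‘right’ trajectory labels and the function `predicted_more_interesting` are hypothetical simplifications, not part of Woodward’s materials. Each hypothesis about what infants track picks out the attribute whose change should drive dishabituation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """A reaching event: which object is grasped, and along which path."""
    grasped: str     # "ball" or "teddy"
    trajectory: str  # hypothetical labels: "left" or "right"

# Habituation: the hand reaches left and grasps the ball.
habituation = Event(grasped="ball", trajectory="left")

# Test phase: the objects swap locations.
new_goal = Event(grasped="teddy", trajectory="left")        # old path, new object
new_trajectory = Event(grasped="ball", trajectory="right")  # new path, old object

def predicted_more_interesting(tracks: str) -> Event:
    """Return the test event a hypothesis predicts infants will find more
    novel: 'trajectory' attends to paths, 'goal' to the object grasped."""
    attribute = {"trajectory": "trajectory", "goal": "grasped"}[tracks]
    for event in (new_goal, new_trajectory):
        if getattr(event, attribute) != getattr(habituation, attribute):
            return event
    raise ValueError("no test event differs on the tracked attribute")

print(predicted_more_interesting("trajectory") is new_trajectory)  # True
print(predicted_more_interesting("goal") is new_goal)              # True
```

Note that the sketch’s ‘goal’ comparison looks only at which object is grasped, so on its own it cannot tell tracking goals apart from tracking targets.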

Combining this paradigm with a manipulation we’ll consider later (‘sticky mittens’), Sommerville et al. (2005) showed that even 3-month-olds can form expectations based on the goal of an action. Apparently, then, pure goal tracking is present even in the first months of life.

Or is it? We should distinguish goals from targets. The target or targets of an action (if any) are the things towards which it is directed. If the goal of an action is to kick a particular football, this football is the action’s target. To specify a target of an action is to partially specify one of its goals. But more is required to fully specify a goal, of course. A goal typically involves a type of action (kicking rather than smashing, say). It may also involve one or more manners of action (discretely, firmly, and precisely, for example) and perhaps more besides (see Fig. 2).

Fig. 2

Fully specifying a goal can involve giving a type of action, a target, some manners of action, and more

For a process to track a target of an action is for how that process unfolds to nonaccidentally depend in some way on whether that thing is indeed a target of the action. Given the liberal way I have defined tracking a goal, a process which tracks a target is thereby a process which tracks a goal. (The converse will often hold too, although there could be exceptions.) It is conceivable that some types of process merely track targets: that is, such processes only ever track goals in virtue of tracking targets.

The studies considered so far are all consistent with the possibility that infants merely track targets. To see this, suppose further research found that 6-month-olds could not distinguish between different goals with the same target. Suppose, for example, that they could not distinguish grasping a teddy from pushing it away. In that hypothetical case, the findings considered so far would indicate an ability to merely track the targets of actions.
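The contrast between tracking goals and merely tracking targets can be sketched by treating a goal specification as a record with the components shown in Fig. 2. In this Python sketch, `Goal`, `target_tracker` and `goal_tracker` are hypothetical names, and a ‘process’ is crudely modelled as a function from a goal to whatever that process is sensitive to. The target-only process still counts as tracking goals in the liberal sense defined above, yet it cannot distinguish grasping a teddy from pushing it away.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Goal:
    """A goal specification (hypothetical encoding of the Fig. 2 components)."""
    action_type: str                  # e.g. "grasp" rather than "push away"
    target: str                       # the thing the action is directed towards
    manners: frozenset = frozenset()  # e.g. frozenset({"firmly"})

def target_tracker(goal: Goal) -> str:
    """Merely tracks targets: its output depends only on the goal's target."""
    return goal.target

def goal_tracker(goal: Goal) -> tuple:
    """Also sensitive to action type and manner."""
    return (goal.action_type, goal.target, goal.manners)

grasp = Goal("grasp", "teddy")
push = Goal("push away", "teddy")

# Both outputs depend on which goal the action has, so both processes
# track goals in the liberal sense; only the second separates these two.
print(target_tracker(grasp) == target_tracker(push))  # True
print(goal_tracker(grasp) == goal_tracker(push))      # False
```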

So can infants at this age actually distinguish two actions directed to the same target which differ in type or manner, or in some other way? To answer this question, we would ideally have pairs of scenarios in which the target of an action is kept constant while the type of action varies. To the extent that subjects responded appropriately to the difference in type of action, we could be confident that they can distinguish actions not just by their targets but also by their types. Song et al. (2005) conducted an ingenious experiment along roughly these lines with 13.5-month-olds. They observed that, at this age, how long infants look at a reaching action depends on congruence between the type of an agent’s previous action and the target of the reaching action. Unfortunately, as far as I know, this kind of experiment has not yet been done with younger infants. Developmental researchers rarely explicitly distinguish aspects of goals, such as target, action type and manner.

Fortunately there are some studies which, although not intended to get at this exactly, do bear indirectly on whether younger infants can track goals other than merely in virtue of tracking targets of actions. Behne et al. (2005) created pairs of contrasting scenarios. In one of their contrasts, an experimenter holds a ball out for an infant to grasp and then either ‘accidentally’ drops it or teasingly pulls it back. So in each case there is a goal-directed action involving the ball, but in one case the goal of the action is to pass the ball to the infant whereas in the other case the goal is to tease the infant. Behne et al. (2005 Study 2) found that 9-month-olds (but not 6-month-olds) consistently and appropriately discriminated between these scenarios by, for example, banging more when the ball was ‘accidentally’ dropped than when it was teasingly retracted. This and other research (Ambrosini et al. 2013, discussed below) suggests that, at least from 9 months of age, infants can indeed distinguish both the type and target of a goal-directed action. It would be unsurprising if, in simple enough cases, infants also showed competence in goal tracking earlier.

3 How do infants track goals?

What follows depends on the premise that infants can track goals in the first 9 months of life, and that they do not always do so merely in virtue of tracking targets. How do they do this?

Answers to this question can be broken into two parts. First we need to specify a function or principle which characterises infants’ goal tracking; this is what Marr (1982) calls a computational description. Then, second, we need a hypothesis about how infants represent actions and goals and which processes enable them to compute the function or apply the principle.

Start with the computational description. Step back and think about planning an action. It is a familiar idea that we can characterise part of the process (ignoring some important details) with an inference:

  1. This outcome, \(G\), is the goal (specification).
  2. Means \(m\) is a best available way of bringing \(G\) about.
  3. Adopt \(m\)!

It is possible to think of pure goal tracking as essentially the same inference, just re-ordered:

  1. This means, \(m\), has been adopted (observation).
  2. \(G\) is an outcome such that: \(m\) is a best available way of bringing \(G\) about.
  3. \(G\) is a goal of the observed action.

Planning is the process of moving from goals to means, whereas tracking goes in the reverse direction, from means to goals. But these two have something in common: they exploit the same relation between means and goals. In both cases, planning and goal tracking, the means that are adopted should be a best available way of bringing the goal about.Footnote 1 This is a core insight of what Gergely and Csibra call the Teleological Stance.Footnote 2
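The claim that planning and goal tracking exploit one and the same relation can be illustrated with a toy model. This Python sketch assumes, purely for illustration, that ‘a best available way’ can be modelled as least cost in a hand-made table; the costs, the means and outcome labels, and the names `is_best_means`, `plan` and `track` are all hypothetical. Both directions of inference call a single shared predicate.

```python
# Hypothetical costs of bringing each outcome about by each means.
# A missing (means, outcome) pair means that means cannot achieve that outcome.
COST = {
    ("short reach", "grasp cup"): 1.0,
    ("long reach", "grasp cup"): 2.0,   # possible, but wasteful
    ("long reach", "grasp jug"): 2.0,
}

def options_for(outcome):
    """All means that can bring the outcome about, with their costs."""
    return {m: c for (m, g), c in COST.items() if g == outcome}

def is_best_means(means, outcome):
    """Shared relation: `means` is a best available way of bringing `outcome` about."""
    options = options_for(outcome)
    return means in options and options[means] == min(options.values())

def plan(goal):
    """Planning: from a goal G to the means that are best for G."""
    return [m for m in options_for(goal) if is_best_means(m, goal)]

def track(observed_means):
    """Goal tracking: from an observed means m to the outcomes G for which m is best."""
    outcomes = {g for (_, g) in COST}
    return sorted(g for g in outcomes if is_best_means(observed_means, g))

print(plan("grasp cup"))     # ['short reach']
print(track("long reach"))   # ['grasp jug']
print(track("short reach"))  # ['grasp cup']
```

Observing the long reach yields grasping the jug rather than the cup, even though the long reach could also bring about grasping the cup: it is not a best available way of doing so. As in the text, this specifies a function only; it says nothing about how infants compute it.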

Note that this second inference is not exactly a candidate answer to our question, How do infants identify goals from 9 months of age (or earlier)? Instead, it provides what Marr (1982) calls a computational description. It specifies a function. Given facts about events and states of affairs, this function yields one or more outcomes which are the goals of an action. Importantly, the facts given to the function could be known without knowing which goals any particular actions are directed to, nor anything about particular mental states.

The existence of this function is important because it shows that pure goal tracking is possible in principle. Should we assume, further, that it is part of a correct computational description of goal tracking in the first 9 months of life (and beyond)? This assumption is risky if you believe there is room for uncertainty about whether 9-month-olds are only ever merely tracking targets (see Sect. 2). After all, the above inference hinges on a relation between means and outcomes which could not be computed if only targets were represented. But we are working from the premise that infants in the first 9 months of life can track goals not merely in virtue of tracking targets. And as there is no known, published alternative to Gergely and Csibra’s idea about how pure goal tracking is possible in principle, we can only proceed by accepting it. This is why we should take as a working assumption that the above inference provides part of a computational description of infants’ goal tracking.

But what we want to know is much more than this, of course. How do infants (and adults, and members of other species) actually compute the function specified by this inference? In Marr’s terms, which representations and processes are involved in pure goal tracking?

There is no obvious reason to assume that there is just one answer to this question. In theory, the answers could be different for infants and adults, for example. Indeed, given the importance of goal tracking, there may be multiple kinds of goal tracking involving different representations and processes in the first months of life.

One possibility is that we could intentionally use the inference in response to being instructed to do so. Presented with the above inference, you and I might be tasked with using it in a particular situation. We could observe a means and write down a list of candidate outcomes. We might then discuss which, if any, of the outcomes the observed means would be a best available way of bringing about. And, using the inference, we could finally reach a conclusion about the goal (or goals) of the action.

Consider the view that infants (and adults) engaged in goal tracking reason about which outcome a means is a best available way of bringing about in fundamentally the same way that you and I would if tasked with working it out explicitly (but without writing anything down, of course). Let us call this the Simple View:

Infants’ (and adults’) goal tracking depends on beliefs or working assumptions concerning relations which hold quite generally between means and goals; and they identify particular goals by making inferences from these beliefs or assumptions plus their observations.

Gergely and Csibra might be interpreted as endorsing the Simple View. While there is no decisive textual support for this interpretation, they do stress continuities between goal tracking in infants and explicit reasoning in adults (Gergely and Csibra 2003; Csibra and Gergely 1998), and they describe applying the Teleological Stance as a matter of using knowledge in drawing inferences (Csibra and Gergely 2013; Pomiechowska and Csibra 2017). However, at another point they could be interpreted as stepping back from the Simple View (Csibra and Gergely 2007, pp. 72–74). There is also no clear reason for them to accept the Simple View: their overall theoretical position is consistent with it, but does not appear to require its truth.

Irrespective of who (if anyone) endorses it, the Simple View is a good starting point for at least three reasons. First, it involves postulating no novel psychological states, processes or systems. (It does not entail the existence of a goal-tracking module, for example.) Second, as just illustrated, it is a generalisation from cases in which its claim is known to apply. Third, there are no published, suitably detailed accounts of any alternative. So what, if anything, is wrong with the Simple View?

4 A limit on infants’ goal tracking?

One source of evidence relevant to evaluating the Simple View concerns a limit on infants’ goal-tracking abilities. As we shall see, there is a body of evidence for the hypothesis that, at least in the first 9 months of life, infants can only track goals of an action they can represent motorically at the time the action occurs.

How will this evidence bear on the Simple View? Any theory of infants’ goal tracking should explain not only how they succeed but also why their abilities are limited in various ways. Our confidence in the Simple View should therefore be modulated by how well it can explain the actual limits on infants’ performance.

What evidence suggests that infants can only track goals of actions they can represent motorically? To answer this question, it is helpful to step back and first consider something interesting about adults when they perform, and when they observe, actions. In performing actions—stacking blocks, say—you do not look at your hand but at the block it will pick up, or, when holding a block, at the location where it will place a block. In acting, our eyes move just ahead of the action. Flanagan and Johansson (2003) showed that the same pattern occurs when adults observe another agent acting. In observing an action, the eyes move just ahead of the action. Such proactive eye movements have been used to measure goal tracking in adults (e.g. Ambrosini et al. 2011).

So much for adults; what about infants? When observing a hand that is approaching some objects and about to grasp one of them, infants will, like adults, often look to the target of the action in advance of the hand arriving there (Falck-Ytter et al. 2006). As in adults, this proactive gaze indicates goal tracking. Critically, though, the occurrence of this proactive gaze in infants is related to their own abilities to represent the observed actions motorically. To a first approximation, we might say that for those infants who are not yet able to reach, their eyes do not arrive on an object to be grasped in advance of the hand grasping it (Kanakogi and Itakura 2011). Further, if we consider proactive gaze for different kinds of observed actions (such as various kinds of grasping actions or putting objects into containers), we find that infants’ gaze to the target of an action becomes more proactive as they become able to perform the particular kind of action observed.Footnote 3

Further evidence for the limit that infants can only track goals of actions they can represent motorically comes from studies which compare reaching bodies with nonbodily events. For example, when Kanakogi and Itakura (2011) substituted a mechanical claw for the grasping hand in their videos, they found that infants did not appear to gaze proactively to the target of the action (see also Cannon and Woodward 2012; Adam et al. 2016). As in other cases, infants’ goal tracking appears limited to cases in which the actions they observed are actions they can represent motorically.Footnote 4

This rough statement of the limit needs refining. The actions observed by infants in the studies mentioned so far are probably faster and more fluid than any the infants themselves could produce. We will also see evidence that 3-month-olds can track the goals of reaching actions although they are not capable of reaching (but only of pre-reaching), and that they can do so even for actions like reaching over a high barrier which they could not do at all (Skerry et al. 2013). Indeed, we will even see evidence that 6-month-olds can track the goals of actions (specifically, phonetic gestures) they are wholly unable to perform (Bruderer et al. 2015). How, then, could infants’ goal tracking be limited to actions they can represent motorically?

The answer has two parts. First, the limit we are considering is not about capacities to perform actions but capacities to represent them motorically. These are connected, of course: to some extent, limits on abilities to perform actions are useful proxies for limits on abilities to represent actions motorically. But that is a background methodological issue. Second, representing an action motorically need not involve representing every aspect of the action accurately. Aplasic individuals may represent observed manual grasping actions motorically as if they were grasping actions performed with the foot or mouth (Gazzola et al. 2007b). Likewise, ordinary individuals who have experience of pressing a pedal with their feet to grasp an object may represent observed manual grasping actions as if they involved the feet (Triberti et al. 2016). Representing an action motorically may therefore involve distorting the means while capturing the goal with a higher degree of accuracy.

To be more precise about the limit under consideration, let us say therefore that two actions are similar enough in a context if they are either both of the same kind (for example, both reaching actions) or else similar enough that the differences make no difference for the purposes of goal tracking in that context. For example, reaching and pre-reaching are similar enough in many contexts. The limit under consideration is then this: infants in their first 9 months of life can only track the goals of an action if they can represent a similar enough action motorically at the time the action occurs.

The evidence considered so far takes the form of correlations between infants’ abilities to perform actions and to track the goals of actions. What happens if you intervene on infants’ abilities to act?

An ingenious way to enhance infants’ abilities to act was invented by Needham et al. (2002). They put ‘sticky mittens’ on 3-month-old infants and allowed them to play with objects. These infants spent more time visually and manually exploring novel objects than others without such mittens (see further Needham et al. 2017). Having established the mittens’ efficacy, a group of researchers went on to examine whether 3-month-olds who had worn the mittens show enhanced abilities to track the goals of others’ actions (Woodward 2009). To this end, they used Woodward’s paradigm mentioned above to test goal tracking in a group of 3-month-olds who had played wearing the mittens, and in a further group who had not done so. Only infants who had played wearing the mittens showed evidence of goal tracking. While this result can of course be interpreted in many ways, it is another indication that infants can only track the goals of an action if they can represent a similar enough action motorically.Footnote 5

One potential objection to this study concerns the fact that playing with the mittens not only enhanced the infants’ abilities to act but also gave them more time observing actions. Could it be observation of action (including one’s own) rather than performance that matters? To address this issue, subsequent studies have compared what happens when infants are enabled to perform a new action with what happens when they merely observe the new action being performed (for example, Sommerville et al. 2008; Gerson and Woodward 2014; Bakker et al. 2015). Taken together, the results of such studies indicate that it really is enhancing infants’ abilities to act (and thereby their abilities to represent actions motorically), rather than merely allowing them to observe, which influences their goal-tracking abilities.

But what exactly can we conclude about limits on infants’ performance? The limit under consideration is about infants only tracking the goals of actions they can represent motorically at the time the action occurs. None of the evidence considered so far bears on when infants must represent actions motorically. We need studies in which infants’ abilities to act are impaired.

Bruderer et al. (2015) temporarily impaired 6-month-olds’ abilities to act in one of two ways by getting them to suck either a tongue-constraining dummy or a lip-constraining dummy. How would these two kinds of constraint affect infants’ abilities to track the goals of actions for which the tongue was critical? The researchers found that the infants’ goal tracking was impaired by the tongue-constraining dummy only. Given the assumption (on which we have been relying throughout) that intervening on infants’ action abilities has an effect on their abilities to represent actions motorically, their observations suggest that intervening on infants’ abilities to represent particular goals motorically may have an immediate effect specifically on their abilities to track actions directed to similar goals.

Or does it? Like many groundbreaking studies, Bruderer et al.’s (2015) study leaves alternative possibilities open. One particularly pressing issue is whether temporarily impairing infants’ abilities to perform tongue actions really altered their ability to represent those actions motorically. In adults, tying the hands appears to have effects related to effects of direct intervention on the motor cortex using TMS (see Costantini et al. 2014 and Elsner et al. 2013 on Ambrosini et al. 2012). This suggests that tying the hands does indeed impair abilities to represent manual actions motorically. There is no comparably direct evidence on whether constraining infants’ tongues similarly impairs their ability to represent tongue actions motorically.

In this section I have reviewed evidence concerning a limit on infants’ goal tracking. As they acquire or lose new abilities to act, whether through ordinary development or experimental intervention, so their goal-tracking abilities are correspondingly enhanced or impaired. This suggests that, at least in the first 9 months of life, infants can only track goals of actions when they can represent a similar enough action motorically at the time the action occurs.

Of course we are not forced by the available evidence to accept that infants’ goal tracking is so limited. One response would be to withhold judgement given that the evidence on what happens when infants’ abilities to represent actions motorically are impaired is extremely sparse by comparison with the richer body of evidence concerning adults (Costantini et al. 2014; Pazzaglia et al. 2008). Another response would be to accept the evidence but interpret it differently. Rather than a limit on goal tracking, the evidence could be interpreted as suggesting that 9-month-olds simply find goal tracking slightly harder when they cannot represent observed actions motorically. As things stand, either of these responses is probably reasonable. Even so, the available evidence is at least enough to justify asking what would follow if infants can only track goals of actions they can represent motorically at the time the action occurs.

Why might goal tracking in infants be limited in this way? One possibility is that the Simple View is wrong: infants’ goal tracking is not a consequence of making inferences from general beliefs and observations but instead involves motor representations and processes only (see Sect. 5 below). But could the limit be explained without rejecting the Simple View? On the Simple View, goal tracking is a matter of thinking and reasoning about the best means to perform an action (see Sect. 3). A proponent of the Simple View might allow that acquiring abilities to act, or to represent actions motorically, provides new knowledge of means-ends relations, which in turn enhances goal-tracking abilities (compare Skerry et al. 2013, p. 18732). But this is not sufficient to explain why infants can only track goals of actions they can represent motorically at the time the action occurs. A proponent of the Simple View would have to suppose, further, that access to this new knowledge is impaired by momentary inability to represent actions motorically. While not impossible, this would be a bold conjecture for which there is currently little direct evidence. So if infants really can only track goals of actions they can represent motorically at the time the action occurs, we will probably need an alternative to the Simple View.

5 The motor theory of goal tracking

Consider a very small-scale action such as dipping a brush into a can of paint, placing a book on a shelf or cracking an egg. Attention to the ways such actions unfold reveals that, often enough, early parts of the action anticipate future parts in ways that cannot be determined from environmental constraints alone. For instance, how you grip a book or an egg may depend on what you are going to do with it (see, for example, Kawato 1999; Cohen and Rosenbaum 2004). This anticipatory control of grasp, like several other features of action performance (see Rosenbaum 2010, chapter 1 for more examples), is not plausibly a consequence of mindless physiology. It likely involves representations concerning how actions will unfold in the future.

Such representations are thought to feature in processes which are planning-like in that they involve computation of means-ends relations (Grafton and Hamilton 2007) and in that they enable satisfying relational constraints on the selection of means (Rosenbaum et al. 2012). Representations of actions which feature in planning-like processes and thereby characteristically play a role in coordinating very small-scale actions are what I refer to as motor representations.Footnote 6

This way of characterising motor representations is deliberately nonspecific. Motor representations are to action what visual representations are to vision. In both cases, the motor and the visual, the representations are theoretical posits. We can be confident that they exist to the extent that we are confident that a theory positing them is broadly correct. But for many purposes we need not commit to the details of any particular theory.

How are motor representations relevant to understanding goal tracking in infants? According to a review by Sinigaglia and Butterfill (2016), a body of evidence supports the hypothesis that tracking the goals of others’ actions can be achieved motorically.Footnote 7 That is, there are cases of goal tracking in which the only representations involved are motor representations. (They use the term ‘functional goal ascription’ for tracking goals.) Whereas those authors focus on adults, their hypothesis may also be relevant to understanding goal tracking in infants.

This hypothesis rests on some background assumptions about the control of action. First, I follow Jeannerod (2006) and others in rejecting the view that all motor representations specify only bodily configurations, joint displacements and end states. Instead some motor representations specify outcomes to which actions are directed, such as the grasping of a particular handle or the transporting of a given object. Second, some motor processes involve computing means from ends and generating sensory expectations concerning the effects of actions (e.g. Wolpert et al. 2003). Third, multiple means–ends computations can occur simultaneously, or at least rapidly enough for action preparation to involve selection on the basis of multiple means-ends computations (e.g. Wolpert et al. 1998).

The Motor Theory of Goal Tracking (as I will call it) is the upshot of combining these background assumptions with a further discovery and a principle. The discovery concerns action observation. When observing an action, motor representations and processes can occur in the observer which are, or closely resemble, those which would occur in her if it were her rather than the actual agent who was acting (Rizzolatti and Sinigaglia 2010, 2016). This discovery forces us to answer a question. What are those motor representations and processes doing there? Since the observer is not acting, they might appear redundant; but since they are costly, they cannot actually be redundant. An answer to this question is suggested by a principle which we have already encountered (in Sect. 3). It is the principle that goal tracking is planning in reverse. Perhaps motor processes occur in action observation partly because the means-ends computations they enable are the core part of a goal-tracking process.

But how could goal tracking work according to the Motor Theory? In action observation, possible outcomes of observed actions are represented. There may be few or no constraints on which outcomes are initially represented motorically given that multiple means–ends computations can occur simultaneously (or almost simultaneously). Each represented outcome triggers a planning-like process like that which would occur in preparing to perform an action. As in action performance, this process generates predictions concerning bodily configurations, joint displacements and sensory effects associated with actions. These predictions can be compared with the observed action. The representation of the outcome is weakened to the extent that these predictions are inexplicably unmet. The result is that only outcomes to which the observed action is a means are represented strongly (see Fig. 3).

Fig. 3

Source: Sinigaglia and Butterfill (2016, fig. 1)

How motor processes might enable cases of goal tracking in which the only representations involved are motor representations
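The process described above can be sketched as a toy simulation. Everything here is a labelled assumption: the one-dimensional ‘hand position’ predictions stand in for whatever a planning-like motor process would generate, the multiplicative weakening rule and the `decay` and `threshold` parameters are arbitrary, and the sketch ignores the qualification that only inexplicably unmet predictions should weaken a representation. All candidate outcomes are evaluated in parallel against the observed movement.

```python
# Hypothetical 1-D hand positions that a planning-like process would predict
# at successive moments if each candidate outcome were the goal.
PREDICTIONS = {
    "grasp cup": [0.0, 0.4, 0.8, 1.0],
    "grasp jug": [0.0, 0.5, 1.2, 1.8],
    "wave":      [0.0, 0.1, 0.0, 0.1],
}

def track_goals(observed, predictions, decay=0.5, threshold=0.5):
    """Evaluate every candidate outcome in parallel against the observed movement.

    Each candidate starts fully active and is weakened, step by step, in
    proportion to how far its prediction misses the observed position.
    `observed` may be a prefix of the predicted trajectories.
    Returns the candidates still at or above `threshold`.
    """
    strength = {outcome: 1.0 for outcome in predictions}
    for t, position in enumerate(observed):
        for outcome, predicted in predictions.items():
            error = abs(predicted[t] - position)
            strength[outcome] *= max(0.0, 1.0 - decay * error)
    return {o: s for o, s in strength.items() if s >= threshold}

OBSERVED = [0.0, 0.4, 0.8, 1.0]
print(track_goals(OBSERVED[:2], PREDICTIONS))  # early on, all three candidates survive
print(track_goals(OBSERVED, PREDICTIONS))      # {'grasp cup': 1.0}
```

Because `track_goals` accepts any prefix of the observed movement, it can be applied while an action is still unfolding, which is the feature that would let motor goal tracking drive proactive gaze.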

Note that the Motor Theory of Goal Tracking is not an alternative to the idea that goal tracking depends on computing which outcomes the observed means are best ways of bringing about (see Sect. 3). The Simple View and the Motor Theory do not differ at all concerning which relation between means and goals is to be computed in pure goal tracking. The two differ only on which processes are responsible for identifying which outcome or outcomes the observed means is a best available way of achieving.Footnote 8

It is helpful to distinguish three claims which could be associated with the Motor Theory of Goal Tracking:

  1. Goal tracking could, in principle, be implemented motorically.
  2. In humans, some goal-tracking processes involve only motor processes and representations.
  3. *Goal tracking is impossible without motor processes and representations.

The third claim is not part of the Motor Theory. Nor is it supported by any of the evidence considered here (compare Rizzolatti and Sinigaglia 2010, p. 271; and Gallese and Sinigaglia 2011). Indeed, given that human adults could explicitly step through the inference specified in Sect. 3, it seems clear they can track goals independently of being able to represent them motorically.

Any proponent of the Motor Theory of Goal Tracking should allow that there are at least two kinds of goal-tracking process, one motoric and one which involves theoretical deliberation. This duality could be advantageous. Motor processes operate at the speed of action and can exploit kinematic cues whose significance observers may be otherwise unaware of; they are ideal for tracking goals as an action unfolds, enabling proactive gaze. By contrast, theoretical deliberation enables greater flexibility in goal tracking as it is not limited to cases where the observed actions can be represented motorically.

Earlier (in Sect. 4), we were confronted with the question, Why might goal tracking in infants in the first 9 months of life be limited in that they can only track goals which they can represent motorically at the time of observing an action? This is the question which led us to consider the Motor Theory of Goal Tracking. But how does it help? Consider a conjecture about development:

The Developmental Motor Conjecture: In the first 9 months of life, all pure goal tracking is explained by the Motor Theory. Other goal-tracking processes emerge later in development.Footnote 9

This conjecture, if true, would neatly explain why goal tracking in the first months of life is limited. There’s just one small problem. It does not seem quite true to say that infants’ goal tracking is limited by their abilities to represent actions motorically.

6 Twin developmental puzzles

The Developmental Motor Conjecture (from Sect. 5) was introduced in connection with a candidate limit on infants’ goal tracking in the first 9 or so months. As we saw, there is inconclusive but significant evidence for the hypothesis that infants in their first 9 months of life can only track the goals of an action if they can represent a similar enough action motorically at the time the action occurs. However, a breakthrough discovery by Gergely et al. (1995), made much earlier than the research reviewed so far, appears to conflict with any such hypothesis about limits.Footnote 10

Their experiment involves transformations of two-dimensional spheres only. The 12-month-old infants in the test group of this study were habituated to a sequence of events which adults would likely spontaneously interpret as involving goal-directed action (Heider and Simmel 1944; see Fig. 4). Specifically, the small ball’s movements are directed to reaching the larger ball, which involves leaping over a barrier. Following habituation, infants saw a new film with no barrier to leap over. One group of infants saw a ‘new action’: that is, a different movement trajectory but one which was plausibly directed to the same goal. Another group of infants saw the ‘old action’: that is, the same movement trajectory but one which, in the absence of the barrier, was not so plausibly directed to the same goal (see Fig. 5). Now if infants were considering the movements only and ignoring information about the goal, the ‘new action’ (movement in a straight line) should be more interesting because it differs more from the habituation events. But if infants are taking goal-related information into account, the ‘old action’ might be unexpected and so might generate greater dishabituation. And this latter possibility is exactly what Gergely et al. found.

Fig. 4

Source: Gergely et al. (1995, fig. 1b)

Illustration of a movie used in habituation for the test group: the small ball moves over the barrier and stops by the larger ball.

Fig. 5

Source: Gergely et al. (1995, fig. 3)

Following habituation, infants were shown one of two movies. In a, the small ball moves directly to the large ball; in b, the small ball takes the same trajectory it took when there was a barrier between it and the larger ball.

These findings have been extensively replicated and extended (see Csibra 2003; Gergely and Csibra 2003 for reviews). Importantly for our purposes, much the same findings can be observed with younger, 9-month-old infants (Hernik and Southgate 2012) and even 6.5-month-old infants (Csibra 2008). Related observations indicate that even 3-month-olds may be capable of extracting goal-related information from displays involving simple geometric shapes (Luo 2011).

Combined with the evidence about limits mentioned earlier (see Sect. 4), these findings give rise to a puzzle about development. If we take all the available evidence at face value, we arrive at this view. For infants in the first 9 months of life, some, but not all, of their goal tracking is limited by their abilities to represent actions motorically in this way: they can only track the goals of an action if they can represent a similar enough action motorically at the time the action occurs. The puzzle is to understand why this might be. We cannot explain it by appeal to the Simple View (from Sect. 3): that predicts no such limits. And we cannot explain it by appeal to the Developmental Motor Conjecture (from Sect. 5), which predicts inescapable limits.

In fact, the evidence on goal tracking is even more puzzling than this. Daum et al. (2012) created a modified version of Woodward’s paradigm which allowed them to measure two different responses to a single scenario, anticipatory looking and dishabituation. Their modified paradigm involved cartoon fish moving in ways which infants (and probably adults too) are unlikely to represent motorically. They found evidence for goal tracking by 9-month-olds in their dishabituation responses but not in their anticipatory looking. In fact, the 9-month-olds’ anticipatory looking indicated that they expected the fish to move along the same path irrespective of any more distal goal it might have; and it was only the 3-year-olds (not the 1- or 2-year-olds) whose anticipatory looking indicated goal tracking.Footnote 11

An initial response to these discrepancies may be to think that anticipatory looking is especially hard because it requires such rapid identification of a likely goal. However, as we have seen (in Sect. 4), there are other cases in which infants do indicate goal tracking in their anticipatory looking (e.g. Kanakogi and Itakura 2011; Ambrosini et al. 2013). Note also that invoking the Motor Theory cannot explain the discrepancies. After all, a key virtue of the Motor Theory is that it makes predictions about the timing of goal tracking: specifically, goal tracking should be as far ahead in time of an observed action as motor preparation would be ahead in time if the goal-tracker were not observing but performing the action. So where the Motor Theory explains performance, we would ordinarily expect goal tracking to be detectable in anticipatory looking.

Any theory of pure goal tracking in infants must therefore solve twin puzzles. Why is some, but not all, of 9-month-olds’ goal tracking limited by their abilities to represent actions motorically at the time of observing an action? And why does their goal tracking sometimes manifest itself in dishabituation (or pupil dilation) but not in anticipatory looking?

7 Two responses to the puzzles

My primary aim is to draw attention to the twin developmental puzzles about goal tracking. However, it is perhaps worth considering steps towards a possible solution, even if only to illustrate how hard solving the puzzles would be. Let me outline two possible lines of response, one based on prior art and the other novel.

Infants who can track the goals of a contracting and expanding, self-propelled ball are not representing the ball’s actions motorically. The Developmental Motor Conjecture must therefore be false. Infants’ goal tracking in the first 9 months of life is too flexible to be explained by the Motor Theory. Instead we must allow that these infants, like adults, can arrive at conclusions about the goals to which an action is directed via theoretical deliberation.Footnote 12 Consequently we must reject the hypothesis about limits. Since these infants can deliberate theoretically about goals, it must be false that they can only track the goals of an action if they can represent a similar enough action motorically at the time the action occurs. And, as we saw (in Sect. 2), the evidence does not force us to accept the hypothesis about limits because it is broadly compatible with the weaker hypothesis that goal-tracking is merely facilitated by motor processes and representations. Accepting only this weaker hypothesis would allow us to hold on to the Simple View and dissolve the first puzzle. And this response could be further developed to answer the second puzzle by invoking a conjecture about how theoretical deliberation concerning goals typically speeds up as children age, only gradually becoming fast enough to support anticipatory looking. So runs the first line of response.

This first line of response has the virtue of taking many studies of proper goal tracking in infants at face value. But what if it turns out that the hypothesis about limits is actually correct? Is there an alternative response, one consistent with the possibility that infants under 10 months of age can only track the goals of an action if they can represent a similar enough action motorically at the time the action occurs?

Targets are formally distinct from goals (see Sect. 2). Further, there could be a kind of process which merely tracks targets. That is, it would count as a goal-tracking process only in virtue of tracking the targets of actions. The computational description of such a process must differ from the computational description we considered for goal-tracking processes (see Sect. 3). After all, that computational description hinges on a relation between means and outcomes where the means is a best available way of bringing the outcome about; no such relation holds between animate things and their targets. A process which tracked targets only would therefore differ substantially from a proper goal-tracking process.
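The formal contrast between the two computational descriptions can be illustrated with a minimal sketch. It is not drawn from any of the studies discussed and rests on simplifying assumptions: each available means is represented only by a path with a length (a hypothetical stand-in for its cost), and an outcome is identified with reaching an object. The names `Path`, `is_best_available_means` and `track_target`, and the path lengths, are hypothetical; the two scenes only loosely follow the displays of Gergely et al. (1995).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    length: float  # hypothetical stand-in for the cost of this means
    ends_at: str   # the object the movement terminates at


def is_best_available_means(observed: Path, available: list, outcome: str) -> bool:
    """Simple-View-style check: does the observed path bring about the outcome,
    with no available path to that outcome being shorter?"""
    if observed.ends_at != outcome:
        return False
    rival_lengths = [p.length for p in available if p.ends_at == outcome]
    return observed.length <= min([observed.length, *rival_lengths])


def track_target(observed: Path) -> str:
    """Mere target tracking: report what the movement is directed at,
    without comparing it to any alternative means."""
    return observed.ends_at


# Hypothetical lengths, loosely modelled on the Gergely et al. (1995) displays.
jump = Path("jump", 14.0, "large ball")
straight = Path("straight", 10.0, "large ball")
with_barrier = [jump]               # the barrier rules out the straight path
without_barrier = [jump, straight]

for label, scene in [("barrier", with_barrier), ("no barrier", without_barrier)]:
    for observed in scene:
        print(label, observed.name,
              is_best_available_means(observed, scene, "large ball"),
              track_target(observed))
```

Run as written, this prints `barrier jump True large ball`, then `no barrier jump False large ball`, then `no barrier straight True large ball`. The best-available-means check changes its verdict on the jump once the barrier is removed, whereas the target tracker returns the large ball in every case. The sketch is meant only to display the difference between the two relations, not to model how infants or adults implement either process.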

A process which tracks targets only could be useful. Imagine becoming aware of a hostile presence. In the very first moments, you may already have taken a view about who the target of her actions is on the basis of her movements and which way she is facing. Perhaps it will take longer to identify the goals of her actions more fully (whether, for example, her knife is for attacking or coercing); perhaps you will never discover what her goals were. But information about the target alone could be enough for you to spring into action, heroically placing yourself between the hostile figure and her target.

Consider a guess about infants:

Target Tracking Guess: When infants in the first 9 months of life might appear to be tracking goals which they are unable to represent motorically, they are merely tracking targets.

If this guess were correct, there would be no conflict with evidence for the hypothesis about limits. This would make the first puzzle merely apparent. It would be an artifact of failing to distinguish merely tracking the target of an action from proper goal tracking. Suppose, further, that processes by which infants in the first 9 months of life track the targets of objects do not enable anticipatory looking. Together with the guess, this further assumption suggests a solution to the second puzzle. The appearance of dissociations between dishabituation (or pupil dilation) and anticipatory looking may be due to the fact that, in infants (at least), target-tracking processes, unlike goal-tracking processes, do not enable proactive gaze. This, then, is a second response to the twin developmental puzzles.

As far as I know, we cannot yet tell whether the Target Tracking Guess is correct or incorrect. We cannot yet tell that it is incorrect because few experiments are designed to distinguish proper goal tracking from mere target tracking; and those which do draw this distinction also measure abilities to represent actions motorically (as we saw in Sect. 2). But the Guess is not merely wild speculation. For there is evidence of processes in adults which merely track targets. This is provided by research on perceptual animacy, the detection by broadly perceptual processes of animate objects and their targets.

To illustrate, consider a groundbreaking experiment by Gao et al. (2009, experiment 1). Adults were shown a display which contained some moving circles. In some cases the circles moved independently of each other, but in other cases there was a ‘wolf’ circle which chased a ‘sheep’ circle with varying degrees of subtlety. The adults’ task was simply to detect the presence of a wolf. Gao et al. (2009) established that adults can do this provided that the chasing is not too subtle. In further experiments, they also showed that adults’ abilities to perceptually detect chasing depend on several cues including whether the chaser ‘faces’ its target (‘directionality’) and how directly the chaser approaches its target (‘subtlety’).

Some discussions of perceptual animacy confound it with goal tracking proper.Footnote 13 If perceptual animacy were goal tracking proper, its effects would depend on the identification of a best-available-means relation (see Sect. 3). While not impossible in principle, this seems unlikely given three considerations. First, perceptual animacy effects can be elicited when almost no information about goals other than targets is provided. Second, the perceptual detection of animacy appears to depend on cues and heuristics which would be detrimental if deployed in proper goal tracking. And, third, detection of animacy appears to be a broadly perceptual phenomenon since it depends on areas of the brain associated with vision and influences how perceptual attention is allocated (Scholl and Gao 2013) irrespective of your beliefs and intentions (van Buren et al. 2016). By contrast, there is no reason to suppose that any kind of proper goal tracking is a broadly perceptual phenomenon. So the triggers, computational description and cognitive architecture of perceptual animacy all nondemonstratively suggest that it is not goal tracking proper. I shall therefore assume instead that perceptual animacy is a broadly perceptual process which merely tracks targets and is therefore distinct from the motor or theoretical processes of proper goal tracking characterised by the Motor Theory and the Simple View.

Perceptual animacy and motor goal-tracking processes can be distinguished using the method of signature limits. Where a response is due to perceptual animacy, it should be subject to signature limits concerning trajectories and directionality. For example, where a chaser could take, but does not take, a reasonably direct approach to her target, perceptual animacy does not enable target detection; and likewise if a chaser inexplicably faces too far away from her target (Gao et al. 2009). By contrast, where a response is due to motor goal-tracking processes, it should be limited by the observer’s abilities to represent actions motorically (see Sect. 4). In short, different signature limits ensure that conjectures about perceptual animacy and motor goal-tracking processes generate distinct testable predictions.
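The signature limits of perceptual animacy can be made concrete with a similarly hedged sketch. It treats ‘subtlety’ as the angle between the chaser’s direction of motion and the direct line to its target, and ‘directionality’ as the angle between the way the chaser faces and that line. The function names and both thresholds are hypothetical illustrations, not values estimated by Gao et al. (2009), and the check applies to a single snapshot rather than to motion integrated over time.

```python
import math

# Hypothetical thresholds, chosen only for illustration.
MAX_SUBTLETY_DEG = 60.0  # largest tolerated deviation of heading from the direct line
MAX_FACING_DEG = 45.0    # largest tolerated deviation of facing from the direct line


def angle_between(u, v):
    """Unsigned angle, in degrees, between two 2-D vectors."""
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        raise ValueError("angle undefined for a zero-length vector")
    cos_theta = (u[0] * v[0] + u[1] * v[1]) / norm
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def looks_like_chasing(chaser_pos, heading, facing, target_pos):
    """Heuristic target detection from a single snapshot of a display."""
    to_target = (target_pos[0] - chaser_pos[0], target_pos[1] - chaser_pos[1])
    subtlety = angle_between(heading, to_target)
    facing_error = angle_between(facing, to_target)
    return subtlety <= MAX_SUBTLETY_DEG and facing_error <= MAX_FACING_DEG


target = (10.0, 0.0)
print(looks_like_chasing((0.0, 0.0), (1.0, 0.0), (1.0, 0.0), target))   # True: direct approach, facing target
print(looks_like_chasing((0.0, 0.0), (0.0, 1.0), (1.0, 0.0), target))   # False: approach too indirect
print(looks_like_chasing((0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), target))  # False: facing away
```

In all three cases the target is the same object, yet the heuristic reports chasing only in the first. Its verdict depends on nothing beyond heading and facing, which is one way in which such a heuristic differs from a best-available-means computation and why it yields its own signature limits.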

Could it be that behaviours in 9-month-olds which appear to manifest proper goal tracking but do not involve motor processes are all a consequence of perceptual animacy? Such a conjecture may appear tempting for several reasons. Infants in the first year of life can perceptually detect animacy.Footnote 14 And because Gergely et al. (1995)’s effect appears to depend on cues to animacy (Schlottmann and Ray 2010; see further Hernik et al. 2014), an interpretation of this particular scenario in terms of perceptual animacy is currently plausible.

This conjecture also generates readily refutable predictions. Whatever ability underpins Gergely et al. (1995)’s effect also appears to enable infants to track unseen constraints on movement (Csibra et al. 2003; Csibra 2003). The conjecture therefore implies that tracking unseen constraints could be a consequence of perceptual animacy, although no research to date suggests this. Further, Gergely et al. (1995)’s effect involves violations of the proposed limit on subtlety where barriers are present. The conjecture therefore implies that the subtlety limit is more complex than existing research indicates, so that including barriers in stimuli would allow perceptual animacy effects even where the angle of approach to a target is large.

8 Conclusion

There is a developmental puzzle about goal tracking in the first 9 months of life. On the one hand, an impressive body of evidence from various researchers using a range of manipulations and measurements points to the conclusion that infants’ goal tracking is limited: these infants can only track the goals of an action if they can represent a similar enough action motorically at the time the action occurs (see Sect. 4). On the other hand, an extensive body of evidence, which is standardly taken to indicate proper goal tracking in infancy, points to the conclusion that no such limit exists (see Sect. 6). We must reject one of these conclusions.

A further, related developmental puzzle concerns dissociations in performance as measured by different response types (see Sect. 6). It is not just that infants manifest goal tracking in one type of response (e.g. pupil dilation) but not another (e.g. anticipatory looking; Gredebäck and Melinder 2010). There is at least one case in which dishabituation appears to manifest goal tracking whereas anticipatory looking indicates tracking statistical regularities (Daum et al. 2012).

I propose three steps towards solving these twin puzzles. While at least two incompatible responses to the puzzles are consistent with the available evidence (see Sect. 7), there should be broad agreement on these steps.

The first step is the least controversial and theoretically most basic. We should distinguish proper goal tracking from mere target tracking. This distinction matters because evidence that infants can track the targets of actions should not automatically be taken to support the conclusion that they are capable of proper goal tracking (see Sect. 2).

An important second step is to recognise that there may in fact be distinct processes for proper goal tracking and mere target tracking, at least in adults. Mere target tracking can be achieved perceptually thanks to perceptual animacy (see Sect. 7), while proper goal tracking can be achieved motorically (see Sect. 5).

Merely distinguishing processes is of little use unless doing so also allows us to generate testable predictions. Fortunately the different processes are associated with signature limits. For instance, tracking targets through perceptual animacy depends on heuristics like directionality and subtlety (see Sect. 7). And, of course, motor goal-tracking processes are limited to cases in which actions similar enough to those observed can be represented motorically (see Sect. 5). Thanks to these and other distinguishing performance characteristics, predictions can be derived from conjectures about the particular process or processes which underpin sensitivity to features of action. The third step, then, is to contrast the predictions of multiple conjectures about the kinds of processes involved in tracking others’ actions.

Taking these three steps appears necessary if we are to understand the development of goal tracking. In adults, we can distinguish a proper goal-tracking process which involves only motor representations and processes (Sinigaglia and Butterfill 2016) from one which involves theoretical deliberation; and these can both be distinguished, arguably, from the broadly perceptual target-tracking process which underpins perceptual animacy effects. These goal- and target-tracking processes may have different developmental trajectories. Understanding these is key to solving the twin developmental puzzles.

One (of several) candidate solutions to the puzzles would be this conjecture:

In the first 9 months of life, all proper pure goal tracking is explained by the Motor Theory. Other pure goal-tracking processes emerge later in development. Further, a mere target-tracking process is also present in these infants. And appearances that these infants’ pure goal-tracking abilities are not limited by what they can represent motorically are misleading: they are due to mistaking mere target tracking for proper goal tracking.

While it is not yet known whether this conjecture is true or false, it is worth considering partly because it can be developed in ways which generate readily testable predictions (see Sect. 7). The conjecture also has another virtue. Sensitivity to others’ actions is so fundamental to social cognition and joint action that it is likely to depend on a rich mix of many kinds of processes. We need ways to identify these processes, to distinguish their limits and to understand their synergies. This is why distinguishing motor processes which support proper goal tracking from perceptual animacy which supports mere target tracking matters. But regardless of whether this conjecture turns out to be right, what seems more certain is that solving the twin developmental puzzles about pure goal tracking is essential if we care at all about the abilities which underpin, among other things, understanding others’ minds and social interaction.