1 Introduction

Fodor (1975) introduced into contemporary philosophy of psychology the idea that cognition occurs in a language-like medium. This proposal leads naturally to the question of the relation between this proposed language of thought and natural languages, like English or Quechua, with which we are familiar. The simplest proposal is one of identification: the language of thought is natural language. Having a thought is tokening an expression of natural language, and token thoughts are individuated by their linguistic properties. However, despite this simplicity, the view that we ought identify the language of thought with natural language has not been widely adopted in philosophy or psychology.Footnote 1 A variety of traditional arguments, detailed in Sects. 3 and 4, have proved convincing. In particular, it has been argued that a variety of properties of natural language, in particular that natural languages are learned, variable, public, and ambiguous, cannot be properties of the language of thought, which is innate, uniform, private, and disambiguated.

A central difficulty with these arguments, as compelling as they have seemed, is that they rely on an intuitive understanding of what a natural language is. As linguistic theory has developed, especially in the last couple of decades, however, a quite different picture of natural language has emerged. It has been one of the driving assumptions of generative linguistics that these everyday understandings of the term ‘language’ do not provide substantial constraints on linguistic theorizing, and so it is possible, indeed it appears likely, that the assumptions underwriting these arguments against the identification of natural language and the language of thought are false. Natural language, conceived of as the target of empirical linguistic research, is an aspect of individuals’ psychologies, and is plausibly innate, invariant, private, and disambiguated. More specifically, these intuitive properties of natural language are better viewed as properties of the externalization of natural language proper.

This suggests a re-evaluation of the traditional question of the identification. In fact, certain theorists working within this generativist tradition have argued that contemporary understanding of natural language does indeed suggest that identification is a plausible empirical hypothesis. Chomsky himself has been somewhat equivocal on this issue, although in recent years he seems to be more clearly promoting this thesis. In Chomsky (2007a) he states that “[i]t is often argued that another independent language of thought [i.e. independent of natural language] must be postulated, but the arguments for that do not seem to be compelling” (p. 16). However, in Chomsky (2015) he claims that we ought view “language as essentially an instrument of thought, even if we do not go as far as Humboldt in identifying the two” (p. 16), but later refers to the “underlying ‘language of thought’ provided by the internal language, the I-language, that everyone has mastered...” (p. 59). Most explicitly, in Chomsky (2007b), he states: “If the relation to the interfaces is asymmetric, as seems to be the case, then unbounded Merge provides only a language of thought, and the basis for ancillary processes of externalization” (p. 22), and this view is developed further in Chomsky (2017). In a series of papers, Hinzen (2006, 2011, 2013, 2014, 2015, 2017), following Chomsky, has provided the most detailed account of what such an identification would look like, how contemporary linguistics, and cognitive science more generally, makes it plausible, and what explanatory benefits would arise from making it.

While I believe they are lurking in the background of these works, the ways that contemporary generative theory undermines classical arguments against the identification of natural language and the language of thought have not been made fully explicit. My first goal in this paper will be to do just this. I hope to spell out more explicitly and fully than has been done previously how the picture of natural language presented by contemporary linguistic theory shows all of the traditional arguments for distinguishing between natural language and the language of thought to be unsound. However, while the metaphysics of contemporary linguistic theory are much more hospitable to this identification than traditional pictures of natural language were, the methodological developments suggest the opposite lesson. Whereas traditional linguistic theory assumed a relatively close correspondence between spoken language and linguistic competence, the demands of contemporary linguistic theory have expanded the gap between the two substantially. That is, as contemporary linguistic theory has advanced deeper and more abstract underlying grammars, it has been forced to exclude a wider variety of linguistic behavior from its purview. Of particular relevance for my purposes is the category of utterances deemed ungrammatical but acceptable. This class has been expanding as a result of phenomena traditionally taken to be paradigmatically grammatical being re-analyzed as relating to externalization. Perhaps the clearest example of this is word order itself: it is near-definitional of syntax that it studies the arrangement of words in sentences, and from this it has traditionally, and reasonably, been assumed that the linear order of words is a grammatical/syntactic phenomenon, but much recent work views surface word order as instead relating to externalization processes (see chapter 7 of Hornstein et al. 2005).
This increasing distance between grammar and observable properties of utterances predictably leads to breakdowns in the mapping between grammaticality and acceptability.

Utterances with this combination of properties pose a significant worry for attempts to identify natural language and the language of thought. If there are sentences that human speakers are able to interpret, but which are not licensed by the internal rules of the grammar, this seems to entail that speakers can have thoughts that are not expressions of natural language. But if this is so, the sets of possible thoughts and of possible (complete) linguistic expressions are not even extensionally equivalent, let alone identical. In Sect. 5, I shall go through some examples of expressions that generative linguists plausibly view as ungrammatical, despite the fact that they express thoughts which normal speakers can grasp.

In Sect. 6, I will describe several strategies that the defender of the identification of thought and language can use to respond to these kinds of examples. I will provide a qualified defense of the Identification Hypothesis, arguing that many apparent counter-examples of this sort seem plausibly explained. However, I will point to some examples which seem somewhat more difficult. I hope that this paper can make clear exactly what would need to be done to defend this alternative position.

2 The claim

Since Fodor (1975), the question of whether thoughts are conveyed by language-like vehicles has been one of the dominant threads in the philosophy of psychology. For the purposes of this paper, I shall not weigh in on these debates. I shall assume that something like the LOT hypothesis is correct, and will interpret it as making the following claims:

  i. Cognition involves the manipulation of representations.
  ii. These representations have constituent structure.
  iii. Cognitive processes are defined over such structural properties.
  iv. The semantic properties of these representations are not, in general, iconic.

Claims i–iii establish the representational theory of mind (see inter alia Fodor 1987 Chapter 1, and Fodor 1990 Chapter 1). Claim iv is intended to rule out representationalist theories which centrally posit non-language-like representational media, such as maps or images. As I said, I shall not be arguing for or elaborating on these claims. I mention them in order to make clear what the proposal is that I am evaluating, namely that the representations over which thought processes are defined are themselves the products of the language faculty, generated in accordance with whatever psychological principles govern this system. The idea that the language of thought is a natural language is of course only viable on the assumption that there is a language of thought, and so the issue only arises for those who accept i-iv. For certain stripes of connectionists (e.g. Churchland 1996) or dynamical systems theorists (e.g. Van Gelder 1995), then, the proposed identification cannot even be stated.

It is worth making explicit that the notion of a language of thought is being used here in a much narrower sense than it was in Fodor (1975). Fodor is concerned to show that representational theories of any aspect of the mind presuppose a structured medium over which computational operations can be defined. This will thus include the workings of the perceptual, navigational, and motor control systems, and any other systems which operate by manipulating representational states. It is clear that the identification of a language of thought with natural language is wildly implausible for most such systems. Non-human animals, from arthropods to apes, for which there is no reason to posit the possession of natural language, exhibit minds with these kinds of representational capacities.Footnote 2 In this sense, Fodor’s broad use of the term ‘language of thought’ is misleading, as what he is really proposing is a language of mentation. It is a language of thought in a narrower sense, as applied only to thoughts, that I am interested in. Of course, it is an open empirical question what the psychological kinds are, and which, if any, corresponds closely enough with our pretheoretic term ‘thought’ to be worthy of the name. This may involve some degree of explication (see Sect. 6.3), but for our purposes, it is sufficient to identify some seemingly important properties of thought, and view whatever empirical psychology discovers which has these properties (more or less) as ‘thoughts’. Thoughts, as I use the term, are paradigmatically personal-level propositional attitudes.Footnote 3 That is, they are relations to complete propositional contents, attributable to a cognitive agent as a unified whole. Beliefs are paradigmatic examples of thoughts: agents believe that p, where p is a proposition.
Thoughts are also, to use Stich’s (1978) term, ‘inferentially promiscuous’: they systematically and reliably enter into rational transitions with other thoughts with a wide range of contents. Paradigmatic examples of thought processes include both practical and theoretical reasoning: I believe that p and if p then q, and so infer q, or I believe that a-ing would bring it about that p, and desire that p, and so I a. The proposed identification would then be that propositional, personal-level, inferentially promiscuous thoughts are natural language expressions. I will refer to the proposal that such thoughts are natural language expressions as the ‘Identification Hypothesis’ or ‘IH’. IH is thus an empirical scientific hypothesis, not a piece of conceptual analysis or a priori metaphysics. The model here would be an empirical identity statement such as the identification of lightning with atmospheric electrical discharge. We can identify two psychological capacities, the ability to think and the ability to use and acquire a language. IH is the hypothesis that these capacities centrally depend on one-and-the-same underlying system; that exercises of one are exercises of the other.

The other relatum of the proposed identification, natural language, will be the focus of this paper. We can identify two predominant ways of understanding this notion. One approach views natural languages as essentially public, shared entities. These entities consist of mappings from symbols onto meanings, and their properties are largely determined by social conventions. From this perspective, it is natural to ask how many people speak a particular language. The alternative approach views natural language as a psychological mechanism which maps one kind of representation onto another. This mechanism, in concert with many others, makes language use possible. An individual’s language in this sense is token-distinct from that of any other speaker. Following Chomsky (1986), I will refer to languages in the former sense as ‘E-languages’, and in the latter sense as ‘I-languages’. It will be my central contention that the plausibility of IH depends on interpreting ‘natural language’ in the latter sense.Footnote 4

For different reasons, neither of these notions is perfectly transparent at this point. E-languages are largely continuous with folk understanding of what natural languages are, and thus they inherit much of the imprecision characteristic of folk notions. However, there are certain features we can use to identify them: they are few relative to the number of speakers, people can grasp them more or less perfectly, they are tools primarily for communication, they exhibit significant variation, and they are acquired through a process of learning from others. I-languages, on the other hand, are posits of a developing science, and thus claims about them are tentative and provisional. However, within at least the generative tradition, there is consensus that they are to a significant degreeFootnote 5 species-universal and largely develop without much effort on the part of either the learner or other speakers.

In the next two sections, I shall outline some standard arguments against IH. I shall show how these objections rely on an E-language understanding of natural language, and that when we replace this with the notion of I-language drawn from scientific linguistics, the force of these arguments evaporates. Along the way, the competing understandings of natural language should themselves become clearer.

Before getting into the theoretical arguments against the feasibility of this identification, we can note some general features in its favor. Firstly, to some, the idea that our thoughts are sentences of a natural language is highly intuitive. Carruthers (1998, Sect. 2.2) develops an introspection-based argument that at least conscious thoughts occur in natural language.Footnote 6 The scope and force of such arguments are limited in that, firstly, they apply only to consciously accessible thoughts and so may not generalize to thought in general, and, secondly, it is far from clear that conscious introspection is a reliable guide to the workings of the mind. However, they offer at least a prima facie argument in favor of the identification of the language of thought with natural language.

Another very general motivation is parsimony. Given that we are already independently committed to the existence of natural language, if we could explain higher cognition with reference only to language of this sort, we make fewer theoretical commitments. However, as with all arguments from parsimony, this doesn’t get us very far. We ought make as few theoretical posits as we can, all else being equal. That is, if positing only natural language, and no independent language of thought, were sufficient to account for all the relevant phenomena, then parsimony would favor making fewer posits. But, the question of interest is always whether all else is indeed equal. Only investigating the empirical prospects of the competing theories will settle this issue.

Perhaps more significantly, Hinzen (2013, 2014, 2017) has mounted an argument that there is a tight connection between what we can think and what we can express linguistically.Footnote 7 In particular, Hinzen argues that in many cases the best explanation for why certain thoughts are (im-)possible involves reference to which linguistic structures are made (im-)possible by the language faculty. For example, Hinzen (2011, 2013) points out that lexico-grammatical properties of verbs appear to determine which kinds of thoughts we can have involving the concepts they express. Collins (2011) provides an apposite example:

  1. Anton broke the bed.
  2. The bed broke.
  3. Anton made the bed.
  4. *The bed made.

As these examples show, the thoughts we can have seem to track the expressions available in our language. More generally, the distinctions we make in thought track those made in language. Positing a language of thought independent of the language we speak seems consistent with the possibility that a sentence like 4 is ungrammatical but that nonetheless the thought it corresponds to is perfectly fine, perhaps analogous to sentence 2, indicating that something or other caused the bed to become made. But this is not what we observe. Sentence 4 is not merely ungrammatical, but unthinkable. Note the contrast between sentence 4 and ungrammatical sentences like “The child seems sleeping”. For these latter kinds of sentence, we can easily figure out what was meant (see Sect. 6.4), whereas sentence 4 simply doesn’t seem to provide a complete thought. Does it mean that the bed was made by someone or other, analogous to sentence 2, or that it made itself (analogous to sentences like “Anton washed”), or what? The ungrammaticality of 4 seems to preclude an answer to such questions. This is so despite there being no clear language-independent reason for the difference between these verbs, or the concepts they express. It is surely commonly understood that both making and breaking events require some independent force, agential or otherwise.

IH, however, provides a neat explanation for why sentence 2, but not sentence 4, expresses a thought. Ergative verbs, like ‘break’, allow for passive alternation, wherein the direct object (THEME) of a transitive construction can be raised to subject position in an intransitive construction. Non-ergative verbs, like ‘make’, preclude such an operation.Footnote 8 The explanation for this phenomenon is controversial and complex, centering around the claim that the lexical entries for ergative verbs mandate that the THEME (direct object) of these verbs is identified (‘theta-marked’), but identifying the AGENT (subject) is optional. Non-ergative verbs, on the other hand, mandatorily identify both their AGENTs and THEMEs. That is, ‘break the bed’ is a complete verb phrase, with no mandatory argument positions unfilled, whereas ‘make the bed’ is short an AGENT. Sentences 2 and 4 are formed by taking these verb phrases and raising their THEME arguments to sentential subject position. In sentence 2 no problem arises, as there are no further argument positions which need to be identified. However, in the attempt to form sentence 4, ‘the bed’ must be interpreted as filling the required AGENT role as well as the THEME role it has already been assigned, in violation of the Theta-criterion, which states that each argument must be assigned exactly one argument role (Chomsky 1981).Footnote 9 The details are not crucial for our purposes; what matters is that the thoughts we can have seem to track the expressions available in our language. We explain the thinkability of 2 with reference to its grammaticality, and the unthinkability of 4 with reference to its ungrammaticality. This correlation between available thoughts and grammatical expressions is left unexplained if we posit a disparity between language and thought, but is predicted if we accept the view that the limits of language provide the limits of thought.
This account thus inverts the perhaps standard view that we explain why we make the linguistic distinctions we do with reference to the kinds of thoughts we can have. In the example above, defense of this view would thus require some language-independent reason for treating ‘break’ and ‘make’ differently, which seems absent.
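The contrast between sentences 2 and 4 can be made concrete with a toy sketch. It is only an illustration of the reasoning just given, not a piece of any actual syntactic theory: the names `VerbEntry`, `LEXICON`, and `satisfies_theta_criterion` are hypothetical, the lexical entries simply encode the assumption stated above (AGENT optional for ‘break’, obligatory for ‘make’), and the Theta-criterion is reduced to a counting check on whether roles and arguments can be paired one-to-one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbEntry:
    """Hypothetical lexical entry listing the roles a verb assigns."""
    form: str
    obligatory_roles: frozenset
    optional_roles: frozenset = frozenset()


# Assumed entries, following the account in the text:
# 'break' must assign THEME and may assign AGENT;
# 'make' must assign both AGENT and THEME.
LEXICON = {
    "break": VerbEntry("break", frozenset({"THEME"}), frozenset({"AGENT"})),
    "make": VerbEntry("make", frozenset({"AGENT", "THEME"})),
}


def satisfies_theta_criterion(verb, arguments):
    """Return True if roles and arguments can be paired one-to-one.

    Every argument must receive exactly one role and every obligatory
    role must be assigned. With too few arguments, some argument would
    have to carry two roles; with too many, some argument gets none.
    """
    entry = LEXICON[verb]
    fewest = len(entry.obligatory_roles)
    most = fewest + len(entry.optional_roles)
    return fewest <= len(arguments) <= most


print(satisfies_theta_criterion("break", ["the bed"]))  # sentence 2
print(satisfies_theta_criterion("make", ["the bed"]))   # sentence 4
```

On these assumptions the check passes for sentence 2 and fails for sentence 4, because the single argument ‘the bed’ would have to carry both of the roles that ‘make’ obligatorily assigns.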

Despite these motivations, IH has not been widely accepted within philosophy. I will turn now to the primary reasons why not.

3 Traditional arguments against the identification I: the easy cases

I will examine four major arguments that aim to show that we cannot identify the language of thought with natural language.Footnote 10 As arguments concerning whether two systems are numerically identical, all four arguments can be stated as applications of Leibniz’s law of the indiscernibility of identicals. A property of natural language is proposed, which it is argued that the language of thought lacks, and so it is concluded that these cannot be the same language. I will describe these arguments in order of how serious a threat to the identification I think they pose.Footnote 11 In this section, I will detail two such arguments, from publicity and underspecificity. I believe adopting an I-language approach to natural language undermines these arguments, or at least transforms them into empirical disputes concerning the details of linguistic theory. Once this approach is presented, and its power in defending IH is exhibited, I will turn in the next section to more serious traditional worries, from variation and acquisition. Responding to these will require more detailed and controversial claims about I-languages.

3.1 Publicity

Expressions of natural languages are, according to our intuitive understanding, public entities. That is, their properties are interpersonally available. This follows from our conception of language as primarily a tool for communication. If I were unable to pick up on the visual or auditory properties of your utterances, they would be unable to serve this communicative function. This public view of language is evidenced by the common assumptions that people share particular languages, that different people can speak such a shared language more or less well, and that linguistic expressions are essentially spoken or written (or perhaps signed). On the other hand, pace the behaviorist, thoughts are private. My having a thought of a particular sort does not result in any particular perceptually detectable property. Again, this falls out of the standard account of why we have language in the first place: language is needed precisely because it enables us to make our thoughts available for others. So this pair of contrary properties fits in perfectly with our everyday understanding of the relation between thought and language.Footnote 12

3.2 Underspecificity

Despite the standard assumption that language is for communicating thoughts, there are a variety of ways in which it seems prima facie to be less than optimal for doing so. In particular, utterances often fall short of providing all the information in the thought they are used to express.Footnote 13 This can happen in a variety of related ways:

  5. He stole them from her.
  6. The dictator’s behavior was sanctioned by the government.
  7. She put the keys in the basket on the floor.
  8. She loved him, and he her.

Sentences 5–8 all point to ways that our thoughts differ from the way they are expressed. Sentence 5 is an example of context-sensitivity or indexicality. Someone hearing this sentence may be unsure of whom or what the various noun phrases refer to, as the referents of these expressions can vary from context to context. However, in thinking such a thought, there can be no question as to whom or what one is thinking about. For me to think He stole them from her, I must know exactly at whom I am addressing this (mental) accusation. Similarly for sentence 6. This sentence-form on its own does not determine whether it means that the government has endorsed or penalized this behavior, but the corresponding thought cannot fail to make clear which meaning it has. Sentence 7 is another case of ambiguity, this time of a structural, rather than lexical, sort. The sentence alone, spoken or written, is ambiguous between a reading according to which the keys begin in the basket and end up on the floor (perhaps in the basket, perhaps not) and one in which the basket begins on the floor and the keys are placed in it. But again, to think such a thought requires that one select a reading. Finally, sentence 8 is an example of ellipsis. The second coordinated clause (“and he her”) appears to lack a verb, but interpretation of this sentence fills in this gap by replicating the verb from the first clause.Footnote 14

What all of these phenomena are supposed to show is that certain properties are explicable only in strictly linguistic terms. Context-sensitivity, lexical and structural ambiguity, and ellipsis seem to be essentially properties of (public) language, not thought. Indeed it is hard to see how these could be properties of thought. And so by identifying language and thought, we lose the ability to account for these phenomena.Footnote 15

Fodor (2008) makes this same point in terms of compositionality. Thoughts must, in order to explain their productivity and systematicity, be compositional. That is, the meaning of a complex thought must be determined by the meanings of its constituents (i.e. concepts) and their arrangement. However, language is non-compositional, as indicated by the above examples of underspecified linguistic expressions. The meanings of sentences are typically determined by variable features of the utterance context in addition to the meanings of lexical constituents and their arrangements, and so linguistic meaning is non-compositional. Thus, again we find a property of thought which is not a property of language: compositionality of meaning.

3.3 Why these traditional arguments fail

As mentioned above, I believe the crucial failure in these objections to the identification of thought and language is the assumption that what natural language is is relatively accessible to our intuitions. There is a folk notion of ‘language’ which does indeed have all the properties just mentioned, and thus differs from (a folk notion of) thought. But linguistic theory, as an empirical science, is not constrained to theorize about objects with these intuitive properties. One of the central aims of linguistics, just like all sciences, is to empirically determine which entities in the natural world are suitable targets for theorizing, and thus to identify the natural kinds in the target domain. And in fact, over the history of generative linguistics, the working conception of the proper target of linguistic theorizing has shifted radically away from this folk notion, in ways that appear to undermine these traditional arguments.

Chomsky’s (1986) distinction between E-languages and I-languages encapsulates this shift. Chomsky intended for E-languages to correspond to our folk notions of a language. An E-language is an external, possibly abstract, object. Speakers of a particular E-language are similar in that they bear some cognitive relation (‘knowledge’) to this external entity. Speakers may, however, differ in how well they know the language they share. Young children, for example, may not yet have mastery of the language, but they are in the process of learning it, and thus acquiring mastery. I-languages, on the other hand, are states of an individual’s psychology. In particular, they are states of the psychological computational system which functions to generate complex linguistic structures out of simpler linguistic structures. My I-language is the state of my language faculty. Other people’s I-languages may be similar to mine, incorporating the same rules, but they are token-distinct psychological objects. Crucially, E-languages are individuated by the set of expressions (form-meaning pairs) they license, and which E-language a community or speaker knows is determined by which conventions they adopt.Footnote 16 If it is a convention in a given community that “The battle of Hastings was fought in 1066” is taken to mean that the battle of Hastings was fought in 1066, then that community speaks/knows an E-language containing this form-meaning pair as a member. I-languages, however, are type-individuated by the psychological processes which (partially) enable speakers to identify such form-meaning pairs. In principle, a single E-language could be spoken by a collection of agents with very different I-languages.Footnote 17

Once the distinction between an E-language, as a social object, and an I-language, as a psychological object, is made, it opens the door to a variety of questions about the nature of this psychological object, and its relation to both this social object and to observable linguistic behavior. Once it is recognized that an I-language is supposed to be a genuine component of human psychology, it cannot simply be assumed that there is any simple relationship between this entity and these social objects or behavioral states. If the question of the relationship between thought and language is, as I take it to be, an empirical question about the relationship between two natural (psychological) kinds, then it is likewise an empirical question what these kinds are. That linguistic science has developed so as to investigate this internal, psychological object suggests that our intuitive judgements about which properties natural language has are insufficient. Whether natural language, qua target of theoretical linguistics, is indeed public and ambiguous is thus an empirical question. There are reasons to suspect that it is neither of these things.

Qua target of linguistic science, at least in the generativist tradition, natural language is an aspect of speakers’ psychologies. Thus, even if thought and language are not to be identified, the one is no more public than the other. On this picture, while the language faculty enables (in concert with many other psychological mechanisms) the externalization, and hence publicity, of natural language, the internal computational system is not to be identified with whatever is thereby made public.

The underspecificity objection is similarly undermined in light of this conception of natural language. Structural ambiguity, on this account, strictly involves a many-to-one mapping from internal products of the language faculty to externalized public symbols. While utterances of the series of words in sentence 7 could be used to express distinct thoughts in distinct situations, it is a guiding assumption of much work in linguistic syntax and semantics that the language faculty generates structurally disambiguated expressions, and thus that this ambiguity is introduced only by the linguistically peripheral process of externalization, mapping these type-distinct structures into identical sounds, markings, or gestures. Likewise with ellipsis. Almost all work on this topic in generative linguistics assumes that the underlying, psychologically real, structure includes multiple copies of the elided material, but that some process of externalizing this structure deletes some of these copies. “She loved him, and he her” (8), and “She loved him, and he loved her” are thus, on this view, simply different ways of pronouncing the same linguistic expression. Thus the apparent gap between the linguistic expression and the thought it expresses, i.e. that one but not the other is in some way inexplicit, disappears. The assumption that ellipsis is a feature of pronunciation, not of differing underlying structures, is essential in explanations of why the elided material must be identical to some non-elided expression.Footnote 18

How controversial it is to hold that the underlying structures of ambiguous expressions are themselves disambiguated varies from case to case. Analyzing sentences 7 and 8 in this way is basically uncontroversial. The ability to account for ambiguities (and lack thereof) in natural language with reference to different underlying structures has been one of the pillars of justification for generative grammar since at least Aspects, in which Chomsky explains the ambiguity of “Flying planes can be dangerous” with reference to distinct underlying structures (Chomsky 1965, p. 21). The ambiguity of sentence 7 is thus accounted for by positing two distinct syntactic structures, one in which ‘in the basket’ is a prepositional modifier of the noun phrase (NP) ‘the keys’ and ‘on the floor’ is the locative argument of the verb, and one in which ‘in the basket’ is the locative argument and ‘on the floor’ modifies the NP ‘the basket’.
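The many-to-one relation between structures and pronounced strings can be illustrated with a small sketch. It is a simplification under stated assumptions: the bracketings below are rough renderings of the two structures just described for sentence 7, the category labels are illustrative only, and `linearize` is a hypothetical stand-in for externalization that does nothing more than read off the words from left to right.

```python
def linearize(tree):
    """Flatten a nested-tuple tree into its left-to-right list of words.

    A tree is either a word (a string) or a tuple whose first element
    is a category label and whose remaining elements are subtrees.
    """
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    words = []
    for child in children:
        words.extend(linearize(child))
    return words


# Reading A: 'in the basket' modifies 'the keys';
# 'on the floor' is the locative argument of 'put'.
READING_A = (
    "S", ("NP", "she"),
    ("VP", ("V", "put"),
     ("NP", ("NP", "the", "keys"), ("PP", "in", ("NP", "the", "basket"))),
     ("PP", "on", ("NP", "the", "floor"))),
)

# Reading B: 'in the basket' is the locative argument of 'put';
# 'on the floor' modifies 'the basket'.
READING_B = (
    "S", ("NP", "she"),
    ("VP", ("V", "put"),
     ("NP", "the", "keys"),
     ("PP", "in",
      ("NP", ("NP", "the", "basket"), ("PP", "on", ("NP", "the", "floor"))))),
)

print(READING_A == READING_B)                        # distinct structures?
print(linearize(READING_A) == linearize(READING_B))  # same word string?
```

The two tuples are distinct objects with distinct constituency, yet flattening either one yields the same sequence of words. On the view under discussion, the ambiguity belongs to that flattened output and not to the structures themselves.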

The worries raised by lexical ambiguity involve the assumption that the very same linguistic expression (e.g. ‘sanction’) can feature in distinct thoughts. This means that thoughts are individuated more fine-grainedly than lexical expressions and so the two cannot be identified. Again, this argument rests on a folk notion of language, according to which words are individuated by their phonological properties. ‘Sanction’, on this conception, is one word with two different meanings. However, linguistic theory has no reason to stick with these everyday taxonomies.Footnote 19 And in fact, most theories of the lexicon instead view ambiguous expressions as involving the accidental sharing of phonological properties between two distinct lexical entries. So, on this view, ‘sentence 6’ is really a misnomer, as this does not denote a particular sentence, but a class of sentences pronounced in the same way. So the multiplicity of thoughts expressed is perfectly tracked by the multiplicity of linguistic expressions, and the objection disappears.

This also resolves the worries raised by Fodor (2008, p. 73) concerning cases from Kripke (1979) in which a single name is wrongly thought to refer to two distinct people. Fodor claims that this cannot be explained if we think in natural language, as natural language has just one expression here, whereas LOT can distinguish Paderewski\(_{1}\) from Paderewski\(_{2}\). Of course, the correct response is that there are two linguistic expressions here; they are just pronounced in the same way (and, coincidentally, refer to the same person).

The treatment of deixis (as in sentence 5) requires slightly more machinery. On the face of things, the solution for cases of ambiguity can’t apply here, as different utterances of sentence 5 involve the same words. Whereas a mental lexicon will list multiple entries for ‘sanction’, it being more-or-less a coincidence that these words are pronounced identically, it will of course not list distinct entries for each use of ‘he’ or ‘them’. Thus sentence 5 seems to allow for variation in thought without variation in either grammatical structure or lexical items, the sole determinants of a linguistic expression.

What is needed here is a distinction between lexical types and tokens. These correspond to two distinct roles for lexical items in the use of language. On the one hand, lexical items are repeatable. For language to be useful, I must be able to re-apply the same expression in different contexts. This means I must store enough information about the expression that I can tell when it can be (re-)applied. The lexicon provides a store of just such information. On the other hand, lexical items are constituents of token linguistic expressions, constructed in real time in the process of producing and processing language (and, if IH is on the right track, thought). For a given speaker, there will be only one expression type ‘he’, but as many tokens of this expression as there are complex expressions in which it features. What indexical expressions demonstrate is that the meaning of a token complex expression is a function of the meaning of its token constituents, not of those of its constituent types.Footnote 20

So, to defend IH from worries surrounding context-sensitive expressions, we must view it as a hypothesis about token thoughts: token thoughts are identical with token sentences. Token linguistic expressions are individuated by both their grammatical structure and their lexical constituents. In the case of stable expressions, whose contribution to a sentence is always the same, the type/token distinction could be fudged, but once we are dealing with expressions with variable semantics, it is absolutely crucial. While sentence 5 identifies a type of sentence which can express multiple different thoughts, each such thought corresponds to a distinct sentence token. This kind of argument generalizes to cover cases of polysemy as well. While tokens of the same word-type may contribute differently to different thoughts (e.g. ‘chicken’ in “I don’t think you should feed the chicken lamb” vs. “I don’t think you should feed the lamb chicken”), IH can be maintained provided that each token thought is identical to some token linguistic expression.
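The type/token point can be sketched in a few lines of code. This is only an illustration under stated assumptions: the class names, the example sentence "he left", and the referents "Alex" and "Sam" are all hypothetical, and a string stands in for whatever a token actually contributes to a thought.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexicalType:
    form: str                  # the stored, repeatable lexical entry

@dataclass(frozen=True)
class LexicalToken:
    lexical_type: LexicalType
    value: str                 # hypothetical: what this token contributes on this occasion

def sentence_meaning(tokens):
    """The meaning of a token sentence, computed from its token constituents."""
    return tuple(token.value for token in tokens)

HE = LexicalType("he")
LEFT = LexicalType("left")

# Two tokens of the same sentence type, produced on different occasions.
utterance_1 = [LexicalToken(HE, "Alex"), LexicalToken(LEFT, "left")]
utterance_2 = [LexicalToken(HE, "Sam"), LexicalToken(LEFT, "left")]

same_types = ([t.lexical_type for t in utterance_1]
              == [t.lexical_type for t in utterance_2])
same_meaning = sentence_meaning(utterance_1) == sentence_meaning(utterance_2)
```

Here `same_types` holds while `same_meaning` does not: the two sentence tokens share every constituent type yet differ in content, which is why the identification has to be stated at the level of tokens.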

This response is similar to, but distinct from, Hinzen’s (2015) response to Fodor’s (2001) argument that there are elements found in thoughts which are absent in the language used to express them. Hinzen covers various different cases of this sort, with different strategies for dealing with each. In some cases, it is argued that the thought in question does not have the properties attributed to it (e.g. thinking, while in London, that it is raining is different from thinking it is raining in London), and in others that the linguistic expression does contain the meaning attributed to the thought (e.g. that “It is raining” does mean that it is raining here and now as a function of its grammatical structure). I shall not repeat all of Hinzen’s discussion here, but it is instructive, and seems to adopt a similar strategy to that in the previous paragraph of emphasizing the complexity and particularity which must be attributed to given linguistic structures.

4 Traditional arguments against the identification II: the hard cases

4.1 Variation

One of the most apparent properties of languages is their variation. On the surface, languages seem to be as different from one another as can be. Natural languages differ in their phonological properties (consider the complex consonant clusters of Czech, the rising and falling tones of Mandarin, and the clicks of !Kung), their morphology (compare polysynthetic Yupik to purely isolating Yoruba), their syntax (compare the strict word-order constraints of English with the relatively free word-order of Latin), and in myriad other ways as well. However, given the rejection of strong versions of the Sapir-Whorf hypothesisFootnote 21, it is widely accepted that the thoughts of speakers of these divergent languages do not show this same variation.Footnote 22 The way these thoughts are conveyed may differ in seemingly limitless ways, while the thoughts conveyed remain the same.

This disparity is what makes translation difficult but possible. If there were no linguistic variation, we would be able to communicate with everyone. But if thought itself varied, it is unclear whether communication between speakers of different languages, say through reading a translated work, would even be possible. It is because the thoughts expressed by “I’m hungry” and “Tengo hambre” are assumed to be the same, despite their quite different linguistic properties, that I am able to learn some Spanish by recognizing synonymies of this sort. Again, these sorts of phenomena do not even seem to be statable if we identify the language in which we think with the language which we speak.

4.2 Acquisition

A substantial chunk of Fodor (1975) is dedicated to arguing that natural languages cannot be the language of thought. His central argument is that if we do make this identification, we are faced with a regress. Natural languages, he argues, are learned. That is, children acquire a language by rationally responding to linguistic evidence in their environment, typically the utterances of nearby adult speakers. This fits in nicely with our layperson’s picture of language. Intuitively, we learn the language we speak through various kinds of experiences we have with other people who have already learned it. This account of acquisition also seems to explain the variation we perceive: English speakers don’t say things like “Tengo hambre” because the speakers from whom they learned had themselves learned rules prohibiting this kind of expression.

What Fodor was at pains to show, however, was that the language of thought could not be like this. That is, we cannot learn our language of thought. The reason for this is pretty straightforward. Fodor viewed learning as something like hypothesis testing. To learn, for example, whether one’s language allowed unpronounced subjects, one forms the hypothesis “My language allows unpronounced subjects”, and tests this hypothesis on the basis of one’s primary linguistic data.Footnote 23 However, hypothesis testing presupposes a medium in which to state the hypotheses which are being tested. And so if the language of thought were to be learned in this way, then there would have to be some further language in which the learner is able to state hypotheses about the language of thought. And so on. Fodor’s solution was to deny that we do learn the language of thought. If the language of thought is innate, i.e. it simply emerges as part of biological development, then there is no question about how to learn it. And it provides the medium in which hypotheses about natural language can be stated. This maneuver thus simultaneously showed how learning a language is possible (i.e. via hypothesis testing), and undermined the looming regress (i.e. by claiming that the language in which such hypotheses are stated is innate). But this proposal immediately precludes the identification of the language of thought with a natural language. If these are identified, we cannot leverage one into an account of the acquisition of the other.

Further, the innateness argument and the variation argument are mutually supporting. If Fodor is right that the language of thought must, in order to preclude a vicious regress, be innate, then those defending IH must likewise view natural language as innate. However, the more languages vary, and the more this variation depends on subtle properties of the environment, the less plausible it is to view language as innately given.

4.3 Solving the hard cases

Much of what I said about the privacy and specificity of natural language is relatively uncontroversial. However, responding to the objections from variation and innateness requires going out further on a limb. For my purposes, the crucial proposal that has been developed in recent linguistic theorizing is that I-languages are species-universal. Whereas traditional approaches to generative theory assumed that much of the work in explaining linguistic variation was to be done within the I-language, certain recent work has suggested that we ought view this variation as instead a product of the different ways in which the same internal system is ‘externalized’.Footnote 24 To see the difference, compare the following two possible explanations for the difference between a language in which (some) wh-expressions are pronounced at the beginning of a sentence (as in English) and those in which they are pronounced wherever in the sentence they receive their semantic interpretation (like Mandarin):

(9) What did Harry buy?

(10) Húfēi măi-le shénme.Footnote 25
     Hufei  buy-PERF what?

Traditional theories viewed this as a genuinely syntactic difference. The underlying structure of sentences 9 and 10 differed in that the wh-expression ‘what’ in 9 underwent movement:

figure a

whereas the wh-expression ‘shénme’ in 10 did not:

figure b

However, there is an alternative analysis, according to which the underlying syntax of both 9 and 10 is the sameFootnote 26, and they differ only in how this structure is ‘externalized’, or pronounced (strike-out indicates that this expression is unpronounced)Footnote 27:

figure c

These contrasting explanations suggest different understandings of the language faculty and of its states, I-languages. The former explanation assumes a framework in which language variation is explained internal to the language faculty, whereas the latter views variation as a product of the ways that systems outside of the language faculty handle the products of the faculty itself. On this latter account, the difference between wh-movement languages like English and wh-in-situ languages like Mandarin is constituted by differences in the ways that the sensory-motor systems interpret the generated linguistic structures, i.e. whether the lower or higher copy of a wh-expression is relevant for commands to the production systems.Footnote 28 Given this account of wh-movement, one could generalize and envision an approach to linguistic variation which treated all grammatical differences in this way, as contained within phonology, not syntax proper. This latter approach, viewing linguistic variation as a product of language-external ‘externalization’ systems, has been suggested by Chomsky consistently over the last couple decades (see especially Berwick and Chomsky 2015) and has been forcefully advanced by Cedric Boeckx (see especially Boeckx 2010, 2014; Boeckx and Leivada 2013).Footnote 29
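The externalization-based analysis can be sketched schematically. The following toy is an assumption-laden illustration rather than a model of any actual proposal: the flat list encoding, the labels, and the `externalize` function are invented here, and complications such as the English auxiliary ‘did’ are ignored.

```python
# One shared underlying structure containing two copies of the wh-expression.
# Each element is (label, copy_position); copy_position is "high", "low", or None.
STRUCTURE = [("WH", "high"), ("SUBJ", None), ("VERB", None), ("WH", "low")]

def externalize(structure, pronounce_copy):
    """Return the pronounced labels, keeping only the chosen copy of the wh-expression."""
    return [label for label, position in structure
            if position is None or position == pronounce_copy]

wh_fronting = externalize(STRUCTURE, "high")   # English-like: higher copy pronounced
wh_in_situ = externalize(STRUCTURE, "low")     # Mandarin-like: lower copy pronounced
```

The single `STRUCTURE` is shared across both calls; only the parameter passed to the externalization step differs, which is the sense in which the variation is located outside the generative system itself.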

On such a picture, the language faculty is invariant across the human population. I-languages, states of the language faculty partially responsible for the acquisition and use of language, consist of a computational system capable of constructing complex representations out of simpler representations. The syntactic principles governing such construction are the same for all human language users, as are the conceptual/semantic principles governing interpretation. The differences between languages, which are phenomenologically so overwhelming, are reducible to differences in the ways in which these identical systems interact with extra-linguistic systems of production and to differences within the lexicon.Footnote 30 This approach to the study of language suggests significant deviation from our folk notion, in ways which substantially undermine the traditional arguments against identifying natural language and the language of thought.

As in the above discussion of the division between the grammar and the lexicon, it is worth stressing that this is not merely a terminological question about what we call ‘natural language’. The question is a substantive one: What are the components of the mind? What are the natural psychological kinds? The traditional explanation of linguistic variation proposes that there is a single psychological system responsible for both the similarities and differences between different natural languages. The more recent proposal denies this: similarities are accounted for by the species-universal language faculty, while differences are explained by divergent strategies for expressing linguistic structures. The former system is argued to be the distinctive feature of human psychology which makes human language possible, whereas the latter are more ancient systems, similar to those in many non-human animals, which have been co-opted for linguistic purposes. In this sense, it is an empirical discovery, not mere linguistic stipulation, that the species-universal language faculty is the natural phenomenon of natural language.

Of course, as with almost any scientific development, there will be some degree of linguistic decision in determining how to describe what has been discovered using everyday terminology. The purported ‘discovery’ that tomatoes are not fruit amounts to a decision to adopt a botanical taxonomy rather than a layperson’s culinary taxonomy. The reason that the term ‘fruit’ can be retained through this lexical shift is that there is a close-enough correspondence between the earlier and later uses of the term. Likewise, I believe that linguistic theory provides a picture of what language is, in the sense of the ability to utilize a system of symbols unlike that of any other animal, which serves as a suitable replacement for our previous folk notion. Of course, much of the folk notion will not be retained (e.g. publicity), but neither will much of previous scientific understandings of language (e.g. rules governing surface word order). A plausible scientific theory posits the existence of a species-invariant computational system, part of humans’ innate endowment. A reasonable linguistic proposal is that such a system would deserve the name ‘language’. IH is the combination of both of these.

If such proposals are correct, then apparent linguistic variation does not actually indicate different computational systems. If we identify this computational system with natural language, then there is no strictly linguistic variation. There is instead variation only in the way that language is externalized. This is expressed by Chomsky (1993, p. 50) when he claims that “[t]he ‘computational system’ of language that determines the forms and relations of linguistic expressions may indeed be invariant; in this sense, there is indeed only one human language.” Our folk notion of language, which individuates languages partially in terms of such externalization properties, thus misled us into individuating languages much more finely than naturalistic inquiry suggests. Yet another intuitive difference between thought and language turns out to be at least a complex empirical issue, on which the debate is far from settled.

As mentioned in the setup of the objection, objections from variation and acquisition largely stand or fall together. Large amounts of environment-specific variation suggest learning, whereas a species-universal syntax and semantics suggests innate guidance. In fact, one of the arguments in favor of the view of linguistic variation as located exclusively in the forms of externalization is that it is precisely these properties that are available to the learner. That is, surface word order, phonological properties, and certain aspects of the lexicon are largely made available in the primary linguistic data. Syntactic and semantic properties, on the other hand, are not strictly perceptible. Note that this view of linguistics insists on a strict demarcation between syntactic properties, like hierarchical constituency structure, and surface word order. The former are (alleged to be) species-universal, while the latter are extra-linguistic phenomena depending partially on the language faculty, but also on a variety of other psychological systems, especially ‘interface’ systems, and processes. It is only the latter that the learner can identify directly from the linguistic data, but due to movement and other phenomena which complicate the mapping from syntactic structures to utterances, these provide at best unreliable evidence for the former. Claiming that the underlying structures are known innately, and that all that must be learned is how these underlying structures are mapped onto externalized expressions, massively reduces the difficulty of the acquisition problem.Footnote 31 Keeping with our assumed identification between natural language and this computational system, this concludes the case that traditional arguments against identifying natural language with the language of thought fail on account of an outdated understanding of natural language. Natural language, on this conception, is both innate and invariant. As we think that the language of thought must share these properties, the prospect of an identification is at least still open.

I wish to briefly touch on a possible objection which is not in the class of traditional worries about IH: the objection from psycho/neurolinguistics. While still very little is known about the neurobiological processes involved in the acquisition and use of language, future developments may be crucial in evaluating IH. In particular, IH predicts a very close relationship between syntactic and semantic processing. If semantic processing involves identifying the thought associated with a complex expression, then IH predicts that semantic processing requires syntactic processing. ‘Syntax first’ models of processing, such as Friederici (2002), are thus highly compatible with such a picture. However, proposals involving ‘autonomous semantics’, semantic processing occurring independently of syntactic structure, would pose a very serious worry.Footnote 32 Baggio (2018), for example, proposes that interpretation is at least partially independent of syntax. One primary source of data in favor of such a position is the existence of agrammatical aphasiacs who are able to interpret grammatically complex sentences when the semantic content is predictable on the basis of the lexical items, but not when it is not:

(11) The apple that the boy is eating is red.

(12) The cat that the dog is biting is black.Footnote 33

In 11, but not 12, lexical meaning makes one assignment of arguments to their predicates highly plausible, and aphasiacs can utilize this information to interpret the sentence correctly. However, in 12, one needs to identify the grammatical/thematic relations between argument and predicate (i.e. that ‘the cat’ is the object of ‘bite’ and ‘the dog’ is the subject) in order to correctly identify the sentence’s meaning. Whereas boys typically eat apples, and not vice versa, it may be assumed that cats and dogs bite one another frequently enough to not provide a strong cue for interpretation independent of grammatical constraints. This dissociation between grammar and semantic competence is apparently at odds with the predictions of IH.

The difficulty with interpreting such phenomena is that agrammatical behavior does not guarantee that there are deficiencies in the internal grammatical system. In particular, it is at least possible that these failures to interpret complex grammatical structures are precisely due to failures to map external linguistic inputs onto internal grammatical structures. This interpretation would locate the failure as outside the strictly linguistic system, leaving the possibility that the thoughts of such subjects are expressed by linguistic structures a live option. Of course, it is an empirical question how such debates will be resolved. One motivation for viewing aphasia as a problem with performance (i.e. externalization) rather than competence is that production and comprehension can be dissociated, with one but not the other affected (see e.g. Friederici 1981). This suggests that it is the input/output systems which are damaged, as the core grammatical system is involved in both, and so damage to it should equally undermine production and comprehension.

The variety of neurolinguistic proposals in the literature suggests that there is little in the way of consensus here. While many parties accept a ‘dual stream’ model, according to which linguistic processing is divided into distinct processing routines in distinct neurological regions (the dorsal and ventral streams), there is much debate about which linguistic properties are processed in which streams. Mostly it is agreed that the dorsal stream is used for connecting sounds to action and motor control, but syntax and semantics have been argued to be processed together in the ventral stream (Hickok and Poeppel 2007), and to be interaction effects between both streams (Saur et al. 2008; Bornkessel et al. 2005). Both of these options are consistent with the claim that having a thought/interpreting a sentence requires being able to construct a grammatical structure. Given that such proposals do not treat syntax and semantics as processed independently (i.e. in parallel streams), they suggest that data like 11 and 12 ought be accounted for without positing strictly syntactic deficits. While I take these studies to be inconclusive on this issue, hopefully they point to a further area in which progress in answering the philosophical question about the relation between thought and language can be made by drawing on work in the sciences.

Another possible response to arguments of this sort could be given if we have reason to posit significant disparity between the strategies adopted by the parser and rules governing the grammar, as in Ferreira and Patson (2007) and Ferreira and Lowder (2016). If we posit two different kinds of structure-forming operations in language use, one process following the rules of the grammar and generating ‘deep’ structures, and another positing specialized and simplified heuristics of the parser and generating ‘shallow’ structures, we can account for the difference in 11 and 12 with reference to disruption only to the latter process. This would again involve the claim that aphasia is a performance, not a competence, phenomenon. Aphasiacs are unable to generate ‘shallow’ parses without the guidance of lexical semantic information and associations between concepts, and so cannot use these shallow trees as input to genuine sentence/thought generation, whereas they can do this with the shallow parses generated in response to sentences like 11 which provide the needed semantic clues.

Before finally moving on to the new problem with identifying thought and language, it is important to stress just how tentative all of this is. Positing genuinely syntactic differences between I-languages is still the most common approach to language variation in the linguistics literature, even within contemporary generativist work.Footnote 34 If such an approach is correct, some (but not all) of the arguments given above will be successful. If English and Mandarin differ in that the grammar of one, but not the other, mandates wh-movement, but the thoughts that speakers of these languages can have do not differ, then the language of thought cannot be natural language.Footnote 35

5 The new problem: the acceptable but ungrammatical

While I take the conception of what natural language is, qua target of theoretical linguistics, as developed over the past few decades to be more amenable to identification with the language of thought, the methodology of these sciences has moved significantly in the other direction. In particular, while traditional generative theories, especially those in the transformationalist paradigm, assumed a close correspondence between the structures output by the language faculty and utterances, the increased abstraction of contemporary theories has led to a widening gap between these two phenomena. Significant proportions of linguistic behavior are thus viewed as non-reflective of the underlying system, as resulting from the influence of a variety of external, non-linguistic systems.

There are two kinds of gap between the outputs of the grammar and produced utterances: grammaticality without acceptability, and acceptability without grammaticality. Utterances are acceptable when native speakers judge them to be natural. Expressions are grammatical, roughly, when they are generable by the language faculty. That these two are not the same has been one of the central assumptions guiding the methodology of generative linguistics. However, while this gap has been central to generative theory since its inception, the magnitude of the gap has increased significantly. In the early days, it was largely assumed, explicitly or implicitly, that acceptability tracked grammaticality modulo certain kinds of ‘deficiencies’. For example, Chomsky (1965) identifies “memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance” (p. 3) as sources of disparity between linguistic performance and competence.Footnote 36 Canonical examples of unacceptable but grammatical sentences include center-embeddings such as “the mouse the cat the dog chased caught squeaked”, which despite being formed by perfectly normal grammatical rules place too substantial a burden on parser memory to interpret. However, this fairly simple relationship between acceptability and grammaticality has little to motivate it. If extra-linguistic factors can serve to make grammatical sentences unacceptable, there is no reason why they should not also be able to make ungrammatical sentences acceptable.
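The first of these gaps, grammaticality without acceptability, can be illustrated with a toy sketch. It rests on loudly hypothetical assumptions: a ‘grammar’ that licenses center-embedding at any depth, and an ‘acceptability’ check that is nothing more than an arbitrary memory limit (`memory_limit=2`), standing in for whatever performance constraints are actually at work.

```python
def center_embed(nouns, verbs):
    """Build a center-embedded string: the N1 the N2 ... the Nk Vk ... V2 V1."""
    if len(nouns) != len(verbs):
        raise ValueError("each noun needs exactly one verb")
    return " ".join([f"the {noun}" for noun in nouns] + list(reversed(verbs)))

def is_grammatical(nouns, verbs):
    # Toy grammar: any depth is licensed so long as nouns and verbs pair up.
    return len(nouns) == len(verbs) and len(nouns) >= 1

def is_acceptable(nouns, verbs, memory_limit=2):
    # Hypothetical performance limit: too many pending nouns overload the parser.
    return is_grammatical(nouns, verbs) and len(nouns) <= memory_limit

nouns = ["mouse", "cat", "dog"]
verbs = ["squeaked", "caught", "chased"]   # verbs[i] is predicated of nouns[i]
sentence = center_embed(nouns, verbs)
```

With these inputs `sentence` is the center-embedded example quoted above, which the toy grammar licenses but the toy memory limit rejects. The sketch models only this one direction of the gap; nothing in it bears on whether ungrammatical strings can be acceptable.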

It was the methodology of early grammatical theory, not the theoretical claims themselves, that suggested such an asymmetry. Given the tools of transformational grammar, it was possible to account for just about any linguistic data, and so sentences taken to be acceptable could easily be ‘predicted’ by grammatical theories by introducing new transformational rules. However, as linguistic theory has developed, the constraints on appropriate theory-formation have become significantly stronger, especially in the contemporary Minimalist Program. This means that it is often better to exclude the observations from the purview of the theory, and denounce even acceptable sentences as ungrammatical, than to complexify the theory so as to account for them.Footnote 37 Trotzke et al. (2013) provides a very clear recent statement of this approach: “[C]ertain attested utterances are explained outside the grammar proper. This permits a much simpler grammar than would otherwise be possible...” (pp. 26–27).

One complication in all of this is that claiming that an utterance is grammatical is, strictly, a kind of category mistake. This is why I flagged that the above definition of grammaticality as generability by the language faculty is only roughly accurate (see also Sect. 6.4 for further difficulties with this account of grammaticality). Along with the notion of a natural language, the notion of grammaticality has itself undergone significant revision in the development of generative grammar. As stated above, grammaticality is the property of being generable by the language faculty. But the language faculty generates structured psychological representations, not publicly observable utterances. The operational notion of grammaticality, as applied to sentences, is something along the lines of: producible via a relatively transparent mapping from the syntactic structure to a linearized utterance. Exactly what this mapping is is a matter of much debate in contemporary phonosyntax. Kayne’s (1994) Linear Correspondence Axiom, which states that linear order is determined by asymmetrical C-command, is one famous proposal for such a mapping. Some such proposal is needed in order to make sense of the notion of the grammaticality of a sentence. In this way, the grammaticality of a sentence, as opposed to a syntactic structure, is a derived notion dependent on both syntactic and phonological rules.Footnote 38, Footnote 39

With this background out of the way, we can get to the problem with identifying natural language and the language of thought: ungrammatical but acceptable sentences. These are sentences which speakers are able to interpret but which are not licensed by the rules of grammar. The problem is obvious: if speakers can interpret these sentences, i.e. the sentences express an available thought, but they cannot be generated by the language faculty, this suggests that the sets of possible thoughts and possible natural language sentences are not even extensionally equivalent, let alone identical. Some thoughts are not expressible in our natural language, and so there must be some medium other than natural language in which they are expressible.

While the objection itself is relatively straightforward, identifying genuinely problematic examples is a little trickier, and turns on various empirical claims about the grammatical properties of the language faculty and the ways in which syntactic structures relate to utterances. Keeping in mind the distinction between the syntactic structure and the utterance form is crucial here, as the existence of a gap between linguistically licensed structures and possible thoughts is only demonstrated if the utterances in question do not correspond to (i.e. are not externalizations of) some underlying legitimate syntactic structure. And these expressions may indeed so correspond even if the way that they are pronounced introduces deviations from that predicted by the grammar. For example, as the above quote from Aspects makes clear, disfluencies may be viewed as creating a gap between competence and performance:

(13) I... umm, went to the, uh, to the shop.

While the inclusion of filler terms like ‘umm’ and ‘uh’ and repetitions are, of course, not reflected in the grammatical structure of this utterance, deviations from the grammatically predicted sentence of this sort pose no problems for the proposed identification of thought and language. The thought conveyed by such utterances seems to be perfectly captured by a grammatical product of the language faculty. The aspects of the utterance which seem unreflected by this psychological linguistic expression are likewise not found in the thought it expresses, and so there is no need to posit a gap between the natural language expression and the thought. This strategy, of isolating the source of ungrammaticality in extra-linguistic processes, will be one of the central ways of defending IH in the face of apparent counter-examples.

This account of disfluencies is similar to the above discussion of the difference between wh-raising and wh-in-situ languages. In both cases, it is claimed that one and the same underlying grammatical structure can be realized by multiple different externalizations. The difference is that in the account of different question-formation operations, the multiplicity of ways of externalizing is explained with reference to specific phono-syntactic rules governing which expressions get pronounced. In the account of disfluencies, however, the disparity is instead a product of much less well understood features of general, i.e. extra-linguistic, cognition and performance systems. We might, following the distinction between the language faculty in the narrow sense and in the broad sense (Hauser et al. 2002), distinguish between narrow and broad phonology, where the former refers to linguistic rules of pronunciation, while the latter picks out whatever psychological processes are involved in ‘translating’ a syntactic structure into a public symbol. In general, then, when apparent divergence between linguistic expressions can be attributed to phonological processes, broad or narrow, this does not pose a problem for IH.

Other examples of acceptable but ungrammatical sentences require a different treatment. In many cases, we can recognize that a sentence is ill-formed in some way, but nonetheless understand it. Agreement violations and certain kinds of subcategorization violations provide examples:

  1. 14.

    They is happy.

  2. 15.

    Him told me the time.

  3. 16.

    The child seems sleeping.

In all these cases, we can recognize that something has gone wrong (number agreement, case agreement, and subcategorization requirements, respectively), but we can nevertheless understand what is meant. Calling such expressions ‘acceptable’ is a stretch, given that they do sound quite wrong. However, the importance of ungrammatical but acceptable expressions, for our purposes, was that they could be interpreted. And even though they sound bad, these expressions clearly meet this criterion. Further, these examples do not seem suitable for a ‘phonological’ explanation, as given for the examples above. It doesn’t seem that in these cases there is a grammatically correct underlying structure which has been modified to produce these strange utterances. Instead it seems that the underlying structure itself is ungrammatical.Footnote 40

Do these examples, then, provide the requisite case demonstrating an extra-linguistic medium for thought? Probably not. Such examples are probably best accounted for by some psycholinguistic ‘repair’ strategy, which maps these ungrammatical sentences onto corresponding grammatical structures. The literature on such parsing ‘repair strategies’ is large, but it fairly consistently adopts the view that given the lexico-semantic information made available by word-recognition, the parser is largely able to reconstruct the meanings of expressions despite grammatical deviance.Footnote 41 Again, if such approaches are on the right track, these examples pose no problem for the identification of thought and language. Quite the reverse: the parser’s being required to produce a grammatical analog of these ungrammatical sentences, in order for the hearer to be able to interpret them, is exactly what would be predicted if the thought conveyed by a sentence was necessarily expressed in natural language.

The real problem for identifying thought and language can be seen by examining cases which seem to involve learning that grammatical constraints can be violated in certain circumstances. Remember that it is crucial in defending this position that languages are neither learned nor variable: that the language faculty, as opposed to the lexicon and principles of externalization, is innate and species universal. However, there do appear to be acceptable linguistic constructions which violate the rules of the grammar, and thus indicate that some thoughts are not expressible as natural language constructions. One paradigm case of such constructions is the adicity violation:

  1. 17.

    Ivan sneezed his tooth across the table.Footnote 42

  2. 18.

    Siobhan danced the night away.Footnote 43

The adicity, or argument structure, of verbs is one of the crucial ingredients in determining the grammaticality of structures containing them. That the adicity of an expression is part of the stored information attached to the term is a necessary part of explanations for a wide range of linguistic data. In particular, speaker intuitions about the unacceptability of adicity violations are highly robust. All competent speakers of English agree that “Ivan sneezed Siobhan” is unacceptable. This fact can be explained with reference to the shared knowledge that ‘sneeze’ is an intransitive verb, and thus cannot take a direct object. However, as Goldberg and Jackendoff, and others in the construction grammar tradition, argue, speakers are able to learn, via certain kinds of analogical processes, that there are constructions in which these verbs function differently. If the adicity of these expressions goes into determining which structures are generable by the language faculty, then the ability to interpret these constructions seems to require the ability to have thoughts that are not constructible within the confines of grammatical principles.Footnote 44

A similar kind of worry comes from apparently interpretable violations of syntactic constraints. Consider:

  1. 19.

    This is the department which employs a teacher who speaks every language.Footnote 45

To my ear, this sentence is ambiguous. On one reading, the identified department employs at least one amazingly gifted linguist capable of speaking every human language. On the other, it makes the much more modest, but still impressive, claim that each human language is spoken by at least some teacher in the department, although it may be that no teachers speak all of the languages. While this second reading may be a little unnatural, it is, I believe, available, and informal polling has supported this. This poses another problem for the claim that thoughts are expressed in natural language. This is because this reading seems to involve interpreting the quantified noun phrase ‘every language’ as taking wide scope over ‘a teacher’.
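The two readings can be made explicit in first-order notation. This is only an illustrative sketch: the predicate names and the constant \(d\) (standing for the department) are introduced here for convenience and are not part of the original example.

\[
\begin{aligned}
\text{(narrow scope)}\quad & \exists x\,[\text{teacher}(x) \wedge \text{employs}(d,x) \wedge \forall y\,(\text{language}(y) \rightarrow \text{speaks}(x,y))] \\
\text{(wide scope)}\quad & \forall y\,[\text{language}(y) \rightarrow \exists x\,(\text{teacher}(x) \wedge \text{employs}(d,x) \wedge \text{speaks}(x,y))]
\end{aligned}
\]

The first formula entails the second but not conversely, which matches the description of the second reading as the more modest claim.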

On standard grammatical assumptions, going back to May (1985), quantifier scope is determined by relations of C-command after raising the quantifier expressions to the left-periphery of the expression. In order to get this second reading, then, ‘every language’ must be attached to the root of the syntactic structure after ‘a teacher’ is raised. The problem is that there are principled reasons to deny that ‘every language’ can be raised at all. Going back to Ross (1967), relative clauses have been viewed as ‘islands’: expressions which prevent extraction. This ‘relative clause constraint’ explains why the following question is ungrammatical and thus unacceptable:

  1. 20.

    *Which language did the department employ a teacher who speaks?

The attempt to raise the wh-expression ‘which language’ from the relative clause ‘who speaks which language’ results in ungrammaticality and thus unacceptability due to this island constraint. But why then are we able to raise the quantifier expression out of this clause in order to get the reading of sentence 19 wherein ‘every language’ takes wide scope? It seems we must be able to grasp this reading despite the fact that the structure expressing it violates the constraints imposed by our grammar. Thus we can think a thought our natural language cannot represent.

An analogy could be made here to learning an artificial language, such as a formal logic. Such languages are not consistent with the principles of natural language grammar, and thus cannot be acquired in the normal way that we acquire a (first) language.Footnote 46 They seem instead to be acquired through the use of more general psychological tools of inference, memorization, and extrapolation. Likewise it seems that, alongside the development of our natural language, we can acquire, through these more general learning processes, a variety of additions and exceptions to the grammatical principles governing the construction of natural language expressions. If these additions genuinely increase the set of thoughts that one can have, then the language of thought is not reducible to natural language. I believe the cases just described present the best case against this proposed identification.

6 Solution strategies

While I take the problem of acceptable but ungrammatical expressions to be a serious barrier to identifying thought and language, there are strategies that can be used in the attempt to explain away these difficulties. The first two of these, phonological explanations and repair strategies, we saw in the previous section. These aim to show that, while the acceptable utterances may appear to violate the grammatical principles governing the language faculty, they may nonetheless be suitably related to an underlying legitimate structure. As IH says only that thoughts are expressed by the outputs of the language faculty, as long as our interpretation of these utterances is given by these underlying grammatical structures, such expressions do not pose a problem for this proposal. After elaborating on these strategies, I will turn to two more responses to the argument from acceptable but ungrammatical expressions. The first will involve claiming that the ‘thoughts’ grasped in interpreting these ungrammatical expressions may be quite unlike those grasped when we grasp grammatical thoughts, perhaps on analogy with the representational capacities of non-human animals. The second will draw a distinction between ways in which expressions can be deemed ungrammatical by the language faculty, and argue that only one of these ways poses a problem for IH. If all of the remaining apparent cases of acceptable but ungrammatical expressions are ungrammatical in this way, the hypothesis may be saved.

6.1 Complicate the morphophonology

As I have argued, the central development in linguistic theory which has made the revitalization of IH plausible is the distinction between the core processes of the computational linguistic system and the processes of externalization recruited to publicize the structured representations made available by this system. As several cases above made clear, this distinction is crucial in accounting for apparent linguistic diversity without committing to the claim that the underlying linguistic system, the natural language according to the proposal advocated in this paper, itself varies between speakers. The apparent difference between wh-in-situ and wh-movement languages can be accounted for with reference to different externalization strategies, and so the thesis that the syntactic/semantic properties of these languages are identical can be retained. If phonological processing of this sort is indeed peripheral, or subsidiary, to the core operations of language, this is an argument that English and Mandarin speakers really do speak the same ‘language’, in this technical sense. This thus undermines the argument against IH which claims that languages vary in ways that thoughts do not.

This raises the possibility that the examples of ungrammatical but acceptable expressions claimed to pose a problem for this identification can be handled in a similar way. On such a proposal, learned constructions ought be viewed as acquired conventions about phonology, not syntax. That is, sentences like 17 and 18 could be analyzed as learned ‘pronunciations’ of grammatically acceptable expressions, such as:

  1. 21.

    Ivan sneezed and thereby caused his tooth to move across the table.

  2. 22.

    Siobhan danced until the night was over.

If 17 and 18 have the same underlying structure as 21 and 22 respectively and differ only in the way this structure is mapped onto an externalized production, then their acceptability can be explained with reference to this structure, which is presumably generable by the language faculty. Thus the problem for IH disappears: no thought is available beyond those made so by the language faculty.

How plausible such a strategy is is a vexed empirical question, depending on complex issues concerning the relation between morphology, syntax, and phonology. Viewing surface forms as derived from apparently very different underlying structures is, however, a very familiar idea in these literatures, within the programs of lexical decomposition (e.g. Wierzbicka 1996; Jackendoff 1996), lexical semantics (e.g. Hale and Keyser 2002; Pustejovsky 1991) and distributed morphology (Halle and Marantz 1994).Footnote 47 Sentences like 21 and 22, as candidates for more transparent representations of the structures underlying 17 and 18, are in line with the general thrust of such approaches, which assume that it is only semantically “general” expressions such as ‘cause’ or ‘until’ which can be featured in the underlying representations without being pronounced.

Cross-linguistic work such as Dixon (2000) has shown that languages appear to vary in whether they allow causative constructions like 17 and 18, or whether the causality must be encoded with a causative morpheme. Japanese, for example, has a productive morphological rule for transforming a verb into its causative counterpart (e.g. ‘agaru’, ‘to go up’, becomes ‘ageru’, ‘to raise’, i.e. ‘to cause to go up’). That causality must be marked in surface structure in many languages provides some motivation for viewing causatives like 17 as simply the English strategy for externalizing what is, in its grammatical structure, akin to 21, and a similar cross-linguistic story could be told for resultative constructions like 18. Further work by Papafragou et al. (2002), building on discussions of Talmy (1988), has shown that while the surface properties of languages may vary in the ways they encode things like motion, this does not seem to influence other, non-linguistic, cognitive processes such as recall. This again is compatible with the theory here developed. If the differences between languages are restricted to mappings from thoughts to sensory-motor systems, we would predict that there would be no influence of ‘linguistic variation’ on other cognitive systems.

6.2 Repair

A closely related strategy is that of repair. Instead of viewing such anomalous (from the perspective of grammatical theory) expressions as opaque phonological mappings from underlying grammatical structures to externalized expressions, we can treat such utterances as genuinely ungrammatical (i.e. not products of normal phono-syntactic and phono-morphological rules applied to the products of the language faculty) but posit mechanisms by which they are ‘translated into’ grammatical expressions which can then be interpreted to give the contents of the thoughts.Footnote 48

Which structures such processes produce is another empirical question, but the structures underlying sentences like 21 and 22 again seem like plausible candidates. This strategy and the previous one may well shade into one another, depending on how parsing (and production) mechanisms relate to the posits of phonological and morphological theory. If these ‘performance systems’ utilize the rules of the latter theories, as argued for in Phillips (2004, 2013b) and Phillips et al. (2011), then there may be no difference: ‘repair’ would just amount to the application of such rules to public symbols so as to reproduce the grammatical structures from which they derive. If, on the other hand, parsing mechanisms utilize quite different strategies, perhaps the heuristics of Ferreira and Patson (2007), then there will be a clean divide between them.

While I believe these two strategies are plausibly the best bet for defenders of IH, there are serious obstacles to their application. The most serious is the overgeneration problem. Positing a repair strategy or phonological process by which apparently ungrammatical utterances can be mapped onto grammatical underlying structures is liable to overgenerate, and predict that expressions which are in fact unacceptable would be legitimized by these very processes. That is, one must ensure that any proposed strategy for effecting the mapping from acceptable but ungrammatical expressions to grammatical structures does not also suffice to map unacceptable expressions to grammatical structures.

For example, one possible strategy for ‘repairing’ adicity violations such as sentences 17 and 18 would be that rather than identifying the (usual) argument structure of the identified verb, and thus precluding the generation of a structure with the intended number of arguments, the parser first identifies the intended argument structure and creates a ‘verbal skeleton’, which has the right argument structure but with a dummy variable where the verb should be:

  1. 23.

    Ivan V his tooth across the table.

  2. 24.

    Siobhan V NP away.

These are perfectly normal grammatical structures (cf. “Ivan pushed his tooth across the table” and “Siobhan gave her money away”). The offending expressions, which can’t typically be found in structures of this sort, can then be late-inserted and coerced into taking on the transitive meanings intended.

The difficulty with this is that it is unclear how to prevent overgeneration, predicting that sentences which are in fact unacceptable could be salvaged in these ways.Footnote 49 For example, Jackendoff (1997) describes a wide range of constructions closely analogous to sentence 18 that seem semantically plausible but which are nonetheless unacceptable. Consider, for example:

  1. 25.

    *Siobhan danced the Tango the night away.

  2. 26.

    *Siobhan danced happily the night away.


  3. 27.

    *Ivan sneezed violently his tooth across the table.

In all these cases, we can understand what these sentences would mean, but they are clearly bad. The difficulty then is explaining why such a repair strategy cannot likewise be used to salvage these expressions. If ‘dance’ and ‘sneeze’ can be treated as transitive verbs, why is this impossible for ‘dance the Tango’, ‘dance happily’ and ‘sneeze violently’? Of course, there are things one can say about such constraints. It appears that these constructions allow this sort of coercion to apply only to verbal heads, not to VPs. This fact must itself be explained however: if pragmatic strategies allow for the mapping of ungrammatical expressions onto grammatical structures in the case of 17 and 18, why can similar processes not apply to 25–27?

It is worth noting that accounting for these examples in this way would not merely defend IH from this objection, but would provide positive support for it. In line with Hinzen’s arguments discussed above, if interpretation of an utterance requires that we ‘translate’ it, mapping it onto a grammatical expression, this reinforces the idea that there is a one-to-one mapping between possible thoughts and possible grammatical expressions, as predicted by IH. It is not always noted that the assumption that ungrammatical sentences must be repaired in order to be understood itself requires explanation and justification. If we can grasp thoughts which are not expressed by grammatical sentences, then we might expect some ungrammatical sentences to be understood ‘directly’, i.e. by mapping them onto the LOT (where this is assumed to be distinct from natural languages) without repairing them. If we discover that such repairs are indeed always required, this provides strong evidence that understanding a sentence, i.e. grasping the thought it expresses, simply is constructing a natural language expression.

Examples like 19 may present even more serious overgeneration worries. Any strategy which loosens the constraint on extraction from relative clauses so as to allow for the ambiguity of 19 must not thereby predict that 20 is acceptable. One could propose that some repair strategy enables us to loosen the locality constraints on quantifier raising (but crucially not on wh-movement). However, this proposal similarly overgenerates:

  1. 28.

    It was a man who told me that every philosopher loves Frege.

Despite featuring a cleft construction which appears to avoid the constraints on movement in the cases above, this sentence is not ambiguous. It cannot be read as claiming that every philosopher is such that a man (read specifically or non-specifically) told me that they loved Frege. That is, one cannot read the embedded ‘every philosopher’ as scoping out of the that-clause and over the focused matrix subject ‘a man’. Proponents of the repair strategy must not posit repair mechanisms for 19 which also predict scope ambiguities in 28.

6.3 Different kinds of thought

As Hinzen (2013, Sect. 7) points out, claiming that human thought occurs in natural language does not preclude the possibility that non-human (and thus non-linguistic) animals engage in some forms of ‘thought’. The crucial idea behind IH is that human thought, as expressed by the structures of natural language, forms a natural psychological kind. This is consistent with there being many other kinds of psychological representation. Indeed, it is clear that the representational formats of large parts of cognition, such as vision (see e.g. Palmer 1999) or map-like locational representation (e.g. Camp 2007) are quite unlike expressions of natural language. One option for defending IH, then, is allowing that we can interpret ungrammatical sentences, and that we do so without mapping them onto grammatical structures, but that this involves a quite different kind of cognition than that used when we understand grammatical sentences. On this proposal, identifying thought and language is a kind of ‘explication’ in something like the sense of Carnap (1962): natural language expressions are the vehicles for a substantial amount of what is pretheoretically called ‘thought’, the class of natural language expressions forms a natural kind, and this class includes many of the ‘core cases’ of our pretheoretic notion, and thus we are justified in revising our conception of thought in line with this hypothesis.

It is, I assume, an empirical possibility that the everyday notion of ‘thought’ does not pick out a uniform psychological phenomenon (beyond the heterogeneity usually assumed by this term in philosophical discussions, which apply it to distinct psychological kinds like belief, desire, intention, etc.). There are several options we could take in response to such a discovery. One would be eliminativist, eschewing talk of thoughts altogether. Of course this would preclude IH. The proposal just sketched, however, would instead select some subset of the things we antecedently viewed as thoughts, and treat that as the extension of our new, scientifically useful, concept. If, for example, sentence interpretation in general turned out to centrally involve the construction of a syntactic structure in line with the constraints of Universal Grammar, but in certain rare cases, when such a strategy was unavailable, interpreters resorted to the construction of a different kind of psychological structure, it may be defensible to hold onto IH by viewing only the former as the extension of the new, explicated, notion of ‘thought’. Of course, how plausible or appropriate this linguistic maneuver is will depend on how much of our traditional notion of ‘thought’, and in particular how many of its ‘core cases’, is covered by these linguistic structures. If non-linguistic interpretations are common and paradigmatic instances of thought, this explication will amount to little more than a stipulation of the truth of IH. If, however, exceptions are quirky and unusual, IH could be viewed as a genuine kind identity, and explication would thus be useful. Relatedly, if work on animal cognition suggests a close correspondence between animal and human cognition, this would pose a problem for such an explication of ‘thought’, as non-human animal thought, we are assuming, is not structured linguistically.

As above, this strategy is more plausible for some phenomena than others. In particular, it is often thought that non-linguistic representational media are very bad at expressing certain sorts of ‘logical’ content. Quantifiers, negation, disjunction, etc. seem difficult to express without language. This suggests that such an approach is unlikely to be of much use in handling the recalcitrant sentence 19.

6.4 Filters versus ungenerable expressions

The final possible response involves distinguishing between two ways in which an expression can be deemed ‘ungrammatical’. Some expressions are ungrammatical on account of not being generable at all by the language faculty, whereas others are ungrammatical in virtue of violating some constraint on what the outputs of the language faculty must be like. This distinction largely originated with Chomsky and Lasnik (1977), and was then incorporated as one of the main features of Government and Binding Theory (Chomsky 1981). Traditional GB theory included general constraints on what kinds of structures could be produced (centrally, X-bar Theory and the sole transformational rule ‘move \(\alpha \)’), as well as a collection of ‘filters’, which served to further limit the set of acceptable expressions by excluding those structures which, though generated in a perfectly legitimate way, had some illegitimate property. Perhaps the most famous example of the latter is the Case Filter, which states that all overt NPs must be assigned Case. This constraint on grammaticality is posited in order to account for, inter alia, cases like the following:

  1. 29.

    *(It) seems the child to be sleeping.

  2. 30.

    The child seems to be sleeping.

For various reasonsFootnote 50, it is important that grammatical theory not preclude the language faculty from generating structures like 29. However, it is clearly unacceptable. The Case Filter provides an explanation for this. Infinitive verbs (‘to be sleeping’) do not assign Case properties to their arguments. As ‘the child’ is the overt subject of this expression in 29, the Case Filter rules it out. This further explains why we find sentences like 30 in English. Because the embedded verb can’t assign Case to the NP, the NP must move to the higher, tensed, verb, which can. This movement occurs even though ‘the child’ is the semantic argument of ‘sleeping’, and not of ‘seems’. In 30, then, the overt NP is assigned Case, and so the filter is not violated and the sentence is grammatical, and thus acceptable.
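The logic of a filter, as opposed to a constraint on generation, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `NP` record, its `case_from` field, and the function name are hypothetical devices introduced here, not part of GB theory's formal apparatus. The point it shows is that the structure for 29 is built first and only then checked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NP:
    text: str
    overt: bool
    # Hypothetical label for whatever assigns Case to this NP; None if nothing does.
    case_from: Optional[str]


def case_filter_violations(nps: List[NP]) -> List[str]:
    """Return the overt NPs that lack Case; an empty list means the filter is passed."""
    return [np.text for np in nps if np.overt and np.case_from is None]


# Sentence 29: 'the child' remains the subject of the infinitive, which assigns no Case.
s29 = [NP("it", True, "tensed 'seems'"), NP("the child", True, None)]
# Sentence 30: 'the child' has moved to the subject position of tensed 'seems'.
s30 = [NP("the child", True, "tensed 'seems'")]

print(case_filter_violations(s29))  # ['the child']
print(case_filter_violations(s30))  # []
```

On this way of modelling things, the input to the filter already exists as a complete structure, so a violation marks an existing structure rather than preventing one from being built.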

The crucial point about this for our purposes is that, despite our ability to recognize sentence 29 as unacceptable, we do know what it means. This fact can be accounted for within the confines of IH if it is allowed that thoughts can be expressions of the language faculty, even if these expressions are marked as ungrammatical. On this view, we can have thoughts which are themselves ill formed, just as we can produce expressions which are ill formed. What is ruled out is having thoughts which the language faculty cannot even produce. We can thus distinguish two kinds of ungrammaticality. One kind involves the production of a full-fledged syntactic structure which is somehow ‘marked’ as ungrammatical.Footnote 51 The other involves a complete failure to even produce a structure. It seems that ungrammatical but acceptable sentences of the latter sort pose a deeper worry for the proposed identification of thought and language. If we can understand an utterance which cannot even be generated by the language faculty, it seems there must be some extra-linguistic medium which can serve as a vehicle for thought. But if the examples are ungrammatical in the former way, the language faculty will produce a vehicle for such thoughts, although it will indicate that this vehicle is in some way ill formed.

I have so far stated this response in the terms of GB theory. Doing so in the terms of the contemporary Minimalist Program is slightly more complicated. ‘Filters’ in this program have largely been replaced by ‘interface conditions’, demands imposed on the outputs of the language faculty which ensure that they are ‘legible’ by the semantic and phonological systems which are used to interpret and externalize the products of the language faculty. Standard Minimalist accounts of movement treat it as arising out of the need to remove uninterpretable features, i.e. properties of lexical items which result in failures at the interfaces. Case, for example, is viewed as an uninterpretable feature of arguments (DP or NP, depending on the theory) which gets deleted when the argument expression is in a local relation to a Tense expression. This will motivate movement when arguments originate within verb phrases (see fn. 38) which cannot perform this function of eliminating uninterpretable features.Footnote 52 Similar accounts are given of other agreement phenomena such as gender and number.Footnote 53

Whether this Minimalist re-interpretation of what were traditionally viewed as filters can re-instate the traditional distinction between two forms of ungrammaticality depends on how we interpret the claim that unchecked/undeleted features are uninterpretable by the interfaces. At face value, this would seem to undermine the proposal that such cases of ungrammaticality involve merely ‘marking’ these structures as deficient in some way. If they are literally uninterpretable, especially by the semantic interface, then it seems that we cannot rescue IH by saying that such sentences can be interpreted despite being ungrammatical. However, it is not clear that one should read this term so strongly.Footnote 54 Sentences like 29 are interpretable, in the everyday sense of the term, despite being recognized as ill formed. I thus suggest that we ought read ‘uninterpretability’ at the interfaces as exactly in line with the account of traditional filter violations given above: uninterpretable expressions can be assigned meanings, and so can serve as vehicles for thought, although they are marked as grammatically defective.Footnote 55

On the other side of this distinction between kinds of ungrammaticality, some expressions do seem to be genuinely ruled out by Minimalist accounts of the language faculty; not merely in that they are ‘uninterpretable’, in the sense just identified, but that they cannot even be constructed. Economy constraints, principles governing the workings of the language faculty which ensure that its operations are maximally computationally efficient, seem to operate in this way. The Subjacency Constraint, which provides a limit on the distance (defined structurally) an expression can be moved by a single operation, is motivated on the grounds that allowing the system to perform long-distance movement would create too substantial a computational cost.Footnote 56 Such a proposal is involved in explaining the difference between the following expressions:

  1. 31.

    What did [\(_{\hbox {IP}}\) Rahim claim [\(_{\hbox {CP}}\) that [\(_{\hbox {IP}}\) he read ]]]?

  2. 32.

    *What did [\(_{\hbox {IP}}\) Rahim believe [\(_{\hbox {NP}}\) the claim [\(_{\hbox {CP}}\) that [\(_{\hbox {IP}}\) he read]]]]?

If movement is restricted so that it can cross at most one ‘bounding node’ (in English, IP or NP)Footnote 57 at a time, we can explain the above pattern. In 31, there are two IPs that must be crossed, but the wh-expression can make this movement in two steps, each of which crosses only one. ‘What’ can first move to the specifier of the embedded CP, leaving an unpronounced intermediate copy there, and then to the sentence-initial position in which it is pronounced, and so no ‘long-distance’ movement is needed. However, in 32, while the wh-expression can legitimately move initially to the embedded CP, from there it must move to the sentence-initial position. But to do so would involve crossing two bounding nodes (the embedded NP and the matrix IP), and there is no intermediate ‘landing site’ which could be used to break up this journey. This thus explains the acceptability of 31 and the unacceptability of the otherwise quite similar 32.
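The counting involved can be made concrete with a small Python sketch. It rests on simplifying assumptions that go beyond the text: a sentence is reduced to the list of node labels dominating the gap, innermost first, and the specifier of any CP is treated as the only available intermediate landing site. The function and constant names are hypothetical.

```python
BOUNDING_NODES = {"IP", "NP"}  # bounding nodes for English, as stated in the text
LANDING_SITES = {"CP"}         # assumption: only Spec-CP serves as an escape hatch


def obeys_subjacency(path):
    """Check a movement path given as node labels dominating the gap, innermost first.

    Each step may cross at most one bounding node. Reaching a CP lets the moved
    expression land in its specifier, so the count of crossed nodes restarts there.
    """
    crossed = 0
    for label in path:
        if label in LANDING_SITES:
            crossed = 0  # land in the specifier, then start a new step
        elif label in BOUNDING_NODES:
            crossed += 1
            if crossed > 1:
                return False  # a single step would cross two bounding nodes
    return True


# Sentence 31: the gap sits inside [IP [CP [IP ...]]].
path_31 = ["IP", "CP", "IP"]
# Sentence 32: the gap sits inside [IP [NP [CP [IP ...]]]].
path_32 = ["IP", "CP", "NP", "IP"]

print(obeys_subjacency(path_31))  # True
print(obeys_subjacency(path_32))  # False
```

In the second case the failure arises after the embedded CP: the NP and the matrix IP are both crossed before any further landing site is reached.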

Violations of Subjacency, and of other constraints resulting from economy considerations, are ungrammatical in a stronger sense than violations of filters/interface conditions. It is a crucial part of the explanatory strategy of the Minimalist Program that such constraints prevent representations which would violate them from being constructed in the first place.Footnote 58 This is reflected in the more extreme unacceptability responses they generate. Whereas one can assign a meaning to 29 despite its obvious unacceptability, sentences like 32 are typically genuinely uninterpretable (in the non-technical sense). This fact provides further motivation for IH, as identifying thought and language enables us to explain the distribution of acceptability in ungrammatical sentences. Ungrammatical but acceptable sentences may involve the generation of structures which are subsequently marked as ungrammatical. Despite this marking, the fact that the structures are produced provides a possible vehicle for thought. However, when a sentence is ungrammatical in virtue of not even being generable, there is no vehicle for thought present and thus the sentence cannot be interpreted at all.

However, when sentences which are predicted to not even be generable are nonetheless interpretable, this poses a particularly deep problem for the proposed identification. The reading of sentence 19 where ‘every language’ takes wide scope appears to be of this sort. The constraint on extraction from relative clauses is typically analyzed as a Subjacency violation.Footnote 59 This suggests that the interpretability of expressions like this, under readings which violate the Subjacency Constraint, is seriously problematic for IH, as the thoughts they seem to convey are not even producible by the language faculty. On the other hand, sentence 19 is somewhat marginal, so we may not wish to place too heavy an argumentative burden on sentences of this sort, which would be good news for IH.

While I believe these strategies are reasonably exhaustive, I don’t wish to commit to the claim that there are no other options for handling acceptable but ungrammatical expressions from the perspective of IH. The aim of this section is just to stress the ways in which a defense of this proposal relies on the outcome of several ongoing debates in the linguistics literature. If IH is to be successfully defended, I believe the defense will need to combine, at a minimum, all of the strategies covered in this section.

7 Conclusion

In this paper I hope to have shown that the Identification Hypothesis is not as implausible as is often thought. In particular, once it is recognized that the relata in the purported identification relation are the language of thought and human I-language, many of the traditional problems with the view disappear. I also raised a novel obstacle to defending this proposal, acceptable but ungrammatical expressions, and pointed to several strategies for responding to worries of this sort. It is, at least, an open empirical question whether natural language, understood as the target of scientific linguistic theory, differs from thought in the ways such arguments assume.