Introduction

The term ‘Artificial Intelligence’ (AI) means many things to many people (see Table 1 for different types of meanings). Sometimes, the term ‘AI’ is used to refer to the idea that intelligence can be recreated in artificial systems (Russell & Norvig, 2010). Other times, it refers to an artificial system believed to implement some form of intelligence (i.e., ‘an AI’). Some claim that such an AI can only implement domain-specific intelligence. An example could be an AI playing chess, where there is a fixed problem space defined by a fixed board, a limited number of pieces, and a small set of well-defined rules. Such a domain-specific AI can play chess, but cannot do the dishes or perform medical diagnosis. Others believe that domain-general AIs—also known as artificial general intelligence (AGI)—can exist (Bubeck et al., 2023; cf. Birhane, 2021). This domain generality can be seen as a key property of human intelligence, so that AGI would be human-level AI, able to incorporate arbitrary beliefs to solve arbitrary new problems. A person can not only play a game of chess, but reason about why their opponent has angrily left the room, and later draw on that event when writing a novel. In the history of both cognitive science and AI, it is generally understood that this domain generality is what makes human-like cognition so hard to explain, model, and replicate computationally (Fodor, 2000; Pylyshyn, 1987; van Rooij et al., 2019).Footnote 1

The term ‘AI’ is also used to refer to the research and/or engineering field pursuing the creation of AI systems based on the idea that doing so is possible and desirable. AI as a research field is historically diverse (see Table 1), including AI-as-engineering approaches that focus on different domains of application and draw on different modeling and engineering frameworks, including not only the currently popular machine learning approach, but also approaches involving Bayesian Networks, decision support systems, and robotics. Hence, the ambition to build AGI is a subfield of the broader field of AI.

Among the more troublesome meanings of ‘AI’, perhaps, is as the ideology that it is desirable to replace humans (or, specifically, women) by artificial systems (Erscoi et al., 2023) and, generally, ‘AI’ as a way to advance capitalist, kyriarchalFootnote 2, authoritarian and/or white supremacist goals (Birhane & Guest, 2021; Crawford, 2021; Erscoi et al., 2023; Gebru & Torres, 2024; Kalluri, 2020; Spanton & Guest, 2022; Stark & Hutson, 2022; McQuillan, 2022). Contemporary guises of ‘AI’ as idea, system, or field are also sometimes known under the label ‘Machine Learning’ (ML), and a currently dominant view of AI advocates machine learning methods not just as a practical method for generating domain-specific artificial systems, but also as a royal road to AGI (Bubeck et al., 2023; DeepMind, 2023; OpenAI, 2023). Later in the paper, when we refer to AI-as-engineering, we specifically mean the project of trying to create an AGI system through a machine learning approach.

One meaning of ‘AI’ that seems often forgotten these days is one that played a crucial role in the birth of cognitive scienceFootnote 3 as an interdiscipline in the 1970s and ’80s. Back then, the term ‘AI’ was also used to refer to the aim of using computational tools to develop theories of natural cognition. As Simon (1983, p. 27) put it 40 years ago,

Artificial Intelligence has two goals. First, AI is directed toward getting computers to be smart and do smart things so that human beings don’t have to do them. And second, AI (sometimes called cognitive simulation, or information processing psychology) is also directed at using computers to simulate human beings, so that we can find out how humans work.

This view of ‘AI’ as a research field overlapping with psychology sees computational AI systems as theoretical tools: ‘Many early AI researchers were concerned with using computers to model the nature of people’s thinking’ (Langley, 2006).Footnote 4

Table 1 A non-comprehensive list of different (not mutually exclusive) meanings of the word AI, including AI as idea, AI as a type of system, AI as a field of study, and AI as institution(al unit)

Accordingly, AI is one of the cognitive sciences (Fig. 1), and for decades, there was a close dialogue between the fields of AI and cognitive psychology (Forbus, 2010; Gentner, 2010, 2019; Miller, 2003). This is furthermore illustrated by the use of ‘cognitive simulation’ and ‘information processing psychology’ as alternative labels for ‘AI,’ favoured by Simon and associates.Footnote 5 It is also illustrated by publications of cognitive psychological modelling research in Artificial Intelligence journals up to the ’90s (Anderson, 1984; Cooper et al., 1996; Thagard et al., 1990) and early 2000s (Thagard, 2007), with some notable exceptions still appearing these days. At the turn of the millennium, however, these productive ties between AI and psychology became severed:

the past 20 years have seen an increasing shift in AI research away from concerns with modelling human cognition and a decreasing familiarity with results from psychology. What began as a healthy balance [...] has gradually become a one-sided community that believes AI and psychology have little to offer each other. (Langley, 2006, p. 2)

AI qua information processing psychology was built on the idea that human cognition is, or can be scientifically understood as, a form of computation; this view is also known as (minimal) computationalismFootnote 6 (cf. Chalmers, 2011; Dietrich, 1994; Milkowski, 2013). Computationalism was seen to provide useful conceptual tools for cognitive science (Boden, 1988, 2008; Hardcastle, 1995, 1996; Johnson-Laird, 1988) as it affords explicit specification of hypothesized cognitive processes and reasoning through the implications of such hypotheses (e.g. with mathematical proofs or computer simulations). However, present-day AI hype and the popularity of AI as technology (Meredith Whittaker, Edward Ongweso Jr., & Sarah Myers West in Denvir et al., 2023; Larson, 2021; Timnit Gebru in Marx & Wickham, 2023; van Rooij, 2023) and AI as a money-maker (Crawford, 2021) seem to leave little room for AI as a theoretical tool for cognitive science. The reason is that BigTech currently dominates the narrative, with a focus on technological progress and impressive machine learning applications. In practice, we observe that this leads to other, more theoretical perspectives on AI being overlooked or even forgotten. Worse still, the products of present-day AI-as-engineering are sometimes believed to instantiate (parts of) minds. Besides the various psychological, social, cultural and political problems posed by this confusion (Bender et al., 2021; Birhane & van Dijk, 2020; Erscoi et al., 2023; J. Hughes, 2021; Larson, 2021; Thrall et al., 2018; Vallor, 2015; van der Gun & Guest, 2023; Wood, 1987), here we wish to focus on how this practice creates distorted and impoverished views of ourselves and deteriorates our theoretical understanding of cognition, rather than advancing and enhancing it.

In this paper, we wish to remedy the above situation in two steps. First, we set out to release the grip of the currently dominant view on AI (viz., AI-as-engineering aiming at a human-level AI system, Table 1). This practice has taken the theoretical possibility of explaining human cognition as a form of computation to imply the practical feasibility of realising human(-like or -level) cognition in factual computational systems, and frames this realisation as a short-term inevitability. We undercut these views and claims by presenting a mathematical proof of inherent intractability (formally, NP-hardness) of the task that these AI engineers set themselves. This intractability implies that any factual AI system created in the short run (say, within the next few decades or so) is so astronomically unlikely to be anything like a human mind, or even a coherent capacity that is part of that mind, that claims of ‘inevitability’ of AGI within the foreseeable future are revealed to be false and misleading. We realize that this implication may appear counterintuitive given everyday experiences and interactions with currently impressive AI systems, but we will explain why it is not. As we will carefully unpack later in the paper, it is a mistake to assume that AI systems’ performance is either currently human-level, or will simply continue to improve until the systems constitute human-level A(G)I. The problem is that—in line with our intractability result—the performance cannot scale up (see Box 1).

Fig. 1: A visual depiction of the connections between the cognitive sciences. Solid lines denote stronger interdisciplinary ties, and dashed lines denote weaker ones. This figure is derived from the original put forth by the Sloan Foundation in 1978 and reproduced from Fig. 4 in Pléh and Gurova (2013). Different versions of it over time have used ‘Artificial intelligence’ (as above) instead of ‘Computer Science’ and vice versa (cf. Miller, 2003)

Second, we propose a way to return to the idea of AI as a theoretical tool without falling into the trap of confusing our maps for the territory. The return must be such that we do not retrace the trajectory that led us where we are today. To halt the rerun of history, we think it is vital that the idea of cognition as computation—and therefore the in-principle possibility of realising and/or explaining cognition as a form of computation—is not mistaken for the practical feasibility of replicating human minds in machines (which we prove is not feasible). The reader may think that this position is contradictory, i.e. that computationalism would then be theoretically inert. But that is mistaken. Computationalism can provide both explanatory challenges for and computational constraints on cognitive explanations—using formal, conceptual, and mathematical analysis—and hence is theoretically informative.

Overview

Flying pigs are also possible in principle; possible in principle bakes no bread.

— Jerry Fodor (2005, p. 27)

The remainder of this paper will be an argument in ‘two acts’. In ACT 1: Releasing the Grip, we present a formalisation of the currently dominant approach to AI-as-engineering that claims that AGI is both inevitable and around the corner. We do this by introducing a thought experiment in which a fictive AI engineer, Dr. Ingenia, tries to construct an AGI under ideal conditions.Footnote 7 For instance, Dr. Ingenia has perfect data, sampled from the true distribution, and they also have access to any conceivable ML method—including presently popular ‘deep learning’ based on artificial neural networks (ANNs) and any possible future methods—to train an algorithm (“an AI”). We then present a formal proof that the problem that Dr. Ingenia sets out to solve is intractable (formally, NP-hard; i.e. possible in principle but provably infeasible; see Section “Ingenia Theorem”). We also unpack how and why our proof is reconcilable with the apparent success of AI-as-engineering and show that the approach is a theoretical dead-end for cognitive science. In “ACT 2: Reclaiming the AI Vertex”, we explain how the original enthusiasm for using computers to understand the mind reflected many genuine benefits of AI for cognitive science, but also a fatal mistake. We conclude with ways in which ‘AI’ can be reclaimed for theory-building in cognitive science without falling into historical and present-day traps.

ACT 1: Releasing the Grip

There is no doubt that [AI] is currently in the process of rapid hill-climbing. Every year, states of the art across many [AI] tasks are being improved[, but] the question is whether the hill we are climbing so rapidly is the right hill.

— Emily M. Bender and Alexander Koller (2020, p. 5191)

At present, the field of AI is in the grip of a dominant paradigm that pushes a narrative that AI technology is so massively successful that, if we keep progressing at our current pace, then AGI will inevitably arrive in the near future. Some multi-million-dollar AI companies have even gone so far as to claim that we are not only climbing a hill towards AGI, but even on the verge of creating AGI, whether we want to or not, if we proceed with current ML approaches to AI-as-engineering. These kinds of claims may be hard to ignore or counter if anything that is possible in principle also seems possible in practice. However, not everything that is possible in principle is possible in practice.

In this first Act, we reveal why claims of the inevitability of AGI walk on the quicksand of computational intractability. The main character on the stage is Dr. Ingenia, a fictive AI engineer, who is pursuing the kind of ML approach claimed to inevitably lead to AGI. By studying the engineering task that they have set themselves through a formal, mathematical lens, we are able to construct a proof of intractability. We then draw out its implications.

Formalising AI-by-Engineering

Give a rigorous, computationally detailed and plausible account of how learning can be done. Translation: Rigorous: theorems, please.

— Dana Angluin (1992, p. 351)

Fig. 2: A visual illustration of the hypothetical scenario and its formalisation: Dr. Ingenia has access (magically and at no cost) to any machine learning method \(\mathbb {M}\), present or future, and by repeatedly sampling data D from the distribution \(\mathcal {D}\), they can use whatever \(\mathbb {M}\) they like to create a program \(L_A\) that when implemented and run generates behaviours \(b = A(s)\) when prompted by different situations s. The goal is to generate with non-negligible probability \(\delta (n)\) an algorithm A that behaves (approximately) human-like, in the sense that A is non-negligibly (\(\epsilon (n)\)) better than chance at picking behaviours that are possible for s in \(\mathcal {D}\). Here, n is a measure of the situation complexity, i.e. the maximum length of strings (|s|) needed to encode the relevant information in the situations s

Imagine a fictive engineer, called Dr. Ingenia, who is trying to build human-like or -level AI (or AGI)Footnote 8 using the currently dominant approach of ‘machine learning’ (in short, ‘ML’) under highly idealised and optimal conditions (Fig. 2). Dr. Ingenia is able to repeatedly sample from a distribution \(\mathcal {D}\). Here, \(\mathcal {D}\) captures possible behaviours b that humans may display in different possible situations s. Dr. Ingenia can use the information gathered by repeatedly sampling situation-behaviour pairs from \(\mathcal {D}\) to try to build (or ‘train’) an algorithm. Their goal is to make an algorithm that behaves (approximately) like \(\mathcal {D}\). Informally, we can characterise the problem that Dr. Ingenia sets themselves, as follows.Footnote 9

AI-by-Learning (informal)

Given: A way of sampling from a distribution \(\mathcal {D}\).

Task: Find an algorithm A (i.e., ‘an AI’) that, when run for different possible situations as input, outputs behaviours that are human-like (i.e., approximately like \(\mathcal {D}\) for some meaning of ‘approximate’).
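To fix notation before the formal treatment, the sketch below renders this informal problem statement as a bare programming interface. This is purely illustrative and not part of the paper’s formalism; all names (Situation, Behaviour, sample_D, ai_by_learning) are hypothetical placeholders of our own.

```python
from typing import Callable, Tuple

# Hypothetical stand-ins: a situation s and a behaviour b, kept abstract here.
Situation = str
Behaviour = str

# Sampling access to the distribution D of situation-behaviour pairs.
SampleD = Callable[[], Tuple[Situation, Behaviour]]

# 'An AI': an algorithm A that maps situations to behaviours.
Algorithm = Callable[[Situation], Behaviour]


def ai_by_learning(sample_D: SampleD) -> Algorithm:
    """Return an algorithm A whose outputs are approximately like D.
    The paper's argument concerns how hard it is to fill in this body at all,
    not any particular way of filling it in."""
    raise NotImplementedError
```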

In our hypothetical scenario, we will be granting computationalism, i.e. we will not be challenging the assumption that cognition is a form of computation. Hence, Dr. Ingenia can safely assume that \(\mathcal {D}\) is generated by a computational process, i.e. either by the computational processes that define a single individual’s cognition or the cognition of a finite collection of human beings. This also means that there exists an algorithm A that can approximate the distribution, namely, the algorithm that generates \(\mathcal {D}\). But there may also be many more algorithms that deviate in some way from human cognition but whose behaviour is still sufficiently human-like.

Because our aim will be to assess the intrinsic hardness of Dr. Ingenia’s machine learning problem—independent of extraneous factors—we will be granting them several idealised conditions. For instance, in our hypothetical scenario, Dr. Ingenia will have access to perfect data for training their AI system. The data have no measurement error, nor are the data contaminated by irrelevant details or sampling bias or any other distortion on the data that make it indirect or imperfect. Note that the real-world situation is far removed from this idealized scenario. In contemporary ML, the predominant protocol is to train ML algorithms on data which is typically decontextualized and scraped from the internet (Birhane, et al., 2021, 2023; Lee et al., 2023; Liesenfeld et al., 2023, July). Moreover, real AI engineers do not have perfect knowledge of which cognizer contributed which data point, of which exact situation induced which behaviour, or of how behaviour is to be parsed, (cf. ‘the segmentation problem’; Adolfi et al., 2022; Adolfi et al., 2023) or interpreted (cf. ‘theory-ladeness of data’; Andrews, 2023; Guest & Martin, 2021; Hughes, 1997.)

By granting Dr. Ingenia these highly idealised conditions that simplify and abstract away from real-world complications, we can formally derive a reliable lower-bound on the real-world complexity of constructing human-like AI from human data. To do so, we need to make the problem AI-by-Learning formally precise such that it becomes amenable to computational complexity analysis (Arora & Barak, 2009; Garey & Johnson, 1979; van Rooij et al., 2019). We will do this next.

We assume that Dr. Ingenia expresses candidate algorithms A using a specification language, \(\mathcal {L}_\mathcal {A}\). Any particular algorithm \(A \in \mathcal {A}\) can be described with a program \(L_A \in \mathcal {L}_\mathcal {A}\). The specification language \(\mathcal {L}_\mathcal {A}\) can be thought of as a programming language with the constraint that it specifies only those algorithms in a class \({\mathcal {A}}\) that the engineer assumes is suitable for designing human(-like or -level) AI. For instance, \(\mathcal {A}\) could be the class of artificial neural networks (ANNs) or any other class of algorithms that Dr. Ingenia deems sufficient to approximate human cognition.

We will add some minimal constraints on \(\mathcal {L}_\mathcal {A}\). We exclude classes of trivial algorithms that have no chance of capturing human-like or -level cognition. Specifically, we impose the constraint that \(\mathcal {L}_\mathcal {A}\) can minimally express feedforward neural networks, logical circuits, or some class of algorithms equivalent to finite state machines. Scientifically, we think that a stronger assumption would be warranted, namely that \(\mathcal {L}_\mathcal {A}\) is in principle expressive enough to be Turing complete, e.g. \(\mathcal {L}_\mathcal {A}\) can express Turing machines (Turing, 1950) or otherwise Turing-equivalent algorithms, including certain (highly idealised) recurrent neural networks (Siegelmann & Sontag, 1992; Pérez et al., 2019; Weiss et al., 2018). This stronger assumption would ensure that \(\mathcal {L}_\mathcal {A}\) is expressive enough to computationally capture human cognition. It would be a reasonable assumption because cognition is generally assumed to have two properties, known as productivity (i.e. people can in principle generate infinitely many distinct thoughts, sentences, images, etc.; Fodor, 2005; Fodor & Pylyshyn, 1988) and Turing-completeness or -equivalence (the ability to compute any computable function, aided by pen and paper, in principle; Turing, 1950; Wells, 1998). We nonetheless work with the more modest assumption.

We will furthermore allow (but not impose) the assumption that all A expressible by \(\mathcal {L}_\mathcal {A}\) are computationally tractable (i.e. can be run on any situation s in polynomial time, \(O(n^c)\), where n is some measure of the input size (|s|) and c is a constant). This constraint ensures that any intractability results we may derive for Dr. Ingenia’s AI-by-Learning problem are not an artefact of the time-complexity of running the algorithm A itself. Moreover, it ensures that even if human cognitive computations are all tractable (cf. ‘the tractable cognition thesis’, van Rooij, 2008; see also Frixione, 2001), our intractability results for AI-by-Learning would still hold.

We assume in our formalisation that whenever Dr. Ingenia tries to solve an instance of AI-by-Learning and searches for a program \(L_A\) that encodes a human(-like or -level) A, there is a given upper bound K on the size of the program that they can in principle encode (i.e. \(|L_A| \le K\)). One can think of K as expressing the total amount of space (computer memory) that Dr. Ingenia has available to store a program. AI models these days can be very large (e.g. an ANN may have tens of millions of parameters; Krizhevsky et al., 2012), and our formalisation allows for this. Nonetheless, such models are still bounded by some size K. Dr. Ingenia can buy more space, in which case they will have a new \(K' > K\) to work with. AI engineers who claim to be able to create AGI with ML are (implicitly) assuming that their approach can work for larger and larger \(K'\).

We formalise the distribution \(\mathcal {D}\) as follows. A dataset D drawn from \(\mathcal {D}\) consists of a list of situation-behaviour pairs, a.k.a. ‘samples’: \((s_1,b_1), (s_2,b_2), (s_3,b_3),..., (s_{|D|},b_{|D|})\). Without loss of generality, we model this structure with binary strings (e.g. \(s = 101010101000001\) and \(b = 010101111\)). For any given input distribution \(\mathcal {D}_n\), there is an upper bound n on the description length of situations. In other words, the set of situations for such an instance of the problem is defined as \(S = \{0,1\}^n\). For each situation \(s \in S\), there are some appropriate (i.e. human-like) behaviours \(B_s \subsetneq B = \{0,1\}^m\), for some fixed m.
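As a toy illustration of this encoding (with tiny, arbitrarily chosen n and m, and an invented assignment of appropriate behaviours \(B_s\)), one could write:

```python
import random

n, m = 4, 3                                          # toy sizes; real situations need far larger n
S = [format(i, f"0{n}b") for i in range(2 ** n)]     # all situations, S = {0,1}^n
B = [format(i, f"0{m}b") for i in range(2 ** m)]     # all behaviours, B = {0,1}^m

# Invented for illustration: each situation admits two 'appropriate' behaviours B_s.
B_s = {s: set(random.sample(B, 2)) for s in S}

def sample_D():
    """One toy sample (s, b): a situation paired with an appropriate behaviour."""
    s = random.choice(S)
    return s, random.choice(sorted(B_s[s]))

print(sample_D())   # e.g. ('1010', '011')
```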

Lastly, we formalise the notion of ‘approximate’ in Dr. Ingenia’s AI-by-Learning problem. Recall that we are trying to estimate a lower bound on the complexity of the real-world problem of creating human-like or -level AI by ML and therefore give Dr. Ingenia ‘easier’ conditions than may apply in the real world. To this end, we will be setting an extremely low bar for what counts as ‘approximate’. First, we do not expect a guarantee that Dr. Ingenia succeeds, but merely that Dr. Ingenia succeeds with non-negligible probability (denoted \(\delta (n)\), where n is a measure of the size/complexity of situations; see previous paragraph). Second, the found algorithm A need not display a high degree of human-likeness, but merely needs to behave human-like with a probability that is non-negligibly higher than chance level. Specifically, in our formalisation of AI-by-Learning, we will make the simplifying assumption that there is a finite set of possible behavioursFootnote 10 and that for each situation s there is a fixed number of behaviours \(B_s\) that humans may display in situation s. Then, \(\nicefrac {|B_s|}{|B|}\) expresses chance level, and \(\nicefrac {|B_s|}{|B|} + \epsilon (n)\) expresses ‘non-negligibly better than chance’.

Given the above considerations, we can now state a formalised version of AI-by-Learning:

AI-by-Learning (formal)

Given: An integer K and a way of sampling from a distribution \(\mathcal {D}\).

Task: Find a description \(L_A \in \mathcal {L}_{\mathcal {A}}\), with length \(|L_A| \le K\), of an algorithm \(A \in \mathcal {A}\) that with probability \(\ge \delta (n)\), taken over the randomness in the sampling, satisfies:

$$\begin{aligned} \Pr _{s \sim \mathcal {D}_n} \left[ A(s) \in B_s \right] \ge \nicefrac {|B_s|}{|B|} + \epsilon (n). \end{aligned}$$

Here, \(\delta (n)\) and \(\epsilon (n)\) are arbitrary non-negligible functions. A function f is non-negligible if there is some d such that for sufficiently large n, \(f(n) \ge {1}/{n^d}\).
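Read operationally, the success criterion says that a candidate A clears the bar if its estimated hit rate on freshly sampled situations exceeds chance plus \(\epsilon (n)\). The minimal Monte Carlo sketch below assumes, for illustration only, that the chance level \(\nicefrac {|B_s|}{|B|}\) is the same known constant for every s and that an oracle for membership in \(B_s\) is available (which a real engineer would not have).

```python
def clears_bar(A, sample_D, is_appropriate, chance_level, eps, trials=10_000):
    """Estimate Pr_{s ~ D_n}[A(s) in B_s] by fresh sampling and compare it
    against chance_level + eps. Illustrative only."""
    hits = 0
    for _ in range(trials):
        s, _ = sample_D()            # only the sampled situation s is used here
        if is_appropriate(s, A(s)):  # hypothetical membership oracle for B_s
            hits += 1
    return hits / trials >= chance_level + eps
```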

Ingenia Theorem

[E]ven with a whole row of the largest imaginable computers to help, all the potential distributional potentialities of a whole national language cannot possibly be found in any finite time[.]

— Margaret Masterman (1965, p. iv-19)

In this section, we present a formal proof that AI-by-Learning is intractable. For this, we use the following decision problem called Perfect-vs-Chance.Footnote 11 Perfect-vs-Chance is a somewhat unnatural problem, with no direct real-world analogue and no special connection to human cognition. It is used in our proof that AI-by-Learning is intractable (Theorem 2). The proof technique that we use comes from theoretical computer science (see Appendix for full details) and has the following logical form: If AI-by-Learning were tractable, then Perfect-vs-Chance would be tractable too. Combined with a previously known result that Perfect-vs-Chance is intractable (Theorem 1 below), this yields a contradiction. Figure 3 illustrates the idea.

The Perfect-vs-Chance problem is formally defined as follows:

Perfect-vs-Chance (decision problem)

Given: A way to sample a given distribution \(\mathcal {D}\) over \(\{0,1\}^n \times \{0,1\}\), an integer k, and the promise that one of the following two cases apply:

  1. There is an efficient program M of size at most k such that \(\Pr _{(x,y)\sim \mathcal {D}} [M(x) = y] = 1\)

  2. For any program M of size at most k, \(\Pr _{(x,y)\sim \mathcal {D}} [M(x) = y] \le 1/2 + 1/2^{n^{1-\delta }}\)

where \(0< \delta < 1\) is an arbitrary constant.

Question: Is (1) or (2) the case?

In the decision problem Perfect-vs-Chance, the expression \(1/2^{n^{1-\delta }}\) can informally be read as ‘negligible probability’, since the denominator grows very fast. Hence, informally, the two cases listed in the Perfect-vs-Chance problem are (1) there exists a perfect and efficient program or, otherwise, (2) there exists no program that can work better than chance, even inefficiently.
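To get a feel for how far apart the two cases are, one can compare the ‘negligible’ advantage \(1/2^{n^{1-\delta }}\) from case (2) with a non-negligible function such as \(1/n^d\). The numerical sketch below uses arbitrarily chosen \(\delta = 0.5\) and \(d = 2\); for sufficiently large n, the former drops below any inverse polynomial.

```python
delta, d = 0.5, 2   # arbitrary illustrative choices

for n in (400, 900, 1600, 2500):
    negligible = 1 / 2 ** (n ** (1 - delta))   # 1/2^(n^(1-delta)), case (2)'s advantage over chance
    non_negligible = 1 / n ** d                # 1/n^d, a typical non-negligible function
    print(f"n={n:5d}   negligible ~ {negligible:.2e}   non-negligible = {non_negligible:.2e}")
```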

Fig. 3: A visual illustration of the core proof idea for Theorem 2 (Ingenia Theorem): It is known that Perfect-vs-Chance is an intractable decision problem. If a tractable method \(\mathbb {M}\) for solving AI-by-Learning existed, then we could use \(\mathbb {M}\) to solve Perfect-vs-Chance tractably, by plugging it into a sampling-plus-decision procedure that is itself tractable, easy to construct, and together with \(\mathbb {M}\) would provably solve Perfect-vs-Chance. This yields a contradiction. Therefore, we can conclude that AI-by-Learning is intractable as well. See the main text for more information and the Appendix for full proof details

Note that in the Perfect-vs-Chance problem it is promised, before trying to solve the problem, that for the given distribution either case (1) or (2) holds. One may intuit that, since (1) and (2) are extreme cases that are very clearly distinct, telling these two cases apart should be easy to do. However, it is not easy. In fact, it is provably intractable to find out in which of the two cases one finds oneself. This follows from a proof by Hirahara (2022).

Theorem 1

(Hirahara, 2022) Perfect-vs-Chance is intractable.Footnote 12, Footnote 13

We will use Theorem 1 to prove that AI-by-Learning is intractable, too. Specifically, we will show that if it were possible to tractably solve AI-by-Learning, then it would also be possible to tractably solve Perfect-vs-Chance. Since Perfect-vs-Chance is known to be intractable (Theorem 1), by modus tollens,Footnote 14 it follows that AI-by-Learning must be intractable as well. In other words, the proof will be by contradiction.

Theorem 2

(Ingenia Theorem) AI-by-Learning is intractable.

Proof

(sketch) For full details of the proof, we refer the reader to the Appendix. In this proof sketch, we present the main proof idea. See Fig. 3 for an illustration.

We prove by contradiction. Suppose that there exists a learning mechanism \(\mathbb {M}\) that solves AI-by-Learning in polynomial time. We will show that then there exists a polynomial-time bounded-error probabilistic algorithm that solves Perfect-vs-Chance, contradicting intractability of Perfect-vs-Chance.

The algorithm for Perfect-vs-Chance that we construct works as follows. Take an arbitrary instance for Perfect-vs-Chance, consisting of integers n and k and a distribution \(\mathcal {D}\) over \(\{0,1\}^n \times \{0,1\}\). The algorithm will use a subroutine that simulates the learning mechanism \(\mathbb {M}\), where \(K = k\) and where the data that the mechanism \(\mathbb {M}\) has sampling access to is given by the distribution \(\mathcal {D}\). In this simulation, the set of situations is \(S = \{0,1\}^n\) and the set of behaviours is \(B = \{0,1\}\). The simulation will yield an algorithm A that might or might not perform well on freshly sampled situations.

After running the subroutine that simulates \(\mathbb {M}\), the algorithm will evaluate the quality of the resulting learned algorithm A by using additional samples from \(\mathcal {D}\) and counting the number of situations s in which the algorithm A returns an appropriate behaviour \(b \in B_s\).Footnote 15 As this evaluation only involves executing A and counting, it can be done in polynomial time.

The algorithm runs the simulation subroutine several times, and for each run of the subroutine, it evaluates the resulting learned algorithm A. From all of these runs, it picks the algorithm \(A^{\star }\) that achieved the highest accuracy in the quality evaluation.Footnote 16 Based on how well the most accurate learned algorithm \(A^{\star }\) performs, the algorithm will give an answer for the input of Perfect-vs-Chance. If \(A^{\star }\) performs non-negligibly better than chance, then the algorithm will answer Yes, and otherwise, the algorithm will answer No.

By having the algorithm run the simulation a large enough (yet polynomial) number of times, and testing the output of each simulation a large enough (yet polynomial) number of times with new samples, we can ensure that the algorithm outputs a correct answer for Perfect-vs-Chance with high probability. A careful probabilistic analysis shows that because the learning mechanism \(\mathbb {M}\) performs non-negligibly better than chance on AI-by-Learning, the resulting algorithm for Perfect-vs-Chance gives the correct answer with high probability (see Appendix for full details). This contradicts the known intractability of Perfect-vs-Chance, hereby concluding the proof that AI-by-Learning is intractable.
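The reduction in this proof sketch can be rendered schematically as follows. This is our own illustrative paraphrase, not the Appendix construction: the number of runs, the number of evaluation samples, and the decision threshold are left as hypothetical parameters, and the probabilistic analysis that justifies concrete choices for them is exactly what the Appendix supplies.

```python
def decide_perfect_vs_chance(sample_D, simulate_M, evaluate, num_runs, threshold):
    """Schematic use of a (hypothetical) tractable AI-by-Learning solver M
    to answer Perfect-vs-Chance.
      simulate_M(sample_D) -> a learned algorithm A
      evaluate(A, sample_D) -> estimated accuracy of A on fresh samples
    Returns True for case (1) ('a perfect efficient program exists')
    and False for case (2) ('nothing beats chance non-negligibly')."""
    candidates = [simulate_M(sample_D) for _ in range(num_runs)]   # several runs of the learner
    scores = [evaluate(A, sample_D) for A in candidates]           # test each learned A on fresh samples
    best_score = max(scores)                                       # keep the best performer, A*
    return best_score >= threshold   # non-negligibly better than chance -> answer case (1)
```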


Implications

It is desirable to guard against the possibility of exaggerated ideas that might arise as to the powers of [AI]. In considering any new subject, there is frequently a tendency [...] to overrate what we find to be already interesting or remarkable[.]

— Augusta Ada King, Countess of Lovelace (personal correspondence, July 1843; Toole et al., 1998, p. 186)

In the previous section, we presented a proof that Dr. Ingenia set themselves a machine learning problem for which no tractable method exists or can exist. The ‘intractability’ means that, even if the problem may be practically solvable for trivially simple situations (small n), any attempts to scale up to situations of real-world, human-level complexityFootnote 17 (medium to large n) will necessarily consume an astronomical amount of resources (such as time and number of samples; see Box 1 for an illustration).
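Box 1 is not reproduced here, but a back-of-the-envelope calculation conveys the flavour. Assume, purely for illustration, that the required resources grow like \(2^n\) in the situation complexity n (the kind of exponential scaling that intractability results are standardly taken to indicate), and grant an optimistic machine performing \(10^{12}\) operations per second:

```python
ops_per_second = 10 ** 12                # an optimistic machine: a trillion operations per second
seconds_per_year = 60 * 60 * 24 * 365

for n in (40, 60, 80, 100):
    ops = 2 ** n                         # illustrative exponential resource demand
    years = ops / ops_per_second / seconds_per_year
    print(f"n = {n:3d}: about {years:.1e} years")
```

Already at n = 100, i.e. situations described by a mere hundred bits, the illustrative demand runs to tens of billions of years.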

The proven intractability holds for the highly simplified and idealised model of AI-by-Learning that grants Dr. Ingenia much better conditions than apply to their real-world counterparts. As explained in Section “Formalising AI-by-Engineering”, the intractability result holds even if Dr. Ingenia (a) can sample randomly and unbiasedly, if they so wish; (b) has data which are noiseless, without error, and uncontaminated; (c) is free to use any means or methods for producing the AI (this can include present and future machine learning approaches, but is not limited to them); (d) is only required to produce with a probability slightly higher than chance an AI that matches human behaviour slightly better than chance; (e) is guaranteed that there exists an algorithm that meets those low-bar requirements. Idealizations (a)–(e) make clear that the computational complexity of Dr. Ingenia’s problem is a gross underestimation of the true complexity of the much more messy real-world AI-by-Learning problem.

Because the complexity-theoretic result is a proof, the claim that we are presently on a fast track to inevitably produce human-like and -level AI involves a logical contradiction. Let us unpack why this is so: While there are many claims that might reasonably be based on intuitions, any claim of the inevitability of producing any desired object (or event, or state of affairs) requires a tractable procedure as a precondition. To see this, note that, without loss of generality, we can cast the problem of building human-like AI as that of searching for such an object in some space of possibilities. To claim that the search for this object is bound to succeed in practice is to minimally claim that one has a tractable procedure for conducting the search that provably finds the object if it exists (i.e. one has a way of performing the search in a realistic amount of time). That is, to support the inevitability claim one would have to put forth a set of arguments, logically and mathematically sound, to prove not only that such a tractable procedure can exist, but also that one has it. Theorem 2 (the Ingenia Theorem) shows that this is impossible.

Given the Ingenia Theorem, how should we interpret what is happening in practice? In practice, AIs are being continuously produced which are claimed to be human-like or human-level AI, or at least to be inevitably on a path leading there. Any AIs produced in practice, however, are produced either by tractable procedures or by cutting short a procedure that would otherwise run for an unfeasible amount of time. Hence, the produced AIs necessarily fail to solve the intractable learning problem. Concretely, this means that they make lots of errors—deviating substantially from human behaviour (e.g. Bowers et al., 2022)—and fail to meet the low standard set in the Ingenia Theorem (see also the list of simplifications and idealisations (a)–(e) above). These errors cannot be kept small, and no matter how impressive the produced AIs may appear, they fail to capture the distribution of human behaviour even approximately.

We realise that the implications that we have drawn out from our complexity-theoretic results may appear to contradict both intuition and experiences with existing AIs. However, the pattern of observations is entirely consistent with and predictable from the Ingenia Theorem. Many AIs do seem to have truly impressive human likeness and may even sometimes fool one into thinking that they have agency or are sentient.Footnote 18 Moreover, the field of AI-as-engineering has a habit of interpreting (or selling) ‘better than the state of the art’ as ‘good accuracy’, but our results imply that no matter how much ‘better’ AI gets, it will be off by light-years, wrong in exponentially many situations. This kind of self-fooling is possible in part because:

The prioritization of performance values is so entrenched in the field that generic success terms, such as ‘success’, ‘progress’, or ‘improvement’ are used as synonyms for performance and accuracy [...] However, models are not simply ‘well-performing’ or ‘accurate’ in the abstract but always in relation to and as quantified by some metric on some dataset (Birhane et al., 2022).

However, the Ingenia Theorem implies that if one were to test these AIs rigorously and unbiasedly for human-likeness, it would quickly become evident that they behave qualitatively differently from humans. That is, if you think your AI is very human-like, then you are not testing it critically enough (cf. Bowers et al., 2023).

Unsurprisingly given our theoretical results, we see exactly this play out in practice: AIs appear human-like in non-rigorous tests, but the likeness is debunked when more rigorous tests are made (e.g. Adolfi et al., 2023; Dentella et al., 2023). For instance, claims of abilities emerging with the scaling up of models are often revealed to be trivial products of the researcher’s choice of metric (Schaeffer et al., 2023). This back and forth between claims of human-likeness and debunking (cf. Mitchell, 2021) will keep happening if the field does not realise that AI-by-Learning is intractable, and hence any model produced in the short run is but a ‘decoy’.

This is especially troubling since more and more people are taking AI systems to be candidate models of human cognition (Frank, 2023a, b; Hardy et al., 2023; Mahowald et al., 2023; Tuckute et al., 2023) or even as replacements for humans. For instance, AI systems are seen as a replacement for participants in psychological experiments (Dillion et al., 2023; but see also Crockett & Messeri, 2023; Harding et al., 2023) or as a replacement for workers (Eloundou et al., 2023; Rose, 2023; Semuels, 2020). By now, though, it is clear that this is only possible at the cost of an exponential increase of hidden, poorly paid and poorly treated workers (McCarty Carino & Shin, 2023; Roberts et al., 2023). Such replacements are a clear case of ‘map-territory confusion’, and with a poor map at that. This may seem to make sense if one believes that the AIs approximate human behaviour (though even then it is not a sufficient condition, Guest & Martin, 2023), but as we explained above, the AIs do not actually approximate human behaviour. By nevertheless taking the AIs as cognitive models, we—as a field—distort our view of cognition and weaken our cognitive science theoretically.

This argument applies not only to AIs mistaken for models of (all of human) cognition, but also to AIs mistaken for models of substantive cognitive capacities, like language, problem-solving, reasoning, analogizing, or perception (Cummins, 2000; van Rooij & Baggio, 2021). This is because these substantive cognitive capacities are already human-level in the sense of being domain general and able to operate on a set of situations which is in principle unbounded. Therefore, these substantive cognitive capacities also cannot be modelled through trivial computations or with a fixed problem space and instead require computational models that meet the minimal computational assumptions as specified in Section “Formalising AI-by-Engineering”, and in fact even the stronger assumptions discussed there. Additionally, no piecemeal or divide-and-conquer approach to making models of human-level cognition (or these substantive capacities) can be tractable, as that would lead to a contradiction: Assume it were possible to tractably make approximate models of subcapacities, such that one could make piecemeal models of human cognition. Then, one would not be able to put them back together tractably in order to account for all of human cognition (or substantive capacities), because if one were able to, then one would have a tractable procedure for modelling all of cognition, which is an intractable problem (see also Rich et al., 2021).

ACT 2: Reclaiming the AI Vertex

Computers as such are in principle less crucial for cognitive science than computational concepts are.

— Margaret A. Boden (2008, p. 14)

Based on our analysis, we reject the view and associated project that we term ‘makeism’. Makeism has not been characterized before, but we argue that its adoption is the reason why, historically, the use of AI to understand cognition fizzled out and why it will do so again if we do not change our present course. See Box 2 for a definition of makeism; in other words, all makeists think that building cognition is sufficient for being able to explain it (b), and some think that this building is also necessary (c). The necessity claim especially reveals the implicit assumption that it is possible to build cognition (a). While some things can be understood by making them, this will not work for human-like or -level cognition, if only because such cognition cannot plausibly be (re)made through engineering (i.e. (a) is false; see the Ingenia Theorem).

At this point, the reader may wonder: if we indeed cannot (re)make cognition—or coherent substantive parts of cognition—computationally, then is AI theoretically useless for cognitive science? No: computationalism can be theoretically productive even if makeing is futile. In this section, we explain how the notion of computation can help to challenge and constrain, and thereby inform, theories of cognition in a way that steers clear of makeism. This will also allow us to reclaim ‘AI’ as one of the cognitive sciences, i.e. one of the vertices in the hexagon (Fig. 1). In our opinion, this is vital for retaining (or restoring) cognitive science’s theoretical health and preventing (further) distortions of our understanding of human cognition.

What Not to Reclaim

As we explained in the Introduction, AI was initially conceived as a theoretical tool for cognitive science, and an active segment of work in cognitive science was understood as being part of AI. AI as a field originally included the use of computational models to study the human mind (variously referred to as cognitive simulation (Lehnert, 1977), information processing psychology (Newell, 1970; Simon, 1983), computational psychology (Boden, 1988, 2008), or theoretical psychology (Hünefeldt & Brunetti, 2004; Longuet-Higgins, 1982; Newell, 1970)).


At present, this perspective on AI is largely forgotten. By looking at the history of those conceptions of AI and why they fell out of favour, we can see both the attraction of AI as cognitive science and the problems with the original vision, which we do not want to reclaim.

First, let us consider makeism part (a) (see Box 2). As noted, the makeist project only makes sense if part (a) is really true, i.e. if cognition can be programmed into a computer. This seems to have been taken for granted by many, for example by Simon, who wrote:

It is not my aim to surprise or shock you. [...] But the simplest way I can summarize is to say that there are now in the world machines that think, that learn and create. Moreover, their ability to do these things is going to increase rapidly until–in a visible future–the range of problems they can handle will be coextensive with the range to which the human mind has been applied.

Herbert Simon on The General Problem Solver in 1957, as quoted in Norvig (1992, p. 109)

Here we see the forerunner of the current idea that computationalism implies the practical realizability of thinking machines/simulations on par with human-level or -like cognition. This idea is expressed clearly in Feigenbaum and Feldman’s (1963) early characterization of AI (as quoted in Meinhart, 1966, emphasis added):

Researchers in the (artificial intelligence) field hold to the working hypothesis that human thinking is wholly information-processing activity within the human nervous system; that ultimately, these information processes are perfectly explicable; that the road to this explication lies in observation, experimentation, analysis, modelling, model validation, et cetera; and that digital computers, being general information processing devices, can be programmed to carry out any and all of the information processes thus explicated.

The last item on the list makes explicit the assumption that computationalism implies the practical realizability of thinking machines/simulations on par with human-level or -like cognition. And since it suddenly seemed possible to re-create aspects of cognition using computers, many early cognitive scientists enthusiastically began trying to do so. For example, consider representative comments from the late 70s, from a book by Lehnert (1977).

If experiments cannot be designed to isolate the variable factors of a proposed theory, the [experimental] psychologist can go no further. Problems concerning human cognitive processes are difficult to study within the paradigm of experimental psychology for precisely this reason. [...] What experiment can be designed to help us understand how people are able to answer simple questions like “What’s your name?” [...] Natural language processing can be productively studied within the artificial intelligence paradigm. If we construct a process model designed to account for a particular language task [...], then we can write a computer program to implement that model. By running that program, we can see where the model is weak, where it breaks down, and where it appears competent. [...] The interesting failures are those that occur because the process model underlying the program failed to recognize some critical problem or failed to handle some problem adequately (Lehnert, 1977, pp. 40-41).

If our reading of Lehnert is correct, this expresses, or at least encourages, makeism. Specifically, it reflects parts (a) and (b), with the idea that interesting parts of cognition can be simulated in computers and that we will gain understanding in this way. Furthermore, Lehnert leaves the door open to makeism part (c); it is unclear whether she herself thinks that simulating natural language is necessary for explaining it, but one could draw that conclusion.

The problem, of course, is that (a) is false, and without it, makeism (b) and (c) no longer reflect a promising strategy for cognitive science, but rather a research program doomed to fail. Indeed, in light of our demonstration that the task of creating cognition in computers is unfeasible, it is not surprising that among early researchers pursuing this project, enthusiasm waned and many people moved on.Footnote 19

As AI technology has exploded, makeism is enjoying a renaissance. However, for cognitive science, the engineering approach worsens theoretical understanding because any artefacts we could make in the short run would be gross distortions or ‘decoys’ at best. As Neisser wrote, ‘[t]he view that machines will think as man [sic] does reveals misunderstanding of the nature of human thought’ (Neisser, 1963, p. 193). Sixty years later, the risk of being misled by decoys is even greater, but we can better demonstrate the problem through the complexity theoretical arguments laid out in the previous Section (“ACT 1: Releasing the Grip”).

To abandon all of the tools and concepts that AI provided, however, is to throw the baby out with the bath water.Footnote 20 Hence, we want to reclaim much of the early conception of AI as a part of cognitive science, but without encouraging makeism. If we reconsider Lehnert’s argument for using AI, we see many correct, important insights:

  1. That experimental psychology is limited in its ability to study cognition (hence the creation of cognitive science as an interdisciplinary field).

  2. That cognition can be productively studied within the AI paradigm.

  3. That models’ problems are theoretically informative.

These insights are all worth reclaiming; the community just made one seemingly small inferential misstep that has caused a lot of problems. Makeism is not a forced move. Hunt pointed this out early on, arguing that

Computer programming seems to be a more appropriate tool for studying the broad implications of a proposal for how one should think than for realizing a testable model of how one does think (Hunt, 1968, p. 160).

Computationalism without makeism is still theoretically fruitful. We explain how next.

Theory Without Makeing

[P]rinciples from computer science and engineering can be, if done carefully, imported into how we carve [...] nature at its joints.

— Olivia Guest and Andrea E. Martin, (2023, p. 221)

How may computationalism help cognitive science advance if not through makeism? Core to the non-makeist enterprise is the realisation that computationalism primarily aids cognitive science by providing conceptual and formal tools for theory development and for carefully assessing whether something is computationally possible or not, in principle and in practice. This paper is itself an example; nowhere in this paper did we (try to) make a computational replica of cognitive capacities. Yet, we were able to use a computationalist framework to make substantial steps in reclaiming AI for cognitive science. The remainder of this section will give further examples of how AI as theoretical psychology or computational cognitive science can be pursued productively and soundly.Footnote 21

As a disclaimer, we note that research in cognitive science often cannot be cleanly divided into makeist and non-makeist. We have not seen makeism clearly distinguished from computationalism in the literature before (see Box 2), and so cognitive scientists will generally not have thought about it explicitly, let alone clarified the nature of their work. Hence, when we cite papers as examples, we wish to highlight the non-makeist readings of some of the arguments made, but without implying that there are no problematic traces of makeism in the original texts.

Levels of Explanation

Formalisms and concepts from computer science allow us to conceptually distinguish between cognitive processes (algorithms), the capacities they realise (the problems that they solve), and their physical implementations (chemical, biological, interactive, etc.). This conceptual distinction, also often referred to as Marr’s levels (Marr, 1982), is theoretically productive, especially when pursued in a non-makeist fashion.

The distinction between levels is conceptually useful in general, but also brings specific benefits when we want to formalise and reason about our theories, as we explain next.

Capacities as Problems

Within the levels-framework, the approach known as computational-level modelling has a strong tradition in cognitive science (with debates on its proper interpretation continuing to this day; Blokpoel, 2018; Cooper & Peebles, 2018; Peebles & Cooper, 2015). This approach allows us to conceptually engineer cognitive capacities as ‘computational problems’ and to model them formally (see e.g. Blokpoel & van Rooij, 2021; van Rooij & Baggio, 2021; van Rooij & Blokpoel, 2020) without needing to commit to specific assumptions at the algorithmic or implementation levels (other than computability and tractability; more in the next subsection). This is especially useful since—as argued throughout this paper—we do not know how to computationally realise substantive cognitive capacities such as human-level perception, reasoning, memory, categorisation, decision-making, problem-solving, language, analogising, communication, learning, and planning. Yet, as cognitive scientists, we do want to make progress in developing a theoretical understanding of these capacities.

Computational modelling of capacities can help us to make our assumptions precise and explicit, and to draw out their consequences, without the need to simulate the postulated computations (though simulations have their uses; more on that next). For instance, with formal computational-level models and mathematical proof techniques at hand, one can critically assess claims of explanatory adequacy (Blokpoel & van Rooij, 2021; Egan, 2017; van Rooij & Baggio, 2021), claims of intractability (Adolfi, et al., 2023), claims of tractability (Kwisthout & van Rooij, 2020; van Rooij et al., 2008), claims of competing theories (Blokpoel & van Rooij, 2021), claims of evolvability (Rich et al., 2020; Woensdregt et al., 2021), and claims of approximability (Kwisthout & van Rooij, 2013; Kwisthout et al., 2011).

Algorithms and Simulations

Similarly, algorithmic- and implementation-level models can be postulated and critically assessed using computational tools. While this can sometimes be done analytically, more often computer simulations prove useful for these types of (complex and dynamic) models. Using computer simulations, for example, one can assess claims about possible functioning under network damage (Guest et al., 2020), claims of explanatory scope and adequacy (van de Braak et al., 2021; Adolfi et al., 2023), claims of approximation (Blokpoel & van Rooij, 2021, Chapter 8), claims of ruling out possible so-called neural codes (Guest & Love, 2017), and claims of mechanistic possibilities (Bartlett et al., 2023; ten Oever & Martin, 2021).

Importantly, this use of simulations is to be distinguished from makeist uses of simulation that confuse the models (explanans) for the thing modelled (explanandum) and/or take the simulation results to directly imply something about ‘how things work’ in the real world (e.g. for real-world brains, cognition, or behaviour). Instead, non-makeist computer simulations are theoretical tools that can demonstrate proof of concept or demonstrate the in-principle (im)possibility of phenomena arising from the theorised constructs and hypothesised mechanisms. Computer simulations support and extend a scientist’s thinking capacity and enable computerised ‘thought experiments’ (Cooper, 2005) to reason through ‘what ifs’ and answer questions like ‘how possibly’.Footnote 22 These simulations—as indeed any models that the cognitive scientist could use—are necessarily abstract and idealised; this is unproblematic, though, as long as the scientist recognises it and takes care to draw only those inferences which are really warranted by the model.

Underdetermination

A general theoretical property that follows from computationalism is that cognitive capacities are multiply realisable, in several ways (Chirimuuta, 2021; Guest & Martin, 2023; Hardcastle, 1995, 1996). van Rooij and Baggio (2021) use a sorting problem as a simple illustration. Sorting can be done by bubble sort, insert sort, or any of a whole host of distinct sorting algorithms (Knuth, 1968) which in turn can be physically realised by brains, computers, water pipes, or even distributed over people (see, e.g. Fig.  1 in van Rooij & Blokpoel, 2020; and Box 1 in van Rooij & Baggio, 2021). This shows how, first, one and the same problem can be computed by different algorithms, and second, one and the same algorithm can be physically realised in different ways. This implies that we are dealing with massive underdetermination of theory by data: i.e., if we observe behaviours consistent with a computational-level theory, we cannot infer which algorithms or neural processes underlie the behaviour.
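As a minimal illustration of the first kind of multiple realisability, the sketch below shows two textbook algorithms that compute exactly the same sorting function while proceeding quite differently step by step (the code is our own illustration, not taken from the cited sources):

```python
def bubble_sort(xs):
    """One algorithm for the sorting problem: repeatedly swap adjacent out-of-order pairs."""
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def insertion_sort(xs):
    """A different algorithm for the same problem: insert each element into a sorted prefix."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

data = [3, 1, 4, 1, 5, 9, 2, 6]
assert bubble_sort(data) == insertion_sort(data) == sorted(data)   # same function, different processes
```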

The computational lens helps us to appreciate the degree of underdetermination we face. In standard experimental cognitive psychology, often two or a handful of different theories are compared and tested empirically ‘against each other’. But the principle of computational multiple realisability shows that for any given behaviour, there may be \(|\mathcal {A}| \times |\mathcal {I}|\) many possible algorithmic-implementational theories. This means that some inferential practices in computational cognitive (neuro)science are highly problematic (Guest & Martin, 2023). Moreover, computational-level theories are also themselves underdetermined by data, because any finite set of observations is also consistent with infinitely many functions (capacities).Footnote 23 This means that all computational-level theories are—and remain—conjectural. Nonetheless, we can do some things to evaluate and adjudicate between them.

Computational Realisability

Underdetermined computational-level theories can be constrained by the computationalist requirement that the problems and processes they postulate must be computationally realisable—by the cognitive system under study, not by scientists—both in principle and in practice. In-principle realisability is also known as computability; a problem is computable if there can exist at least one algorithm for computing it. In-practice realisability is also known as tractability; a problem is tractable if there can exist at least one tractableFootnote 24 algorithm for computing it.

Given that computational-level theories often formalise capacities as problems (or equivalently functions) while remaining agnostic about how these problems are computed, they can on occasion postulate problems that are uncomputable or intractable; in fact, this happens regularly. This provides the opportunity to critically reflect on the theory and, if possible, to find a minimal revision that renders the theory minimally computable and tractable while preserving the core intuitions and motivations behind the theory. This process can yield new knowledge, ideas, and research trajectories for cognitive scientists (cf. Adolfi et al., 2023). For example, it may yield new theoretical interpretations or predictions that can be used to further assess the explanatory and empirical adequacy of the (revised) theories. Alternatively, if a theory cannot be successfully revised, this can be a sign that it is time to question its initial motivation and to go back to the drawing board. The process thereby allows us to sculpt otherwise underdetermined theories so as to learn more about how cognition could or could not work (Blokpoel, 2018). This is no magic bullet, however; underdetermination of theory by data cannot be eliminated, and any ways of dealing with it will remain necessarily incomplete (cf. Adolfi et al., 2023; Devezer, 2024; Rich et al., 2021).

Slow (Computational Cognitive) Science

All of this may seem excruciatingly slow compared to the apparent speed of progress in today’s machine learning approaches to AI. But to genuinely make progress, we need to go this slowly, and in fact, we cannot go any faster (Adolfi & van Rooij, 2023; Rich et al., 2021). There is just no way to proceduralise or automate either the creation of minds or the explanation of minds. The way to make progress is through the meticulous development of theoretical ideas, informed by formal and computational modelling, drawing out limitations and consequences, and building solid knowledge along the way. In other words, what we advocate is more theoretical thinking (see also Guest, 2024; Guest & Martin, 2021; van Rooij & Baggio, 2020) and less (unthinking) machine learning, and less confusion between machine learning and theory (cf. Andrews, 2023). Then, AI can be a useful theoretical tool for cognitive science and regain its rightful place in the interdisciplinary hexagon.

Conclusion

The thesis of computationalism implies that it is possible in principle to understand human cognition as a form of computation. However, this does not imply that it is possible in practice to computationally (re)make cognition. In this paper, we have shown that (re)making human-like or human-level minds is computationally intractable (even under highly idealised conditions). Despite the current hype surrounding ‘impending’ AGI, this practical infeasibility actually fits very well with what we observe (for example, running out of quality training data and the non-human-like performance of AI systems when tested rigorously).

Many societal problems surrounding AI have received thorough treatment elsewhere. Our focus here has been on a different—but not unrelated—problem, namely that AI-as-engineering has been trespassing into cognitive science, with some people drawing overly hasty inferences from engineered AI systems to human cognition. This is a problem because any such system created now or in the near future is a mere decoy when our goal is to understand human cognition, and treating it as a substitute for human cognition for scientific purposes will only confuse and mislead us.

Early cognitive scientists rightly recognised the tremendous potential of AI as a theoretical tool, but due to widespread, implicit makeist elements, AI and cognitive science became increasingly dissociated over time. Now, interest in AI among cognitive scientists is enjoying a renaissance—but the interest seems to be in the wrong type of AI, namely AI-as-engineering, which distorts our understanding of cognition and cognitive science. Accordingly, the time is apt to reclaim AI-as-theoretical-psychology as a rightful part of cognitive science. As we have argued, this involves embracing all the valuable tools that computationalism provides, but without (explicitly or implicitly) falling into the trap of thinking that we can or should try to engineer human(-like or -level) cognition in practice.