1 Summary

Intelligence as Managing Limited Resources

The failure of GOFAI created a vacancy for a new guiding philosophy for AI. As it happened, a new perspective was already waiting in the wings: having previously been repeatedly rejected by peer review, Rodney Brooks’s behavior-based approach to robotics [37] had privately gained traction, with seminal works [35, 36] setting out the philosophy and practice of the ‘Physical Grounding Hypothesis’ [35]. Soon, the notion that ‘intelligence requires a body’ was common parlance in the AI community. In retrospect, this ostensible embrace of embodiment was rather more superficial than one would have hoped: Brooks subsequently observed [34] that in certain areas of AI research, “hardly a whiff of the new approaches can be smelled”. Relatively soon thereafter, nascent enthusiasm for deep learning turned attention away from the intrinsically ‘in vivo’ Brooksian approach towards the typically more forgiving ‘in silico’ environments, in which interaction with real-world objects was replaced with the simpler (but seemingly hard-to-generalize) task of classifying pictures of them.

As happened with the move from analog cybernetic feedback to digital loss functions (Sect. 7.2), we claim that an essential guiding property was thereby lost, to the detriment of AI philosophy and practice. We have argued throughout that notions of ‘universal intelligence’ [194] and associated AI architectures such as AIXI [153] have no economic utility, since they are not in any way grounded in the finiteness of resources. A system which mechanistically searches vast spaces,Footnote 1 oblivious to the ticking clock of its environment and the mortal concerns of its users, is clearly the antithesis of intelligence. On the contrary, we have emphasized that the primary drivers for an intelligent system must, by design, be the asynchronous dictates of the user, continual changes in environment, and inevitable variations of resources; an intelligent system must respond gracefully to each of these in a timely manner.

‘Work on Command’ Manifests General Intelligence

Although Brooks emphasized that embodiment in real-world environments was essential, related work was predominantly characterized by being reactive, and therefore omits the kind of high-order cognition that we have argued is vital for sample efficiency. In contrast, the SCL approach reconciles deliberative planning with asynchronous and open-ended environments, whilst still preserving the anytime requirement of ‘work on command’. In common with many cognitive scientists, we consider the ability to leverage and extend knowledge across domains to be synonymous with general intelligence [84, 148, 189]. We therefore claim that any meaningful notion of generality must also be grounded in a purpose-centric perspective. Just as Brooks insisted that ‘in vivo’ experimentation in noisy and complex real-world settings is the proper experimental framing, we claim that the essential pragmatically meaningful setting for domain generalization is that of ‘work on command’. In this setting, efficient knowledge transfer is then a prerequisite for the ability to complete related tasks in a timely manner. In Sect. 11.2.2 below, we consider the prospect that it may be possible to learn some ‘universal’ building blocks of compositional knowledge representation.

Understanding as Representation Manipulability

The SCL formulation effectively considers ‘understanding’ to be synonymous with ‘representation manipulability’ [364]. Formally, one can say that an agent \(A\) understands phenomenon \(F\) in context \(C\) iff \(A\) possesses a representation \(R\) of \(F\) to which \(A\) could make local modifications, within constraints specified by \(C\), thereby producing a representation \(R'\) of \(F\) that enables inferences or manipulations of \(F\) towards goals specified by \(C\).

What is therefore required is that representations \(R\) be ‘sufficiently malleable’ to act as a basis for construction, common usage, and novelty (in order of depth of understanding required). This malleability is a Gestalt property: metaphorically, modifications must preserve the ‘essential nature’ [148] of the representation, perturbing it only in ways that describe a possible world.Footnote 2 As per the discussion in Sect. 9.2, the production of \(R'\) in this manner is thus synonymous with our notion of the functorial transformation of hypotheses. In this sense, SCL provides an exemplar for the research program proposed by Bundy and McNeill in 2006 [40]:

Reasoning systems must be able to develop, evolve, and repair their underlying representations as well as reason with them. The world changes too fast and too radically to rely on humans to patch the representations.

It is also illustrative to contrast this notion of ‘understanding’ with the weaker notion of ‘robustness’ in machine learning. Robustness is typically interpreted as invariance (e.g. to translation) and/or noise tolerance, with these together proposed as a guard against adversarial inputs such as ‘single pixel noise’ [9]. Robustness interpreted as translation invariance would have a classifier say a table is a table even when it is upside down, whereas we understand that the essence of ‘tableness’ is that it ‘affords support’, which an inverted table does not. Robustness in the sense of noise tolerance would have a classifier say a car without wheels is still a car, whereas humans immediately understand that it no longer affords forward motion (q.v. [338]).

Representational issues have long been debated in AI, but—as with many related issues such as the symbol grounding and frame problems—many ostensible problems can instead be considered artifacts of this kind of ‘overactive reification’. This is exacerbated in current practice by the supervised learning preoccupation with ‘noun-centric’ classification, in which objects are simply assigned a nominal category that is disconnected from the context of end usage. As per Wilkenfeld [364], we therefore consider ‘understanding’ to be a process: contingent on the situated relationship between system and environment, and cashed out via demonstrations of ability on ‘analogous tasks’.

2 Research Topics

2.1 Choice of Expression Language

The power of reflective reasoning in SCL is determined by the choice of expression language. By virtue of inductive construction, the interpretation of recursive expressions described in Sect. 9.2 is guaranteed to terminate [334]. Naturally, if one elects to use an expression language with arbitrarily expressive primitives, the ability to reason about them is formally bounded by Gödel’s incompleteness theorems [115, 319].Footnote 3 In practice, the entities being reasoned about are grounded manipulations of the environment, rather than abstract formal proof objects, hence it is anticipated that relatively simple expression languages and learned denotational interpretations will suffice.

In accordance with the Curry–Howard isomorphism [176], a constrained subregion of the state space can alternatively be considered as a type T, with the constraints reflectively represented as predicates denoting the invariant properties of that type. The instantiation of an object of type T is therefore synonymous with a sequence of state space transformations, the result of which lies within a subregion of the state space defined by T. In programming language terms, there is a range of options for trading expressiveness of compositional properties against decidability and/or efficient synthesis procedures [263]. For example, the current notion of ‘differentiable programming’ can be represented via an expression language of ‘higher-order functions which return locally-linear functions’ [77, 78]. It has also recently been proposed [321] that the category \(\mathbf {Poly}\) (of so-called polynomial functors a.k.a. dependent lenses) is particularly well-suited for representing both context-sensitive interaction and semantic closure.
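
By way of illustration, the following sketch (in Python, with hypothetical names such as RegionType and instantiate that are not part of any SCL implementation) renders the above reading of a type: the type is a bundle of invariant predicates over the state space, and constructing an inhabitant amounts to applying a sequence of state-space transformations and checking that the result satisfies those invariants.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

State = Tuple[float, float]             # a toy two-dimensional state space
Predicate = Callable[[State], bool]     # an invariant property defining the type
Transform = Callable[[State], State]    # a state-space transformation

@dataclass
class RegionType:
    """A 'type' T: a constrained subregion of the state space given by invariants."""
    name: str
    invariants: List[Predicate] = field(default_factory=list)

    def contains(self, s: State) -> bool:
        return all(p(s) for p in self.invariants)

def instantiate(start: State, steps: List[Transform], T: RegionType) -> State:
    """Instantiating an object of type T: apply transformations, then check the invariants."""
    s = start
    for f in steps:
        s = f(s)
    if not T.contains(s):
        raise TypeError(f"state {s} does not inhabit type {T.name}")
    return s

# Example: the unit square as a type, inhabited via a scaling transformation.
UnitSquare = RegionType("UnitSquare",
                        [lambda s: 0.0 <= s[0] <= 1.0,
                         lambda s: 0.0 <= s[1] <= 1.0])
print(instantiate((4.0, 8.0), [lambda s: (s[0] / 10.0, s[1] / 10.0)], UnitSquare))
```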

Regarding abstraction, many variant algorithms have been devised for anti-unification, differing in the expressiveness of the underlying expression language. The simplest is syntactic anti-unification, which has complexity \(\mathcal {O}(\max (|e_1|,|e_2|))\), where \(|e|\) is the number of nodes in the abstract syntax tree (AST) of an expression \(e\). Syntactic anti-unification is at its most useful whenever the expression language has a normal form, i.e., for any expression \(e\), there is some sequence of transformations which yields a unique expression \(e'\) acting as a representative for all expressions equivalent to \(e\). Languages such as the Simply Typed Lambda Calculus or System F have a normal form [265]. More generally, the existence of normal forms and Turing completeness are mutually exclusive; normal forms guarantee termination.
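
The following minimal sketch illustrates syntactic anti-unification over expressions encoded as nested tuples; the encoding and the helper names are illustrative assumptions rather than a reference implementation. Mismatching subterms are abstracted by shared variables, yielding the least general generalization in a single traversal of the two ASTs.

```python
import itertools

def anti_unify(e1, e2, table=None, counter=None):
    """Least general generalization of two expressions encoded as nested tuples."""
    table = {} if table is None else table          # maps (subterm1, subterm2) -> variable
    counter = itertools.count() if counter is None else counter
    if e1 == e2:                                    # identical subterms generalize to themselves
        return e1
    if (isinstance(e1, tuple) and isinstance(e2, tuple)
            and len(e1) == len(e2) and e1[0] == e2[0]):
        # same head symbol and arity: recurse over the arguments
        return (e1[0],) + tuple(anti_unify(a, b, table, counter)
                                for a, b in zip(e1[1:], e2[1:]))
    # otherwise abstract this pair of mismatching subterms by a (shared) variable
    if (e1, e2) not in table:
        table[(e1, e2)] = f"X{next(counter)}"
    return table[(e1, e2)]

# Example: the lgg of f(a, g(b)) and f(c, g(d)) is f(X0, g(X1)).
print(anti_unify(('f', 'a', ('g', 'b')), ('f', 'c', ('g', 'd'))))
```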

The Expression Language of ‘Conceptual Spaces’

As an additional example to that of first-order linear arithmetic described previously, a simple but concrete example of a possible expression language is that of ‘conceptual spaces’ [103, 104], proposed by Gärdenfors as a bridge between symbolist and connectionist approaches. It is conjectured that naturally-occurring concepts are characterized by subspace regions which are connected and convex. For example, the topology of color corresponds to (some variant of) the color wheel, and that of time to the real line. Qualitative notions (e.g. ‘relative temperature’) are supported via the topology of ordered intervals. Goguen has previously proposed a system which uses conceptual spaces for describing anthropocentric reasoning about space and time [118], observing:

Sensors, effectors, and world models ground elements of conceptual spaces in reality, where the world models are geometrical spaces. This implies that the symbol grounding problem is artificial, created by a desire for something that is not possible for purely symbolic systems, as in classic logic-based AI, but which is natural for situated systems.

Compositionality of conceptual spaces has been explicitly studied: in recent work, Bolt et al. [28] employ category theory to perform compositional interpretation of natural language via a grammar over convex relations. This work is a foundation for future research into highly-expressive and principled compositionality. For SCL purposes, the ‘expression language’ of conceptual spaces is therefore that of affine transformations of convex (sub)regions of the state-space, and the associated truth predicate can always be determined as a function of the set of points in the intersection of two regions [342]. Expressing the operations of SCL (as described in Sect. 7.2) in the language of conceptual spaces requires much less sophisticated interpretation than does natural language, with geometric operations being sufficient to support these inference mechanisms.
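
As a deliberately simplified illustration, the sketch below models conceptual-space regions as axis-aligned boxes over quality dimensions (Gärdenfors’ convex regions are of course more general), with affine maps acting on regions and the truth predicate evaluated from the intersection of two regions; the class and method names are assumptions made purely for exposition.

```python
import numpy as np

class BoxRegion:
    """A convex region: lower/upper bounds on each quality dimension."""
    def __init__(self, lo, hi):
        self.lo = np.asarray(lo, dtype=float)
        self.hi = np.asarray(hi, dtype=float)

    def affine(self, scale, shift):
        """Image of the region under x -> scale * x + shift (positive per-dimension scale)."""
        s, t = np.asarray(scale, dtype=float), np.asarray(shift, dtype=float)
        return BoxRegion(self.lo * s + t, self.hi * s + t)

    def intersects(self, other):
        """Truth predicate: the regions overlap iff they overlap on every dimension."""
        return bool(np.all(np.maximum(self.lo, other.lo)
                           <= np.minimum(self.hi, other.hi)))

# Example: 'warm' and 'hot' as intervals on a one-dimensional temperature dimension.
warm = BoxRegion([15.0], [30.0])
hot = BoxRegion([28.0], [45.0])
print(warm.intersects(hot))             # True: the two concept regions overlap
print(warm.affine([1.8], [32.0]).lo)    # the same region re-expressed in Fahrenheit
```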

2.2 Compositional Primitives

Previous chapters have argued that compositionality is key to addressing the foundational issues of generalization and of representing long-tailed distributions with high sample efficiency. This suggests that what is desired is a collection of representations which form a ‘basis set’, the elements of which can be composed (perhaps nontrivially, via analogy) to describe a wide range of phenomena. We therefore claim that the strongest possible emphasis should be placed on the search for compositional primitives, i.e., compressed parametric representations of recurring phenomena, captured across multiple sensorimotor modalities. In cognitive science, such abstractions are known as image schema [159, 188] and are intended to represent common patterns in the embodied experience of space, force, motion, etc. Early work in this area induced image schema corresponding to spatial prepositions from video data [96, 285]. There have also been attempts to model image schema symbolically [3, 139, 182], with recent work on a qualitative representation of containers in a sorted first order language [60]. It is clearly desirable that computational representations of image schema enjoy the cross-domain ubiquity ascribed to their cognitive counterparts. Concurrently with the development of the present work, a recent trend in deep learning has been the proposed universality of so-called ‘foundation models’ [29], which provide a broad basis of representations for downstream tasks through large-scale self-supervised training. While this paradigm offers the advantage of well-known engineering pipelines, we saw in Chap. 4 that compositionality in the algebraic sense is essentially absent from deep learning, as considered across heterogeneous architectures and arbitrary constraints. Furthermore, the full grounding of language and other symbols will require representations which support strong invariant propagation and the ability to produce reasonable counterfactual statements. Since achieving the associated ‘malleability of representation’ enjoyed by humans has so far proved elusive, it is perhaps useful to focus initially on a related, but more overtly embodied notion: that of ‘affordances’.

Affordances

The term ‘affordance’ was coined by Gibson [109] to describe a relation between an agent and its environment, grounded by the physical embodiment of the agent and the recognition capacity of its perception system:

If you know what can be done with a graspable detached object, what it can be used for, you can call it whatever you please. The theory of affordances rescues us from the philosophical muddle of assuming fixed classes of objects, each defined by its common features and then given a name. [\(\ldots \)] But this does not mean you cannot learn how to use things and perceive their uses. You do not have to classify and label things in order to perceive what they afford.

Although Gibson [109] initially described affordances as being ‘directly perceivable’, it is more useful for general intelligence purposes to treat the (at least ex nihilo) perception of an affordance as equivalent to hypothesis generation, i.e., as potentially requiring nontrivial computational work. Supporting evidence that affordances are in any case not ‘directly perceivable’ includes:

  • Tool use in crows, where previous work [301] implies that affordances are not simply a function of the relation between body and environment, and have (at the very least) a memetic component.

  • Although the ability to create fire (from flint and tinder) or steel (from iron and carbon) could be said to be inherent in their component parts, their manufacture required nontrivial insight.

Hence affordances offer an overarching perspective on situated representations. We believe that composition of affordances is a key step towards general intelligence and that the category theoretic machinery of Sect. 9.1 provides a suitable framing for processes that abstract and generalize across complex configuration spaces. The initial research task is then to determine the right ‘expression language’ for describing the affordances of simple agents in simple domains (which are nonetheless ‘noisy and real-world’ [35]).

Subject to the ability to generalize from initial results to more complex domains, it is then appropriate to progress from explicitly agent-centric affordances to the more general patterns described as image schema, which might then have a greater prospect of being independent of any specific embodied configuration. By these means, it may be possible to determine whether image schema do indeed exist as universal compositional primitives and whether—as has variously been suggested [145, 228, 229, 320]—analogy has a vital role as a universal mechanism for leveraging existing knowledge.

2.3 Links with Behavioral Control

For purposes of building links with existing approaches, it is illustrative to revisit SCL from the perspective of behavioral control of open systems. In this setting, the system can be seen as taking as input a continual stream of data, consisting of sensor and monitor states, and performing open-ended learning in the following manner (a minimal code sketch follows the list):

  • Observe some input and use it to progressively learn increasingly hierarchical SCL expressions.

  • Interpret some applicable subset of SCL expressions to yield predictions and/or effector actions.

  • Feed back information about ‘surprising’ environmental transitions into the stateful parts of the observation and interpretation processes.
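
The following sketch renders this loop in Python; the WorldModel class and its learn, interpret, and repair methods are hypothetical placeholders for the corresponding SCL machinery, and only the overall shape of the loop is intended to be faithful to the description above.

```python
import random

class WorldModel:
    """Placeholder for learned SCL expressions plus stateful interpretation context."""
    def __init__(self):
        self.expressions = []   # progressively learned, increasingly hierarchical expressions
        self.context = {}       # stateful parts of the observation/interpretation processes

    def learn(self, observation):
        self.expressions.append(observation)          # stand-in for hierarchical learning

    def interpret(self, observation):
        prediction = observation                      # trivial 'persistence' prediction
        action = None                                 # no effector action in this sketch
        return prediction, action

    def repair(self, observation, prediction):
        self.context["last_surprise"] = (prediction, observation)   # feed back the surprise

def control_loop(sensor_stream, surprise_threshold=1.0):
    model, prediction = WorldModel(), None
    for observation in sensor_stream:
        if prediction is not None and abs(observation - prediction) > surprise_threshold:
            model.repair(observation, prediction)     # surprising environmental transition
        model.learn(observation)
        prediction, action = model.interpret(observation)
        yield action

# Example with a noisy scalar sensor stream.
stream = (10.0 + random.gauss(0.0, 0.5) for _ in range(20))
actions = list(control_loop(stream))
```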

At the intersection of control theory and applied category theory, there is increasing interest in the behavioral approach, in which models are relations rather than merely functions. The methodological treatment due to Willems [365] comprises phases referred to as ‘tearing, zooming, and linking’. ‘Tearing’ is the transformation of observed behavior into a collection of models, performed recursively (‘zooming’) until some elementary level of model complexity is reached. ‘Linking’ is then the composition of this collection of models. The process is iterated and refined until the linked models yield predictions which match the observed behavior.

A previous category-theoretic treatment of control [15] has used relations on finite-dimensional vector spaces (the category \({\mathbf {FinRel}}_{k}\)). The fact that relations are well-suited to capturing invariants is of particular interest for control and state estimation. Most interestingly, it has been shown that an analog of Lagrangean conservation laws [329] can be obtained in the very general setting of typed relations [10]. Relations also fit well with the ‘inference as typed program synthesis’ approach of SCL, since they are far better suited than functional descriptions for transforming specifications (e.g. the ‘task language’ of work on command) into implementation (the corresponding hypothesis chain) [289].
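
A small example illustrates why relations compose more naturally than functions as behavioral models: a relation can leave outcomes under-determined (i.e. act as a specification), and models are linked by plain relational composition. Finite relations over small sets are used purely for illustration; they stand in for, but are far weaker than, the linear relations of \({\mathbf {FinRel}}_{k}\).

```python
def compose(R, S):
    """Relational composition: (a, c) is in the composite iff (a, b) in R and (b, c) in S."""
    return {(a, c) for (a, b1) in R for (b2, c) in S if b1 == b2}

# A 'specification' relating commands to acceptable positions; note that it is not a
# function, since the command 'hold' permits more than one outcome.
spec = {("advance", 1), ("hold", 0), ("hold", 1)}

# A component model relating positions to sensor readings.
sensor = {(0, "low"), (1, "high")}

# Linking the two models is simply relational composition.
print(compose(spec, sensor))
# {('advance', 'high'), ('hold', 'low'), ('hold', 'high')}  (set ordering may vary)
```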

2.4 Pragmatics via ‘Causal Garbage Collection’

As discussed in Sect. 9.2, systems that operate without ‘closed-world’ assumptions can always encounter transitions which confound the expectations of their world model. Since this necessitates some form of context-specific repair to the learned denotational semantics of the model, we term this a ‘pragmatic’ activity. We now describe a prospective means of applying a repair that is then made consistent across the entire ruleset of the world model.

For different kinds of algebraic structure, it is common to generalize the notion of ‘basis set’ familiar from linear algebra to that of ‘generators and relators’. For example, \(C_n\), the cyclic group of order \(n\), has a single generator (\(g\), say) and a single relator (an equation defining equivalence in the algebraic structure), \(g^n=1\), where 1 is the identity element of the group. This is notated as a so-called finite presentation: \(\langle g \mid g^n=1 \rangle \). Similarly, the symmetry group of the square has presentation

$$ \langle r,s \mid r^4=s^2=(sr)^2=1 \rangle $$

where \(r\) corresponds to rotation by 90 degrees and \(s\) to reflection. The relator equations collectively define a rewriting system [12]. For certain classes of algebraic structure, this rewriting system can be iteratively applied to yield a unique normal form for any algebraic expression. For example, since exponents in \(C_3\) are taken modulo 3, all of \(g^5, g^8, g^{11}, \ldots \) are rewritten to the normal form \(g^2\).
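
A few lines of code suffice to illustrate normal-form computation by rewriting for the presentation \(\langle g \mid g^3=1 \rangle \) of \(C_3\); the encoding of group elements as strings over the single generator is an illustrative choice.

```python
def normal_form(word, rules):
    """Apply the rewrite rules until no rule matches; the result is the normal form."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)   # one rewrite step
                changed = True
    return word

C3_rules = [("ggg", "")]                           # the relator g^3 = 1, oriented left-to-right
for k in (5, 8, 11):
    print(f"g^{k} ->", normal_form("g" * k, C3_rules) or "1")
# g^5 -> gg, g^8 -> gg, g^11 -> gg, i.e. the normal form g^2
```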

With regard to control, Baez [15] similarly defines generators and relators for composition of expressions in \({\mathbf {FinRel}}_{k}\). Hence, a particular combination of elements may be reducible to a unique simpler representation. When the system is ‘surprised’ by a discrepancy between prediction and observation, it must be because the current interpretation of an expression does not accord with reality. This means that there are latent interactions which are not captured by the default denotational semantics. Hence, either an existing relator is invalid (as far as the world is concerned) or else there must be additional, unknown relators. It is possible in principle to make all existing invalid inferences simultaneously consistent by constructing a new rewriting system. For certain classes of algebraic structure, this can be achieved via the ‘Knuth–Bendix Algorithm for Strings’ [175]. This algorithm (strictly, a procedure) is only semidecidable: it may halt with a confluent rewriting system for a finitely presented algebra, but it is not possible to determine in advance whether, or how long, this will take. One option would therefore be to run it as a background and (typically) low-priority task, as a form of ‘garbage collection’ for causal inconsistencies.
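
The following compact sketch conveys the flavor of Knuth–Bendix completion for string rewriting; it is naive, omits many refinements of the actual procedure [175], and its function names are purely illustrative. Overlapping left-hand sides produce critical pairs, unresolved pairs become new rules oriented by shortlex order, and the number of completion rounds is bounded externally, in keeping with the suggestion of running completion as a low-priority background task.

```python
def shortlex_greater(u, v):
    """Shortlex order: longer strings are greater; ties are broken lexicographically."""
    return (len(u), u) > (len(v), v)

def normalize(word, rules):
    """Rewrite to normal form with respect to the current (oriented) rules."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)
                changed = True
    return word

def critical_pairs(rule1, rule2):
    """Overlaps of two left-hand sides, and the two ways of rewriting each overlap."""
    (l1, s1), (l2, s2) = rule1, rule2
    pairs = []
    for k in range(1, min(len(l1), len(l2))):      # a suffix of l1 equals a prefix of l2
        if l1.endswith(l2[:k]):
            pairs.append((s1 + l2[k:], l1[:-k] + s2))
    if l2 in l1 and l1 != l2:                      # l2 occurs strictly inside l1
        i = l1.index(l2)
        pairs.append((s1, l1[:i] + s2 + l1[i + len(l2):]))
    return pairs

def knuth_bendix(rules, max_rounds=20):
    rules = list(rules)
    for _ in range(max_rounds):
        new_rules = []
        for r1 in rules:
            for r2 in rules:
                for a, b in critical_pairs(r1, r2):
                    na, nb = normalize(a, rules), normalize(b, rules)
                    if na == nb:
                        continue                   # the critical pair is already resolved
                    lhs, rhs = (na, nb) if shortlex_greater(na, nb) else (nb, na)
                    if (lhs, rhs) not in rules and (lhs, rhs) not in new_rules:
                        new_rules.append((lhs, rhs))
        if not new_rules:
            return rules                           # no unresolved pairs: locally confluent
        rules.extend(new_rules)
    return rules                                   # round limit reached: partial result

# Example: completion adds the rules aa -> a and bb -> b to resolve the overlaps.
print(knuth_bendix([("ab", "a"), ("ba", "b")]))
# [('ab', 'a'), ('ba', 'b'), ('aa', 'a'), ('bb', 'b')]
```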

3 Conclusion

Recent advances in machine learning have led to it being considered synonymous with the entirety of artificial intelligence, at least in popular conception. However, as exemplified by deep learning, this represents a very specific form of program synthesis, in which:

  • Mission objectives/constraints are specified a priori.

  • Voluminous data and potentially massive computational resources are available for training.

  • The trained system is deployed into the production environment and remains unchanged thereafter.

  • It is assumed that the training data/learning algorithm suffices for generalization to the production environment, even over time.

In order to solve a problem via machine learning, it is therefore necessary to impose strong a priori constraints, both on the design space and the production environment. Constraining the design space of the learner is almost always done via specialized human labor; for example, pruning the space of possible input features, crafting a reward/objective function, selecting and optimizing hyperparameters, etc. While objectives can readily be specified for simple domains (e.g. board games), in complex application domains such as those in the real world, the practical difficulties have caused initial high expectations (e.g. for autonomous vehicles) to be repeatedly revised downwards.

An additional vital concern for the artificial intelligence community is the increasing evidence that machine learning is not operating at the appropriate causal level. This is problematic since it is likely to lead to overfitting, something which is anyway encouraged by the trend for huge parameter spaces. More generally, machine learning is good at manipulating data, but this has not been demonstrated to lead to understanding, i.e., the ability to represent the space of possibilities spanned by the constraints that are latent in the training set. This has corresponding implications for robustness and safety which we discussed in detail. We have therefore argued that, in order for machine learning to progress, it must not only embrace a stronger notion of causality, but also embed it in a more comprehensive learning framework that supports reflective reasoning, which we term ‘Semantically Closed Learning’ (SCL).

It might nonetheless be claimed that the mechanisms of SCL are superfluous for general intelligence, given that reinforcement learning agents can be specified as “taking action so as to maximize an aggregate of future rewards, as a function of previous environmental observations” [143]. This statement is extremely broad, and could be argued to be AI-complete in capability. However, the broadness of the associated function signature does not automatically imbue RL with the required learning ability. Indeed, we argue that the interpretation of the above specification via common practice has canalized the expressiveness, generalization, and learning efficiency of RL. This canalization proceeds via:

  • The assumption that rewards commensurate with general intelligence can meaningfully be specified a priori.

  • The notion that feedback is best propagated via numeric (indeed, often scalar) rewards.

  • The notion that ‘learn, then deploy’ is sufficient.

It could be argued that alternatives to each of these default assumptions have been separately explored at the boundaries of RL research. However:

  • Default practice is so strongly culturally ingrained that it is necessary to explicitly delineate the alternatives.

  • There is no single framework that moves beyond all of these assumptions in an integrated manner. If all of them are removed at once, we claim that RL must then integrate semantic closure, anytime-bounded rationality, and second-order automation, which effectively makes the ‘new’ RL synonymous with the proposed compositional framework of Semantically Closed Learning.

We have presented a roadmap which re-asserts the importance of embodiment and the ‘Physical Grounding Hypothesis’ [35], i.e., the necessity of making decisions, anytime, in a complex, noisy environment. We strongly believe that the artificial intelligence community must finally embrace ‘the whole iguana’ [62] of general intelligence, i.e., design systems which are capable of open-ended learning in inevitably-changing, real-world production environments, starting from minimal objectives. The value proposition for general intelligence is then the elimination of the need to specify meaningful reward functions upfront and to maintain them in tandem with a changing environment, an approach which cannot scale in practice.

Our concept of 2nd order automation engineering realizes a specific implementation of SCL as a minimal yet concrete technical design. Inspired by biological growth, where system agency is adapted and maintained despite evolutionary pressure, we have presented three automated procedures for autonomous system engineering: self-identification, synthesis, and maintenance with guarantees. These constitute the necessary developmental dynamics of machines designed to abide by the non-stationary, arbitrary expression of value, conditioned by a reflective evaluation of risk. Lifting this design to a fully developed general and generative system theory will be the continuation of our work. This will inevitably demand a departure from the prevalent and exclusively algorithmic (or computationalist) world-view, towards a science of self-organized, grounded, and constructive control processes.