This section presents three applications – one for each learning mechanism – that were developed to test how AI models can inform conceptual design and idea generation, and therefore be used to develop more autonomous and participative design systems.
These applications were informed by the requirements described in “Requirements for AI-driven acquisition and communication of design knowledge”. For each application we:
introduce the AI model chosen to simulate the learning mechanism and explain why such a model is suitable for the task.
provide a cognitive interpretation of the chosen AI model.
illustrate how the AI model is trained on a design task and interfaced with CAD software.
We do not provide technical details about the implementations because such details fall outside the scope of this article. Details of methods similar to those used for the first application are published in Mirra and Pugnale. Two further papers, on playfulness and analogical reasoning, are currently under review and will be published soon.
Here, we use diagrams to describe and compare the functional components of the AI models and how they were interfaced with CAD software. These components include (1) trainable AI modules; (2) the input required for the training process; (3) the output produced by the AI models; (4) the input provided to the AI model at inference time, that is, after training is complete and the model has been interfaced with CAD software.
Simulating expertise with generative models
Generative Models are a class of AI techniques used for data generation. They have received ever-increasing attention from the AI research community since the invention of Generative Adversarial Networks (GANs), that is, models that can be trained to synthesise artificial, albeit extremely realistic, images of human faces and objects that have never existed [69, 70]. Apart from images, these models have been successfully applied to synthesise audio, 3D models and other data types.
GANs and other Generative Models, such as Variational Autoencoders (VAEs), learn in an unsupervised fashion. They do not require any human input, apart from the provision of a dataset. The application presented in this section uses VAEs to (1) extract features from precedents and (2) recombine such features to generate design propositions that do not exist.
Acquiring knowledge by copying
The VAE architecture comprises two components: an encoder and a decoder. The encoder compresses input data into a low-dimensional representation, whereas the decoder attempts to reconstruct the input data from such a low-dimensional representation. The VAE is trained to perform this task on every dataset sample, which results in a set of low-dimensional representations that are uniformly distributed in a ‘latent space’. After training, this latent space can be sampled to produce data that resemble those that populate the dataset. The resemblance is the result of the preservation of some underlying features of the dataset in the newly generated data.
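The encoder-decoder round trip can be sketched in a few lines of numpy. This is a deliberately minimal toy with linear layers and invented dimensions (a 128 × 128 input flattened to 16,384 values, a 32-dimensional latent space), not the architecture used in our application:

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyVAE:
    """Minimal VAE forward pass with linear encoder/decoder, for illustration only."""

    def __init__(self, d_in=128 * 128, d_latent=32):
        self.We_mu = rng.normal(0.0, 0.01, (d_latent, d_in))
        self.We_logvar = rng.normal(0.0, 0.01, (d_latent, d_in))
        self.Wd = rng.normal(0.0, 0.01, (d_in, d_latent))

    def encode(self, x):
        # Compress the input into the parameters of a latent Gaussian.
        return self.We_mu @ x, self.We_logvar @ x

    def reparameterize(self, mu, logvar):
        # Sample z = mu + sigma * eps so the sampling step stays differentiable.
        eps = rng.normal(size=mu.shape)
        return mu + np.exp(0.5 * logvar) * eps

    def decode(self, z):
        # Reconstruct a depth map (values in [0, 1]) from the low-dimensional code.
        return 1.0 / (1.0 + np.exp(-(self.Wd @ z)))

vae = TinyVAE()
x = rng.random(128 * 128)                     # a flattened stand-in depth map
mu, logvar = vae.encode(x)
x_hat = vae.decode(vae.reparameterize(mu, logvar))
```

Training minimises the reconstruction error between `x` and `x_hat` plus a regularisation term that keeps the latent codes well distributed; after training, the decoder alone can be sampled to generate new data.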
Since VAEs acquire knowledge by reconstructing data – in this case, design precedents – we observed that they could simulate the strategy of knowledge acquisition described by Cross (see “Expertise”), i.e. learning through copying/reproducing. The reconstruction process follows the encoding of information, which has a psychological analogue: the process of constructing mental representations from perceptual stimuli. The set of ‘mental representations’ constructed by a VAE is constituted by the low-dimensional representations that define the latent space.
In previous work, we highlighted the similarity between the latent space constructed by a VAE and the design space constructed in conventional computational design applications, such as parametric design and optimisation. In this article, we extend the analogy by stating that the latent space can also be understood as a surrogate of the designer’s ‘conceptual space’, that is, the frame of reference that includes design knowledge as well as the cultural background within which new ideas can be generated.
We realise that this analogy may sound inappropriate: AI does not possess a cultural background as intended in social sciences. However, if we extend the definition of culture provided by Hofstede to artificial systems – i.e. “the collective programming of the mind that distinguishes the members of one group or category of people from another” – it can be argued that the group of AI models, collectively, does in fact exhibit a certain form of culture.
Learning from structural design precedents
In this section, we describe an application of a VAE used to learn from a dataset of 40 shell and spatial structures designed by influential architects and engineers. The dataset comprised a variety of structural typologies, including masonry and RC shells, gridshells and membranes. We modelled each design sample in 3D and converted it into a 128 × 128 pixel 2D depth map. This conversion process was necessary to train an existing VAE implementation that learns from images. A data augmentation strategy was used to increase the number of samples from 40 to 4000. This strategy involves performing a set of rigid transformations of the depth maps – rotation and translation – and it allows the VAE to recognise geometric features in the samples that are independent of position and orientation. Figure 1 illustrates the trainable VAE components and the source of the input data.
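The augmentation step can be sketched as follows. For illustration we assume 90-degree rotations and wrap-around translations of up to 8 pixels, and generate 10 variants per sample for brevity; with 100 variants per sample, the same scheme yields the 4000 training images:

```python
import numpy as np

def augment(depth_map, n_variants, rng):
    """Produce rigid-transformation variants (rotations + translations) of one depth map."""
    variants = []
    for _ in range(n_variants):
        m = np.rot90(depth_map, k=int(rng.integers(0, 4)))      # rotate 0/90/180/270 deg
        m = np.roll(m, shift=int(rng.integers(-8, 9)), axis=0)  # translate vertically
        m = np.roll(m, shift=int(rng.integers(-8, 9)), axis=1)  # translate horizontally
        variants.append(m)
    return variants

rng = np.random.default_rng(0)
dataset = [rng.random((128, 128)) for _ in range(40)]   # stand-ins for the 40 depth maps
augmented = [v for sample in dataset for v in augment(sample, 10, rng)]
```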
Figure 2 shows the result of the qualitative test that was performed to evaluate the characteristics of the features learnt by the model and the quality of the synthesised data. We selected four design samples from the dataset, fed them to the VAE encoder and obtained their low-dimensional representations. We then linearly interpolated these low-dimensional representations and fed them to the VAE decoder to synthesise new forms. Our results demonstrate that, even with the provision of only 40 3D models, the AI model was able to learn to extract and recombine geometric patterns in a meaningful way. In this case, the model was able to synthesise hybrid designs that blended the main features – such as openings, support edges, and curvature inversions – that characterised the selected designs.
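The interpolation step itself is straightforward; a sketch, assuming a 32-dimensional latent space for illustration:

```python
import numpy as np

def interpolate_latents(z_a, z_b, n_steps=8):
    """Linearly interpolate between two latent codes, endpoints included."""
    return [(1.0 - t) * z_a + t * z_b for t in np.linspace(0.0, 1.0, n_steps)]

# Two stand-in latent codes, e.g. the encodings of two design samples.
z_a = np.zeros(32)
z_b = np.ones(32)
path = interpolate_latents(z_a, z_b)
# Feeding each code in `path` to the decoder yields a morph between the two designs.
```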
Exploring the VAE conceptual space
We developed an AI-CAD interface to explore the potential uses of the model in design applications. Our interface comprised two nodes: (1) a server on which the AI model was run, and (2) a client that sent information to the server and waited for the output. The client exploited the GUI of existing CAD software – Rhinoceros 3D – and relied on a visual scripting environment – Grasshopper – to manage the I/O communication with the AI model.
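The I/O exchange between the two nodes can be sketched as a loopback client-server pair. The JSON message format and the field names are illustrative assumptions; in the real application the client runs inside Grasshopper and the server hosts the trained VAE:

```python
import json
import socket
import threading

def run_server(model_fn, host="127.0.0.1"):
    """Minimal server node: receive a JSON sketch, return the model's output as JSON."""
    srv = socket.create_server((host, 0))          # port 0: let the OS pick a free port
    port = srv.getsockname()[1]

    def serve_once():
        conn, _ = srv.accept()
        with conn:
            payload = json.loads(conn.recv(65536).decode())
            result = model_fn(payload["footprint"])            # run AI inference
            conn.sendall(json.dumps({"heights": result}).encode())
        srv.close()

    threading.Thread(target=serve_once, daemon=True).start()
    return port

def client_request(port, footprint):
    """Client node (CAD side): send a 2D footprint, wait for the 3D output."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(json.dumps({"footprint": footprint}).encode())
        return json.loads(conn.recv(65536).decode())["heights"]

# A stand-in 'model': assign every occupied footprint pixel a constant height.
port = run_server(lambda fp: [[1.0 if px else 0.0 for px in row] for row in fp])
reply = client_request(port, [[0, 1], [1, 0]])
```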
The interface allowed the conceptual space of AI to be explored through the exchange of visual information with the model. The designer sketches 2D footprints on a canvas in Rhinoceros and waits for the model to transform the sketch into a 3D model that can be visualised on the same canvas. Therefore, unlike conventional computational design approaches, the exploration of design options involves neither the manipulation of variables nor the analysis of design performance.
Figure 3 (on the right-hand side) shows a set of design propositions developed by the model, starting from 2D footprints sketched in Rhinoceros. Each row shows multiple solutions generated from the same input by recursively feeding the model with the forms it generated over consecutive iterations. We define this process as ‘interpretation’: the model first produces a 3D model by reproducing coarse design features extracted from the dataset, then progressively refines the form into a plausible design proposition.
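The recursive feeding scheme can be sketched as follows, with a simple neighbour-averaging smoother standing in for the trained VAE (both the stand-in model and the grid size are illustrative assumptions):

```python
import numpy as np

def interpret(model, footprint, n_iters=3):
    """Recursively feed the model its own output over consecutive iterations."""
    form = np.asarray(footprint, dtype=float)
    for _ in range(n_iters):
        form = model(form)
    return form

def smooth(form):
    """Stand-in 'model': average each cell with its four neighbours."""
    padded = np.pad(form, 1, mode="edge")
    return (padded[:-2, 1:-1] + padded[2:, 1:-1] +
            padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0

footprint = np.zeros((8, 8))
footprint[2:6, 2:6] = 1.0          # a coarse sketched footprint
refined = interpret(smooth, footprint)
```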
Simulating playfulness with reinforcement learning
Reinforcement Learning (RL) is one of the most popular areas of research in machine learning. It is the technique that allowed AI to master board games, such as ‘Go’, at a higher-than-human level. It has also been used successfully in video games and robotic control applications.
Here, we provide an interpretation of design as an RL problem. In particular, we focus on the possibility of modelling drawing as a time-dependent decision process and training an AI model to produce design options that satisfy certain requirements.
A playful exploration of design possibilities through drawing
RL applications are formalised as Markov Decision Processes (MDPs), which are abstractions used to represent sequential decision-making problems where an agent interacts with an environment. In plain terms, the goal of an MDP is to find a function that maps states observed from the environment into actions that maximise the agent’s reward. This function can be derived deterministically or learnt – i.e. approximated – by an artificial neural network. For an overview of the available techniques, see Sutton and Barto.
An RL model consists of a network – also known as a ‘policy’ – that predicts the actions an agent must perform in an environment. In the case of chess, the agent is the player, while the environment consists of the chessboard on which actions – or game moves – are performed. The network is trained to predict actions that maximise the agent’s future reward, which in this case corresponds to the quality of a chess move in relation to the expectation of winning the game. Unlike other techniques, RL models are usually trained without a dataset. They learn autonomously by trying different actions, observing the reward of each action, and ‘reinforcing’ those actions that led to a higher reward, i.e., performing them more frequently. The agent must: (1) interpret the state of the environment – e.g., a specific configuration of the chessboard – at each time step of the decision process; and (2) balance the exploration of the environment with the exploitation of the acquired knowledge.
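The ‘reinforcing’ update and the exploration-exploitation balance can be illustrated with tabular Q-learning on a toy five-state chain, a deliberately simplified stand-in for the deep RL models discussed here (all parameter values are illustrative):

```python
import numpy as np

# The agent starts at state 0 and is rewarded only when it reaches state 4.
n_states, n_actions = 5, 2              # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))     # estimated value of each (state, action)
alpha, gamma = 0.5, 0.9                 # learning rate and discount factor
rng = np.random.default_rng(0)

for episode in range(300):
    # Explore a lot early, exploit the learnt values later.
    epsilon = max(0.05, 1.0 - episode / 100)
    s = 0
    for _ in range(30):
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))    # exploration: random action
        else:
            a = int(Q[s].argmax())              # exploitation: greedy action
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # 'Reinforce': move Q(s, a) towards the observed reward plus the
        # discounted value of the best action in the next state.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r > 0:
            break                               # goal reached, end the episode
```

After training, the greedy action in every non-terminal state is ‘move right’, i.e., the actions that led to the reward have been reinforced.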
The learnt policy defines the behaviour of the agent in the time domain, that is, how it will act/move within the environment in consecutive time steps. Since the design process is also dynamic, we propose modelling it through the MDP formalism. We define this implementation as the Markov Decision Design Process (MDDP).
In an MDDP, the agent is the designer who learns a design strategy to maximise a reward that can either be the achievement of a design goal or the pleasure of engaging in the design activity itself. The environment of an MDDP can be interpreted in many different ways: it can be (1) a conceptual space in which the designer performs design moves; (2) a working environment in which the designer cooperates with other designers – i.e. with other agents – and/or negotiates with the client; or (3) a drawing board or spatial grid in which actions correspond to placing either strokes on paper or blocks in space to form a drawing or a spatial configuration.
In the following application, we implement the third interpretation of an MDDP, which is based on an interaction with a drawing board. We assume that the agent does not have any prior design knowledge and thus cannot rely on an existing policy. Like a child, the agent engages in the ‘playful’ exploration of an environment and develops its own policy from scratch.
Learning to design an arch/frame
We describe an application of an MDDP to train an AI agent to solve a simple design task. The task consisted in designing a 2D frame structure made of welded steel pipes and connected to the ground by two support nodes. The objective was to develop feasible design options for a variety of boundary conditions. The feasibility of a design option depended on the satisfaction of two requirements: (1) avoiding collisions of the structure with differently sized obstacles, which were randomly placed in the environment; and (2) minimising the displacement of the structure under vertical loads.
The agent was trained using a custom-made implementation of Deep Q-Network (DQN). Figure 4 shows the trainable components of the model and the input provided for the training process. In this case, the input does not consist of a dataset but of an environment with which the agent interacts. The environment includes: (1) a drawing board, which is represented by a 32 × 32 pixel greyscale image; (2) a set of actions that control the placement of single pixels onto the drawing board; (3) an FEM solver that converts the placed pixels into nodes of a 2D structural frame, assigns mechanical properties and loading conditions, and computes the maximum displacement.
The agent observes, at each step, the current state of the drawing board – which at time zero includes the obstacle and the support nodes – and decides where to place the next structural node. The agent receives a negative reward if it positions a node within the obstacle boundaries and a positive reward if it reaches the second support point within a maximum of 300 time-steps, at which point the agent will also receive an additional reward based on the results of the structural analysis.
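A toy sketch of such an environment, with a single-cell obstacle and a fixed placeholder bonus in place of the FEM-based displacement reward (all names and values are illustrative, not our actual implementation):

```python
import numpy as np

class FrameBoard:
    """Toy MDDP environment: a 32x32 board on which the agent places structural
    nodes, avoids an obstacle, and tries to reach the second support node."""

    def __init__(self, obstacle, start, goal, max_steps=300):
        self.board = np.zeros((32, 32))
        self.obstacle = obstacle            # set of (row, col) cells to avoid
        self.goal = goal
        self.steps, self.max_steps = 0, max_steps
        self.board[start] = 1.0             # first support node
        self.board[goal] = 1.0              # second support node

    def step(self, node):
        """Place the next structural node; return (observation, reward, done)."""
        self.steps += 1
        if node in self.obstacle:
            return self.board, -1.0, True   # collision: negative reward
        self.board[node] = 1.0
        if node == self.goal:
            # The real application adds a bonus based on the FEM-computed
            # displacement here; we use a fixed placeholder bonus of 0.5.
            return self.board, 1.0 + 0.5, True
        if self.steps >= self.max_steps:
            return self.board, 0.0, True    # ran out of time-steps
        return self.board, 0.0, False

env = FrameBoard(obstacle={(5, 5)}, start=(10, 0), goal=(10, 31))
obs, reward, done = env.step((10, 31))                                  # reach the goal
collision_reward = FrameBoard({(5, 5)}, (10, 0), (10, 31)).step((5, 5))[1]  # hit the obstacle
```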
We evaluated the trained model by analysing its ability to generate complete structures from new boundary conditions. Figure 5 shows the test result, which demonstrated that the agent was able to produce a complete structure in about 90% of the cases and that most of the designed structures presented a negligible displacement.
Interacting with the AI agent through drawing
We developed and tested an interface to allow humans to communicate with the trained AI agent. The test required the agent to interpret and complete a partially drawn structure in a meaningful way.
Figure 6 shows the results of the test. We observed that the agent was able to complete a structurally sound frame most of the time (top row) but failed to produce performative frames when the input was significantly different from the paths produced during the training process (bottom row).
The most relevant outcome of this test was the confirmation that the dynamic nature of the agent-environment interactions at training time also characterised the agent-human interaction at test time. Overall, we considered this form of interaction more powerful than the static interaction mode supported by VAEs (“Simulating expertise with generative models”).
Simulating analogical reasoning with reinforced adversarial learning
Reinforced adversarial learning is an AI technique that was first introduced by Ganin et al., and then refined by Mellor et al. The implementation of the technique – named SPIRAL – involves combining reinforcement learning with generative models to train AI agents in image synthesis. Unlike conventional data generation models, SPIRAL does not generate images through the recombination of features in the pixel space. The model instead learns to perform actions in drawing software.
Here, we describe our implementation of SPIRAL, whereby the agent interacts with a 3D modelling environment – and therefore synthesises 3D models instead of images – to design artificial replacements for natural habitats. We used a dataset of biological forms to guide the agent in the extraction of formal features relevant to the task.
AI-driven visual abstraction
Ganin et al. tested SPIRAL for two kinds of application: (1) inverse graphics and (2) non-photorealistic rendering. The first concerns, for instance, finding a set of commands to reconstruct an image within drawing software, whereas the second involves producing an artistic representation of a target image that preserves the main features of that image. SPIRAL is based on reinforcement learning and therefore, in a similar way to the application described in “Simulating playfulness with reinforcement learning”, it models an agent that explores different drawing actions to maximise a reward. In SPIRAL, this reward is the similarity between images produced by the agent and target images, which can include human faces, handwritten digits or 2D projections of 3D scenes. However, the reward is not provided to the agent through a deterministic function, like the FEM solver described in “Simulating playfulness with reinforcement learning”, but is learnt by the model together with the policy. Ganin et al. included an additional network in the SPIRAL model to learn the similarity function: a GAN discriminator. The discriminator is trained to produce a similarity score that differentiates between images generated by the agent and the images that populate the dataset. This score informs the agent of how good the images it has produced are.
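The discriminator-as-reward idea can be sketched as follows. Frozen random weights stand in for the discriminator, which in the real model is trained jointly with the policy; all dimensions and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator_score(image, w, b):
    """Logistic score that an image came from the dataset rather than the agent."""
    return 1.0 / (1.0 + np.exp(-(w @ image.ravel() + b)))

def agent_reward(image, w, b):
    # In SPIRAL-style training, the discriminator's score is used as the agent's
    # reward: the more 'dataset-like' the produced image, the higher the reward.
    return float(discriminator_score(image, w, b))

# Stand-in discriminator parameters (in reality, learnt adversarially).
w = rng.normal(0.0, 0.01, 32 * 32)
b = 0.0
drawing = rng.random((32, 32))      # a stand-in for an image drawn by the agent
r = agent_reward(drawing, w, b)
```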
Because of the features described above, SPIRAL can be classified as a generative model. However, SPIRAL can do more than generate realistic images: it can also produce ‘visual abstractions’ of images, that is, simplified representations that still retain the figurative meaning. The simplification is enabled by the possibility of defining which, and how many, drawing actions the agent can perform, which effectively constrains its representational capabilities.
We exploited this process of visual abstraction to make the SPIRAL agent synthesise simplified representations of complex natural forms. Through the specification of design constraints, this process transfers knowledge from biology to design and can thus be classified as analogical reasoning.
In order to relate the reinforcement-learning application presented here with the MDDP formalism described in “A playful exploration of design possibilities through drawing”, we define the problem as a ‘conditional MDDP’. The agent still engages in the playful exploration of design possibilities, but its behaviour is conditioned by knowledge acquired from a different domain, which, in this case, is biology.
Learning to design simplified tree forms
We tested the ability of our SPIRAL implementation to acquire knowledge from a dataset of tree forms and to synthesise visual abstractions of such forms. The forms synthesised by the agent can be used to inform the design of human-made replacements for deforested areas that are easy to build and scalable. For an overview of the challenges related to this sort of design problem, see Hannan et al.
We simulated analogical reasoning by specifying the following design constraints. We limited the agent’s action space to the placement of lines in the 3D modelling environment. These lines represented wooden poles through which a digital design option, produced by the agent, could be materialised in the real world. Furthermore, we limited the number of actions to 10. This set an upper bound for the number of lines the agent could use to synthesise a 3D form.
We developed a simple 3D modelling environment consisting of a 32 × 32 × 32 spatial grid. The agent could move a cursor within the grid boundaries and place lines that were rendered as voxels. Figure 7 illustrates the source of data used for this application, including the dataset and environment, and the two trainable components of the SPIRAL architecture.
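The constrained action space can be sketched as a line-rasterisation step in the 32 × 32 × 32 voxel grid. The sampling-based rasterisation scheme and the two example lines (a ‘trunk’ and one ‘branch’) are illustrative assumptions:

```python
import numpy as np

GRID = 32
MAX_ACTIONS = 10    # upper bound on the number of lines (wooden poles)

def place_line(voxels, start, end, n_samples=64):
    """Rasterise a straight line between two grid points as occupied voxels."""
    start, end = np.array(start, float), np.array(end, float)
    for t in np.linspace(0.0, 1.0, n_samples):
        point = np.clip(np.round(start + t * (end - start)), 0, GRID - 1).astype(int)
        voxels[point[0], point[1], point[2]] = 1.0
    return voxels

voxels = np.zeros((GRID, GRID, GRID))
actions = [((16, 0, 16), (16, 20, 16)),      # a vertical 'trunk'
           ((16, 20, 16), (8, 30, 8))]       # one inclined 'branch'
assert len(actions) <= MAX_ACTIONS           # enforce the action budget
for start, end in actions:
    voxels = place_line(voxels, start, end)
```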
We tested our implementation by analysing the forms produced by the agent at the last iterations of the training process. Figure 8 shows a sample of such forms. Our analysis involved (1) visually inspecting the forms generated by the agent to assess their similarity with the tree form dataset, and (2) applying a set of synthetic measures to extract geometric features and evaluate the suitability of the synthesised forms for the design of natural habitat replacements. We found that the agent was able to successfully reproduce the main features of the tree forms, such as branching patterns and trunk-canopy articulations, even with a budget of only 10 modelling actions.
Interacting with the AI agent through 3D modelling
At the current stage of development, our application does not feature a human-AI interface. However, we imagine that an interface like the one described in “Interacting with the AI agent through drawing” could easily be extended to 3D modelling and implemented to interact with the SPIRAL agent. Such an interface would allow the designer to define a partial 3D form and seek design suggestions from the agent on how to further develop the design proposition.