1 Introduction

Artificial intelligence - the name given to algorithmic computer programs that can generate data from heterogeneous, most often textual and image, templates or patterns, evaluate these large datasets according to complex criterion structures, develop new downstream algorithms, and use them to generate outputs of a nature identical or akin to the initial templates and patterns - is considered “the next level” in the field of visual design, and hopefully in architecture, engineering, construction, and operation of real estate (AECO), too. Only belatedly, however, has AECO been joining the trend of artificial intelligence that has been entering our lives and professions since the 1980s.

1.1 Reimagining AECO

It is no accident that this wavering approach to the new technology aligns with these disciplines' failures to cope with societal and economic development; in AECO - in architectural design and real estate (RE) development in particular - this has been a reality for at least 70 years. At the same time, the influence and impact of the disciplines' general performance on the economy, social affairs, and sustainability are immense. Dissatisfaction with the state and development of the built environment has been increasing through recent decades, while comprehension of the causes is lacking. Nevertheless, the need for a paradigm change in the field is apparent: new, promising technologies appear, such as virtual reality and artificial intelligence (AI), that could improve architectural, planning, and operation practices. Virtual twins offer unprecedented abilities to create, understand, and communicate AECO and the built environment, while AI provides machine-learning capabilities for design, planning, and parametric review and assessment. What is lacking so far is comprehension of the nature and potential of these technologies.

1.2 State-of-the-art

Since around 2010, global star architectural studios, alongside young enthusiasts combining information technology and architecture, have been trying to embrace AI's potential contribution to architectural design - or, better said, to disclose where it might stem from and what it might consist of. In 2020, DeepHimmelb(l)au - a video of a journey through an imaginary landscape of Coop Himmelb(l)au-like building forms - came into existence. The elaboration of datasets of reference images of geomorphic formations on the one hand and actual Coop Himmelb(l)au projects on the other by CycleGAN and other forms of GAN (generative adversarial network) technologies provided “machine hallucinations” (Bolojan, 2022; Coop Himmelb(l)au, 2023; Leach, 2022a) - represented prevailingly in two dimensions and substantially lacking both spatial comprehensiveness and the interconnectedness of the experiential (poetic, in other words) and material attributes inherent to architecture, discussed further in section (4) of the paper.

1.3 How the paper is organized

The virtual twins' technology, its benefits, and its prospects - reviewed by Sourek (2022) and presented in Wearrecho (2023) - require no further paper; this one concentrates on AI. The AECO domain on the one hand and data and computer science and development on the other are both deeply complex and greatly different from each other: achieving mutual understanding between the two is a challenge that has so far defied attempts to overcome it. Revealing such a common ground is the intention of this paper in the broadest sense.

First, the paper challenges the existing approach to AI's deployment within AECO, its theoretical starting points, and the perspectives put forward so far. To do so, a framework of the state of the art of the field is introduced; the methods used and the expectations declared (2) are assessed within this framework. Section (3) provides a general summary of the achieved results, which the subsequent section (4) extensively discusses. The overview is a foundation for identifying the misunderstandings and limitations of recent attempts and expectations regarding the use of AI in AECO - for debunking some of the “fantastic achievements of AI promising to make man redundant” currently discussed in the AECO professions and beyond. In contrast, other, so far unconsidered perspectives of AI in AECO are introduced: a pattern-based approach to design and to RE management and operation, reinforcement learning, imitation-based learning, learning a behavior policy from demonstration, and self-learning paradigms zooming in on AECO design development processes instead of only on their results. The discussion (4) sketches how AI could contribute to an upheaval of the AECO professions by overcoming their recent and contemporary technological lagging behind; true, authentic architectural, engineering, and RE-management creativity will not be stifled by the technology - on the contrary, it will be unleashed by delegating parametric problems to AI, which is unbeatable when it comes to computational and iterative issues. A flowchart visually rendering the proposed, unprecedented AI-aided AECO design development supports the discussion. Finally, the conclusions (5) outline the principal directions and particular goals of further development of machine learning for the good of AECO, the built environment, and sustainable development.

2 Methods and expectations

Though generally (and in this paper, too) labeled AI, the term machine learning (ML) adheres better to the use of the technology in AECO: learning, or training, is the keyword, and machine learning is a label for the various methods by which, in a shortcut, AI works. ML algorithms create models stemming from sample data on which the algorithms have been trained, to make decisions or proposals without being explicitly programmed to do so. The basis for the learning is a set of data possessing the same characteristics as the data to be generated: a truly large file, as will be shown in (4), and a comprehensive one; what is not in it, the AI cannot learn. The variations of machine learning deserve a reminder: supervised learning, unsupervised learning, reinforcement learning, and various alternatives and fusions.

2.1 How a machine can learn

In supervised learning, the system is given a series of categorized or labeled examples and told to make predictions about new examples it hasn't seen yet, or for which the ground truth is not yet known (Christian, 2020a). Supervised learning uses labeled datasets, whereas unsupervised learning uses unlabeled datasets. “Labeled” means the data are already tagged with the requested answer. In supervised learning, the learning algorithm measures its accuracy through the loss function, adjusting until the error is sufficiently minimized. Two types of supervised learning are distinguished - classification and regression. Classification uses an algorithm to assign test data into specific categories. It recognizes specific entities within the dataset and attempts to draw some conclusions on how to label or define the entities. Common classification algorithms are support vector machines, linear classifiers, decision trees, k-nearest neighbors, random forests, and others. Regression is applied to understand the relationship between dependent and independent variables - commonly to make projections. Linear regression, logistic regression, and polynomial regression are popular regression algorithms (IBM, 2022a).
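
The distinction can be made concrete in a few lines of Python; the following is a minimal sketch using scikit-learn, with synthetic data and arbitrarily chosen algorithms standing in for any labeled dataset (all names and values are illustrative assumptions, not a recommended setup):

```python
# Minimal sketch of the two supervised-learning families named above:
# a classifier fits categorical labels, a regressor fits a continuous target.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                   # feature matrix
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)   # categorical label -> classification
y_reg = 3.0 * X[:, 0] - 2.0 * X[:, 1]           # continuous target -> regression

X_tr, X_te, yc_tr, yc_te = train_test_split(X, y_class, random_state=0)
clf = DecisionTreeClassifier().fit(X_tr, yc_tr)
print("classification accuracy:", clf.score(X_te, yc_te))

X_tr, X_te, yr_tr, yr_te = train_test_split(X, y_reg, random_state=0)
reg = LinearRegression().fit(X_tr, yr_tr)
print("regression R^2:", reg.score(X_te, yr_te))
```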

Unsupervised learning analyzes and clusters unlabeled datasets. These algorithms discover hidden patterns or data groupings without the need for human intervention - without the need to label the datasets. In unsupervised learning, a machine is simply given a heap of data and told to make sense of it - to find patterns, regularities, and useful ways of condensing, representing, or visualizing it (Christian, 2020a; IBM, 2022b).
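
The unsupervised counterpart, sketched under the same caveats, receives no labels at all; a clustering algorithm such as k-means must discover the grouping by itself (the two-blob data and the cluster count are assumptions of the example):

```python
# Unsupervised learning: k-means groups unlabeled data on its own.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=-2.0, size=(100, 2)),   # two unlabeled "heaps"
               rng.normal(loc=+2.0, size=(100, 2))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("discovered cluster sizes:", np.bincount(labels))
```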

Reinforcement learning concerns how intelligent agents ought to take action in an environment to maximize the notion of cumulative reward. Based typically on the Markov decision process (a discrete- or continuous-time stochastic control process in mathematics) (Jagtap, 2022), reinforcement learning differs from supervised learning in not needing labeled input/output pairs to be presented and in not needing sub-optimal actions to be explicitly corrected. Instead, it focuses on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). In other words, placed into an environment with rewards and punishments, [the system is] told to figure out the best way to minimize the punishments and maximize the rewards (Christian, 2020a; Wikipedia, a, 2022). As comprehension of the learning processes - and especially of the consequences of their details for the outputs - improves, the reward signals used to fine-tune the models tend to be based on human preferences (an approach referred to as reinforcement learning from human feedback (Ayush, 2023)) instead of simple automatic metrics. As indicated further in this section, the safety and alignment problems are the starting point for the deployment of these approaches, which are much more time- and cost-consuming. Such is, for example, the case of InstructGPT - one of the most advanced language models today.
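
The exploration-exploitation balance described above can be illustrated by tabular Q-learning on a toy environment; the five-state corridor, the reward placed at its end, and the epsilon-greedy schedule are all assumptions of this sketch:

```python
# Tabular Q-learning on a toy 5-state corridor: the agent explores at
# random with probability eps, otherwise exploits its current estimates.
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate

rng = np.random.default_rng(0)
for episode in range(500):
    s = 0
    while s != n_states - 1:                     # rightmost state is terminal
        if rng.random() < eps:
            a = int(rng.integers(n_actions))     # explore uncharted territory
        else:
            a = int(Q[s].argmax())               # exploit current knowledge
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward only at the goal
        # temporal-difference update toward reward plus discounted future value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q)   # the "go right" column dominates after training
```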

New implementations of learning paradigms and new learning models keep evolving. The human mind, human cognitive approaches, and human motivation, which, as will be recalled, were an etalon and inspiration for McCulloch and Pitts (1943) at the very origins of “learning machines”, do not cease to inspire current R&D either. Encompassing the dopamine-releasing mechanisms, motivation by possible value and actual expectation, curiosity, a self-motivated desire for knowledge, imitation and interactive imitation, self-imitation and transcendence, countless facets of motivation and reward have undergone R&D in this regard. Random network distillation (RND) algorithms using prediction errors as a reward signal (Christian, 2020a, b), algorithms approximating a state-value function in a Q-learning (“quality”-learning) framework (Mnih et al., 2022), or knowledge-seeking agents (Christian, 2020c) are the results. In such algorithms, AI agents become able to come up with their own objectives, measuring intelligence, in the end effect, in terms of how things behave - not in terms of the reward function (Christian, 2020d).

Imitation-based learning provides three distinct advantages over trial-and-error learning: efficiency, safety, and - which also renders it promising for AI's deployment in AECO - the ability to learn things that are hard to describe (Christian, 2020e). Moreover (again promisingly for “machine-learning-driven architecture”), learning by imitation zooms in on the (design) process - “how things come to existence” - instead of the output - “how things shall be”. This seemingly petty distinction will introduce in (3) a branching on the path to efficient deployment of AI in AECO design - architectural design in particular: a branching that the efforts so far have not noticed. Section (4) of the paper discusses this so far overlooked path as the game-changer that can unleash the potential of deploying AI in AECO.

Another “next level” of the extrinsic-reward-free schemes comes with interaction that allows the algorithm to work properly while requiring remarkably little feedback, as the dataset aggregation (DAgger) of Ross et al. (2011) has shown. Brought closer in the Touchstones and trailblazers subsection, self-imitation and transcendence render the “top” of today's learning schemes (Christian, 2020f).

Imitation learning is a framework for learning a behavior policy from demonstrations. Usually, demonstrations are presented in the form of state-action trajectories, with each pair indicating the action to take at the state being visited. To learn the behavior policy, the demonstrated actions are usually utilized in two ways. The first, known as behavior cloning, treats the action as the target label for each state and then learns a generalized mapping from states to actions in a supervised manner. The other, known as inverse reinforcement learning, views the demonstrated actions as a sequence of decisions and aims at finding a reward/cost function under which the demonstrated decisions are optimal. Finally, a newer methodology, inverse Q-learning (Q for “quality”), aims at directly learning Q-functions from expert data, implicitly representing the rewards under which the optimal policy can be given, similarly to soft Q-learning (Papers with Code, 2023). All these schemes rely on Markov decision processes, where the goal of the apprentice agent is to find a reward function: either, in reinforcement learning, one derived from the expert demonstrations that could explain the expert behavior, or, in inverse reinforcement learning, the agent's objectives, values, or rewards inferred by observing its behavior (Gonfalonieri, 2023).
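
Behavior cloning, the first of the ways named above, reduces in its simplest form to supervised learning over state-action pairs; in the sketch below, a hand-coded linear rule stands in for a real demonstrator (all names and the rule itself are assumptions of the example):

```python
# Behavior cloning: treat each demonstrated state-action pair as a
# labeled example and fit a mapping from states to actions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
states = rng.normal(size=(500, 3))                       # visited states
expert_actions = (states @ np.array([1.0, -0.5, 0.2]) > 0).astype(int)  # demo policy

policy = LogisticRegression().fit(states, expert_actions)   # the cloned policy
print("agreement with the expert:", policy.score(states, expert_actions))
```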

Last but not least, dubbed the dark matter of intelligence, self-supervised learning renders a promising path to advance ML. As opposed to supervised learning, which is limited by the availability of labeled data, self-supervised approaches can learn from vast unlabeled data (Balestriero, 2023).

2.2 Artificial neural networks

Approaching the mentioned GAN technologies: artificial neural networks are a type of ML model that can perform various tasks, including deep learning, Bayesian learning, and more. An artificial neural network is a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons. An artificial neuron that receives a signal processes it and can signal the neurons connected to it. A deep neural network is an artificial neural network with multiple layers between the input and output layers; in a shortcut, a deep neural network is what makes machine learning deep learning (Bishop, 2020; Chaillou, 2022a; Datamind, 2022).

Designed by Ian Goodfellow and his colleagues in 2014, the GAN is a class of ML frameworks. Representing the recent state of the art, the GAN is a milestone of R&D launched by Warren McCulloch, neurophysiologist and cybernetician of the University of Illinois at Chicago, and Walter Pitts, a self-taught logician and cognitive psychologist. Building on Alan Turing's work On Computable Numbers (1937), McCulloch and Pitts's foundation-laying paper A Logical Calculus of the Ideas Immanent in Nervous Activity (1943) set a path to describing cognitive functions in abstract terms, showing that simple elements connected in a network can have a huge computational capacity.

2.3 Inferring by computing

Building upon “the founding fathers'” achievements, the idea of the GAN draws on the evolutionary-biology principle of an arms race between two species. Two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss. The core principle of a GAN is “indirect” training through the discriminator - a competing network agent that can tell how “realistic” the input seems, and which itself also updates dynamically. This means that the generator is not trained to minimize the distance to a specific image but rather to fool the discriminator. This enables the model to learn in an unsupervised manner; however, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning (Leach, 2022b).
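
The mechanics can be made concrete in a short PyTorch loop; the one-dimensional Gaussian “real” data, the network sizes, and the hyperparameters below are assumptions of this sketch, not a reference implementation:

```python
# Minimal GAN loop: the generator is never shown a target sample; it is
# trained only to fool the discriminator, which is trained in parallel.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # noise -> sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0        # "real" data: N(2.0, 0.5)
    fake = G(torch.randn(64, 8))
    # discriminator step: label real as 1, fake as 0
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    # generator step: make the discriminator call fresh fakes "real"
    loss_g = bce(D(G(torch.randn(64, 8))), torch.ones(64, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())     # drifts toward the real mean 2.0
```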

An artificial neural network works by computing. In essence, two principles of computing apply in artificial neural networks: feedforward computing and backpropagation. The goal is always to train the generated models to cope with the criteria inserted, typically, through vast collections of sample datasets. Feedforward computing refers to a type of workflow without feedback connections that would form closed loops; the latter term marks a way of computing the partial derivatives during training. When training a model in the feedforward manner, the input “flows” forward through the network layers from the input to the output. By contrast, when using backpropagation, the model parameters update in the opposite direction: from the layer closer to the output to the layer closer to the input. However, a backpropagation algorithm does not equal the training algorithms that provide the model's parameter updates; backpropagation is a strategy to compute the gradient in a neural network. Backpropagation is a general technique; in terms of neural networks, it is not restricted to feedforward networks - it works for recurrent neural networks as well (Barreto, 2022).
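
The distinction drawn here - the forward flow of data versus backpropagation as a gradient computation separate from the parameter update itself - can be seen directly in a few lines of PyTorch (the network shape and the random data are arbitrary assumptions):

```python
# Feedforward pass, backpropagation, and the (separate) training update.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
x, target = torch.randn(32, 4), torch.randn(32, 1)

pred = net(x)                          # feedforward: input flows toward the output
loss = nn.functional.mse_loss(pred, target)
loss.backward()                        # backpropagation: gradients, output -> input
print(net[0].weight.grad.shape)        # gradients are now populated layer by layer

torch.optim.SGD(net.parameters(), lr=0.01).step()   # the update is a separate step
```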

“Fed” by inputs, the networks deliver outputs “at the end” that try to mimic the deliverables of human work. The principle is that artificial networks deliver relentlessly, very quickly, and in huge quantities - as opposed and in contrast to humans. The vision is that amidst these quantities, outputs emerge in no time at all that not only mimic but attain, if not surpass, the quality of human performance. The vision still leaves something to a human - the choice of the most suitable output provided and its fine-tuning; but who knows - one day... Objectively, the evaluation of solutions against a given set of criteria is a task suitable for a computer, too - a task easier in principle than creation.

2.4 Networks' and techniques' evolution

GANs, a recent revolution in machine learning, provide results today that achieve appreciation, as Leach (2022c) puts it. First introduced in 1987, the pioneers were convolutional neural networks (CNNs), also known as shift-invariant or space-invariant neural networks, most commonly applied to analyze visual imagery (Wikipedia, b, 2022). The foundations of CNNs were laid in 1979, when Kunihiko Fukushima (1980) introduced the neocognitron, a hierarchical, multilayered artificial neural network proposed for Japanese handwritten character recognition and other pattern recognition tasks. ImageNet (2022), a groundbreaking project of the 2010s, builds on this technology. Graph neural networks (GNNs) are another field of recent research, aiming at the processing of graph data. And various applications of GANs are still emerging: FrankenGAN for urban-context massing, detailing, and texturing, Pix2PixHD by Nvidia for high-resolution photorealistic image-to-image translation, GAN Loci, or GauGAN (Chaillou, 2022b).
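
A convolutional network in its minimal form is only a few lines; the sketch below (input size, channel counts, and class count are assumptions) shows the characteristic sequence of shift-invariant filters, pooling, and a classifier head:

```python
# Minimal CNN: learned local filters, downsampling, and class scores.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # shift-invariant visual filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample, keep salient responses
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                  # scores for ten assumed classes
)
print(cnn(torch.randn(1, 1, 28, 28)).shape)      # -> torch.Size([1, 10])
```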

The “recent” indication in fact adheres better to recursive and (more popular) recurrent neural networks (RNNs). Recurrent neural networks are recursive artificial neural networks with the structure of a linear chain. Whereas recursive neural networks operate on any hierarchical structure, combining child representations into parent representations, recurrent neural networks operate on the linear progression of time, combining the previous time step and a hidden representation into the representation for the current time step (Wikipedia, c, 2022). In 1925, the Ising model by Wilhelm Lenz and Ernst Ising was the first RNN architecture, which, however, did not learn. Shun’ichi Amari made it adaptive in 1972; it was later also called the Hopfield network. In 1993, a neural history compressor system solved a “very deep learning” task that required more than 1000 subsequent layers in an RNN unfolded in time. Long short-term memory (LSTM) networks were invented by Hochreiter and Schmidhuber in 1997 and set accuracy records in multiple application domains. Around 2007, LSTM started to revolutionize speech recognition, outperforming traditional models in certain speech applications. In 2009, a connectionist temporal classification (CTC)-trained LSTM network was the first RNN to win pattern recognition contests when it won several competitions in connected handwriting recognition. LSTM also improved large-vocabulary speech recognition and text-to-speech synthesis and broke records in machine translation, language modeling, and multilingual language processing. LSTM combined with CNNs improved automatic image captioning (Wikipedia, d, 2022).
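
The recurrence described - combining the previous time step's hidden representation with the current input - looks as follows in PyTorch, with an LSTM as the recurrent cell (batch size, sequence length, and feature sizes are arbitrary):

```python
# An LSTM carries a hidden state across the time steps of a sequence.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=5, hidden_size=16, batch_first=True)
sequence = torch.randn(2, 30, 5)        # batch of 2, 30 time steps, 5 features
outputs, (h_n, c_n) = lstm(sequence)    # per-step outputs plus final hidden state
print(outputs.shape, h_n.shape)         # [2, 30, 16] and [1, 2, 16]
```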

Variational autoencoders (VAEs) develop another technique (Stewart, 2022). Unlike the GAN, with its generator-discriminator pair, a variational autoencoder combines two distinct operations - encoding and decoding. The encoder abstracts data by compressing them, while the decoder brings the data back to their initial format. Through the decompression, or “reparametrization”, the decoder generates variations of the modeled phenomenon (StackExchange, 2022). The ability to emulate a phenomenon by generating multiple versions of it is the starting point of the VAE's generative potential to provide large quantities of “outputs” (as AI enthusiasts heralding the twilight of design call them) - typically in furniture design, fashion, photography, architecture, and urban design (Wikipedia, e, 2022).
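
The encode-reparametrize-decode cycle can be sketched as follows (an untrained skeleton with arbitrary sizes; a working VAE would add the reconstruction-plus-KL loss and a training loop):

```python
# VAE skeleton: compress to a latent distribution, sample, decode back.
import torch
import torch.nn as nn

enc = nn.Linear(784, 2 * 8)      # encoder: input -> latent mean and log-variance
dec = nn.Linear(8, 784)          # decoder: latent code -> initial data format

x = torch.randn(16, 784)
mu, logvar = enc(x).chunk(2, dim=1)
z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparametrization trick
x_variation = dec(z)             # each sampled z yields a variation of the input
print(x_variation.shape)
```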

Opposed to the above-outlined discriminative and/or decoding techniques that identify objects and infer what is “real” and what is “fake”, generative AI systems create objects such as pictures, audio, writing samples, and anything that computer-controlled systems like 3D printers can build (Burke, 2022). Generative AI allows machines to create new works based on what they have learned from others. With such a straightforward deployment, a question arises as to how much the (so far existing) generative AI systems are truly AI-driven in terms of computational networks and processes; however, in the practical framework of this paper, the resolution is not of substantial importance. As a principle, generative and discriminative or decoding systems most often operate paired in GAN models, setting the business-as-usual rather than the state of the art of today's AI industry. Typically, a system labeled as generative AI is self-learning, uses unsupervised learning (but can use other types of ML, too), and deploys anomaly detection and problem solving - it can come up with innovative solutions or approaches based on its experience with similar problems in the past (Schmidt, 2022).

2.5 General context to compare

There are ecosystems of natural language processing, image processing, voice processing, gaming, code or software processing and development, further robotics, and expert systems or business intelligence (Christian, 2020g; Firth-Butterfield, 2022; Gozallo-Birzuela & Garrido-Merchan, 2023; Rijmenam Van, 2023; Stylemania.it, 2023). These ecosystems have existed, evolved, and (some of them) worked (though sometimes obscured, even covered up) for decades already, and they render mature. It is a real influx of ever-new AI tools that the present is experiencing (Kilian, 2023; Martin, 2023; Pichai, 2023; Storm, 2023; Urban, 2023).

Transversally to the closed corporate releases of large language models, there have also been clever ideas and breakthroughs in the field of natural language processing. A new training strategy of meet-in-the-middle (MIM) (Nguyen et al., 2023) has been shown to improve not only the performance but also the interpretability, and thus the security, of the large language models that the subsection after next will introduce. Significant progress has arrived in computer vision, in both diffusion models and neural radiance fields (NeRF) (Mittal, 2023) - a type of ML algorithm used for 3D modeling and rendering, based on deep neural networks capable of generating high-quality, photorealistic images of complex scenes from multiple viewpoints. The new MeshDiffusion (Liu et al., 2023) allows direct generation of 3D meshes without any post-processing, and the new FateZero (Qi et al., 2023) can edit the style of videos using text while keeping the pre-trained model weights intact. Last but not least, a promising marriage of NeRF with CLIP (contrastive language-image pre-training) (Weng et al., 2023; Zheng et al., 2023) has arrived: LERF (language embedded radiance fields) (Kerr, 2023). With it, natural language queries can be applied in a 3D fashion within NeRF, targeting different objects in the scene. This brief overview highlights several new types of algorithms that may represent the first steps toward productive prospects for deploying AI in AECO. These algorithms, developed (most likely) with no regard to AECO, may break out of the misconceptions that have so far dominated efforts in the field, as section (3) of this paper will show.

2.6 Touchstones and trailblazers

Designed by Alex Krizhevsky with Ilya Sutskever and Geoffrey Hinton (Krizhevsky et al., 2012), AlexNet set a benchmark in image recognition. A composition of eight layers - the first five convolutional, some of them followed by max-pooling layers, and the last three fully connected - AlexNet competed successfully in the ImageNet Large Scale Visual Recognition Challenge on September 30, 2012, achieving a top-5 error of 15.3%, more than 10.8 percentage points lower than that of the runner-up (Gershgorn, 2017). During training, the then-novel graphics processing units (GPUs) delivered the high computing performance demanded by the (then) exceptional depth of the model, which was essential for its high performance. Having spurred many more papers employing CNNs and GPUs to accelerate deep learning, the paper introducing AlexNet is one of the most influential in computer vision. According to Google Scholar (2023), the AlexNet paper had over 120,000 citations as of early 2023.

When IBM's Arthur Samuel developed an ML system for playing checkers in 1959 (Christian, 2020h), he used 38 considerations determining the strength of a position - the number of pieces on each side, the spatial distribution of the stones, mobility and space, safety and risks, and so on. By 1990, the IBM team working on the chess supercomputer Deep Blue used 8 thousand such considerations (Campbell et al., 2002). This chess evaluation function … probably is more complicated than anything ever described in the computer chess literature, as the team lead Feng-hsiung Hsu put it - and it deserves noting, in this paper's framework, that perhaps similarly complicated is the structure of considerations on - let's say - the spatial layout development of a residential building… Moreover, in Deep Blue, those thousands of considerations were brought into balance neither by trial and error (as would be typical for reinforcement learning) nor by human labeling of diverse alternatives (as in supervised learning) but by imitation of human moves, employing one of the novel machine-learning technologies also promising in terms of AECO deployment, as recalled earlier in this section.
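
The “considerations” idea translates directly to the building analogy drawn above: a position, or a layout, scored as a weighted combination of hand-crafted features. The sketch below is purely hypothetical - the feature names and weights are invented for illustration, not taken from any of the cited systems:

```python
# A linear evaluation function over named "considerations"; in Deep Blue
# the weights were balanced by imitating human moves rather than set by hand.
def evaluate(features: dict[str, float], weights: dict[str, float]) -> float:
    """Score a position (or a layout) as a weighted sum of its features."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical residential-layout considerations, echoing the analogy above.
layout = {"daylight_hours": 5.2, "circulation_ratio": 0.18, "unit_mix_fit": 0.9}
weights = {"daylight_hours": 1.0, "circulation_ratio": -4.0, "unit_mix_fit": 2.5}
print(evaluate(layout, weights))
```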

Fifteen years later, DeepMind's AlphaGo system finally implemented Arthur Samuel's vision of a system that could concoct its own positional considerations from scratch. Instead of being given a big pile of thousands of handcrafted features to consider, it used a deep neural network to automatically identify the patterns and relationships that make particular moves attractive, the same way AlexNet had identified the visual textures and shapes that make a dog a dog and a cat a cat (Christian, 2020i). Hence, again, an inspiration for AI-led architectural pattern-based design and analysis, which section (4) will discuss later.

As implemented already in Deep Blue, focus on the process instead of the output/result provides another, even more important lesson for AI's deployment in AECO design. In October 2017, Google DeepMind brought this paradigm to a (so far) ultimate level by going through with the playing-against-itself strategy in AlphaGo Zero (AlphaGo, 2023b).

AlphaGo combines advanced search trees with deep neural networks. The “policy [neural] network” selects the next move to play, and the “value network” predicts the winner of the game: a reinforcement-learning paradigm. Initially, the developers introduced AlphaGo to numerous amateur games to help it develop an understanding of the play. Then it played against different versions of itself thousands of times, learning from its mistakes. Over time, AlphaGo improved, becoming increasingly stronger and better at learning and decision-making. AlphaGo went on to defeat Go world champions in different global arenas and arguably became the greatest Go player ever (AlphaGo, 2023a). Introduced in the journal Nature on October 19, 2017, AlphaGo Zero is a next-level version of the Go software, created without using data from human games and stronger than any previous version (Silver et al., 2017). By playing games against itself, AlphaGo Zero exceeded all the old versions of AlphaGo within 40 days (AlphaGo, 2023b).
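
The two networks can be pictured as one two-headed model - a schematic sketch only, not AlphaGo's actual architecture: a shared trunk feeds a policy head (a distribution over moves) and a value head (the predicted winner):

```python
# Schematic policy-value network: shared trunk, two heads.
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    def __init__(self, board_cells: int = 19 * 19):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(board_cells, 128), nn.ReLU())
        self.policy_head = nn.Linear(128, board_cells)   # a score per possible move
        self.value_head = nn.Linear(128, 1)              # expected game outcome

    def forward(self, board: torch.Tensor):
        h = self.trunk(board)
        return self.policy_head(h).softmax(-1), self.value_head(h).tanh()

probs, value = PolicyValueNet()(torch.zeros(1, 361))     # an empty board
print(probs.shape, value.item())   # move distribution and winner estimate in [-1, 1]
```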

2.7 The black-box problem, security issues, and a threat to humanity

With all these nice results, it's not clear what these models are learning, as Matthew Zeiler puts it (Christian, 2020j).

Leaving aside (in this framework) the vulnerability of AI applications to various types of attacks and data “poisoning” allowing for unauthorized access to or control over the system, the issue is the complexity of AI algorithms and models, which are more or less impossible to interpret or understand. The black box inflicts a lack of transparency and accountability on AI decision-making, as it can be unclear how the system arrived at a particular output or decision. Obviously incorrect or misleading deliverables, causing unintended consequences or biases and raising ethical and legal concerns, appear rather often. Inherently embedded in the nature of the learning process, an absence of the categories “true” and “false” determines AI-driven decision-making. The algorithm typically only deduces the degree of conformity or deviation according to patterns arrived at by its own judgment, either without human supervision or under human direction, but always covertly in its details. In cases - not exceptional - when subsequent analysis reveals systematic or occasional inaccuracy of the outputs, the cause usually shows to be a confusion of cause and effect in the training data, one that diverts attention to the background of graphic inputs (bokeh) instead of their core, or something similar.

The bokeh-salience feature of AI provides a comprehensive clarification of the “famous” Coop Himmelb(l)au “machine hallucinations” (Bolojan, 2022; Coop Himmelb(l)au and Meet DeepHimmelb(l)au, 2023; Leach, 2022a). Figure 1 depicts not the creativity of AI but a misleading perception of visual information hidden in the black box of the algorithm; not creativity but an error and an accident. Computer hallucinations by unintended bokeh salience are examples of how the technology can be misused to manipulate and misinterpret visual information, whether to fake art or to distort scientific research. The AI development community deserves credit for looking for, and already delivering, the first applications that address this problem, which nevertheless remains far from solved. The goal is to build trust in AI systems by making their decision-making processes more transparent and understandable. Interpretability is the starting point for identifying and addressing a system's biases or errors; it comes through various methods, such as visualizations, explanations, or feature importance measures. Diverse techniques are starting to diagnose problems with a network's training, identify biases in its decision-making, or optimize its performance for a specific task (Christian, 2020k; Pandey, 2022). Nonetheless, achieving interpretability can sometimes come at the cost of the accuracy or performance of the system (Christian, 2020l).
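
One of the interpretability methods named above, feature importance, has a particularly simple form: permute one input at a time and measure how much the model's score drops. A sketch on synthetic data follows (the data and the model choice are assumptions of the example):

```python
# Permutation feature importance: shuffle each column, watch the score drop.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)           # only feature 0 actually matters

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)          # importance concentrates on feature 0
```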

Fig. 1 Cloud augmentation of Deep Himmelblau universe © Coop Himmelb(l)au. https://coop-himmelblau.at/method/deep-himmelblau/ (accessed Aug. 20, 2023)

And fears remain. At the end of March 2023, Italy blocked ChatGPT (Zorloni, 2023) to secure people's privacy, and tycoons of global business called for pausing “giant AI experiments” - read: the development of AI - for six months (Harari et al., 2023) to prevent an unmanaged reaching of the singularity phase of AI development, when spontaneous technological growth breaks out and not only does society begin to be irreversibly changed by the effects of the technology, but humanity loses all control over the further development of AI.

3 Results of applying AI in AECO

For a paper headlined Architectural, the preceding section may seem too extensive in terms of both general orientation and scope. However, the state-of-the-art R&D concerning the application of AI in architecture shows the opposite: only awareness of the vast achievements of the other fields allows comprehension of how sidelined (not only in terms of AI) AECO is and, especially, how great the opportunities for the development of the branch are.

A reason for AECO's, and in particular architecture's, sidelining in this regard is the multiple dimensionality and diachrony of architecture, which contrast with the nature of the fields of AI's biggest successes - one-dimensional language or two-dimensional images. The dimensionality issue appears (close to) obvious - as opposed to the complexity issue: architecture and the development of the built environment are complex to a degree that, in practical terms, makes efforts to describe them unfeasible. That should turn attention to the (relatively novel, presented in (2)) imitation-based learning or, more generally, to machine-learning algorithms elaborating (design) processes instead of results. As will be shown later in this section, (process-)imitation-based machine learning should, but so far does not, attract attention in the AECO realm, which remains the domain of GANs.

Compared with the double challenge - an additional dimension plus a diachronic nature, and the extreme comprehensiveness of the task - the recent and contemporary application-development efforts appear all the more daring, no matter how unsuccessful a class of them is. Or is it naivety?

3.1 AI models and creates architecture – does it?

Among multiple others, Zaha Hadid's studio, too, met AI, using the technology to render forms not free enough to cease resembling the antique temple patterns that served as the imagery datasets feeding the GAN (Zaha Hadid Studio, 2023). In doctoral research under the supervision of Patrik Schumacher of ZHA in 2017, Daniel Bolojan created the Parametric Semiology Study, using ML algorithms and other tools of gaming AI implemented in Unity 3D to model the behavior of human agents in order to test the layout of a proposed space (Leach, 2022d).

Stanislas Chaillou (2022c), the Nvidia Company, and others provide AI applications to generate floorplans and apartment layouts. ArchiGAN uses generative networks to create 2D and 3D building designs based on input parameters such as dimensions and space requirements. Another model is CityGAN, which generates drafts of city blocks and buildings. From a practical point of view and concerning the efficiency of deployment, the results of both applications are questionable - as in all other similar cases. On the principle of image-to-image translation with conditional adversarial networks (CANs), the Phillip Isola (2023) Research Group provides series of machine-generated facades following the “style” and character of the pattern deployed as the “input” (Chai et al., 2023; Chaillou, 2022c). Introduced by the same team, Pix2Pix (2023) is shorthand for an implementation of generic image-to-image translation using CANs. Developed in 2019 by Kyle Steinfeld (2023a), GAN Loci generates perspective images of urban-like scenes assembled from given facade-like textures, pathways, street furniture, pedestrians, cars, etc., trained to achieve the required “mood” - suburban, public park, etc. (Chaillou, 2022e; Steinfeld, 2023c). Blending the outcomes of Isola's team's and Steinfeld's R&D, Sketch2Pix provides an interactive application for architectural sketching augmented by automated image-to-image translation (Steinfeld, 2023b). Unfortunately, these applications - and others to come - so far only render the limits of output-focused algorithms aiming to contribute to architectural design.

Thom Mayne of Morphosis employed AI to develop operational strategies to generate output that could never be predicted. The studio developed Combinatorial Design Studies: a Grasshopper definition of one formal study, elaborated by GAN technology, provided a range of further combinatorial options (Leach, 2022e). Foster+Partners (2022), another global-star architectural studio, cannot stay aside; in its Applied R + D team, architects and engineers, together with expert programmers, combine the best of human intuition and computational rigor, working with new technologies such as augmented reality, machine learning, and real-time simulation.

In terms of practical use, predictive simulations render the etalon. ComfortGAN, for example, investigates the challenges of predicting a building's indoor thermal comfort (Quintana et al., 2020). Structural design, too, is on the lookout for AI. Using variational autoencoders, for instance, research at MIT investigates how AI can generate diverse structures while ensuring performance standards (Chaillou, 2022f). However, due to the essential material liability of structural design, the not-yet-solved problems of the algorithm's black box, which do not allow one to rely on the machine, so far curb the deployment of AI in structural design to theory and conceptual drafting.

On the urban scale, attempts are ongoing to contribute by generating “typical style” road and circulation patterns and networks using - among others - Neural Turtle Graphics (Chaillou, 2022g; del Campo & Manninger, 2019). Over the past decade, the deployment of online platforms has provided an adequate infrastructure to end users (Tian, 2022), also for deploying generative AI: Spacemaker (Kyle, 2022; Spacemaker, 2022), Cove.tool (2022), Giraffe (2022), or Creo: Design (2022) are a few examples of this growing ecosystem, offering simplified access to AI-based predictive models, generative design, augmented reality, real-time simulation, additive manufacturing, and IoT to iterate faster, reduce costs, and improve the product (Chaillou, 2022h).

Not only start-ups, academia, and spin-offs of global architectural star studios go in for AI: the global CAD tycoon Autodesk runs a Machine Intelligence AI Lab - and much of Autodesk's software, including Fusion 360, is AI-enabled and applying generative design today (Leach, 2018a), not to mention the acquisition of Spacemaker (TechCrunch, 2022). Nonetheless, as broad as all this listing may seem, the development of AI for AECO is still in its infancy, failing to catch up with LLMs (large language models), text-to-image processing, and the deployment of AI in internet search, content placement, and advertising, but also in healthcare, pharmaceuticals, insurance, or justice referring to custody and bail (Christian, 2020l).

3.2 Assessment of results achieved in AECO

The overview of the results of applying AI in architecture achieved so far renders a shattered picture: this is neither an accident nor a lack of care by the author. When AI performs well concerning the parametric aspects of diverse materializations of architecture (such as construction, energy efficiency, daylighting, or noise in buildings and neighborhoods) and fails to be effective and productive in conceptual architectural design, it is not only a temporary swing in the performance of particular efforts. It is a consequence of AI applications' developers failing to grasp and follow the starting points and the workflow of creating architecture, whether on the scale of buildings or of the built environment. Only by overcoming this problem will the way open up to the efficient and prolific deployment of AI in architecture and in AECO broadly, as section (5) discusses.

Without questioning the skills and ingenuity of their authors, the results of the Phillip Isola Research Group and Kyle Steinfeld, or the “typical style” road and circulation patterns and networks delivered by Neural Turtle Graphics, may be interesting outputs of research efforts in computer science, code development, or graphics; however, in terms of architectural and AECO workflows and solutions, they are only minor contributions. Similarly, the parametric semiology outcomes of Daniel Bolojan or Thom Mayne's operational strategies render too speculative to provide a practical analytical starting point. The results of DeepHimmelb(l)au and ZHA alike show outputs of hundreds of hours of dedicated work by talented multi-expertise teams: outputs (in terms of conceptual approach and contribution - leaving aside the “video show” that has little to do with architecture) that the principal of the studio would sketch by hand within half an hour or so - and, as opposed to the AI, he would consider the spatial and operational concept represented by the sketch. Is all this just the situation of a developing field that needs more time and effort to mature and deliver useful results? The discussion (4) will confront such a perspective with the option that it is a dead end of the state-of-the-art AI in architecture and AECO.

On the other hand, the value of the deliverables provided by the AI of Spacemaker, Cove.tool, or Creo appears ambitious. Starting from a better organization of the working environment of a design engineer, Creo contributes to the productivity and efficiency of the engineer's work through model-based definition, simulations, and additive and subtractive modeling and manufacturing; Creo fosters the creative potential of a designer by means of generative design (PTC, 2022). Similarly, Cove.tool delivers performance data of building solutions in real time, employing the power of AI (FinancesOnline, 2022). Cove.tool is a cloud-based network of tools that provides interconnectivity within teams working in the design and pre-construction cycle on issues of daylight, carbon footprint, climate, geometry, HVAC, cost, or performance.

Also famous as the 240-million acquisition by the AEC software tycoon Autodesk, Spacemaker not only gives architects and developers the automation superpower to test design concepts in minutes and explore the best urban design options. It enables users to quickly generate, optimize, and iterate on design alternatives, all while considering design criteria and data like terrain, maps, wind (as Fig. 2 depicts), lighting, traffic, and zoning, with the help of AI. Utilizing the full potential of the site from the start, it allows designers to focus on the creative part of their professional work (Harouk, 2023).

Fig. 2 Spacemaker: Microclimate Analysis. https://www.bing.com/videos/search?q=spacemaker&doc.id=603541543342970728&mid=980959C6323F8F58FE53980959C6323F8F58FE53&view=detail&FORM=VIRE (accessed Aug. 21, 2023)

However, the practical deployment of Spacemaker raises doubts: the workflow is the issue. The user enters the address of the location and the boundaries of the territory; with the help of an open database like OSM, the algorithm generates the terrain, existing buildings, and structures; the accuracy of the objects generated depends on the available data, the quality of which varies from territory to territory. Nevertheless, so far so good. Then the user defines the area to solve and can add roads - only manually, as an import from a CAD is not available. Buildings can be placed either manually, by inserting individual floorplans as objects that cannot be subject to later adjustments, or generated automatically based on the input parameters entered: width, height, object shape, minimum/maximum number of floors, and/or the mix of apartment sizes. Then the user can assess the generated options based on gross and/or net floor area totals. He can also further modify the chosen option by some of the spatial transformations: shift, rotation, ... Spacemaker evaluates the final solution in terms of noise, wind, sunlight, daylight, and microclimate. Exports of the evaluation to Excel and of the designed model to Autodesk Revit or to the .ifc format are available. All these are valuable and efficient functionalities. However, the workflow - how the design comes into existence - as Pavel Shaban (2023) of MS architekti, a Prague-based architectural studio, claims, may suit a shortsighted real estate trafficker but is far from a creative and responsible architect's workflow: in a nutshell, a comprehensively sustainable built environment develops along a grid-and-grain public space structure, and not vice versa. The public space - streets, squares, parks, places, public amenity areas, ... - has to be designed carefully, responsibly, considerately, and poetically first, to adopt particular buildings only afterwards (Sourek, 2014). However, this is a process that Spacemaker not only does not support but does not even allow.

AI applications to generate floorplans and apartment layouts emerge as ambivalent when it comes to the effectiveness and practical usability of their outputs. The quality and usability of the deliverables seem similar to the performance of creative applications such as DeepHimmelb(l)au: hundreds of options are delivered to ease the architect's task by taking over the load of mechanically generating various options, only for them to be finally considered by the architect (Chaillou, 2022h); nonetheless, a vast majority, or rather all of them, appear prematurely published if not useless when reviewed. On the other hand, when furnishing prepared layouts with furniture, more satisfactory results emerge: as a rule, a generative AI tool shows itself more capable than a conceptual one. The choice of the (most) suitable of the options generated by AI shall remain reserved for the human architect as “the touch of the master's brush”, while the automatic and prompt delivery of “all thinkable” options saves his time and energy. However, what is the factual contribution of AI if no result of acceptable quality (without substantial further adjustments) addresses the human browsing the output set gained this way?

4 Discussion

AI is a super-parrot: it is superb at repeating what it has learned, explains Tomas Mikolov in a chat with Dan Vavra (2023). As already mentioned, concerning AI, learning or training is the keyword; for an AI's performance, the magnitude and comprehensiveness of the training dataset is the starting point, the algorithm is the method - or, running on an artificial neural network, the tool - and the computational performance is the limit.

4.1 Poiésis: architectural design beyond and within the AECO ecosystem

From architecture and urban design through construction and MEP (mechanical, electrical, plumbing), environmental, climatic, meteorological, and microclimatic expertise to transportation expertise, economy, demography, and sociology, multiple professions engage in the development of the built environment. The background of some of the fields is natural sciences, while for others it is social sciences or even arts - poetics or poiésis (Heidegger, 2000), as will be explained soon. According to the nature of the contribution provided by the respective expertise, the design and evaluation approaches range from “hard” to “soft”, from quantitative and material to qualitative and emotional. Depending on such an origin and nature, quantitative parameters define the approach as well as the output in some cases, while in others it is (something close to) feelings or moods. Obviously, feelings and moods resist parametric algorithmization as well as entering datasets. Consequently, a software ecosystem, artificial neural networks not excluded, inevitably fails to elaborate feelings and moods - as opposed to quantitative magnitudes and performances (Barker, 2023).

Approaching architecture as the most significant among the creators of the built environment, let us be clear: it is not a natural-science scheme, algorithm, or calculus that is architecture's starting point. Nor is architecture a linear sequence of signs - as opposed to speech or text. On the other hand, among many other attributes, architecture can be consumer goods, too; and the more consumer goods a practical architecture shall be, the more a pattern, a calculus, and an algorithm contribute to the delivery; but even then, the environment, the narrative of the development, and/or the people passing, entering, and using the building or the structure “make the difference”. The theory of public space puts it clearly: as soon as, and only when, exposed in public space, a construction becomes architecture (Sourek, 2014). In theory, architecture is unanimously distinguished from the arts. But even so, even when architecture shall not be an art like painting, sculpture, drama, dance, or literature, let us not be shy: it is poetics, or poiésis as Martin Heidegger coins it in ancient Greek, that is the starting point and method of architectural creativity. Poetically dwells man, puts Heidegger (2000): full of merit, yet poetically, dwells man. Poiésis precludes algorithm and vice versa, and similarly, a training dataset limits poiésis. By definition and for practical reasons, a dataset can never be comprehensive. Then, it cannot but limit the creativity for which, inevitably, the training dataset is “the whole world” - there is nothing beyond.

Encyclopedia Britannica, too, distinguishes and confirms the emotional, social and societal, non-parametric nature of architecture: … the art and technique of designing and building, as distinguished from the skills associated with construction (Gowans et al., 2023). The characteristics that distinguish a work of architecture from other built structures are (1) the suitability of the work to use by human beings in general and its adaptability to particular human activities [and needs], ..., and (3) the communication of experience and ideas through its form. Obviously, “use by human beings”, “human activities and needs”, as well as “communication of experiences and ideas” cannot but resist algorithmization as well as digital parametrization.

Among all the types and natures of human creations, architecture intertwines the most with human consciousness; not by accident. Next to nature, it is architecture that creates the world of human existence. In the essay Poetically Dwells Man (Heidegger, 2000) - elaborating further his seminal opus Being and Time (Heidegger, 2006) and the theme of Dasein (being-there, or existence, in English) after the Second World War, in relation to the timely and pressing topic of housing, and of architecture by extension - Heidegger coins the concept of das Geviert, the fourfold in English: the union of the earthly and the heavenly, the human and the divine in man's existence and in the world of his being - thus, as we have seen, in architecture. This is not only another strong argument refuting the vision of architecture created by an algorithm. It is no coincidence that materiality manifests itself in both consciousness and architecture: materiality manifests itself in them in the same way and is a strong link between them. This recalls the dual nature of architecture - of ideas, emotions, and experiences on the one hand and of the material, the physical on the other - which slowly begins to lead to uncovering the feasible way of deploying AI in architecture and grasping its prospects.

Dalibor Vesely (2004) featured and critically reviewed another face of architecture's duality, starting with the heading of his groundbreaking book Architecture in the Age of Divided Representation: The Question of Creativity in the Shadow of Production (2004). Creativity can never be substituted by production; however, the material side of architecture - its physical properties in terms of microclimatic convenience, durability, security, ergonomy, operational efficiency, and sustainability - deserves and is keen to enjoy productivity - a productivity that is parametric and algorithm-inclined by nature.

So far in the field of AI in architecture, as in the whole AECO field, however, only analogous, parametrically oriented approaches have been witnessed (the differences between the diverse neural networks and AI algorithms outlined in (2) make no difference in this regard). Tackling data by a computational algorithm can provide poetics only by chance and randomly. It is not a question of learning or training; by definition, a poetic “output” cannot be trained. Even if bokeh salience offers a “hallucination”, it is not poiésis nor a creative act; it is just a random interpretation of the training data that we only subsequently realize was misleading. In conclusion, the idea of a creative contribution of AI to conceptual architectural design is debunked, and together with it the theoretical collateral and all the AI's outputs in the field so far. On the other hand, debunking as erroneous the vision of AI or an AI “superuser” replacing “the architect genius” (Leach, 2018b) should not prevent algorithmizing and machine-generating what fits; and that is the physical aspect of architecture.

4.2 When AI works

As opposed to fine arts, literature, poetry, dance, or drama, whose production is only consumed, architecture is always also used. This is not a denial of the poetic essence of architecture (recalled in the previous paragraphs); it is just a remark on the complex nature of architecture. Then, inevitably, two realms of architectural design and of a plan to build a building can be identified and distinguished: the first comprises properties and performances that concern (even though not exclusively) the (material) use, while the other delivers poiésis, poetry, mood, excitement, or experience. The interface between the two realms does not match the interface between architecture - as characterized by its concerns for use by humans, for human activities and needs, and for the communication of experiences and ideas through its form - and construction, which materializes the architecture. Set by architecture, the spatial structure of a building determines the ergonomy and efficiency of movement within the building: it is the architectural design, not the construction solution, that determines these material, quantitative parameters of the building.

Anything material can be parameterized, anything parametric can be quantified, and anything quantitative can be compared and evaluated objectively - or at least (very) close to objectively. And this is the case with a large part of an architectural design, a proposal of a building or an enclave of the built environment. The parametric properties of architecture are subject to creativity as well, but to another, one could say pettier, nature of creativity: to mimetics or mimesis, creativity stemming from imitation, as opposed to the primal creating of poiésis (Heidegger, 2006). The parameters-oriented approach certainly does not equal AI; however, this is the field of AI - a performance of a machine conditioned by training, which is nothing else than the mimetic elaboration of templates. This, naturally, can be the realm of feasible deployment of AI in architecture and in AECO in general: not a poietic creation by which a computer could replace an architect.

Concerning the quantitative, objective, and comparable assessment of the complex of diverse physical performances of the designs of buildings and the built environment - such as operational and energy efficiency, acoustics, ergonomics, daylighting, and other physical benefits that architecture provides to man, community, and society - the state of the art performs with mature tools related to particular parameters. This is what software applications like Cove.tool, Creo, Giraffe, Spacemaker, the applications used by MVRDV as outlined further in this chapter, and many other tools already introduced and proven in architectural and planning practice deliver, though not always distinguishing between the physical respects of architecture and those of construction. What has been lacking so far is, first, a drive to set and maintain a comprehensive list of all such parameters and, second, an approach to parametrize, quantify, and evaluate objectively, in a comparable way, the hitherto overlooked parameters. Feasible preview and assessment calculation procedures in these respects have perhaps been missing so far; however, equipped with the knowledge of AI, the paradigms of its deployment, and its potential in terms of data quantities and their processing, both the admitted inadequacy and the two shortcomings can be overcome.

Distinguished explicitly from architecture, construction is another story: parametric, “mathematical” by nature, the mimetic, imitative creativity of designing constructions welcomes algorithms and parametric patterns. Such is the starting point for the excellence of generative AI software systems, their leading computational approaches being optimization and optioneering, analysis and simulation. Examples of the most advanced applications are ETABS, SAP 2000, STAAD PRO, RAPT structural engineering software, SCIA Engineer (2023), or Tribby3D (Salehi & Burgueňo, 2023; Tribby3D, 2023). The structural design community, surprisingly at first sight, refrains from (over)using AI compared to many other expert fields. As shown in (2), the interpretability issue is a natural reason. Generative AI may be an approach deployed in multiple structural engineering software tools “from time immemorial”; however, a black box must not have the final say when it comes to responsibility, as in the case of structural design. So far, rule-based models are proving indispensable in this regard. The vital reliability of design tools for structural engineering is evidence of awareness of the risks that AI algorithms hide and of the knowledge of how to tackle them without giving up the wide possibilities and fundamental opportunities of applying AI in parametric generative design and solution optimization.

Mimetic, too, is urban design, its supportive disciplines being parametric by nature. Examples of the application of AI in the field have been overviewed in (3), zooming in on tools such as Spacemaker, Creo, or Cove.tool - in terms of both successful and contributing uses of AI and of misconceptions. The design-development approach represented by Spacemaker embodies a problem that tends to become general and affect many AI tools in the development of an initially viable and promising concept: the scissors open between the IT line and the user (i.e., designer) line of the tool's development, resulting in deficiencies concerning the workflow, starting points, and principles of designing.

At this point, Spacemaker "got spanked" on behalf of many other AEC software tools that are parametric and, in short, imitative by nature and yet are pushed to architects as creative tools - not so rare in today's practice. Fortunately, better cases have been witnessed, also in the deployment of generative AI. MVRDV, Dutch by origin and today a global architectural studio, shows up as a successful pathfinder in terms of AI use and development. In response to the need to push the limits of technological possibilities for the sake of innovative architecture, MVRDV NEXT - shorthand for New Experimental Technologies - was founded in the 2010s as an internal startup. Headed by one of the studio's partners, Sanne van den Burgh, a group of in-house specialists develops and implements computational workflows and new technologies. Through a mixture of project-based work and standalone computational research, they rationalize designs and setup configurations, unlock potentials on the urban and the particular building's scale, optimize workflows, speed up processes, and make projects more efficient and adaptable in the face of change. Represented by projects such as HouseMaker, VillageMaker, The Vertical Village, Barba, Space Fighter, or Porocity, and site-specifically Rotterdam Rooftops or FAR MAX for Bordeaux, their methods allow the studio to explore a future that is equitable, data-driven, and green (MVRDV, 2023). Specific, in-house developed applications allow for Design Optioneering using generative analysis and automated form-finding to expand options and drive design possibilities (as Fig. 3 illustrates), Rationalization standardizing complex geometries to maximize design ambitions and feasibility, Performance Evaluation employing simulations and analysis tools to evaluate impacts and minimize risks, or Advanced Geometries managing complex modeling tasks across design tools to facilitate efficient, fast-paced processes.

Fig. 3 Solarscape by MVRDV NEXT: an in-house developed Grasshopper/Raytracer-based simulation and negotiation tool to find high-rise areas and sunspot development opportunities; case study Rotterdam. https://www.bing.com/videos/search?q=mvrdv+next+solarscape&qpvt=mvrdv+next+solarscape&view=detail&mid=35848FD4186FEE4F94A035848FD4186FEE4F94A0&&FORM=VRDGAR&ru=%2Fvideos%2Fsearch%3Fq%3Dmvrdv%2Bnext%2Bsolarscape%26qpvt%3Dmvrdv%2Bnext%2Bsolarscape%26FORM%3DVDRE (accessed Aug. 21, 2023)

Awarded the best skyscraper of 2022 in the Emporis Award competition, the MVRDV Valley at the South Axis, the central business district of Amsterdam, is a showcase of the successful deployment of AI technologies alongside authentic architectural creativity. Machine analyses allowed for developing a rich, truly sculptural form and maximizing the efficiency of the use of land and space while ensuring generous sunlighting and daylighting of all apartments and providing them with views and livable garden terraces. In planning the project, a Grasshopper script optimized the architectural form and detail to make the construction economical and efficient and thus to provide for sustainability. Alongside the comprehensive use of information technology to analyze the tasks and opportunities and to support and streamline the creative design process, the rigorous avoidance of the terms AI and ML in the studio's communication is notable (van den Burgh, 2022).

4.3 Generating by patterns

Opposed to the "sky is the limit" architectures, whose form is often pre-defined neither by existing neighboring structures nor by short-term financial perspectives and approaches, there are architectures - buildings designed and constructed according to given spatial conditions, terms of future usage, and strict economic templates. In fact, this is the case with the vast majority of architectures - which, nevertheless, neither diminishes their importance nor makes the role of their architects less responsible and demanding. The vast majority of what is being designed, planned, and eventually built to saturate the needs of a growing population and rising living standards in terms of dwelling (residential buildings), work and production (from office buildings to production facilities), transport and logistics (logistic complexes and storage facilities, among others), and many other building typologies falls into the category of mass production and, in a sense, consumer goods. Such a categorization does not challenge the contribution of the respective authors, designers, and planners in terms of "creativity used", craft, and effort. Many such architectures launch their way to existence in architectural competitions, formal and non-formal, and not a few of them get their "five minutes of fame" on architectural websites and in magazines and exhibitions; nevertheless, they remain a "stardust", a sort of "no name" (except for specialist history scholars or local patriots); not to offend anybody, let us label them "production" architectures and production architects. However, in consequence, such architectures make up the complex performance of the built environment: more than 90% of the performance in terms of environmental impacts, sustainability, and macroeconomy, but also the majority of the performance in social, cultural, and economic terms. It is not the architectural icons but these production architectures that the entire population is exposed to on a day-to-day basis - at home, at work, at school, at leisure and social activities, when commuting, when going in for sports and recreation, and when walking pets, as well as when tackling household budgets. In terms of design and planning, the obvious richness of examples and models may balance the complexity of the multiple limits and constraints that intervene in the related design processes; however, most often, it does not make a creative architectural approach redundant or expendable. And it demonstrates the importance and potential contribution of comprehensive research and analysis of the huge volume of existing samples and inspiration - which, in reality, is far from being carried out comprehensively, if at all. At the same time, even if proceeding only from the knowledge of AI and its possibilities and limitations provided by this paper, it is no less evident that such research and analysis suits AI exceptionally well.

Little is more overlooked by recent and current efforts in the field of AI than this opportunity and challenge - both in architectural and planning practice and in research and in the development of tools and processing standards.

The "almost consumer goods" characteristic evokes approaches deploying algorithms (what could be more attractive for AI?), parameter pre-definition, and patterns in the design process. One of the first authors and researchers active in this field, already in the 1990s, was Makoto Sei Watanabe (Leach, 2018c); however, having focused on machine-aided design rather than on analyses and the use of patterns, he remained unsatisfied with what AI was able to deliver in terms of design compared to the intuition of the (human) architect. Others, like Immanuel Goh or Andrea Banzi, searched for explicit rules-scripting-based design generators working with inferred rules drawn from a dataset of samples: not a patterns' assessment and appropriation, but "recognizing the internal logic of the pattern, and then extrapolating a broader design based on that logic that could potentially continue forever ..." (Leach, 2018d) - in reality failing to contribute to design practice eventually.

Most recently, XKool, an AI startup in Shenzhen, China, developed a web-based platform for using AI across a range of tasks from architecture to urban design (Crunchbase, 2022). Though not so easy to use practically or to test for non-Chinese residents (XKool, 2022), the studio's approach and the results achieved so far by the application awaken hope of overcoming the lack of attention to the immense richness of patterns provided by the existing building stock and design representations. XKool appears the most efficient of all AI applications for architectural design, streamlining the design process and making it more efficient in terms of both analyzing a vast range of possibilities and generating designs (or, more accurately, pre-designs) based on samples - to evaluate and return the most suitable outcomes and, moreover, to develop them further according to the given constraints (Leach, 2018e). The way of working is revolutionary - and no worry that the outcomes, as a rule, do not look very novel: the core is that it copes with the "consumer goods" characteristics of the design category.

The mission to challenge "the architect genius" that has, so far, (almost) as a rule been the motivation behind the efforts to develop an AI-based design tool shows debunked by the XKool approach and the results achieved. In general, a new approach emerges consisting of AI "designing" by, first, delivering pre-designs, i.e., solutions close to the set parameters - as close as the available patterns allow - and then, second, "assisting" the human designer in adapting the pre-designs, customizing the final, specific solution; the nature of the "assisting" is quantitative, parametric assessment and feedback concerning the goals and results, including finding the system of criteria, specifying the particular criteria, and evaluating the criteria sets, which AI can develop and complete continuously. The patterns-oriented approach - once confirmed and developed, and once the stock of patterns (libraries of parametric examples, representations of hitherto existing solutions) is developed - promises to bring a paradigm change within AI in AEC that, consequently, could find the path to the AECO paradigm change needed both by AECO and by the community of its clients (which, in the end, includes the whole population).

4.4 Advice whispering

Not layout creation but "sampling" of generative patterns - already existing solutions selected by AI as the most suitable not only in terms of floor-plan and/or spatial solutions but in terms of structural solutions, too - appears the key. Based on the given goal parameters and constraints, an adaptation (human, though AI-supported) of the selected patterns follows.
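
A minimal sketch of what such AI-driven pattern selection might amount to at its simplest - nearest-neighbor retrieval over parametric representations of existing solutions - follows; the feature layout and the example values are assumptions of the sketch:

```python
import numpy as np

# Minimal sketch of pattern "sampling": each stored pattern is a parametric
# representation of an existing solution; retrieval returns the patterns
# closest to the goal parameters. Feature layout and values are assumed.
# Columns: [site_area_m2, floor_area_m2, storeys, structural_grid_m]
pattern_library = np.array([
    [2400.0, 9500.0, 6.0, 8.1],
    [1800.0, 5200.0, 4.0, 6.0],
    [3100.0, 14800.0, 9.0, 8.4],
])

goal = np.array([2000.0, 6000.0, 5.0, 7.5])

# Normalize per feature so no single unit dominates the distance.
scale = pattern_library.max(axis=0)
dists = np.linalg.norm(pattern_library / scale - goal / scale, axis=1)
ranked = np.argsort(dists)
print("best-fitting pattern index:", ranked[0])
```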

An ability to infer the properties of the solution toward which the design development is heading - whether led by a human or AI - stemming from experience gained by learning on a set of solutions is natural to both GANs and VAEs. The predictive inference can be available from the earliest phases of design - from the first sketch, in terms of how a human drafts and develops a design. AI can convey the inference continuously, in a way we can call whispering, providing the designer - human as well as AI - with comprehensive feedback on his or its design decisions and on the heading of the design. This way, the design will be optimized not in the mode of try - error - correction - another error - another correction - and so forth until the designer is satisfied with the feedback parameters or too tired to continue trying, which is the state-of-the-art today, but continuously. The effect in terms of time and cost spent and the quality of the solution achieved is obvious and huge; AI can never beat a human when it comes to true creativity - but no human intuition and experience combined compare to AI when it comes to parametric quantitative assessment and review. Here we go to the future of AECO: in essence, it is about utilizing the relevant knowledge, talent, and efforts of the entire community of architects combined with the computational force of AI. Naturally, both the selection and adaptation processes interweave with the evaluation of outcomes and adaptation solutions in terms of microclimate qualities - daylight and sunshine, or temperature stability - energy efficiency and consumption, acoustics, as well as area capacity and other qualities of the solution, in a process that addresses sustainability comprehensively in terms of environment, social aspects, and economy alike.
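
A toy sketch of the whispering principle follows: a predictor (standing in for a trained GAN/VAE-based surrogate) assesses every incremental design state and returns feedback before the next move; the state encoding and the predictor logic are illustrative assumptions:

```python
# Sketch of "whispering": after every design move, a surrogate predictor
# infers where the current state is heading and feeds the assessment back
# before the next move - continuous feedback instead of post-hoc review.
def predict_outcome(state: dict) -> dict:
    # Stand-in for a GAN/VAE-based predictor inferring final properties
    # from a partial design; here a trivial illustrative estimate.
    glazing = state["glazing_ratio"]
    return {
        "daylight_ok": glazing >= 0.3,
        "energy_penalty": max(0.0, (glazing - 0.5) * 100.0),
    }

design_moves = [
    {"glazing_ratio": 0.2},
    {"glazing_ratio": 0.35},
    {"glazing_ratio": 0.6},
]

for step, state in enumerate(design_moves, start=1):
    whisper = predict_outcome(state)
    print(f"move {step}: {whisper}")  # feedback accompanies every move
```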

Nonetheless, challenges remain: first (and above all), how to access the immense sum of records of preceding architectures when a paradigm of protecting "the know-how" by hiding the representations of the architecture designed exists. In this respect, the approach of the AECO community contrasts with the approach of the IT developers' community. IT developers are used to providing each other with their achievements in widely shared libraries; GitHub (2023), GitLab (2023), or Patternforge (2023), among many others, are the platforms. Those who profit are not only the particular IT developers, who can fulfill their tasks and achieve goals more quickly, with less effort, and at lower cost, while making the results of their previous work available costs them nothing; the whole field profits, developing more quickly and better, in a more efficient way, based on the joint efforts of all members of the community. The perspective of the benefit of free access to existing solutions - in particular, parametric representations of architectures both built and only designed - appears an incentive for reconsidering current approaches in architectural design - and in the whole of AECO, too.

4.5 R&D anew

After a decade of "challenging the human architect", the true potential of AI in AECO has only just been revealed; R&D at the threshold begins to specify problems and solve tasks. The results achieved so far by Stanislas Chaillou, Nvidia, and others, mentioned in (3), show that the hitherto ruling principle of lossy compression and subsequent "creative" decompression within supervised (or unsupervised) learning has exhausted its possibilities without being able to deliver truly usable results.

"I believe a statistical approach to design conception will shape AI's potential for architecture. … Pix2Pix uses a conditional generative adversarial network (cGAN) to learn a mapping from an input image to an output image. … to learn image mappings which lets our models learn topological features and space organization directly from floor plan images. We control the type of information that the model learns by formatting images. As an example, just showing our model the shape of a parcel and its associated building footprint yields a model able to create typical building footprints given a parcel's shape," Chaillou (2023) describes the supervised-learning strategy that reduces the essentially comprehensive architectural task to image processing. Leaving aside the hundreds of thousands of images, each labeled by humans, that are a precondition for the "statistics" to work properly, the strategy has proven a no-go in terms of both competent AECO workflows and computing power. When striving for a sustainable and feasible AECO design, images of pattern layouts cannot suffice; the parameters of the spatial structure of the proposed building, the parameters of the physical properties of its constructions, and finally the parameters of the internal environment of the object must come into "consideration". Given state-of-the-art ML, the size of a comprehensive training dataset for the presented approach would (significantly) exceed the Nth power of two, where N is the number of parameters specifying the ML task: thousands rather than hundreds of parameters when it comes to the comprehensive parametric and physical structure that materializes architecture - a building. Even if N were "only" in the lower hundreds, the number would have a hundred and more zeros - a googol: the question of computing power - or rather of optimizing the structure of the parameters - is immediately raised, as a googol exceeds the estimate of the number of elementary particles in the known universe.
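
To make the order-of-magnitude argument explicit - under the stated assumption that a covering dataset scales as the Nth power of two:

```latex
% A covering dataset over N binary-resolved parameters scales as 2^N.
% Already N ~ 333 reaches a googol, far beyond the ~10^80 elementary
% particles estimated in the observable universe.
\[
  2^{N} = 10^{N \log_{10} 2} \approx 10^{0.301\,N},
  \qquad
  2^{333} \approx 10^{100} \gg 10^{80}.
\]
```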

First, fundamentally new types of learning networks are needed; recent supervised and reinforcement schemes have proven unsuitable both in terms of the required performance and in terms of the working principle. Instead of "good old GANs", imitation-based learning, self-learning, and knowledge-seeking-agent schemes that focus on the (design) process - "how things come to existence" - instead of the output - "how things shall be" - shall be surveyed and developed for ML-aided AECO. As mentioned in (2), dataset aggregation, behavior cloning, inverse reinforcement learning, soft Q-learning, inverse Q-learning, self-imitation and transcendence, and others represent the field of hopefully eventually efficient R&D of ML in AECO.
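
As an indication of what the simplest of these paradigms, behavior cloning, amounts to, a minimal sketch follows; the linear policy class and the synthetic demonstrations are assumptions standing in for real design-process records:

```python
import numpy as np

# Minimal behavior-cloning sketch: learn a design *policy* (state -> next
# action) from demonstrations of how designs come to existence, rather than
# a mapping to finished outputs. States, actions, and the linear policy
# class are illustrative assumptions.
rng = np.random.default_rng(0)

# Demonstrations: encoded partial-design states -> designer's next moves.
states = rng.normal(size=(200, 6))
actions = states @ rng.normal(size=(6, 2))  # synthetic "expert" moves

# Fit a linear policy by least squares (stand-in for a neural policy).
policy, *_ = np.linalg.lstsq(states, actions, rcond=None)

new_state = rng.normal(size=(1, 6))
suggested_move = new_state @ policy
print("suggested next design move:", suggested_move.round(3))
```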

Second, the training datasets - predictably open-source platforms - pose questions of the assembly of their materials, their quality, and their size. A pragmatic optimization of the structure of the involved parameters appears a key task. The question of the data format with which the algorithm will work is crucial: it seems obvious that it should be one of the BIM formats. The basis of the algorithm structure could be - it seems - a pair of mutually interfering loops: a generative loop and an advice-whispering loop; or there can be more advice-whispering loops, particularized according to the diverse natures of the parameters, which will be "switched on" only in a cascade. In the beginning, a suitable pattern will be selected from the database, then tested and optimized against the specified outlines and a benchmark of independent parameters.
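
The proposed loop structure can be sketched as follows; every function body is a placeholder assumption, and only the interplay of the generative loop with the cascaded advice-whispering loops is the point:

```python
# Structural sketch of the proposed pair of mutually interfering loops: an
# outer generative loop adapts a selected pattern, while cascaded
# advice-whispering loops (one per parameter family) assess each iteration.
# All function bodies are placeholder assumptions.
def select_pattern(goal):          # retrieve the best-fitting pattern
    return {"glazing_ratio": 0.4, "grid_m": 7.5}

def adapt(pattern, feedback):      # generative step nudging the pattern
    pattern = dict(pattern)
    pattern["glazing_ratio"] += 0.05 if feedback.get("daylight_low") else 0.0
    return pattern

whisper_cascade = [
    lambda d: {"daylight_low": d["glazing_ratio"] < 0.5},   # daylight loop
    lambda d: {"cost_high": d["glazing_ratio"] > 0.7},      # cost loop
]

design = select_pattern(goal={"use": "residential"})
for iteration in range(5):
    feedback = {}
    for whisper in whisper_cascade:           # switched on in a cascade
        feedback.update(whisper(design))
        if any(feedback.values()):            # stop cascade on first alarm
            break
    if not any(feedback.values()):            # benchmark satisfied
        break
    design = adapt(design, feedback)
print("final design parameters:", design)
```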

Human-in-the-loop can be expected to be fundamental, together with an unprecedentedly streamlined generative nature of the algorithm. New classes of imitation-based learning, learning a behavior policy from demonstration, and self-learning paradigms zooming in on the design-development processes instead of the results (to be) achieved must be welcomed to "customize" the most suitable pattern into the requested final proposal. Starting from following the human-in-the-loop in the "customizing" phase, the algorithm shall learn by self-training to master the design process - hopefully, in the end, better than the man. To take up such a challenge in the realm of architectural design and building planning, lessons from the trailblazers reminded in (2), such as AlexNet, Deep Blue, and especially AlphaGo Zero, have to be learned. Evolutionary-algorithm approaches and genetic programming have to undergo a deep survey to be subsequently considered as an option. Such a concept ought not to be refused or underestimated by pointing out the gap between designing production buildings and playing chess or Go at the masters' level: the deployment of analogous algorithms deserves to be studied thoroughly first.
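
A minimal sketch of such a human-in-the-loop scheme in the spirit of dataset aggregation (DAgger) follows; the "human" is simulated by a reference function, and all specifics are assumptions of the sketch:

```python
import numpy as np

# DAgger-style human-in-the-loop sketch: the policy acts, the human corrects
# the states the policy actually visits, the corrections are aggregated into
# the dataset, and the policy is refit. The "human" is simulated here.
rng = np.random.default_rng(1)
true_expert = rng.normal(size=(4, 1))          # stands in for the human

def human_correction(state):                   # queried only on visited states
    return state @ true_expert

states = rng.normal(size=(20, 4))
labels = human_correction(states)

for round_ in range(3):
    policy, *_ = np.linalg.lstsq(states, labels, rcond=None)
    visited = rng.normal(size=(20, 4))         # states the policy reaches
    states = np.vstack([states, visited])      # dataset aggregation
    labels = np.vstack([labels, human_correction(visited)])

print("policy error:", float(np.abs(policy - true_expert).mean()))
```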

5 Conclusions

Diverse current and recent attempts and successes in approaching the deployment of AI in architecture and, more broadly, in AECO have been discussed in the previous chapters. The question of the particular field of AI's deployment that would be not only an interesting thesis for an academic or grant seeker but a helpful and feasible tool or working environment comes clear as the key; at the very core, queries and issues arise.

5.1 Can AI be truly creative?

Unless computers gain consciousness, there is an unequivocal answer to the question: no. Several reasons have emerged in the previous chapters. At this place, the inevitably only mimetic way a computer is able to work can perhaps eventually close the recent and present attempts, which cannot be but futile. Creativity, to be authentic and true, cannot be but poetic (Heidegger, 2006). The poetic principle requires consciousness together with intention: only consciousness together with intention is able to deliver poiésis. In terms of architecture and the built environment, consciousness is reserved for man or, more precisely, for Dasein, as Heidegger coined and proved. An algorithm, however complex and sophisticated the artificial network it works on, can deliver only on the principle of equality (or similarity, which, however, is only a deficient mode of equality) or by random choice. Face-to-face with new solutions, prior knowledge is the prerequisite - another aptitude reserved for consciousness, for a human, not for a machine, and not for an algorithm. No consciousness, no own will, and no true creativity, but algorithms and immense data searched through, assessed, and prioritized according to the defined criteria are the attributes of today's AI. And even state-of-the-art theory does not show a vision of how machines could overcome the shortcoming.

5.2 The question articulated more openly: can AI contribute directly to how authentic architecture comes to existence?

Yes; the more poetic creativity is excluded, the better the mimetic, imitative AI approaches fit the parametric nature of the physical and quantitative aspects of architecture, not to mention construction and other features of buildings and the development of the built environment, such as energy efficiency, construction cost, environmental footprint, durability, and others, all discussed in (4). Adding the parameters of humans' use of the built environment, such as the economy and efficiency of the layout, ergonomics, and others, the parametric realm representing the physical side of architecture becomes complete and can be regarded as a domain of AI. When it comes to AECO, AI can assess, quantitatively evaluate and compare, and (pre)design, too, everything except the sphere of poiésis, poetry, mood, excitement, or experience - as likewise discussed in (4). As a principle, the performance of AI in this regard is able to outperform any relevant human performance in terms of complexity, scope, accuracy, pace - and cost, of course.

5.3 AI in architectural design: the workflow and its execution

Shown in Fig. 4, the workflow of an AI-aided AECO design comprises two phases (as already discussed in (4)): first, processing generative patterns into a pre-design, a solution as close to the set parameters as the stock of generative patterns allows; and second, the final adaptation of the pre-design, customizing the final, specific solution. For the first phase, the keys are patterns, their stock in the form of open-source platforms, and AI-driven search algorithms identifying the most suitable cases/patterns by inferring what fits better and what less; in the second phase, it is (AI-driven) design-development support in the form of advice whispering and continuous and complex assessment and "feedback", as outlined in the previous paragraphs.

Fig. 4 AI-aided architectural design workflow

The formation of open-source platforms of generative patterns - parametric representations of the essential features of existing solutions (existing buildings or mature projects), first of all the spatial layouts - in sufficient numbers, with a richness of volume and quality, and with accessibility and transparency for search engines, is an obvious prerequisite for generative, pattern-based AI-aided design in AECO.

Second, however great the so far neglected benefit of learning from patterns is, inviting new learning paradigms - such as the imitation-based learning, learning a behavior policy from demonstration, and self-learning reminded in (4) - into the design process emerges as the ultimate and most promising challenge. After having learned from learning pioneers such as Deep Blue or AlphaGo Zero and having "studied" how the man-in-the-loop has adapted the best-fit pattern to a desired new solution, computers may eventually take over the routine of "mechanical" design and design-analysis tasks and deliver room for the architect's creativity. Obviously, designing production buildings is not as challenging as playing chess or Go at the masters' level. However, that does not stand contrary to deploying analogous algorithms; the advantage, in addition, can be that the imitation-based or self-training would not need to be so intense to start delivering useful results in designing production architecture.

Along with the development of AI's deployment in AECO design and planning - inherently comprehensive disciplines developed, as a rule today, in teamwork - new roles within the design and planning teams will emerge: among others, a "superuser" tackling the AI, an architect with a strong IT background or an IT expert with a strong AECO background. However, the "superuser" will replace the leading architect neither in the conceptual role nor in aesthetic respects; the "superuser" will economize the leading architect's efforts and energy in favor of indispensable creativity. The role will comprise competencies and responsibilities that render it not so far (though undoubtedly distinct) from today's BIM (Building Information Management) Coordinator.

5.4 Design reviews, evaluations of solutions and the security issue

Compared to design processes, AI-led design reviews and evaluations of solutions show to be a sort of business as usual, no longer (basic) research and experimental development. In (3), several existing applications of this nature have been listed - Cove.tool, Creo, Spacemaker, ...; and many others exist. However, the field is far from covered. Together with addressing other, so far sidelined attributes of the design solutions, raising the quality of the outputs delivered will be welcome. Another question is the comprehensiveness of the assessing and reviewing. The branch can develop separately - as has most often been the rule so far - or can integrate into the AI-aided-design environment - in the form of the advice whispering outlined in (4).

However, given the issue of (lack of) interpretability (addressed in (2)) and the state-of-the-art of the field, it is not AI that may have the last say. An ongoing advice whispering that leads the designer toward a benchmark - and sometimes "hallucinates" at him based on the training guidance in the saliency heat map - is one thing; another is the final inspection that establishes liability.

Comprehensiveness in assessing and reviewing is another issue. Accidents, crashes, and even catastrophes that could have been prevented if only the relevant aspects had been taken into consideration line the history of the development of the built environment - and of mankind. To minimize the occurrence and impacts of such events in the future, potentially threatening aspects shall be identified before they materialize. Taking lessons from health care (Christian, 2020e), AI shows itself ready to give a hand in this regard, too: the task is - as usual - first, to set up the training dataset; second, to apply suitable algorithms to specify the risks and their parameters; and third, to develop and apply adequate technical measures. Today's engineers are up to the third challenge, and computer scientists and developers are up to the second. The history of the built environment and of mankind's (and the biosphere's, too) survival provides the training dataset - however, only a part of the training stock; as a matter of fact, taking lessons from the past is what repeatedly shows insufficient. Today, however, AI has proven able to extrapolate the past processes into the future rather successfully - to predict the future, not 100% but still with unprecedented success. Thus, we need to deploy AI first to extrapolate a training pool based on history and current knowledge, and then, "in the second loop", for AI-driven specification and quantification of threats.
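
A schematic sketch of the proposed two-loop principle - extrapolating a scenario pool first, then scoring threats - might look as follows; the record format, the perturbation model, and the threshold are illustrative assumptions:

```python
import numpy as np

# Sketch of the "second loop": first extrapolate a synthetic scenario pool
# from historical records, then score each scenario for threat so that
# risks are specified before they occur. All specifics are assumed.
rng = np.random.default_rng(2)

# Historical events: [load_exceedance, material_degradation] (normalized).
history = rng.uniform(0.0, 0.6, size=(50, 2))

# Loop 1: extrapolate beyond observed ranges by perturbing history.
scenarios = history + rng.normal(0.0, 0.25, size=history.shape)

# Loop 2: flag scenarios whose combined severity exceeds a threshold.
severity = scenarios.clip(0.0).sum(axis=1)
flagged = scenarios[severity > 0.9]
print(f"{len(flagged)} of {len(scenarios)} scenarios flagged for review")
```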

5.5 Changing the game

A reminder of AECO's lagging behind global societal and economic development opened this paper. The previous chapters and paragraphs reveal the potential of feasible deployment of ML in these fields: an immense potential of diverse but interconnected ways of deployment that not only contrasts with the so far wavering approach to the new technology but can provide essential contributions to responding to existing social, cultural, economic, and environmental challenges.

The first to change is the paradigm of designing architecture and the built environment, and of the whole of AECO. The new paradigm will replace the "traditional" infinite spiral of proposal and error - assessment - a new proposal and error - and so forth. Once the ML tools - algorithms, models, platforms - outlined in the previous chapters apply appropriately, and as intensively as the parametric realm of architecture and the development of the built environment deserves, the process of designing - creativity unrestrained, on the contrary, unleashed - becomes not only significantly more efficient and productive and less time- and cost-consuming, but consequential and objective, too. The impacts on the comprehensive quality of architecture and the built environment show obvious, notable, and positive. The shift in the quality of the service and deliverables within the AECO professions gives a clear perspective of optimization of the outputs, which is likely to significantly exceed not only today's practice but also any expectation. Poetic creativity, essential for authentic architecture, will enjoy not a direct contribution but a subsidy through the creative energy, attention, and capacities released by the deployment of ML's mimetic capabilities in the parametric realm.

The perspective is realistic; it is not a task for the distant future - the implementation can start now, and the MVPs (minimum viable products) can be there in less than half a decade. The motivation is the economy in terms of the efficiency of the development of the built environment, a business opportunity in terms of filling a market niche, the comprehensive quality of our lives, and our sustainable future that will become boldly resilient - sustainable beyond today's state-of-the-art efforts.