1 Introduction

The architecture, engineering, construction, and operation (AECO) sector currently addresses a variety of challenges related to its continuing digitization, low production efficiency in construction, sustainability, circularity, the elimination of carbon emissions, and mass customization through the lens of artificial intelligence (AI) as part of the Industry 4.0 revolution (Saka et al., 2023). In addition to numerically controlled (NC) digital fabrication, building information modeling (BIM), design for manufacture, assembly, and disassembly (DfMA) methods of production (Bayoumi, 2000), and the standardization of building components such as kit-of-parts construction methods (Howe et al., 1999), AI is increasingly used to augment human capabilities in addressing complex and high-dimensional problems, to articulate unique human-made qualities in built artifacts and scenarios, and to support the human-centered use of AI in the AECO sector (Nabizadeh Rafsanjani and Nabizadeh, 2023).

1.1 Problem statement

The recent advancements in these technologies place greater demands on the skills of human designers, production technologists, construction experts, and workers to resolve specific production-related problems in mutual human‒machine collaboration (Duan et al., 2017). Although often unfamiliar with the latest advancements in digital fabrication, digitally unskilled workers are usually highly skilled craftsmen (e.g., carpenters in timber construction) capable of delivering unique and customized products while preserving the traditional notion of craft with its specific human-made qualities. Artisans thus provide qualitative value to production pipelines, meeting high standards as well as many qualitative criteria of handmade processes (aesthetic quality, detailing, the ingenuity and complexity of the solution chosen by a human, handmade quality, individually recognized authorship, and specific artistic and artisanal qualities) (Pu, 2020). However, these higher-standard qualities are often delivered less efficiently, at the cost of longer delivery times, higher costs, and greater energy use.

1.2 Purpose of the paper and methodology

Given the need to preserve human-related qualities in the delivery process while remaining production-efficient, to articulate human skills, knowledge, and values, and to significantly improve digitization in the AECO sector, in this paper we investigate and propose a method in which a human demonstrator (a human master builder) demonstrates the process of making or assembling a product to an artificial agent (a machine or a robot). The agent learns and understands the process and then imitates the human's intent. The purpose of this paper is to introduce and describe a method of imitation learning applied in a digital production process: a) training a digital twin of a robot to learn the digital execution of a robotic toolpath based on a human gesture, which can prospectively substitute for human activity within the AECO sector, and b) supporting a human designer in quickly and efficiently generating and delivering a variety of design scenarios of spatial architectural assemblies encompassing kit-of-parts components in a 3D configuration, either autonomously or semiautonomously in a cocreative process. In the semiautonomous process, the designer works with the AI agent simultaneously and cooperatively in a human-in-the-loop process (Mosqueira-Rey et al., 2022).

1.3 Rationale of the intent and significance of the research

The significance and originality of the paper lie in the potential discovery of novel production scenarios created autonomously by AI after a training process based on human demonstration. In this sense, humans can train AI to execute tasks independently and to deliver novel, unprecedented outcomes that follow the designer's intent. To achieve this research aim, as stated above, we implemented the generative adversarial imitation learning (GAIL) method in two independent experiments following the demonstrator's intention:

  • Real-time robotic tool path execution based on hand gestures.

  • Assembly configuration generated via autonomous and semiautonomous processes.

In machine training, generative adversarial imitation learning (GAIL) learns from a demonstrated task that the machine can then imitate (Ho & Ermon, 2016). GAIL is inspired by the notion of craftsmanship or master building, in which a teacher shows a pupil how to execute a specific task, and it is therefore the closest analogue for the digital training of a machine that is expected to imitate human activity. The authors argue that artificial intelligence technology augments experts' skills and capabilities to create a novel design and production space while articulating uniquely human contributions.

1.4 Unconsidered aspects of the paper

The study in this paper does not implement physical robotic execution at this research stage; rather, we describe the process of training and observing an AI agent in a digital space. The results and observations can serve as input for later research experiments with a physical robot. In addition, the current research stage does not integrate structural feedback for machines during AI-driven assembly. The digital assembly process in this study is simplified: prismatic building blocks are inserted into the assembly from the bottom upward, creating a spatial template to be occupied by a predesigned kit of parts. The process results in a vertical scenario in which the structural relationships of the building blocks are established in the next step, during the distribution of the kit-of-parts components, which are assembled by a human builder.

2 Research question and overall aims

This research starts with the following question: Is AI capable of autonomous or collaborative, cocreative design for manufacture and assembly (DfMA) production that follows the designer's intent and intuition while making its own decisions? The practical implementation of cocreative human‒machine interaction in building processes with the use of artificial intelligence (AI), and more specifically imitation learning, covers the following domains:

  • improvement of the digitization of the AECO by the smooth and cocreative process of designing and creating spatial scenarios to support a human designer or artisans in delivering an outcome efficiently;

  • considering the designer’s or artisans’ individual cognitive and creative capacities, intuition, and knowledge;

  • preserving human-related qualities in the design process and making;

  • training the robot to execute the task either cocreatively (in the process of human-in-the-loop) or autonomously (fully AI-driven and based on previously learned knowledge from demonstrations) (Fig. 1);

  • improving the production efficiency of the construction (assembly) processes while reducing cost, health risk, labor demand, and energy;

  • enhancing safety conditions on-site;

  • facilitating design and production for space exploration through a cointelligent creation process between human and artificial agency.

Fig. 1 Concept diagram of the human-in-the-loop process engaging with AI technology

3 Artificial intelligence in the scope of DfMA and craftsmanship

In this article, we propose the following hypothesis: through human- and AI-driven processes of digital fabrication and artifact production in which craft skills are recognized, learned, trained, and implemented in a human-in-the-loop cocreative production workflow, such as one-shot imitation learning (Finn et al., 2017), technology is capable of developing and strengthening its “wisdom” in a way similar to how humans improve their skills, experience, and wisdom over time. Thus, such systems can make autonomous decisions in the production process. Unique knowledge and design skills demonstrated by a designer or an artisan can be translated into demonstrations, digital models, additional data sources (such as images, videos, and sequences), and digital processes (the generation of toolpaths for specific robotic fabrication tasks and unique assembly modes).

Can a machine conceptualize learned knowledge to yield novel artifacts through hybridized/synthesized modes of human‒machine interaction utilizing neural networks (NNs) and deep reinforcement learning (DRL)? Is the robot capable of creating a novel artifact and of generating an artificial production space?

In this paper, we further propose that by linking human intelligence (knowledge, experiences, craft skills, capacity to make relevant decisions) and machine intelligence (responsive robotics based on multisensorial setup (Felbrich et al., 2022), XR devices and digital operations) in one coherent hybridized production loop, a novel communication and interaction platform between humans and machines can be created via physical interventions and demonstrations, which can lead to improvements in machine capabilities to execute the production task.

We introduce a computational connection between a human agent and an AI agent in a digital process to create an abstract toolpath, while theoretically envisaging a novel search and generation method for design and production space based on a human-in-the-loop cooperative learning process. In the first experiment, we provide a preliminary concept of imitation learning by using the digital twin of a desktop robotic arm equipped with a virtual multisensorial setup. The machine learns a simple human-driven toolpath from gentle movements of the hand physically provided by the demonstrator and translated into the digital space. Here, human-driven toolpath generation is treated as an intuitive gesture, and the framework is used to imitate a human gesture in a reinterpreted event, e.g., a drawing or painting intervention in a digital space. In the second experiment, described in Sect. 4 of this paper, human logic, preferences, memory, cognition, intuition, and creativity are integrated into the design process. The human designer collaborates with the artificial agent to deliver a cocreative spatial scenario in a computational framework utilizing the GAIL approach. An AI agent can then simulate the assembly process in the digital assembly. The physical expression of the implemented computational design-to-production framework is intuitively evaluated by human designers in four physically assembled studies, following the notion of the kit-of-parts production method.

These creative processes are shown to be transferable to a machine in a continuous training and learning process driven by a human. Consequently, the machine constantly improves its capability based on the human agents' inputs and becomes more autonomous in decision-making and in generating AI-driven design and production space.

Instead of replicating and recreating the crafting process in a numerically controlled (NC) mode of digital fabrication, our goal is to discover a means for the machine to communicate through a unique and enriched craft language, reflecting the formal expression found in human artifacts but in an unconventional way, expanding beyond the boundaries of human imagination within the realm of the craft solution space.

3.1 Imitation learning and human‒machine interaction

Digital technology is impacting building design, assembly, and construction processes at an unprecedented rate, especially given the rise of artificial intelligence and machine learning. In contrast to other machine learning methods, which rely on the quantity and quality of a dataset, reinforcement learning rewards an agent for actions taken in a given state of the environment, which means that the training process relies heavily on the design of the reward system and the input data (Matulis & Harvey, 2021). Setting an accurate and interpretable reward function for many reinforcement learning tasks is tedious. Learning directly from human feedback instead naturally integrates human domain knowledge into the policy optimization process, which is the basic principle of imitation learning (Taranovic et al., 2023).

In the rapidly evolving fields of robotics and AI, imitation learning, especially generative adversarial imitation learning (GAIL) (Ho & Ermon, 2016), bridges the gap between human cognition and machine capabilities. This paradigm allows machines to learn by observing and by imitating human behavior, fundamentally changing how we interact with tools and robotic systems. Recent research results in deep learning and reinforcement learning have paved the way for robots to perform highly complex tasks and applications in AECO.

3.2 Robotic training-related research

In 1962, the industrial robot Unimate grasped wooden blocks with two fingers and stacked them on each other. The robot was designed to mimic human functions (Hua et al., 2021). Early robots did not perceive the world or possess specific intelligence or dexterity. With breakthroughs in hardware technology, the integration of multimodal information such as vision, haptics, and perception has enabled robots to recognize a target's pose more accurately (Homberg et al., 2015), and breakthroughs have been achieved in robot grasping applications.

With the assistance of deep learning, data-driven robot manipulation methods fall mainly into two categories: methods that transmit motion information to the robot through wearable sensing devices and methods that extract object features from visual perception to formulate strategies (Alexandrova et al., 2014). The first type collects data through wearable devices to analyze the coordinated motion of the multiple joints of the human hand (Huang et al., 2015). This approach can be based on a statistical representation of the degrees of freedom of the human arm, using time series analysis to build a database for learning and thus establishing a mapping between the human hand and the machine (Pérez-D'Arpino and Shah, 2015). The second type, learning to manipulate objects from visual perception, can rely on a vision system to automatically acquire informative features and to understand the robot's operation by combining the feature learning ability of deep learning with mathematical modeling (Varley et al., 2015).

Imitation learning, as mentioned in Sect. 3.1, enables machines to learn operations quickly by observing demonstrations or by using small amounts of data, which improves the success rate of training. The operation of the robot can be viewed as a Markov decision process, which encodes the expert's action sequences into state-action pairs consistent with the expert (Hua et al., 2021). By combining GAIL with reinforcement learning mechanisms, the speed and accuracy of imitation learning can be improved through iterative adversarial training that brings the distributions of the expert and the agent as close as possible (Kuefler et al., 2017). In contrast to commonly used first-person demonstration methods, unsupervised third-person imitation learning (TPIL) can train an agent to correctly achieve goals even when the demonstrations are provided from a different perspective (Stadie et al., 2019). The combination of multiple sensors expands the application scenarios of robot manipulation; for example, Hamano et al. (2022) used eye-tracking devices to analyze gaze images to drive a robot.

In addition, with the development of digital construction technology, virtual scenes can simulate accurate locations and achieve better human-tool interaction. Bedaka et al. (2019) applied three-dimensional visual information to generate robot motion trajectories. Lin et al. (2020) also simulated robots on a digital platform, using ML-Agents to find a path for an industrial robot to reach a goal. Kurrek et al. (2019) used AI to develop control strategies for a robot manipulation task in a simulated environment. Hassel and Hofmann (2020) demonstrated a line-tracking robot trained in a virtual space under the digital twin paradigm. The Unity platform has also been used to build an efficient reinforcement learning framework to illustrate the relationship between virtual and actual physical information (Matulis & Harvey, 2021).

3.3 Current learning experiments in the processes of AI in the AECO and the use of GAIL

The application of AI deep learning within robotic digital fabrication processes has undergone testing across a range of tasks, potentially benefiting the AECO sector. These tasks include the assembly of a lap joint (Apolinarska et al., 2021) or pick and place scenarios for component assemblies (Felbrich et al., 2022). Other researchers have focused on codesigning strategies for autonomous construction methods (Menges & Wortmann, 2022), exploring the integration of deep reinforcement learning for the intelligent behavior of construction robots as builders.

The question of how to involve human agency in AI-driven processes to achieve coherent results for the potential use of AI in AECO applications on a larger scale or in human-made operations must still be explored. Imitation learning, especially generative adversarial imitation learning (GAIL) (Ho & Ermon, 2016), as a method for teaching a robot to perform a task, has solid potential to be integrated into design-to-production processes if we consider a smaller scale in the early stage of production, such as drawing or cutting toolpath generation.

Pinochet (2015, 2023) proposes an interaction model that uses a two-axis CNC plotter and a customized software interface to develop and implement three types of interaction (gestural, tangible, and collaborative exchanges), which utilize bodily gestures and behaviors given to the fabrication machine to establish a dialog that engages designers in an improvisational, insightful, and cognitive design process. Pick-and-place scenarios for simple objects using visual demonstrations and data collected from a human agent have been successfully deployed as a combination of imitation and meta-learning strategies (Finn et al., 2017); however, the movements of the robot remain very technical and preprogrammed, although simple tasks are performed successfully. Unity and ML-Agents tools for training a robot were previously introduced by Pinochet as Smart Collaborative Agents (2020) and by Hahm (2021); in these cases, the robot follows predefined targets, does not use an imitation learning approach, and is based on configurable joints or a Unity articulation body.

The imitation or delivery of human craftsmanship with a unique execution is very complex, and such results require complex information and data to be collected and then processed. This research explores how to engage the human agent with a robot, aiming to find a method in which either both participate together in a real-time sequence scenario or the human acts as an expert and demonstrator who teaches the robot a task to execute.

4 Implementing the computational framework

Two experiments are described in this paper: a) gesture-driven navigation of a robot, which integrates real-time navigation of a robotic twin by a human gesture in the Unity and Rhino | Grasshopper environments and serves as the initial input for gesture-driven toolpath generation (both environments are detailed in Sect. 4.2, Figs. 2 and 3), and b) a framework of digital assemblies investigating aspects of intuition, cocreation, and cointelligence in digital and physical space encompassing kit-of-parts systems, described in Sect. 4.3.

Fig. 2 Hand-tracking implementation in Unity connected to the UR robot. Based on configurable joints, the robot follows the finger in real time while adequately rotating

Fig. 3 The robot and gesture movements were tested in real time. The hand spawns the robotic toolpath targets according to the direction of the hand movements. The framework was implemented in Unity and Rhino Grasshopper utilizing UDP, GHowl, and Robots addons

4.1 Real-time gesture-driven navigation of the robot and cointelligent assemblies

The real-time navigation of a robot may have a variety of possible uses in digital fabrication and manufacturing for architecture. The proposed framework can also serve as a design environment to be assessed and explored before any manufacturing or crafting process. The data captured from a human can be stored and implemented in a custom scenario. Even though the implementation at this stage is not fully practical due to constraints related to noise in the data exchange, the real-time interaction is engaging, and this approach can be used for further investigations in combination with real robots. The computational models for both strategies are available from Buš (2023a, 2023b, 2023c) and Github (n.d.). The imitation learning approach in a digital space, used to navigate the artificial agent simultaneously and cocreatively with a human expert, is applied in four assembly scenarios built by human builders, as described in Sect. 4.3.

4.2 Unity and Rhino/GH implementation – hand tracking

The robotic digital twin implementations in Unity and Rhino use the User Datagram Protocol (UDP) for data transfer between the actual gesture, the digital environment, and the robotic twin. For this implementation, the Universal Robot UR1 digital twin and a standard web camera were used for human hand capture.

The hand recognition approach and the data exchange platform in Unity build on the Unity UDP receiver script provided by the CVZone platform as an open resource (Murtaza, 2022). The Unity and Rhino Grasshopper environments were customized and adapted for robotic movement. Both strategies utilize the CVZone hand tracking module, with the hand detector implemented in Python, to recognize the human hand (Murtaza, 2022). Hand recognition yields 21 points that are interconnected with lines, representing the virtual skeleton of the human hand.

4.2.1 Unity basic setup

By using the local host, the 21 recognized points are transferred to the Unity environment via the UDP data protocol. The receiver constantly receives the data, and the points are embedded as game objects, creating the foundation for the skeleton.
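
As an illustration of this step, the following C# sketch shows a minimal Unity-side receiver of the kind described above. The class name, port number, and payload format (a comma-separated list of 63 coordinates per frame) are assumptions for illustration and do not reproduce the published CVZone receiver script.

```csharp
// HandPointsReceiver.cs -- minimal sketch of a Unity UDP receiver for 21 hand
// landmarks, assuming the Python-side CVZone script sends one datagram per
// frame as a comma-separated coordinate string (port and format are illustrative).
using System.Globalization;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;
using UnityEngine;

public class HandPointsReceiver : MonoBehaviour
{
    public int port = 5052;            // must match the Python sender
    public GameObject pointPrefab;     // small sphere per landmark
    public float scale = 0.01f;        // pixel-to-scene scaling

    private UdpClient client;
    private Thread receiveThread;
    private volatile string latestPacket = "";
    private readonly GameObject[] points = new GameObject[21];

    void Start()
    {
        // Spawn the 21 game objects that form the virtual hand skeleton.
        for (int i = 0; i < points.Length; i++)
            points[i] = Instantiate(pointPrefab, transform);

        // Listen for landmark packets on a background thread.
        client = new UdpClient(port);
        receiveThread = new Thread(() =>
        {
            var remote = new IPEndPoint(IPAddress.Any, 0);
            try
            {
                while (true)
                    latestPacket = Encoding.UTF8.GetString(client.Receive(ref remote));
            }
            catch (SocketException) { /* socket closed on quit */ }
        }) { IsBackground = true };
        receiveThread.Start();
    }

    void Update()
    {
        // Parse "x0,y0,z0,x1,y1,z1,..." into 21 positions on the main thread.
        string data = latestPacket.Trim('[', ']');
        if (string.IsNullOrEmpty(data)) return;
        string[] values = data.Split(',');
        if (values.Length < 63) return;

        for (int i = 0; i < 21; i++)
        {
            float x = float.Parse(values[i * 3], CultureInfo.InvariantCulture) * scale;
            float y = float.Parse(values[i * 3 + 1], CultureInfo.InvariantCulture) * scale;
            float z = float.Parse(values[i * 3 + 2], CultureInfo.InvariantCulture) * scale;
            points[i].transform.localPosition = new Vector3(x, y, z);
        }
    }

    void OnApplicationQuit()
    {
        client?.Close();
    }
}
```

The 21 instantiated game objects can then be connected with line renderers to visualize the virtual skeleton of the hand described above.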

A specific point can be selected as a spawner of checkpoints for the robotic toolpath. Based on the human movement, the hand spawns targets for the robot, which are rotated precisely according to the hand movement. Custom C# scripts were written to link the hand with the digital model of the UR robot, which is based on configurable joints for each of its axes. In this way, it was possible to create a target for the robot to follow. The robot is thus navigated in real time by the selected point on the finger, respecting the Unity physics engine and rotating and moving based on the customized configurable joints.
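
The checkpoint-spawning logic can be sketched as follows; the field names, spawning threshold, and the way the end-effector target is updated are illustrative assumptions rather than the exact scripts used in the experiment.

```csharp
// GestureTargetSpawner.cs -- sketch of spawning toolpath checkpoints from a
// tracked fingertip and updating the target that the configurable-joint robot
// model follows (names and thresholds are illustrative).
using UnityEngine;

public class GestureTargetSpawner : MonoBehaviour
{
    public Transform fingertip;          // selected landmark, e.g., the index fingertip object
    public Transform robotTarget;        // target the configurable-joint robot model follows
    public GameObject checkpointPrefab;  // marker that becomes a toolpath checkpoint
    public float spawnDistance = 0.05f;  // minimum hand travel before a new checkpoint

    private Vector3 lastSpawnPosition;

    void Start()
    {
        lastSpawnPosition = fingertip.position;
    }

    void Update()
    {
        // The end-effector target tracks the fingertip each frame; the robot's
        // configurable joints are driven toward this target by the physics engine.
        robotTarget.position = fingertip.position;

        Vector3 moveDirection = fingertip.position - lastSpawnPosition;
        if (moveDirection.sqrMagnitude > 1e-6f)
            robotTarget.rotation = Quaternion.LookRotation(moveDirection, Vector3.up);

        // Spawn a new checkpoint whenever the hand has moved far enough,
        // oriented along the direction of the hand movement.
        if (moveDirection.magnitude > spawnDistance)
        {
            Instantiate(checkpointPrefab, fingertip.position, robotTarget.rotation);
            lastSpawnPosition = fingertip.position;
        }
    }
}
```

The spawned checkpoints accumulate into the gesture-driven toolpath, while the configurable joints keep the virtual arm following the fingertip, as described above.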

4.2.2 Rhino/Grasshopper setup

Similarly, the recognized hand points were transferred via the UDP protocol into the Rhino | Grasshopper environment, where the points were reconnected. This process was performed as an independent platform. For the UDP communication, the GHowl addon was used (Alomar et al., 2011), considering the positions of the points and the distance between the hand and the web camera.

By using this information, it was possible to add the third dimension and navigate a virtual end effector of the robot in all three dimensions. The Robots addon was utilized for the real-time simulation of the moving robot. The GH definition can serve as a starting point and a test bed for further implementation and testing purposes. A working version of the GH definition is available (Buš, 2023b; Github, n.d.).

4.2.3 GAIL and behavioral cloning test in Unity and observation

The Unity environment was further extended to teach the robot to recognize and interpret human gestures after they were captured. To do so, several custom scripts and a standard toolpath-following system based on numerically controlled positions were developed; the toolpath was derived from the checkpoints spawned by the human gesture to create a linear path. In addition, a deep learning method, in this case imitation learning utilizing the ML-Agents tool in Unity with GAIL combined with behavioral cloning, was tested and observed (Juliani et al., 2020).

GAIL (Ho & Ermon, 2016) derives a policy from the expert demonstration, learning ‘how to act by directly learning a policy’ from the data provided. The ML-Agents tool contains an imitation learning approach utilizing GAIL and behavioral cloning, which captures a predefined demonstration of how the robot should perform the task according to the expert. The agent follows the sequence of targets in a toolpath previously generated by a human in real time. In this experiment, the data captured from the gesture served as the input for the demonstration recording, containing the transform information (position, rotation, scale) of the targets spawned from the motion.

The positions were translated into the toolpath, and a virtual ML agent ran through them several times (see Fig. 4). The agent can subsequently serve as an input for the robotic end-effector target mentioned above. During heuristic training, the digital demonstration is captured and recorded for the GAIL and behavioral cloning training.

Fig. 4 The virtual end-effector follows the agent while running through the checkpoints on the toolpath. Training sessions with several scenarios run simultaneously

The training used the default proximal policy optimization (PPO) algorithm and hyperparameters for the ML agent, tested with different settings for the GAIL strength and behavioral cloning (Fig. 6). The virtual agent randomly searches for the checkpoint positions in space and learns how to interact with them in each episode. The task for the agent was to recognize the start and end positions, as well as the checkpoints, and to traverse the toolpath in the proper order and direction. In addition, each iteration slightly and randomly moves the positions of the path checkpoints to encourage the agent to learn from these novel positions; this prepares the agent for potential future gestures that differ each time.
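
For readers unfamiliar with the ML-Agents configuration format, a training setup of this kind is declared in a YAML file roughly as sketched below; the behavior name, demonstration path, and most numeric values are illustrative placeholders, while the settings actually used in the experiment are those shown in Fig. 6.

```yaml
# Sketch of an ML-Agents training configuration combining PPO with a GAIL
# reward signal and behavioral cloning (placeholder names and values).
behaviors:
  ToolpathAgent:                  # hypothetical behavior name
    trainer_type: ppo
    hyperparameters:
      batch_size: 512
      buffer_size: 5120
      learning_rate: 3.0e-4
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99
      gail:                       # adversarial imitation reward from the demonstration
        strength: 0.5
        demo_path: Demos/GestureToolpath.demo   # hypothetical path
    behavioral_cloning:           # supervised pretraining on the same demonstration
      strength: 0.5
      demo_path: Demos/GestureToolpath.demo
    max_steps: 6000000
    time_horizon: 64
    summary_freq: 10000
```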

Each time it collides with the correct or wrong checkpoint, the agent receives a positive or negative reward, respectively. The learning process comprises 3 to 6 million iterations (steps) with a positive and negative reward structure for the agent, and it generates a virtual brain for future testing scenarios. As qualitatively observed in the preliminary tests of the learned agent positions during the final inference run, the results with the current setups do not precisely imitate the original demonstration; however, the agent sequentially reaches the targets in the right directions and with the correct orientation. The quantitative results are provided in the following scalars, captured from the TensorBoard platform (TensorFlow, 2023), showing the relevant reward processes and GAIL policies. The cumulative reward increased over time (when the model was trained to combine extrinsic rewards and GAIL), and the GAIL loss and pretraining loss slightly decreased over time, indicating that the model adapted well to the demonstration and that the agent learned the policy.
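
The reward structure described above can be summarized in a simplified agent script; the observation design, reward magnitudes, and field names below are assumptions for illustration rather than the exact implementation (in the Unity project, the agent additionally carries Behavior Parameters and Demonstration Recorder components for recording the expert gesture).

```csharp
// ToolpathAgent.cs -- simplified sketch of the checkpoint-following agent.
// Requires a kinematic Rigidbody on the agent and trigger colliders tagged
// "Checkpoint" on the targets; values are illustrative assumptions.
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class ToolpathAgent : Agent
{
    public Transform[] checkpoints;   // toolpath targets spawned from the gesture
    public float moveSpeed = 1.0f;
    public float jitter = 0.02f;      // random shift of checkpoints per episode

    private int nextIndex;
    private Vector3[] basePositions;

    public override void Initialize()
    {
        basePositions = new Vector3[checkpoints.Length];
        for (int i = 0; i < checkpoints.Length; i++)
            basePositions[i] = checkpoints[i].position;
    }

    public override void OnEpisodeBegin()
    {
        // Slightly and randomly move the checkpoints so the agent learns to
        // generalize to gestures that differ each time.
        for (int i = 0; i < checkpoints.Length; i++)
            checkpoints[i].position = basePositions[i] + Random.insideUnitSphere * jitter;

        nextIndex = 0;
        transform.position = checkpoints[0].position;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Observe the agent's position and the offset to the next checkpoint.
        sensor.AddObservation(transform.position);
        sensor.AddObservation(checkpoints[nextIndex].position - transform.position);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Three continuous actions move the virtual end-effector target.
        var move = new Vector3(actions.ContinuousActions[0],
                               actions.ContinuousActions[1],
                               actions.ContinuousActions[2]);
        transform.position += move * moveSpeed * Time.deltaTime;
        AddReward(-0.0005f);          // small time penalty encourages directness
    }

    private void OnTriggerEnter(Collider other)
    {
        if (!other.CompareTag("Checkpoint")) return;

        if (other.transform == checkpoints[nextIndex])
        {
            AddReward(1.0f);          // correct checkpoint, reached in the right order
            nextIndex++;
            if (nextIndex >= checkpoints.Length)
                EndEpisode();         // toolpath completed
        }
        else
        {
            AddReward(-1.0f);         // wrong checkpoint
        }
    }
}
```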

The GAIL reward decreased after several iterations, and the agent obtained relevant rewards while learning the policy. A considerable decrease in the cumulative reward was observed at the beginning of the training process, depending on the combination of hyperparameters set in the configuration file. The training delivered a variety of brains with less acceptable or acceptable training results. During training, each scenario showed a temporary drop in the reward value, which later stabilized. In addition, the agent continuously improved its imitation of the demonstration over the training duration (Figs. 5 and 6).

Fig. 5 GAIL training results of the agent in two running scenarios. The scalars show that the reward and episode length increase, although there is a significant drop after the robot learns the policy. After 6 million training iterations, a generated ONNX brain showed proper results in the inference learning process, and the robot learned the demonstrated toolpath. An additional set of observations related to GAIL policies shows satisfactory results

Fig. 6 Configuration of YAML files with hyperparameters to configure the neural network and training settings

4.3 Cocreative assemblies: four design scenarios for the proposed framework based on cocreative intelligence

The proposed imitation learning framework was further evaluated, qualitatively, for manufacturing and assembly production methods in a design context. To reduce cost, energy, health risks, material resources, and labor demand, the creation method assembles a kit of parts into a spatial whole through a cooperative human-in-the-loop process or fully automatically, by employing an artificial agent in a digital environment. The design process starts with the expert's demonstration: a master builder creates the spatial scenario digitally, cooperatively, and interactively with the artificial agent. If the designer so decides, the workflow allows the artificial agent to work autonomously, without the human agent, following the principles learned from previously observed demonstrations provided by the expert.

The process is implemented in the workflow of spatial assemblies following the paradigm of discrete architecture (Retsin, 2019). It is still abstract but scalable into a 1:1 assembly process, integrating prefabricated kit-of-parts components. In this article, we present four design scenarios to evaluate the proposed framework following design to fabrication.

The design intention of the four scenarios was to digitally design, assemble, manufacture, and build an abstract architectural space by using a digitally prefabricated kit of parts designed according to the designer's intention and intuition. Each scenario incorporates different shapes of planar material; 12-mm-thick plywood or transparent acrylic sheets were used. The simple planar components were assembled into smaller subassemblies, forming spatial building blocks that are inserted into each other to create stable structures. The following design scenarios were applied:

  1) to encompass modular assemblies in an abstract repetitive pattern to create modular spaces (Scenario 1, Fig. 12), integrating simple slotting joints of triangular components in tetrahedron-like subassemblies and creating more complex spatial patterns;

  2) to reinterpret historical architectural language in a speculative and contemporary tower-like spatial assembly (Scenario 2, Fig. 13);

  3) to create an abstract interior light object citing the idea of the Babel Tower, with three different expressions of shape and color of the used kit of parts, as a standing object (Scenario 3, Fig. 14);

  4) to speculate about an extraterrestrial adaptive canopy possibly built as a configurable space responding to the spatial needs of its inhabitants (Scenario 4, Fig. 15).

These scenarios were then assessed based on their deliverability and scalability. In addition, the framework allows digital AI-assisted generation of the toolpath when a new component is added to the design scenario based on an expert's demonstration (Figs. 7 and 8).

Fig. 7 The computational design framework encompassed Unity and Rhino/Grasshopper interfaces. The data created in Unity by the master builder and AI agent are transferred to the Rhino interface as a spatial template to be populated with the predefined components

Fig. 8 The imitation learning process diagram. The master builder in the human-in-the-loop process creates a scenario simultaneously with the AI using the keyboard (arrow keys) to navigate the agent

The demonstration is provided by a human master builder who interacts with the scenario and the artificial agent. The master builder navigates the agent by using the arrow keys or custom keys (e.g., W, A, S, and D) during heuristic training to provide demonstrations digitally. After several steps (episodes) of demonstration, the data are used in the default training, where the AI agent follows the behavior of the master builder, who incrementally adds new building blocks into the spatial template, and the agent always reaches the last added block. When the building agent collides with a block, it adds new blocks to the template by itself, following the growth criteria (vector-driven generation of building blocks near the location where the master builder adds new blocks, the number of blocks added, and the randomness of the growth level, as previously computationally tested by Hahm (2020a, 2020b)), thereby modifying the entire spatial scenario (environment). The human master builder reacts spontaneously and intuitively to the last state of the environment modified by the agent and adds new blocks again. This human-in-the-loop process unfolds intuitively and in real time. Once the designer is satisfied with the results (following intuition and desired design criteria, such as verticality, horizontality, and the proportional character of the scenario), the template is transferred to the design-to-production system in the Rhino interface, where a kit of parts is populated into the spatial template (Figs. 8 and 11).
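
A simplified sketch of the building agent's heuristic control and block-growth step is given below; the prefab names, grid spacing, reward value, and exact growth rule are illustrative assumptions inspired by the criteria listed above, not the published implementation.

```csharp
// BuilderAgent.cs -- sketch of the cocreative building agent: the master
// builder drives it through the Heuristic during demonstrations, and contact
// with a block triggers the growth step (all parameters are illustrative).
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class BuilderAgent : Agent
{
    public GameObject blockPrefab;    // prismatic building block of the spatial template
    public float cellSize = 1.0f;     // spacing of the discrete template grid
    public int blocksPerStep = 2;     // number of blocks the agent adds on contact
    public float randomness = 0.3f;   // growth-randomness level

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // During heuristic training, the master builder drives the agent with
        // the arrow keys (or W/A/S/D), which Unity maps to these input axes.
        var actions = actionsOut.ContinuousActions;
        actions[0] = Input.GetAxis("Horizontal");
        actions[1] = Input.GetAxis("Vertical");
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Move the building agent across the template.
        var move = new Vector3(actions.ContinuousActions[0], 0f,
                               actions.ContinuousActions[1]);
        transform.position += move * Time.deltaTime;
    }

    private void OnTriggerEnter(Collider other)
    {
        if (!other.CompareTag("Block")) return;

        AddReward(0.5f);   // reward for reaching the block last placed by the master builder

        // Growth step: add new blocks near the contact location, biased along
        // the agent's direction of approach and perturbed randomly.
        Vector3 growthDirection = (other.transform.position - transform.position).normalized;
        for (int i = 0; i < blocksPerStep; i++)
        {
            Vector3 offset = growthDirection * cellSize
                           + Random.insideUnitSphere * randomness * cellSize;
            Vector3 cell = SnapToGrid(other.transform.position + offset);
            Instantiate(blockPrefab, cell, Quaternion.identity);
        }
    }

    private Vector3 SnapToGrid(Vector3 p)
    {
        return new Vector3(
            Mathf.Round(p.x / cellSize) * cellSize,
            Mathf.Round(p.y / cellSize) * cellSize,
            Mathf.Round(p.z / cellSize) * cellSize);
    }
}
```

During default training, the recorded demonstrations replace the keyboard heuristic, and the GAIL and behavioral cloning signals guide the agent toward the master builder's demonstrated behavior.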

The training scenario follows the principle of imitation learning: during inference, after 6 million training iterations, once the ONNX brain has been generated, the AI and the master builder work simultaneously (Figs. 9 and 10).

Fig. 9 The training measures after 6 million training iterations of the agent based on a master builder’s demonstration. The agent correctly learned the policy following the demonstration. The diagrams represent the spatial templates generated during inference training with the induced ONNX brain

Fig. 10 Training process of the spatial scenario following the reward policy. The agent understands the principle of building while receiving positive or negative rewards

In the subsequent production process, it was possible to create a spatial assembly consisting of different components with predefined connections (Fig. 11). The components were joined through slots, considering an appropriate tolerance model for the joinery, and were evaluated physically. The experimental assemblies were tested in two materials, birch plywood and transparent acrylic, to test the deliverability of the scenario. After the spatial scenarios were generated, the designed components were digitally fabricated and assembled according to the proposed templates without manual assembly. Four studies were conducted to assess the proposed framework and were successfully realized at the proposed scale (Figs. 12, 13, 14 and 15).

Fig. 11 The assembly scenario with the kit-of-parts distribution according to the generated three-dimensional template. The predesigned components fill the generated template of prismatic building blocks following the position of the bounding box with some user-defined distance according to a specific design of the kits of parts and joinery. The components are digitally fabricated by using the CNC router and assembled by human builders

Fig. 12 This spatial study combines intuition and an imitation learning method of 3D template generation (Cospecies Habitation). Authors of the study: Mingli Sun, Kenan Sun, and Jianhao Chen

Fig. 13 Speculative scenario of the use of an imitation learning framework (Digital Gothic); authors of the study: Jiashu Li, Zihao Ma, and Hanqi Zhang

Fig. 14 An abstract interior light object design using an imitation learning method of 3D template generation (Babel’s Babelism). Authors and photographs of the study: Xiaoming Qin, Xiaochen Ding, and Liu Yang

Fig. 15 An extraterrestrial adaptive canopy design possibly built as a configurable space combining intuition and an imitation learning method of 3D template generation (Extraterrestrial Rapid Response). Authors of the study: Dayang Wang, Zhipeng Guo, Kaizhong Yang, and Zheyuan Kuang

5 Discussion and prospects

The agent's training to follow the toolpath in the gesture-driven navigation of the robot is satisfactory. The Unity/Rhino-based computational framework may serve as a basis for further testing and for observations applied in more practical operations. To date, only one type of algorithm has been tested, namely, proximal policy optimization (PPO), which uses a neural network to approximate the ideal function that maps an agent's observations to the best action the agent can take in a given state (Github, n.d.; Juliani et al., 2020).

Other algorithms and different hyperparameters can be evaluated according to the designer's specific needs, such as trying different strengths of GAIL or behavioral cloning and their combinations. The potential of recognizing the human hand, its movements, and its gestures lies in prospective implementation in making and crafting processes, where the hand and actions of the artisan can be captured and recognized to inform the learning policy in the form of an expert demonstration.

At this stage of the investigation, the robot movement is not smooth, as it contains specific noise that prevents the robot from moving as fluidly as in the demonstration. This can be addressed with more training episodes during the default training (which also requires a longer training time) and more steps in the demonstration data. The robot itself can be set up through the updated articulation body tool in Unity, benefiting from Unity physics instead of the current setup of configurable joints, which would improve the motion of the robot. Even though the AI does not precisely clone the gesture, the resulting digital process partially follows the human inputs because of the pretraining performed through behavioral cloning.

The experiment on the growth process of spatial configurations used a computational approach to train the AI to assemble three-dimensional scenarios based on components deployed as a kit of parts. In the context of the AECO, this may contribute to spaces created in unconstrained construction site conditions while considering human intuition and creativity in creating a unique space.

In future research, it is important to concentrate on the demonstrations provided by artisans, utilizing more advanced recognition-based sensorial setups, such as motion capture methods and tactile sensors, to obtain more precise data. The plan is to integrate these into the Unity framework.

From the results, the authors observed that GAIL combined with behavioral cloning (a strength of 0.5 was used for both reward signals, GAIL and behavioral cloning) has potential in digital fabrication and production processes; however, more tasks and more robust processes involving physical robots must be tested first, such as the creation of an assembly based on a kit-of-parts system conducted by a robotic arm or a cable-driven parallel robot on a larger scale.

6 Conclusion

In this article, we introduced a computational framework using the GAIL method and the ML-Agents tool in Unity in two independent experiments. The framework can be deployed in handcrafting or assembly processes utilizing tools such as a collaborative robot or a creative artificial agent. The concept of demonstration and a building strategy based on a master builder's demonstration were applied, tested, and evaluated through observation and construction in four computational design scenarios, which showed that the proposed framework can be applied in DfMA practice and can serve as a generative system in which human and AI agencies work interactively and cooperatively, as in a human-in-the-loop process.

Environments such as Unity and Rhinoceros can serve as platforms for integrating gentle, handcraft-based operations in making, followed and learned by AI. The hypothesis stated at the beginning has been partially demonstrated through digital and physical assembly, and the computational models remain open to enhancements that improve the notion of interactive cocreation. Hands-on operations followed by AI-driven technologies may shift how crafting processes are executed and may provide a novel understanding of contexts where the human agent remains an expert and a critical production agency in human-in-the-loop processes.

The observations of the virtual hand showed satisfactory real-time navigation of the robot (without a specific sensorial framework); however, further testing with a physical robot is necessary to verify the concept fully. In future research, the proposed digital frameworks can be combined with a physical robot to conduct imitation learning scenarios in a physical, operational environment executing specific craft-related tasks.