Abstract
The recent advancements in digital technologies and artificial intelligence in the architecture, engineering, construction, and operation (AECO) sector have placed high demands on the digital skills of human experts, builders, and workers. At the same time, to satisfy the standards of the production-efficient AECO sector by reducing costs, energy, health risk, material resources, and labor demand through efficient production and construction methods such as design for manufacture and assembly (DfMA), it is necessary to resolve efficiency-related problems in mutual human‒machine collaborations. In this article, a method utilizing artificial intelligence (AI), namely, generative adversarial imitation learning (GAIL), is presented and then evaluated in two independent experiments related to the processes of DfMA as an efficient human‒machine collaboration. These experiments include a) training the digital twin of a robot to execute a robotic toolpath according to human gestures and b) the generation of a spatial configuration driven by a human's design intent provided in a demonstration. The framework encompasses human intelligence and creativity, which the AI agent observes, understands, learns, and imitates during the learning process. For both experimental cases, the human demonstration, the agent's training, the toolpath execution, and the assembly configuration process are conducted digitally. Following the scenario generated by an AI agent in a digital space, physical assembly is undertaken by human builders as the next step. The implemented workflow successfully delivers the learned toolpath and scalable spatial assemblies, articulating human intelligence, intuition, and creativity in the cocreative design.
1 Introduction
At present, the architecture, engineering, construction, and operation (AECO) sector addresses a variety of challenges related to its continuous digitization, low production efficiency in construction, sustainability, circularity, elimination of carbon emissions, and mass customization through the lens of artificial intelligence (AI) as part of the revolutionizing Industry 4.0 (Saka et al., 2023). In addition to numerically controlled (NC) digital fabrication, building information modeling (BIM), the design for manufacture, assembly, and disassembly (DfMA) method of production (Bayoumi, 2000), and the standardization of building components such as kit-of-parts construction methods (Howe et al., 1999), the AI field is becoming increasingly prevalent in efforts to augment human capabilities to address complex and high-dimensional problems, to articulate unique human-made qualities in built artifacts and scenarios, and to promote the human-centered use of AI in the AECO (Nabizadeh Rafsanjani and Nabizadeh, 2023).
1.1 Problem statement
The recent advancements in these technologies place greater demands on the skills of human designers, production technologists, construction experts, and workers to resolve specific production-related problems in mutual human‒machine collaborations (Duan et al., 2017). While likely unfamiliar with the latest advancements in digital fabrication, digitally unskilled workers are usually highly skilled craftsmen (e.g., carpenters in timber construction production) capable of delivering unique and customized products while preserving the traditional notion of crafts with specific human-made qualities. Thus, artisans provide qualitative value to production pipelines, meeting high standards as well as many qualitative criteria for handmade processes (aesthetic qualities, detailing, smartness, and complexity of the chosen solution made by a human, handmade quality, individually recognized authorship, specific artistic and artisanal qualities) (Pu, 2020). However, these higher-standard qualities are often delivered less efficiently due to differences in delivery time, cost, and energy.
1.2 Purpose of the paper and methodology
Given the necessity of preserving human-related qualities in the delivery process in a production-efficient way, of articulating human skills, knowledge, and values, and of significantly improving digitization in the AECO sector, in this paper we investigate and propose a method in which a human demonstrator (a human master builder) demonstrates the process of making or assembling a product to an artificial agent (a machine or a robot). The agent learns and understands the process and then imitates the human's intent. The purpose of this paper is to introduce and describe the method of imitation learning applied in a digital production process: a) training a digital twin of a robot to learn the digital execution of a robotic toolpath based on a human gesture, which can prospectively substitute for human activity within the AECO sector, and b) supporting a human designer in quickly and efficiently generating and delivering a variety of design scenarios of spatial architectural assemblies, encompassing kit-of-parts components in a 3D configuration, either autonomously or semiautonomously in a cocreative process. In the semiautonomous process, the designer works with the AI agent simultaneously and cooperatively in a human-in-the-loop process (Mosqueira-Rey et al., 2022).
1.3 Rationale of the intent and significance of the research
The significance and originality of the paper lies in the potential discovery of novel production scenarios created autonomously by AI after a training process based on human demonstration. In that sense, humans can train AI to execute tasks independently to deliver novel, unprecedented outcomes following the designer's intent. To achieve this research aim, as stated above, we implemented the generative adversarial imitation learning (GAIL) method in two independent experiments following the demonstrator's intention:
- Real-time robotic toolpath execution based on hand gestures.
- Assembly configuration generated via autonomous and semiautonomous processes.
In machine training, the generative adversarial imitation learning (GAIL) method considers a demonstrated task that the machine can imitate (Ho & Ermon, 2016). GAIL is inspired by the notion of craftsmanship or master building, in which a teacher shows a pupil how to execute a specific task; this makes it a natural candidate for the digital training of a machine that is expected to imitate human activity. The authors argue that artificial intelligence technology augments experts’ skills and capabilities to create a novel design and production space while articulating uniquely human contributions.
1.4 Unconsidered aspects of the paper
The study in this paper does not implement physical robotic execution at this research stage; rather, we describe the process of training and observing an AI agent in a digital space. The results and observations can serve as input for later research experiments with a physical robot. In addition, the current research stage does not integrate structural feedback for machines during AI-driven assembly. The digital assembly process in this study is simplified: we employ prismatic building blocks inserted into the assembly from the bottom upward, which creates a spatial template to be occupied by a predesigned kit of parts. The process results in a vertical scenario in which the structural relationship of the building blocks is established in the next step, during the distribution of the kit-of-parts components, which are then assembled by a human builder.
2 Research question and overall aims
This research starts with the following question: Is AI capable of autonomous or collaborative cocreative design for manufacture and assembly (DfMA) production following the designer’s intent and intuition while making its own decisions? The practical implementation of cocreative human‒machine interaction building processes with the use of artificial intelligence (AI) and, more specifically, imitation learning covers the following domains:
- improvement of the digitization of the AECO by the smooth and cocreative process of designing and creating spatial scenarios to support a human designer or artisans in delivering an outcome efficiently;
- considering the designer’s or artisans’ individual cognitive and creative capacities, intuition, and knowledge;
- preserving human-related qualities in the design process and making;
- training the robot to execute the task either cocreatively (in the human-in-the-loop process) or autonomously (fully AI-driven and based on knowledge previously learned from demonstrations) (Fig. 1);
- improving the production efficiency of the construction (assembly) processes while reducing cost, health risk, labor demand, and energy;
- enhancing safety conditions on-site;
- facilitating the design and production of space exploration conducted by the cointelligent creation process between a human and artificial agency.
3 Artificial intelligence in the scope of DfMA and craftsmanship
In this article, we propose the following hypothesis: through human- and artificial intelligence-driven processes of digital fabrication and the production of artifacts in which craft skills are recognized, learned, trained, and implemented in a human-in-the-loop cocreative production workflow, such as in one-shot imitation learning (Finn et al., 2017), technology is capable of developing and strengthening its “wisdom” in a way similar to how humans improve their skills, experience, and wisdom over time. Thus, such systems can make autonomous decisions in the production process. Unique knowledge and design skills demonstrated by a designer or an artisan can be translated into demonstrations, digital models, additional data sources (such as images, videos, and sequences), and digital processes (the generation of toolpaths for specific digital robotic fabrication and unique assembly modes).
Can a machine conceptualize learned knowledge to yield novel artifacts through hybridized/synthesized modes of human‒machine interaction utilizing neural networks (NNs) and deep reinforcement learning (DRL)? Is the robot capable of creating a novel artifact and of generating an artificial production space?
In this paper, we further propose that by linking human intelligence (knowledge, experiences, craft skills, capacity to make relevant decisions) and machine intelligence (responsive robotics based on multisensorial setup (Felbrich et al., 2022), XR devices and digital operations) in one coherent hybridized production loop, a novel communication and interaction platform between humans and machines can be created via physical interventions and demonstrations, which can lead to improvements in machine capabilities to execute the production task.
We introduce a computational connection between a human agent and an AI agent in a digital process to create an abstract toolpath while theoretically envisaging a novel searching and generation method for design and production space based on a human-in-the-loop cooperative learning process. In the first experiment, we provide a preliminary concept of imitation learning by using the digital twin of a desktop robotic arm equipped with a virtual multisensorial setup. The machine learns a simple human-driven toolpath, considering gentle movements of the demonstrator's hand, physically performed and translated into the digital space. In this report, human-driven toolpath generation is treated as an intuitive gesture. Such a framework can be used to imitate a human gesture in a reinterpreted event, e.g., for a drawing or painting intervention in a digital space. In the second experiment, described in Sect. 4 of this paper, human logic, preferences, memory, cognition, intuition, and creativity are integrated into the design process. The human designer collaborates with the artificial agent to deliver a cocreative spatial scenario in a computational framework utilizing the GAIL approach. An AI agent can then simulate the assembly process in the digital assembly. The physical expression of the implemented computational design-to-production framework is intuitively evaluated by human designers in four physically assembled studies, following the notion of the kit-of-parts production method.
These creative processes are shown to be transferable to a machine in a continuous training and learning process driven by a human. Consequently, the device constantly improves its capability based on human agents' inputs and becomes more autonomous in the decision-making and generation of AI-driven design and production space.
Instead of replication and recreation of the crafting process in a numerically controlled way of digital fabrication (NC), our goal is to discover a means for the machine to communicate through a unique and enriched craft language, reflecting the formal expression found in human artifacts but in an unconventional way, expanding beyond the boundaries of human imagination within the realm of solution craft space.
3.1 Imitation learning and human‒machine interaction
Digital technology is impacting building design, assembly, and construction processes at an unprecedented rate, especially given the rise of artificial intelligence and machine learning. In contrast to other machine learning methods, which rely on the quantity and quality of a dataset, a reinforcement learning agent is rewarded for the actions it takes in a given state of the environment, which means that the training process relies heavily on the design of the reward system and the input data (Matulis & Harvey, 2021). Setting an accurate and interpretable reward function for many reinforcement learning tasks is tedious. Instead, learning directly from human feedback naturally integrates human domain knowledge into the policy optimization process, which is the basic principle of imitation learning (Taranovic et al., 2023).
In the rapidly evolving fields of robotics and AI, imitation learning, especially generative adversarial imitation learning (GAIL) (Ho & Ermon, 2016), bridges the gap between human cognition and machine capabilities. This paradigm allows machines to learn by observing and by imitating human behavior, fundamentally changing how we interact with tools and robotic systems. Recent research results in deep learning and reinforcement learning have paved the way for robots to perform highly complex tasks and applications in AECO.
3.2 Robotic training-related research
In 1962, an industrial robot, Unimate, grasped wooden blocks with two fingers and stacked them on each other. The robot was designed to mimic human functions (Hua et al., 2021). Early robots did not perceive the world or possess specific intelligence or dexterity. With breakthroughs in hardware technology, integrating multimodal information such as vision, haptics, and perception has enabled robots to recognize a target’s pose more accurately (Homberg et al., 2015), and breakthroughs have been made in robot grasping applications.
With the assistance of deep learning, data-driven robot manipulation methods are mainly categorized into two types: methods that transmit motion information to the robot through wearable sensing devices and methods that extract object features based on visual perception to formulate strategies (Alexandrova et al., 2014). The first method collects data through wearable devices to analyze the coordinated motion relationship between multiple joints of the human hand (Huang et al., 2015). This approach can be based on a statistical representation of the degrees of freedom of the human arm, by using time series analysis to build a database for learning, thus establishing a mapping relationship between the human hand and the machine (Pérez-D'Arpino and Shah, 2015). The second method, learning to manipulate objects based on visual perception, can rely on the vision system to automatically acquire the information features and to understand the operation of the robot with the help of the self-feature learning ability of deep learning combined with the mathematical modeling method (Varley et al., 2015).
Imitation learning, as mentioned in Sect. 3.1, enables machines to learn operations quickly by observing demonstrations or by using small amounts of data, which improves the success rate of training. The operation of the robot can be viewed as a Markov decision process, which then encodes the expert's action sequences into state-action pairs consistent with the expert (Hua et al., 2021). By combining GAIL with reinforcement learning mechanisms, the speed and accuracy of imitation learning can be improved through iterative adversarial training that brings the distributions of the expert and the agent as close as possible (Kuefler et al., 2017). Compared with commonly used first-person demonstration methods, the unsupervised third-person imitation learning (TPIL) method can overcome their limitations by training the agent to correctly achieve goals in environments where demonstrations are provided from different perspectives (Stadie et al., 2019). The combination of multiple sensors expands the application scenarios of robot manipulation; Hamano et al. (2022) used eye-tracking devices to analyze gaze images to drive a robot.
In addition, with the development of digital construction technology, the application of virtual scenes can simulate accurate locations and can achieve better human-tool interactions. Bedaka et al. (2019) applied three-dimensional visual information to generate robot motion trajectories. Lin et al. (2020) also simulated robots through a digital platform by using the ML agent to find a path of an industrial robot to reach a goal. Kurrek et al. (2019) used AI to develop control strategies for a robot manipulation task in a simulated environment. Hassel and Hofmann (2020) demonstrated a line-tracking robot trained in a virtual space under the digital twin paradigm. The Unity platform is also used to build an efficient reinforcement learning framework to illustrate the relationship between virtual and actual physical information (Matulis & Harvey, 2021).
3.3 Current learning experiments in the processes of AI in the AECO and the use of GAIL
The application of AI deep learning within robotic digital fabrication processes has undergone testing across a range of tasks, potentially benefiting the AECO sector. These tasks include the assembly of a lap joint (Apolinarska et al., 2021) or pick and place scenarios for component assemblies (Felbrich et al., 2022). Other researchers have focused on codesigning strategies for autonomous construction methods (Menges & Wortmann, 2022), exploring the integration of deep reinforcement learning for the intelligent behavior of construction robots as builders.
The question of how to involve human agency in AI-driven processes to achieve coherent results for the potential use of AI in AECO applications on a larger scale or in human-made operations must still be explored. Imitation learning, especially generative adversarial imitation learning (GAIL) (Ho & Ermon, 2016), as a method for teaching a robot to perform a task, has solid potential to be integrated into design-to-production processes if we consider a smaller scale in the early stage of production, such as drawing or cutting toolpath generation.
Pinochet (2015, 2023) proposes an interaction model that uses a two-axis CNC plotter and a customized software interface to develop and implement three types of interactions, namely, gestural, tangible, and collaborative exchanges, which utilize bodily gestures and behaviors given to the fabrication machine to establish a dialog that engages designers in an improvisational, insightful, and cognitive design process. Pick-and-place scenarios of simple objects using visual demonstrations and data collected from a human agent have been successfully deployed (Finn et al., 2017) as a combination of imitation and meta-learning strategies. However, the movements of the robot are still very technical and preprogrammed, although they successfully perform simple tasks. The use of the Unity and ML-Agents tools to train a robot was previously introduced by Pinochet as Smart Collaborative Agents (2020) and by Hahm (2021); in these works, the robot follows predefined targets, does not use an imitation learning approach, and is based on configurable joints or a Unity articulation body.
The imitation or delivery of human craftsmanship with a unique execution is very complex, and such results require complex information and data to be collected and then processed. This research explores how to engage the human agent with a robot, aiming to find a method in which either both participate together in a real-time sequence scenario or the human acts as an expert and demonstrator who teaches the robot a task to execute.
4 Implementing the computational framework
Two experiments are described in this paper: a) gesture-driven navigation of a robot that integrates real-time robotic twin navigation by a human gesture in Unity and Rhino | Grasshopper environments, which serve as an initial input for gesture-driven toolpath generation. Both environments (Unity and Rhino) for gesture navigation are detailed in Sect. 4.2 (Figs. 2 and 3), and b) the framework of digital assemblies investigating the aspects of intuition, cocreation, and cointelligence in a digital and physical space encompassing kit-of-parts systems in Sect. 4.3.
4.1 Real-time gesture-driven navigation of the robot and cointelligent assemblies
The real-time navigation of a robot may have a variety of possible uses in the field of digital fabrication and manufacturing for architecture. The proposed framework can also serve as a design environment to be assessed and explored before any manufacturing or crafting process. The data captured from a human can be stored and implemented in a custom scenario. Even though the implementation at this stage is not fully practical due to specific constraints related to noise in the data exchange, the real-time interaction is engaging, and this approach can be used for further investigations in combination with real robots. The computational models for both strategies are available from Buš (2023a, 2023b, 2023c) and Github (n.d.). The imitation learning approach in a digital space, in which the artificial agent is navigated simultaneously and cocreatively with a human expert, is applied in four assembly scenarios built by human builders, as described in Sect. 4.3.
4.2 Unity and Rhino/GH implementation – hand tracking
The robotic digital twin implementations in Unity and Rhino encompass the User Datagram Protocol library (UDP) for the data transfer between the actual gesture, the digital environment, and the robotic twin. For this implementation, the Universal Robot UR1 digital twin and a standard web camera were used for human hand capture.
The computational approach of hand recognition and the data exchange platform in Unity implements the Unity UDP receiver script provided by the CV Zone platform as an open data resource (Murtaza, 2022). The Unity and Rhino Grasshopper environments were customized and adapted for robotic movement. Both strategies utilize the CVZone hand tracking module with the hand detector implemented in Python to recognize a human's hand (Murtaza, 2022). Hand recognition involves 21 points that are interconnected with lines, which represent the virtual skeleton of the human hand.
4.2.1 Unity basic setup
By using the local host, the 21 recognized points are transferred to the Unity environment via the UDP data protocol. The receiver constantly receives the data, and the points are embedded as game objects, creating the foundation for the skeleton.
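The data exchange described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a CVZone-style payload in which the 21 landmarks are flattened into a bracketed, comma-separated string; the exact payload format, the port number, and the function names are illustrative assumptions, not the project's actual code.

```python
import socket

def encode_landmarks(landmarks):
    """Flatten 21 (x, y, z) hand landmarks into a bracketed CSV string,
    the assumed shape of the CVZone sender's UDP payload."""
    flat = [str(c) for point in landmarks for c in point]
    return "[" + ", ".join(flat) + "]"

def decode_landmarks(payload):
    """Parse the payload back into 21 (x, y, z) tuples, analogous to what
    the Unity UDP receiver does before embedding the points as game objects."""
    values = [float(v) for v in payload.strip("[]").split(",")]
    assert len(values) == 63, "expected 21 points x 3 coordinates"
    return [tuple(values[i:i + 3]) for i in range(0, 63, 3)]

def send_landmarks(landmarks, host="127.0.0.1", port=5052):
    """Send one frame of landmarks over UDP to the local receiver.
    The port is a placeholder; any agreed-upon local port works."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(encode_landmarks(landmarks).encode("utf-8"), (host, port))
    sock.close()
```

In the actual pipeline, the sender runs in the Python hand-tracking process and the decoding happens in the Unity receiver script; the sketch above merely makes the round-trip of one frame of 21 points explicit.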
The specific point can be selected as a spawner of the checkpoints for the robotic toolpath. Based on human movement, the hand spawns the targets for the robot, which are precisely rotated according to the hand movement. Custom C# scripts were written to link the hand with the digital model of the UR robot, which is based on configurable joints for each of its axes. As such, it was possible to create a target for the robot that follows the movement. In that way, the robot is navigated by the hand point on the selected finger in real time, considering the physics engine in Unity and rotating and moving based on the customized configurable joints.
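The checkpoint-spawning behavior can be illustrated with a small hypothetical sketch: a new toolpath target is dropped only once the tracked fingertip has moved a minimum distance from the last target, which keeps the spawned toolpath sparse. The class name and spacing threshold are assumptions for illustration; the actual implementation is a custom C# script in Unity.

```python
import math

class CheckpointSpawner:
    """Simplified stand-in for the Unity spawner attached to the hand
    skeleton: feed it one tracked fingertip position per frame and it
    drops a new checkpoint whenever the finger has travelled farther
    than `min_spacing` from the previous checkpoint."""

    def __init__(self, min_spacing=0.05):
        self.min_spacing = min_spacing
        self.checkpoints = []

    def update(self, point):
        """Return True when a new checkpoint was spawned this frame."""
        if not self.checkpoints or self._dist(self.checkpoints[-1], point) >= self.min_spacing:
            self.checkpoints.append(point)
            return True
        return False

    @staticmethod
    def _dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
```

The resulting `checkpoints` list corresponds to the ordered robot targets that the gesture produces in the Unity scene.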
4.2.2 Rhino/Grasshopper setup
Similarly, the recognized hand points were transferred via the UDP protocol into the Rhino | Grasshopper environment, and the points were reconnected. This process was performed as an independent platform. For the UDP communication transfer, the GHowl addon has been used (Alomar et al., 2011), considering the position of the points and the distance information between the hand and the web camera.
By using this information, it was possible to implement the third dimension to navigate a virtual end effector of the robot in all three dimensions. The Robot addon was utilized for the real-time simulation of the moving robot. The GH definition can serve as a starting point and a test bed for further implementation and testing purposes. The working version of the GH definition is available (Buš 2023b; Github, n.d.).
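As a rough illustration of how a 2D tracked point plus the hand-to-camera distance can yield a 3D end-effector target, consider the following sketch. The frame size, workspace extent, and depth scaling are purely illustrative calibration assumptions, not values taken from the GH definition.

```python
def to_world_target(px, py, distance_cm,
                    frame_w=640, frame_h=480, workspace=0.4):
    """Map a tracked hand point in pixel coordinates plus the estimated
    hand-to-camera distance (cm) onto a 3D end-effector target.
    Pixels are normalized to [-0.5, 0.5] and scaled to a workspace in
    metres; the distance supplies the depth axis, clamped to 0..100 cm.
    All calibration constants here are illustrative only."""
    x = (px / frame_w - 0.5) * workspace
    y = (0.5 - py / frame_h) * workspace        # flip image y to world up
    z = max(0.0, min(1.0, distance_cm / 100.0)) * workspace
    return (x, y, z)
```

A point in the centre of the frame at an intermediate distance thus lands near the middle of the virtual workspace, which is the behavior the real-time end-effector navigation relies on.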
4.2.3 GAIL and behavioral cloning test in Unity and observation
The Unity environment was further evaluated to teach the robot to recognize and interpret human gestures after they were captured. To do so, several custom scripts and a standard toolpath-following system based on numerically controlled positions were developed; the positions were derived from the checkpoints spawned by the human gesture to create a linear toolpath. In addition, a deep learning method, in this case imitation learning utilizing the ML-Agents tool in Unity with the GAIL method combined with behavioral cloning, was tested and observed (Juliani et al., 2020).
GAIL (Ho & Ermon, 2016) derives a policy from the expert demonstration to perform a task, based on ‘how to act by directly learning a policy’ from the data provided. The ML-Agents tool contains an imitation learning approach utilizing GAIL and the behavioral cloning method, which captures the predefined process of demonstrating how the robot should perform the task according to the expert demonstration. It follows the sequence of targets in a toolpath previously generated by a human in real time. In this experiment, the data captured from the gesture served as input for the demonstration recording, containing the transform information (position, rotation, scale) of the targets spawned from the motion.
The positions were translated into the toolpath, and a virtual ML agent ran through them several times (see Fig. 4). The agent can subsequently serve as an input for the robotic end effector target mentioned above. The heuristics training simulation contains the digital demonstration, which is captured as a demonstration for the GAIL and behavioral cloning training.
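The structure of such a demonstration recording can be sketched as a simple ordered container of target transforms. This is a toy analogue of an ML-Agents demonstration file; the class names and fields are illustrative assumptions, not the tool's internal format.

```python
from dataclasses import dataclass, field

@dataclass
class TargetTransform:
    """Minimal stand-in for the transform data captured per spawned
    target: position, rotation (Euler angles), and scale."""
    position: tuple
    rotation: tuple = (0.0, 0.0, 0.0)
    scale: tuple = (1.0, 1.0, 1.0)

@dataclass
class DemonstrationRecording:
    """Ordered list of target transforms produced by the human gesture,
    replayed during GAIL / behavioral-cloning training."""
    targets: list = field(default_factory=list)

    def record(self, transform):
        self.targets.append(transform)

    def toolpath(self):
        """Return the ordered positions, i.e. the linear toolpath the
        agent is trained to reproduce."""
        return [t.position for t in self.targets]
```
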
The training used the default algorithm for the ML agent, based on proximal policy optimization (PPO) hyperparameters, tested with different setups for the GAIL strength or behavioral cloning (Fig. 6). The virtual agent randomly searches for the checkpoint positions in space and learns in each episode how to interact with them. The task for the agent was to recognize the starting position, the end position, and the checkpoints to perform the toolpath in the proper order and direction. In addition, each iteration slightly and randomly moves the positions of the path checkpoints to encourage the agent to learn from these novel positions; this prepares the agent for potential future gestures that differ each time.
Each time it collides with a correct or wrong checkpoint, the agent receives a positive or negative reward, respectively. The learning process contains 3 to 6 million iterations (steps) with a positive or negative reward structure for the agent. The process generated the virtual brain for future testing scenarios. As qualitatively observed in preliminary tests of the agent's learned positions in the final inference run, the results with the current setups do not precisely imitate the original demonstration. However, the agent sequentially reaches the targets in the right directions and with the correct orientation. The quantitative results are provided in the following scalars, captured from the TensorBoard platform (TensorFlow, 2023), showing the relevant reward processes and GAIL policies. As the cumulative reward increased over time (when the model was trained to combine extrinsic rewards and GAIL), the GAIL loss and pretraining loss showed that the model adapted well to the demonstration, as the curves slightly decreased over time, suggesting that the agent learned the policy.
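The reward scheme and the per-episode checkpoint jitter described above can be condensed into a toy environment sketch. The reward magnitudes, jitter radius, and all names are illustrative assumptions rather than the hyperparameters actually used in the ML-Agents configuration.

```python
import random

class ToolpathEnv:
    """Toy version of the checkpoint-following reward scheme: the agent
    must hit checkpoints in order; a correct hit earns a positive reward,
    a wrong one a negative reward, and every episode reset jitters the
    checkpoint positions slightly so the agent generalizes to gestures
    that differ each time."""

    def __init__(self, checkpoints, jitter=0.02, seed=None):
        self.base = list(checkpoints)
        self.jitter = jitter
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Start a new episode with slightly perturbed checkpoints."""
        self.next_idx = 0
        self.cumulative = 0.0
        self.checkpoints = [
            tuple(c + self.rng.uniform(-self.jitter, self.jitter) for c in p)
            for p in self.base
        ]
        return self.checkpoints

    def collide(self, idx):
        """Agent touched checkpoint `idx`; return the step reward."""
        if idx == self.next_idx:
            self.next_idx += 1
            reward = 1.0        # correct checkpoint, in order
        else:
            reward = -0.5       # wrong checkpoint
        self.cumulative += reward
        return reward

    @property
    def done(self):
        return self.next_idx == len(self.checkpoints)
```
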
The GAIL reward decreased after several iterations, and the agent obtained relevant rewards while learning the policy. A considerable decrease in the cumulative reward was observed at the beginning of the training process, depending on the combination of hyperparameters set in the configuration file. The training delivered a variety of brains with results ranging from less acceptable to acceptable. During training, each scenario had a moment at which the reward value decreased and later stabilized. In addition, the agent continuously improved its imitation of the demonstration over the training duration (Figs. 5 and 6).
4.3 Cocreative Assemblies: four design scenarios for the proposed framework based on cocreative intelligence
The proposed imitation learning framework was further evaluated for manufacturing and assembly production methods in the design context and qualitatively evaluated. To reduce cost, energy, health risk, material resources, and labor demand, the creation method involves assembling a kit-of-parts into a spatial whole through a cooperative human-in-the-loop process or by being fully automated by employing an artificial agent in a digital environment. The design process starts with the expert’s demonstration, a master builder creating the spatial scenario digitally, cooperatively, and interactively with the artificial agent. If the designer makes this decision, the workflow allows the artificial agent to work autonomously without the human agent and to follow the learned principles of previously observed demonstrations provided by the expert.
The process is implemented in the workflow of spatial assemblies following the paradigm of discrete architecture (Retsin, 2019). It is still abstract but scalable into a 1:1 assembly process, integrating prefabricated kit-of-parts components. In this article, we present four design scenarios to evaluate the proposed framework following design to fabrication.
The design intention of the four design scenarios was to digitally design, assemble, manufacture, and build an abstract architectural space by using a digitally prefabricated kit of parts specifically designed following the designer’s intention and intuition. Each of the scenarios incorporates the different shapes of the planar material; 12 mm thick plywood or transparent acrylic sheets were used. The simple planar components were assembled into smaller subassemblies, creating spatial building blocks inserted into each other and creating stable structures. The following design scenarios were applied:
1) to encompass modular assemblies in an abstract repetitive pattern to create modular spaces (Scenario 1, Fig. 12), integrating simple slotting joints of triangular components in tetrahedron-like subassemblies and creating more complex spatial patterns;
2) to reinterpret historical architectural language in a speculative and contemporary tower-like spatial assembly (Scenario 2, Fig. 13);
3) to create an abstract interior light object citing the idea of the Tower of Babel with three different expressions of shape and color of the kit of parts used, as a standing object (Scenario 3, Fig. 14);
4) to speculate about an extraterrestrial adaptive canopy possibly built as a configurable space responding to the spatial needs of its inhabitants (Scenario 4, Fig. 15).
These scenarios were then assessed for deliverability and scalability. In addition, the framework allows AI-assisted digital generation of the toolpath when a new component is added to the design scenario, based on an expert's demonstration (Figs. 7 and 8).
The demonstration is provided by a human master builder who interacts with the scenario and the artificial agent. During heuristic training, the master builder navigates the agent using arrow keys or custom keys (e.g., W, A, S, and D) to provide demonstrations digitally. After several steps (episodes) of demonstration, the data are used to train the agent in the default training, where the AI agent follows the behavior of the master builder, who incrementally adds new building blocks to the spatial template; the agent always moves toward the last added block. When the building agent collides with a block, it adds new blocks to the template by itself following the growth criteria (vector-driven generation of building blocks near the location where the master builder adds new blocks, the number of blocks to add, and the level of randomness of growth, as previously computationally tested by Hahm (2020a, 2020b)), thereby modifying the entire spatial scenario (environment). Reacting to the environment's last state, the human master builder spontaneously and intuitively responds to the agent's modifications and adds new blocks again. This human-in-the-loop process unfolds intuitively and in real time. Once the designer is satisfied with the results (following intuition and desired design criteria, such as verticality, horizontality, and the proportional character of the scenario), the template is passed to the design-to-production system in the Rhino interface, where a kit of parts is populated into the spatial template (Figs. 8 and 11).
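The growth criteria described above can be sketched as follows. This is a hypothetical Python reimplementation for illustration only (the actual framework runs in Unity C#); the function name, parameters, and values are assumptions, not code from the framework:

```python
import random

def grow_blocks(last_builder_block, existing_blocks, growth_vector,
                n_new=3, randomness=0.25, step=1.0):
    """Hypothetical sketch of the vector-driven growth criteria:
    new blocks sprout near the master builder's last-added block,
    biased along a growth vector, with a tunable randomness level
    and a fixed number of blocks to add per growth event."""
    new_blocks = []
    x, y, z = last_builder_block
    for _ in range(n_new):
        # jitter each axis around the growth direction
        dx = growth_vector[0] * step + random.uniform(-randomness, randomness)
        dy = growth_vector[1] * step + random.uniform(-randomness, randomness)
        dz = growth_vector[2] * step + random.uniform(-randomness, randomness)
        candidate = (round(x + dx), round(y + dy), round(z + dz))
        if candidate not in existing_blocks:   # skip duplicate grid cells
            new_blocks.append(candidate)
            existing_blocks.add(candidate)
        x, y, z = candidate                    # continue growing from the newest position
    return new_blocks
```

Raising `randomness` would make the growth more erratic, while a dominant `growth_vector` biases it toward, for example, vertical scenarios.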
The training scenario follows the principle of imitation learning; after approximately 6 million training iterations, the trained policy is exported as an ONNX "brain", after which the AI and the master builder work simultaneously during inference (Figs. 9 and 10).
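For readers unfamiliar with ML-Agents, a trainer configuration of the kind used in such a setup could look like the sketch below. The behavior name, demonstration file path, and hyperparameter values are illustrative assumptions; only the roughly 6 million steps and the 0.5 strengths for GAIL and behavioral cloning are reported in this article:

```yaml
behaviors:
  BuilderAgent:                # hypothetical behavior name
    trainer_type: ppo
    hyperparameters:
      batch_size: 512
      buffer_size: 5120
      learning_rate: 3.0e-4
    max_steps: 6000000         # ~6 million training iterations, as reported
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      gail:
        strength: 0.5          # ratio reported in the Discussion
        demo_path: Demos/MasterBuilder.demo   # hypothetical recorded demonstration
    behavioral_cloning:
      strength: 0.5
      demo_path: Demos/MasterBuilder.demo
```

Training with such a file (e.g., via `mlagents-learn config.yaml`) produces the ONNX policy that Unity then loads for inference.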
In the subsequent production process, it was possible to create a spatial assembly consisting of different components with predefined connections (Fig. 11). The components were joined through slots with an appropriate joinery tolerance model and were evaluated physically. The experimental assemblies were tested in two materials, birch plywood and transparent acrylic, to test the deliverability of the scenario. After the spatial scenarios were generated, the designed components were digitally fabricated and assembled according to the proposed templates without manual adjustments. Four studies were conducted to assess the proposed framework and were successfully performed at the proposed scale (Figs. 12, 13, 14 and 15).
5 Discussion and prospects
The agent's training to follow the toolpath in the gesture-driven navigation of a robot is satisfactory. The Unity/Rhino-based computational framework may serve as a basis for further testing and observation in more practical operations. To date, only one algorithm has been tested, namely, proximal policy optimization (PPO), which uses a neural network to approximate the ideal function mapping an agent's observations to the best action the agent can take in a given state (Github, n.d.; Juliani et al., 2020).
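As an illustration of the PPO principle mentioned above (not code from the framework), the clipped surrogate objective that keeps each policy update conservative can be sketched as:

```python
def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate objective for a single sample.
    ratio     = pi_new(a|s) / pi_old(a|s), the policy probability ratio;
    advantage = estimate of how much better action a was than average in state s;
    eps       = clipping range (0.2 is a common default)."""
    unclipped = ratio * advantage
    # clamp the ratio to [1 - eps, 1 + eps] before weighting by the advantage
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    # taking the minimum penalizes updates that move the policy too far
    return min(unclipped, clipped)
```

In practice, ML-Agents maximizes the average of this quantity over a batch; the clipping is what distinguishes PPO from a plain policy-gradient update.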
Other algorithms and different hyperparameters can be evaluated according to the designer's specific needs, such as trying different strengths of GAIL or behavioral cloning and their combinations. The potential of human hand, movement, and gesture recognition lies in its prospective implementation in making and crafting processes, where the hand and actions of the artisan can be captured and recognized to inform the learning policy in the form of an expert demonstration.
At this stage of the investigation, the robot's movement is not smooth, as it contains noise that prevents it from moving as fluidly as in the demonstration. This can be addressed with more episodes in the default training (which also requires longer training time) and more steps in the demonstration data. The robot itself can be set up with Unity's updated articulation-body tool, benefiting from Unity physics instead of the current setup of configurable joints; this will improve the robot's motion. Even though the AI does not precisely clone the gesture, the resulting digital process partially follows the human inputs because of pretraining through behavioral cloning.
The experiment on the growth process of spatial configurations used a computational approach to train the AI to assemble three-dimensional scenarios from components deployed as a kit of parts. In the context of the AECO, this may contribute to creating space under unconstrained construction-site conditions while incorporating human intuition and creativity in creating a unique space.
In future research, it is important to concentrate on the demonstrations provided by artisans, utilizing more advanced recognition-based sensorial setups, such as motion capture methods and tactile sensors, to obtain more precise data. The plan is to integrate these into the Unity framework.
From the results, the author observed that GAIL combined with behavioral cloning (the implemented strength was 0.5 for both reward signals, GAIL and behavioral cloning) has potential in digital fabrication and production processes; however, more tasks and more robust processes involving physical robots must be tested first, such as the creation of a kit-of-parts assembly by a robotic arm or a cable-driven parallel robot at a larger scale.
6 Conclusion
In this article, we introduced a computational framework using the GAIL method and the ML-Agents toolkit in Unity in two independent experiments. The framework can be deployed in handcrafting or assembly processes utilizing tools such as a collaborative robot or a creative artificial agent. The concept of demonstration and a building strategy based on a master builder's demonstration were applied, tested, and evaluated through observation and construction in four computational design scenarios. These proved that the proposed framework can be applied in DfMA practice and can serve as a generative system in which human and AI agencies work interactively and cooperatively, as in human-in-the-loop processes.
Environments such as Unity and Rhinoceros can serve as platforms to integrate gentle, handcrafting-based operations in making, followed and learned by AI. The hypothesis stated at the beginning has been partially demonstrated through digital and physical assembly, and the computational models remain open to enhancements that improve the notion of interactive cocreation. Hands-on operations followed by AI-driven technologies may shift how crafting processes are executed and may provide a novel understanding of where the human agent remains an expert and a critical production agency in human-in-the-loop processes.
Observations of the virtual hand demonstrated satisfactory real-time navigation of the robot (without a specific sensorial framework); however, further testing with a physical robot is necessary to verify the concept fully. In future research, the proposed digital frameworks can be combined with a physical robot to conduct imitation learning scenarios in a physical, operational environment executing specific craft-related tasks.
Availability of data and materials
Both digital computational design frameworks, combining Unity and the Rhinoceros/GH interface for production space generation, are available via the data repository Github (Github, n.d.; Buš 2023a, 2023b) through the following links: https://github.com/peterbus/Hand-Tracking-to-navigate-a-robot--Rhino-Grasshopper-framework (the robotic navigation via hand-tracking) and https://github.com/peterbus/Co-Creative-Assemblies-DF-Workshop-2023.git (Co-Intelligent assemblies workshop 2023 content). The research data repository contains Unity Assets, scripts, and additional datasets containing experimental results and Grasshopper and Rhino file definitions containing the proposed digital workflows.
Additional materials (videos and PPT files) are available via the publicly accessible repository Google Drive: https://drive.google.com/drive/folders/18krXZbctjt9rHj2-W6scDGX5vLc_vuJ9?usp=sharing (Buš 2023c).
References
Alexandrova, S., Cakmak, M., Hsiao, K. & Takayama, L. (2014). Robot Programming by Demonstration with Interactive Action Visualization. In: Robotics: Science and Systems. Available at: https://doi.org/10.15607/RSS.2014.X.048
Alomar, D., Fraguada, L. E., Piacentino, G. (2011). Food4Rhino. gHowl. Available at: https://www.food4rhino.com/en/app/ghowl
Apolinarska, A. A., Pacher, M., Li, H., Cote, N., Pastrana, R., Gramazio, F., & Kohler, M. (2021). Robotic assembly of timber joints using reinforcement learning. Automation in Construction, 125, 103569. https://doi.org/10.1016/j.autcon.2021.103569
Bayoumi, A. (2000). Design for manufacture and assembly (DFMA): Concepts, benefits, and applications. In: M.F. Hassan and S.M. Megahed, eds., Current Advances in Mechanical Design and Production VII. Pergamon, pp.501–509. Available at: https://doi.org/10.1016/B978-008043711-8/50051-9
Bedaka, A. K., Vidal, J., & Lin, C. (2019). Automatic robot path integration using three-dimensional vision and offline programming. The International Journal of Advanced Manufacturing Technology, 102(5–8), 1935–1950. https://doi.org/10.1007/s00170-018-03282-w
Buš, P. (2023a). Repositories [Shenzhen], Github; [updated 2023 May 4, cited 2023 May 5]. Available from https://github.com/peterbus?tab=repositories
Buš, P. (2023b). Hand-Tracking-to-navigate-a-robot-Rhino-Grasshopper-framework. [Shenzhen], Github; [updated 2023 May 4, cited 2023 May 5]. Available at: https://github.com/peterbus/Hand-Tracking-to-navigate-a-robot--Rhino-Grasshopper-framework
Buš, P. (2023c). Supplementary materials. [Shenzhen], Google Drive Repository; [updated 2023. Cited 2023 October 7]. Available at: https://drive.google.com/drive/folders/18krXZbctjt9rHj2-W6scDGX5vLc_vuJ9?usp=sharing
Duan, Y., Andrychowicz, M., Stadie, B., Ho, J. et al. (2017). One-shot imitation learning. In: I. Guyon, U. Luxburg, S. Bengio, H. Wallach & R. Fergus (Eds.), Advances in neural information processing systems 30. Curran Associates, Inc. (pp. 1087–1098). Available at: http://papers.nips.cc/paper/6709-one-shot-imitation-learning.pdf
Felbrich, B., Schork, T., & Menges, A. (2022). Autonomous robotic additive manufacturing through distributed model-free deep reinforcement learning in computational design environments. Construction Robotics, 6(1), 15–37. https://doi.org/10.1007/s41693-022-00069-0
Finn, C., Yu, T., Zhang, T., Abbeel, P. & Levine, S. (2017). One-Shot Visual Imitation Learning via Meta-Learning. Available at: http://arxiv.org/pdf/1709.04905v1. Accessed 4 Jan. 2024.
Github. Let's build from here. [Place unknown], Github; n.d. [updated 2023, cited 2023 May 5]. Available from https://github.com/
Hahm, S. (2020a). Diffusion Limited Aggregation in Unity C# Part 2. [Place unknown], Youtube, [updated 2020; cited 2023 September 9]. Available from https://www.youtube.com/watch?v=vgD273g22Gk&list=PLZ55wFj-13MRLrwX7IAl99rhj4D5OexJ2&index=20
Hahm, S. (2020b). Diffusion Limited Aggregation in Unity C# Part 1. [Place unknown], Youtube, [updated 2020; cited 2023 September 9]. Available from https://www.youtube.com/watch?v=WBBAT2pJfs8&list=PLZ55wFj-13MRLrwX7IAl99rhj4D5OexJ2&index=19
Hahm, S. (2021). Training robot arm with Unity ML agents. [Place unknown], Youtube; [updated 2020; cited 2023 May 5]. Available from https://www.youtube.com/watch?v=HOUPkBF-yv0
Hamano, S., Kim, H., Yoshiyuki Ohmura & Kuniyoshi, Y. (2022). Using human gaze in few-shot imitation learning for robot manipulation. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). https://doi.org/10.1109/iros47612.2022.9981706
Hassel, T. & Hofmann, O. (2020). Reinforcement Learning of Robot Behavior Based on a Digital Twin. In Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods ICPRAM - Volume 1, pp. 381–386, Valletta, Malta. https://doi.org/10.5220/0008880903810386
Ho, J., & Ermon, S. (2016). Generative Adversarial Imitation Learning. Cornell University arXiv e-prints. Available at: https://doi.org/10.48550/arXiv.1606.03476
Homberg, B., Katzschmann, R., Dogar, M. and Rus, D. (2015). Haptic Identification Of Objects Using a Modular Soft Robotic Gripper. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany, pp.1698–1705.
Howe, A.S., Ishii, I. and Yoshida, T. (1999). Kit-of-Parts: A Review of Object-Oriented Construction Techniques. In: C. Balaguer, ed., Proceedings of the 16th IAARC/IFAC/IEEE International Symposium on Automation and Robotics in Construction. Madrid, Spain: International Association for Automation and Robotics in Construction (IAARC), pp.165–172.
Hua, J., Zeng, L., Li, G., & Ju, Z. (2021). Learning for a Robot: Deep Reinforcement Learning, Imitation Learning Transfer Learning. Sensors, 21(4), 1278. https://doi.org/10.3390/s21041278
Huang, D., Ma, M., Ma, W. and Kitani, K. (2015). How Do We Use Our Hands? Discovering a Diverse Set Of Common Grasps. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (pp. 666–675). Boston.
Juliani, A., Berges, V.-P., Teng, E., Cohen, A., Harper, J., Elion, C., Goy, C., Gao, Y., Henry, H., Mattar, M. and Lange, D. (2020). Unity: A General Platform for Intelligent Agents. arXiv:1809.02627 [cs, stat]. Available at: https://arxiv.org/abs/1809.02627
Kuefler, A., Morton, J., Wheeler, T. and Kochenderfer, M. (2017). Imitating Driver Behavior With Generative Adversarial Networks. In: Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV). pp. 204–211.
Kurrek, P., Jocas, M., Zoghlami, F., Stoelen, M. F., & Salehi, V. (2019). Ai Motion Control – A Generic Approach to Develop Control Policies for Robotic Manipulation Tasks. Proceedings of the Design Society, International Conference on Engineering Design, 1(1), 3561–3570. https://doi.org/10.1017/dsi.2019.363
Lin, M., Shan, L., & Zhang, Y. (2020). Research on robot arm control based on Unity3D machine learning. Journal of Physics: Conference Series, 1633(1), 012007. https://doi.org/10.1088/1742-6596/1633/1/012007
Matulis, M., & Harvey, C. (2021). A robot arm digital twin utilising reinforcement learning. Computers & Graphics, 95, 106–114. https://doi.org/10.1016/j.cag.2021.01.011
Menges, A. & Wortmann, T. (2022). Synthesising Artificial Intelligence and Physical Performance. Machine Hallucinations Architecture and artificial intelligence, Architectural Design. Available at: https://www.wiley.com/en-us/Machine+Hallucinations:+Architecture+and+Artificial+Intelligence-p-9781119748847
Mosqueira-Rey, E., Hernández-Pereira, E., Alonso-Ríos, D., Bobes-Bascarán, J., & Fernández-Leal, Á. (2022). Human-in-the-loop machine learning: A state of the art. Artificial Intelligence Review. https://doi.org/10.1007/s10462-022-10246-w
Murtaza H. (2022) Computer Vision Zone. Available at: https://www.computervision.zone/
Nabizadeh Rafsanjani, H., & Nabizadeh, A. H. (2023). Towards human-centered artificial intelligence (AI) in architecture, engineering, and construction (AEC) industry. Computers in Human Behavior Reports, 11, 100319. https://doi.org/10.1016/j.chbr.2023.100319
Perez-D’Arpino, C., & Shah, J. A. (2015). Fast target prediction of human reaching motion for cooperative human-robot manipulation tasks using time series classification. International Conference on Robotics and Automation. https://doi.org/10.1109/icra.2015.7140066
Pinochet, P. D. I. (2015). Making Gestures: Design and Fabrication through Real Time Human Computer Interaction. Master’s dissertation. Massachusetts Institute of Technology.
Pinochet, P. D. I. (2023). Computational gestural Making: a framework for exploring the creative potential of gestures, materials, and computational tools. PhD thesis. Massachusetts Institute of Technology.
Pinochet, D. (n.d.). Digital Futures 2020 - Smart Collaborative Agents Sessions. Youtube. Available at: https://www.youtube.com/watch?v=KDObBwoyzKg&t=771s. Accessed 4 Jan 2024.
Pu, J. (2020). Integration of Arts and Crafts in Artificial Intelligence Environment. Journal of Physics: Conference Series, 1574(1), 012162. https://doi.org/10.1088/1742-6596/1574/1/012162
Retsin, G. (2019). Discrete Architecture in the Age of Automation. Architectural Design, [online] 89(2) (pp. 6–13). https://doi.org/10.1002/ad.2406
Saka, A. B., Oyedele, L. O., Akanbi, L. A., Ganiyu, S. A., Chan, D. W. M., & Bello, S. A. (2023). Conversational artificial intelligence in the AEC industry: A review of present status, challenges and opportunities. Advanced Engineering Informatics, 55, 101869. https://doi.org/10.1016/j.aei.2022.101869
Stadie, B. C., Abbeel, P., & Sutskever, I. (2019). Third-Person Imitation Learning. arXiv.org. https://doi.org/10.48550/arXiv.1703.01703
Taranovic, A., Kupcsik, A. G., Freymuth, N., & Neumann, G. (2023). Adversarial Imitation Learning with Preferences. In International Conference on Learning Representations (ICLR 2023).
TensorFlow. (2023). TensorBoard | TensorFlow. TensorFlow. Available at: https://www.tensorflow.org/tensorboard
Varley, J., Weisz, J., Weiss, J. and Allen, P.K. (2015). Generating multi-fingered robotic grasps via deep learning. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, pp. 4415–4420. https://doi.org/10.1109/iros.2015.7354004
Acknowledgements
The authors would like to express their sincere gratitude to the organizers of the Digital Futures World 2023 workshop event “Emerging Planetarism” at The College of Architecture and Urban Planning of Tongji University in Shanghai, Prof. Philip F. Yuan and Assist. Prof. Chao Yan. They generously provided an experimental space for the workshop Co-Intelligent Assemblies 1.0 conducted and exhibited during the event. We also thank Chaoyun Wu, Hongjun Li, and Yalan Mei from the Institute of Future Human Habitats Tsinghua Shenzhen International Graduate School in Shenzhen for their support during the workshop and all workshop participants.
A part of this research is funded by the Scientific Research Start-up Funds provided by Tsinghua Shenzhen International Graduate School (project n. 002023009C) and the Shenzhen Pencheng Peacock Program in the project Deep Co-Intelligent Architecture conducted at the Institute of Future Human Habitats in the Co-Intelligent Architecture laboratory led by Assist. Prof. Dr. Peter Buš.
Funding
The author, Peter Buš, has received research support from the Tsinghua Shenzhen International Graduate School under the Scientific Research Start-up Funds grant, n. 002023009C and Shenzhen Pencheng Peacock Specific Program.
Partial financial support was received from Tongji University in Shanghai during the Emerging Planetarism Digital Futures 2023 event and exhibition, where a part of this study was conducted and exhibited.
Author information
Contributions
Conceptualization, methodology, formal analysis and investigation, draft preparation and writing (except Sects. 3.1, 3.2, and partially 3.3), diagrams, writing review and editing, supervision: Peter Buš.
Writing, editing, manuscript Sects. 3.1, 3.2, and partially 3.3, state-of-the-art research and literature review, proofreading: Zhiyong Dong.
Ethics declarations
Ethics approval and consent to participate
This was an observational study. The Tsinghua Shenzhen International Graduate School Research Ethics Committee has confirmed that the article has been considered to have no ethical issues by approval no. (2023) F119. No human data or personal identities were compromised by conducting this research.
The author, Peter Buš, consented to participate in this study as a demonstrator, described and photographed in Sect. 4.
Consent for publication
The authors consent to the publication of identifiable details, including photographs, videos, case history, and details within the text, to be published in the Architectural Intelligence Journal.
Competing interests
Employment: The author, Peter Buš, is employed by the Tsinghua Shenzhen International Graduate School, Institute of Future Human Habitats. The author, Zhiyong Dong, received a PhD allowance from the Tsinghua Shenzhen International Graduate School, Institute of Future Human Habitats, where he studies in a PhD program.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Buš, P., Dong, Z. DeepCraft: imitation learning method in a cointelligent design to production process to deliver architectural scenarios. ARIN 3, 12 (2024). https://doi.org/10.1007/s44223-024-00055-2