1 Introduction

Robots are widely used in industrial manufacturing, agricultural production, services, and defense to help people perform repetitive, heavy, or dangerous tasks [1]. However, in the case of complex and dynamic tasks and environments, robots without intelligence are unable to respond to changes in a correct and timely manner. Therefore, empowering robots with intelligence constitutes an important research trend [2, 3]. Intelligent robots combine artificial intelligence (AI) technology with robotics to produce an autonomous system with intelligence. These systems can learn and respond to dynamic requirements and environmental changes via machine learning, image recognition, target detection, and other AI techniques, rather than simply executing pre-defined commands. The main modules that affect how a robot functions as an intelligent machine include the morphology, the controller, and the vision perception system, which are analogous to the human body, brain, and eyes, respectively. Therefore, in the design automation of intelligent robotic systems, our work aims at developing an automated design methodology for the “Body-Brain-Eye” of intelligent robots.

With the emergence of advanced technologies such as deep learning, evolutionary computing, machine learning, intelligent control, and robotics, design automation for intelligent robots has received significant attention from scholars [4, 5] and is also considered an important branch of knowledge-worker automation [6]. In this paper, we explain the main concept of modular design automation in detail. In general, modular design automation (MODENA) refers to an approach that decomposes the overall design process of an intelligent robot system into multiple relatively simple and independent functional modules. Each module can be modeled as a unified graph model, which facilitates design optimization and enables the automatic design and combination of modules. In particular, MODENA for intelligent robots refers to the decomposition of the morphology (body) [7], controller (brain) [8], and vision system (eyes) [9] of an intelligent robot into independent and interpretable graphical modular units in a digital twin architecture. Then, with the help of artificial intelligence technologies such as genetic programming [10], evolutionary computation [11], deep learning [12], reinforcement learning [13], and causal reasoning [14], these modular units are combined automatically, and the combination rules themselves are evolved. Through this approach, the design process of intelligent robots can be automated. Notably, a system discovered during the design automation process can itself be packaged as a new modular unit and added to the module library. This new modular unit can then be reused in a closed-loop design automation process, allowing for systematic and continuous improvement in the performance of the intelligent robot system.
Compared with traditional design methods, the MODENA method can significantly improve the design efficiency and performance of intelligent robots: it promotes innovative designs that are not limited by the experience and intuition of human designers, and it avoids the repetitive trial-and-error and laborious routine tasks that traditional design methods require.

The MODENA approach (see Fig. 1) has received increasing academic attention in recent decades. It applies a constrained multi-objective genetic programming method to automatically generate and evolve the topologies and parameters of graph models (e.g., bond graph, finite state machine, gene regulatory network, deep neural network, and Bayesian network). In this way, the design rules of intelligent robots can be constructed to generate robots with high performance. To efficiently solve the resulting multi-objective problem, two key techniques, i.e., constrained multi-objective evolutionary algorithms and genetic programming, are applied together to optimize the topology and parameters of an arbitrary graph structure. Specifically, the constrained multi-objective evolutionary algorithm can efficiently handle multiple conflicting objectives with various types of constraints and a large number of discrete or continuous variables. Genetic programming searches over the topologies and internal parameters of graph models, and can yield models with innovative structures that perform well in specific aspects. To represent each target object with a graph model suited to its characteristics, different types of graph models are applied to the sub-systems of an intelligent robot, namely, the morphology, controller, and vision systems. Specifically, for the morphology and controller sub-systems, bond graphs are used to unify the modeling of multi-domain physical systems and controller systems, which supports comprehensive analysis and modeling of dynamic characteristics. For controllers of swarm robot systems, finite state machines and gene regulatory networks are commonly applied. In particular, finite state machines abstract robot behaviors into several states, allowing a moving robot to switch among different states.
The gene regulatory network is a structural model that integrates the interactions among individuals and their environments, enabling the behavior control of each agent in swarm robots. In vision systems, deep neural networks and Bayesian networks are widely utilized. Deep neural networks are used to learn internal relationships and representation levels of data, enabling robots to achieve human-level analysis abilities on various forms of data, such as text, images and sounds. Bayesian networks, on the other hand, utilize a probabilistic graph model to describe causal relationships of uncertainty among variables, which can process environmental information received by vision systems.
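To make the role of constraint handling in a constrained multi-objective evolutionary algorithm more concrete, the following minimal sketch shows one widely used comparison rule, the constraint-domination (feasibility-first) principle. It is an illustration under stated assumptions and not necessarily the rule used in the works reviewed here: all objectives are minimized, constraints are written as g_i(x) <= 0, and the function names are hypothetical.

```python
from typing import Sequence


def total_violation(constraints: Sequence[float]) -> float:
    """Sum of violations for constraints written as g_i(x) <= 0 (assumed form)."""
    return sum(max(0.0, g) for g in constraints)


def dominates(f_a: Sequence[float], f_b: Sequence[float]) -> bool:
    """Plain Pareto dominance for minimization: no worse in all, better in one."""
    no_worse = all(a <= b for a, b in zip(f_a, f_b))
    better = any(a < b for a, b in zip(f_a, f_b))
    return no_worse and better


def constrained_dominates(f_a, g_a, f_b, g_b) -> bool:
    """Feasibility-first comparison of two candidate designs a and b."""
    v_a, v_b = total_violation(g_a), total_violation(g_b)
    if v_a == 0.0 and v_b > 0.0:
        return True           # a feasible design beats an infeasible one
    if v_a > 0.0 and v_b == 0.0:
        return False
    if v_a > 0.0 and v_b > 0.0:
        return v_a < v_b      # both infeasible: smaller violation wins
    return dominates(f_a, f_b)  # both feasible: ordinary Pareto dominance
```

For instance, `constrained_dominates([5, 5], [-1], [1, 1], [0.5])` evaluates to true: the first design has worse objective values but is feasible, whereas the second violates its constraint.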

Figure 1. Key components in modular design automation

For example, Hod Lipson [15] employed evolutionary computation to design robotic systems automatically in a computer and then created the corresponding prototypes using 3D printing, thereby realizing for the first time the concept of using a machine to design and build machines. That work was published in Nature in 2000. Subsequently, Lipson published a series of papers about design automation in Nature and Science [16–18]. There, he posed a more general research question: Can we automatically design a mechatronic or robotic system that satisfies pre-defined design specifications using Lego-like building blocks? At about the same time, Erik Goodman (the founding director of the BEACON Center for the Study of Evolution in Action) and his team achieved a breakthrough in the field of mechatronic design automation (MDA) by employing bond graphs (BG) and genetic programming (GP) to automate the design process of general mechatronic systems [19]. BG is a graphical modeling tool that can unify the modeling of multi-domain physical systems in a mechatronic system. GP is a powerful tool in the field of evolutionary computation that can simultaneously optimize the topology and parameters of an arbitrary graph structure. Several circuits and mechanical systems [20–23] have been designed automatically using the bond graph and genetic programming (BGGP) approach, and the combined automatic design of controllers and controlled objects in continuous systems has also been achieved in [24]. In 2007, Clarence de Silva and his team [25] extended the BGGP approach to handle nonlinear systems, and proposed the concept of mechatronic design quotients to address design problems involving multiple objectives. In 2012, Zhun Fan and his team [26] proposed an extension of BGGP, called HBGGP, which can deal with both continuous and discrete dynamics and can design the plant and the controller concurrently.
The MODENA approach has also been effectively applied to swarm robots. In 2018, Garattoni utilized finite state machines to govern a swarm of robots with complex cognitive capabilities that can perform tasks successfully without knowing the exact execution sequence [27].

To summarize, existing design automation approaches usually pre-define a library of basic modules via a graphical modeling tool. Then, they employ optimization or metaheuristic methods, e.g., evolutionary computation, to search for optimal solutions. When designing mechatronic systems, the modeling language can be a bond graph [28, 29]. In the design of a vision system, the representation can be deep neural networks [30, 31]. When designing the behaviors of swarm robots, the modeling languages include finite state machines and gene regulatory networks [32–34]. These modeling languages are modular and parametric and can be uniformly represented by graphical models. In this paper, systematic and comprehensive reviews of the current state-of-the-art design automation approaches to intelligent robot bodies, controllers, and vision systems are presented. The current problems and challenges of this emerging research field are analyzed, and future research directions are discussed. We hope to attract the attention of relevant scholars and promote the development of industrial software for design automation of intelligent robots.

The remainder of this paper is organized as follows. Section 2 provides an overview of the design automation for the morphologies of intelligent robots. The design automation for the controllers of intelligent robots is reviewed in Sect. 3. In Sect. 4, the integrated design automation for the morphologies and controllers is presented. Design automation for the vision systems of intelligent robots is summarized in Sect. 5. Section 6 discusses the research and development trends of the integrated design automation of “Body-Brain-Eye” for intelligent robots. Section 7 summarizes and discusses several key technologies, current problems, and challenges involved in the MODENA for intelligent robots. Finally, conclusions are drawn in Sect. 8.

2 Design automation for the morphologies of intelligent robots

MODENA for the morphologies of intelligent robots refers to the systematic use of intelligent design optimization methods to design the robot morphologies, i.e., the plants or mechanical infrastructures. The current research on the design automation for intelligent robot morphologies is primarily divided into two categories: 1) Fixing the morphological topology and optimizing the geometric parameters of the morphology [35–39]. 2) Establishing a library of parametric modules for the morphologies of intelligent robots [40–42], and then simultaneously optimizing the topologies and geometric parameters of the morphologies by reconfiguring the parameterizable modules.

2.1 Parametric optimization of the morphologies

The optimization of intelligent robots’ designs is a challenging problem: it is usually a constrained multi-objective problem with mixed discrete and continuous variables that exhibits non-differentiability, discontinuity, and nonlinearity. The evaluation of some objectives also requires time-consuming simulations. Consequently, evolutionary algorithms are popular choices in practical engineering applications. For example, West et al. [43] utilized a genetic algorithm to optimize the output error system to identify problems for a seven-degree-of-freedom manipulator. The algorithm optimized the parameters of joints to generate a high-performance manipulator. Similarly, Xiao et al. [44] applied NSGA-II to optimize the weight and manipulability of the manipulator, resulting in a lighter and more maneuverable manipulator than the original UR5 structure. Hassan et al. [45] used NSGA-II to optimize a robotic gripper, achieving an optimal gripping force while also revealing significant relationships among objective functions and variable values from Pareto-optimal solutions. In addition, Fan et al. [46] proposed a push and pull search framework [47] combined with a multi-objective evolutionary algorithm based on decomposition to optimize a six-degree-of-freedom teaching manipulator. Their approach resulted in designs that outperformed those of human engineers and those found by some popular constrained multi-objective evolutionary algorithms. Reinforcement learning has also been employed to optimize the parameters of morphologies. As an example, Zhang et al. [48] proposed an algorithm that utilizes reinforcement learning to automate optimal robot hand design, demonstrating its effectiveness in tasks such as grasping boxes, cylinders, and spheres.
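The idea of optimizing geometric parameters under a fixed topology can be illustrated with a deliberately small example. The sketch below assumes a planar two-link arm (not one of the manipulators in the cited studies), uses the total link length as a crude stand-in for weight, and uses the peak Yoshikawa manipulability of a planar 2R arm, |det J| = l1 · l2 · |sin q2|, which reaches l1 · l2 at q2 = 90°. For brevity, random sampling replaces the evolutionary search (e.g., NSGA-II) used in the literature; only the extraction of the non-dominated set is shown.

```python
import random
from typing import List, Tuple

Design = Tuple[float, float]      # (l1, l2): link lengths, assumed to be in metres
Objectives = Tuple[float, float]  # both components are minimized


def objectives(l1: float, l2: float) -> Objectives:
    """Return (weight proxy, negated peak manipulability) for a planar 2R arm."""
    weight_proxy = l1 + l2          # assumption: weight grows with total length
    peak_manipulability = l1 * l2   # |det J| at elbow angle q2 = 90 degrees
    return (weight_proxy, -peak_manipulability)


def dominates(a: Objectives, b: Objectives) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scored: List[Tuple[Design, Objectives]]) -> List[Tuple[Design, Objectives]]:
    """Keep the designs that no other sampled design dominates."""
    return [(d, f) for d, f in scored if not any(dominates(g, f) for _, g in scored)]


def sample_designs(n: int, seed: int = 0) -> List[Design]:
    """Random search stand-in for the evolutionary sampling of link lengths."""
    rng = random.Random(seed)
    return [(rng.uniform(0.1, 1.0), rng.uniform(0.1, 1.0)) for _ in range(n)]


designs = sample_designs(200)
scored = [(d, objectives(*d)) for d in designs]
front = pareto_front(scored)
```

Each entry of `front` is a trade-off between a lighter arm and a more dexterous one; an evolutionary algorithm would refine such a set over generations rather than sample it once.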

2.2 Integrated design automation for parameters and topologies of morphologies

Modular robots [49–52] embody the principles of integrated design automation, which incorporates the optimization of parameters and topologies to create diverse morphologies. Modular graph models for the morphologies of intelligent robots are composed of either homogeneous or heterogeneous modules, each of which involves a variety of actuators and sensors [53, 54], which allows intelligent robots to achieve self-assembly, self-reconfiguration and self-repair. For example, Lipson et al. [15] were not only the first to use modules from a pre-defined library of modules to automatically assemble electromechanical systems that meet pre-defined functional requirements but were also the first to apply evolutionary algorithms to design robotic systems on the computer. Kelly et al. [55] applied a stochastic optimization algorithm to autonomously assemble a model for planar distributed assembly, which achieved innovative designs. Inspired by the large and complex nests built by social insects, Werfel et al. [56] established a distributed system for automating construction, which built some particular desired structures according to a high-level design provided by users. Inspired by the principles of biological evolution [57], Dai et al. proposed the metamorphic theory [58], which allows the topologies of morphologies to be reconfigured and metamorphosed [59] and to evolve dynamically [60] according to actual needs, thus flexibly adapting to changing working environments and functional requirements. On this basis, a variety of robots have been developed, such as a hybrid continuum robot based on pneumatic muscles [61], a crawling robot [62], and a quadruped robot based on the metamorphic mechanism [63, 64].

With the development of topology optimization design methods, modular robots are increasingly applying such methods to achieve innovative designs of morphologies [66, 67]. Compared with traditional topology optimization design methods (e.g., the level set method [68], the evolutionary structural optimization method [69], and the moving morphable component method [70]), isogeometric topology optimization (ITO) [71] is a modern structural optimization technique that leverages isogeometric analysis. Specifically, ITO seamlessly integrates computer-aided design, computer-aided engineering, and structural topology optimization, laying a theoretical foundation for the integration of design, analysis, and optimization of the morphologies for intelligent robots [72]. In recent years, ITO has been extensively studied and has driven the development of a new generation of digital design. For example, Gao et al. [73–75] studied the ITO method to design new materials and structures with special properties, such as auxetic metamaterials [76] and ultra-lightweight architected materials [77]. To improve the stability and accuracy of the optimization process and broaden the application scenarios of topology optimization, Seo et al. [78] proposed a new ITO, which can eliminate the design space dependency. Wang et al. [79] integrated isogeometric analysis with the level set method and proposed a high-precision ITO that satisfies geometric constraints. ITO enables the integration of digital design and analysis, thus significantly shortening the development cycle of the morphologies of intelligent robots and reducing research and development costs.

BGGP combines the capability of bond graphs (BG) to represent the mixed-domain physics of generic mechatronic systems in a unified way with the capability of genetic programming (GP) to explore an open-topology design space automatically and optimize both the topologies and parameters of design candidates represented by bond graphs. For example, Fan et al. [19, 29, 81] proposed an automatic design method for mechatronic systems combining bond graphs and genetic programming, which has already been applied to the design of electrical and mechatronic systems, such as analog filters [81], electric filters [19] and the driver system of a printer [29]. Meanwhile, Wang et al. [24] proposed a knowledge-based evolutionary design framework for mechatronic systems by combining the BGGP method with human knowledge, as shown in Fig. 2. In the BGGP method, BG is used to model multi-domain systems and GP is employed to search the open-ended design space automatically. Figure 3 illustrates the mapping from genotype to phenotype in the BGGP method. Compared with other methods, the BGGP method has the distinct advantage of being able to search in a topologically open-ended design space that is represented uniformly by bond graphs. As a special kind of mechatronic system, robotic systems can also use the BGGP approach for the design automation of their morphologies. Because modular robotic morphologies involve many physical sub-systems, they need a unified expression to model and analyze their performance. BG, as a modeling language that can describe all physical sub-systems (and continuous controllers) uniformly, can be utilized to model and analyze the dynamics of the designed mechatronic systems effectively and efficiently [82, 83].

Figure 2. The framework of evolutionary synthesis of mechatronic systems [65]

Figure 3. An example of genotype-phenotype mapping [80]
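The genotype-phenotype mapping can be pictured as a tree of construction instructions (the genotype) that is executed on a small embryo graph to grow the final model (the phenotype). The sketch below is a simplified illustration of this idea only. The two operators `add_element` and `insert_junction`, the tuple encoding, and the single 1-junction embryo are hypothetical choices made for this example; they are not the function set of the BGGP method itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Graph:
    """Phenotype: nodes are (id, kind, value); bonds connect node ids."""
    nodes: List[Tuple[int, str, float]] = field(default_factory=list)
    bonds: List[Tuple[int, int]] = field(default_factory=list)

    def add_node(self, kind: str, value: float = 0.0) -> int:
        node_id = len(self.nodes)
        self.nodes.append((node_id, kind, value))
        return node_id


def develop(genotype: tuple, graph: Graph, site: int) -> None:
    """Recursively execute the construction tree at junction `site`."""
    op = genotype[0]
    if op == "end":
        return
    if op == "add_element":
        # ("add_element", kind, value, next): attach a one-port element to the site
        _, kind, value, nxt = genotype
        element = graph.add_node(kind, value)
        graph.bonds.append((site, element))
        develop(nxt, graph, site)
    elif op == "insert_junction":
        # ("insert_junction", kind, child, next): grow a new junction, recurse in both
        _, kind, child, nxt = genotype
        junction = graph.add_node(kind)
        graph.bonds.append((site, junction))
        develop(child, graph, junction)
        develop(nxt, graph, site)
    else:
        raise ValueError(f"unknown construction operator: {op}")


def grow(genotype: tuple) -> Graph:
    """Map a genotype tree to a phenotype graph, starting from the embryo."""
    graph = Graph()
    embryo = graph.add_node("1")  # assumed embryo: a single 1-junction
    develop(genotype, graph, embryo)
    return graph


genotype = ("add_element", "R", 10.0,
            ("insert_junction", "0",
             ("add_element", "C", 0.5, ("end",)),
             ("add_element", "I", 2.0, ("end",))))
phenotype = grow(genotype)
```

Because both the structure of the tree and the numeric values inside it are part of the genotype, crossover and mutation on such trees change topology and parameters at the same time, which is the property the text attributes to GP.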

In conclusion, many achievements have been made in design automation for the parameters and topologies of robot morphologies. In particular, the self-assembly [84], self-reconfiguration [85] and self-repair [86, 87] characteristics of modular robots demonstrate the benefits of applying design automation to the parameters and topologies of the morphologies. The controller is also an important part of an intelligent robot, and the next section details the design automation for the controllers of intelligent robots.

2.3 Summary

In summary, design automation for the morphologies of intelligent robots has been widely applied, which can simultaneously optimize the geometric parameters and topologies of robots. Here, we present a concise overview of the various methods reviewed, highlighting the connections and differences among them from multiple perspectives, as displayed in Fig. 4.

Figure 4. A summary of design automation for the morphologies of intelligent robots

The research on the design automation for the morphologies of intelligent robots is mainly divided into three categories: (1) Optimizing geometric parameters while keeping a fixed morphological topology [43–46, 48]. These methods usually use multi-objective evolutionary algorithms [43–46] or reinforcement learning methods [48] to optimize the geometric parameters to meet task-specific requirements and obtain an optimal design. Since the topology is fixed, it is difficult for the resulting designs to adapt to complex tasks. (2) Topology optimization methods. These methods are represented by isogeometric topology optimization [73–75, 78, 79]. After the design space of the topology structure, the optimization objectives, and the constraints are set, these methods can automatically perform topology optimization design of the robot system’s components based on computer-aided engineering (CAE) analysis. These topology optimization methods can not only shorten the design cycle but also improve the design quality. However, the current work is mainly focused on the topology design of individual components of intelligent robots. (3) Simultaneous optimization of topologies and geometric parameters of robot morphologies [15, 19, 24, 55, 56, 81]. These methods usually decompose the morphologies of intelligent robots into a series of independent modular units, and then achieve assembly automation and parameter design by using evolutionary computation or reinforcement learning techniques. However, these approaches rarely perform CAE analysis of the assembled morphologies, so they cannot test designs in computer simulation or provide valuable insights into the performance of robot morphologies during the early development phase.
To summarize, although a large number of in-depth studies have been conducted on design automation for the morphologies of intelligent robots, further research is still required on how to conduct efficient design automation methods to meet the requirements of dynamic and complex tasks and environments.

3 Design automation for the controllers of intelligent robots

3.1 Design automation for the controllers of individual robots

In an intelligent robotic system, the controller often plays a key role [88, 89]. Design automation for the controllers of intelligent robots has been studied in depth [90]. For example, Zhong et al. [91] proposed a novel kinematic calibration method based on an improved whale swarm algorithm to optimize the controller design of a biped robot, enabling the robot to walk continuously and smoothly on complex ground. Due to the complexity of the walking dynamics of the biped robot, Gao et al. [92] applied a pre-trained neural network to design an optimal gait control model. Simulation results showed that the control model could effectively improve the maximum walking speed and terrain adaptability in a short time. In addition, hydraulic actuators are frequently employed in biped robot controllers. Nevertheless, due to the nonlinearity of hydraulic systems, the dynamic performance of the systems under control requires further improvement [93]. To this end, Dong et al. [94] proposed an improved drone squadron optimization-based approach to optimize the design of the hydraulic controller. The comprehensive experimental results indicated that the optimized hydraulic controller had better stability and higher accuracy.

In addition, proportional-integral-derivative (PID) controllers have been widely utilized in intelligent robots due to their advantages of simple design, easy implementation, fast response, and small steady-state error. Many studies [95–98] have conducted in-depth research on the design optimization of PID controllers. For example, Sharma et al. [99] applied the cuckoo search algorithm to optimize the parameters of the fractional-order fuzzy PID controller for a two-link planar rigid robotic manipulator. Experimental results demonstrated that the optimized PID controller outperformed the other controllers in terms of trajectory tracking, model uncertainty, disturbance rejection, and noise suppression. For the trajectory tracking of autonomous mobile robots, Ali et al. [100] employed an artificial bee colony to optimize the parameters of a PID controller, which obtained two high-performance PID controllers (speed controller and azimuth controller). Taherkhorsandi et al. [101] proposed an adaptive and robust controller that combines PID with sliding control to better control the motion of a biped robot. They utilized a multi-objective genetic algorithm to optimize the controller, resulting in successful control of a biped robot walking on a slope in the lateral plane. In general, PID controllers have difficulty in achieving optimal control of complex and nonlinear control systems [102]. To this end, Sun et al. [103] established a set of component units and performance units, and designed an optimal controller using the differential evolution algorithm. On this basis, Xin et al. [104] proposed a general design automation method for controllers to simultaneously optimize the structures and parameters of the controllers. Their approach combines basic controller components and related parameters to automatically create an optimal control model tailored to specific requirements.
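The common pattern in these studies, a population-based optimizer searching over PID gains against a simulated closed-loop cost, can be sketched in a few lines. The example below is an illustration under strong simplifying assumptions and does not reproduce any of the cited controllers: the plant is a hypothetical first-order system integrated with forward Euler, the cost is the integral of absolute error for a unit step, and the optimizer is a hand-written DE/rand/1/bin differential evolution with arbitrarily chosen settings and gain bounds.

```python
import random
from typing import Callable, List, Sequence, Tuple


def step_response_cost(kp: float, ki: float, kd: float,
                       dt: float = 0.01, steps: int = 500, tau: float = 1.0) -> float:
    """Integral of absolute error for a unit step on the plant tau*dy/dt = -y + u."""
    y, integral, prev_error, cost = 0.0, 0.0, 1.0, 0.0
    for _ in range(steps):
        error = 1.0 - y
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative  # PID control law
        y += dt * (-y + u) / tau                          # forward-Euler plant update
        prev_error = error
        cost += abs(error) * dt
    return cost


def differential_evolution(cost: Callable[..., float],
                           bounds: Sequence[Tuple[float, float]],
                           pop_size: int = 12, generations: int = 30,
                           f: float = 0.6, cr: float = 0.9,
                           seed: int = 1) -> Tuple[List[float], float]:
    """Minimal DE/rand/1/bin with clipping to the bounds (illustrative settings)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [cost(*ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or d == j_rand:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                else:
                    value = pop[i][d]
                trial.append(min(hi, max(lo, value)))
            trial_score = cost(*trial)
            if trial_score <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Assumed search ranges for (Kp, Ki, Kd); Kd is kept small for Euler stability.
BOUNDS = [(0.0, 10.0), (0.0, 10.0), (0.0, 0.5)]
best_gains, best_cost = differential_evolution(step_response_cost, BOUNDS)
```

In the reviewed work the simulated plant would be the robot or manipulator model, and the cost would typically combine several criteria such as tracking error and disturbance rejection; only the search loop has the same shape.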

In addition to the design automation methods mentioned above for PID controllers, many studies have employed neural networks as controllers for intelligent robots [105–107]. For example, Gallagher et al. [108] evolved neural networks in simulation to control the locomotion of an artificial insect, and successfully transferred the controller to a real hexapod robot. Nolfi et al. [109] applied an evolutionary algorithm to design and optimize a neural controller, which makes a bipedal robot equipped with actuators and sensors move according to concentration differences. In Paul et al.’s study [110], an evolutionary algorithm was used to optimize the design of a closed-loop recurrent neural network controller, which achieved stable bipedal movement on a 5-link biped robot in a physics-based simulation environment. In addition, Rahmani et al. [111] proposed a novel adaptive neural network integral sliding-mode controller that utilized a bat algorithm to control a biped robot, and proved its stability using Lyapunov theory.

3.2 Design automation for the controllers of swarm robots

Traditional control methods were initially designed to control the motions of individual robotic systems. However, when the scale of intelligent robotic systems is enlarged with numerous individual robots involved, traditional control approaches may face many challenges. These challenges include insufficient fault tolerance, meaning that the failure of a few individuals may lead to the failure of the whole system, and a significant increase in computational overhead, which makes it difficult to respond to unexpected occurrences in a timely manner, among other issues. The design automation of controllers for swarm robots provides a viable solution to the above difficulties. To this end, some studies have extracted the basic unit of swarm behavior by exploring the mapping between swarm behavior and individual behavior [112–116]. Then, an evolutionary computation-based swarm behavior control framework suitable for dynamic and complex task environments is automatically designed. For example, Francesca et al. [117] abstracted some individual behaviors into several states (such as random motion and static state) and then applied an optimization algorithm (named F-Race) to automatically design controllers based on a probabilistic finite state machine. In the following year, Francesca et al. [118] improved the design of control software for robot swarms and proposed two automated design methods (Vanilla and EvoStick). The experimental results demonstrated that the proposed design automation methods outperformed human designers in specific experimental scenarios. Although the works [117, 118] successfully addressed relatively simple or constrained problems, their limitations quickly emerged as the problem complexity increased [119]. In particular, a complex task is made up of several subtasks that may require cooperation and have mutual dependencies and time constraints [120]. To this end, Fan et al. [33] constructed a library of logical relationships of information exchange between agents by learning from the method of information exchange between cells in organisms. They then applied genetic programming to automatically design the optimal swarm behavior control model so that swarm robots can entrap targets in different patterns according to different environments (as shown in Fig. 5). Furthermore, Wu et al. [121] refined individual simple behavioral rules with universal applicability (such as exploration, moving to the target, and avoiding obstacles) through an in-depth analysis of the flocking task. They then optimized these individual behavior rules by combining behavior trees and the proposed heterogeneous–homogeneous co-evolution method to automatically design swarm behavior control strategies. Currently, these studies [33, 121] are mainly conducted in laboratory or simulation environments, and few have been deployed in practical application environments. To this end, Vásárhelyi et al. [122] applied CMA-ES to optimize the design of the swarm control mechanism by considering the presence of machine failures, communication delays, and airflow disturbances in actual flight, which achieved a successful flocking flight in the field with 30 unmanned aerial vehicles (UAVs).

Figure 5. Diagram of the automated design framework [33] for entrapping pattern generation
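A probabilistic finite state machine of the kind used as a controller representation for swarm robots can be sketched as a table of transition probabilities conditioned on what the robot currently perceives. The example below is a hypothetical two-state controller written for illustration: the state names, the single binary percept `neighbour_seen`, and all probability values are assumptions, hand-picked here, whereas in automatic design they would be the quantities selected by the optimizer.

```python
import random
from typing import Dict, List, Sequence, Tuple

# (current state, neighbour_seen) -> probability of each next state.
# All values are hand-picked placeholders, not results from any cited study.
TRANSITIONS: Dict[Tuple[str, bool], Dict[str, float]] = {
    ("random_walk", False): {"random_walk": 0.95, "stop": 0.05},
    ("random_walk", True): {"random_walk": 0.30, "stop": 0.70},
    ("stop", False): {"random_walk": 0.50, "stop": 0.50},
    ("stop", True): {"random_walk": 0.10, "stop": 0.90},
}


def next_state(state: str, neighbour_seen: bool, rng: random.Random) -> str:
    """Sample the next behavior according to the transition table."""
    options = TRANSITIONS[(state, neighbour_seen)]
    states = list(options)
    weights = [options[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def run(percepts: Sequence[bool], seed: int = 0) -> List[str]:
    """Run one robot's controller over a sequence of binary percepts."""
    rng = random.Random(seed)
    state = "random_walk"
    trace = [state]
    for neighbour_seen in percepts:
        state = next_state(state, neighbour_seen, rng)
        trace.append(state)
    return trace


# Ten steps without neighbours in view, then ten steps with neighbours in view.
trace = run([False] * 10 + [True] * 10)
```

With these placeholder values, a robot tends to keep wandering while alone and to stop once neighbours are in view, which is a simple aggregation behavior. An automatic design method would evaluate many such tables (and possibly different sets of states) in simulation and keep those that best serve the swarm-level task.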

3.3 Summary

To summarize, design automation of the controllers is a key step toward design automation of entire intelligent robots. In this regard, we have summarized the characteristics and applicability of various design automation methods for controllers from two perspectives: the applied techniques and the target objects (single-robot controllers and swarm robot controllers), as shown in Fig. 6.

Figure 6. A summary of design automation for the controllers of intelligent robots

The research on the design automation for controllers consists of two main aspects: (1) Optimizing the parameters of the controller with a fixed controller topology. For example, evolutionary algorithms, such as MOGA [101, 108–110], CMA-ES [122] and hybrid evolutionary algorithms (such as the improved whale swarm algorithm [91], cuckoo search algorithm [99], artificial bee colony [100], and bat algorithm [111]), are applied to optimize controller parameters [91, 94, 99–101, 108–111, 122]. (2) Simultaneous optimization of the topologies and parameters of the controller [33, 92, 103, 104, 117, 118, 121]. These methods usually pre-build various modular control units and then apply evolutionary algorithms to automatically assemble and parameterize these units, resulting in the automatic design of the optimal controller topology and parameters.

From the perspective of the scale of controlled objects, the design automation of controllers can be divided into two categories: (1) Design automation for the controllers of single robotic systems [91, 94, 99–101, 108–111]. (2) Design automation for controllers of swarm robotic systems [33, 103, 104, 117, 118, 121, 122]. Compared to the design of a single robot controller, designing a swarm robot controller is more complex. The main reason is that the mapping mechanism from swarm behavior control to individual behavior control is not clear. Designing behavior control rules for each robot in a swarm so that intelligent swarm behavior emerges at the system level is an important direction for future research.

4 Integrated design automation for the morphologies and controllers of intelligent robots

In recent years, researchers have introduced the idea of biological evolution into integrated design automation for morphologies and controllers of intelligent robots [123–126], which can automatically identify the optimal designs of intelligent robots according to fitness functions determined by given tasks or environments. Based on these ideas, some studies [127–129] have proposed an underlying system architecture called the triangle of life, which consists of three stages: morphogenesis, infancy, and mature life. This system allows for a population of robotic organisms that evolve and adapt to the given environment. Additionally, evolutionary computation, as a biologically inspired approach, has been used in numerous studies for integrated design automation of morphologies and controllers of intelligent robots [53, 130]. Modular robots can integrate the morphologies and controllers into a whole and simplify the search space, improving the efficiency of evolutionary computation [51]. Thus, the design automation of modular robots based on evolutionary computing has become an important research method for integrated design automation for the morphologies and controllers of intelligent robots. For example, Marbach et al. [131] utilized genetic programming to integrate the configuration and control of locomoting homogeneous modular robots, breaking through the limitations of human designers’ experience and intuition in manual design methods. It is worth noting that crossover and mutation in the evolutionary process may cause mismatches between the robot morphologies and controllers of the offspring. To alleviate this problem, Gupta et al. [132] designed a deep evolutionary reinforcement learning framework, which learned challenging motor tasks in complex environments by evolving different surrogate models. The study confirmed that environmental complexity can promote the evolutionary design of robots, helping offspring robots learn new skills.
Furthermore, the study confirmed that the robot structure is related to the learning efficiency of the controller. An excellent structure can promote the effective learning of the offspring robots.
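The morphology-controller mismatch mentioned above can be made concrete with a minimal sketch of joint evolution. The genome layout (one length per limb plus one controller gain per limb), the toy fitness function, and all function names are assumptions made for illustration only; they are not the methods of [131] or [132]. The point of the sketch is that a mutation that adds or removes a limb must resize the controller at the same time, otherwise the offspring's controller no longer matches its body.

```python
import random

def make_robot(rng, n_limbs):
    # Hypothetical genome: limb lengths (morphology) and one gain per limb (controller).
    return {"limbs": [rng.uniform(0.5, 1.5) for _ in range(n_limbs)],
            "gains": [rng.uniform(-1.0, 1.0) for _ in range(n_limbs)]}

def fitness(robot):
    # Toy stand-in for a simulated task score: a limb contributes most when
    # gain * length is close to 1; every limb also carries a small cost.
    score = 0.0
    for length, gain in zip(robot["limbs"], robot["gains"]):
        score += 1.0 - (gain * length - 1.0) ** 2
    return score - 0.1 * len(robot["limbs"])

def mutate(robot, rng):
    child = {"limbs": list(robot["limbs"]), "gains": list(robot["gains"])}
    roll = rng.random()
    if roll < 0.1 and len(child["limbs"]) < 6:
        # Structural mutation: a new limb gets a matching controller entry.
        child["limbs"].append(rng.uniform(0.5, 1.5))
        child["gains"].append(0.0)
    elif roll < 0.2 and len(child["limbs"]) > 1:
        # Removing a limb removes its controller entry as well.
        idx = rng.randrange(len(child["limbs"]))
        del child["limbs"][idx]
        del child["gains"][idx]
    else:
        # Parametric mutation of one limb and its gain.
        idx = rng.randrange(len(child["limbs"]))
        new_length = child["limbs"][idx] + rng.gauss(0, 0.1)
        child["limbs"][idx] = min(1.5, max(0.5, new_length))
        child["gains"][idx] += rng.gauss(0, 0.1)
    return child

def evolve(generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [make_robot(rng, rng.randint(1, 4)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # truncation selection keeps the elite
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness), history
```

Because the parents are carried over unchanged, the best fitness recorded in `history` never decreases. A realistic system would replace `fitness` with a physics simulation and, as in the learning-based works cited above, would typically train the controller rather than mutate it directly.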

Recently, neural network-based approaches have been widely applied in integrated design automation for the morphologies and controllers of intelligent robots [133, 134]. Zhao et al. [135] proposed RoboGrammar, a system inspired by arthropods that can efficiently generate hundreds of thousands of robotic structures composed of the given components. High-performance robots were then found by applying graph heuristic search and model predictive control (MPC), achieving concurrent optimization of robot morphologies and controllers. By extending the single-objective graph heuristic search procedure of the RoboGrammar system, Xu et al. [136] proposed a new multi-objective co-design algorithm for obtaining Pareto-optimal robot topologies and controllers. Miriyev and Kovač [137] created a symbiotic human–robot ecosystem (physical artificial intelligence) through the integrated evolution of the organism, control, morphology, action execution, and perception. The ecosystem decides and adapts in real time for navigation, locomotion, and manipulation by processing combinations of signals simultaneously sent from multiple sensors in their “body” to their “brain”.

In addition, genetic programming can also be utilized for efficient integrated design automation of the morphologies and controllers of electromechanical systems. For example, Wang et al. [138] proposed a “body-brain” design automation method that integrates GP and bond graphs to automate the integrated design of a quarter-car suspension control system’s morphologies and controllers. Compared with traditional methods, this method can help designers to achieve more creative and flexible designs. In addition, Dupuis et al. [26] proposed a design automation method called HBGGP, which merges hybrid bond graph (HBG) and genetic programming (GP) into the evolutionary design of topologies and parameters of a hybrid dynamical system. In the proposed method, HBG is utilized to represent dynamic systems involving both continuous and discrete system dynamics, and GP is used to explore the open-ended design space of HBGs to optimize the morphologies and parameters of DC-DC converters. Thereafter, they investigated the evolutionary design of controllers for hybrid mechatronic systems [139] and employed a finite state automaton (FSA) to represent discrete controllers. A case study of a two-tank system demonstrated that the proposed evolutionary approach can lead to a successful design of an FSA controller for the hybrid mechatronic system.
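To illustrate what a finite state automaton controller for a hybrid system looks like, the following sketch switches a single pump between two draining tanks. The states, events, thresholds, and flow rates are hypothetical and are not taken from [139]; in an evolutionary design setting, the transition table would be the object that the search modifies, whereas here it is simply written by hand.

```python
# Hypothetical discrete controller: (state, event) -> next state.
TRANSITIONS = {
    ("FILL_TANK_1", "tank2_low"): "FILL_TANK_2",
    ("FILL_TANK_2", "tank1_low"): "FILL_TANK_1",
}

def detect_event(h1, h2, low=0.2):
    # Discrete events are generated when a continuous level crosses a threshold.
    if h1 < low:
        return "tank1_low"
    if h2 < low:
        return "tank2_low"
    return "none"

def next_state(state, event):
    # Undefined (state, event) pairs leave the state unchanged.
    return TRANSITIONS.get((state, event), state)

def simulate(steps=200, dt=0.1, inflow=0.2, outflow=0.1):
    h1, h2, state = 0.5, 0.5, "FILL_TANK_1"
    trace = []
    for _ in range(steps):
        state = next_state(state, detect_event(h1, h2))
        # Continuous dynamics (explicit Euler): the pump feeds exactly one tank.
        q1 = inflow if state == "FILL_TANK_1" else 0.0
        q2 = inflow if state == "FILL_TANK_2" else 0.0
        h1 += dt * (q1 - outflow)
        h2 += dt * (q2 - outflow)
        trace.append((state, h1, h2))
    return trace
```

With these assumed rates the total volume is conserved, so the controller alternates between the two states and neither tank runs dry.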

To summarize, the integrated design automation of the morphologies and controllers of intelligent robots is an important trend in future research. Separate consideration of the design automation of the morphologies and the controllers would lead to sub-optimal solutions and unsatisfactory overall performance. Here, we summarize the characteristics and applications of various methods from the perspective of research directions and applied optimization techniques of integrated “body-brain” design automation for intelligent robots, as illustrated in Fig. 7.

Figure 7

A summary of integrated design automation for the morphologies and controllers of intelligent robots

The current research directions for morphologies and controllers mainly consist of three aspects: (1) Designing the search space [26, 135, 138]. It is crucial to construct a reasonable search space so that novel solutions can be found. (2) Designing the search strategy [26, 131, 132, 135, 136]. A good search strategy can improve the efficiency and effectiveness of the algorithm. (3) Designing evaluation indicators [131, 135]. The evaluation indicators of comprehensive performance are designed to evaluate the performance of search candidates and guide the algorithm’s search.

According to the applied optimization techniques, the integrated design automation for the morphologies and controllers is divided into three main categories: (1) Evolutionary computation-based approaches [131, 136]. These approaches search for the best solutions for the integrated design of the morphologies and controllers by simulating the evolutionary process in nature. Their advantage is that they can produce designs that are superior to manual ones. However, due to the complex design space and the randomness of the search process, the optimal design is not guaranteed. (2) Learning-based approaches [135]. These approaches learn integrated design strategies for the morphologies and controllers by setting appropriate reward functions and making dynamic decisions with known knowledge to obtain a design solution that maximizes rewards. They simplify the design space and improve the search efficiency through heuristic search, and are suitable for the integrated design automation for the morphologies and controllers of intelligent robots with complex structures. (3) Combined evolution and learning approaches [26, 132, 137, 138]. These methods mainly apply an evolution-based method to design the morphologies and then apply a learning-based method to design the controllers, which can effectively reduce the search space and improve search efficiency.

The integrated design automation of the morphologies and controllers of intelligent robots presents a challenge due to the strong coupling relationship between the morphology and controller, as it involves multi-energy domain physical systems. This makes it an important area for further research.

5 Design automation for the vision systems of intelligent robots

The vision systems of intelligent robots can provide rich visual perception information, such as depth information and motion information. This information is often one of the most important inputs for guiding the intelligent robot’s motion decision-making process [140]. However, in practice, vision systems are often designed manually. In most cases, designers require numerous trial-and-error experiments to obtain an appropriate design scheme for the vision system [141, 142]. Design automation for vision systems can automatically produce an optimal or desired design scheme for the required robotic vision tasks. Therefore, design automation for vision systems represents an indispensable element of design automation for intelligent robots.

Computer vision research provides an essential foundation for the design automation of robotic vision systems, and deep learning has become a crucial research direction in this field. Researchers can obtain desired results by constructing a neural network and using the corresponding image data for training, provided that the neural network architecture is properly designed. However, the design of neural network architectures requires designers to have a full understanding of various computing modules and training methods. In addition, designers have to conduct repeated experiments to adjust network architectures until they perform well [142–144]. In recent years, neural architecture search (NAS) has gradually emerged as a research hotspot. In a given search space, NAS can automatically identify optimized neural network architectures without manual design. Therefore, NAS provides an important foundation for the design automation of intelligent robotic vision systems.

This section introduces the recent work related to NAS and highlights the shortcomings of existing research. It also identifies the problems that need to be addressed in the future to achieve the design automation of intelligent robotic vision systems.

5.1 Neural architecture search

NAS is primarily composed of three parts: search space design, search strategy, and performance estimation strategy. Depending on the search strategy, NAS can be mainly classified into three categories [145–147]: (1) RL-based NAS, (2) differentiable NAS, and (3) evolutionary NAS.
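The interplay of the three parts can be sketched in a few lines. The search space, the proxy score, and all names below are hypothetical, and plain random search is used as a placeholder search strategy rather than any of the three categories listed above; a real performance estimation strategy would train each candidate and report its validation accuracy.

```python
import random

# (1) Search space: every candidate is one choice per design dimension (hypothetical).
SEARCH_SPACE = {"depth": [2, 4, 6], "width": [16, 32, 64], "kernel": [3, 5]}

def sample_architecture(rng):
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

# (3) Performance estimation: a made-up proxy standing in for "train and validate".
def estimate_performance(arch):
    return 0.5 + 0.03 * arch["depth"] + 0.002 * arch["width"] - 0.01 * arch["kernel"]

# (2) Search strategy: random search, the simplest possible baseline.
def random_search(n_trials=30, seed=0):
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = estimate_performance(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

RL-based, differentiable, and evolutionary NAS differ mainly in how they replace `random_search`, while the search space and the performance estimator keep the same roles.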

RL-based NAS models the search task as a Markov decision process and offers rewards depending on the performance that the generated network achieves on held-out data after training. The method then trains the RL model according to the reward and adjusts the generated neural network architecture, thereby using RL to guide the generation of neural network architectures. Representative achievements include MetaQNN [148] (proposed by MIT) and NASNet [149, 150] (proposed by Google), both of which search the layers of the neural network. In contrast, BlockQNN [151] (proposed by SenseTime) searches modules of the neural network. Unlike evolutionary algorithms or RL, which operate on a discrete and non-differentiable search space, differentiable methods make architecture searches more efficient by using gradient information through the continuous relaxation of the architecture representation [152]. The network architectures designed by differentiable NAS have also achieved excellent performance; representative examples include the differentiable architecture search (DARTS) [152] (proposed by researchers at Carnegie Mellon University and DeepMind) and P-DARTS (proposed by Huawei’s Noah’s Ark Laboratory) [153]. Evolutionary NAS regards the adjustment of the topological structure and hyperparameters of the model as an optimization problem and adopts an evolutionary algorithm to optimize the neural network. In 2019, the Uber AI Lab published a review article in Nature Machine Intelligence that strongly advocated evolutionary NAS and anticipated its future development [154]. Representative evolutionary NAS examples include the neuroevolution of augmenting topologies (NEAT) [155], CoDeepNEAT [156], and NSGA-Net algorithms [157].
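The continuous relaxation used by differentiable methods can be illustrated as follows. Instead of picking one operation per edge, the edge outputs a softmax-weighted mixture of all candidate operations, so the architecture weights become ordinary continuous parameters. The three scalar operations and the function names are toy assumptions; real implementations use tensor operations and optimize the weights by gradient descent, which is omitted here.

```python
import math

def softmax(alphas):
    m = max(alphas)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

# Candidate operations on one edge (toy scalar stand-ins for conv, pooling, etc.).
OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

def mixed_op(x, alphas):
    # Relaxed edge: weighted sum of all candidates, differentiable in alphas.
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, OPS.values()))

def discretize(alphas):
    # After the search, keep only the operation with the largest weight.
    names = list(OPS)
    return names[max(range(len(alphas)), key=lambda i: alphas[i])]
```

For example, with equal weights `mixed_op(1.0, [0.0, 0.0, 0.0])` averages the three outputs, and `discretize([0.1, 2.0, -1.0])` selects `"double"`.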

5.2 Design automation for vision systems

In real life, robots assigned to different tasks require different visual capabilities. For example, drones use object detection [158], object tracking [159], motion estimation [160] and depth estimation [161] for autonomous obstacle avoidance. Autonomous cars use 3D object detection [162] to establish the physical positions of obstacles for path planning. Medical robots use image segmentation [163–165] to analyze the information in medical examination reports and thereby help doctors diagnose a patient’s condition. Further examples are shown in Fig. 8. Unlike laboratory setups, robots in practical applications typically run on embedded devices that cannot provide sufficient computing resources. Consequently, the development of lightweight models is a promising research area.

Figure 8

Different visual tasks that robots often encounter, including object detection [166], semantic segmentation [166], instance segmentation [166], depth estimation [167], image deraining [168] and image dehazing [169]

Thus, by investigating the vision tasks often encountered in current robot applications, this section introduces design automation for vision systems involved in the vision tasks that robots currently face, including (1) object detection, (2) image segmentation, (3) depth estimation, (4) video analysis, and (5) embedded device application.

5.2.1 Neural architecture search for object detection

Object detection enables a robot to identify the object of interest in an image and determine its position, allowing the robot to perform tasks such as object picking [170, 171] and object tracking [172, 173]. The network architecture of object detection is primarily divided into three parts: the backbone, neck, and head. The backbone is responsible for extracting image features, the neck is responsible for fusing features, and the head is responsible for classifying and locating objects. Currently, two main methods are available for searching object detection architectures: (1) searching for the overall network architecture [174] and (2) searching for parts of the network architecture while reusing existing designs for the other parts [175]. Depending on the problem characteristics of the object detection tasks and the characteristics of the network structures, various methods have been introduced for searching object detection network architectures.

Chen et al. [176] proposed the DetNAS algorithm to address the problem of losing object location features when directly using an image classification network as the backbone for object detection. To achieve this, they search the entire network architecture using ShuffleNetV2 as the search space. The algorithm is pre-trained on the ImageNet dataset and fine-tuned on object detection datasets to improve classification and localization capabilities. Meanwhile, DetNAS employs an evolutionary algorithm to search the sub-network. Wang et al. [175] proposed NAS-FCOS, a fast neural architecture search algorithm for object detection, to reduce the computational burden and improve search speed. The algorithm uses an existing image classification network, such as ResNet or MobileNet, as the backbone network and constructs the rest of the network from a feature pyramid network (FPN) and a detection head. NAS-FCOS searches only the network structures of the FPN and the detection head, each in its own search space. The algorithm employs a long short-term memory (LSTM) network as an agent and uses an RL-based search strategy to build the FPN and the detection head. Structural-to-modular NAS [177] adopts a two-stage search strategy to search network architectures for object detection. In the first stage, different existing networks are combined based on the structure of the object detection network to identify the combinations of network structures that are Pareto-optimal in terms of inference speed and accuracy. In the second stage, all network structures in the Pareto solution set are further searched at the module level.
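The notion of a Pareto-optimal set with respect to inference speed and accuracy can be illustrated with a small non-dominated filter. The candidate names and the numbers are invented for illustration, and the code is a generic sketch rather than the procedure of [177].

```python
def dominates(a, b):
    # a and b are (latency_ms, accuracy) pairs: lower latency and higher accuracy are better.
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def pareto_front(candidates):
    # Keep every candidate that no other candidate dominates.
    return {name: metrics for name, metrics in candidates.items()
            if not any(dominates(other, metrics) for other in candidates.values())}

# Hypothetical detector combinations: (latency in ms, accuracy).
CANDIDATES = {
    "A": (10.0, 0.70),
    "B": (20.0, 0.80),
    "C": (25.0, 0.78),  # slower and less accurate than B, hence dominated
    "D": (40.0, 0.85),
}
```

Here `pareto_front(CANDIDATES)` keeps A, B, and D; a second search stage would then refine only these survivors.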

In recent years, NAS for object detection has received increasing attention and achieved very competitive results. However, how to define an optimal search strategy and search space remains a problem for object-detection NAS.

5.2.2 Neural architecture search for image segmentation

Image segmentation is a process that involves classifying each pixel in an image, making it a dense prediction task. Robots can utilize image segmentation for various functions, including defect detection and measurement [178] and medical analysis [179, 180]. Currently, image segmentation architecture search methods fall into two categories. The first category involves searching for the module structure under a fixed network architecture, while the second category involves searching for both the network architecture and module structure simultaneously.

Liu et al. [181] proposed Auto-DeepLab, which first applied NAS to image segmentation. Auto-DeepLab uses architecture- and cell-level search methods to explore the overall architecture of the model and the cell structure, respectively. It formulates the architecture search problem as a differentiable optimization problem and uses a gradient-based method to search the model architecture. To quickly search for a lightweight semantic segmentation network for mobile device applications, Nekrasov et al. [182] employed an existing network architecture as the encoder and focused on searching the decoder architecture within the encoder-decoder framework. Wei et al. [183] proposed Genetic U-Net for retinal vessel segmentation, which takes a U-shaped encoder-decoder structure as the network architecture and uses an evolutionary algorithm to explore the network structure within each cell of the encoder and decoder networks. Genetic U-Net uses binary coding to encode the network structure and regards the network performance on the test dataset as the fitness of individuals. Through genetic operations such as selection, crossover and mutation, better offspring individuals are evolved continuously, and the network structures with the best performance are finally identified. Experimental results show that Genetic U-Net achieves higher segmentation accuracy with fewer parameters than existing algorithms on the DRIVE, STARE, CHASE_DB1 and HRF public datasets. It is worth noting that Genetic U-Net is a rather general framework, which can conveniently switch to different vision tasks and generate optimal models according to the provided training data, as depicted in Fig. 9.
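A binary encoding of a cell structure, together with the crossover and mutation operators that act on it, might look as follows. The specific encoding (one bit per possible edge between ordered nodes of a cell) and the function names are assumptions chosen for illustration; the exact encoding used in [183] may differ.

```python
import random

def decode(bits, n_nodes):
    # Assumed encoding: bit k switches on the k-th possible edge (i, j), i < j,
    # of a directed acyclic cell, with edges listed in row-major order.
    pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    if len(bits) != len(pairs):
        raise ValueError("expected %d bits, got %d" % (len(pairs), len(bits)))
    return [pair for pair, bit in zip(pairs, bits) if bit]

def crossover(a, b, rng):
    # One-point crossover: the two parents swap their tails.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(bits, rng, rate=0.1):
    # Bit-flip mutation: each bit is flipped independently with probability rate.
    return [1 - b if rng.random() < rate else b for b in bits]
```

For a cell with four nodes, `decode([1, 0, 1, 0, 0, 1], 4)` yields the edges `(0, 1)`, `(0, 3)`, and `(2, 3)`. Selection would then rank the decoded networks by their measured fitness.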

Figure 9

The framework of Genetic U-Net [183]. If the provided dataset is for pose estimation [184], object tracking [185] or object detection [186], the framework can automatically generate the corresponding optimal neural network model

5.2.3 Neural architecture search for depth estimation

Depth estimation enables robots to calculate the distance to objects by analyzing images [167, 187]. These estimations are crucial in downstream tasks such as autonomous obstacle avoidance [188, 189] and path planning [190, 191], making them important vision functions for robots. Monocular and binocular depth estimation techniques are the two most commonly used vision systems in robots. Therefore, this section primarily focuses on introducing architecture search algorithms for monocular and binocular depth estimation.

Monocular depth estimation directly predicts the depth map of the input image, which is a dense prediction task. Huynh et al. [192] proposed LiDNAS to search for lightweight monocular depth estimation networks. Under the preset network architecture, each module structure is searched using an auxiliary tabu search algorithm. During network training, both the prediction accuracy and the number of parameters are taken into account to obtain a network model with fewer parameters and higher estimation accuracy. Saikia et al. [193] extended DARTS [152] and applied it to depth estimation tasks by using AutoML technology to efficiently search for optimal network structures. Nekrasov et al. [182] applied their method for searching lightweight semantic segmentation architectures to depth estimation, achieving competitive performance compared to manually designed depth estimation networks.

Binocular depth estimation primarily involves identifying matching points in left and right images using stereo matching [194]. A stereo vision system model is then used to estimate the depth map. Therefore, architecture search for binocular depth estimation amounts to searching for a stereo matching network model. This network is typically composed of two parts: a feature extraction network and a matching network. Inspired by multi-resolution feature extraction and fusion, Cheng et al. [195] proposed the learning effective architecture stereo (LEAStereo) algorithm. This algorithm uses a gradient-based, two-level hierarchical search strategy to search the network architecture and the internal structures of the constitutive modules simultaneously. To solve the problem of decreased matching accuracy in unseen scenes, Zhang et al. [196] established a reusable architecture growth framework that allows the resulting network to learn stereo matching in unseen scenes. Wang et al. [197] introduced an elastic and accurate network for stereo matching (EASNet), which divides the network architecture into four components based on different functions. The search space of each component includes manually designed calculation modules for stereo matching. Experiments show that EASNet achieves superior results in terms of both inference speed and matching accuracy.
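The step from matched points to depth can be made explicit. For a rectified stereo pair under the standard pinhole model, a point with disparity d (in pixels) lies at depth Z = f·B/d, where f is the focal length in pixels and B is the baseline between the two cameras. The function name and the example values below are illustrative assumptions.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    # Rectified pinhole stereo: Z = f * B / d.
    # A non-positive disparity corresponds to a point at infinity or an invalid match.
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px
```

With an assumed focal length of 800 pixels and a baseline of 0.5 m, a disparity of 40 pixels corresponds to a depth of 10 m. Because depth is inversely proportional to disparity, small matching errors on distant objects translate into large depth errors, which is one reason why matching accuracy matters so much for the searched networks.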

To summarize, depth estimation is primarily classified into monocular and binocular depth estimation. Different estimation methods produce different search models. Currently, the depth estimation architecture search is based on single images. However, depth estimations implementing multi-image information can exploit more spatial information. Therefore, multi-image-based depth estimation architecture search is a promising research direction in the field of depth estimation in the future.

5.2.4 Neural architecture search for video analysis

In contrast to the rapid progress of NAS on image data, NAS on video data is still under-explored, and only a few video tasks have been studied, including action recognition, super resolution, and pose estimation. Existing methods mainly transfer successful experience from image data and further exploit spatio-temporal cues and motion information in video data.

For action recognition, Peng et al. [198] first proposed a NAS method for 3D models to achieve design automation. Specifically, it uses the pseudo 3D operator to process spatial and temporal features in the search space. To further exploit spatio-temporal relationships, Piergiovanni et al. [199] proposed EvaNet by introducing an inflated temporal Gaussian mixture (iTGM) to the search space, which enables the model to capture the spatial and temporal interactions among feature flows. In addition, Ryoo et al. [200] established AssembleNet to consider object motion information in design automation. In particular, it first builds a two-stream model as directed graphs and then uses evolutionary algorithms to establish connections between different blocks on RGB and optical flow input at different temporal resolutions. This can better exploit appearance and motion information from videos. Unlike the previous methods, Wang et al. [201] considered introducing an attention mechanism and proposed AttentionNAS, which builds a spatio-temporal attention cell search space and enables generated models to capture long-distance dependencies in video data. Additionally, Piergiovanni et al. [202] focused on improving the computation efficiency of video models and proposed TinyVideoNet, which introduces model running time into the reward loss function and guides the search strategy to generate a desired model with low computing latency. For video super resolution, Liu et al. [203] proposed EVSRNet to achieve high-fidelity results and efficient computation. Specifically, it uses the residual block as the basic building block and similarly introduces the fidelity of results and the computation cost of candidate models into the reward loss function. After that, a gradient descent method is performed to search for the optimal number and size of the residual blocks. Consequently, the generated model can produce more accurate details while keeping lower computation costs and fewer model parameters. For video pose estimation, Xu et al. [204] proposed ViPNAS to utilize pose relationships between adjacent frames. In particular, it establishes the search space by considering the correlation information between adjacent frames and then performs feature fusion on the heatmaps of the previous and current frames via a series of optional operations. Thus, the model can automatically learn the best fusion operation and the best stage at which to fuse.

5.2.5 Neural architecture search for embedded devices

Although current NAS methods can generate high-precision models, these models are often not applicable in real-world intelligent robots due to unacceptable computing latency. This is because real-world robots are usually built on embedded devices, which can only provide limited memory and computing resources. However, current NAS methods do not account for these important factors. Therefore, designing a suitable NAS method according to the characteristics and requirements of embedded devices is an urgent problem to be solved.

On embedded devices, the computing latency and memory consumption of models are two key factors. To optimize the computing latency, Cai et al. [205] developed ProxylessNAS, which models the computing latency of a model as a continuous function and optimizes it as a regularization loss to find a model with low latency. Similarly, Wu et al. [206] established DANS, which uses the latency of each block to estimate the latency of the entire model and introduces a latency reward loss to guide the search strategy. López et al. [207] introduced E-DNAS, which uses a multi-objective differentiable loss function combining classification accuracy and minimum latency on the feature map. Luo et al. [208] proposed LightNAS with a two-step procedure, which first applies a large-scale and one-time search for models that satisfy the latency constraints and then iteratively selects the candidate with the best accuracy. To optimize the memory consumption, Cassimon et al. [209] proposed introducing two soft constraints (cache and performance) and two hard constraints (memory cost and latency) into the reward loss function, which can guide the search strategy to find a model that meets resource requirements. In addition, Wan et al. [210] developed DMaskingNAS with an efficient masking mechanism for feature reuse and effective shape propagation, drastically expanding the search space by supporting searches over spatial and channel dimensions.
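The two ideas that recur in these latency-aware methods, estimating model latency from per-block measurements and adding the estimate to the training objective, can be sketched as follows. The lookup table values, the linear penalty, and the weight are hypothetical; the cited methods each use their own latency models and loss formulations.

```python
# Hypothetical per-block latencies measured once on the target device (milliseconds).
LATENCY_MS = {"conv3x3": 1.8, "conv5x5": 3.1, "skip": 0.1}

def model_latency(blocks):
    # Block-wise estimate: the latency of a model is the sum over its blocks.
    return sum(LATENCY_MS[block] for block in blocks)

def expected_latency(op_probs):
    # Continuous estimate: each block holds a probability per candidate operation,
    # so the expected latency is a smooth function of those probabilities.
    return sum(sum(p * LATENCY_MS[op] for op, p in probs.items())
               for probs in op_probs)

def penalized_loss(task_loss, latency_ms, weight=0.01):
    # Latency enters the objective as a simple additive regularization term.
    return task_loss + weight * latency_ms
```

As a usage example, `model_latency(["conv3x3", "skip", "conv5x5"])` sums to about 5 ms, and a block that chooses between `conv3x3` and `skip` with equal probability has an expected latency of about 0.95 ms. Increasing `weight` pushes the search toward faster models at the cost of accuracy.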

In addition to building models from scratch, another direction to consider is how to automatically compress an existing large model. He et al. [211] first proposed automated model compression (AMC) to achieve automated model pruning by using reinforcement learning. Specifically, AMC models the pruning rate and parameter-related information of each layer as the action space and state space, respectively. Then it uses DDPG [212] to train the agent to automatically determine the pruning rate of each layer. Motivated by AMC, Gupta et al. [213] developed PuRL to provide rewards at each pruning step, achieving sparsity and accuracy comparable to state-of-the-art (SOTA) methods with a shorter training cycle. Yu et al. [214] proposed introducing topological information into the model compression procedure, finding the optimal compression ratio while ensuring model accuracy instead of relying solely on the local importance of parameters. To consider the relationship between convolutional filters and channels, Wang et al. [215] established MCTS-RL to prune unnecessary filters before channel pruning, effectively reducing the search space and making the search for channel pruning ratios easier. In addition to network pruning, tensor decomposition [216], data quantization [217] and knowledge distillation [218] are other effective techniques for model compression. We do not discuss them here because they rely on hand-crafted design and expert experience and therefore fall outside the topic of design automation. Interested readers can refer to [219–222] for further investigation.
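The role of a per-layer pruning rate can be illustrated with a minimal sketch in which the learned agent is replaced by a fixed list of ratios. The pruning criterion used here, zeroing the weights with the smallest magnitude, is a common simple choice assumed for illustration and is not necessarily the criterion used in [211]; all names are hypothetical.

```python
def prune_layer(weights, ratio):
    # Zero out the given fraction of weights with the smallest absolute value.
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    n_prune = int(len(weights) * ratio)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_prune])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]

def prune_model(layers, ratios):
    # In an automated method, an agent would output one ratio per layer;
    # here the ratios are simply supplied by the caller.
    return [prune_layer(w, r) for w, r in zip(layers, ratios)]

def sparsity(layers):
    total = sum(len(w) for w in layers)
    zeros = sum(1 for w in layers for v in w if v == 0.0)
    return zeros / total
```

For instance, `prune_layer([0.5, -0.1, 0.3, -0.9], 0.5)` returns `[0.5, 0.0, 0.0, -0.9]`. An RL-based method would evaluate the accuracy of the pruned model and use it, together with the achieved sparsity, as the reward for choosing the ratios.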

5.3 Summary

To summarize, neural architecture search (NAS) has been widely applied in design automation for vision systems, which can automatically search for neural networks and offer improved performances in various vision tasks. In this section, we provide a brief overview from different angles to illustrate the connection and difference between the methods reviewed in this section, as displayed in Fig. 10.

Figure 10

A summary of design automation for vision systems with NAS

Existing works in NAS mainly focus on three key components: (1) the search space [176, 183, 196, 201, 204, 210, 211, 215], which contains all candidate network architectures, (2) the search strategy [175, 177, 181, 182, 192, 193, 195, 198–200], which guides the selection of a good candidate that meets a specific requirement from the search space, and (3) the performance evaluation [197, 202, 203, 205–209, 213, 214], which generates a performance metric for a candidate and provides guidance information for the search strategy.

In terms of the applied techniques, existing works primarily fall into three categories: RL-based NAS [148–151], differentiable NAS [152, 153] and evolutionary NAS [155–157]. Although RL-based NAS methods can achieve superior performance, they often require thousands of GPUs running for several days, even on a medium-scale dataset. Differentiable NAS methods are usually more efficient than RL-based methods. However, they often find ill-conditioned architectures due to improper gradient-based optimization. Because evolutionary NAS methods are insensitive to local minima and do not require gradient information, they have shown promising characteristics in solving complex non-convex optimization problems [223], even when the objective function’s mathematical form is unknown [224].

Regarding applications, existing studies typically either incorporate specific prior information into the construction of a NAS method or tackle some special issues within a particular visual task. Taking binocular depth estimation as an example, existing works [195–197] preset the network architecture as a stereo matching network and search for its internal structures. For embedded devices, since memory cost and computation latency are critical in practical applications, existing works [205–209] evaluate these two factors during the search and encode them in the reward functions. In this way, the proposed NAS methods can automatically generate a reasonable network with low memory cost and latency.

6 Integrated design automation for the “body-brain-eye” of intelligent robots

At present, most studies separately design the morphologies, controllers and vision systems of intelligent robots. However, strong couplings exist between the designs of the morphologies, controllers, and vision systems [225]. Therefore, it is necessary to consider the integrated design relationship of morphologies, controllers, and vision systems of intelligent robots. These strong coupling relationships are also reflected in nature. According to the law of “survival of the fittest” in biological evolution, many creatures have evolved a large diversity of eye structures and corresponding body morphologies. For example, the morphologies of birds and primates are very different, and their eye locations on the face are also different. It is believed that their brains’ mechanisms of processing visual information are also quite different. It is notable that through the cooperation of biological populations, the perception of individual organisms can be further improved [226]. If studies in intelligent robots can automate the design of morphologies, controllers and vision systems in a manner similar to biological evolution, intelligent robots with significantly improved performance may be developed.

Qiao et al. [227–229] took the lead in introducing a “hand-eye-brain” system of intelligent robots that imitates the mechanism, structure and function of the human brain, nervous system, and body motor system. In their proposed method, the role of the “hand” is the motion control of the intelligent robots. Inspired by the “muscle-tendon-bone” organization, Qiao et al. [230] established a control framework based on synergistic activation of muscles and an “attractive region in environment” theory [231, 232]. This framework enabled high-precision flexible operation under low-precision morphologies and low-precision sensors. The role of the “eyes” is to construct the visual cognitive system of intelligent robots. Inspired by the brain-inspired visual cognition and memory mechanism of the hippocampus, Qiao et al. [233–235] established a new visual recognition framework, ensuring that intelligent robots can achieve higher recognition accuracy and faster recognition speed. The role of the “brain” is the decision-making of intelligent robots. Inspired by the brain’s nervous system, Qiao et al. [236, 237] introduced a brain-inspired motor decision model based on emotion regulation modulation. This model implemented high-level decision-making with an “accuracy-efficiency-speed” balance. Compared with the traditional robot design method, the proposed “hand-eye-brain” system of intelligent robots realizes human-like manipulation with high precision, flexibility, and robustness.

Inspired by Qiao et al.’s “Hand-Eye-Brain” system of intelligent robots, this paper proposes an integrated “Body-Brain-Eye” design automation for intelligent robots, as illustrated in Fig. 11. Specifically, this paper proposes the integrated MODENA framework for automatically designing the morphologies, controllers, and vision systems of intelligent robots, inspired by the evolution of biological forms, as displayed in Fig. 12. The framework constructs a modular graph model for the morphologies, controllers, and vision systems of intelligent robots under digital twin architectures, and applies the capabilities of genetic programming, evolutionary computation, deep learning, reinforcement learning, and causal reasoning in optimization, decision-making, and reasoning. In this way, the MODENA framework can obtain innovative and optimal designs of intelligent robots.

Fig. 11 “Hand-Eye-Brain” system of intelligent robots vs. integrated “Body-Brain-Eye” design automation for intelligent robots

Fig. 12 The proposed MODENA for the morphologies, controllers, and vision systems of intelligent robots

In the process of applying MODENA to design the morphologies, controllers, and vision systems of intelligent robots, the construction of a modular graph model is a fundamental task. These modules are selected or designed according to the application scopes, operating characteristics, and functionalities of the designed robotic systems to meet pre-defined design specifications. In the mechanical field, the modular graph model contains running modules, link modules, joint modules, and end-effector modules, which are used to build the morphologies of intelligent robots [238]. For the image processing part, the modular graph models may contain convolutional layers, pooling layers, and fully connected layers, among others, which are components of a deep neural network architecture that can be used to construct the vision systems of intelligent robots [239]. In the control field, the modular graph model contains main control units, actuators, detecting units, among others, which build the controller of intelligent robots [240]. For the control of swarm robots, the modular graph model contains basic network motifs that can be employed to automatically construct gene regulatory network (GRN) models. A multi-objective genetic programming method can be applied to optimize the structure and parameters of the GRN-based model in parallel so that the behavior of swarm robots can be controlled [33].
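As a minimal sketch of how such a modular graph model might be represented in software, the following Python fragment stores modules from a small library as graph nodes and their connections as directed edges. The class names (`Module`, `ModularGraph`), the `LIBRARY` contents, and the example wiring are illustrative assumptions, not the representation used in the cited works.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Module:
    """One reusable unit from the module library (hypothetical structure)."""
    name: str
    domain: str  # "body" (morphology), "brain" (controller), or "eye" (vision)


@dataclass
class ModularGraph:
    """Directed graph whose nodes are modules and whose edges are connections."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # pairs of node indices (src, dst)

    def add(self, module):
        """Add a module instance and return its node index."""
        self.nodes.append(module)
        return len(self.nodes) - 1

    def connect(self, src, dst):
        """Add a directed edge between two existing nodes."""
        if not (0 <= src < len(self.nodes) and 0 <= dst < len(self.nodes)):
            raise IndexError("edge endpoint is not a node of the graph")
        self.edges.append((src, dst))

    def domains(self):
        """Set of domains covered by the design."""
        return {m.domain for m in self.nodes}

    def cross_domain_edges(self):
        """Edges that couple modules from different domains."""
        return [(s, d) for s, d in self.edges
                if self.nodes[s].domain != self.nodes[d].domain]


# Illustrative module library; the names follow the module types listed above.
LIBRARY = {
    "link": Module("link", "body"),
    "joint": Module("joint", "body"),
    "main_control_unit": Module("main_control_unit", "brain"),
    "actuator": Module("actuator", "brain"),
    "conv_layer": Module("conv_layer", "eye"),
    "pooling_layer": Module("pooling_layer", "eye"),
    "fully_connected_layer": Module("fully_connected_layer", "eye"),
}


def build_example():
    """Assemble one hypothetical body-brain-eye design candidate."""
    g = ModularGraph()
    link = g.add(LIBRARY["link"])
    joint = g.add(LIBRARY["joint"])
    mcu = g.add(LIBRARY["main_control_unit"])
    actuator = g.add(LIBRARY["actuator"])
    conv = g.add(LIBRARY["conv_layer"])
    pool = g.add(LIBRARY["pooling_layer"])
    fc = g.add(LIBRARY["fully_connected_layer"])
    g.connect(link, joint)      # body to body
    g.connect(conv, pool)       # eye to eye
    g.connect(pool, fc)         # eye to eye
    g.connect(fc, mcu)          # eye to brain (coupling)
    g.connect(mcu, actuator)    # brain to brain
    g.connect(actuator, joint)  # brain to body (coupling)
    return g
```

In this sketch, the cross-domain edges make the couplings among morphology, controller, and vision system explicit, which is the kind of information an automated search over the graph would need to preserve.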

7 Problems and prospects

7.1 Existing problems

In this section, we will explore the various aspects that should be considered in the integrated design automation for the “Body-Brain-Eye” of intelligent robots. These include modeling, optimization, knowledge extraction, environment perception, swarm robots, and generalization in unseen scenarios.

(1) Unified Modeling for the “Body-Brain-Eye” of Intelligent Robots

Since intelligent robots are typically multi-energy domain physical systems [241, 242], we need to build a unified graph model to facilitate the design automation process. However, for different categories of intelligent robot modules, we still need to use different modeling tools. For example, we use a geometric model or a bond graph for morphologies, a finite state machine or a model predictive controller for controllers, a gene regulatory network for swarm control, and a deep neural network model for vision systems. Although all these models can be abstracted to a graph model, they are still different modeling languages. Different parts of the graph models need to be decoded separately to obtain the complete intelligent robot. On the other hand, various modules within the intelligent robots are usually coupled with each other, and it is still challenging to express this coupling relationship through a unified graph model.

(2) Efficient Methods for Solving Robot Optimization Problems

In the integrated “Body-Brain-Eye” design automation process of intelligent robots, various types of decision variables (e.g., continuous variables, discrete variables) [243] are included, along with various types of optimization objectives and constraints with different difficulty types [46]. The calculation of objectives or constraints is usually time-consuming [244], and in most cases, external simulators need to be called, making it a computationally expensive optimization problem. Therefore, efficiently solving these constrained multi-objective optimization problems with mixed decision variables and expensive fitness evaluation is a challenging task.
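One common way to write such a problem down is the generic constrained multi-objective formulation below. The notation is an assumption introduced here for illustration: $m$ objectives $f_k$, $p$ inequality constraints $g_i$, $q$ equality constraints $h_j$, and a decision vector split into a continuous part $\mathbf{x}_c$ and a discrete part $\mathbf{x}_d$.

```latex
\begin{aligned}
\min_{\mathbf{x}} \quad & \mathbf{F}(\mathbf{x}) = \bigl(f_1(\mathbf{x}), \dots, f_m(\mathbf{x})\bigr)^{\mathsf{T}} \\
\text{s.t.} \quad & g_i(\mathbf{x}) \le 0, \quad i = 1, \dots, p, \\
& h_j(\mathbf{x}) = 0, \quad j = 1, \dots, q, \\
& \mathbf{x} = (\mathbf{x}_c, \mathbf{x}_d), \quad \mathbf{x}_c \in \mathbb{R}^{n_c}, \quad \mathbf{x}_d \in \mathbb{Z}^{n_d}.
\end{aligned}
```

Under this reading, each evaluation of $f_k$, $g_i$, or $h_j$ may require a call to an external simulator, which is what makes the problem expensive.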

(3) Knowledge Extraction during the Design Process

The optimization of integrated “Body-Brain-Eye” systems of intelligent robots in various experimental scenarios generates a vast amount of data, including intermediate data that contain crucial design knowledge as well as optimization-related knowledge [245]. To extract knowledge and rules from the data with good interpretability, genetic programming methods are effective. However, their accuracy is limited. Deep learning methods, on the other hand, offer high model accuracy, but their black-box characteristics pose a problem for model interpretability. A crucial challenge in creating an iterative optimization system with feedback is to identify causal relationships within and between modules of an intelligent robot to gain innovative design knowledge automatically.

(4) Multi-modal Information Fusion for Environment Perception

The working environments faced by intelligent robots are often complex and varied. Therefore, intelligent robots should be able to learn actively and continuously optimize their systems during operation, make efficient and accurate judgments, and respond quickly and appropriately in complex and dynamic working environments. Hence, it is essential to combine multi-modal architecture search with active vision technology so that robots can integrate multi-sensor information and actively optimize their hardware systems in real time.

(5) Design Automation for Swarm Robots

The control of swarm robots is progressing rapidly in applications. Control approaches can be divided into two categories: centralized control and decentralized control. Centralized control is a natural and widely accepted approach, but it faces many challenges when the size of the swarm increases beyond a certain level. For example, a large system with centralized control has insufficient fault tolerance: failures of just a few individuals may cause the whole system to fail. Computational costs may also increase dramatically, making it difficult to react to unexpected factors in a timely manner. As a result, decentralized control has received increasing attention recently and has gradually become a new mainstream. The key idea is to design a proper (and in most cases common) control scheme for each robot in the swarm so that the swarm as a whole can accomplish the specified tasks. This is challenging, especially when the swarm is large. To address this challenge, design automation approaches play an increasingly important role [246], and MODENA can also contribute greatly [33]. Taking an unmanned aerial vehicle (UAV) swarm as an example, the key challenge in designing and managing such a complex system is to define a rigorous engineering approach for programming each robot so that the swarm behaves in the desired manner. How to distill the basic units of swarm behavior strategies, and thus carry out research on the design automation of unmanned swarm behavior strategies, is another emerging issue.

(6) Poor Generalization in Unseen Scenarios

Existing methods mostly automatically generate a model and utilize this fixed model for practical applications. However, this paradigm usually results in unsatisfactory performance, because the model is unable to generalize well to unseen testing scenarios that are different from the training one. Typically, this domain gap between training and testing scenarios is common in real-world applications since the environment is changing all the time, especially for vision systems. Therefore, how to design a NAS method to search for a robust vision model that can perform consistently among different scenes is an urgent issue to be addressed.

7.2 Future directions

Although significant progress has been made in modular design automation over the past two decades, several important issues still need to be addressed, and new application areas are emerging. The following subsections will discuss potential future research directions from two perspectives: theoretical studies and practical applications.

7.2.1 Theoretical studies

(1) Multi-view Unified Modeling of Intelligent Robots

Building a unified model of the morphology, controller, and vision system of an intelligent robot is an effective approach to facilitate the design automation process. Currently, the morphology, controller, and vision system are usually represented by different modeling tools and composed of various modules [241, 242]. For example, when designing a mechanical system, the modeling language might be a bond graph. When designing an unmanned swarm controller, the modeling language may be a finite state machine or a gene regulatory network. When designing a vision system, the modeling language may be a deep neural network. Different modeling languages have different application scopes and characteristics, and it is challenging to capture the coupling relationships among the modules represented by them. Therefore, constructing a multi-view unified modeling tool that can represent the morphology, controller, and vision system effectively and efficiently is an essential direction for the design automation of intelligent robots.

(2) Surrogate-assisted Constrained Multi-objective Optimization for Intelligent Robots

The optimization of intelligent robots often requires the simultaneous consideration of multiple conflicting design objectives and a large number of constraints. In addition, the calculations of objectives and constraints are usually time-consuming and often require the invocation of external simulation software. Therefore, the optimization problem of an intelligent robot can be defined as an expensive constrained multi-objective optimization problem [46, 247]. In the research of MODENA for intelligent robots, constrained multi-objective evolutionary algorithms are gradually becoming a popular approach to solving such problems. In the study of constrained multi-objective evolutionary algorithms, the conventional view is that each infeasible region is equally important. In fact, however, only the constraints represented by infeasible regions close to the unconstrained Pareto front affect the true Pareto front. Therefore, how to exploit features like this to balance convergence, diversity, and feasibility has become a major consideration in designing constrained multi-objective optimization algorithms. In terms of surrogate models, establishing novel constrained multi-objective evolutionary algorithms with an adaptive surrogate approach that combines global and local surrogate models for the optimization objectives and constraints is another direction worthy of in-depth investigation.
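To make the interplay between feasibility and Pareto dominance concrete, the sketch below implements the widely used constrained dominance principle (feasibility rule), in which any feasible solution beats any infeasible one. It is shown only as a baseline illustration and is not claimed to be the constraint-handling method of any cited algorithm. The function names and the encoding of a solution as an `(objectives, constraint_values)` pair with constraints written as `g(x) <= 0` are assumptions.

```python
def total_violation(g_values):
    """Sum of violations for constraints written as g_i(x) <= 0."""
    return sum(max(0.0, g) for g in g_values)


def dominates(f_a, f_b):
    """Pareto dominance for minimization: no worse in all, better in one."""
    no_worse = all(a <= b for a, b in zip(f_a, f_b))
    strictly_better = any(a < b for a, b in zip(f_a, f_b))
    return no_worse and strictly_better


def constrained_dominates(a, b):
    """Constrained dominance between two (objectives, constraints) pairs."""
    va, vb = total_violation(a[1]), total_violation(b[1])
    if va == 0.0 and vb == 0.0:
        # Both feasible: fall back to ordinary Pareto dominance.
        return dominates(a[0], b[0])
    if va == 0.0 or vb == 0.0:
        # Exactly one is feasible: the feasible one wins.
        return va == 0.0
    # Both infeasible: the smaller total violation wins.
    return va < vb


def nondominated(population):
    """Solutions not constrained-dominated by any other population member."""
    return [p for p in population
            if not any(constrained_dominates(q, p)
                       for q in population if q is not p)]
```

Because this rule always prefers feasible solutions, it treats every infeasible region as equally undesirable, which is exactly the conventional view questioned above; methods that tolerate some infeasible solutions near the unconstrained Pareto front would replace `constrained_dominates` with a softer comparison.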

(3) Knowledge Extraction in Design Automation

The knowledge extracted in the design automation process of intelligent robots involves both explicit knowledge and implicit knowledge. Explicit knowledge, also called human knowledge, can often be directly understood by human experts and has very good interpretability. Implicit knowledge, on the other hand, is usually not directly understandable by humans, but can be stored and inferred by machines. It is therefore also called machine knowledge, and it may become understandable to humans in the future. Symbolic regression, a method based on genetic programming, is usually used for explicit knowledge mining. Given a set of functions and terminals defined manually using prior knowledge of the problem domain, this method can automatically mine the explicit knowledge contained in the data. Causal reasoning, an emerging research field, can also be used to obtain explainable knowledge. This knowledge can, in turn, guide the search of evolutionary algorithms on expensive constrained multi-objective problems and the adjustment of the problem formulation of intelligent robot optimization problems.
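The role of the function and terminal sets can be illustrated with a toy example. The sketch below defines a small function set and terminal set, represents candidate expressions as nested tuples, and recovers a formula from data. For brevity it enumerates all expression trees up to a fixed depth instead of evolving them with genetic programming, so it is a stand-in for the search step and not a genetic programming implementation; all names and the choice of sets are assumptions.

```python
import itertools
import operator

# Manually defined function set and terminal set (illustrative choices).
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMINALS = ("x", 1.0, 2.0)


def evaluate(tree, x):
    """Evaluate an expression tree at the input value x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    name, left, right = tree
    return FUNCTIONS[name](evaluate(left, x), evaluate(right, x))


def enumerate_trees(depth):
    """All expression trees of at most the given depth."""
    trees = list(TERMINALS)
    for _ in range(depth):
        trees = list(TERMINALS) + [
            (name, left, right)
            for name in FUNCTIONS
            for left, right in itertools.product(trees, repeat=2)
        ]
    return trees


def squared_error(tree, data):
    """Sum of squared residuals of the tree over (x, y) samples."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data)


def best_expression(data, depth=2):
    """Exhaustive search for the tree with the smallest squared error."""
    return min(enumerate_trees(depth), key=lambda t: squared_error(t, data))


# Samples generated from y = x * x + 1, the "hidden" rule to be recovered.
DATA = [(x, x * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
```

Calling `best_expression(DATA)` returns a nested tuple that a human can read directly as a formula, which is the sense in which the extracted knowledge is explicit.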

(4) MODENA for Intelligent Robots Based on Digital Twins

MODENA for intelligent robots necessitates numerous simulations and experiments in both virtual and real-world environments. These efforts can be significantly expedited through the application of emerging digital twin technology. This technology creates a unique type of metaspace that replicates the physical laws of space with exceptional precision and is a subset of the Metaverse. Traditional methods of designing intelligent robots typically involve a laborious and time-consuming trial-and-error process. Conversely, implementing the digital twin approach enables the faithful mapping of the robot from real space to virtual space in four dimensions: geometry, contact dynamics, behavior, and rules. Moreover, it allows for the arbitrary adjustment of the morphology, controller, and vision systems, generating practically unlimited design candidates whose optimization is efficiently supported through the integration of the powerful capabilities of machine learning and evolutionary computing. Additionally, machine learning can be used to mine knowledge and rules from data generated during the design process, which can then be utilized in future design activities. Therefore, exploring how to fully harness the power of digital twin technology for MODENA is another crucial area of study.

(5) Domain Adaptation and Generalization in Design Automation

Since the poor generalization ability of designed models to unseen scenarios is an urgent issue, especially for vision systems, model robustness becomes an important factor when designing a NAS method. A promising solution is to introduce domain adaptation [248] and generalization [249] evaluation in the design procedure, which focuses on transferring knowledge learned in training scenarios to unseen testing ones. Specifically, we can introduce an additional evaluation metric for generalization ability in candidate model selection. In this way, the search strategy can choose a model with a trade-off that achieves good performance while remaining robust to noisy and varied environments. A few works [250–252] concentrate on this appealing direction.
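A minimal sketch of such a selection step is given below, assuming each candidate architecture has already been evaluated on its training (source) domain and on several held-out domains. The scoring rule, a weighted blend of source accuracy and worst-case held-out accuracy, and the dictionary keys are illustrative assumptions and are not taken from the cited works.

```python
def generalization_score(source_acc, target_accs, weight=0.5):
    """Blend in-domain accuracy with worst-case held-out-domain accuracy.

    weight = 0 ranks by source accuracy only; weight = 1 ranks by the
    worst held-out accuracy only.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if not target_accs:
        raise ValueError("at least one held-out domain is required")
    return (1.0 - weight) * source_acc + weight * min(target_accs)


def select_candidate(candidates, weight=0.5):
    """Pick the candidate with the best blended score."""
    return max(
        candidates,
        key=lambda c: generalization_score(
            c["source_acc"], c["target_accs"], weight
        ),
    )


# Hypothetical evaluation results for two searched architectures.
CANDIDATES = [
    {"name": "A", "source_acc": 0.95, "target_accs": [0.60, 0.70]},
    {"name": "B", "source_acc": 0.90, "target_accs": [0.80, 0.85]},
]
```

With `weight=0.0`, candidate A wins on source accuracy alone; with the default `weight=0.5`, candidate B is preferred because its worst held-out accuracy is much higher, which illustrates the trade-off described above.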

(6) Active and Continual Learning in Design Automation

In addition to designing a robust model, another solution to the poor generalization issue is to perform online learning in testing scenarios, which allows the model to quickly adjust its parameters and adapt to unknown environments. To achieve effective online learning, two key problems must be solved. First, the model needs to figure out what to learn in a given environment. To address this, active vision and learning [253, 254] can guide models to explore valuable targets and learn superior decision-making behaviors, as studied in different applications, including robot exploration [255–257], unmanned aerial vehicle (UAV) swarm localization, and other tasks [258–260]. Second, the model needs to overcome catastrophic forgetting during online learning. Specifically, when the model learns new knowledge in a new scene, the previously learned knowledge may be dramatically forgotten, leading to severe overfitting to the new scene and making it harder for the model to generalize to another unseen scene. To tackle this issue, continual learning [261] has been proposed to guide models to learn continually over time by accommodating new knowledge while retaining previously learned experiences. Several works [262–264] have tried to introduce continual learning in NAS.
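One simple family of continual learning techniques is rehearsal, in which a small memory of past samples is replayed alongside new data. The sketch below shows a fixed-size replay memory filled by reservoir sampling, so that every sample seen so far has the same chance of being retained. It illustrates the general idea only and is not the method of the cited works; the class and method names are assumptions.

```python
import random


class ReplayBuffer:
    """Fixed-size memory of past samples, filled by reservoir sampling."""

    def __init__(self, capacity, seed=0):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of samples offered so far
        self._rng = random.Random(seed)

    def add(self, sample):
        """Offer one sample; keep it with probability capacity / seen."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
            return
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = sample

    def mixed_batch(self, new_samples, k):
        """New samples followed by up to k replayed old samples."""
        replay = self._rng.sample(self.items, min(k, len(self.items)))
        return list(new_samples) + replay
```

Training on batches returned by `mixed_batch` exposes the model to old scenes while it adapts to a new one, which is one way to reduce forgetting at the cost of storing a bounded number of past samples.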

To summarize, it is important and desirable for a model to automatically optimize itself and adapt to varied and unseen scenarios, thereby achieving higher levels of intelligence. This remains an open and attractive problem in the design automation of intelligent robotic vision systems.

7.2.2 Practical applications

In this section, we present an exemplary scenario to illustrate the potential benefits of applying MODENA. Power plants serve as the cornerstone of the power system, and their operational health plays a crucial role in ensuring the system’s safety. The intricate layout of pipelines in power plants makes manual inspection challenging. Moreover, manual inspection is vulnerable to problems such as missed inspections, false inspections, and risks to the personal safety of inspectors, which can be influenced by various factors, such as labor intensity and weather conditions. With the advent of heterogeneous unmanned swarm technology, the integration of flying inspection robots, ground inspection robots, and pipeline leak-detecting and repairing robots has become technically feasible, offering significant advantages over manual inspections. This integration, illustrated in Fig. 13, may also become a hot research theme in the future. We therefore suggest the following directions for future research.

Fig. 13 The integrated “Body-Brain-Eye” design automation for heterogeneous unmanned swarm system of inspection and repair robots used in the power plant

(1) Design Automation for the Morphologies of Unmanned Swarm Systems

Unmanned swarm systems need to accomplish multiple tasks, and the relationship between their overall performance and component properties is extremely complex, which makes the morphological structure design of unmanned swarm systems complicated. The MODENA method provides a new idea for the morphological structure design of unmanned swarm systems. For example, in the power plant environment, the unmanned swarm contains flying inspection robots, ground inspection robots and pipeline leak-detecting and repairing robots. Their morphologies differ greatly, with different application scopes, operation characteristics, functions and performances. Designing corresponding morphological models based on the application scopes, operational characteristics, functions, and performances of various robots within unmanned swarm systems to achieve superior overall performance is a challenge and a focus of future research.

(2) Design Automation for the Environmental Perception and Cognitive Systems of Unmanned Swarm Systems

In complex environments with high dynamics, uncertainty and resource constraints, unmanned swarm systems need to achieve distributed sensing and cognition of the environment through multi-modal interaction techniques. For example, in the power plant environment, aerial inspection robots equipped with vision sensors, ground inspection robots equipped with high-precision LIDAR and vision sensors, and ground repair robots equipped with infrared vision sensors work together to achieve rapid and precise localization of power plant faults and timely repair through data obtained from different sensors. It is crucial to design a proper model that can process heterogeneous sensor data to achieve efficient perception and cognition of complex environments. Therefore, research on the design automation of the visual perception model is an important direction for distributed environment perception and cognition.

(3) Design Automation for the Controllers of Unmanned Swarm Systems

Because the environment faced by an unmanned swarm system is uncertain or unpredictable, it is difficult to design algorithms that can control swarm behaviors based on accurate models. In swarm control, the biggest challenge is to design a proper control scheme for each robot so that the swarm as a whole can generate collective behavior that can accomplish the pre-defined task for the swarm. Since each robot in the swarm follows the same behavioral model, designing controllers using traditional methods faces great difficulty. Design automation methods can play a significant role here since they can generate and explore a large number of potential candidates with the help of digital twin technology. Design automation techniques can also identify the optimal ones that satisfy the specified task requirements more efficiently by using metaheuristic methods, such as evolutionary computation. Therefore, the design automation of behavioral control strategies for UAV swarms based on evolutionary algorithms is another important research direction.
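The idea that every robot runs the same local rule while the swarm as a whole produces a collective behavior can be illustrated with a toy aggregation example. In the sketch below, each robot moves part of the way toward the centroid of the neighbors within its sensing radius; there is no central coordinator. The rule, the parameter names (`radius`, `gain`), and the 2D point-robot model are illustrative assumptions and are not a controller from the cited works.

```python
import math


def step(positions, radius=5.0, gain=0.5):
    """One synchronous update in which every robot applies the same rule."""
    new_positions = []
    for i, (xi, yi) in enumerate(positions):
        # Local sensing: only robots within the sensing radius are visible.
        neighbors = [
            (xj, yj)
            for j, (xj, yj) in enumerate(positions)
            if j != i and math.hypot(xj - xi, yj - yi) <= radius
        ]
        if not neighbors:
            new_positions.append((xi, yi))  # nothing sensed: stay in place
            continue
        cx = sum(x for x, _ in neighbors) / len(neighbors)
        cy = sum(y for _, y in neighbors) / len(neighbors)
        # Move a fraction of the way toward the neighbors' centroid.
        new_positions.append((xi + gain * (cx - xi), yi + gain * (cy - yi)))
    return new_positions


def spread(positions):
    """Largest pairwise distance in the swarm."""
    return max(
        math.hypot(xa - xb, ya - yb)
        for xa, ya in positions
        for xb, yb in positions
    )


def simulate(positions, steps, radius=5.0, gain=0.5):
    """Apply the local rule repeatedly and return the final positions."""
    for _ in range(steps):
        positions = step(positions, radius, gain)
    return positions
```

In a design automation setting, parameters such as `radius` and `gain`, or the structure of the rule itself, would be the decision variables explored by an evolutionary search, with a swarm-level measure such as `spread` serving as one objective evaluated in simulation.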

8 Conclusion

In this paper, we present a comprehensive survey of MODENA for designing the morphologies, controllers and vision systems of intelligent robots. Given the increasing complexity of working environments and the diversification of tasks, there is a growing need for MODENA to design the “Body-Brain-Eye” of intelligent robots. In the MODENA approach, the robot system’s morphology, controller, and vision systems can all be expressed as a graphical model. By automatically exploring the design space of the graphical model, a set of design candidates of intelligent robots that satisfy pre-defined functions and requirements can be obtained. The key components in MODENA include surrogate-assisted constrained multi-objective evolutionary algorithms (CMOEAs), topological search algorithms such as genetic programming, neural architecture search, and techniques for knowledge extraction during the design process, among others. MODENA is a core technology that can significantly improve the design efficiency and performance of robots, and it will become an increasingly important research theme in the future for designing either individual or swarm robots, just as EDA has played an important role in both academia and industry.