Abstract
Five-axis flank milling is prevalent in complex surface manufacturing, and it typically consumes a large amount of electrical energy. To save energy and improve energy efficiency, this paper proposes a tool path optimisation method for five-axis flank milling based on meta-reinforcement learning. Firstly, considering flank milling features, a feed angle is defined that guides tool spatial motion and identifies an ideal principal path. Then, machining energy consumption and time are modelled by tool path variables, i.e., feed angle, cutting strip width and path length. Secondly, an energy-efficient tool path dynamic optimisation model is constructed, which is then described by multiple Markov Decision Processes (MDPs). Thirdly, a framework integrating meta-learning with Soft Actor-Critic (MSAC) is utilised to address the MDPs. In an MDP with one principal path randomly generated by a feed angle, cutting strip width is dynamically optimised under a maximum scallop height limit to realise energy-efficient multi-expansions. By quick traversal of MDPs with various feed angles, MSAC enables an integrated energy-efficient path generation and expansion scheme. Experiments show that, regarding machining energy consumption and time, the proposed method achieves a reduction of 69.96% and 68.44% over end milling with an iso-scallop height, and of 41.50% and 39.80% over flank milling with an iso-scallop height, with a minimum amount of machining carbon emission, which highlights its contribution to the arena of energy-oriented and sustainable intelligent manufacturing.
Introduction
In the context of rising industrialisation and its significant impact on global carbon emissions and energy consumption, it is crucial to focus on improving energy efficiency in the manufacturing industry, a sector that accounts for 55% of global energy consumption (Khalilpourazari et al., 2021; Li et al., 2019). Therein, mechanical manufacturing represents 74.7% of the manufacturing sector’s energy use (He et al., 2020b), yet its energy efficiency hovers around just 30% (Cai et al., 2019). With wide distribution and considerable energy-saving potential, the mechanical manufacturing industry is urged to adopt sustainable practices. This effort is vital for advancing energy-focused intelligent manufacturing (Jia et al., 2014; Xu et al., 2021).
Reducing the machining energy consumption of machine tools is sensible to improve the machining process’s energy efficiency (Tong et al., 2020). According to Wan et al. (2023), milling machines comprise approximately 60% to 90% of CO2 equivalent emissions during operation. As an essential machining process, five-axis flank milling possesses exclusive capabilities in achieving high-precision machining of complex surfaces. Therefore, it is ubiquitous in industries like aerospace, automotive, and mould-making that require mass production of such shapes (Chu et al., 2020; Yi et al., 2018). Inevitably, five-axis flank milling contributes significantly to energy consumption in the machine tool’s operation (Lu et al., 2023). Therefore, developing energy-saving process planning for the flank milling is imperative. Compared with cutting parameters optimisation, tool path optimisation could reduce machining energy consumption without lowering surface quality and production efficiency (Edem et al., 2019). Hence, studying energy-efficient tool path optimisation of the flank milling is vital to achieve energy conservation. It further presents a pathway towards sustainability in the mechanical manufacturing industry.
Accurately estimating the machining energy consumed is fundamental for selecting the most energy-efficient tool paths (Jia et al., 2016). Current methods, however, overlook the impact of tool movements in their energy consumption models (Gao et al., 2022). Another key aspect is optimising tool path parameters to effectively generate and expand the principal path (Fountas et al., 2019). Presently, there are two types of approaches for achieving energy-efficient tool paths. The first seeks the shortest total path length using heuristic algorithms, typically with a constant scallop height for tool path expansion (Zou, 2021). Nevertheless, this height is set too conservatively for complex surfaces with strongly varying curvature, leading to a small cutting strip width and suboptimal energy consumption improvements (Liang et al., 2020). The second aims to optimise the principal path using maximum cutting strip width as a metric (Lu et al., 2019), but this can increase material removal energy consumption and damage surface quality. Therefore, it is pivotal to strike an optimal balance between cutting strip width, energy consumption and surface quality in energy-efficient tool path optimisation. Additionally, since flank milling generates spatial movements of the feeding and transmission systems (Song et al., 2022; Yoon et al., 2018), machining energy consumption tends to vary across principal paths along different feed angles. Unfortunately, there is a gap in the literature regarding the optimisation of feed angles to improve energy efficiency.
In response to these challenges, this paper proposes an energy-efficient tool path generation and expansion method. The motivation behind the method is to maximise energy efficiency while maintaining surface quality in five-axis flank milling of complex surfaces. To achieve this, the proposed method firstly defines a feed angle that guides the tool’s spatial motion, considering flank milling features. Then, a novel machining energy consumption model is built from tool path generation and expansion variables, including feed angle, cutting strip width and path length. Secondly, a tool path dynamic optimisation model is established to minimise machining energy consumption and time, containing a maximum scallop height constraint for ensuring machining quality. Thirdly, the model is formulated as multiple Markov Decision Processes (MDPs), which contain multi-expansion tasks based on different principal paths generated by various feed angles. After that, meta reinforcement learning (MRL) is applied to solve the MDPs to find the optimal tool path generation and expansion scheme. MRL is applied because it enables faster learning than traditional RL and does not require full retraining for a new task (Wang et al., 2020). By pre-learning task similarities, it efficiently solves MDPs (Li & Zhou, 2023). For this reason, model-agnostic meta-learning (MAML) (Mo et al., 2023) and Soft Actor-Critic (SAC) (Ping et al., 2023) are integrated into MSAC to solve the similar tasks driven by the feed angle.
Based on the above study, there are three pivotal contributions:
(1) Considering five-axis flank milling features, machining energy consumption is firstly modelled by the tool path parameters. The model is both material- and machine tool-agnostic, which includes spatial movements of tool path to aid operators in easily adjusting tool paths to achieve sustainability;
(2) MSAC is applied to accelerate MDPs traversal. It enables energy-efficient expansions following a principal path through dynamic optimisation of cutting strip width in each MDP. Subsequently, an optimal tool path generation and expansion scheme is identified by navigating MDPs across different feed angles;
(3) The proposed method outperforms current methods by significantly lowering machining energy consumption and time, as well as enhancing sustainability, evidenced in a case study. Its viability is further verified through surface quality assessments, establishing the method as a key reference for feasibility and sustainability in complex parts manufacturing.
This paper is structured as follows: section “Literature Review” surveys related studies, summarising the existing research gaps. Then, the studied problem is formulated and an optimisation model is provided. After that, an MSAC framework is presented. Subsequently, the proposed method is evaluated. Finally, conclusions, limitations and future work are outlined.
Literature review
This paper contributes to two critical research domains: (1) energy consumption modelling in tool path optimisation; (2) tool path optimisation methods related to energy efficiency. In this section, the literature on these two aspects is reviewed, followed by the identification of the research gaps.
Energy consumption modelling in tool path optimisation
The classical problem of energy-efficient tool path optimisation in numerical control machining involves optimising machining energy consumption through tool path strategies (Chu et al., 2020). Therefore, investigating the correlation between machining energy consumption and tool path is a precondition (Zhao et al., 2022). Table 1 provides an overview of the related works on energy consumption modelling.
From Table 1, Campatelli et al. (2015) discovered that the energy consumption rates of the motor vary with cutting path schemes. Luan et al. (2018) put forward an energy consumption function incorporating the impacts of tool path schemes, including zag, zigzag and spiral paths. Edem and Mativenga (2017) proposed an energy consumption calculation method related to the CNC tool path with NC code as input. Similar work is also found in Bhinge et al. (2016). Shin et al. (2017) built a model capable of forecasting the energy used at the tool path level for particular machining setups. Wang et al. (2019) proposed a cutting energy consumption model with different tool path schemes. He et al. (2020a) incorporated tool path schemes into the machining energy consumption modelling; yet, the method mainly applies to specific workpieces, and did not consider the tool path movement effects. Xu et al. (2016) defined an energy consumption model of the designated machine tool in the end milling process. Li et al. (2018) and Hu et al. (2018) both built an energy consumption function that related the tool path length to the machining time. Vishnu et al. (2020) proposed a two-stage energy model that relied on the NC codes to identify the path length.
In summary, most of the works are only applicable to traditional three-axis machining. Given that five-axis machining involves complex parts and long machining times, energy consumption modelling that considers five-axis machining features is necessary.
Energy-efficient tool path optimisation
Research on energy-efficient optimisation of tool paths for flank milling is currently lacking. In common NC machining practice, however, there are two mainstream approaches to energy-efficient or time-efficient path planning, both using iso-scallop height for path expansion. Table 2 provides an overview of the related works on energy-efficient tool path optimisation.
As shown in Table 2, the first type of optimisation focuses on generating the shortest tool path to boost energy efficiency by heuristic algorithms, e.g., Genetic Algorithm (Karuppusamy & Kang, 2017), Harmony Search Algorithm (Yi et al., 2018), Depth-first Search (Hu et al., 2018), Ant Colony Algorithm (Feng et al., 2022), NSGA-II (Zhou et al., 2020), Artificial Electric Field Algorithm (Sun et al., 2023). Li et al. (2018) optimised tool paths with minimum energy consumption during the milling process, where an improved GA was used to optimise the ways of cutter contact points. The advantages lie in achieving faster machining, consequently reducing overall energy consumption. However, it may overlook the impact of cutting strip width on energy consumption, potentially leading to suboptimal energy optimisation in certain scenarios.
The second aims to maximise the cutting strip width in tool path expansion to improve energy efficiency. Li and Tang (2021) realised a larger cutting strip width by choosing a section of the cutting edge with a smaller curvature of a non-spherical tool for freeform surface machining. Xu et al. (2016) achieved a reduction in milling energy consumption by enlarging the width of the cutting curves. Hu and Tang (2016) investigated the strip-width-maximisation study to enhance machining efficiency of flank milling. Liu et al. (2018) and Grandguillaume et al. (2021) optimised cutting strip width to enhance time efficiency. However, it is worth noting that blindly enlarging the cutting strip width may increase material removal energy consumption and damage the surface quality.
Research gaps
In summary, considerable research has been conducted on energy modelling for energy-efficient tool path optimisation and the optimisation process itself. However, studies in the realm of five-axis machining are notably lacking, with specific gaps including:
(1) Most works focused on three-axis machining and built a simple energy consumption model by just relating machining time with path length, failing to capture the impacts of tool path movements on machining energy consumption for five-axis machining, and thus unable to act as a foundation for five-axis tool path optimisation;
(2) Existing tool path optimisation typically aims for either tool path generation or expansion, using a constant iso-scallop height. Yet, these approaches fall short in global optimisation of tool paths, overlooking the impacts of variable cutting strip width and feed angle on overall machining energy efficiency and surface quality.
The aim of this paper is to narrow the gap between the requirements of energy-efficient five-axis machining and existing works. Specifically, energy consumption modelling that considers tool path movement, and a tool path generation and expansion optimisation, are presented. The method not only optimises the feed angle, but also adjusts the cutting strip width during each expansion. This work promises to enhance energy efficiency while preserving surface quality in five-axis machining of complex parts.
Problem formulation
This section introduces the preliminaries of tool path optimisation problem, and devises an energy-efficient tool path optimisation model for five-axis machining.
Preliminaries
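For modelling free-form surfaces, this paper applies Non-uniform rational B-splines (NURBS), which offer a precise uniform representation of parametric curves for free-form surfaces (Shi et al., 2024), as expressed in Eq. (1):
$$S(u,v) = \frac{\sum_a \sum_b N_{a,p} (u) N_{b,q} (v) \omega_{a,b} C_{a,b} }{\sum_a \sum_b N_{a,p} (u) N_{b,q} (v) \omega_{a,b} }$$(1)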
where u and v are parameters of NURBS model. \(\{ N_{a,p} (u)\}\) and \(\{ N_{b,q} (v)\}\) are basis functions in NURBS modelling. \(C_{a,b}\) are control points, which define the surface topology. \(\{ \omega_{a,b} \}\) are weighting factors, which adjust the dominance of each control point, and are tuned to follow the desired shape.
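As an illustration of how a surface point could be evaluated from these quantities, the following is a minimal sketch of the standard rational tensor-product form, using the Cox-de Boor recursion for the basis functions. The function names, the knot-vector arguments and the clamped-knot handling are assumptions made for this example; the paper does not specify an implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def basis(i: int, p: int, t: float, knots: Sequence[float]) -> float:
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(t)."""
    if p == 0:
        # Half-open knot spans, closed at the very end of the knot vector.
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        if t == knots[-1] and knots[i] < t <= knots[i + 1]:
            return 1.0
        return 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0.0:
        left = (t - knots[i]) / d1 * basis(i, p - 1, t, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0.0:
        right = (knots[i + p + 1] - t) / d2 * basis(i + 1, p - 1, t, knots)
    return left + right


def nurbs_point(
    u: float,
    v: float,
    ctrl: List[List[Point]],        # control points C_{a,b}
    weights: List[List[float]],     # weighting factors omega_{a,b}
    p: int,
    q: int,
    knots_u: Sequence[float],
    knots_v: Sequence[float],
) -> Point:
    """Evaluate S(u, v) as a weighted rational sum over all control points."""
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for a, row in enumerate(ctrl):
        n_u = basis(a, p, u, knots_u)
        if n_u == 0.0:
            continue
        for b, c in enumerate(row):
            w = n_u * basis(b, q, v, knots_v) * weights[a][b]
            den += w
            for axis in range(3):
                num[axis] += w * c[axis]
    if den == 0.0:
        raise ValueError("(u, v) lies outside the knot range")
    return (num[0] / den, num[1] / den, num[2] / den)
```

Raising the weight of one control point pulls the evaluated point towards it, which is the "dominance" role of \(\omega_{a,b}\) described above.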
Figure 1 shows the five-axis flank-milling process, where a Workpiece Coordinate System (WCS) \(o_w - x_w y_w z_w\) is established with the centre of the bottom of the workpiece as the origin. A local tool coordinate system (LCS) is established using three key directions: a feed direction \(f\), a normal direction \(n\) of the surface, and a product direction \(k = f \times n\) given by the right-hand screw rule.
In the Local Coordinate System (LCS) of a five-axis machine tool, tool orientation \(T_w (\alpha ,\beta )\) is characterised by a lead angle \(\alpha\) and a tilt angle \(\beta\), which is mathematically described as:
where \(\alpha\) and \(\beta\) are fixed in this study. A tool path \(L^c\) encompasses the trajectory traced by the cutter location (CL) curve along with the tool orientation at each CL point, where each point has one cutter contact (CC) point. Therefore, \(L^c\) is defined by Eq. (3):
where \({\mathop {C_{k - 1}^{T_{k - 1} } C_k^{T_k } }\limits^\cap }\) refers to the connections between adjacent CC points on \(L^c\), which include the net-cutting path \(L^{nc}\) and the air-cutting path \(L^{ac}\). k is the total number of points on \(L^c\).
For \(S(u,v)\) in the WCS, a plane \(\Phi\) passes through two points \(S({u / 2},0)\) and \(S({u / 2},v)\) on \(S(u,v)\) with its normal vector \(n_p\) perpendicular to a normal vector \(n_s\) of \(S(u,v)\) at \(p(u / 2,v / 2)\), and intersects the surface in a path curve, i.e., \(\Phi \cap S = L^{pr} (\lambda )\), where \(L^{pr} (\lambda )\) is a principal path. \(\lambda\) is defined as a feed angle between \(L^{pr} (\lambda )\) and \(n_p\), rotating around \(n_s\) along \(\Phi\), as shown in Fig. 2.
\(\lambda\) guides \(L^{pr} (\lambda )\) to change in the plane of \(S(u,v)\). In five-axis flank milling, the tool moves along each \(L^{pr} (\lambda )\) driven by various values of \(\lambda\) and forms a family of tool surfaces in space, generating varying energy consumption from moving the workpiece workbench. Therefore, the optimal energy-saving principal curve is identified by optimising \(\lambda\).
To fulfil \(S(u,v)\), multiple expansions based on \(L^{pr} (\lambda )\) are performed. Assuming \(p_1\) of the first expanded path \(L_1^{ex}\) is the neighbouring point of \(p_0\) on \(L^{pr} (\lambda )\), it is expressed as Eq. (4):
where Δu and Δv represent the incremental movements along the tool path on the surface from one point to the next (Lo, 1999). \(d_1\) is an increment, also known as a cutting strip width, between \(p_0\) and \(p_1\) in the parametric [u,v] domain, i.e., \(d_1 = |p_0 p_1 |\). Similarly, \(d_j = |p_{j - 1} p_j |\) is the increment in the jth expansion. The jth expanded path is \(L_j^{ex} = \{ {\mathop {C_{1,j}^{T_{1,j} } C_{2,j}^{T_{2,j} } }\limits^\cap } , \cdots ,{\mathop {C_{k - 1,j}^{T_{k - 1,j} } C_{k,j}^{T_{k,j} } }\limits^\cap } \}\). In this paper, a drum taper cutter is used. Each expansion is shown in Fig. 3.
As shown in Fig. 3, cutting edges AB and CD are involved in flank milling. \(R_b\) is the variable radius of curvature of the machined surface, while \(r_b\) is that of the cutting edge, which is constant when there is no tool change. According to the principle of scallop height, \(d\) is calculated as Eq. (5):
$$d = \sqrt {\frac{8R_b r_b h}{R_b \pm r_b }}$$(5)
where the symbols ‘+’ and ‘−’ are selected to represent convex and concave surfaces, respectively. For free-form surfaces, \(R_b\) changes frequently, and the maximum principal curvature is often used to restrict \(d\) in a conservative way due to the strict limitation of a constant \(h\). This increases the overall tool path length and the machining energy consumption. Considering this, this paper optimises \(d\) to expand the path with a variable \(h\) instead of a constant value, so as to explore a larger potential for energy-efficient machining. The expansion scheme is depicted in Fig. 4.
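To make the conservativeness argument concrete, the sketch below evaluates the scallop-height relation \(d = \sqrt{8R_b r_b h/(R_b \pm r_b)}\) (the same form used later for \(d_{\max}\)) at several local curvature radii and compares the result with a single worst-case width. The function names and the sample radii are assumptions chosen for illustration only.

```python
import math
from typing import List, Sequence


def strip_width(R_b: float, r_b: float, h: float, convex: bool = True) -> float:
    """Cutting strip width for scallop height h; '+' convex, '-' concave."""
    denom = R_b + r_b if convex else R_b - r_b
    if denom <= 0.0:
        raise ValueError("concave surfaces require R_b > r_b")
    return math.sqrt(8.0 * R_b * r_b * h / denom)


def local_widths(radii: Sequence[float], r_b: float, h: float,
                 convex: bool = True) -> List[float]:
    """Strip width evaluated separately at each local surface radius."""
    return [strip_width(R, r_b, h, convex) for R in radii]


def conservative_width(radii: Sequence[float], r_b: float, h: float,
                       convex: bool = True) -> float:
    """One width for the whole path: the most restrictive local value."""
    return min(local_widths(radii, r_b, h, convex))


if __name__ == "__main__":
    radii_mm = [50.0, 100.0, 400.0]   # hypothetical local radii along a path
    print(local_widths(radii_mm, r_b=10.0, h=0.01))
    print(conservative_width(radii_mm, r_b=10.0, h=0.01))
```

Wherever the local width exceeds the conservative one, a fixed-width expansion leaves strip width unused, which is the margin the variable-\(h\) expansion tries to recover.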
From Fig. 4, \(d_j \in \{ d_1 ,d_2 , \cdots ,d_i \}\) is the variable to expand from one point on the tool path to the corresponding point on the next path. For free-form surface parts, the change of \(d_j\) causes the length of the current expanded path \(L_j^{ex} \in \{ L_1^{ex} ,L_2^{ex} , \cdots ,L_i^{ex} \}\) to be different. Therefore, \(L_j^{ex}\) varies with \(d_j\). It is important to note that \(d_j\) varies with the region’s complexity. For example, for high-curvature regions, the optimisation range of \(d_j\) is dynamically diminished within \(h_{\max }\) to enhance the approximating precision of the current region’s shape, which is essentially a partition-based expansion type.
Problem description
This research aims to optimise tool path integration of five-axis flank milling, to minimise both machining energy consumption and time while preserving surface quality. The problem consists of two tasks: (i) quantifying the machining energy consumption by including tool path parameters considering the flank milling features; (ii) deriving a tool path generation and expansion integration scheme with the quantified model.
To effectively solve these two tasks, the problem is decomposed into multiple sub-optimisation problems, one for each feasible value of \(\lambda\). For the mth sub-problem, \(d_{m,j}\) is optimised to search for an ideal expansion scheme \(\{ (d_{m,1} ,L_{m,1}^{ex} (d_{m,1} )), \cdots ,(d_{m,i} ,L_{m,i}^{ex} (d_{m,i} ))\}\) following a generated path \(L^{pr} (\lambda_m )\). By successive iterations, the optimal path generation and path expansion are searched. Before the study, the assumptions and prerequisites are clarified:
(1) Cutting parameters, i.e., the feed rate \(f_r\) (mm/r), cutting depth \(a_p\) (mm) and cutting speed \(v_f\) (mm/min) are known before tool path optimisation;
(2) The machining energy consumption starts to be quantified after the cutting tool moves;
(3) There is no tool change during the machining process, and the tool orientation is fixed.
Tool path optimisation modelling
Objective variables
As stated above, \(\lambda\) is a variable to be optimised to select an optimal path \(L^{pr} (\lambda )\). \(d\) is another variable. As \(L^{ex} (d)\) is related to \(d\), it is obtained indirectly by optimising \(d\). For jth expansion in mth task driven by \(\lambda_m\), the optimisation variables are \(d_{m,j}\) and \(L_{m,j}^{ex} (d_{m,j} )\) to form an energy-efficient expansion scheme along a principal path \(L^{pr} (\lambda_m )\) for fulfilling the cutting area.
Optimisation functions
The machining energy consumption \(E_{total}^c\) as well as the machining time \(T_{total}^c\) are introduced to thoroughly assess both the economic and environmental dimensions of the flank milling process. \(E_m^c\) is generated by an expansion scheme based on \(L^{pr} (\lambda_m )\) for the mth task. \(E_{m,j}^c\) in the jth expansion is modelled as Eq. (6):
$$E_{m,j}^c = E_{m,j}^b + E_{m,j}^{nc} + E_{m,j}^{op} + E_{m,j}^{ac}$$(6)
where \(E_{m,j}^b\) stands for the electricity energy consumed to sustain basic functions, \(E_{m,j}^{nc}\) is the net-cutting energy consumption, \(E_{m,j}^{op}\) indicates energy consumption caused by the machine axes’ movement, and \(E_{m,j}^{ac}\) denotes energy consumption related to machine tools’ acceleration phase. Among them, \(E_{m,j}^b\) is calculated by Eq. (7):
where \(len(L_{m,j}^{ex} )\) is the length of \(L_{m,j}^{ex} (d_{m,j} )\), which is calculated in Eq. (8).
where k is the number of the points on \(L_{m,j}^{ex} (d_{m,j} )\). The coordinate of the points on \(L_{m,j}^{ex} (d_{m,j} )\) is obtained with a distance \(d_{m,j}\) from the corresponding points on the previous path by Eq. (4). Specifically, when j is 1, the points on \(L_{m,1}^{ex} (d_{m,1} )\) are derived from \(L^{pr} (\lambda_m )\) where the points on \(L^{pr} (\lambda_m )\) are discretised by a forward step \(d_f\) and each point is determined by NURBS modelling. It is obvious that \(d_{m,j}\) varies along the path, thus deciding the positioning (also the coordinates) of points on the next path. Meanwhile, different values of \(\lambda_m\) result in \(L^{pr} (\lambda_m )\) covering the cutting surface in various ways, thus leading to different lengths of \(L^{pr} (\lambda_m )\). Therefore, it inevitably affects the expanded path \(L_{m,j}^{ex} (d_{m,j} )\) along \(L^{pr} (\lambda_m )\) indirectly. \(E_{m,j}^{nc}\) is given by Eq. (9):
where \(T_{m,j}^{nc}\) refers to the time for removing materials in jth expansion. \(L_{m,j}^{nc}\) is the net-cutting path section of \(L_{m,j}^{ex} (d_{m,j} )\). \(P_{m,j}^{nc}\) is calculated approximately by an empirical model (Deng et al., 2017), which is affected by the inputs: three cutting parameters, including \(v_f\), \(f_r\), \(a_p\) and \(d_j\) in jth expansion. Since this study does not optimise \(v_f\), \(f_r\), \(a_p\), they are fixed constants in the tool path optimisation. \(K_{Fc}\) is a correction coefficient, and a, b, c as well as e are exponent coefficients. Linear regression is used to calibrate these parameters using the collected cutting power data. The application of the model assumes \(K_{Fc}\), a, b, c, and e are unchanged in specific processing conditions, i.e., workpiece and tool material, and machine tool specification. Moreover, the cutting process is assumed to remain stable without tool breakage or abnormal wear.
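The calibration step can be sketched as follows. The code assumes, for illustration, a multiplicative power law \(P^{nc} = K_{Fc}\, v_f^{a} f_r^{b} a_p^{c} d^{e}\), which becomes linear after taking logarithms and can then be fitted by ordinary least squares. This functional form, the function names and the sample layout are assumptions of this example; the exact empirical model is the one given in Deng et al. (2017).

```python
import math
from typing import List, Sequence, Tuple


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: every input must vary in the data")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_power_law(
    samples: Sequence[Tuple[float, float, float, float, float]]
) -> Tuple[float, float, float, float, float]:
    """Fit (K_Fc, a, b, c, e) from rows of (v_f, f_r, a_p, d, measured power).

    Assumed model: ln P = ln K_Fc + a ln v_f + b ln f_r + c ln a_p + e ln d.
    """
    if len(samples) < 5:
        raise ValueError("at least five samples are needed for five unknowns")
    rows = [[1.0] + [math.log(x) for x in s[:4]] for s in samples]
    y = [math.log(s[4]) for s in samples]
    n = 5
    # Normal equations (A^T A) beta = A^T y of the log-linear least-squares fit.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(n)]
    coef = solve_linear(ata, aty)
    return (math.exp(coef[0]), coef[1], coef[2], coef[3], coef[4])
```

Because \(v_f\), \(f_r\) and \(a_p\) are held constant during tool path optimisation, only the \(d\) term of such a fitted model changes from one expansion to the next.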
In the flank milling, the energy consumed by tool paths along the gravitational direction is denoted as potential energy consumption \(E^{pt}\) in this paper. The expanded paths along \(L^{pr} (\lambda )\) maintain a direction like \(\lambda\). This direction affects the projection length of the paths in the gravitational direction, thereby influencing the potential energy consumption. For jth expansion, \(E_{m,j}^{pt}\) is included in the modelling of \(E_{m,j}^{op}\), which is quantified by the gravitational movement of the machine tool’s axes, as shown in Eq. (10):
where \(m_z\) is the mass of z-axis. Notably, \(E_{m,j}^{pt}\) exists only when the arms move upwards. \(L_j^{\mathop G\limits^\to }\) is the projection of \(L_j^{ex} (d_j )\) at the direction of vertical upward movement, which is calculated by the Euclidean distance in Eq. (11):
where \(q_{k,j}^{\mathop G\limits^\to }\) is the component of kth point on \(L_{m,j}^{ex} (d_{m,j} )\) in the gravitational direction.
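The geometric quantities used so far can be sketched from a discretised path. The example below assumes that the path length is the sum of Euclidean distances between consecutive points, that the gravitational direction coincides with the z-axis of the point coordinates, and that the potential energy takes the elementary form \(m_z g L^{\vec{G}}\) counted over upward moves only. These are illustrative assumptions, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
G = 9.81  # gravitational acceleration in m/s^2


def path_length(points: Sequence[Point]) -> float:
    """Polyline length: sum of distances between consecutive path points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def upward_travel(points: Sequence[Point]) -> float:
    """Accumulated vertical travel, counting upward segments only."""
    return sum(max(q[2] - p[2], 0.0) for p, q in zip(points, points[1:]))


def potential_energy(points_mm: Sequence[Point], m_z_kg: float) -> float:
    """Energy in joules to lift a mass m_z along the upward parts of the path.

    Coordinates are taken in millimetres and converted to metres.
    """
    return m_z_kg * G * upward_travel(points_mm) / 1000.0


if __name__ == "__main__":
    path = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 2.0), (3.0, 4.0, 1.0)]
    print(path_length(path), upward_travel(path), potential_energy(path, 100.0))
```

Two paths of equal length can therefore differ in potential energy when their feed angles give them different vertical projections.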
Besides, \(E_{m,j}^{op}\) also includes the energy consumption for translation and rotation of the mechanical arms along \(L_{m,j}^{ex}\), i.e., \(E_{m,j}^{tr}\) and \(E_{m,j}^{rt}\). \(E_{m,j}^{tr}\) represents the energy consumption attributed to overcoming friction in the movement along axes. Similar to \(E_{m,j}^{pt}\), \(E_{m,j}^{tr}\) is also indirectly influenced by \(\lambda_m\) in terms of the projection length in the translation direction:
where \({\mathop {f_x }\limits^\to }\) and \({\mathop {f_y }\limits^\to }\) represent friction forces along the two respective axes. \(L_{m,j}^{{\mathop {T_x }\limits^\to } }\) and \(L_{m,j}^{{\mathop {T_y }\limits^\to } }\) are the projections of \(L_{m,j}^{ex}\) at the direction of translation, where the lengths are given by Eq. (13):
where \(q_k^{{\mathop {T_x }\limits^\to } }\) and \(q_k^{{\mathop {T_y }\limits^\to } }\) are components of kth point on \(L_{m,j}^{ex}\) in translation along x and y axes.
\(E_{m,j}^{rt}\) in Eq. (14) includes the rotational energy consumption of \(\alpha_r\) and \(\beta_r\) rotary axes:
where ω is the angular speed (rad/s) of the rotary axes, and I (kg·m²) is the moment of inertia of the rotary axes.
One flank milling cycle along \(L_{m,j}^{ex} (d_{m,j} )\) undergoes two accelerations and decelerations. The former consumes energy, while the latter relies solely on inertia without extra consumption. The two accelerations in one cycle are approximately calculated by Eq. (15):
where \(m_x\) and \(m_y\) are the masses of the x and y axes of the machine tool. \(v_x\), \(v_y\) and \(v_z\) are the components of \(v_f\) on the three axes of machine tool.
To sum up, \(E_{m,j}^c\) in jth expansion of mth task guided by \(L^{pr} (\lambda_m )\) is built by Eq. (16):
Generally, there are i expanded paths guided by a selected \(L^{pr} (\lambda_m )\) in the path planning, and the total machining energy consumption of i expansions in mth task is quantified by Eq. (17):
The machining time consumed by the jth expansion scheme and total scheme in mth task is shown in Eq. (18):
The effects of \(\lambda_m\), \(d_{m,j}\) and \(L_{m,j}^{ex} (d_{m,j} )\) on the machining energy consumption and machining time are included in Eqs. (17) and (18), facilitating the energy-efficient tool path optimisation of flank milling. \(\lambda_m\) does not appear explicitly in the equations because it only determines the direction of \(L^{pr} (\lambda_m )\). Owing to the topology of free-form surfaces, the expanded paths along \(L^{pr} (\lambda_m )\) only roughly follow the direction indicated by \(\lambda_m\). Therefore, \(\lambda_m\) affects \(E_m^c\) and \(T_m^c\) indirectly, through the lengths and projections of the expanded paths, rather than as an explicit term.
Constraints design
Several constraints are designed to ensure the proposed method’s feasibility:
Equation (19) denotes that the decision parameter \(\lambda_m\) of the mth task should be set as an integer and confined within a practical range, considering that it is selected in the plane \(S(u,v)\) and for ease of actual machining.
Equation (20) indicates that \(d_{m,j}\) exceeds neither the chord length \(l_{ch}\) nor the maximum cutting strip width \(d_{\max }\) permitted by the maximum scallop height, considering requirements for cutting tool interference and workpiece surface quality.
where \(l_{ch} = 2r_b \sin \frac{\theta_\partial }{2}\)(mm). \(\theta_\partial\)(°) is a central angle corresponding to \(l_{ch}\). \(d_{\max } = \sqrt {{\frac{{8R_b r_b h_{\max } }}{R_b \pm r_b }}}\) and \(h_{\max }\) is determined by the surface quality requirement. Calculation of \(R_b\) refers to Lo (1999).
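A feasibility check for a candidate strip width under these two bounds might look like the sketch below. It uses the expressions for \(l_{ch}\) and \(d_{\max}\) given above; the strictly positive lower bound and the function names are assumptions of this example, since only the upper bounds are stated in the text.

```python
import math


def chord_length(r_b: float, theta_deg: float) -> float:
    """l_ch = 2 r_b sin(theta / 2), with the central angle given in degrees."""
    return 2.0 * r_b * math.sin(math.radians(theta_deg) / 2.0)


def max_strip_width(R_b: float, r_b: float, h_max: float,
                    convex: bool = True) -> float:
    """d_max = sqrt(8 R_b r_b h_max / (R_b +/- r_b)); '+' for convex surfaces."""
    denom = R_b + r_b if convex else R_b - r_b
    if denom <= 0.0:
        raise ValueError("concave surfaces require R_b > r_b")
    return math.sqrt(8.0 * R_b * r_b * h_max / denom)


def strip_width_upper_bound(R_b: float, r_b: float, h_max: float,
                            theta_deg: float, convex: bool = True) -> float:
    """Tightest of the interference bound and the surface-quality bound."""
    return min(chord_length(r_b, theta_deg),
               max_strip_width(R_b, r_b, h_max, convex))


def is_feasible(d: float, R_b: float, r_b: float, h_max: float,
                theta_deg: float, convex: bool = True) -> bool:
    """True if 0 < d <= min(l_ch, d_max) at the local surface radius R_b."""
    return 0.0 < d <= strip_width_upper_bound(R_b, r_b, h_max, theta_deg, convex)
```

Since \(R_b\) varies over a free-form surface, the bound has to be re-evaluated at each expansion rather than computed once.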
Equation (21) denotes the cutting power for each expansion is regulated to be lower than the spindle motor’s output power for machining feasibility (Li et al., 2019):
where \(P_{\max }\)(W) denotes the motor’s rated power. \(P_{m,j}^c\) (W) is the cutting power in jth expansion. \(\eta^c\) is the motor efficiency.
Equation (22) requires that the sum of cutting time does not exceed the pre-set tool life (Chen et al., 2019).
where \(T_e\) (min) is the pre-set value, and \(T_{m,j}\) (min) is the cutting time consumed in jth tool path expansion of mth task.
Equation (23) denotes the sum of the cutting area per expansion for i times should constitute the specified cutting area (Zhang & Tang, 2019).
where \(A_m^c\) (mm2) is total cutting area of mth task, and \(A_{m,j}^c\) is the cutting area in jth path expansion.
Optimisation model
Based on the explanation, to find the optimal tool path generation and expansion integrated scheme for flank milling, an optimisation model is designed in Eq. (24):
where M is the total number of traversal tasks initiated by different values of \(\lambda\), and \(\lambda\) decides the generation of a principal path in flank milling. For the mth task, there is a principal path \(L^{pr} (\lambda_m )\) that needs to be expanded i times. In the jth expansion, \(d_{m,j}\) is dynamically optimised with a resulting \(L_{m,j}^{ex} (d_{m,j} )\). Through the traversal of M tasks, the optimal integrated scheme of path generation and expansion with an optimal \(\lambda^\ast\) is ultimately found by optimising \(F^*\). The model involves not only the local dynamic optimisation of each expansion task under a given value of \(\lambda\), but also a global optimisation that searches for the optimal integrated scheme among all tasks. To tackle such a complex multi-task optimisation problem, the MRL algorithm is introduced. By pre-learning the similarities between tasks, MRL can quickly adapt to new tasks, thereby accelerating the multi-task traversal and reducing the learning time required for each new task.
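The two-level structure of the model, an inner expansion optimisation per feed angle and an outer selection among all tasks, can be illustrated with a plain enumeration. This sketch is not the MSAC procedure: the callable `expand` is a hypothetical stand-in for whatever solves one task and returns its total energy and time, and the normalised weighted sum used for ranking is an assumption, since the exact form of \(F^*\) is given by Eq. (24).

```python
from typing import Callable, Dict, Iterable, Tuple

TaskResult = Tuple[float, float]  # (total energy E_m, total time T_m)


def best_feed_angle(
    angles: Iterable[int],
    expand: Callable[[int], TaskResult],
    w_energy: float = 0.5,
    w_time: float = 0.5,
) -> Tuple[int, TaskResult]:
    """Solve every task once and return the feed angle with the lowest score.

    Energy and time are divided by their maxima over all tasks so that the
    two terms are comparable before weighting (an illustrative choice).
    """
    results: Dict[int, TaskResult] = {lam: expand(lam) for lam in angles}
    if not results:
        raise ValueError("no feed angles supplied")
    e_max = max(e for e, _ in results.values())
    t_max = max(t for _, t in results.values())

    def score(lam: int) -> float:
        e, t = results[lam]
        return w_energy * e / e_max + w_time * t / t_max

    best = min(results, key=score)
    return best, results[best]


if __name__ == "__main__":
    # Toy inner solver: energy and time are both lowest at 30 degrees.
    def toy(lam: int) -> TaskResult:
        return (100.0 + (lam - 30) ** 2, 50.0 + abs(lam - 30))

    print(best_feed_angle(range(0, 91, 15), toy))
```

Enumeration solves each task from scratch, so its cost grows linearly with M; this is the cost that pre-learning task similarities is intended to reduce.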
Energy-efficient tool path optimisation framework
This section proposes an energy-efficient tool path optimisation framework based on MSAC, as seen in Fig. 5.
Figure 5 illustrates how MSAC addresses the traversal problem across MDPs, where varying feed angles initiate distinct learning tasks. This solving process encompasses three key stages:
Stage 1 Generation of a principal path, which is dictated by the defined feed angle \(\lambda\) on a NURBS modelled surface based on Eq. (1);
Stage 2 Following the principal path generated in stage 1, the tool path undergoes dynamic expansions by optimising cutting strip width. This is viewed as an MDP, which is solved by SAC;
Stage 3 This stage culminates in traversing multiple MDPs driven by all feasible feed angles. Meta learning is employed to enhance the SAC’s adaptability to various new tasks. By examining a spectrum of feed angles, the MSAC will find the optimal path generation and expansion scheme.
Tool path generation scheme
In stage 1, different principal paths are generated by traversing all values of \(\lambda\). Based on \(\Phi \cap S = L^{pr} (\lambda )\), a principal path \(L^{pr} (\lambda_m )\) is generated when selecting \(\lambda_m\). In each traversal, \(\lambda\) varies, changing the principal path and the initial state for tool path expansion, which means that a new MDP problem of path expansion optimisation arises. Multiple MDP problems therefore remain to be solved. The mth MDP problem with a selected \(\lambda_m\) is modelled in the following subsection.
Tool path expansion scheme
In stage 2, each task for learning an effective expansion policy is formulated as a new MDP containing < \(\mathcal{S}\), \(\mathcal{A}\), \(\mathcal{P}\), \(\mathcal{R}\)>. Naturally, the studied problem satisfies the MDP’s property, and the SAC algorithm is applied.
(1) Environment modelling. The models of machining energy consumption and time in Eqs. (16) and (18) are used in each timestep t to assist the agent in learning the optimal expansion policy for an MDP.
(2) MDP formulation. The tool path expansion problem along one principal path selected by a value of \(\lambda\) is formulated as a finite MDP problem. The mth MDP with \(\lambda_m\) is defined as below.
State \(\mathcal{S}\) In the mth MDP, \(s_{m,\,\,t}\) represents the expansion state at timestep t, so that the agent can better select path parameters each time. It therefore includes: (i) the pre-set machining configuration (MC), including cutting parameters and cutter geometry parameters; and (ii) key tool path variables, i.e., \(\lambda_m\), the residual cutting area \(A_{m,\,\,t}^{rc} = A_m^c - \sum_t {A_{m,\,\,t}^c }\), the accumulated machining energy consumption \(E_{m,\,\,t}^{acu}\) and time \(T_{m,\,\,t}^{acu}\) at the tth timestep. It is important to note that \(\lambda_m\) is included as a state component rather than as an action, since it specifies the initial conditions for the tool path expansion and is not modified during the execution of an MDP task. Therefore, \(s_{m,t} = \{ MC,\lambda_m ,A_{m,\,t}^{rc} ,E_{m,\,t}^{acu} ,T_{m,\,t}^{acu} \}\).
Action \(\mathcal{A}\) \(a_{m,\,t} = \{ d_{m,\,t} ,L_{m,t}^{ex} (d_{m,\,t} )\}\) characterises the action selected at the tth timestep in the mth task, which is continuous and bounded.
Reward \(\mathcal{R}\) \(r_{m,\,t}\) combines the objectives and constraints at the tth expansion, and is expressed as Eq. (25):
$$\begin{aligned} & r_{m,\,t} (s_{m,\,t} ,a_{m,\,t} ) \\ &\quad= f_{nor} (\xi - E_{m,\,t}^c - T_{m,\,t}^c - \nu \sum {C_{m,t} } ) \end{aligned}$$ (25)
where \(f_{nor} ( \cdot )\) normalises objectives and constraints with varying units into a standardised range of [-1, 1]. \(\xi\) is a constant that encourages the agent to get positive feedback. \(\nu\) is the penalty coefficient, which is set to 200 to discourage the agent from violating the constraints \(C_{m,t}\) in Eq. (26):
$$ \begin{gathered} C_{m,t} = \max \{ (\sum_{t = 1}^{t_o } {(A_{m,\,t}^c } { - }A_m^c )),0\} \hfill\\ \quad+ \max \{ (P_{m,t}^c { - }\eta^c P_{\max } ),0\} + \max \{ (T_e { - }\sum_{t = 1}^{t_o } {T_{m,t}^c } ),0\} \hfill \\ \quad +\max \{ (d_{m,\,t} { - }l_{ch} ),0\} + \max \{ (d_{m,\,t} - d_{\max } ),0\} \hfill \\ \end{gathered} $$ (26)
where \(t_o\) represents the final expansion. The final aim is to discover a favourable policy that maximises cumulative rewards. Notably, \(r_{m,\,\,t}\) is bounded in this paper, which provides necessary conditions for convergence of the algorithm (Li et al., 2023).
(3) SAC algorithm. SAC addresses each MDP by maximising both the expected reward and the policy entropy on an actor-critic architecture. The description, network architecture and training details of SAC are given in Appendix A.
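The reward in Eq. (25) and the hinge-style constraint term in Eq. (26) can be illustrated with a minimal sketch. Only the penalty coefficient of 200 comes from the text; the value of \(\xi\), the scaling constant, the numerical inputs and the use of tanh as a stand-in for \(f_{nor}\) are assumptions made for illustration, since the exact normalisation is not given in this section.

```python
import math

NU = 200.0  # penalty coefficient; the value 200 is stated in the text


def constraint_penalty(signed_terms):
    """Sum of max{g, 0} over signed constraint expressions g, as in Eq. (26)."""
    return sum(max(g, 0.0) for g in signed_terms)


def reward(energy, time, signed_terms, xi=1.0, scale=100.0):
    """Reward in the shape of Eq. (25).

    xi and scale are hypothetical; tanh is an assumed stand-in for f_nor,
    chosen only because it maps any real number into [-1, 1].
    """
    raw = xi - energy - time - NU * constraint_penalty(signed_terms)
    return math.tanh(raw / scale)


# Hypothetical expansion step: strip width below its limit, power above its limit.
d, d_max = 6.0, 8.0              # mm, illustrative values
power, power_limit = 7.0, 6.75   # kW, illustrative values
terms = [d - d_max, power - power_limit]

r_violating = reward(energy=20.0, time=10.0, signed_terms=terms)
r_feasible = reward(energy=20.0, time=10.0, signed_terms=[d - d_max, -0.5])
```

With identical energy and time, the step that exceeds the power limit receives the lower reward, which is the behaviour the penalty term is meant to induce.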
Tool path integrated optimisation based on MSAC
This subsection introduces an MSAC algorithm for training a model that adapts rapidly to new tasks. The principle of MSAC is to feed results from various environments into the training model, so that the sensitivity of the new-task loss function to the policy parameters is maximised (Wen et al., 2021).
(1) Meta task set. To facilitate training, tasks are selected from workpieces of different surface topologies, where each task begins from a randomly selected principal path. The goal is to find a globally optimal tool path generation and expansion scheme. For each task, the meta-features are defined by the weight factors \(\{ \omega_{a,b} \}\) and \(\lambda\). Therein, a training task set \(T_N\), a validation task set \(T_V\), and a learning task set \(T_M\) are mutually different but follow the same distribution \(U\{ \lambda ,\{ \omega_{a,b} \} \}\), where \(\lambda \sim U(0^\circ ,180^\circ )\), and \(\{ \omega_{a,b} \} \sim U(0, + \infty )\).
(2) Meta-training phase. The aim is to optimise the parameters of SAC in each task by MAML via individual and global training stages. As the network structures and parameter training of SAC for each MDP task are defined in Sect. 4.2, the related formulas are directly applied here. For a specific task \(T_n (n = 1, \cdots ,N)\), K trajectories are collected in a dataset \(D_n^{tr}\), where the training data include the input meta-features and the corresponding tool path integrated optimisation results. Each trajectory contains H time steps. The meta parameters \(\theta_n^{\prime}\), \(\vartheta_n^{\prime}\) and \(\varphi_n^{\prime}\) are updated by the gradient descent method, which is an individual-level update as shown in Eq. (27):
$$ \left\{ \begin{gathered} \theta _n^{'(t_n + 1)} \leftarrow \theta _n^{'(t_n )} - \psi _{in} \nabla _{\theta _n^{'(t_n )} } L_{T_n } (\theta _n^{'(t_n )} ) \hfill \\ \vartheta _n^{'(t_n + 1)} \leftarrow \vartheta _n^{'(t_n )} - \varsigma _{in} \nabla _{\vartheta _n^{'(t_n )} } L_{T_n } (\vartheta _n^{'(t_n )} ) \hfill \\ \varphi _n^{'(t_n + 1)} \leftarrow \varphi _n^{'(t_n )} - \varpi _{in} \nabla _{\varphi _n^{'(t_n )} } L_{T_n } (\varphi _n^{'(t_n )} ) \hfill \\ \end{gathered} \right. $$ (27)
where \(\psi_{in}\), \(\varsigma_{in}\) and \(\varpi_{in}\) denote the learning rates of the individual-level updating of the MSAC networks, and \(t_n\) denotes the last iteration during the updating. \(\theta_n^{{\prime}{(t_n )}} = \theta\), \(\vartheta_n^{{\prime}{(t_n )}} = \vartheta\), and \(\varphi_n^{^{\prime}(t_n )} = \varphi\) in the first iteration. \(L_{T_n } ( \cdot )\) is the loss function defined in the SAC. After the tasks finish updating their network parameters, the N tasks with their datasets are stored in the buffer \(D^{tr} \doteq D_{1:N}^{tr}\). Then, outer-loop training is performed to find the global parameters. The optimal ability is found by estimating the aggregation loss. Task samples are used for the updating process in Eq. (28):
$$ \left\{ \begin{gathered} {\mathop \theta \limits^\wedge }^{(t_n + 1)} \leftarrow {\mathop \theta \limits^\wedge }^{(t_n )} - \psi_{ou} \nabla_{\mathop \theta \limits^\wedge } \sum_{n = 1}^N {L_{T_n } (\theta_n^{\prime} )} \hfill \\ {\mathop \vartheta \limits^\wedge }^{(t_n + 1)} \leftarrow {\mathop \vartheta \limits^\wedge }^{(t_n )} - \varsigma_{ou} \nabla_{\mathop \vartheta \limits^\wedge } \sum_{n = 1}^N {L_{T_n } (\vartheta_n^{\prime} )} \hfill \\ {\mathop \varphi \limits^\wedge }^{(t_n + 1)} \leftarrow {\mathop \varphi \limits^\wedge }^{(t_n )} - \varpi_{ou} \nabla_{\mathop \varphi \limits^\wedge } \sum_{n = 1}^N {L_{T_n } (\varphi_n^{\prime} )} \hfill \\ \end{gathered} \right. $$ (28)
where \({\mathop \theta \limits^\wedge }\), \({\mathop \vartheta \limits^\wedge }\) and \({\mathop \varphi \limits^\wedge }\) are the meta policy parameters trained over all training tasks. \(\psi_{ou}\), \(\varsigma_{ou}\) and \(\varpi_{ou}\) are the learning rates of the outer-loop training. After meta-training, \(T_V\) is employed to tune the hyper-parameters of the model.
(3) Meta-learning phase. In this stage, the meta-model is used to quickly adapt the trained meta-features to new tasks through a few descent steps. The network is updated by Eq. (29):
$$ \left\{ \begin{gathered} \theta^{\prime} \leftarrow \theta^{\prime} - \psi \nabla_{\theta^{\prime} } L(\theta^{\prime} ) \hfill \\ \vartheta^{\prime} \leftarrow \vartheta^{\prime} - \varsigma \nabla_{\vartheta^{\prime} } L(\vartheta^{\prime} ) \hfill \\ \varphi^{\prime} \leftarrow \varphi^{\prime} - \varpi \nabla_{\varphi^{\prime} } L(\varphi^{\prime} ) \hfill \\ \end{gathered} \right. $$ (29)
where \(\theta^{\prime}\), \(\vartheta^{\prime}\) and \(\varphi^{\prime}\) are meta policy parameters initialised by the trained parameters \({\mathop \theta \limits^\wedge }\), \({\mathop \vartheta \limits^\wedge }\) and \({\mathop \varphi \limits^\wedge }\). The experiences are stored in the buffer \(D^{ml} = (s_t^{ml} ,a_t^{ml} ,r_t^{ml} ,s_{t + 1}^{ml} )\).
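The three update rules in Eqs. (27)–(29) follow the usual inner-loop, outer-loop and adaptation pattern of MAML. The sketch below reproduces that pattern on a deliberately small toy problem: a single scalar parameter and a quadratic task loss replace the three SAC networks and their losses, and the task optima and learning rates are hypothetical. It is meant to show the structure of the updates, not the implementation used in the paper.

```python
def grad(theta, c):
    """Gradient of the toy task loss L(theta) = 0.5 * (theta - c)**2."""
    return theta - c


def inner_update(theta, c, lr_in):
    """One individual-level step, mirroring the form of Eq. (27)."""
    return theta - lr_in * grad(theta, c)


def meta_train(task_centres, theta=0.0, lr_in=0.1, lr_out=0.05, iters=200):
    """Outer loop in the form of Eq. (28): descend the summed post-update loss."""
    for _ in range(iters):
        meta_grad = 0.0
        for c in task_centres:
            adapted = inner_update(theta, c, lr_in)
            # Chain rule through the inner step: d(adapted)/d(theta) = 1 - lr_in.
            meta_grad += (1.0 - lr_in) * grad(adapted, c)
        theta -= lr_out * meta_grad
    return theta


def adapt(theta, c, lr, steps):
    """A few descent steps on a new task, in the form of Eq. (29)."""
    for _ in range(steps):
        theta -= lr * grad(theta, c)
    return theta


train_centres = [80.0, 95.0, 100.0, 110.0]  # hypothetical task optima
theta_meta = meta_train(train_centres)

new_task = 97.0  # hypothetical unseen task
from_meta = adapt(theta_meta, new_task, lr=0.1, steps=5)
from_scratch = adapt(0.0, new_task, lr=0.1, steps=5)
```

For this toy loss the meta-trained initialisation settles at the mean of the task optima, so five adaptation steps from it end much closer to the new task's optimum than five steps from an uninformed start. This is the same qualitative effect the meta-learning phase relies on.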
The detailed implementation of MSAC for multi-MDP tasks traversal and its convergence analysis are given in Appendix B. Please note that the analysis is under certain conditions, such as the use of an appropriate learning rate. Therefore, theoretical analysis alone is not enough to ensure the reliability of these theoretical findings. To this end, MSAC is tested in practical scenarios in Sect. “Case study” to confirm its convergence and effectiveness.
Case study
This part conducts a five-axis flank milling experiment and employs the proposed approach. The energy consumption model is initially validated to ensure reliability. Subsequently, the effectiveness of the MSAC algorithm is evaluated, and a performance comparison with other methods is presented.
Accuracy analysis of the energy consumption model
Experimental setup and data collection
The accuracy of the proposed energy consumption model is verified first, so that it provides a reliable simulated environment for MSAC implementation. Typical freeform parts, i.e., integral impeller blades made of aluminium alloy 6061 (a typical Al–Mg–Si alloy), are chosen as the objects, and physical machining with a drum-taper milling cutter is conducted on a JDGR400_A13S machine tool, as depicted in Fig. 6a. The consumption data generated by 12 tool paths are collected. The paths form three groups of four, swept along one blade surface at \(\lambda\) = 30°, 90°, and 120°, respectively, and the four paths at the same angle differ in length; \(\lambda\) = 90° is exemplified in Fig. 6b. Each path is executed three times. Moreover, the cutting strip width within each single path is kept constant to ensure uniform experimental conditions and to simplify the validation. An impeller model is imported into NX to obtain the coordinates of points on each path, from which the actual path length is calculated, as illustrated in Fig. 6c. Figure 6d shows the HIOKI PW3360-30 power meter. An orthogonal experiment \(L_{16}(4^4)\) is arranged in Table 3 to calibrate the coefficients in Eq. (9). The sampled cutting power data are given in Table 4. The coefficients’ values are then obtained by linear regression and listed in Table 5. To ease the regression, Eq. (9) is logarithmically transformed.
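The calibration step (logarithmic transformation followed by linear regression) can be sketched for a single factor. Eq. (9) is not reproduced in this section, so the multiplicative power-law form \(P = k x^{a}\), the factor levels and the coefficient values below are assumptions chosen only to show why taking logarithms turns the fit into ordinary least squares.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = k * x**a by least squares on ln(y) = ln(k) + a * ln(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    exponent = sxy / sxx
    coeff = math.exp(mean_y - exponent * mean_x)
    return coeff, exponent


# Synthetic, noise-free data generated from hypothetical values k = 2.0, a = 1.5.
levels = [0.2, 0.4, 0.6, 0.8]
power = [2.0 * x ** 1.5 for x in levels]

k_hat, a_hat = fit_power_law(levels, power)
```

With several cutting parameters, the same idea applies with one regression coefficient per log-transformed factor; the single-factor case is used here only to keep the sketch short.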
Testing of the machining energy consumption model
Importantly, 12 single tool-path-expansion experiments are conducted to verify \(E_j^c\), rather than \(E_{total}^c\), so that the energy model’s accuracy can be visualised. The averaged predicted values are listed in Table 6, where they are compared with the actual values and the absolute relative errors (ARE) are calculated.
From Table 6, the predicted values closely match the measured ones, with the maximum ARE being 7.169%. This confirms the robust predictive capability of the machining energy consumption model. Furthermore, it validates that the proposed energy consumption model is capable of providing reliable model support for optimising the tool path scheme. Meanwhile, it is observed that \(E_j^c\) varies greatly across different values of \(\lambda\). Besides, \(E_j^c\) also differs among paths generated at the same angle, which is directly related to \(d_j\) and \(L_j^{ex} (d_j )\), verifying the significance of improving energy efficiency by optimising \(d_j\) and \(\lambda\).
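For reference, the ARE used in this comparison can be computed as below, assuming the standard definition \(|E_{pred} - E_{meas}| / |E_{meas}| \times 100\%\). The numbers are hypothetical placeholders and are not the entries of Table 6.

```python
def absolute_relative_error(predicted, measured):
    """ARE in percent: |predicted - measured| / |measured| * 100."""
    return abs(predicted - measured) / abs(measured) * 100.0


# Hypothetical predicted and measured energy values (arbitrary units).
predicted = [103.0, 48.0, 210.0]
measured = [100.0, 50.0, 200.0]

ares = [absolute_relative_error(p, m) for p, m in zip(predicted, measured)]
worst_are = max(ares)  # the quantity reported as the maximum ARE
```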
Evaluation of the MSAC-based tool path optimisation
Algorithm parameters setting
The MSAC algorithm is applied to five-axis flank milling of free-form surfaces at the semi-finishing stage to realise energy-efficient tool path planning. All tasks apply cubic NURBS curves and share the common parameters in Eq. (1) except for the weight factors \(\{ \omega_{a,b} \}\), a key parameter influencing the surface topology. 300 tasks of \(T_N\) and 30 tasks of \(T_V\) are separately generated from the distribution \(U\{ \lambda ,\{ \omega_{a,b} \} \}\). \(h_{\max }\) is set to 0.02 mm considering the surface roughness requirement. \(P_{\max }\) is 7.5 kW, \(a_p\) is 0.5 mm, \(n_s\) is 2800 r/min, \(v_f\) is 500 mm/min, and \(T_e\) is 75 min. The algorithm's hyperparameters are detailed in Table 7. The algorithm is implemented in Python 3.6 on a computer with an Intel® Core™ i5-9400 CPU @ 2.90 GHz and 8 GB RAM.
Implementation of the MSAC algorithm
The environment-mimic model obtains rewards according to Eq. (25). Figure 7 illustrates the variations in cumulative rewards throughout the training process. The curve represents the average over 3 random seeds, where the classic SAC acts as a benchmark.
As seen in Fig. 7, both the MSAC and SAC agents exhibit an increase in the reward as they gather further knowledge through iterations, confirming the appropriateness of the defined state space and reward function. Meanwhile, MSAC achieves a higher average reward than SAC when it converges at about 500 episodes. This provides evidence for its strong convergence properties, largely due to its enhanced adaptability to new tasks, which allows for more efficient exploration of the state space.
To test the MSAC’s adaptability to new tasks, a specific case study of the widely used impeller blades made of aluminium alloy 6061 is introduced. Meta-policy parameters are initialised by the trained global network parameters to perform the new tasks. \(\{ \omega_{a,b} \}\) are set differently from those in the training tasks but follow the same distribution, where \(\{ \omega_{a,b} \}\) = [3, 4, 5, 6, 100, 10, 9, 50]. The number of testing tasks is 181; they share the same NURBS-modelled surface topology of blades but have different traversing values of the feed angle from 0° to 180°. Cutting parameters are the same as those of the meta-training process. The reward curve is plotted in Fig. 8.
Figure 8 shows that the trained MSAC requires approximately 40 episodes of further training to converge for a new task, which is far earlier than the trained SAC. Meanwhile, compared with Fig. 7, the test process of the MSAC algorithm needs roughly 92% fewer episodes to reach convergence than the training process (40 vs. 500). This demonstrates that MSAC swiftly adjusts to novel tasks and converges faster than SAC. Therefore, the learning curve verifies that meta-learning helps the agent adapt to the new environment, making it suitable for the multi-task traversing optimisation problem in this paper.
After all the learning tasks are traversed by MSAC, the optimal principal path with its expansion scheme is identified with \(\lambda^\ast\). To display the optimal results across the range of feed angles from 0° to 180°, a polar plot is drawn in Fig. 9, in which the traditional SAC is included for comparison. For ease of observation, an average weighted normalisation of the optimal results of machining energy consumption and time is performed for all 181 scenarios.
Fig. 9 depicts the performance results across all MDP tasks with different \(\lambda\) values. The radial distance from the centre to the red/blue curve represents the optimal performance of the searched tool path integrated scheme under each value of feed angle. A shorter radial distance indicates a better performance. For this reason, the values of \(\lambda^\ast\) are easily identified for the two methods, which are 107° for the SAC and 95° for the proposed MSAC algorithm, respectively.
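The selection of \(\lambda^\ast\) from the 181 per-angle results can be sketched as a normalise, weight and take-the-minimum step. The exact normalisation and weights behind Fig. 9 are not stated in this section, so min-max scaling with equal weights is an assumption, and the energy and time curves below are synthetic: they are constructed with their minimum at 95° purely so that the toy output echoes the reported value.

```python
def min_max(values):
    """Scale a list to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def best_feed_angle(angles, energy, time, w_energy=0.5, w_time=0.5):
    """Return the angle with the smallest weighted normalised score."""
    e_norm, t_norm = min_max(energy), min_max(time)
    scores = [w_energy * e + w_time * t for e, t in zip(e_norm, t_norm)]
    best = min(range(len(angles)), key=lambda i: scores[i])
    return angles[best], scores


# Synthetic per-angle optima for 0..180 degrees (hypothetical, not measured data).
angles = list(range(0, 181))
energy = [100.0 + 0.010 * (a - 95) ** 2 for a in angles]
time = [50.0 + 0.005 * (a - 95) ** 2 for a in angles]

lambda_star, scores = best_feed_angle(angles, energy, time)
```

The `scores` list corresponds to the radial distances in a polar plot of this kind: the smaller the score, the better the integrated scheme found for that feed angle.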
Performance results of the optimal tool path integrated schemes along \(\lambda^\ast\) are presented for the two methods. To reduce the impact of random errors and enhance the reliability of the experimental results, each scheme is replicated three times, and the average machining energy consumption, time and path length are computed. Additionally, driven by carbon taxes and policies, manufacturers should assess the carbon emission impacts of manufacturing activities (Camarinha-Matos et al., 2024). According to Zhou et al. (2019), there are three sources of total carbon emissions \(CE_{total}^c\) during machining: the electricity energy carbon emissions \(CE_{energy}^c\), the materials carbon emissions \(CE_{material}^c\) and the waste carbon emissions \(CE_{waste}^c\). They are evaluated to enrich the proposed method’s robustness and relevance in today’s eco-conscious landscape. \(CE_{total}^c\) is calculated in Eq. (30):
$$ CE_{total}^c = CE_{energy}^c + CE_{material}^c + CE_{waste}^c $$ (30)
where \(CE_{energy}^c\) is calculated in Eq. (31).
where \(EF_e\) is the electricity energy carbon emissions factor, and its value is 0.7242 kgCO2/kWh (Jiang et al., 2019). \(CE_{material}^c\) is calculated in Eq. (32):
where \(CE_{m - t}^c\) and \(CE_{m - w}^c\) denote the carbon emissions generated by the cutting tool and the workpiece involved in the flank milling. \(EF_{m - t}\) and \(EF_{m - w}\) are the production carbon emission factors of the tool and the workpiece, which are 33.7478 kgCO2/kg and 16.13 kgCO2/kg (Li et al., 2015). \(m_{tool}\) is the tool mass, which is 0.022 kg. \(\Delta V\) is the volume of material removed, which is 4.65 cm³. \(\rho_w\) is the density of aluminium alloy 6061, which is 2.7 g/cm³. \(CE_{waste}^c\) is given in Eq. (33):
where \(EF_{w - t}\) and \(EF_{w - c}\) are the carbon emission factors for post-processing the scrap tool and chips, which are 0.01346 kgCO2/kg and 0.256 kgCO2/kg (Li et al., 2015). Based on Eqs. (30)–(33), \(CE_{total}^c\) is quantified in Table 8, along with other optimisation results.
It is found that the proposed method, with the shortest path length (774.35 mm), outperforms the SAC (1061.26 mm). Similar trends are observed in \(E_{total}^c\) (a reduction of 24.89%) and \(T_{total}^c\) (a reduction of 27.04%). The performance is further confirmed by the carbon emissions evaluation, where the proposed method results in predicted emissions of 255.07 g, compared to 273.92 g for the SAC. The findings highlight the consistency between the proposed method’s theoretical contributions and practical effects, and demonstrate the necessity of using meta-reinforcement learning to solve complex multi-task models. Meanwhile, the absolute relative errors in machining energy consumption for the SAC and the proposed scheme are 7.24% and 5.17%, respectively, confirming the reliability of the energy consumption model.
Validation of workpiece surface quality
Evaluating the workpiece surface quality is another vital facet of validating the efficacy of the MSAC-based tool path strategy. Workpiece surface roughness is taken as the metric and is measured with a Mitutoyo SJ210 tester with an accuracy of 0.002 μm. To ensure measurement accuracy, surface roughness data are gathered from 11 uniformly chosen points, with three measurements at every point. The measurement setup is presented in Fig. 10a, and the measured results of the machined part are shown in Fig. 10b.
It is observed from Fig. 10b that the worst result of the points does not exceed 1 μm, which aligns with the specified requirements of 1.6 μm for the semi-finishing process of free-form blades (Lindvall et al., 2021). Thus, the tool path optimal scheme proposed by this paper substantially enhances machining energy efficiency while upholding surface quality in five-axis flank milling.
Comparison with the state-of-the-art methods
A comparative test is conducted to further elucidate the proposed method’s benefits. Two widely used methods are introduced as benchmarks: (a) the end milling method with an iso-scallop height using ball-nosed cutters (Liang et al., 2021), and (b) the flank milling method with an iso-scallop height using drum-type cutters (Lu et al., 2022). For the two benchmarks, both u and v surface patch borders are used as the principal tool path. Machining parameters are set as in meta-learning. Figure 11 displays the CL curves of the three methods.
From Fig. 11a–c, unlike the benchmarks with a constant scallop height, the proposed method selects a principal path with an optimal feed angle and varies the cutting strip width dynamically to achieve energy-efficient machining. Figure 11d–f show the tool paths of the three methods in actual processing.
To observe the performance of the three methods precisely, the optimisation results are shown in Fig. 12, where the benchmarks are also evaluated three times and the average results are selected to ensure reliability and replicability.
From Fig. 12, the proposed method has the shortest tool path length (774.3 mm), followed by the flank milling (1286.3 mm) and then the end milling (2453.6 mm). Moreover, the proposed method substantially reduces both machining energy consumption and machining time, achieving a reduction of 69.96% in \(E_{total}^c\) and 68.44% in \(T_{total}^c\) compared to the end milling, and a reduction of 41.50% in \(E_{total}^c\) and 39.80% in \(T_{total}^c\) compared to the flank milling. Meanwhile, the proposed method results in the minimum cutting carbon emissions (255.07 gCO2), which further accentuates its ecological benefits. It is also worth noting that flank milling consumes less machining energy and time than end milling, indicating its advantages in energy-efficient and sustainable machining. With the optimisation of cutting strip width and feed angle, the proposed method further reinforces the potential of enhancing sustainability in manufacturing practices.
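The path lengths quoted above can be converted into relative reductions with a one-line formula, as in the short sketch below. Applied to the three reported lengths, it yields 68.44% and 39.80%, which coincide with the reported reductions in \(T_{total}^c\). That is what one would expect if machining time scales with path length at a fixed feed rate, but this is an observation from the published numbers and not a description of how the values in Fig. 12 were obtained.

```python
def relative_reduction(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline * 100.0


# Tool path lengths in mm, as quoted from Fig. 12.
proposed_len = 774.3
flank_milling_len = 1286.3
end_milling_len = 2453.6

vs_end = relative_reduction(end_milling_len, proposed_len)
vs_flank = relative_reduction(flank_milling_len, proposed_len)

print(f"{vs_end:.2f} {vs_flank:.2f}")  # prints "68.44 39.80"
```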
Discussion
This study validates its contributions in two key aspects: (1) energy consumption modelling for energy-efficient tool path optimisation and (2) the suggested tool path optimisation method. To demonstrate the contributions, this work firstly assesses the accuracy of the developed energy consumption model. The lowest value 2.739% and the highest value 7.169% of ARE in Table 6 show that the proposed model estimates \(E_{total}^c\) precisely. Moreover, the analysis confirms a strong connection between feed angle, cutting strip width, and machining energy consumption, highlighting the need for feed angle and cutting strip width optimisation in energy-efficient machining.
Secondly, this study evaluates the optimisation performance of MSAC. The optimisation results are first compared with those of the SAC without meta-learning regarding the adaptability to new tasks. Table 8 shows that the average experimental results of the proposed method achieve higher energy-saving (24.89%) and time-saving (27.04%) potentials than those of SAC. The proposed method also produces lower machining carbon dioxide emissions, amounting to only 255.07 g CO2. Besides, the feasibility of the proposed method is verified by surface roughness measurement, which ensures that practical considerations are consistent with the theoretical contributions, showing that the proposed method is both sustainable and feasible.
The paper also includes a comparative analysis with two commonly used machining methods: end milling and flank milling, both based on the iso-scallop height principle. The practical application of these methods is shown in Fig. 11, with insights from Fig. 12 providing evidence that: (1) end milling is less effective in improving energy efficiency due to its narrower cutting strip width, unlike flank milling, which is more suitable for free-form surfaces; (2) flank milling with an iso-scallop height causes a conservative tool path generation and expansion, resulting in a longer total path length. This further demonstrates the contributions of this paper: neither of the two methods leads to a significant improvement in energy efficiency. The optimal tool path strategy, therefore, involves carefully adjusting cutting strip width and feed angle, illustrating the study’s contributions in energy efficiency.
To the best of the authors’ knowledge, few studies address the energy and time minimisation problem of five-axis flank milling from a global perspective of optimising tool path generation and expansion. This work supplements the previous research (Li & Tang, 2021) and explores energy-efficiency potentials to a greater extent, promising to further ease the industrial energy crisis.
Conclusions
This paper introduces a new theoretical framework for optimising the tool path of flank milling to save energy, which is useful for machining the widely needed free-form surfaces. The main contributions of this paper are: (1) developing a new model of machining energy consumption that accounts for the features of five-axis machining, and relating it to tool path variables such as the feed angle, cutting strip width, and path length; (2) setting up a model of tool path generation and expansion optimisation using MDPs. By applying MSAC, the MDPs are navigated efficiently, making it possible to find the best tool path scheme for the most energy-efficient machining.
The case study shows the advantages of the proposed method: (1) the model of machining energy consumption is first validated, achieving a prediction error of no more than 7.169%; (2) based on the model, the performance of MSAC is evaluated, achieving 24.89% and 27.04% reductions in machining energy and time, respectively, compared to SAC without meta-learning. This shows the proposed method's benefits for energy efficiency, and also highlights the need to combine meta-learning with reinforcement learning to solve complex multi-task optimisation problems. Moreover, when compared with two common machining methods, the proposed method shows improvements of 69.96% and 68.44% in energy consumption and machining time over end milling, and of 41.50% and 39.80% over flank milling, proving the importance of optimising cutting strip width and feed angle. These findings confirm that the contributions of this paper match the progress made in practical machining.
This study has some limitations that suggest directions for future studies. Firstly, about the validation of the machining energy consumption model, this paper only applies it to dual-head machine tools, because the main objective of this work is tool path optimisation. However, the proposed method should be valid for any machine tool in theory, so future work will test the model on different types of machine tools to verify the generality of the proposed model. Secondly, for the tool path optimisation in five-axis flank milling, the study recognises that complex surfaces with thin-walled features may deform and affect the machined quality. This problem is related to the cutting strip width, which influences cutting forces. A future research area is to include the workpiece deformation in the model’s constraints, in order to improve the proposed tool path optimisation, and enhance the machining sustainability and machined quality.
Data availability statement
The data that support the findings of this paper are available on reasonable request to the corresponding author.
References
Bhinge, R., Park, J., Law, K. H., Dornfeld, D. A., Helu, M., & Rachuri, S. (2016). Toward a generalized energy prediction model for machine tools. Journal of Manufacturing Science and Engineering, 139(4), 041013. https://doi.org/10.1115/1.4034933
Cai, W., Liu, C., Lai, K.-H., Li, L., Cunha, J., & Hu, L. (2019). Energy performance certification in mechanical manufacturing industry: A review and analysis. Energy Conversion and Management, 186, 415–432. https://doi.org/10.1016/j.enconman.2019.02.041
Camarinha-Matos, L. M., Rocha, A. D., & Graça, P. (2024). Collaborative approaches in sustainable and resilient manufacturing. Journal of Intelligent Manufacturing, 35(2), 499–519. https://doi.org/10.1007/s10845-022-02060-6
Campatelli, G., Scippa, A., Lorenzini, L., & Sato, R. (2015). Optimal workpiece orientation to reduce the energy consumption of a milling process. International Journal of Precision Engineering and Manufacturing-Green Technology, 2(1), 5–13. https://doi.org/10.1007/s40684-015-0001-3
Chen, X., Li, C., Tang, Y., Li, L., Du, Y., & Li, L. (2019). Integrated optimization of cutting tool and cutting parameters in face milling for minimizing energy footprint and production time. Energy, 175, 1021–1037. https://doi.org/10.1016/j.energy.2019.02.157
Chu, C.-H., Chen, H.-Y., & Chang, C.-H. (2020). Continuity-preserving tool path generation for minimizing machining errors in five-axis CNC flank milling of ruled surfaces. Journal of Manufacturing Systems, 55, 171–178. https://doi.org/10.1016/j.jmsy.2020.03.004
Deng, Z., Zhang, H., Fu, Y., Wan, L., & Liu, W. (2017). Optimization of process parameters for minimum energy consumption based on cutting specific energy consumption. Journal of Cleaner Production, 166, 1407–1414. https://doi.org/10.1016/j.jclepro.2017.08.022
Edem, I. F., Balogun, V. A., Nkanang, B. D., & Mativenga, P. T. (2019). Software analyses of optimum toolpath strategies from computer numerical control (CNC) codes. The International Journal of Advanced Manufacturing Technology, 103(1), 997–1007. https://doi.org/10.1007/s00170-019-03604-6
Edem, I. F., & Mativenga, P. T. (2017). Modelling of energy demand from computer numerical control (CNC) toolpaths. Journal of Cleaner Production, 157, 310–321. https://doi.org/10.1016/j.jclepro.2017.04.096
Fallah, A., Mokhtari, A., & Ozdaglar, A. (2020). On the convergence theory of gradient-based model-agnostic meta-learning algorithms. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (pp. 1082–1092). PMLR. https://doi.org/10.48550/arXiv.1908.10400
Feng, C., Chen, X., Zhang, J., Huang, Y., & Qu, Z. (2022). Minimizing the energy consumption of hole machining integrating the optimization of tool path and cutting parameters on CNC machines. The International Journal of Advanced Manufacturing Technology, 121(1), 215–228. https://doi.org/10.1007/s00170-022-09343-5
Feriani, A., Wu, D., Xu, Y. T., Li, J., Jang, S., Hossain, E., Liu, X., & Dudek, G. (2022). Multiobjective load balancing for multiband downlink cellular networks: A meta-reinforcement learning approach. IEEE Journal on Selected Areas in Communications, 40(9), 2614–2629. https://doi.org/10.1109/JSAC.2022.3191114
Fountas, N. A., Benhadj-Djilali, R., Stergiou, C. I., & Vaxevanidis, N. M. (2019). An integrated framework for optimizing sculptured surface CNC tool paths based on direct software object evaluation and viral intelligence. Journal of Intelligent Manufacturing, 30(4), 1581–1599. https://doi.org/10.1007/s10845-017-1338-y
Gao, Y., Mi, S., Zheng, H., Wang, Q., & Wei, Z. (2022). An energy efficiency tool path optimization method using a discrete energy consumption path model. Machines, 10(5), 348. https://doi.org/10.3390/machines10050348
Grandguillaume, L., Lavernhe, S., & Tournier, C. (2021). Optimal tool orientation in 3 + 2-axis machining considering machine kinematics. The International Journal of Advanced Manufacturing Technology, 115(9), 2765–2783. https://doi.org/10.1007/s00170-021-07036-z
Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the 35th International Conference on Machine Learning (pp. 1861–1870). PMLR. https://doi.org/10.48550/arXiv.1801.01290
He, Y., Tian, X., Li, Y., Wang, S., & Sutherland, J. W. (2020a). Modeling machining energy consumption including the effect of toolpath. Procedia CIRP, 90, 573–578. https://doi.org/10.1016/j.procir.2020.02.055
He, Y., Wu, P., Li, Y., Wang, Y., Tao, F., & Wang, Y. (2020b). A generic energy prediction model of machine tools using deep learning algorithms. Applied Energy, 275, 115402. https://doi.org/10.1016/j.apenergy.2020.115402
Hu, L., Liu, Y., Peng, C., Tang, W., Tang, R., & Tiwari, A. (2018). Minimising the energy consumption of tool change and tool path of machining by sequencing the features. Energy, 147, 390–402. https://doi.org/10.1016/j.energy.2018.01.046
Hu, P., & Tang, K. (2016). Five-axis tool path generation based on machine-dependent potential field. International Journal of Computer Integrated Manufacturing, 29(6), 636–651. https://doi.org/10.1080/0951192X.2015.1068451
Jia, S., Tang, R., & Lv, J. (2014). Therblig-based energy demand modeling methodology of machining process to support intelligent manufacturing. Journal of Intelligent Manufacturing, 25(5), 913–931. https://doi.org/10.1007/s10845-012-0723-9
Jia, S., Tang, R., & Lv, J. (2016). Machining activity extraction and energy attributes inheritance method to support intelligent energy estimation of machining process. Journal of Intelligent Manufacturing, 27(3), 595–616. https://doi.org/10.1007/s10845-014-0894-7
Jiang, Z., Gao, D., Lu, Y., & Liu, X. (2019). Optimization of cutting parameters for trade-off among carbon emissions, surface roughness, and processing time. Chinese Journal of Mechanical Engineering, 32(1), 94. https://doi.org/10.1186/s10033-019-0408-9
Karuppusamy, S. N., & Kang, B.-Y. (2017). Minimizing airtime by optimizing tool path in computer numerical control machine tools with application of A* and genetic algorithms. Advances in Mechanical Engineering, 9(12), 1687814017737448. https://doi.org/10.1177/1687814017737448
Khalilpourazari, S., Khalilpourazary, S., Özyüksel Çiftçioğlu, A., & Weber, G.-W. (2021). Designing energy-efficient high-precision multi-pass turning processes via robust optimization and artificial intelligence. Journal of Intelligent Manufacturing, 32(6), 1621–1647. https://doi.org/10.1007/s10845-020-01648-0
Li, K., Jin, X., Jia, Q. S., Ren, D., & Xia, H. (2023). An OCBA-based method for efficient sample collection in reinforcement learning. IEEE Transactions on Automation Science and Engineering (pp. 1–12). https://doi.org/10.1109/TASE.2023.3282257
Li, C., Li, L., Tang, Y., Zhu, Y., & Li, L. (2019). A comprehensive approach to parameters optimization of energy-aware CNC milling. Journal of Intelligent Manufacturing, 30(1), 123–138. https://doi.org/10.1007/s10845-016-1233-y
Li, C., Tang, Y., Cui, L., & Li, P. (2015). A quantitative approach to analyze carbon emissions of CNC-based machining systems. Journal of Intelligent Manufacturing, 26, 911–922. https://doi.org/10.1007/s10845-013-0812-4
Li, J., & Zhou, T. (2023). Active fault-tolerant coordination energy management for a proton exchange membrane fuel cell using curriculum-based multiagent deep meta-reinforcement learning. Renewable and Sustainable Energy Reviews, 185, 113581. https://doi.org/10.1016/j.rser.2023.113581
Li, L., Deng, X., Zhao, J., Zhao, F., & Sutherland, J. W. (2018). Multi-objective optimization of tool path considering efficiency, energy-saving and carbon-emission for free-form surface milling. Journal of Cleaner Production, 172, 3311–3322. https://doi.org/10.1016/j.jclepro.2017.07.219
Li, Z., & Tang, K. (2021). Partition-based five-axis tool path generation for freeform surface machining using a non-spherical tool. Journal of Manufacturing Systems, 58, 248–262. https://doi.org/10.1016/j.jmsy.2020.12.004
Liang, F., Kang, C., & Fang, F. (2020). A smooth tool path planning method on NURBS surface based on the shortest boundary geodesic map. Journal of Manufacturing Processes, 58, 646–658. https://doi.org/10.1016/j.jmapro.2020.08.047
Liang, F., Kang, C., Lu, Z., & Fang, F. (2021). Iso-scallop tool path planning for triangular mesh surfaces in multi-axis machining. Robotics and Computer-Integrated Manufacturing, 72, 102206. https://doi.org/10.1016/j.rcim.2021.102206
Lindvall, R., Lenrick, F., M’Saoubi, R., Ståhl, J.-E., & Bushlya, V. (2021). Performance and wear mechanisms of uncoated cemented carbide cutting tools in Ti6Al4V machining. Wear, 477, 203824. https://doi.org/10.1016/j.wear.2021.203824
Liu, X., Li, Y., & Xu, X. (2018). A region-based tool path generation approach for machining freeform surfaces by applying machining strip width tensor. The International Journal of Advanced Manufacturing Technology, 98(9), 3191–3204. https://doi.org/10.1007/s00170-018-2427-6
Lo, C. C. (1999). Efficient cutter-path planning for five-axis surface machining with a flat-end cutter. Computer-Aided Design, 31(9), 557–566. https://doi.org/10.1016/S0010-4485(99)00052-4
Lu, F., Zhou, G., Zhang, C., Liu, Y., Chang, F., & Xiao, Z. (2023). Energy-efficient multi-pass cutting parameters optimisation for aviation parts in flank milling with deep reinforcement learning. Robotics and Computer-Integrated Manufacturing, 81, 102488. https://doi.org/10.1016/j.rcim.2022.102488
Lu, Y., Chen, S., & Wang, C. (2022). Smooth flank milling tool path generation for blisk surface with barrel cutters. Journal of Mechanical Engineering, 58(1), 256–266. https://doi.org/10.3901/JME.2022.01.256
Lu, Y., Ding, Y., Wang, C., & Zhu, L. (2019). Tool path generation for five-axis machining of blisks with barrel cutters. International Journal of Production Research, 57(5), 1300–1314. https://doi.org/10.1080/00207543.2018.1470344
Luan, X., Zhang, S., Li, J., Li, G., Chen, J., & Mendis, G. (2018). Comprehensive effects of tool paths on energy consumption, machining efficiency, and surface integrity in the milling of alloy cast iron. The International Journal of Advanced Manufacturing Technology, 98(5), 1847–1860. https://doi.org/10.1007/s00170-018-2269-2
Mo, Y., Li, L., Huang, B., & Li, X. (2023). Few-shot RUL estimation based on model-agnostic meta-learning. Journal of Intelligent Manufacturing, 34(5), 2359–2372. https://doi.org/10.1007/s10845-022-01929-w
Ping, Y., Liu, Y., Zhang, L., Wang, L., & Xu, X. (2023). Sequence generation for multi-task scheduling in cloud manufacturing with deep reinforcement learning. Journal of Manufacturing Systems, 67, 315–337. https://doi.org/10.1016/j.jmsy.2023.02.009
Shi, P., Tong, X., Cai, M., & Niu, S. (2024). A novel 2.5D machining feature recognition method based on ray blanking algorithm. Journal of Intelligent Manufacturing, 35(4), 1585–1605. https://doi.org/10.1007/s10845-023-02122-3
Shin, S.-J., Woo, J., & Rachuri, S. (2017). Energy efficiency of milling machining: Component modeling and online optimization of cutting parameters. Journal of Cleaner Production, 161, 12–29. https://doi.org/10.1016/j.jclepro.2017.05.013
Song, D.-N., Zheng, D.-W., Zhong, Y.-G., Ma, J.-W., & Li, J.-S. (2022). Non-isometric dual-spline interpolation for five-axis machine tools by FIR filtering-based feedrate scheduling using pseudo curvature under axial drive constraint. Journal of Manufacturing Processes, 79, 827–843. https://doi.org/10.1016/j.jmapro.2022.05.023
Sun, S., Sun, Y., & Xu, J. (2023). Tool path generation for 5-axis flank milling of ruled surfaces with optimal cutter locations considering multiple geometric constraints. Chinese Journal of Aeronautics, 36(12), 408–424. https://doi.org/10.1016/j.cja.2023.03.040
Tong, X., Liu, Q., Pi, S., & Xiao, Y. (2020). Real-time machining data application and service based on IMT digital twin. Journal of Intelligent Manufacturing, 31(5), 1113–1132. https://doi.org/10.1007/s10845-019-01500-0
Vishnu, V. S., Varghese, K. G., & Gurumoorthy, B. (2020). Energy prediction in process planning of five-axis machining by data-driven modelling. Procedia CIRP, 93, 862–867. https://doi.org/10.1016/j.procir.2020.04.087
Wan, N., Zhuang, Q. X., Chang, Z. Y., & Yi, Z. F. (2023). An allowance optimization method for near-net-shape blade considering material-saving, energy consumption and carbon emissions. International Journal of Environmental Science and Technology, 20(4), 4339–4354. https://doi.org/10.1007/s13762-022-04242-4
Wang, J., Hu, J., Min, G., Zomaya, A. Y., & Georgalas, N. (2020). Fast adaptive task offloading in edge computing based on meta reinforcement learning. IEEE Transactions on Parallel and Distributed Systems, 32(1), 242–253. https://doi.org/10.1109/TPDS.2020.3014896
Wang, L., Meng, Y., Ji, W., & Liu, X. (2019). Cutting energy consumption modelling for prismatic machining features. The International Journal of Advanced Manufacturing Technology, 103(5), 1657–1667. https://doi.org/10.1007/s00170-019-03667-5
Wen, S., Wen, Z., Zhang, D., Zhang, H., & Wang, T. (2021). A multi-robot path-planning algorithm for autonomous navigation using meta-reinforcement learning based on transfer learning. Applied Soft Computing, 110, 107605. https://doi.org/10.1016/j.asoc.2021.107605
Xu, K., Luo, M., & Tang, K. (2016). Machine based energy-saving tool path generation for five-axis end milling of freeform surfaces. Journal of Cleaner Production, 139, 1207–1223. https://doi.org/10.1016/j.jclepro.2016.08.140
Xu, L., Huang, C., Li, C., Wang, J., Liu, H., & Wang, X. (2021). Estimation of tool wear and optimization of cutting parameters based on novel ANFIS-PSO method toward intelligent machining. Journal of Intelligent Manufacturing, 32(1), 77–90. https://doi.org/10.1007/s10845-020-01559-0
Yi, J., Chu, C.-H., Kuo, C.-L., Li, X., & Gao, L. (2018). Optimized tool path planning for five-axis flank milling of ruled surfaces using geometric decomposition strategy and multi-population harmony search algorithm. Applied Soft Computing, 73, 547–561. https://doi.org/10.1016/j.asoc.2018.08.041
Yoon, H.-S., Singh, E., & Min, S. (2018). Empirical power consumption model for rotational axes in machine tools. Journal of Cleaner Production, 196, 370–381. https://doi.org/10.1016/j.jclepro.2018.06.028
Zhang, Y., & Tang, K. (2019). Automatic sweep scan path planning for five-axis free-form surface inspection based on hybrid swept area potential field. IEEE Transactions on Automation Science and Engineering, 16(1), 261–277. https://doi.org/10.1109/TASE.2018.2827102
Zhao, J., Li, L., Li, C., Sutherland, J. W., & Li, L. (2022). Energy-aware sub-regional milling method for free-form surface based on clustering features. Journal of Manufacturing Processes, 84, 937–952. https://doi.org/10.1016/j.jmapro.2022.10.057
Zhou, G., Lu, Q., Xiao, Z., Zhou, C., & Tian, C. (2019). Cutting parameter optimization for machining operations considering carbon emissions. Journal of Cleaner Production, 208, 937–950. https://doi.org/10.1016/j.jclepro.2018.10.191
Zhou, G., Zhang, C., Lu, F., & Zhang, J. (2020). Integrated optimization of cutting parameters and tool path for cavity milling considering carbon emissions. Journal of Cleaner Production, 250, 119454. https://doi.org/10.1016/j.jclepro.2019.119454
Zou, Q. (2021). Robust and efficient tool path generation for machining low-quality triangular mesh surfaces. International Journal of Production Research, 59(24), 7457–7467. https://doi.org/10.1080/00207543.2020.1842939
Funding
Open access funding provided by Linköping University. This work was funded by the National Natural Science Foundation of China (Nos. 52105530 and 52005400); the China Postdoctoral Science Foundation (Nos. 2021M692556 and 2021M693881); the Key Research and Development Program of Shaanxi Province (No. 2023-ZDLNY-71); and the Fundamental Research Funds for the Central Universities (No. xzy012022053).
Author information
Contributions
FL: writing—original draft, methodology, conceptualisation. GZ: modelling, supervision, project administration, funding acquisition. CZ: experiment, writing—review & editing, funding acquisition. YL: supervision, validation, writing—review & editing. FC: review. QL: review, funding acquisition. ZX: review.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
With the basic definition of the SAC, the structures of the actor network and critic network used in this paper are specified in Fig. 13.
From Fig. 13, it is observed that the actor network’s input layer has 5 neurons, corresponding to the 5 elements of the state \(s_{m,t}\), and is followed by multiple hidden layers. It features two output layers, each with 2 neurons, which produce the mean and the logarithmic standard deviation of the actions. The critic network’s input layer has 7 neurons, equal to the combined number of elements in \(s_{m,t}\) and \(a_{m,t}\), and is also followed by several hidden layers. Its output layer has a single neuron that returns the action value of a specific state-action pair, i.e., \(Q(s_{m,t} ,a_{m,t} )\), and uses a linear output rather than an activation function. The target critic network mirrors the critic network’s structure.
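As a rough illustration of the layer sizes described above, the following pure-Python sketch builds an actor with a 5-element input and two 2-neuron heads, and a critic with a 7-element input and one linear output. The hidden width (64), the use of two hidden layers, the ReLU activation and the random initialisation are assumptions made for illustration only; the actual settings are those of Fig. 13 and the case study.

```python
import random

STATE_DIM, ACTION_DIM = 5, 2   # sizes stated in the text
HIDDEN = 64                    # assumed hidden width (not from the paper)


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu=False):
    """Apply a dense layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


class Actor:
    def __init__(self, rng):
        self.hidden = [make_layer(STATE_DIM, HIDDEN, rng), make_layer(HIDDEN, HIDDEN, rng)]
        self.mean_head = make_layer(HIDDEN, ACTION_DIM, rng)
        self.log_std_head = make_layer(HIDDEN, ACTION_DIM, rng)

    def forward(self, state):
        h = state
        for layer in self.hidden:
            h = dense(h, layer, relu=True)
        # Two output heads: action mean and log standard deviation.
        return dense(h, self.mean_head), dense(h, self.log_std_head)


class Critic:
    def __init__(self, rng):
        self.hidden = [make_layer(STATE_DIM + ACTION_DIM, HIDDEN, rng),
                       make_layer(HIDDEN, HIDDEN, rng)]
        self.q_head = make_layer(HIDDEN, 1, rng)

    def forward(self, state, action):
        h = list(state) + list(action)   # 5 + 2 = 7 inputs
        for layer in self.hidden:
            h = dense(h, layer, relu=True)
        return dense(h, self.q_head)[0]  # linear output, no activation


rng = random.Random(0)
actor, critic = Actor(rng), Critic(rng)
state = [0.1, 0.2, 0.3, 0.4, 0.5]
mean, log_std = actor.forward(state)
q_value = critic.forward(state, mean)
```

A target critic in this sketch would simply be a second `Critic` instance whose weights track those of the first.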
The training mechanism of SAC is shown in Table 9. Specifically, initialisation is performed in line 1 and covers the network parameters, environment, experience buffer, and the area to be removed. In the iteration loop (lines 3–13), the agent interacts with the environment (lines 4–8). During the interaction, it observes \(s_{m,t}\) and samples \(a_{m,t}\) from a Gaussian distribution parameterised by the output of the actor network. The agent then receives a reward \(r_{m,t}\) at time step \(t\) and observes a new state \(s_{m,t + 1}\). In lines 9–12, the parameters \(\theta\), \(\vartheta\) and \(\varphi\) are updated via several gradient steps by backpropagation. Learning continues until the maximum number of iterations is reached.
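The interaction phase described above can be sketched as follows. This is a minimal illustration and not the procedure of Table 9: `toy_policy` and `toy_env_step` are hypothetical stand-ins for the actor network and the machining environment, and the gradient updates are omitted.

```python
import math
import random


def sample_action(mean, log_std, rng):
    """Draw each action component from N(mean, exp(log_std)**2)."""
    return [rng.gauss(m, math.exp(s)) for m, s in zip(mean, log_std)]


def toy_policy(state):
    """Hypothetical actor: fixed mean and log standard deviation for 2 actions."""
    return [0.0, 0.0], [-1.0, -1.0]


def toy_env_step(state, action):
    """Hypothetical environment: reward penalises large actions, state drifts."""
    reward = -sum(a * a for a in action)
    next_state = [s + 0.01 for s in state]
    return reward, next_state


def collect(policy, env_step, state, buffer, steps, rng):
    """Interaction phase: store (s, a, r, s') transitions in the experience buffer."""
    for _ in range(steps):
        mean, log_std = policy(state)
        action = sample_action(mean, log_std, rng)
        reward, next_state = env_step(state, action)
        buffer.append((state, action, reward, next_state))
        state = next_state
    return state


rng = random.Random(0)
buffer = []
final_state = collect(toy_policy, toy_env_step, [0.0] * 5, buffer, steps=10, rng=rng)
minibatch = rng.sample(buffer, 4)  # transitions that a gradient step would consume
```

In a full implementation, each minibatch would feed the updates of the actor, critic and target critic parameters.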
SAC’s hyperparameters, including learning rate, minibatch size, discount factor and hidden layers, strongly affect the agent’s learning performance. They are tuned within their defined parameter space by random search. To ensure consistent initial conditions before training, the same random seeds are used across the different hyperparameter setups. Each combination is repeated three times with different seeds to assess performance stability under random influences. Because SAC is inherently stochastic, the performance metrics, namely cumulative reward and convergence iterations, are averaged over these runs. The optimal hyperparameters are ultimately selected based on their generalisation capabilities in new tasks, which are presented in Sect. “Case study”.
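The tuning procedure can be sketched as a seeded random search. The search-space values and the scoring function `train_and_score` are placeholders invented for this example, and the sketch ranks configurations by mean cumulative reward only, whereas the paper makes the final selection on generalisation to new tasks.

```python
import random
import statistics

# Assumed search space; the values are placeholders, not the paper's.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "minibatch_size": [64, 128, 256],
    "discount_factor": [0.95, 0.99],
    "hidden_layers": [2, 3, 4],
}


def train_and_score(config, seed):
    """Hypothetical stand-in for one SAC run: (cumulative reward, convergence iterations)."""
    rng = random.Random(seed)
    reward = -abs(config["learning_rate"] - 3e-4) * 1e3 + rng.gauss(0.0, 0.1)
    iterations = 1000 / config["hidden_layers"] + rng.gauss(0.0, 5.0)
    return reward, iterations


def random_search(n_trials, seeds=(0, 1, 2), sampler_seed=42):
    """Sample configurations at random and average each over the same three seeds."""
    sampler = random.Random(sampler_seed)
    results = []
    for _ in range(n_trials):
        config = {name: sampler.choice(values) for name, values in SEARCH_SPACE.items()}
        runs = [train_and_score(config, seed) for seed in seeds]
        mean_reward = statistics.mean(r for r, _ in runs)
        mean_iters = statistics.mean(i for _, i in runs)
        results.append((config, mean_reward, mean_iters))
    best = max(results, key=lambda item: item[1])
    return best, results


best, all_results = random_search(n_trials=8)
```

Reusing the same seed tuple for every configuration keeps the comparison between setups fair, while averaging over three seeds damps the run-to-run variance.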
Appendix B
The implementation of the proposed MSAC is detailed in Table 10. The well-trained meta policy parameters \({\mathop \theta \limits^\wedge }\), \({\mathop \vartheta \limits^\wedge }\) and \({\mathop \varphi \limits^\wedge }\) output by the meta-training stage are used to initialise \(\theta^{\prime}\), \(\vartheta^{\prime}\) and \(\varphi^{\prime}\) in the meta-adaptation stage (see meta-adaptation line 1). The parameters are fine-tuned on each expansion task, with the value of \(\lambda\) ranging from 0° to 180°, which gives 181 tasks guided by \(\lambda\). Each expansion task is optimised and its performance results, i.e., machining energy consumption and time, are obtained (see meta-adaptation lines 2–8). The optimal expansion scheme is identified by comparing the results of all tasks within \(T_M\) (see meta-adaptation line 9). The value of \(\lambda\) associated with this scheme is the optimal one, i.e., \(\lambda^\ast\).
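The meta-adaptation stage amounts to a sweep over the 181 tasks followed by a comparison. The sketch below assumes that \(\lambda\) takes integer degree values, replaces fine-tuning with a hypothetical function `adapt_and_evaluate` that returns made-up energy and time figures, and ranks tasks by energy first and time second. That ranking rule is an assumption, since the comparison criterion is defined in Table 10 rather than here.

```python
def adapt_and_evaluate(lam_deg):
    """Hypothetical stand-in for fine-tuning on one task; returns (energy, time)."""
    energy = 100.0 + 0.01 * (lam_deg - 60) ** 2   # toy values, not measured data
    time = 50.0 + 0.1 * abs(lam_deg - 60)
    return energy, time


def select_optimal_lambda(evaluate, angles=range(0, 181)):
    """Evaluate every task and return the best angle with all results."""
    results = {lam: evaluate(lam) for lam in angles}
    # Tuples compare element-wise: lowest energy wins, time breaks ties.
    best = min(results, key=lambda lam: results[lam])
    return best, results


lam_star, results = select_optimal_lambda(adapt_and_evaluate)
```

Because every task starts from the same meta-trained parameters, the per-task calls are independent and could be run in parallel.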
To analyse the convergence of MSAC, we first prove that the optimal policy is found for each meta-training task.
Lemma 1 (Policy Evaluation)
Let the soft Bellman backup operator in MSAC be \(\tau_{\pi_m }^{meta}\). For a fixed policy, the soft Q-value can be computed iteratively, starting from any function \(Q = Q_{\pi_m }^{meta} :S \times A \to {\mathbb{R}}\) with \(|A| < \infty\) and \(m \in [1, \cdots ,M]\), by defining \(Q(k + 1) = \tau_{\pi_m }^{meta} Q(k)\). Then \(Q(k)\) converges to the soft Q-value of \(\pi_m\) as \(k \to \infty\).
Proof
Define the reward in MSAC for each task:
Rewrite the update rule as
Since \(|A| < \infty\) is guaranteed in this paper, the entropy-augmented reward is bounded. The soft value function of any policy \(\pi_m\) is obtained by repeatedly applying \(\tau_{\pi_m }^{meta}\). Lemma 1 then follows from the classic convergence results for policy evaluation (Haarnoja et al., 2018).
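For reference, the reward definition and the update rule used in this proof can be sketched in the standard SAC form of Haarnoja et al. (2018). The task index \(m\) is added by analogy with the surrounding notation, and \(\gamma\) (discount factor), \(p\) (transition distribution) and \(\mathcal{H}\) (entropy) are the usual SAC symbols. This is an assumed form given for orientation, not a reproduction of the paper's numbered equations.

```latex
% Entropy-augmented reward for task m (standard SAC form, assumed here)
\[
r_{\pi_m}(s_{m,t},a_{m,t}) \triangleq r(s_{m,t},a_{m,t})
  + \mathbb{E}_{s_{m,t+1}\sim p}\!\left[\mathcal{H}\big(\pi_m(\cdot\,|\,s_{m,t+1})\big)\right]
\]
% Soft Bellman backup written with the augmented reward
\[
Q(s_{m,t},a_{m,t}) \leftarrow r_{\pi_m}(s_{m,t},a_{m,t})
  + \gamma\,\mathbb{E}_{s_{m,t+1}\sim p,\;a_{m,t+1}\sim\pi_m}\!\left[Q(s_{m,t+1},a_{m,t+1})\right]
\]
```

Written this way, the backup has the same shape as an ordinary Bellman backup with a bounded reward, which is what allows the standard policy evaluation argument to apply.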
In the policy improvement step, the policy is updated for each state of each task according to Eq. (36).
Lemma 2 (Policy Improvement)
Let \(\pi_m^{new} \in \Pi\) be the optimizer of the problem defined in Eq. (36). Then \(Q_{\pi_m^{new} } (s_{m,t} ,a_{m,t} ) \ge Q_{\pi_m^{old} } (s_{m,t} ,a_{m,t} )\) for all \((s_{m,t} ,a_{m,t} ) \in S \times A\) with \(|A| < \infty\).
Proof
Let \(\pi_m^{old} \in \Pi\), let \(Q_{\pi_m^{old} }^{meta}\) and \(V_{\pi_m^{old} }^{meta}\) be the corresponding soft state-action value and soft state value with meta learning in the mth meta-training task of MSAC, and let \(\pi_m^{new}\) be defined as:
where \(J_{\pi_m^{old} } (\pi_m^{\prime} ( \cdot |s_{m,t} )) = D_{KL} [\pi_m^{\prime} ( \cdot |s_{m,t} )||\frac{{\exp (Q_{\pi_m^{old} }^{meta} (s_{m,t} , \cdot ))}}{{Z_{\pi_m^{old} } (s_{m,t} )}}]\). Since \(\pi_m^{new} = \pi_m^{old} \in \Pi\) can always be chosen, \(J_{\pi_m^{old} } (\pi_m^{new} ( \cdot |s_{m,t} )) \le J_{\pi_m^{old} } (\pi_m^{old} ( \cdot |s_{m,t} ))\). Hence,
where \(Z_{\pi_m^{old} } (s_{m,t} )\) is the partition function, which depends only on the state. Thus, Eq. (38) is rewritten as Eq. (39):
Next, the soft Bellman equation is applied to derive the value of \(Q_{\pi_m^{old} }^{meta} (s_{m,t} ,a_{m,t} )\):
where \(Q_{\pi_m^{old} }^{meta} (s_{m,t} ,a_{m,t} )\) is repeatedly expanded on the right-hand side. The updated value function of \(\pi_m^{new}\) with meta learning is therefore no smaller than the old one. By Lemma 1, the soft state–action value sequence in each task of MSAC finally converges.
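For orientation, the chain of steps in this proof can be sketched in the standard SAC form (Haarnoja et al., 2018), with the task index \(m\) and the superscript "meta" added by analogy with the surrounding notation. The equations below are an assumed form of the argument and are deliberately left unnumbered, since they are not copied from the paper.

```latex
% Definition of the improved policy as a KL projection
\[
\pi_m^{new}(\cdot\,|\,s_{m,t}) = \arg\min_{\pi_m^{\prime}\in\Pi}
  J_{\pi_m^{old}}\big(\pi_m^{\prime}(\cdot\,|\,s_{m,t})\big)
\]
% Consequence of J(new) <= J(old), written out as expectations
\[
\mathbb{E}_{a_{m,t}\sim\pi_m^{new}}\!\left[\log\pi_m^{new}(a_{m,t}|s_{m,t})
  - Q_{\pi_m^{old}}^{meta}(s_{m,t},a_{m,t}) + \log Z_{\pi_m^{old}}(s_{m,t})\right]
\le
\mathbb{E}_{a_{m,t}\sim\pi_m^{old}}\!\left[\log\pi_m^{old}(a_{m,t}|s_{m,t})
  - Q_{\pi_m^{old}}^{meta}(s_{m,t},a_{m,t}) + \log Z_{\pi_m^{old}}(s_{m,t})\right]
\]
% The partition function cancels because it depends only on the state
\[
\mathbb{E}_{a_{m,t}\sim\pi_m^{new}}\!\left[Q_{\pi_m^{old}}^{meta}(s_{m,t},a_{m,t})
  - \log\pi_m^{new}(a_{m,t}|s_{m,t})\right] \ge V_{\pi_m^{old}}^{meta}(s_{m,t})
\]
% Repeated expansion of the soft Bellman equation
\[
Q_{\pi_m^{old}}^{meta}(s_{m,t},a_{m,t})
 = r(s_{m,t},a_{m,t}) + \gamma\,\mathbb{E}_{s_{m,t+1}\sim p}\!\left[V_{\pi_m^{old}}^{meta}(s_{m,t+1})\right]
 \le \cdots \le Q_{\pi_m^{new}}^{meta}(s_{m,t},a_{m,t})
\]
```

Each inequality in the last line applies the third relation once more at the next time step, which is the repeated expansion referred to in the proof.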
Theorem 1
Repeated application of soft policy evaluation and soft policy improvement to any meta policy \(\pi_m \in \Pi ,\forall m\) converges to an optimal policy \(\pi_m^\ast\) such that \(Q^{\pi_m^\ast } (s_{m,t} ,a_{m,t} ) \ge Q^{\pi_m } (s_{m,t} ,a_{m,t} )\) for all \(\pi_m \in \Pi ,\forall m\) and \((s_{m,t} ,a_{m,t} ) \in S \times A\) with \(|A| < \infty\).
Proof
Let \(\pi_m (k)\) be the policy after k iterations of MSAC for one task. By Lemma 2, the sequence of soft Q-values is monotonically non-decreasing, i.e., \(Q^{\pi_m (k + 1)} \ge Q^{\pi_m (k)}\). Since \(Q^{\pi_m }\) is bounded for \(\pi_m \in \Pi ,\forall m\), the sequence converges to some \(\pi_m^\ast\). For any \(\pi_m \in \Pi ,\forall m\) with \(\pi_m \ne \pi_m^\ast\), it holds that \(J_{\pi_m^\ast } (\pi_m^\ast ( \cdot |s_{m,t} )) < J_{\pi_m^\ast } (\pi_m ( \cdot |s_{m,t} ))\). By the same argument as in the proof of Lemma 2, \(Q^{\pi_m^\ast } (s_{m,t} ,a_{m,t} ) \ge Q^{\pi_m } (s_{m,t} ,a_{m,t} )\).
We have proved that MSAC finds the optimal policy for each meta-training task. Next, we consider the impact of MAML on SAC. One-step MAML is guaranteed to converge to a first-order stationary point in \(\mathcal{O}(1/\epsilon^{2})\) iterations if enough samples (\(D^{tr}\) and \(D^{ml}\)) are used in the training phase (Fallah et al., 2020). For any \(\epsilon \in (0,1)\), a solution \(\theta_{\epsilon}\) is found such that \({\mathbb{E}[\|\theta_{\epsilon}\|]}^{2} \le \Delta + \epsilon^{2}\), where \(\Delta\) denotes the meta-training parameters. Multi-step MAML for RL is guaranteed to converge in \(\mathcal{O}(\epsilon^{2})\) iterations if the meta learning rate is chosen properly (Feriani et al., 2022). Therefore, the convergence of MAML-RL is essentially guaranteed.
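The behaviour of one-step MAML can be illustrated on a toy problem. The sketch below is not the paper's setup: it assumes scalar tasks with loss \(0.5(\theta - c_i)^2\), exact gradients instead of sampled ones, and arbitrarily chosen step sizes. It only shows the meta-gradient shrinking towards a stationary point as the iterations proceed.

```python
def maml_one_step(centers, theta=0.0, inner_lr=0.1, meta_lr=0.5, iterations=200):
    """One-step MAML on toy tasks with loss_i(theta) = 0.5 * (theta - c_i)**2."""
    grad_norms = []
    for _ in range(iterations):
        meta_grad = 0.0
        for c in centers:
            adapted = theta - inner_lr * (theta - c)   # inner gradient step on task i
            # Chain rule: d(adapted)/d(theta) = 1 - inner_lr for this quadratic loss.
            meta_grad += (1.0 - inner_lr) * (adapted - c)
        meta_grad /= len(centers)
        theta -= meta_lr * meta_grad                    # outer (meta) update
        grad_norms.append(abs(meta_grad))
    return theta, grad_norms


theta_final, grad_norms = maml_one_step([1.0, 2.0, 6.0])
```

For this quadratic family the meta-objective is minimised at the mean of the task centres, so `theta_final` approaches 3.0 while the recorded gradient norms decay geometrically.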
Appendix C
See Table 11.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lu, F., Zhou, G., Zhang, C. et al. Energy-efficient tool path generation and expansion optimisation for five-axis flank milling with meta-reinforcement learning. J Intell Manuf (2024). https://doi.org/10.1007/s10845-024-02412-4
DOI: https://doi.org/10.1007/s10845-024-02412-4