Introduction

In the context of rising industrialisation and its significant impact on global carbon emissions and energy consumption, it is crucial to improve energy efficiency in the manufacturing industry, a sector that accounts for about 55% of global energy consumption (Khalilpourazari et al., 2021; Li et al., 2019). Within this sector, mechanical manufacturing represents 74.7% of energy use (He et al., 2020b), yet its energy efficiency hovers around just 30% (Cai et al., 2019). Given its wide distribution and considerable energy-saving potential, the mechanical manufacturing industry is urged to adopt sustainable practices. This effort is vital for advancing energy-focused intelligent manufacturing (Jia et al., 2014; Xu et al., 2021).

Reducing the machining energy consumption of machine tools is a sensible way to improve the energy efficiency of the machining process (Tong et al., 2020). According to Wan et al. (2023), milling machines account for approximately 60% to 90% of CO2 equivalent emissions during operation. As an essential machining process, five-axis flank milling is particularly capable of high-precision machining of complex surfaces. It is therefore ubiquitous in industries such as aerospace, automotive, and mould-making that require mass production of such shapes (Chu et al., 2020; Yi et al., 2018). Inevitably, five-axis flank milling contributes significantly to the energy consumed during machine tool operation (Lu et al., 2023), so developing energy-saving process planning for flank milling is imperative. Compared with cutting parameter optimisation, tool path optimisation can reduce machining energy consumption without lowering surface quality or production efficiency (Edem et al., 2019). Hence, studying energy-efficient tool path optimisation for flank milling is vital to achieving energy conservation, and it presents a pathway towards sustainability in the mechanical manufacturing industry.

Accurately estimating the machining energy consumed is fundamental for selecting the most energy-efficient tool paths (Jia et al., 2016). Current methods, however, overlook the impact of tool movements in their energy consumption models (Gao et al., 2022). Another key aspect is optimising tool path parameters to effectively generate and expand the principal path (Fountas et al., 2019). Presently, there are two types of approaches for achieving energy-efficient tool paths. The first seeks the shortest total path length using heuristic algorithms, typically with a constant scallop height for tool path expansion (Zou, 2021). However, the height is set too conservatively for complex surfaces with strongly varying curvature, leading to a small cutting strip width and suboptimal energy savings (Liang et al., 2020). The second aims to optimise the principal path using the maximum cutting strip width as a metric (Lu et al., 2019), but this can increase material removal energy consumption and damage surface quality. Therefore, it is pivotal to strike a balance between cutting strip width, energy consumption and surface quality in energy-efficient tool path optimisation. Additionally, since flank milling generates spatial movements of the feeding and transmission systems (Song et al., 2022; Yoon et al., 2018), machining energy consumption tends to vary across principal paths along different feed angles. Unfortunately, there is a gap in the literature regarding the optimisation of feed angles to improve energy efficiency.

In response to these challenges, this paper proposes an energy-efficient tool path generation and expansion method. The motivation behind the method is to maximise energy efficiency while maintaining surface quality in five-axis flank milling of complex surfaces. First, considering flank milling features, the proposed method defines a feed angle that guides the tool’s spatial motion, and a novel machining energy consumption model is built from the tool path generation and expansion variables, including feed angle, cutting strip width and path length. Second, a dynamic tool path optimisation model is established to minimise machining energy consumption and time, with a maximum scallop height constraint to ensure machining quality. Third, the model is formulated as multiple Markov Decision Processes (MDPs), which contain multi-expansion tasks based on different principal paths generated by various feed angles. Finally, meta reinforcement learning (MRL) is applied to solve the MDPs and find the optimal tool path generation and expansion scheme. MRL is chosen because it enables faster learning than traditional RL and does not require full retraining for a new task (Wang et al., 2020). By pre-learning task similarities, it solves MDPs efficiently (Li & Zhou, 2023). For this reason, model-agnostic meta-learning (MAML) (Mo et al., 2023) and Soft Actor-Critic (SAC) (Ping et al., 2023) are integrated into a single algorithm, denoted MSAC, to solve the similar tasks driven by the feed angle.

Based on the above study, there are three pivotal contributions:

  (1) Considering five-axis flank milling features, machining energy consumption is first modelled in terms of the tool path parameters. The model is both material- and machine tool-agnostic, and it includes the spatial movements of the tool path so that operators can easily adjust tool paths to achieve sustainability;

  (2) MSAC is applied to accelerate the traversal of MDPs. It enables energy-efficient expansions following a principal path through dynamic optimisation of the cutting strip width in each MDP. Subsequently, an optimal tool path generation and expansion scheme is identified by navigating MDPs across different feed angles;

  (3) The proposed method outperforms current methods by significantly lowering machining energy consumption and time, as well as enhancing sustainability, as evidenced in a case study. Its viability is further verified through surface quality assessments, establishing the method as a key reference for feasibility and sustainability in complex parts manufacturing.

This paper is structured as follows: the section “Literature review” surveys related studies and summarises the existing research gaps. Then, the studied problem is formulated and an optimisation model is provided. After that, an MSAC framework is presented. Subsequently, the proposed method is evaluated. Finally, conclusions, limitations and future work are outlined.

Literature review

This paper contributes to two critical research domains: (1) energy consumption modelling in tool path optimisation; (2) tool path optimisation methods related to energy efficiency. In this section, the literature on these two aspects is reviewed, followed by the identification of the research gaps.

Energy consumption modelling in tool path optimisation

The classical problem of energy-efficient tool path optimisation in numerical control machining involves optimising machining energy consumption through tool path strategies (Chu et al., 2020). Therefore, investigating the correlation between machining energy consumption and tool path is a precondition (Zhao et al., 2022). Table 1 provides an overview of the related works on energy consumption modelling.

Table 1 An overview of the energy consumption modelling problem

As listed in Table 1, Campatelli et al. (2015) discovered that the energy consumption rates of the motor vary with cutting path schemes. Luan et al. (2018) put forward an energy consumption function incorporating the impacts of tool path schemes, including zag, zigzag and spiral paths. Edem and Mativenga (2017) proposed an energy consumption calculation method related to the CNC tool path with NC code as input. Similar work is also found in Bhinge et al. (2016). Shin et al. (2017) built a model capable of forecasting the energy used at the tool path level for particular machining setups. Wang et al. (2019) proposed a cutting energy consumption model with different tool path schemes. He et al. (2020a) incorporated tool path schemes into machining energy consumption modelling; yet the method mainly applies to specific workpieces and does not consider the effects of tool path movement. Xu et al. (2016) defined an energy consumption model of the designated machine tool in the end milling process. Li et al. (2018) and Hu et al. (2018) both built an energy consumption function that related the tool path length to the machining time. Vishnu et al. (2020) proposed a two-stage energy model that relied on the NC codes to identify the path length.

In summary, most of these works are only applicable to traditional three-axis machining. Given that five-axis machining involves complex parts and long machining times, modelling that considers five-axis machining features is necessary.

Energy-efficient tool path optimisation

Research on energy-efficient optimisation of tool paths for flank milling is currently lacking. In common NC machining practice, however, there are two mainstream approaches to energy-efficient or efficient path planning, both using an iso-scallop height for path expansion. Table 2 provides an overview of the related works on energy-efficient tool path optimisation.

Table 2 An overview of the energy-efficient tool path optimisation problem

As shown in Table 2, the first type of optimisation focuses on generating the shortest tool path to boost energy efficiency by heuristic algorithms, e.g., Genetic Algorithm (Karuppusamy & Kang, 2017), Harmony Search Algorithm (Yi et al., 2018), Depth-first Search (Hu et al., 2018), Ant Colony Algorithm (Feng et al., 2022), NSGA-II (Zhou et al., 2020), Artificial Electric Field Algorithm (Sun et al., 2023). Li et al. (2018) optimised tool paths with minimum energy consumption during the milling process, where an improved GA was used to optimise the ways of cutter contact points. The advantages lie in achieving faster machining, consequently reducing overall energy consumption. However, it may overlook the impact of cutting strip width on energy consumption, potentially leading to suboptimal energy optimisation in certain scenarios.

The second aims to maximise the cutting strip width in tool path expansion to improve energy efficiency. Li and Tang (2021) realised a larger cutting strip width by choosing a section of the cutting edge with a smaller curvature of a non-spherical tool for freeform surface machining. Xu et al. (2016) achieved a reduction in milling energy consumption by enlarging the width of the cutting curves. Hu and Tang (2016) investigated strip-width maximisation to enhance the machining efficiency of flank milling. Liu et al. (2018) and Grandguillaume et al. (2021) optimised cutting strip width to enhance time efficiency. However, blindly enlarging the cutting strip width may increase material removal energy consumption and damage the surface quality.

Research gaps

In summary, considerable research has been conducted on energy modelling for energy-efficient tool path optimisation and the optimisation process itself. However, studies in the realm of five-axis machining are notably lacking, with specific gaps including:

  (1) Most works focused on three-axis machining and built a simple energy consumption model by just relating machining time to path length. They fail to capture the impacts of tool path movements on machining energy consumption in five-axis machining and thus cannot serve as a foundation for five-axis tool path optimisation;

  (2) Existing tool path optimisation typically aims for either tool path generation or expansion, using a constant iso-scallop height. Yet, these approaches fall short in global optimisation of tool paths, overlooking the impacts of variable cutting strip width and feed angle on overall machining energy efficiency and surface quality.

The aim of this paper is to narrow the gap between the requirements of energy-efficient five-axis machining and existing works. Specifically, an energy consumption model that considers tool path movement and an optimisation of tool path generation and expansion are presented. The method not only optimises the feed angle but also adjusts the cutting strip width during each expansion. This work promises to enhance energy efficiency while maintaining surface quality in five-axis machining of complex parts.

Problem formulation

This section introduces the preliminaries of the tool path optimisation problem and devises an energy-efficient tool path optimisation model for five-axis machining.

Preliminaries

For modelling free-form surfaces, this paper applies non-uniform rational B-splines (NURBS), which offer a precise and uniform representation of parametric curves for free-form surfaces (Shi et al., 2024), as expressed in Eq. (1):

$$ S(u,v) = \frac{{\sum_{a = 0}^A {\sum_{b = 0}^B {\omega_{a,b} } N_{a,p} (u)N_{b,q} (v)C_{a,b} } }}{{\sum_{a = 0}^A {\sum_{b = 0}^B {\omega_{a,b} } N_{a,p} (u)N_{b,q} (v)} }} \quad 0 \le u,v \le 1 $$
(1)

where u and v are parameters of NURBS model. \(\{ N_{a,p} (u)\}\) and \(\{ N_{b,q} (v)\}\) are basis functions in NURBS modelling. \(C_{a,b}\) are control points, which define the surface topology. \(\{ \omega_{a,b} \}\) are weighting factors, which adjust the dominance of each control point, and are tuned to follow the desired shape.
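As an illustration of Eq. (1), the following minimal sketch evaluates a point on a NURBS surface with the Cox-de Boor recursion. The function names `basis` and `nurbs_point`, the clamped knot vectors and the bilinear test patch are assumptions made for this example and are not taken from the paper.

```python
def basis(i, p, t, knots):
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(t)."""
    if p == 0:
        # Half-open knot spans; the last non-empty span is closed so t = 1 is covered.
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        if t == knots[-1] and knots[i] < t <= knots[i + 1]:
            return 1.0
        return 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:
        left = (t - knots[i]) / d1 * basis(i, p - 1, t, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + p + 1] - t) / d2 * basis(i + 1, p - 1, t, knots)
    return left + right


def nurbs_point(u, v, ctrl, weights, p, q, knots_u, knots_v):
    """Evaluate S(u, v) of Eq. (1) for a grid of 3D control points."""
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for a, row in enumerate(ctrl):
        n_u = basis(a, p, u, knots_u)
        if n_u == 0.0:
            continue
        for b, c in enumerate(row):
            w = weights[a][b] * n_u * basis(b, q, v, knots_v)
            den += w
            for axis in range(3):
                num[axis] += w * c[axis]
    return tuple(x / den for x in num)


# Hypothetical bilinear patch (degree 1 in both directions).
ctrl = [[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        [(1.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
knots = [0.0, 0.0, 1.0, 1.0]
unit_w = [[1.0, 1.0], [1.0, 1.0]]
heavy_w = [[1.0, 1.0], [1.0, 3.0]]  # larger weight pulls the surface to that corner
centre = nurbs_point(0.5, 0.5, ctrl, unit_w, 1, 1, knots, knots)
pulled = nurbs_point(0.5, 0.5, ctrl, heavy_w, 1, 1, knots, knots)
```

With unit weights the patch centre is the average of the four control points; raising one weight moves the evaluated point towards the corresponding control point, which is the role of \(\omega_{a,b}\) described above.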

Figure 1 shows the five-axis flank-milling process. A Workpiece Coordinate System (WCS) \(o_w - x_w y_w z_w\) is established with its origin at the centre of the bottom of the workpiece. A local tool coordinate system (LCS) is established using three key directions: a feed direction \(f\), a normal direction \(n\) of the surface, and a product direction \(k = f \times n\) given by the right-hand screw rule.

Fig. 1 Five-axis flank milling process

In the Local Coordinate System (LCS) of a five-axis machine tool, tool orientation \(T_w (\alpha ,\beta )\) is characterised by a lead angle \(\alpha\) and a tilt angle \(\beta\), which is mathematically described as:

$$ T_w (\alpha ,\beta ) = \left[ \begin{gathered} n_{x,y,z} \hfill \\ f_{x,y,z} \hfill \\ k_{x,y,z} \hfill \\ \end{gathered} \right](\cos \alpha ,\sin \alpha \cos \beta ,\sin \alpha \sin \beta ) $$
(2)

where \(\alpha\) and \(\beta\) are fixed in this study. A tool path \(L^c\) encompasses the trajectory traced by the cutter location (CL) curve along with the tool orientation at each CL point, where each point has one cutter contact (CC) point. Therefore, \(L^c\) is defined by Eq. (3):

$$ L^c = \{ {\mathop {C_1^{T_1 } C_2^{T_2 } }\limits^\cap } , \cdots ,{\mathop {C_{k - 1}^{T_{k - 1} } C_k^{T_k } }\limits^\cap } \} $$
(3)

where \({\mathop {C_{k - 1}^{T_{k - 1} } C_k^{T_k } }\limits^\cap }\) refers to the connections between adjacent CC points on \(L^c\), which include the net-cutting path \(L^{nc}\) and the air-cutting path \(L^{ac}\). k is the total number of points on \(L^c\).
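Returning to the tool orientation of Eq. (2), a small sketch may help to make the role of the lead and tilt angles concrete. It assumes that the orientation is the combination \(\cos\alpha \, n + \sin\alpha\cos\beta \, f + \sin\alpha\sin\beta \, k\), which is the form that yields a unit vector; the function names and the sample LCS axes are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def tool_orientation(n, f, alpha_deg, beta_deg):
    """Tool axis in the WCS from lead angle alpha and tilt angle beta.

    Assumes n and f are orthogonal unit vectors and k = f x n, and that the
    LCS components are (cos a, sin a cos b, sin a sin b).
    """
    a = math.radians(alpha_deg)
    b = math.radians(beta_deg)
    k = cross(f, n)
    coeff = (math.cos(a), math.sin(a) * math.cos(b), math.sin(a) * math.sin(b))
    return tuple(coeff[0] * n[i] + coeff[1] * f[i] + coeff[2] * k[i]
                 for i in range(3))


# Hypothetical LCS: surface normal along z, feed direction along x.
n = (0.0, 0.0, 1.0)
f = (1.0, 0.0, 0.0)
t_tilted = tool_orientation(n, f, 30.0, 45.0)
```

With a zero lead angle the tool axis coincides with the surface normal, and any pair of angles gives a unit-length orientation vector under the stated assumption.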

For \(S(u,v)\) in the WCS, a plane \(\Phi\) passes through the two points \(S(u/2,0)\) and \(S(u/2,v)\) on \(S(u,v)\), with its normal vector \(n_p\) perpendicular to the normal vector \(n_s\) of \(S(u,v)\) at \(p(u/2,v/2)\). The plane intersects the surface in a path curve, i.e., \(\Phi \cap S = L^{pr} (\lambda )\), where \(L^{pr} (\lambda )\) is a principal path. \(\lambda\) is defined as a feed angle between \(L^{pr} (\lambda )\) and \(n_p\), rotating around \(n_s\) along \(\Phi\), as shown in Fig. 2.

Fig. 2 The defined feed angle diagram

\(\lambda\) guides \(L^{pr} (\lambda )\) to change in the plane of \(S(u,v)\). In five-axis flank milling, the tool moves along each \(L^{pr} (\lambda )\) driven by various values of \(\lambda\) and forms a family of tool surfaces in space, generating varying energy consumption from moving the workpiece workbench. Therefore, the optimal energy-saving principal curve is identified by optimising \(\lambda\).

To cover \(S(u,v)\) completely, multiple expansions based on \(L^{pr} (\lambda )\) are performed. Assuming \(p_1\) of the first expanded path \(L_1^{ex}\) is the neighbouring point of \(p_0\) on \(L^{pr} (\lambda )\), it is expressed as Eq. (4):

$$ p_1 = p_0 + \frac{\partial S}{{\partial u}}\Delta u + \frac{\partial S}{{\partial v}}\Delta v $$
(4)

where Δu and Δv represent the incremental movements along the tool path on the surface from one point to the next (Lo, 1999). \(d_1\) is an increment, also known as a cutting strip width, between \(p_0\) and \(p_1\) in the parametric [u,v] domain, i.e., \(d_1 = |p_0 p_1 |\). Similarly, \(d_j = |p_{j - 1} p_j |\) is the increment in the jth expansion. The jth expanded path is \(L_j^{ex} = \{ {\mathop {C_{1,j}^{T_{1,j} } C_{2,j}^{T_{2,j} } }\limits^\cap } , \cdots ,{\mathop {C_{k - 1,j}^{T_{k - 1,j} } C_{k,j}^{T_{k,j} } }\limits^\cap } \}\). In this paper, a drum taper cutter is used. Each expansion is shown in Fig. 3.
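The first-order step of Eq. (4) can be sketched as follows. The partial derivatives are estimated here by central differences, which is an assumption of this example (analytic NURBS derivatives could be used instead); `partials`, `expand_point` and the sample surfaces are hypothetical names.

```python
def partials(S, u, v, eps=1e-6):
    """Central-difference estimates of dS/du and dS/dv at (u, v)."""
    s_u = tuple((a - b) / (2.0 * eps)
                for a, b in zip(S(u + eps, v), S(u - eps, v)))
    s_v = tuple((a - b) / (2.0 * eps)
                for a, b in zip(S(u, v + eps), S(u, v - eps)))
    return s_u, s_v


def expand_point(S, u, v, du, dv):
    """First-order neighbour p1 = p0 + S_u * du + S_v * dv, as in Eq. (4)."""
    p0 = S(u, v)
    s_u, s_v = partials(S, u, v)
    return tuple(p + a * du + b * dv for p, a, b in zip(p0, s_u, s_v))


# Hypothetical test surfaces: a plane and a parabolic sheet.
plane = lambda u, v: (u, v, 0.0)
sheet = lambda u, v: (u, v, u * u)
p1_plane = expand_point(plane, 0.5, 0.5, 0.1, 0.2)
p1_sheet = expand_point(sheet, 0.5, 0.5, 0.1, 0.0)
```

On the plane the step is exact, while on the curved sheet the predicted height (0.35) differs from the true surface height (0.36), which shows that Eq. (4) is a linear approximation that is accurate only for small Δu and Δv.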

Fig. 3 Each path expansion diagram

As shown in Fig. 3, cutting edges AB and CD are involved in flank milling. \(R_b\) is variable curvature of the machined surface, while \(r_b\) is a constant one in the case of no tool change. According to the principle of scallop height, \(d\) is calculated as Eq. (5):

$$ d = \sqrt {{\frac{8R_b r_b h}{{R_b \pm r_b }}}} $$
(5)

where the symbols ‘+’ and ‘−’ are selected to represent convex and concave surfaces, respectively. For free-form surfaces, \(R_b\) changes frequently, and the maximum principal curvature is often used to restrict \(d\) in a conservative way due to the strict limitation of a constant \(h\). This increases the overall tool path length and the machining energy consumption. Considering this, this paper optimises \(d\) to expand the path with a variable \(h\) instead of a constant value, in order to explore a larger energy-saving potential. The expansion scheme is depicted in Fig. 4.
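A short numerical sketch of Eq. (5) shows how the cutting strip width responds to the scallop height and to the surface type. Here \(R_b\) and \(r_b\) are treated as radii in millimetres, and the function name and sample values are assumptions of this example.

```python
import math


def strip_width(R_b, r_b, h, convex=True):
    """Cutting strip width d from Eq. (5).

    R_b: surface curvature radius (mm), r_b: cutter curvature radius (mm),
    h: scallop height (mm). '+' is used for convex and '-' for concave surfaces.
    """
    denom = R_b + r_b if convex else R_b - r_b
    if denom <= 0:
        raise ValueError("concave case requires R_b > r_b")
    return math.sqrt(8.0 * R_b * r_b * h / denom)


# Hypothetical values: R_b = 100 mm, r_b = 10 mm.
d_convex = strip_width(100.0, 10.0, 0.01, convex=True)
d_concave = strip_width(100.0, 10.0, 0.01, convex=False)
d_relaxed = strip_width(100.0, 10.0, 0.02, convex=True)
```

Doubling the admissible scallop height enlarges \(d\) by a factor of \(\sqrt{2}\), which is why a variable \(h\) below \(h_{\max}\) leaves room for wider strips and fewer passes.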

Fig. 4 Expansion mechanism of tool path

From Fig. 4, \(d_j \in \{ d_1 ,d_2 , \cdots ,d_i \}\) is the variable to expand from one point on the tool path to the corresponding point on the next path. For free-form surface parts, the change of \(d_j\) causes the length of the current expanded path \(L_j^{ex} \in \{ L_1^{ex} ,L_2^{ex} , \cdots ,L_i^{ex} \}\) to be different. Therefore, \(L_j^{ex}\) varies with \(d_j\). It is important to note that \(d_j\) varies with the region’s complexity. For example, for high-curvature regions, the optimisation range of \(d_j\) is dynamically diminished within \(h_{\max }\) to enhance the approximating precision of the current region’s shape, which is essentially a partition-based expansion type.

Problem description

This research aims to optimise tool path integration of five-axis flank milling, to minimise both machining energy consumption and time while preserving surface quality. The problem consists of two tasks: (i) quantifying the machining energy consumption by including tool path parameters considering the flank milling features; (ii) deriving a tool path generation and expansion integration scheme with the quantified model.

To solve these two tasks effectively, the problem is decomposed into multiple sub-optimisation problems, one for each feasible value of \(\lambda\). For the mth sub-problem, \(d_{m,j}\) is optimised to search for an ideal expansion scheme \(\{ (d_{m,1} ,L_{m,1}^{ex} (d_{m,1} )), \cdots ,(d_{m,\,i} ,L_{m,i}^{ex} (d_{m,\,i} ))\}\) following a generated path \(L^{pr} (\lambda_m )\). By successive iterations, the optimal path generation and path expansion are searched. Before the study, the assumptions and prerequisites are clarified:

  (1) Cutting parameters, i.e., the feed rate \(f_r\) (mm/r), cutting depth \(a_p\) (mm) and cutting speed \(v_f\) (mm/min), are known before tool path optimisation;

  (2) The machining energy consumption starts to be quantified after the cutting tool moves;

  (3) There is no tool change during the machining process, and the tool orientation is fixed.

Tool path optimisation modelling

Objective variables

As stated above, \(\lambda\) is a variable to be optimised to select an optimal path \(L^{pr} (\lambda )\). \(d\) is another variable. As \(L^{ex} (d)\) is related to \(d\), it is obtained indirectly by optimising \(d\). For jth expansion in mth task driven by \(\lambda_m\), the optimisation variables are \(d_{m,j}\) and \(L_{m,j}^{ex} (d_{m,j} )\) to form an energy-efficient expansion scheme along a principal path \(L^{pr} (\lambda_m )\) for fulfilling the cutting area.

Optimisation functions

The machining energy consumption \(E_{total}^c\) as well as the machining time \(T_{total}^c\) are introduced to thoroughly assess both the economic and environmental dimensions of the flank milling process. \(E_m^c\) is generated by an expansion scheme based on \(L^{pr} (\lambda_m )\) for mth task. \(E_{m,j}^c\) in jth expansion is modelled as Eq. (6):

$$ E_{m,j}^c = E_{m,j}^b + E_{m,j}^{nc} + E_{m,j}^{op} + E_{m,j}^{ac} $$
(6)

where \(E_{m,j}^b\) stands for the electricity energy consumed to sustain basic functions, \(E_{m,j}^{nc}\) is the net-cutting energy consumption, \(E_{m,j}^{op}\) indicates energy consumption caused by the machine axes’ movement, and \(E_{m,j}^{ac}\) denotes energy consumption related to machine tools’ acceleration phase. Among them, \(E_{m,j}^b\) is calculated by Eq. (7):

$$ E_{m,j}^b = P^b \cdot \frac{{len(L_{m,j}^{ex} )}}{v_f } $$
(7)

where \(len(L_{m,j}^{ex} )\) is the length of \(L_{m,j}^{ex} (d_{m,j} )\), which is calculated in Eq. (8).

$$ len(L_{m,j}^{ex} ) = \sum_k {\left\| {(q_{k,j} q_{k - 1,j} , \cdots ,q_{2,j} q_{1,j} )} \right\|} $$
(8)

where k is the number of the points on \(L_{m,j}^{ex} (d_{m,j} )\). The coordinate of the points on \(L_{m,j}^{ex} (d_{m,j} )\) is obtained with a distance \(d_{m,j}\) from the corresponding points on the previous path by Eq. (4). Specifically, when j is 1, the points on \(L_{m,1}^{ex} (d_{m,1} )\) are derived from \(L^{pr} (\lambda_m )\) where the points on \(L^{pr} (\lambda_m )\) are discretised by a forward step \(d_f\) and each point is determined by NURBS modelling. It is obvious that \(d_{m,j}\) varies along the path, thus deciding the positioning (also the coordinates) of points on the next path. Meanwhile, different values of \(\lambda_m\) result in \(L^{pr} (\lambda_m )\) covering the cutting surface in various ways, thus leading to different lengths of \(L^{pr} (\lambda_m )\). Therefore, it inevitably affects the expanded path \(L_{m,j}^{ex} (d_{m,j} )\) along \(L^{pr} (\lambda_m )\) indirectly. \(E_{m,j}^{nc}\) is given by Eq. (9):

$$ E_{m,j}^{nc} = P_{m,j}^{nc} \cdot T_{m,j}^{nc} = K_{Fc} v_f^a f_r^b a_p^c (d_{m,j} )^e \cdot \frac{{len(L_{m,j}^{nc} )}}{v_f } $$
(9)

where \(T_{m,j}^{nc}\) refers to the time for removing materials in the jth expansion. \(L_{m,j}^{nc}\) is the net-cutting path section of \(L_{m,j}^{ex} (d_{m,j} )\). \(P_{m,j}^{nc}\) is calculated approximately by an empirical model (Deng et al., 2017), whose inputs are the three cutting parameters \(v_f\), \(f_r\) and \(a_p\), together with the cutting strip width \(d_{m,j}\) of the jth expansion. Since this study does not optimise \(v_f\), \(f_r\) and \(a_p\), they are fixed constants in the tool path optimisation. \(K_{Fc}\) is a correction coefficient, and a, b, c and e are exponent coefficients. Linear regression is used to calibrate these parameters from the collected cutting power data. The model assumes that \(K_{Fc}\), a, b, c and e are unchanged under specific processing conditions, i.e., workpiece and tool material, and machine tool specification. Moreover, the cutting process is assumed to remain stable without tool breakage or abnormal wear.
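One way to read the calibration step is as a linear regression in logarithmic space. The sketch below is a simplification under stated assumptions: because \(v_f\), \(f_r\) and \(a_p\) are fixed, their contribution is lumped with \(K_{Fc}\) into a single constant `K` (an assumption of this example), so that only `K` and the exponent `e` of the strip width are fitted. The data are synthetic, and the conversion of minutes to seconds assumes \(v_f\) in mm/min and power in watts.

```python
import math


def fit_power_law(widths, powers):
    """Least-squares fit of ln P = ln K + e * ln d; returns (K, e)."""
    xs = [math.log(d) for d in widths]
    ys = [math.log(p) for p in powers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    e = s_xy / s_xx
    K = math.exp(mean_y - e * mean_x)
    return K, e


def net_cutting_energy(K, e, d, length_mm, v_f_mm_per_min):
    """E_nc = P_nc * T_nc as in Eq. (9), in joules (power in W, time in s)."""
    power_w = K * d ** e
    time_s = length_mm / v_f_mm_per_min * 60.0
    return power_w * time_s


# Synthetic, noise-free measurements generated from P = 50 * d^0.8.
widths = [0.5, 1.0, 2.0, 4.0]
powers = [50.0 * d ** 0.8 for d in widths]
K_fit, e_fit = fit_power_law(widths, powers)
energy_j = net_cutting_energy(K_fit, e_fit, 1.0, 1000.0, 2000.0)
```

With real measurements the fit would be noisy, and the full model would regress on all four inputs at once; the single-variable form is kept here only to show the mechanics.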

In flank milling, the energy consumed by tool paths along the gravitational direction is denoted as potential energy consumption \(E^{pt}\) in this paper. The expanded paths along \(L^{pr} (\lambda )\) maintain a direction similar to that given by \(\lambda\). This direction affects the projection length of the paths in the gravitational direction, thereby influencing the potential energy consumption. For the jth expansion, \(E_{m,j}^{pt}\) is included in the modelling of \(E_{m,j}^{op}\) and is quantified by the gravitational movement of the machine tool’s axes, as shown in Eq. (10):

$$ E_{m,j}^{pt} = {\mathop {m_z g}\limits^\to } \cdot len(L_{m,j}^{\mathop G\limits^\to } ) $$
(10)

where \(m_z\) is the mass of the z-axis. Notably, \(E_{m,j}^{pt}\) exists only when the arms move upwards. \(L_{m,j}^{\mathop G\limits^\to }\) is the projection of \(L_{m,j}^{ex} (d_{m,j} )\) in the direction of vertical upward movement, whose length is calculated by the Euclidean distance in Eq. (11):

$$ len(L_{m,j}^{\mathop G\limits^\to } ) = \sum_k {\left\| {(q_{k,j}^{\mathop G\limits^\to } q_{k - 1,j}^{\mathop G\limits^\to } , \cdots ,q_{2,j}^{\mathop G\limits^\to } q_{1,j}^{\mathop G\limits^\to } )} \right\|} $$
(11)

where \(q_{k,j}^{\mathop G\limits^\to }\) is the component of kth point on \(L_{m,j}^{ex} (d_{m,j} )\) in the gravitational direction.

Besides, \(E_{m,j}^{op}\) also includes the energy consumption for translation and rotation of the mechanical arms along \(L_{m,j}^{ex}\), i.e., \(E_{m,j}^{tr}\) and \(E_{m,j}^{rt}\). \(E_{m,j}^{tr}\) represents the energy consumption attributed to overcoming friction in the movement along axes. Similar to \(E_{m,j}^{pt}\), \(E_{m,j}^{tr}\) is also indirectly influenced by \(\lambda_m\) in terms of the projection length in the translation direction:

$$ E_{m,j}^{tr} = {\mathop {f_x }\limits^\to } \cdot len(L_{m,j}^{{\mathop {T_x }\limits^\to } } ) + {\mathop {f_y }\limits^\to } \cdot len(L_{m,j}^{{\mathop {T_y }\limits^\to } } ) $$
(12)

where \({\mathop {f_x }\limits^\to }\) and \({\mathop {f_y }\limits^\to }\) represent friction forces along the two respective axes. \(L_{m,j}^{{\mathop {T_x }\limits^\to } }\) and \(L_{m,j}^{{\mathop {T_y }\limits^\to } }\) are the projections of \(L_{m,j}^{ex}\) in the translation directions, whose lengths are given by Eq. (13):

$$ \left\{ {\begin{array}{*{20}c} {len\left( {L_{m,j}^{\overrightarrow {T_x } } } \right) = \sum_k {\left\| {\left( {q_k^{\overrightarrow {T_x } } q_{k - 1}^{\overrightarrow {T_x } } , \cdots ,q_2^{\overrightarrow {T_x } } q_1^{\overrightarrow {T_x } } } \right)} \right\|} } \\ {len\left( {L_{m,j}^{\overrightarrow {T_y } } } \right) = \sum_k {\left\| {\left( {q_k^{\overrightarrow {T_y } } q_{k - 1}^{\overrightarrow {T_y } } , \cdots ,q_2^{\overrightarrow {T_y } } q_1^{\overrightarrow {T_y } } } \right)} \right\|} } \\ \end{array} } \right. $$
(13)

where \(q_k^{{\mathop {T_x }\limits^\to } }\) and \(q_k^{{\mathop {T_y }\limits^\to } }\) are components of kth point on \(L_{m,j}^{ex}\) in translation along x and y axes.
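The projection lengths of Eqs. (11) and (13) and the resulting energies of Eqs. (10) and (12) can be sketched for a discretised path. This example interprets the gravitational projection as the sum of upward z increments only, following the remark that \(E^{pt}\) exists only for upward movement, and the translational projections as sums of absolute increments. The function names, the division by 1000 to convert millimetres to metres, and all numerical values are assumptions.

```python
def upward_travel(points):
    """Sum of positive z increments along the path (mm)."""
    return sum(max(b[2] - a[2], 0.0) for a, b in zip(points, points[1:]))


def axis_travel(points, axis):
    """Sum of absolute increments along one axis (mm)."""
    return sum(abs(b[axis] - a[axis]) for a, b in zip(points, points[1:]))


def potential_energy(points_mm, m_z_kg, g=9.81):
    """E_pt of Eq. (10) in joules: weight of the z-axis times upward travel."""
    return m_z_kg * g * upward_travel(points_mm) / 1000.0


def translation_energy(points_mm, f_x_n, f_y_n):
    """E_tr of Eq. (12) in joules: friction force times travel along x and y."""
    work_x = f_x_n * axis_travel(points_mm, 0)
    work_y = f_y_n * axis_travel(points_mm, 1)
    return (work_x + work_y) / 1000.0


# Hypothetical path points in mm: the tool rises by 5 mm and then drops by 3 mm.
path = [(0.0, 0.0, 0.0), (10.0, 0.0, 5.0), (20.0, 5.0, 2.0)]
e_pt = potential_energy(path, m_z_kg=100.0)
e_tr = translation_energy(path, f_x_n=50.0, f_y_n=40.0)
```

Because both terms depend only on the projected travel, rotating the principal path through a different feed angle changes them even when the total path length stays the same.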

\(E_{m,j}^{rt}\) in Eq. (14) includes the rotational energy consumption of \(\alpha_r\) and \(\beta_r\) rotary axes:

$$ E_{m,j}^{rt} = \frac{1}{2}I(\omega_{\alpha_r }^2 + \omega_{\beta_r }^2 ) $$
(14)

where ω is the angular speed (rad/s) of the rotary axes, and I (kg·m²) is the moment of inertia of the rotary axes.

One flank milling cycle along \(L_{m,j}^{ex} (d_{m,j} )\) undergoes two accelerations and two decelerations. The accelerations consume energy, while the decelerations rely solely on inertia without extra consumption. With each acceleration contributing a kinetic energy of \(\frac{1}{2}mv^2\) per axis, the energy of the two accelerations in one cycle is approximately calculated by Eq. (15):

$$ E_{m,j}^{ac} = m_x v_x^2 + m_y v_y^2 + m_z v_z^2 $$
(15)

where \(m_x\) and \(m_y\) are the masses of the x and y axes of the machine tool. \(v_x\), \(v_y\) and \(v_z\) are the components of \(v_f\) on the three axes of machine tool.
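Equations (14) and (15) are simple kinetic energy expressions, and a brief sketch makes the factor in Eq. (15) explicit. The example assumes SI units (masses in kg, velocities in m/s, a mass moment of inertia in kg·m², angular speeds in rad/s) so that the results are in joules; function names and values are hypothetical.

```python
def rotational_energy(inertia, omega_alpha, omega_beta):
    """E_rt of Eq. (14): kinetic energy of the two rotary axes."""
    return 0.5 * inertia * (omega_alpha ** 2 + omega_beta ** 2)


def acceleration_energy(masses, velocities):
    """E_ac of Eq. (15): two accelerations per cycle, each 0.5 * m * v^2 per axis."""
    return sum(2 * 0.5 * m * v * v for m, v in zip(masses, velocities))


# Hypothetical axis data for one expansion cycle.
e_rt = rotational_energy(inertia=2.0, omega_alpha=1.0, omega_beta=2.0)
e_ac = acceleration_energy(masses=(100.0, 80.0, 60.0),
                           velocities=(0.1, 0.2, 0.0))
```

Both terms are independent of the path length, so their share of the total grows with the number of expansions, which is one reason why wider strips (fewer passes) can save energy.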

To sum up, \(E_{m,j}^c\) in jth expansion of mth task guided by \(L^{pr} (\lambda_m )\) is built by Eq. (16):

$$ \begin{gathered} E_{m,j}^c |_{L^{pr} (\lambda_m )} = P^b \cdot \frac{{len(L_{m,j}^{ex} )}}{v_f } + P_{m,j}^{nc} \cdot \frac{{len(L_{m,j}^{nc} )}}{v_f } + {\mathop {m_z g}\limits^\to } \cdot len(L_{m,j}^{\mathop G\limits^\to } ) + {\mathop {f_x }\limits^\to } \cdot len(L_{m,j}^{{\mathop {T_x }\limits^\to } } ) + \hfill \\ \quad {\mathop {f_y }\limits^\to } \cdot len(L_{m,j}^{{\mathop {T_y }\limits^\to } } ) + \frac{1}{2}I(\omega_{\alpha_r }^2 + \omega_{\beta_r }^2 ) + (m_x v_x^2 + m_y v_y^2 + m_z v_z^2 ) \hfill \\ \end{gathered} $$
(16)

Generally, there are i expanded paths guided by a selected \(L^{pr} (\lambda_m )\) in the path planning, and the total machining energy consumption of i expansions in mth task is quantified by Eq. (17):

$$ E_m^c |_{L^{pr} (\lambda_m )} = \sum_{j = 1}^i {E_{m,j}^c } (d_{m,j} ,L_{m,j}^{ex} (d_{m,j} )) $$
(17)

The machining time consumed by the jth expansion scheme and total scheme in mth task is shown in Eq. (18):

$$ \left\{ \begin{gathered} T_{m,j}^c |_{L^{pr} (\lambda_m )} = \frac{{len(L_{m,j}^{ac} + L_{m,j}^{nc} )}}{v_f } \hfill \\ T_m^c |_{L^{pr} (\lambda_m )} = \sum_{j = 1}^i {T_{m,j}^c } (L_{m,j}^{ex} (d_{m,j} )) \hfill \\ \end{gathered} \right. $$
(18)

The effects of \(\lambda_m\), \(d_{m,j}\) and \(L_{m,j}^{ex} (d_{m,j} )\) on the machining energy consumption and machining time are included in Eqs. (17) and (18), facilitating the energy-efficient tool path optimisation of flank milling. \(\lambda_m\) does not appear explicitly in the equations because it only determines the direction of \(L^{pr} (\lambda_m )\). Owing to the topology of free-form surfaces, the expanded paths along \(L^{pr} (\lambda_m )\) only roughly follow the direction indicated by \(\lambda_m\). Therefore, while \(\lambda_m\) does impact \(E_m^c\) and \(T_m^c\), it does not explicitly correlate with them.
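The aggregation in Eqs. (16) to (18) can be sketched as a small data structure that stores the energy terms of each expansion and sums them over a task. The class `Expansion`, its field names and the sample numbers are assumptions of this example; the individual terms are taken as already computed.

```python
from dataclasses import dataclass


@dataclass
class Expansion:
    """Energy terms (J) and path lengths (mm) of one expansion."""
    e_basic: float
    e_net_cutting: float
    e_potential: float
    e_translation: float
    e_rotation: float
    e_acceleration: float
    len_nc_mm: float
    len_ac_mm: float

    def energy(self):
        """Sum of the terms on the right-hand side of Eq. (16)."""
        return (self.e_basic + self.e_net_cutting + self.e_potential
                + self.e_translation + self.e_rotation + self.e_acceleration)

    def time_min(self, v_f_mm_per_min):
        """Machining time of one expansion, first line of Eq. (18)."""
        return (self.len_ac_mm + self.len_nc_mm) / v_f_mm_per_min


def task_totals(expansions, v_f_mm_per_min):
    """Total energy (Eq. 17) and total time (Eq. 18) over all expansions."""
    total_energy = sum(x.energy() for x in expansions)
    total_time = sum(x.time_min(v_f_mm_per_min) for x in expansions)
    return total_energy, total_time


# Two hypothetical expansions of one task.
scheme = [
    Expansion(10.0, 20.0, 1.0, 2.0, 3.0, 4.0, len_nc_mm=900.0, len_ac_mm=100.0),
    Expansion(5.0, 5.0, 5.0, 5.0, 5.0, 5.0, len_nc_mm=400.0, len_ac_mm=100.0),
]
energy_total, time_total = task_totals(scheme, v_f_mm_per_min=2000.0)
```

The feed angle does not enter this calculation directly; it acts through the lengths and projections that produce each term.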

Constraints design

Several constraints are designed to ensure the proposed method’s feasibility:

Equation (19) requires the decision parameter \(\lambda_m\) of the mth task to be an integer confined within a practical range, considering that it is selected in the plane of \(S(u,v)\) and for ease of actual machining.

$$ 0^\circ \le \lambda_m \le 180^\circ ,\;\lambda_m \in {\mathbb{Z}} $$
(19)

Equation (20) indicates that \(d_{m,j}\) exceeds neither the chord length \(l_{ch}\) nor the maximum cutting strip width \(d_{\max }\) permitted by the maximum scallop height \(h_{\max }\), considering the requirements on cutting tool interference and workpiece surface quality.

$$ 0 < d_{m,j} \le \min (l_{ch} ,\,d_{\max } ) $$
(20)

where \(l_{ch} = 2r_b \sin \frac{\theta_\partial }{2}\)(mm). \(\theta_\partial\)(°) is a central angle corresponding to \(l_{ch}\). \(d_{\max } = \sqrt {{\frac{{8R_b r_b h_{\max } }}{R_b \pm r_b }}}\) and \(h_{\max }\) is determined by the surface quality requirement. Calculation of \(R_b\) refers to Lo (1999).
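The upper bound in Eq. (20) can be evaluated directly from the two expressions above. In the sketch below, the function names and the sample radii and angles are assumptions; \(R_b\) is taken as given rather than computed from the surface.

```python
import math


def chord_length(r_b, theta_deg):
    """l_ch = 2 * r_b * sin(theta / 2), with theta in degrees."""
    return 2.0 * r_b * math.sin(math.radians(theta_deg) / 2.0)


def max_strip_width(R_b, r_b, h_max, convex=True):
    """d_max from the maximum scallop height h_max."""
    denom = R_b + r_b if convex else R_b - r_b
    return math.sqrt(8.0 * R_b * r_b * h_max / denom)


def strip_width_upper_bound(R_b, r_b, h_max, theta_deg, convex=True):
    """Right-hand side of Eq. (20): min(l_ch, d_max)."""
    return min(chord_length(r_b, theta_deg),
               max_strip_width(R_b, r_b, h_max, convex))


# Hypothetical cases: a long cutting edge and a very short one.
bound_quality = strip_width_upper_bound(100.0, 10.0, 0.01, theta_deg=60.0)
bound_chord = strip_width_upper_bound(100.0, 10.0, 0.01, theta_deg=2.0)
```

In the first case the surface quality requirement is the active limit, whereas in the second the short chord of the cutting edge limits the strip width.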

Equation (21) denotes the cutting power for each expansion is regulated to be lower than the spindle motor’s output power for machining feasibility (Li et al., 2019):

$$ P_{m,j}^c \le \eta^c P_{\max } $$
(21)

where \(P_{\max }\)(W) denotes the motor’s rated power. \(P_{m,j}^c\) (W) is the cutting power in jth expansion. \(\eta^c\) is the motor efficiency.

Equation (22) represents that the sum of cutting time does not fall short of the pre-set tool life (Chen et al., 2019).

$$ \sum_{j = 1}^i {T_{m,j} } \ge T_e $$
(22)

where \(T_e\) (min) is the pre-set value, and \(T_{m,j}\) (min) is the cutting time consumed in jth tool path expansion of mth task.

Equation (23) denotes the sum of the cutting area per expansion for i times should constitute the specified cutting area (Zhang & Tang, 2019).

$$ \begin{gathered} A_m^c = \sum_{j = 1}^i {A_{m,j}^c } \hfill \\ \quad \;\;\; = \sum_{j = 1}^i {\int {S(d_{m,j} ,len(L_{m,j}^{ex} ))} } dd_{m,j} dlen(L_{m,j}^{ex} ) \hfill \\ \end{gathered} $$
(23)

where \(A_m^c\) (mm\(^2\)) is the total cutting area of the mth task, and \(A_{m,j}^c\) is the cutting area in the jth path expansion.
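Taken together, Eqs. (19) to (23) act as a feasibility test on a candidate expansion scheme. The following sketch checks them in order; the function signature, the relative tolerance on the area equality and the sample values are assumptions, and the time condition is implemented exactly as stated in Eq. (22).

```python
def is_feasible(lam_deg, widths, width_bounds, powers_w, eta, p_max_w,
                times_min, t_e_min, areas_mm2, area_total_mm2, tol=1e-6):
    """Return True if a candidate scheme satisfies Eqs. (19)-(23)."""
    # Eq. (19): integer feed angle within [0, 180] degrees.
    if not (isinstance(lam_deg, int) and 0 <= lam_deg <= 180):
        return False
    # Eq. (20): each strip width within (0, min(l_ch, d_max)].
    if any(not (0.0 < d <= ub) for d, ub in zip(widths, width_bounds)):
        return False
    # Eq. (21): cutting power below the usable motor power.
    if any(p > eta * p_max_w for p in powers_w):
        return False
    # Eq. (22): total cutting time compared with the pre-set value T_e.
    if sum(times_min) < t_e_min:
        return False
    # Eq. (23): expansions together cover the specified cutting area.
    if abs(sum(areas_mm2) - area_total_mm2) > tol * max(1.0, area_total_mm2):
        return False
    return True


# Hypothetical two-expansion scheme.
ok = is_feasible(45, [0.5, 0.6], [0.8, 0.8], [900.0, 950.0], 0.8, 1500.0,
                 [3.0, 2.5], 5.0, [60.0, 40.0], 100.0)
too_wide = is_feasible(45, [0.5, 0.9], [0.8, 0.8], [900.0, 950.0], 0.8, 1500.0,
                       [3.0, 2.5], 5.0, [60.0, 40.0], 100.0)
```

In a learning-based solver such a check would typically be turned into a penalty or an action mask; the boolean form is used here only for clarity.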

Optimisation model

Based on the above, to find the optimal integrated scheme of tool path generation and expansion for flank milling, an optimisation model is designed in Eq. (24):

$$ \begin{gathered} \min F^* = {\mathop {\min }\limits_{m \in \{ 1, \cdots ,M\} }} \min F_m (X_m ,\lambda_m ) \hfill \\ \min F_m (X_m ,\lambda_m ) = \min \sum_{j = 1}^i {(E_{m,j}^c ,T_{m,j}^c )} \hfill \\ X_m = \{ [d_{m,j} ,L_{m,j}^{ex} (d_{m,j} )]|j = 1, \cdots ,i\} \hfill \\ s.t.\left\{ \begin{gathered} \{ \lambda_m \in {\mathbb{Z}}|0^\circ \le \lambda_m \le 180^\circ \} \hfill \\ 0 < d_{m,j} \le \min (l_{ch} ,d_{\max } ) \hfill \\ P_{m,j}^{c} \le \eta^c P_{\max } \hfill \\ \sum_{j = 1}^i {T_{m,j} } \ge T_e \hfill \\ A_{m}^c = \sum_{j = 1}^i {A_{m,j}^c } \hfill \\ \end{gathered} \right.\; \hfill \\ \end{gathered} $$
(24)

where M is the total number of traversal tasks initiated by different values of \(\lambda\), and \(\lambda\) decides the generation of a principal path in flank milling. For the mth task, a principal path \(L^{pr} (\lambda_m )\) needs to be expanded i times. In the jth expansion, \(d_{m,j}\) is dynamically optimised, resulting in \(L_{m,j}^{ex} (d_{m,j} )\). By traversing the M tasks, the optimal integrated scheme of path generation and expansion, with an optimal \(\lambda^\ast\), is ultimately found by optimising \(F^*\). The model therefore involves both a local dynamic optimisation of each expansion task under a given value of \(\lambda\) and a global optimisation that searches for the optimal integrated scheme among all tasks. To tackle such a complex multi-task optimisation problem, the MRL algorithm is introduced. By pre-learning the similarities between tasks, MRL can quickly adapt to new tasks, thereby accelerating the multi-task traversal and reducing the learning time required for each new task.

Energy-efficient tool path optimisation framework

This section proposes an energy-efficient tool path optimisation framework based on MSAC, as seen in Fig. 5.

Fig. 5 MSAC-based tool path optimisation framework

Figure 5 illustrates how MSAC addresses the traversal problem across MDPs, where varying feed angles initiate distinct learning tasks. This solving process encompasses three key stages:

Stage 1 Generation of a principal path, which is dictated by the defined feed angle \(\lambda\) on a NURBS modelled surface based on Eq. (1);

Stage 2 Following the principal path generated in stage 1, the tool path undergoes dynamic expansions by optimising cutting strip width. This is viewed as an MDP, which is solved by SAC;

Stage 3 This stage culminates in traversing multiple MDPs driven by all feasible feed angles. Meta learning is employed to enhance the SAC’s adaptability to various new tasks. By examining a spectrum of feed angles, the MSAC will find the optimal path generation and expansion scheme.

Tool path generation scheme

In stage 1, different principal paths are generated by traversing all values of \(\lambda\). Based on \(\Phi \cap S = L^{pr} (\lambda )\), a principal path \(L^{pr} (\lambda_m )\) is generated when \(\lambda_m\) is selected. In each traversal, \(\lambda\) varies, which changes the principal path and the initial state for tool path expansion, so a new MDP of path expansion optimisation arises. Multiple MDPs therefore remain to be solved. The mth MDP with a selected \(\lambda_m\) is modelled in the following subsection.

Tool path expansion scheme

In stage 2, each task for learning an effective expansion policy is formulated as a new MDP defined by the tuple \(\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R} \rangle\). The studied problem satisfies the Markov property, so the SAC algorithm is applied.

  1. Environment modelling. The models of machining energy consumption and time in Eqs. (16) and (18) are evaluated at each timestep t to help the agent learn the optimal expansion policy for an MDP.

  2. MDP formulation. The tool path expansion problem along one principal path selected by a value of \(\lambda\) is formulated as a finite MDP. The mth MDP with \(\lambda_m\) is defined as follows.

    State \(\mathcal{S}\) The state \(s_{m,\,t}\) represents the expansion state at timestep t of the mth MDP, so that the agent can better select path parameters each time. It includes: (i) the pre-set machining configuration (MC), comprising cutting parameters and cutter geometry parameters; and (ii) key tool path variables, i.e., \(\lambda_m\), the residual cutting area \(A_{m,\,\,t}^{rc} = A_m^c - \sum_t {A_{m,\,\,t}^c }\), and the accumulated machining energy consumption \(E_{m,\,\,t}^{acu}\) and time \(T_{m,\,\,t}^{acu}\) at the tth timestep. Note that \(\lambda_m\) is included as a state component rather than as an action, since it specifies the initial conditions for the tool path expansion and is not modified during the execution of an MDP task. Therefore, \(s_{m,t} = \{ MC,\lambda_m ,A_{m,\,t}^{rc} ,E_{m,\,t}^{acu} ,T_{m,\,t}^{acu} \}\).

    Action \(\mathcal{A}\) \(a_{m,\,t} = \{ d_{m,\,t} ,L_{m,t}^{ex} (d_{m,\,t} )\}\) characterises the action selected at the tth timestep of the mth task, which is continuous and bounded.

    Reward \(\mathcal{R}\) \(r_{m,\,t}\) combines the objectives and the constraint penalties at the tth expansion, as expressed in Eq. (25):

    $$\begin{aligned} & r_{m,\,t} (s_{m,\,t} ,a_{m,\,t} ) \\ &\quad= f_{nor} (\xi - E_{m,\,t}^c - T_{m,\,t}^c - \nu \sum {C_{m,t} } ) \end{aligned}$$
    (25)

    where \(f_{nor} ( \cdot )\) normalises objectives and constraints with varying units to the range [-1, 1]. \(\xi\) is a constant that encourages the agent to obtain positive feedback. \(\nu\) is the penalty coefficient, which is set to 200 to discourage the agent from violating the constraints \(C_{m,t}\) in Eq. (26):

    $$ \begin{gathered} C_{m,t} = \max \left\{ \sum_{t = 1}^{t_o } {A_{m,\,t}^c } - A_m^c ,\,0 \right\} \hfill \\ \quad + \max \{ P_{m,t}^c - \eta^c P_{\max } ,\,0\} + \max \left\{ T_e - \sum_{t = 1}^{t_o } {T_{m,t}^c } ,\,0 \right\} \hfill \\ \quad + \max \{ d_{m,\,t} - l_{ch} ,\,0\} + \max \{ d_{m,\,t} - d_{\max } ,\,0\} \hfill \\ \end{gathered} $$
    (26)

    where \(t_o\) represents the final expansion. The final aim is to discover a favourable policy to maximise cumulative rewards. Notably, \(r_{m,\,\,t}\) is bounded in this paper, which provides necessary conditions for convergence of the algorithm (Li et al., 2023).

  3. SAC algorithm. SAC addresses each MDP by maximising the expected reward and the policy entropy on the actor-critic architecture. The description, network architecture and training details of SAC are found in Appendix A.
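The reward of Eqs. (25) and (26) can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's code: the form of \(f_{nor}\) is not given in this section, so a `tanh` squashing with a hypothetical `scale` is used to keep the reward within [-1, 1]; the value of \(\xi\) is left to the caller; and the energy and time terms are assumed to be pre-scaled to comparable magnitudes.

```python
import math


def constraint_violation(area_sum, a_total, power, eta_c, p_max,
                         time_sum, t_e, width, l_ch, d_max):
    """Sum of the five hinge terms of Eq. (26); zero when nothing is violated.

    area_sum and time_sum are the cumulative cutting area and cutting time
    passed in by the caller (Eq. (26) sums them up to the final expansion).
    """
    return (
        max(area_sum - a_total, 0.0)          # cut more area than specified
        + max(power - eta_c * p_max, 0.0)     # cutting power above motor limit
        + max(t_e - time_sum, 0.0)            # cutting time below T_e
        + max(width - l_ch, 0.0)              # strip width above chord length
        + max(width - d_max, 0.0)             # strip width above scallop bound
    )


def reward(energy, time, violation, xi, nu=200.0, scale=1.0):
    """Eq. (25) with an assumed tanh normalisation standing in for f_nor."""
    raw = xi - energy - time - nu * violation
    return math.tanh(raw / scale)
```

With \(\nu\) = 200, any non-trivial violation dominates the raw value, so the normalised reward saturates near -1 and the agent receives a clear signal to stay inside the feasible region.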

Tool path integrated optimisation based on MSAC

This subsection introduces an MSAC algorithm that trains a model for rapid adaptation to new tasks. The principle of MSAC is to feed results from various environments into the training model, so that the sensitivity of a new task's loss function with respect to the policy parameters is maximised (Wen et al., 2021).

  1. Meta task set. To facilitate training, tasks are selected from workpieces of different surface topologies, where each task begins from a randomly selected principal path. The goal is to find a globally optimal tool path generation and expansion scheme. For each task, the meta-features are defined by the weight factors \(\{ \omega_{a,b} \}\) and \(\lambda\). A training task set \(T_N\), a validation task set \(T_V\), and a learning task set \(T_M\) are mutually different but follow the same distribution \(U\{ \lambda ,\{ \omega_{a,b} \} \}\), where \(\lambda \sim U(0^\circ ,180^\circ )\) and \(\{ \omega_{a,b} \} \sim U(0, + \infty )\).

  2. Meta-training phase. The aim is to optimise the parameters of SAC in each task by MAML via individual and global training stages. As the network structures and parameter training of SAC for each MDP task are well defined in Sect. 4.2, the related formulas are directly applied here. For a specific task \(T_n (n = 1, \cdots ,N)\), K trajectories are collected in a dataset \(D_n^{tr}\), where the training data include the input meta-features and the corresponding tool path integrated optimisation results. Each trajectory contains H timesteps. The meta parameters \(\theta_n^{\prime}\), \(\vartheta_n^{\prime}\) and \(\varphi_n^{\prime}\) are updated by gradient descent, which is the individual updating shown in Eq. (27):

    $$ \left\{ \begin{gathered} \theta _n^{'(t_n + 1)} \leftarrow \theta _n^{'(t_n )} - \psi _{in} \nabla _{\theta _n^{'(t_n )} } L_{T_n } (\theta _n^{'(t_n )} ) \hfill \\ \vartheta _n^{'(t_n + 1)} \leftarrow \vartheta _n^{'(t_n )} - \varsigma _{in} \nabla _{\vartheta _n^{'(t_n )} } L_{T_n } (\vartheta _n^{'(t_n )} ) \hfill \\ \varphi _n^{'(t_n + 1)} \leftarrow \varphi _n^{'(t_n )} - \varpi _{in} \nabla _{\varphi _n^{'(t_n )} } L_{T_n } (\varphi _n^{'(t_n )} ) \hfill \\ \end{gathered} \right. $$
    (27)

    where \(\psi_{in}\), \(\varsigma_{in}\) and \(\varpi_{in}\) denote the learning rates of the individual-level updating of the MSAC networks, and \(t_n\) denotes the latest iteration during the updating. In the first iteration, \(\theta_n^{\prime (t_n )} = \theta\), \(\vartheta_n^{\prime (t_n )} = \vartheta\), and \(\varphi_n^{\prime (t_n )} = \varphi\). \(L_{T_n } ( \cdot )\) is the loss function defined in the SAC. After the tasks finish updating their network parameters, the N tasks with their datasets are stored in the buffer \(D^{tr} \doteq D_{1:N}^{tr}\). Then, outer-loop training is performed to find the global parameters by evaluating the loss aggregated over the tasks. Task samples are used for the updating process in Eq. (28):

    $$ \left\{ \begin{gathered} {\mathop \theta \limits^\wedge }^{(t_n + 1)} \leftarrow {\mathop \theta \limits^\wedge }^{(t_n )} - \psi_{ou} \nabla_{\mathop \theta \limits^\wedge } \sum_{n = 1}^N {L_{T_n } (\theta_n^{\prime} )} \hfill \\ {\mathop \vartheta \limits^\wedge }^{(t_n + 1)} \leftarrow {\mathop \vartheta \limits^\wedge }^{(t_n )} - \varsigma_{ou} \nabla_{\mathop \vartheta \limits^\wedge } \sum_{n = 1}^N {L_{T_n } (\vartheta_n^{\prime} )} \hfill \\ {\mathop \varphi \limits^\wedge }^{(t_n + 1)} \leftarrow {\mathop \varphi \limits^\wedge }^{(t_n )} - \varpi_{ou} \nabla_{\mathop \varphi \limits^\wedge } \sum_{n = 1}^N {L_{T_n } (\varphi_n^{\prime} )} \hfill \\ \end{gathered} \right. $$
    (28)

    where \({\mathop \theta \limits^\wedge }\), \({\mathop \vartheta \limits^\wedge }\) and \({\mathop \varphi \limits^\wedge }\) are the meta policy parameters trained over all training tasks, and \(\psi_{ou}\), \(\varsigma_{ou}\) and \(\varpi_{ou}\) are the learning rates of outer-loop training. After meta-training, \(T_V\) is employed to tune the hyper-parameters of the model.

  3. Meta-learning phase. In this stage, the trained meta-model is adapted quickly to new tasks through a few gradient descent steps. The network is updated by Eq. (29):

    $$ \left\{ \begin{gathered} \theta^{\prime} \leftarrow \theta^{\prime} - \psi \nabla_{\theta^{\prime} } L(\theta^{\prime} ) \hfill \\ \vartheta^{\prime} \leftarrow \vartheta^{\prime} - \varsigma \nabla_{\vartheta^{\prime} } L(\vartheta^{\prime} ) \hfill \\ \varphi^{\prime} \leftarrow \varphi^{\prime} - \varpi \nabla_{\varphi^{\prime} } L(\varphi^{\prime} ) \hfill \\ \end{gathered} \right. $$
    (29)

    where \(\theta^{\prime}\), \(\vartheta^{\prime}\) and \(\varphi^{\prime}\) are meta policy parameters initialised by the trained parameters \({\mathop \theta \limits^\wedge }\), \({\mathop \vartheta \limits^\wedge }\) and \({\mathop \varphi \limits^\wedge }\). The experiences are stored in the buffer \(D^{ml} = (s_t^{ml} ,a_t^{ml} ,r_t^{ml} ,s_{t + 1}^{ml} )\).
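The two-level update of Eqs. (27)–(29) can be illustrated on a toy problem. The sketch below is not the MSAC implementation: a single scalar parameter stands in for the three network parameter sets, each task loss is a hypothetical quadratic \(L_n (\theta ) = \tfrac{1}{2}(\theta - c_n )^2\) instead of a SAC loss, and one inner step is used so that the outer gradient can be written in closed form.

```python
def grad(theta, target):
    """Gradient of the toy task loss 0.5 * (theta - target) ** 2."""
    return theta - target


def inner_update(theta, target, lr_in):
    """Individual update in the spirit of Eq. (27): one gradient step per task."""
    return theta - lr_in * grad(theta, target)


def meta_train(task_targets, theta0=0.0, lr_in=0.1, lr_out=0.05, iterations=200):
    """Outer loop in the spirit of Eq. (28): descend the summed post-adaptation loss."""
    theta = theta0
    for _ in range(iterations):
        meta_grad = 0.0
        for target in task_targets:
            adapted = inner_update(theta, target, lr_in)
            # Chain rule through the inner step: d(adapted)/d(theta) = 1 - lr_in.
            meta_grad += grad(adapted, target) * (1.0 - lr_in)
        theta -= lr_out * meta_grad
    return theta


def adapt(theta_init, target, lr=0.1, steps=5):
    """Meta-learning phase in the spirit of Eq. (29): a few steps on a new task."""
    theta = theta_init
    for _ in range(steps):
        theta -= lr * grad(theta, target)
    return theta
```

For training targets `[1.0, 2.0, 3.0]`, the meta parameter settles near the centre of the task family, so a handful of adaptation steps on a nearby new task ends closer to its optimum than the same number of steps from an untrained initial value.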

The detailed implementation of MSAC for multi-MDP task traversal and its convergence analysis are given in Appendix B. Note that the analysis holds only under certain conditions, such as the use of an appropriate learning rate, so theoretical analysis alone is not enough to ensure the reliability of these findings. To this end, MSAC is tested in practical scenarios in Sect. “Case study” to confirm its convergence and effectiveness.

Case study

This part conducts a five-axis flank milling experiment and employs the proposed approach. The energy consumption model is initially validated to ensure reliability. Subsequently, the effectiveness of the MSAC algorithm is evaluated, and a performance comparison with other methods is presented.

Accuracy analysis of the energy consumption model

Experimental setup and data collection

The accuracy of the proposed energy consumption model is verified first, so that it provides a reliable simulated environment for the MSAC implementation. Typical free-form parts, namely integral impeller blades of aluminium alloy 6061 (a typical Al–Mg–Si alloy), are chosen as the workpieces, and physical machining with a drum-taper milling cutter is conducted on a JDGR400_A13S machine tool, as depicted in Fig. 6a. The energy consumption data generated by 12 tool paths are collected. The paths form three groups of four, swept along one blade surface at \(\lambda\) = 30°, 90°, and 120°, respectively; paths with the same angle differ in length, as exemplified for \(\lambda\) = 90° in Fig. 6b. Each path is executed three times. Moreover, the cutting strip width within each single path is kept constant to ensure uniform experimental conditions and to simplify the validation. An impeller model is imported into NX to obtain the coordinates of the points on each path and calculate the actual path length, as illustrated in Fig. 6c. Figure 6d shows the HIOKI PW3360-30 power meter. An orthogonal experiment \(L_{16}(4^4)\) is arranged in Table 3 to calibrate the coefficients in Eq. (9), and the sampled cutting power data are listed in Table 4. Equation (9) is logarithmically transformed to ease the regression, and the coefficient values obtained by linear regression are given in Table 5.
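The calibration step, a logarithmic transformation followed by linear regression, can be sketched as follows. Eq. (9) is not reproduced in this section, so the sketch assumes a generic power-law form \(P = K\prod_k x_k^{\alpha_k }\); the function names and the synthetic data in the usage note are hypothetical and are not the values of Tables 3 to 5.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_power_law(inputs, powers):
    """Least-squares fit of ln P = ln K + sum_k alpha_k * ln x_k.

    inputs: list of tuples of positive cutting parameters, one tuple per run.
    powers: list of measured cutting powers, one per run.
    Returns (K, [alpha_1, ..., alpha_k]).
    """
    rows = [[1.0] + [math.log(x) for x in xs] for xs in inputs]
    targets = [math.log(p) for p in powers]
    n = len(rows[0])
    # Normal equations (X^T X) beta = X^T y of the log-linear model.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    beta = solve_linear_system(xtx, xty)
    return math.exp(beta[0]), beta[1:]
```

As a usage check, feeding the function noise-free data generated from known coefficients on a 4 × 4 grid of two parameters recovers those coefficients, which is the behaviour expected of the log-linear fit.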

Fig. 6 Environment setting and verification experiment

Table 3 Levels of cutting parameters in orthogonal experiment
Table 4 Orthogonal experiment data
Table 5 Experimental parameters

Testing of the machining energy consumption model

Importantly, 12 single tool-path-expansion experiments are conducted to verify \(E_j^c\), not \(E_{total}^c\), so that the accuracy of the energy model can be visualised. The averaged predicted values are listed in Table 6 alongside the measured values, and the absolute relative errors (ARE) between them are calculated.

Table 6 Experiment results comparison

From Table 6, the predicted values closely match the measured ones, with the maximum ARE being 7.169%. This confirms the robust predictive capability of the machining energy consumption model. Furthermore, it validates that the proposed energy consumption model is capable of providing reliable model support for optimising tool path scheme. Meanwhile, it is observed that \(E_j^c\) at different \(\lambda\) varies greatly. Besides, \(E_j^c\) generated by the same angle is also different, which is directly related to \(d_j\) and \(L_j^{ex} (d_j )\), verifying the significance of improving energy efficiency by optimising \(d_j\) and \(\lambda\).
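The ARE used in Table 6 is a simple ratio, sketched below for clarity. The helper names are illustrative, and averaging the repeated runs before comparison is an assumption based on each path being executed three times; the numbers in the usage note are made up and are not the entries of Table 6.

```python
def mean(values):
    """Arithmetic mean of repeated measurements of one tool path."""
    return sum(values) / len(values)


def absolute_relative_error(predicted, measured):
    """ARE in percent: |predicted - measured| / |measured| * 100."""
    return abs(predicted - measured) / abs(measured) * 100.0


def max_are(predictions, measurement_runs):
    """Largest ARE over all paths, comparing each prediction with its averaged runs."""
    return max(
        absolute_relative_error(p, mean(runs))
        for p, runs in zip(predictions, measurement_runs)
    )
```

For instance, a prediction of 105.0 against three hypothetical runs of 99.0, 100.0 and 101.0 gives an ARE of 5%.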

Evaluation of the MSAC-based tool path optimisation

Algorithm parameters setting

The MSAC algorithm is applied to five-axis flank milling of free-form surfaces at the semi-finishing stage to realise energy-efficient tool path planning. All tasks use cubic NURBS curves and share the common parameters in Eq. (1) except for the weight factors \(\{ \omega_{a,b} \}\), a key parameter influencing the surface topology. In total, 300 tasks of \(T_N\) and 30 tasks of \(T_V\) are separately generated from the distribution \(U\{ \lambda ,\{ \omega_{a,b} \} \}\). \(h_{\max }\) is set to 0.02 mm considering the surface roughness requirement. \(P_{\max }\) is 7.5 kW, \(a_p\) is 0.5 mm, \(n_s\) is 2800 r/min, \(v_f\) is 500 mm/min, and \(T_e\) is 75 min. The hyper-parameters of the algorithm are detailed in Table 7. The algorithm is implemented in Python 3.6 on an Intel Core i5-9400 CPU @ 2.90 GHz with 8 GB RAM.

Table 7 Hyper-parameters used in MSAC

Implementation of the MSAC algorithm

The environment-mimic model obtains rewards according to Eq. (25). Figure 7 illustrates the variations in cumulative rewards throughout the training process. The curve represents the average over 3 random seeds, where the classic SAC acts as a benchmark.

Fig. 7 Average accumulated rewards throughout training

Figure 7 shows that the rewards of both the MSAC and SAC agents increase as they gather further knowledge through iterations, confirming the appropriateness of the defined state space and reward function. Meanwhile, MSAC converges at about 500 episodes with a higher average reward than SAC. This provides evidence for its strong convergence properties, which are attributed largely to its enhanced adaptability to new tasks and the resulting gain in exploration efficiency within the state space.

To test MSAC’s adaptability to new tasks, a specific case study of the widely used impeller blades made of aluminium alloy 6061 is introduced. The meta-policy parameters are initialised by the trained global network parameters to perform the new tasks. \(\{ \omega_{a,b} \}\) are set differently from those in the training tasks but follow the same distribution, where \(\{ \omega_{a,b} \}\) = [3, 4, 5, 6, 100, 10, 9, 50]. There are 181 testing tasks, which share the same NURBS-modelled blade surface topology but use different feed angles traversing from 0° to 180°. The cutting parameters are the same as those of the meta-training process. The resulting reward curves are plotted in Fig. 8.

Fig. 8 Rewards changing curves during the learning phase

Figure 8 shows that the trained MSAC requires approximately 40 episodes of further training to converge on a new task, far fewer than the trained SAC. Compared with Fig. 7, the test process of MSAC needs about 92% fewer episodes to reach convergence than the training process (40 vs. 500). This demonstrates that MSAC adjusts swiftly to novel tasks and converges faster than SAC. The learning curve therefore verifies that meta-learning helps the agent adapt to a new environment, which suits the multi-task traversing optimisation problem in this paper.

After traversing all the learning tasks by MSAC, the optimal principal path with its expansion scheme is identified with \(\lambda^\ast\). To display the optimal results across the range of feed angle from 0° to 180°, a polar plot is drawn in Fig. 9 where the traditional SAC is compared. For ease of observation, an average weighted normalisation on the optimal results of machining energy consumption and time is performed for all 181 scenarios.

Fig. 9 Optimisation search process for \(\lambda^\ast\)

Figure 9 depicts the performance results across all MDP tasks with different \(\lambda\) values. The radial distance from the centre to the red/blue curve represents the optimal performance of the tool path integrated scheme found under each feed angle, and a shorter radial distance indicates better performance. For this reason, the values of \(\lambda^\ast\) are easily identified for the two methods: 107° for the SAC and 95° for the proposed MSAC algorithm.
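The search for \(\lambda^\ast\) over the 181 tasks amounts to normalising the per-angle optima and taking the smallest combined score. The sketch below assumes min-max normalisation and equal weights for energy and time, since the text only states that an average weighted normalisation is used; the function names and the synthetic curves in the usage note are hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_feed_angle(angles, energies, times, w_energy=0.5, w_time=0.5):
    """Return the angle with the smallest weighted normalised score, plus all scores.

    energies[k] and times[k] are the best results found for angles[k].
    A smaller score corresponds to a shorter radial distance in the polar plot.
    """
    e_norm = min_max(energies)
    t_norm = min_max(times)
    scores = [w_energy * e + w_time * t for e, t in zip(e_norm, t_norm)]
    best = min(range(len(angles)), key=scores.__getitem__)
    return angles[best], scores
```

With synthetic energy and time curves that both bottom out at 95° over the integer angles 0° to 180°, the function returns 95 and a score list of length 181.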

Performance results of the optimal tool path integrated schemes along \(\lambda^\ast\) are presented for the two methods. To reduce the impact of random errors and enhance the reliability of experimental results, each scheme of the two methods is subjected to three replicates, and the average results of machining energy consumption, time and path length are computed. Additionally, driven by carbon taxes and policies, manufacturers should assess the carbon emissions impacts generated by manufacturing activities (Camarinha-Matos et al., 2024). According to Zhou et al. (2019), there are three sources of total carbon emissions \(CE_{total}^c\) during machining, including the electricity energy carbon emissions \(CE_{energy}^c\), materials carbon emissions \(CE_{{\text{material}}}^c\) and waste carbon emissions \(CE_{{\text{waste}}}^c\). They are evaluated to enrich the proposed method’s robustness and relevance in today’s eco-conscious landscape. \(CE_{total}^c\) is calculated in Eq. (30):

$$ CE_{total}^c \, = CE_{energy}^c + CE_{material}^c + CE_{waste}^c $$
(30)

where \(CE_{energy}^c\) is calculated in Eq. (31).

$$ CE_{energy}^c = E_{total}^c \times EF_e $$
(31)

where \(EF_e\) is the electricity energy carbon emissions factor, and its value is 0.7242 kgCO2/kWh (Jiang et al., 2019). \(CE_{material}^c\) is calculated in Eq. (32):

$$ \begin{gathered} CE_{material}^c = CE_{m - t}^c + CE_{m - w}^c \hfill \\ \quad \quad \quad \;\; = \frac{{T^{nc} }}{T_e } \times m_{tool} \times EF_{m - t} + \Delta V \times \rho_w \times EF_{m - w} \hfill \\ \end{gathered} $$
(32)

where \(CE_{m - t}^c\) and \(CE_{m - w}^c\) denote the carbon emissions generated by the cutting tool and the workpiece involved in the flank milling. \(EF_{m - t}\) and \(EF_{m - w}\) are the production carbon emission factors of the tool and the workpiece, which are 33.7478 kgCO2/kg and 16.13 kgCO2/kg, respectively (Li et al., 2015). \(m_{tool}\) is the tool mass, which is 0.022 kg. \(\Delta V\) is the volume of material removed, which is 4.65 cm³. \(\rho_w\) is the density of aluminium alloy 6061, which is 2.7 g/cm³. \(CE_{waste}^c\) is given in Eq. (33):

$$ \begin{gathered} CE_{waste}^c = CE_{w - t}^c + CE_{w - c}^c \hfill \\ \quad \quad \quad \;\; = \frac{{T^{nc} }}{T_e } \times m_{tool} \times EF_{w - t} + \Delta V \times \rho_w \times EF_{w - c} \hfill \\ \end{gathered} $$
(33)

where \(EF_{w - t}\) and \(EF_{w - c}\) are the carbon emission factors for post-processing the scrap tool and the chips, which are 0.01346 kgCO2/kg and 0.256 kgCO2/kg, respectively (Li et al., 2015). Based on Eqs. (30)–(33), \(CE_{total}^c\) is quantified in Table 8, along with other optimisation results.
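Equations (30)–(33) can be evaluated directly with the factors quoted above. The sketch below is illustrative only: the function name is hypothetical, \(E_{total}^c\) is assumed to be supplied in kWh to match \(EF_e\), and \(T^{nc}\) is assumed to be in minutes to match \(T_e\); the inputs in the usage note are not the values behind Table 8.

```python
EF_E = 0.7242      # kgCO2/kWh, electricity factor
EF_M_T = 33.7478   # kgCO2/kg, tool production factor
EF_M_W = 16.13     # kgCO2/kg, workpiece production factor
EF_W_T = 0.01346   # kgCO2/kg, scrap tool post-processing factor
EF_W_C = 0.256     # kgCO2/kg, chip post-processing factor
M_TOOL = 0.022     # kg, tool mass
DELTA_V = 4.65     # cm^3, removed material volume
RHO_W = 2.7        # g/cm^3, density of aluminium alloy 6061


def carbon_emissions(e_total_kwh, t_nc_min, t_e_min):
    """Return the terms of Eqs. (30)-(33) in kgCO2.

    e_total_kwh: total machining energy (assumed unit: kWh).
    t_nc_min, t_e_min: T^nc and T_e of Eq. (32) (assumed unit: min).
    """
    removed_mass_kg = DELTA_V * RHO_W / 1000.0    # g -> kg
    tool_share_kg = t_nc_min / t_e_min * M_TOOL   # tool mass attributed to the job
    ce_energy = e_total_kwh * EF_E                                         # Eq. (31)
    ce_material = tool_share_kg * EF_M_T + removed_mass_kg * EF_M_W        # Eq. (32)
    ce_waste = tool_share_kg * EF_W_T + removed_mass_kg * EF_W_C           # Eq. (33)
    return {
        "energy": ce_energy,
        "material": ce_material,
        "waste": ce_waste,
        "total": ce_energy + ce_material + ce_waste,                       # Eq. (30)
    }
```

Calling `carbon_emissions(0.05, 10.0, 75.0)` with these made-up inputs returns the three contributions and their sum; with the quoted factors, the workpiece material term is the largest fixed contribution because it depends only on the removed mass.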

Table 8 Optimisation results of the two methods

The proposed method achieves a shorter path length (774.35 mm) than the SAC (1061.26 mm), showing its superiority. Similar trends are observed in \(E_{total}^c\) (a reduction of 24.89%) and \(T_{total}^c\) (a reduction of 27.04%). The performance is further confirmed by the carbon emissions evaluation, where the proposed method results in predicted emissions of 255.07 g, compared with 273.92 g for the SAC. The findings highlight the consistency between the proposed method’s theoretical contributions and its practical effects, and demonstrate the necessity of using meta-reinforcement learning to solve complex multi-task models. Meanwhile, the absolute relative errors in machining energy for the SAC scheme and the proposed scheme are 7.24% and 5.17%, respectively, confirming the reliability of the energy consumption model.

Validation of workpiece surface quality

Evaluating the workpiece surface quality is another vital facet of validating the efficacy of the MSAC-based tool path strategy. Workpiece surface roughness is used as the metric and is measured with a Mitutoyo SJ210 tester, which has an accuracy of 0.002 μm. To ensure measuring accuracy, surface roughness data are gathered from 11 uniformly chosen points, with three measurements at every point. The measurement setup is presented in Fig. 10a, and the measuring results of the machined part are shown in Fig. 10b.

Fig. 10 Measurement showing (a) the setup and (b) the roughness results

It is observed from Fig. 10b that the worst result of the points does not exceed 1 μm, which aligns with the specified requirements of 1.6 μm for the semi-finishing process of free-form blades (Lindvall et al., 2021). Thus, the tool path optimal scheme proposed by this paper substantially enhances machining energy efficiency while upholding surface quality in five-axis flank milling.

Comparison with the state-of-the-art methods

A comparative test is conducted to further elucidate the proposed method’s benefits. Two widely used methods are introduced as benchmarks: (a) the end milling method with an iso-scallop height using ball-nosed cutters (Liang et al., 2021), and (b) the flank milling method with an iso-scallop height using drum-type cutters (Lu et al., 2022). For the two benchmarks, both the u and v surface patch borders are used as the principal tool path. Machining parameters are set as in meta-learning. Figure 11 displays the CL curves of the three methods.

Fig. 11 CL curves showing (a–c) simulation curves of the end milling, flank milling and proposed methods, and (d–f) actual curves of the three methods

From Fig. 11a–c, unlike the benchmarks with a constant scallop height, the proposed method selects a principal path with an optimal feed angle and varies the cutting strip width dynamically to achieve an energy-efficient machining. Figure 11d–f show the tool paths of the three methods in actual processing.

To observe the performance of the three methods precisely, the optimisation results are shown in Fig. 12, where the benchmarks are also evaluated three times and the average results are selected to ensure reliability and replicability.

Fig. 12 Performance comparison of the three methods

From Fig. 12, the proposed method has the shortest tool path length (774.3 mm), followed by the flank milling (1286.3 mm) and then the end milling (2453.6 mm). The proposed method also substantially curtails both machining energy consumption and machining time, realising a reduction of 69.96% in \(E_{total}^c\) and 68.44% in \(T_{total}^c\) compared to the end milling, and a reduction of 41.50% in \(E_{total}^c\) and 39.80% in \(T_{total}^c\) compared to the flank milling. Meanwhile, the proposed method results in the lowest cutting carbon emissions (255.07 g CO2), which further accentuates its ecological benefits. It is also worth noting that flank milling consumes less machining energy and time than end milling, indicating its advantages in energy-efficient and sustainable machining. With the optimisation of cutting strip width and feed angle, the proposed method further reinforces the potential of enhancing sustainability in manufacturing practices.

Discussion

This study validates its contributions in two key aspects: (1) energy consumption modelling for energy-efficient tool path optimisation and (2) the suggested tool path optimisation method. To demonstrate the contributions, this work firstly assesses the accuracy of the developed energy consumption model. The lowest value 2.739% and the highest value 7.169% of ARE in Table 6 show that the proposed model estimates \(E_{total}^c\) precisely. Moreover, the analysis confirms a strong connection between feed angle, cutting strip width, and machining energy consumption, highlighting the need for feed angle and cutting strip width optimisation in energy-efficient machining.

Secondly, this study evaluates the optimisation performance of MSAC. The optimisation results are first compared with those of the SAC without meta-learning regarding the adaptability to new tasks. Table 8 shows that, on average over the experiments, the proposed method saves 24.89% of machining energy and 27.04% of machining time relative to SAC. The proposed method also produces lower machining carbon dioxide emissions, amounting to only 255.07 g CO2. Besides, the feasibility of the proposed method is verified by surface roughness measurement, which ensures that practical considerations are consistent with the theoretical contributions, showing that the proposed method is both sustainable and feasible.

The paper also includes a comparative analysis with two commonly used machining methods based on the iso-scallop height principle: end milling and flank milling. The practical application of these methods is shown in Fig. 11, with insights from Fig. 12 providing evidence that: (1) end milling is less effective in improving energy efficiency due to its narrower cutting strip width, unlike flank milling, which is more suitable for free-form surfaces; (2) flank milling with an iso-scallop height causes a conservative tool path generation and expansion, resulting in a longer total path length. This further demonstrates the contributions of this paper: neither of the two methods leads to a significant improvement in energy efficiency. The optimal tool path strategy, therefore, involves carefully adjusting cutting strip width and feed angle, illustrating the study’s contributions in energy efficiency.

To the best of the authors’ knowledge, few studies address the energy and time minimisation problem of five-axis flank milling from a global perspective of optimising tool path generation and expansion. This work supplements the previous research (Li & Tang, 2021) and explores energy-efficiency potentials to a greater extent, promising to further ease the industrial energy crisis.

Conclusions

This paper introduces a new theoretical framework for optimising the tool path of flank milling to save energy, which is useful for machining the free-form surfaces that are widely needed. The main contributions of this paper are: (1) developing a new model of machining energy consumption that accounts for the features of five-axis machining and relates it to tool path variables such as the feed angle, cutting strip width, and path length; (2) setting up a model of tool path generation and expansion optimisation using MDPs. By applying MSAC, the MDPs are navigated efficiently, allowing the best tool path scheme for the most energy-efficient machining to be found.

The case study shows the advantages of the proposed method: (1) the model of machining energy consumption is first validated, with a maximum prediction error of 7.169%; (2) based on the model, the performance of MSAC is evaluated, achieving reductions of 24.89% and 27.04% in machining energy and time, respectively, compared to SAC without meta-learning. This shows the proposed method's benefits for energy efficiency and highlights the need to combine meta-learning with reinforcement learning to solve complex multi-task optimisation problems; (3) when compared with two common machining methods, the proposed method shows improvements of 69.96% and 68.44% in energy consumption and machining time over end milling, and 41.50% and 39.80% over flank milling, demonstrating the importance of optimising cutting strip width and feed angle. These findings confirm that the contributions of this paper match the progress made in practical machining.

This study has some limitations that suggest directions for future studies. Firstly, the machining energy consumption model is validated only on dual-head machine tools, because the main objective of this work is tool path optimisation. However, the proposed method should in theory be valid for any machine tool, so future work will test the model on different types of machine tools to verify its generality. Secondly, for tool path optimisation in five-axis flank milling, the study recognises that complex surfaces with thin-walled features may deform and affect the machined quality. This problem is related to the cutting strip width, which influences cutting forces. A future research direction is to include workpiece deformation in the model’s constraints, in order to improve the proposed tool path optimisation and enhance both machining sustainability and machined quality.