1 Introduction

Modern manufacturing companies endeavor to achieve long-term sustainability, tackling challenges imposed by the market itself, e.g., customer retention (Galletta et al. 2018). In this effort, they constantly introduce new merchandise in an attempt to sustain or increase their market share (Palsodkar et al. 2023). To deliver this merchandise, they integrate supply chains that facilitate the material flow between the integrated facilities and machines. In this respect, supply chain management integrates a diversity of operations, including raw material procurement, product storage, and delivery to customers. These operations can be well supported by the digitization of manufacturing operations and processes encouraged by current manufacturing concepts, such as Industry 4.0 (Dohale et al. 2023). Despite this digitization, however, the operations are still controlled through ad hoc operation strategies. By definition, these strategies, e.g., maintenance strategies (O’Donovan et al. 2015), intend to enhance the productivity and maintainability of manufacturing environments, yielding a better financial performance. They are formulated by strategists according to datasets captured from the integrated processes and operations by intelligent systems, including cyber-physical ones (Hofmann and Rüsch 2017). Due to the large size of these datasets, their assessment and processing are rather complex to perform with typical data processing methods and tools (Zhong et al. 2017). To this end, the aforementioned systems incorporate machine learning methodologies to extract manufacturing knowledge (Ahuett-Garza and Kurfess 2018).

In addition, machine learning approaches have been employed for scheduling operations in manufacturing systems. Tan et al. (2019) introduced a multi-agent model and a planning and scheduling algorithm that enable manufacturing robots to work in coordination. Chen et al. (2019a, b) proposed a semi-supervised learning model that scales well and forecasts energy consumption with only a few labelled data by modeling and compensating for the unlabeled data. Recently, Agarwal et al. (2020) showed how deep learning can be used to find the input variables that maximize process profit or other non-linear process objectives such as quality.

Recent reviews include Kang et al. (2020), who grouped and analyzed the published research on the use of machine learning in production lines, and Dogan and Birant (2021), who provide a detailed review of the literature to give an overview of how machine learning methods may be used to create intelligent manufacturing processes. The latter study also identifies many critical research issues on the same subject that remain unresolved in the current literature.

Machine learning based algorithms have been recognized as effective for knowledge mining and making predictions when treating complex production management problems and optimizing manufacturing systems. In the recent study of Paraschos et al. (2021), a novel two-agent reinforcement learning-based approach was proposed that includes parametric production and maintenance operations to increase the efficiency of the system. Also, Paraschos et al. (2022) introduced a novel framework to obtain joint policies for the authorization of production, recycling, maintenance, and remanufacturing activities within deteriorating circular manufacturing systems. This framework uses a reinforcement learning algorithm and ad hoc production control policies.

Algorithms based on machine learning, and specifically decision tree algorithms, have proved to be efficient for knowledge mining and predictive modeling (Wu et al. 2008). In such problems, a decision tree method uses parameters connected with the problem's input variables to construct a model and then estimate the values of the dependent variable. A decision tree is created by breaking a dataset into structural components and describing the links between the features and the dependent variable. Decision trees are a relatively prevalent classification approach due to their ability to handle a large range of parameters, both numerical and nominal, while generating a fairly clear result. Despite these merits, decision tree algorithms have been criticized as they are often found to provide inefficient predictions, owing to their tendency to overfit the training data.

Recently, decision trees have been used by academics for treating a wide range of problems in supply chains and manufacturing. More specifically, Soeprapto Putri et al. (2018) proposed and implemented a decision tree model for defect classification, which provides a logical basis for the classifications, can be used for root cause analysis of defects, and supports finding similar defects. In Antosz et al. (2020), rough sets theory and decision trees were employed to decrease supply chain service level failures during the implementation of the lean maintenance concept to increase operational efficiency. Additionally, Lyu et al. (2020) used a decision tree to model a process resulting in defective items and extract rules for recognizing defective batches and their corresponding production process features, while Mahato and Narayan (2020) employed gradient-boosted decision trees to model and train supply chains to prevent service level failures. Also, Zangaro et al. (2020) used a Classification and Regression Tree (CART)-based decision tree to decide which manufacturing line to use for the delivery of components for assembly, helping humans optimize their costs. In the recent study of Koulinas et al. (2021), decision tree algorithms were used to illustrate efficient policies derived from a hybrid reinforcement learning algorithm while optimizing a degrading manufacturing/remanufacturing system.

The present paper extends the work of Koulinas et al. (2020) by investigating a two-stage production system and applying tree methods for rule extraction. The study aims to improve the system's performance and validate the usefulness of the suggested algorithmic technique. Given that explainability in machine learning refers to the ability to understand and interpret the output or decision made by an algorithm (Linardatos et al. 2021), we performed additional research to assess the influence of tree algorithmic factors on their efficacy and to make the model more transparent and trustworthy. Studying production systems is vital in a world where digitization and automation are quickly altering conventional industrial processes. By using tree algorithms for rule extraction, researchers may gain significant insights into these systems' decision-making processes and identify areas for improvement. The current study stresses the importance of production optimization, specifically emphasizing a two-stage production system, and intends to enhance the system's performance, improve its efficiency, and maximize profitability by employing tree algorithms. The research examines the performance and characteristics of the tree algorithms to choose the most efficient strategy. This analysis offers a more thorough understanding of the optimization process and suggestions for further study. Overall, the suggested method demonstrates the value of algorithmic innovation in the manufacturing industry and offers a promising prospect for improving the performance of production systems.

The rest of the paper is organized as follows: Sect. 2 provides a literature review of the research area addressed, Sect. 3 describes the proposed framework, consisting of the input dataset and the descriptions of the decision tree algorithms, Sect. 4 contains the experimental analysis, and Sect. 5 presents the conclusions.

2 Literature review

The academic literature on manufacturing systems has explored a range of approaches, with certain systems specializing in single-stage production (Rajasekharan and Peters 2000) and others including multiple stages to complete and store goods (Kim and Sarkar 2017). Some systems are even equipped to remanufacture used or returned items (Khakbaz and Tirkolaee 2022). Periodic (Rivera-Gómez et al. 2013) or aperiodic (Chen et al. 2019a, b) inspections are often conducted to monitor degradation in manufacturing systems, which can also impact product quality. Methods such as control charts (Salmasnia et al. 2022) or sampling plans (Duffuaa et al. 2009) are used to detect any such degradation in product quality.

In the context of manufacturing/remanufacturing systems, relevant research has focused on developing control policies that enhance the profitability of systems that are experiencing degradation. Assid et al. (2021) developed optimal control policies for long-term total cost minimization with a hybrid manufacturing-remanufacturing system, considering three production decisions.

In a similar way, Koulinas et al. (2021) used decision trees to define the most efficient policies, derived by a reinforcement learning algorithm, for a complex multi-stage manufacturing/remanufacturing problem. Assid et al. (2023) provided an optimal control theory-based solution of the production planning and control problem within a hybrid manufacturing-remanufacturing system. Also, Rasay et al. (2022) defined an integrated problem of optimal maintenance planning and statistical process control for a two-stage dependent manufacturing system where, due to machinery deterioration and/or equipment failure, either stage can fall into an out-of-control state; a genetic algorithm (GA) was applied to find the optimal values of the decision variables minimizing the long-run expected average cost per unit time.

Liu and Papier (2022) proposed a framework for product substitution as part of repair and refurbishment, using Bayesian estimation for optimal two-way substitution between new and remanufactured products. Additionally, Zhang et al. (2021) developed a joint production-maintenance decision model to determine the amount of system component deterioration and component maintenance, reducing component idleness and shortening the time required to complete production. Dehayem Nodem et al. (2011) presented a method to find the optimal control policy for a manufacturing system subject to random machine failure and repair.

Furthermore, various studies have attempted to integrate mechanisms and techniques for the planning and scheduling of activities such as production and remanufacturing, in order to enhance the quality of remanufactured products. Gan et al. (2022) formulated an optimal model to solve production scheduling and maintenance planning problems of a multi-component system with economic dependence. He et al. (2020) proposed a framework for remanufacturing ontology and knowledge management, along with a reuse methodology that leverages Case Based Reasoning to reuse the most similar past solution. Liu et al. (2021) considered a novel mixed-integer linear programming model to solve process planning problems effectively using commonly available mathematical programming solvers, such as CPLEX and Gurobi. Wang et al. (2023) formalize the remanufacturing process as a real job shop problem and study its scheduling with simultaneous energy consumption and makespan minimization. In the recent study of Paraschos et al. (2021), frequent minimal maintenance activities were authorized by using parametric control policies and reinforcement learning techniques to decrease the downtime of manufacturing systems. Wang and Fei (2020) suggested an integrated approach to financial decision making in remanufacturing production systems from the point of view of manufacturing and service firms. In addition, Scheller et al. (2021) developed a comprehensive modelling framework for the optimal coordination of master production and recycling scheduling in order to achieve positive contribution margins, while Arabsheybani and Arshadi Khasmeh (2021) proposed a multi-objective mathematical model considering the simultaneous optimization of resiliency and uncertainty in a multi-period and multi-product supply chain.

Existing studies on optimization of manufacturing systems tend to focus largely on the use of ad hoc policies or control charts for scheduling activities and quality inspections, with little consideration for the current state of the products and systems being manufactured. These methodologies are out of sync with the principles of smart circular manufacturing that aims to eliminate waste and enhance product quality through material reuse.

In response to this challenge, our research proposes an integrated design and operation management process that incorporates continuous quality inspections and maintenance, recycling, and remanufacturing activities. By employing reinforcement learning alongside ad hoc policies, such as Kanban, we develop joint policies for production, maintenance, recycling, and remanufacturing. The objective of this unique approach is to boost the adaptability and robustness of circular manufacturing systems while at the same time maintaining cost-effectiveness.

3 The proposed framework

This research focuses on applying and analyzing popular and verified efficient decision tree algorithms to construct tree structures that illustrate the best rules discovered by a reinforcement learning algorithm. It is worth noting that the internal nodes of the trees represent the problem parameters, while the leaves represent the categories of the dependent variable, i.e., the selected actions. The suggested framework is shown in Fig. 1 below.

Fig. 1
figure 1

The flowchart of the proposed approach

Initially, the original dataset was generated by a reinforcement learning/Base Stock-based algorithm. Then, we applied four different decision tree algorithms to construct decision models, trying to generalize the findings from the dataset. The tree algorithms used were the REPTree, the HoeffdingTree, the J48 (C4.5), and the RandomTree algorithms. Next, we extracted the result from each algorithm, i.e., a decision rule, to define decision policies. The accuracy of each of the trees is monitored using standard output metrics. The decision rule obtained from every tree is used to define control policies that can be customized to numerous different production system configurations. The efficiency of each rule regarding the total system profitability is documented by applying each proposed policy and comparing the resulting average profits. In addition, we performed a sensitivity analysis of the J48 algorithm to test whether the size of the tree impacts the algorithm's accuracy and the total system profitability.

The J48 (C4.5) is a classic decision tree construction method introduced by Quinlan (1992). The ID3 (Iterative Dichotomizer 3) algorithm (Quinlan 1986) underlies its functionality, since it constructs a decision tree from a training data collection using information entropy. The attribute with the largest information gain, i.e., the greatest reduction in information entropy, is used as the splitting criterion, because this attribute distinguishes the set's instances most clearly. The tree nodes represent the set's different characteristics: at each node, the algorithm chooses the data characteristic that most effectively divides the collection into subgroups. The tree branches represent the various attribute values, while the final nodes represent the dependent variable's categorization. The WEKA program uses the J48 classifier to build a C4.5 pruned or unpruned tree.
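To make the splitting criterion concrete, the following minimal Python sketch computes the entropy-based information gain of a nominal attribute; the function names and the toy data are illustrative only and do not reproduce the WEKA/J48 implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Reduction in entropy obtained by splitting on one attribute."""
    base = entropy(labels)
    # Partition the labels by the value of the chosen attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return base - remainder

# Toy usage: attribute 0 separates the classes perfectly, attribute 1 does not.
rows = [("low", "x"), ("low", "y"), ("high", "x"), ("high", "y")]
labels = ["produce", "produce", "maintain", "maintain"]
print(information_gain(rows, labels, 0))  # 1.0
print(information_gain(rows, labels, 1))  # 0.0
```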

The REPTree is a rapid decision tree method based on C4.5 that builds several trees and then chooses the best one as the representative. Each decision tree is constructed using information gain/variance and pruned using reduced-error pruning with backfitting. As with C4.5, missing values are handled by fragmenting the relevant instances.

The Random Tree method generates a tree using randomly selected characteristics and does not conduct pruning. Additionally, it enables the estimation of class probabilities (Frank et al. 2016).

The Hoeffding Tree algorithm implements the Hoeffding tree technique (Hulten and Spencer 2001) and is capable of learning from large amounts of data. This method exploits the fact that a small sample is often sufficient to select a near-optimal splitting attribute. To be more exact, the Hoeffding bound quantifies the number of observations needed to estimate the fitness measure of an attribute within a given precision. Since this method makes use of the Hoeffding bound, it has the benefit of guaranteeing high efficiency (Hulten and Spencer 2001).
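As an illustration of this idea, a minimal sketch of the Hoeffding bound and of the resulting split decision is given below; the parameter names and numerical values are placeholders rather than settings taken from the study.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    random variable with range `value_range` lies within this distance
    of the mean estimated from n observations."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_best_gain, value_range, delta, n):
    """Split when the observed gap between the best and second-best
    attribute exceeds the bound, i.e., the ranking is reliable."""
    return (best_gain - second_best_gain) > hoeffding_bound(value_range, delta, n)

print(hoeffding_bound(value_range=1.0, delta=1e-7, n=5000))  # approx. 0.04
```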

3.1 The multi-stage system description

The manufacturing system involves two stages to process one type of product. Each stage contains a manufacturing facility and a storage facility. The first stage manufacturing facility manufactures work-in-progress parts, which are later stored in the first storage facility and remain there until they are transferred into the second stage of the production process. In the second stage, the manufacturing facility completes the final goods, stockpiled in the second storage facility. The maximum capacity of the first and second inventories can be defined as \(Q_{WIP}^{max}\) and \(Q_{P}^{max}\), respectively.

The behavior of the multi-stage system is formulated by a variety of random and dynamic events defined by a discrete event simulation technique (Xanthopoulos et al. 2016). In this respect, a series of degradation failures can affect the system's operability. Let the system condition be described by discrete stages n. With each failure, the two-stage system moves from stage n to stage n+1. These failures degrade the system's condition and the quality of the completed items. In addition, maintenance activities are initiated to prevent significant deterioration; these activities restore the system to the condition it was in before the last deterioration. If the system deteriorates significantly, it is considered to be malfunctioning and cannot continue to operate normally. Hence, the system must be repaired. The repair activities restore the system to stage 0, which corresponds to a very good condition.
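A minimal Python sketch of this degradation logic as a simple state machine is given below; the breakdown threshold is a hypothetical parameter, since the exact number of stages is not fixed here.

```python
class TwoStageSystem:
    """Illustrative degradation model: a failure moves the system from
    stage n to n + 1, maintenance undoes the last degradation step,
    and a repair resets the system to stage 0 (very good condition)."""

    def __init__(self, breakdown_stage=3):
        self.stage = 0                       # current degradation stage n
        self.breakdown_stage = breakdown_stage  # hypothetical threshold

    def degradation_failure(self):
        self.stage += 1                      # system condition worsens

    def maintain(self):
        self.stage = max(0, self.stage - 1)  # restore previous condition

    def repair(self):
        self.stage = 0                       # full restoration

    def is_down(self):
        return self.stage >= self.breakdown_stage
```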

Given the degradation in product quality, the system can produce high-quality, low-quality, and defective items. Regarding available sellable products, the customers can acquire only high-quality and low-quality items. On the other hand, defective items can be either remanufactured or recycled to generate revenues. The remanufacturing activities generate ready-to-be-sold products utilizing the stored faulty items. Conversely, recycling removes faulty items and creates ample space for storing high-quality and low-quality products.

Furthermore, with a probability \(P_{r}\), dissatisfied customers can return their acquired products to the system. The returned products are stockpiled in the second storage facility and can be refurbished. The refurbishing activities turn the returned items into sellable products, which are considerably cheaper and of lower quality. Customers acquire these items when the inventory does not contain high-quality, low-quality, or remanufactured products.

3.2 Description of the dataset

A reinforcement learning/Base Stock-based control mechanism is implemented. It assumes a similar functionality to the reinforcement learning framework introduced in the works of Paraschos et al. (2022) and Xanthopoulos et al. (2017). In this regard, it exploits an interacting agent observing the manufacturing environment and searching for an optimal operation strategy that yields improved competence, reliability, and output. Analytically, the decision-making process followed by the mechanism is described as follows. Let \(d_{e}\) denote the timepoints at which the mechanism makes decisions. At \(d_{e}\), it receives information regarding the state of the studied system. The current state of the system is:

$$ S^{t} \leftarrow \left\{ {f,m_{st} ,\pi_{\alpha } ,\pi_{\epsilon } } \right\} $$
(1)

where \(f\) stands for the system degradation, \(m_{st}\) represents the status of the multi-stage system, and \(\pi_{\alpha }\) and \(\pi_{\epsilon }\) denote the numbers of stored high-quality and faulty items, respectively.

Given the observed state of the system, the mechanism decides upon an action. Specifically, the mechanism can initiate a maintenance activity, refurbish returned items, recycle defective items, or manufacture goods under the Base Stock policy. Formally, the permitted set of actions \(\Delta\) is:

$$ \Delta \left\{ {f,m_{st} ,\pi_{\alpha } ,\pi_{\epsilon } } \right\} \leftarrow \left( {do\;nothing,Base\;Stock} \right),\;\;f \in \left[ {0,2} \right] $$
(2)
$$ \Delta \left\{ {f,m_{st} ,\pi_{\alpha } ,\pi_{\epsilon } } \right\} \leftarrow \left( {do\;nothing} \right),\;\;m_{st} = 0 $$
(3)
$$ \Delta \left\{ {f,m_{st} ,\pi_{\alpha } ,\pi_{\epsilon } } \right\} \leftarrow \left( {do\;nothing,maintain} \right),\;\;f \ge 2\;\;and\;\;\pi_{\alpha } = Q_{P}^{max} $$
(4)
$$ \Delta \left\{ {f,m_{st} ,\pi_{\alpha } ,\pi_{\epsilon } } \right\} \leftarrow \left( {do\;nothing,refurbish,recycle} \right),\;\;f \ge 2\;\;and\;\;\pi_{\epsilon } = Q_{P}^{max} $$
(5)

Following the above expressions, the mechanism behaves as follows. In case of system downtime \(\left( {m_{st} = 0} \right)\), the mechanism remains idle as the system does not operate. When the condition of the system is relatively good \((0 \le f < 2)\), the mechanism can authorize a Base Stock-based production activity or opt to remain idle. Under the Base-Stock policy, the system starts to produce new parts when it receives a new order. In this regard, the information regarding the customer demand is transmitted to the integrated manufacturing facilities to initiate the manufacturing process. This approach facilitates the production of high quality and ready-to-be-sold products. However, in the case of considerable degradation \(\left( {f \ge 2} \right)\), the mechanism can opt for system maintenance to improve the condition of the multi-stage production system and the quality of manufactured items. In addition, the system can recycle faulty products, or refurbish returned products to make ample space for the storage of sellable products, and hence fulfill pending customer orders.
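To show how expressions (2)–(5) can be read as a state-dependent action filter, the following minimal Python sketch enumerates the permitted actions; the precedence of the conditions and the argument names are assumptions made for illustration, not the exact implementation of the mechanism.

```python
def permitted_actions(f, m_st, pi_a, pi_e, q_p_max):
    """Permitted action set, loosely following expressions (2)-(5).
    f: degradation stage, m_st: system status (0 = down),
    pi_a / pi_e: stored high-quality / faulty items,
    q_p_max: capacity of the finished-goods buffer."""
    if m_st == 0:                           # expression (3): system is down
        return ["do nothing"]
    actions = ["do nothing"]
    if f < 2:                               # expression (2): good condition
        actions.append("Base Stock")
    if f >= 2 and pi_a == q_p_max:          # expression (4)
        actions.append("maintain")
    if f >= 2 and pi_e == q_p_max:          # expression (5)
        actions.extend(["refurbish", "recycle"])
    return actions

print(permitted_actions(f=3, m_st=1, pi_a=0, pi_e=10, q_p_max=10))
```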

After selecting an action at \(d_{e}\), the corresponding reward is received by the mechanism at the subsequent decision epoch \(d_{e} + 1\). The goal of the mechanism is to find the optimum policy \(p^{op}\) that maximizes the attained average reward \(\overline{{A_{r} }}\). Formally, the attained average reward is:

$$ \overline{{A_{r} }} \leftarrow {\text{lim}}_{l \to \infty } \frac{1}{l}E\left( {\mathop \sum \limits_{n = 1}^{l} A_{{d_{n} + 1}} } \right) $$
(6)

where \(A = K_{tp} + K_{lp} + K_{rec} + K_{ref} + X_{r} - X_{{m_{n} }} - X_{p} - X_{stor} - X_{los} - X_{rem} - X_{ref} - F_{ret}\).

In the above expression, \(K_{tp}\), \(K_{lp}\), \(K_{rec}\), and \(K_{ref}\) refer to the revenues from high-quality, low-quality, recycled, and refurbished items, \(X_{r}\) and \(X_{{m_{n} }}\) represent the costs associated with repair and with maintenance at stage \(n\), \(X_{p}\) and \(X_{stor}\) denote the production and storage costs, and \(X_{los}\) and \(X_{rem}\) refer to the lost sales and remanufacturing costs, respectively. Finally, \(X_{ref}\) and \(F_{ret}\) denote the refurbishing cost and the returned product fee.
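As a small illustration of how the reward attained between two decision epochs could be evaluated from these terms, a hedged Python sketch follows; the dictionary keys are descriptive stand-ins for the \(K\) and \(X\) terms above, not names used in the study.

```python
def decision_reward(revenues, costs):
    """Reward attained between two decision epochs, following the
    expression under Eq. (6): item revenues minus operating costs."""
    a = (revenues["high_quality"] + revenues["low_quality"]
         + revenues["recycled"] + revenues["refurbished"])
    a -= (costs["repair"] + costs["maintenance"] + costs["production"]
          + costs["storage"] + costs["lost_sales"] + costs["remanufacturing"]
          + costs["refurbishing"] + costs["return_fee"])
    return a
```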

For the purposes of this study, a reinforcement learning algorithm, named R-smart (Gosavi 2004), is implemented to seek out optimal control policies for the subsequent decision-tree constructions. It falls into the category of the average cost reinforcement learning algorithms, which endeavor to maximize the defined average reward at every \(d_{e}\). To achieve that goal, it calculates two discrete values, that is, action value (\(q_{a} \left( {S,\delta } \right)\)) and average action reward (\(\overline{{A_{op} }}\)). The action values, also called q-values, are stored in a table, and are calculated at each \(d_{e}\) using (7). On the other hand, the average action reward is estimated with (8) at the next decision making timepoint (\(d_{e} + 1\)), when the action with the highest value is selected. Mathematically, R-smart is defined as follows:

$$ q_{a} \left( {S,\delta } \right) \leftarrow q_{a} \left( {S,\delta } \right) + \beta \left[ {A^{\prime} - \overline{{A_{r} }} + q_{a} \left( {S^{\prime},\delta^{\prime}} \right) - q_{a} \left( {S,\delta } \right)} \right] $$
(7)
$$ \overline{{A_{op} }} \leftarrow \left( {1 - \gamma } \right)\overline{A} + \beta A $$
(8)

Regarding the mathematical notation in expressions (7) and (8), \(A^{\prime}\) and \(S^{\prime}\) refer to the reward and the system state attained by the mechanism at \(d_{e} + 1\), \(\delta\) and \(\delta^{\prime}\) denote the actions selected by the mechanism at \(d_{e}\) and \(d_{e} + 1\), respectively, and \(\beta\) and \(\gamma\) are real-valued learning-rate constants. Finally, the action-state space is efficiently investigated using the ε-greedy algorithm. The algorithm is characterized by a variable \(\pi_{\varepsilon }\), receiving values between 0 and 1. Let us consider a probability \(\pi_{r}\) of selecting an action. The algorithm selects either a random action (\(\pi_{r} = \pi_{\varepsilon }\)) or the one with the highest value (\(\pi_{r} = 1 - \pi_{\varepsilon }\)).
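For clarity, a minimal Python sketch of one R-SMART-style learning step is given below, combining the q-value update of (7), a simplified running average-reward estimate in the spirit of (8), and ε-greedy exploration; the step-size handling and the helper names (e.g., actions_of) are assumptions for illustration, not the exact implementation used in the study.

```python
import random
from collections import defaultdict

def r_smart_step(q, avg_reward, state, action, reward, next_state,
                 actions_of, beta=0.1, gamma=0.1, epsilon=0.1):
    """One illustrative learning step; q is a table of action values
    indexed by (state, action)."""
    # Greedy action in the next state, used in the q-value update.
    next_action = max(actions_of(next_state), key=lambda a: q[(next_state, a)])
    # Eq. (7): average-reward temporal-difference update of the q-value.
    td = reward - avg_reward + q[(next_state, next_action)] - q[(state, action)]
    q[(state, action)] += beta * td
    # Eq. (8), simplified: running estimate of the average reward.
    avg_reward = (1.0 - gamma) * avg_reward + gamma * reward
    # Epsilon-greedy exploration for the action taken at the next epoch.
    if random.random() < epsilon:
        next_action = random.choice(actions_of(next_state))
    return q, avg_reward, next_action

# Minimal usage with a toy two-action problem.
q = defaultdict(float)
q, rho, a = r_smart_step(q, 0.0, "s0", "do nothing", 5.0, "s1",
                         actions_of=lambda s: ["do nothing", "Base Stock"])
```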

4 Experimental analysis

4.1 Configuration

This section describes the experimental evaluation carried out for the decision tree-based decision-making framework. First, we generated the training dataset employing the reinforcement learning/Base Stock-based control mechanism presented in Sect. 3.2. The dataset contained optimal control policies for optimizing manufacturing, restoration, and quality control activities. Decision rules were then generated with decision trees trained on the resulting dataset. For this purpose, the Waikato Environment for Knowledge Analysis (WEKA) software was used in this research to apply the classification methods and construct the decision trees. WEKA is an open-source predictive analysis platform comprising a collection of machine learning algorithms and data processing techniques. Data mining activities such as data pre-processing, clustering, classification, feature selection, regression, and visualization may be accomplished with the WEKA tool suite (Frank et al. 2016).
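WEKA was used for the actual experiments; as an analogous, hedged illustration of how a pruned decision tree could be trained on such a dataset outside WEKA, a short scikit-learn sketch follows. The file name, column names, and pruning parameter are hypothetical, and ccp_alpha is only loosely comparable to J48's confidence factor.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical column names; the actual dataset comes from the
# reinforcement learning / Base Stock mechanism described above.
data = pd.read_csv("rl_policy_dataset.csv")
X = data[["machine", "failures", "prod_a", "def_prod"]]
y = data["action"]  # e.g., "do nothing", "produce", "maintain", ...

# Entropy-based splits; larger ccp_alpha prunes the tree more aggressively.
tree = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.001, random_state=0)
tree.fit(X, y)

# Print the learned rules in textual form, similar to an extracted rule set.
print(export_text(tree, feature_names=list(X.columns)))
```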

The performance of the constructed decision trees was evaluated in the context of the two-stage manufacturing system presented in Sect. 3.1. To this end, 25 simulation experiments were carried out. Each experiment was replicated 15 times and lasted up to the point where the examined two-stage manufacturing system completed 5.5 million items. During the experiments, the operation of the manufacturing system was simulated under real world-like fluctuating conditions pertaining to customer demand, production, failures, repairs, maintenance, and remanufacturing.

Table 1 presents the first 10 simulation experiments. In the table, \(\lambda_{\pi }\) refers to the average rate of order arrivals, \(\lambda_{m}\) denotes the average rate of manufacturing activities, \(\mu_{{\alpha_{n} }}\) and \(\mu_{{e_{n} }}\) represent the average rates of failure activities and system maintenance at stage \(n\), \(\mu_{\rho }\) refers to the average rate of repair activities, and \(\lambda_{r}\) denotes the average rate of remanufacturing activities. It is worth noting that the corresponding event times are exponentially distributed with the mentioned average rates. Moreover, it is sensible that \(\mu_{{\alpha_{0} }} > \mu_{{\alpha_{1} }} > \cdots > \mu_{{\alpha_{n} }}\), and \(\mu_{{e_{0} }} < \mu_{{e_{1} }} < \cdots < \mu_{{e_{n} }}\).

Table 1 A summary of the experiments
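As a brief illustration of how such exponentially distributed event times can be sampled within the discrete event simulation, a minimal Python sketch follows; the rate values and event names are placeholders, not the parameter values reported in Table 1.

```python
import random

# Placeholder rates (events per unit time) for illustration only.
rates = {"order_arrival": 1.0, "manufacturing": 1.2,
         "failure": 0.05, "maintenance": 0.5, "repair": 0.2}

def next_event_time(event, now):
    """Time of the next occurrence of `event`, given the current clock,
    with exponentially distributed inter-event times."""
    return now + random.expovariate(rates[event])

print(next_event_time("order_arrival", now=0.0))
```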

4.2 Tree performance indices

Numerous output indicators may be utilized to evaluate the tree algorithms' trustworthiness. Initially, the TP (True Positive) index is calculated by counting the positive instances identified correctly as positive, whereas the FP (False Positive) index is calculated by counting the negative instances classified incorrectly as positive. The TPR (True Positive Rate) is obtained by dividing the TP by the total number of positive instances, and the FPR (False Positive Rate) by dividing the FP by the total number of negative instances. Additionally, the F-measure is the harmonic mean of precision and recall and thus a combined indicator of both. Precision is calculated by dividing TP by the sum of TP and FP. In contrast, accuracy is calculated by adding TP to TN (True Negative, the number of correctly predicted negative occurrences) and dividing by the total number of cases classified.
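These indices can be summarized in a short Python sketch; the counts in the usage example are toy numbers, not values from the experiments.

```python
def binary_class_metrics(tp, fp, tn, fn):
    """Per-class indices of the kind reported in Table 3 (one-vs-rest counts)."""
    tpr = tp / (tp + fn)                    # recall / true positive rate
    fpr = fp / (fp + tn)                    # false positive rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"TPR": tpr, "FPR": fpr, "Precision": precision,
            "F-measure": f_measure, "Accuracy": accuracy}

# Toy counts for illustration.
print(binary_class_metrics(tp=95, fp=5, tn=890, fn=10))
```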

4.3 Results

In the current study, a total of 326,432 cases were processed using each decision tree method. Additionally, the effect of pruning on the accuracy of the best algorithm is examined. The comparison findings for the eight configurations (the four algorithms, with the J48 applied under five pruning settings) are shown in Table 2. Notably, the C4.5 (J48) was applied with confidence factors of 0.35, 0.25, 0.2, 0.1, and 0.05, where lower values result in greater pruning. Initially, the correctly categorized cases and those erroneously classified are shown, with the J48 (0.2) being the most effective, accurately classifying 326,325 occurrences. The pruned J48 (0.25) method was more accurate than the HoeffdingTree, REPTree, and RandomTree algorithms, which wrongly categorized 216, 121, and 127 cases, respectively.

Table 2 Comparative results for the decision trees algorithms

As for the processing time needed for each algorithm to build the model, it took 5.35 s for the J48 (0.35), 3.73 s for the J48 (0.25), 3.81 s for the J48 (0.2), 4.44 s for the J48 (0.1), and 4.32 s for the J48 (0.05). As for the rest of the algorithms, the HoeffdingTree needed 4.83 s to build the model, while the REPTree consumed 3.04 s and the RandomTree about 2.67 s. As expected, larger (i.e., less pruned) trees need more processing time, but in general the time needed for model construction was reasonable.

Regarding the effect of pruning on the J48 algorithm's accuracy, it was noted that more heavily pruned trees, i.e., trees of smaller size, do not ensure that the algorithm will be more accurate. More precisely, the J48 (0.2) tree misclassifies just 107 occurrences, while the more pruned J48 (0.05) tree misclassifies 118 instances and the less pruned J48 (0.35) tree misclassifies 111 instances.

Additionally, Table 3 includes information on the performance of each tree method in terms of TP rate, FP rate, and F-measure. More precisely, the J48 method obtains the highest TP rate for all classes except "refurbish," for which the RandomTree approach beats the other techniques. Regarding the J48 variations, it should be noted that less pruned trees are more likely to result in higher TP rate values. Many values coincide, presumably due to the high number of instances and the decimal precision chosen. Regarding the FP rate, the HoeffdingTree algorithm has the highest value (for the class "recycle"), while the other values are practically zero due to the minimal number of incorrectly classified instances compared to the total number of examples used. Regarding the F-measure index, the J48 algorithm outperforms all other algorithms for all classes except "refurbish," for which the REPTree algorithm outperforms all others, with the J48 coming second and the RandomTree and HoeffdingTree following.

Table 3 Results for the algorithms of the present analysis

In this research, each rule extracted by every decision tree algorithm was employed to address the optimization problem that the R-smart algorithm was initially confronted with, validating its efficacy in managing the production system and optimizing its profitability. Each rule was applied to the optimization problem, resulting in the profitability shown in Table 4. It is worth noting that the average profit values for the processed cases are shown. These results demonstrate that the J48 algorithm outperforms the other methods regarding average profit.

Table 4 Results of profitability for the tree algorithms used (10³ €)—(R-smart)

In addition, regarding the J48 algorithm, it is found that less pruning assists in producing a better-performing decision rule, since the J48 (0.35) rule achieved better average profitability than the J48 (0.05), J48 (0.1), J48 (0.2), and J48 (0.25) rules. Moreover, it is worth noting that the J48 (0.35) algorithm achieved the best profitability for 10 of the 25 scenarios, the J48 (0.25) for 3 of 25, the J48 (0.2) for 5 of 25, the J48 (0.1) for 6 of 25, and the J48 (0.05) for 6 of 25. As for the rest of the algorithms, the HoeffdingTree did not find the highest profits for any of the scenarios, the REPTree found them for 4 of 25, and the RandomTree for 3 of 25. Note that when applying the R-learning algorithm alone (without any constructed decision tree and decision rule), the average profitability is 4.10E+08 k€ and the best profitability is found for 10 of the 25 scenarios. This finding illustrates the value of the current approach, since the rule extracted from the data generated with R-smart achieves better results than the R-smart algorithm alone.

The rule derived using the most efficient method (in terms of profitability), the J48 (0.35), is described in Table 5. This rule states that if the system's "machine" level is equal to or less than zero, the action executed is "do nothing." Additionally, if the "machine" parameter is positive, the "prod a" value is equal to or less than 99, the "def prod" value is equal to or less than 99, the "def prod" value is equal to or less than 6, the "prod a" value is equal to or less than 9, and the "machine" parameter is equal to or less than 1, then the action performed is "produce". Similarly, the rule contains "instructions" for carrying out actions in each instance, which may result in a higher objective function value for the problem.

Table 5 The rule extracted by the J48 (0.35) algorithm
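For readability, the branches of the rule described above can be written as nested conditions; the following Python sketch encodes only the fragment quoted in the text, while the remaining branches of the extracted tree in Table 5 are omitted.

```python
def j48_035_rule_fragment(machine, prod_a, def_prod):
    """Partial, illustrative encoding of the first branches of the
    J48 (0.35) rule as described in the text; not the full rule."""
    if machine <= 0:
        return "do nothing"
    if (prod_a <= 99 and def_prod <= 99 and def_prod <= 6
            and prod_a <= 9 and machine <= 1):
        return "produce"
    return "continue with the remaining branches of the extracted tree"

print(j48_035_rule_fragment(machine=1, prod_a=4, def_prod=2))  # "produce"
```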

5 Conclusions

In this study, we used a variety of decision tree algorithms to classify instances and represent the highest-performance policies generated by a reinforcement learning algorithm while optimizing a complex production, maintenance, and quality control problem within a degrading manufacturing and remanufacturing system. More specifically, we applied decision tree algorithms for the first time to mine knowledge that can optimize production, quality, and maintenance in the context of flexible manufacturing systems, and the results we achieved were highly encouraging. Additionally, we conducted further research to examine the influence of algorithmic factors on the decision trees' efficacy, ensuring that decision-makers can understand the procedures involved and trust the model utilized.

Among the performed algorithms, the C4.5 (J48) was found to be the most accurate, especially its pruned J48 (0.2) variant, with the REPTree, Random Tree, and Hoeffding Tree following. Additionally, the extracted rules were employed to address the initial optimization problem that the reinforcement learning algorithm addressed, to demonstrate their effectiveness at controlling the production system and maximizing its profitability. Again, the J48 algorithm proved the most efficient compared to the other three decision tree algorithms. As for the J48 algorithm, we performed a sensitivity analysis to determine whether the pruning level impacts the objective function value. Regarding profitability, we discovered that utilizing the J48 algorithm to build larger and more complex trees can generate rules that result in higher profits. However, it is worth noting that this approach may compromise accuracy. Note that this finding holds only for the J48 algorithm and cannot be generalized, since the larger trees created with the RandomTree algorithm yield lower profitability.

Additionally, we retrieved the decision rule from each constructed tree and applied it to the optimization problem to validate its effectiveness. The verification findings established that the J48 method is the most efficient. At the same time, the pruning level affects the rule's efficiency, as extremely large or small tree sizes result in poor rule performance regarding system profitability.

In terms of management implications, the comparison of findings has a significant impact on real-world industries and the ever-evolving challenges they face. By thoroughly analyzing and categorizing the control policies generated by the production mechanism, we have developed a framework that produces effective decision rules for the manufacturing and quality control of items. Owing to the universal nature of these principles, they can be implemented in new manufacturing challenges and systems without incurring significant costs. This article reveals how our framework can be applied to real-world situations, thereby enabling organizations to improve their manufacturing processes and yield better quality items.

The retrieved information has significant implications for decision-making processes in complex and dynamic real-world industrial environments. Specifically, managers can leverage the suggested technology to analyze vast quantities of data collected from production systems and identify broad patterns, requirements, and trends dictated by customers and the manufacturing sector. This information can be utilized to adapt the integrated production process and supply chains while minimizing emergent operational expenses like missed sales. The proposed method is an intelligent and robust tool for decision-making in the Industry 4.0 era, which has been transformed by technologies such as the Internet of Things (IoT). Future work could include the application of additional machine learning principles such as neural networks to compare algorithmic accuracy and system profitability. Additionally, more complex production systems could be explored.