1 Introduction

As the Industry 4.0 (I4.0) era unfolds, factories are increasingly adopting advanced technologies, such as Artificial Intelligence (AI) and the Industrial Internet of Things (IIoT), to significantly enhance their performance through innovative methods (Gilchrist 2016). For instance, through real-time data collection and processing, manufacturers can monitor the system condition, detect possible anomalies, and promptly inform supervisors to take action before the anomalies become severe and lead to production downtime (Wang et al. 2022). Several AI solutions, such as Support Vector Machines and Artificial Neural Networks, have been presented, demonstrating great accuracy in predicting production line malfunctions (Li et al. 2017). However, their black-box nature makes explaining their outcomes challenging, which reduces supervisors’ trust in the AI models and thus hinders their deployment in critical applications where humans make the final judgment, for example, industrial anomaly detection (Rehse et al. 2019). Furthermore, the lack of transparency and interpretability hinders the efficient identification of weaknesses in AI algorithms and their subsequent improvement (Alfeo et al. 2023). Therefore, there is a need to develop AI models with transparent and interpretable behavior that can provide explanations, enabling end-users to understand decision-making processes, explore impactful input factors, delve into model mechanics, and respond appropriately (Carletti et al. 2019).

Recently, a new research direction called eXplainable Artificial Intelligence (XAI) has emerged that deals with developing techniques, algorithms, and tools that produce human-comprehensible explanations of the decisions of AI-based systems (Adadi and Berrada 2018). These explanations can take various forms depending on the application, including IF-THEN rules that express input–output data relationships, visual highlighting of, for example, the parts of input images that are important for model predictions, feature importance rankings, textual explanations, and counterfactuals, among others (Kök et al. 2023). The transition from AI to XAI is imperative for successfully integrating automated decision-making into production environments in which humans supervise and make the final decisions.

In recent years, the research community has introduced two main categories of XAI methodologies based on their implementation. These categories are:

  • Post-hoc explanation methods: External techniques that seek to explain black-box models by approximating them either globally or locally using interpretable surrogate models.

  • Intrinsic interpretable models: Models that can explain their predictions by themselves.

However, while post-hoc methods have been valuable in interpreting complex decision processes, certain limitations have motivated a shift towards intrinsic interpretable models. These challenges arise from the fact that the surrogate model may not accurately reflect the actual behavior of the underlying black-box model, leading to misleading explanations (Rudin 2019). Furthermore, even if the surrogate model approximates well, it may rely on different features compared with the black-box model, further contributing to explanations inconsistent with the original model. Another disadvantage is that the explanations of these techniques can be easily manipulated to be acceptable through specific frameworks, even if the base model is highly biased (Slack et al. 2020). Finally, when the dataset includes interrelated features, the assumption of feature independence made by commonly used post-hoc methods such as permutation feature importance, Local Interpretable Model-agnostic Explanations (LIME) method, and SHapley Additive exPlanations (SHAP) method can result in misleading explanations (Aas et al. 2021). Thus, these disadvantages have heightened research interest in learning intrinsic interpretable models, whose decisions can be explained without additional techniques, representing assimilated knowledge in a manner consistent with human thought (Alonso et al. 2015).

In light of the challenges posed by post-hoc methods, the inherent adaptability and interpretability of fuzzy systems have emerged as promising solutions. Zadeh’s foundational work on fuzzy sets has paved the way for developing fuzzy systems that can model complex systems at a higher level of abstraction in a human-understandable form (Zadeh 1965; Chen and Niou 2011). The interpretability of these systems is not merely incidental; it is a core feature rooted in their ability to capture and represent knowledge in a way that reflects human cognition and provides a detailed understanding of complex systems and their underlying dynamics (Chen and Chen 2002). This makes fuzzy systems essential in the shift from post-hoc methods to inherently interpretable models (Chen and Jian 2017). Furthermore, fuzzy systems have proven valuable in forecasting (Pant and Kumar 2022). In particular, fuzzy forecasting techniques, which leverage fuzzy logical relationships, present a novel method for predicting the behavior of complex systems (Chen and Wang 2010; Chen et al. 2006). In related work, Petri Nets, which can be viewed as a tool for fuzzy modeling, face challenges such as the state explosion problem (Shen et al. 2013). This issue arises when these nets become so large that their behavior becomes challenging to monitor, leading to inefficiency. Such challenges echo the problems observed with black-box models, highlighting the pressing need for models that balance complexity with clarity (Chen and Fang 2005). As the focus shifts towards intrinsic interpretable models, the fusion of fuzzy logic with advanced techniques has become evident. Merging fuzzy logic with techniques such as neural networks and expert systems results in neuro-fuzzy methods that provide a robust approach to knowledge representation. This fusion effectively bridges human cognition with sophisticated computational models, ensuring clarity and computational prowess (Chen et al. 2009).

Fuzzy Cognitive Maps (FCMs), a type of recurrent neural network, are widely used intrinsic interpretable models for knowledge representation. They typically integrate fuzzy logic features during development, classifying them as a neuro-fuzzy method (Kosko 1986). Specifically, FCMs are directed graphs consisting of nodes called concepts representing the components of the modeled system or conceptual entities, which can be seen as information granules, and incorporate weighted edges that describe the causal relations between them. This characteristic places FCMs within the area of granular computing, which focuses on the conceptualization and processing of information granules (Papageorgiou and Stylios 2008). FCMs find applications in modeling complex systems, including industrial systems, and in addressing prediction problems such as time-series forecasting and classification (Wang et al. 2021; Song et al. 2011; Loia et al. 2016). FCMs offer several advantages due to their unique characteristics:

  1. They can use experts’ assessments when the collected data are insufficient;

  2. Their graphical structure provides an intuitive representation where concepts and weights have a well-defined meaning for the system under analysis;

  3. They provide feature-based explanations for their predictions, being inherently interpretable;

  4. The inference process of FCMs is visually transparent, enabling users to comprehend the decision-making process leading to predictions;

  5. Experts can modify the weights of FCMs to encode rules that have not yet been observed in data (e.g., a new type of fault in the manufacturing system), providing a level of flexibility unattainable in other intrinsic interpretable models.

Given the aforementioned advantages, FCMs have garnered significant interest from researchers and have proven to be extremely useful across various domains (Papageorgiou and Salmeron 2013). For instance, in the industry context, Lee et al. (1997) proposed an FCM-based model for fault diagnosis in a tank-pipeline system that successfully identified various simulated faults, whereas Stylios and Groumpos (1998) presented an FCM-based supervisor of manufacturing systems for failure detection and decision analysis. Lastly, Tirovolas and Stylios (2022) proposed FCMs as a health indicator prognostic method for engines’ remaining useful life in the context of predictive maintenance. However, while the literature often highlights the interpretable nature of FCMs, this is primarily based on the clarity of their concepts and weights, rather than a demonstration of their explanatory performance. Therefore, thorough numerical simulations should be performed to determine the capabilities of FCMs to explain their decisions.

To evaluate the interpretability features of FCMs, it is crucial to understand their development processes. Currently, two fundamental methods for FCM construction are found in the literature: (a) expert-based and (b) data-driven approaches (Papageorgiou and Stylios 2008). In the expert-based method, FCM concepts and weights are determined solely based on domain experts’ knowledge, which is incorporated into the model using fuzzy logic theory (Stylios and Groumpos 2004). However, this approach relies heavily on the expertise level of individuals, potentially leading to unsatisfactory performance, as experts may overlook essential aspects of the problem and assign inappropriate weight values (Song et al. 2009). On the other hand, the data-driven approaches automatically define FCM parameters from available data using learning algorithms or calculate interconnection weights as correlation coefficients between variables (Papageorgiou 2012; Czerwinski et al. 2021; Nápoles et al. 2020a). Specifically, when learning algorithms are employed without prior expert knowledge, the presence of all weights is usually assumed, leading to an over-parameterized model, or the interconnections between concepts are arbitrarily established (Nápoles et al. 2020b).

Nevertheless, the dataset may contain spurious correlations that can be unintentionally captured by the FCM and bias the learning process. This introduces fragility to the FCM, compromising its reliability, prediction accuracy, and interpretability (Forward 2022; Wang and Culotta 2021). Indeed, the employed learning algorithms, performing as black boxes, aim to fit the FCM to the available data based on passively observed historical correlations, which can indicate a predictive relationship among variables. However, these algorithms do not consider the semantics of the analyzed system and thus fail to distinguish between causal and spurious relationships. Therefore, without establishing appropriate constraints beforehand, these algorithms are fooled by illusory patterns and assign weight values to the corresponding edges, resulting in an FCM that does not represent the authentic system interactions; instead, it learns the correlational associations between features. Consequently, when the assimilated spurious correlations break down, the model’s predictions inevitably fail, and the explanations are erroneous, as the FCM misconstrues the relationships between the problem variables. Similarly, relying on the correlation coefficient is equally unreliable because correlation does not necessarily imply causality (Rohrer 2018). In the domain of industrial anomaly detection, such an FCM proves ineffective, as its explanations misdirect plant supervisors to irrelevant parts of the manufacturing system, hindering the identification of the root causes of faults. Such gaps motivate the development of new methods that identify authentic causal relationships between problem variables and rule out possible spurious correlations (Nápoles et al. 2020b). In this direction, Yosef et al. (2022) presented a method for removing spurious correlations by calculating the concepts’ behavioral similarity from data and applying a set of rules defined by domain experts to discern the actual causal relationships. However, with this approach, an FCM can still contain spurious correlations that experts consider acceptable, while some actual causal associations can remain undetected because they lie beyond the experts’ knowledge. Finally, this expert-driven causality analysis is infeasible for highly complex systems with many variables.

A potential solution to these limitations is to devise a method that identifies the real causal structure of an FCM from observational data, eliminating the need for domain experts. By doing so, this method can offer a significant contribution in two ways: (a) preventing the injection of spuriousness into these cognitive networks, thereby enhancing their prediction accuracy and interpretability, and (b) providing a tool that can efficiently handle large-scale problems. Notably, a distinguishing contribution of this work is that, to the best of the authors’ knowledge, no other data-driven causal discovery method has been suggested to rule out spurious correlations in FCMs, thereby elevating their performance and robustness.

This paper’s main contribution is introducing a novel approach for FCM construction, leveraging the Liang-Kleeman Information Flow (L-K IF) analysis for causal inference. In more detail, unlike the approach presented by Yosef et al. (2022), the proposed technique contributes by eliminating the necessity for expert involvement; it identifies the authentic causal relationships from the data using an automatic causal search algorithm. A pivotal part of our contribution is the imposition of the derived causal links as constraints during the FCM learning procedure. This strategic move is tailored to effectively remove spurious correlations and, in doing so, improve the FCM’s aggregate predictive and explanatory power. The capabilities of the proposed method are demonstrated in the context of developing an XAI model for anomaly detection and root cause analysis in an industrial system. Finally, a comparative analysis is conducted between the developed FCM and state-of-the-art FCM-based models in terms of their predictive and explanatory power. It is worth noting that while the presented case study focuses on anomaly detection, the proposed method can be effectively employed in other prediction problems as well. For further details and implementation, the code for this study is available in Tyrovolas et al. (2023).

The rest of the paper is organized as follows. Section 2 presents the foundations of the classic FCM formalism and L-K IF analysis. Section 3 describes in detail the proposed methodology, including the model’s development process and how to predict and interpret its results. Section 4 conducts extensive numerical simulations to compare the proposed model against state-of-the-art FCM-based models. Finally, Sect. 5 presents some concluding remarks.

2 Theoretical background

This section first presents some basic notions of FCMs regarding their structure and how they perform the simulations. Second, it describes the causal inference tool L-K IF analysis, used to determine the actual causal relationships between the analyzed system variables.

2.1 Fuzzy cognitive maps

As mentioned in Sect. 1, an FCM consists of n concepts \(C_{i} \,, i \in \{1,2,\cdots ,n\}\), and weights \(w_{ij} \in [-1,1]\) that indicate the causal influence from \(C_{i}\) to \(C_{j}\). In general, there are three kinds of causality:

  • Positive causality (\(w_{ij}>0)\): the affected variable (\(C_{j}\)) changes (increases or decreases) in the same direction as its cause variable (\(C_{i}\)) changes.

  • Negative causality (\(w_{ij}<0)\): the affected variable (\(C_{j}\)) changes in the opposite direction to its cause variable (\(C_{i}\)) change.

  • Zero causality (\(w_{ij}=0)\): there is no relation between the cause (\(C_{i}\)) and the affected (\(C_{j}\)) variable.

Each concept \(C_{i}\) has an activation value \(A_{i}\), which is determined via a reasoning rule, where the most common is

$$\begin{aligned} A_{i}^{(t+1)}=f\left(\sum _{\begin{array}{c} j =1 \\ j \ne i \end{array}}^n A_{j}^{(t)} w_{ji}\right), \end{aligned}$$
(1)

where t is the iteration step, \(A_{i}^{(t+1)}\) denotes the activation value of the i-th concept at the \((t+1)\)th iteration step, \(A_{j}^{(t)}\) denotes the activation value of the j-th concept at the tth iteration step, \(w_{ji}\) denotes the causal weight from the jth concept to the i-th concept, and \(f(\cdot )\) denotes the activation function that normalizes the concepts’ activation values within a specified interval (Kosko 1986). The best-known activation functions are the bivalent, trivalent, hyperbolic tangent, and sigmoid functions; depending on which is selected, \(A_{i}^{(t+1)}\) takes values within the [0, 1] or \([-1,1]\) interval (Orang et al. 2022). The activation values of all concepts in each iteration step can be expressed as a state vector \({\textbf{A}} \in {\mathbb {R}}^{n}\), while the values of the causal weights \(w_{ij}\) between each pair of concepts \(C_{i}\) and \(C_{j}\) compose a weight matrix \({\textbf{W}} \in {\mathbb {R}}^{n \times n}\), whose diagonal elements are equal to zero. Therefore, (1) can be rewritten as:

$$\begin{aligned} {\textbf{A}}^{(t)}=f({\textbf{A}}^{(t-1)} {\textbf{W}}). \end{aligned}$$
(2)

Using (2), the activation values of the concepts are computed at each iteration step. An initial state vector \({\textbf{A}}^{(0)}\), which includes input data (e.g., sensor data), triggers the FCM’s iterative reasoning process (Falcon et al. 2019). Subsequently, a new state vector is produced at each iteration step until the termination condition is satisfied, which can be either the FCM’s convergence to an equilibrium point, leading to reliable results, or the completion of a maximum number of iterations, in which case the FCM exhibits cyclic or chaotic behavior (Kosko 1988).
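For concreteness, the reasoning process of (2) can be sketched in a few lines of NumPy (a minimal sketch; the hyperbolic tangent activation, the tolerance, and the iteration cap are illustrative choices):

```python
import numpy as np

def fcm_reason(A0, W, f=np.tanh, eps=1e-5, max_iter=100):
    """Iterate A^(t) = f(A^(t-1) W), eq. (2), until a fixed point or max_iter.

    A0: initial state vector (n,); W: weight matrix (n, n) with zero diagonal.
    Returns the final state vector and a flag indicating convergence.
    """
    A = np.asarray(A0, dtype=float)
    for _ in range(max_iter):
        A_next = f(A @ W)
        if np.max(np.abs(A_next - A)) < eps:   # equilibrium point reached
            return A_next, True
        A = A_next
    return A, False                            # cyclic or chaotic behavior
```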

2.2 Information flow

As mentioned above, accurately identifying authentic causal relationships between variables in a modeled system is crucial for developing efficient FCMs. Causality inference from data is a notoriously difficult problem that has been studied extensively for decades (Egrioglu et al. 2022). Various methods (mostly statistical) have been proposed to address this challenge, but they often suffer from certain limitations (Eichler 2013; Hlavackova-Schindler et al. 2007; Runge et al. 2012). Some of these approaches are qualitative, lacking the quantitative information necessary for this research’s purpose, while other methods are empirically or half-empirically formulated and, although successful in specific contexts, lack the desired universality for developing all-purpose algorithms. Recently, it has been realized that causality is actually a real physical notion called Information Flow (IF) and can be rigorously derived from first principles (Liang 2014, 2016). Specifically, IF describes the contribution of one variable’s entropy per unit of time to increasing the marginal entropy of another variable and reflects the magnitude, kind, and direction of their cause-effect relationship. This offers a promising way to systematically formulate causality analysis in a quantitative sense based on a rigorous theoretical framework, enabling its universal applicability across different disciplines. The fundamental equations for calculating the IF between two or more system variables are as follows.

Consider a two-dimensional (2-D) dynamical system:

$$\begin{aligned} d {\varvec{x}} = {\varvec{F}}({\varvec{x}},t)dt + {\varvec{B}}( {\varvec{x}},t)d {\varvec{w}}, \end{aligned}$$
(3)

where \({\varvec{F}} = (F_{1}, F_{2})\) is the vector of deterministic components, \({\varvec{x}} = (x_{1}, x_{2}) \in {\mathbb {R}}^{2}\) is the vector of state variables, \({\varvec{w}} = (w_{1}, w_{2})\) is a standard 2-D Wiener process, and \({\varvec{B}} = (b_{ij})\) is the matrix of perturbation amplitudes (Liang 2008). For the aforementioned system, the IF from \(x_{2}\) to \(x_{1}\) is

$$\begin{aligned} T_{2 \rightarrow 1} = - E \left( \frac{1}{\rho _{1}} \frac{\partial F_{1} \rho _{1}}{\partial x_{1}} \right) + \frac{1}{2} E \left( \frac{1}{\rho _{1}} \frac{\partial ^{2} \rm{g}_{11} \rho _{1}}{\partial x_{1}^{2}} \right) , \end{aligned}$$
(4)

where \(\rho \left( t; x_{1}, x_{2}\right)\) is the joint probability density function, \(\rho _{1} \left( t; x_{1}\right) = \int _{{\mathbb {R}}} \rho dx_{2}\) is the marginal density of \(x_{1}\), \(\rm{g}_{11} = \displaystyle \sum \nolimits _{k=1}^2 b_{1k}^{2}\), and E is the expectation with respect to \(\rho\). An important property of (4) is the satisfaction of the nil causality principle, according to which \(x_{2}\) is not causal to \(x_{1}\) (\(T_{2 \rightarrow 1}=0\)) if the evolution of the latter is independent of the former (neither \(F_{1}\) nor \(\rm{g}_{11}\) depends on \(x_{2}\)) (Liang 2016).

As a further step, Liang (2014) established that under a linearity assumption, the IF of two system variables can be estimated from only two time series, say, \(X_{1}\) and \(X_{2}\), using the following maximum-likelihood estimator of (4):

$$\begin{aligned} T_{2 \rightarrow 1} = \frac{C_{11} C_{12} C_{2,d1} - C_{12}^{2} C_{1,d1}}{C_{11}^2 C_{22} - C_{11} C_{12}^2}, \end{aligned}$$
(5)

where \(C_{ij}\) is the sample covariance between \(X_{i}\) and \(X_{j}\), and \(C_{i,dj}=\overline{(X_{i}-\overline{X_{i}})(\dot{X_{j}}-\overline{\dot{X_{j}}})}\) is the sample covariance between \(X_{i}\) and the difference approximation of \(\frac{\text {d}X_{j}}{\text {d}t}\), which is computed using the Euler forward scheme: \({\dot{X}}_{j,n} = \left( X_{j,n+k}-X_{j,n}\right) /(k\Delta t)\), with \(k \ge 1\) some integer. The IF in the opposite direction, i.e., \(T_{1 \rightarrow 2}\), is obtained by swapping indices 1 and 2. Besides, writing (5) as a function of correlation and/or correlation-like quantities gives

$$\begin{aligned} T_{2 \rightarrow 1} = \frac{r}{1-r^2}(\acute{r}_{\small 2,d1} - r \, \acute{r}_{\small 1,d1}), \end{aligned}$$
(6)

where \(r = C_{12}/ \sqrt{C_{11}C_{22}}\) is the sample correlation coefficient between \(X_{1}\) and \(X_{2}\), and \(\acute{r}_{\small i,dj} = C_{i,dj}/ \sqrt{C_{ii}C_{jj}} \, (i,j = 1,2)\) is the "correlation" between \(X_{i}\) and \(\dot{X_{j}}\) but normalized with the variances of \(X_{i}\) and \(X_{j}\). According to (6), when two variables are causally related (\(T_{2 \rightarrow 1} \ne 0\)), they are correlated (\(r \ne 0\)). However, the opposite does not hold. This property helps distinguish authentic causal relationships from spurious correlations.
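For illustration, the estimator (5) can be transcribed directly (a sketch; covariances are computed as mean products, and the series are truncated by k samples to align with the Euler forward derivative):

```python
import numpy as np

def bivariate_if(x1, x2, k=1, dt=1.0):
    """Maximum-likelihood estimator of the IF T_{2->1}, eq. (5).

    x1, x2: 1-D time series; the derivative of x1 is approximated with the
    Euler forward scheme dX1[n] = (x1[n+k] - x1[n]) / (k*dt).
    """
    cov = lambda a, b: np.mean((a - a.mean()) * (b - b.mean()))
    dx1 = (x1[k:] - x1[:-k]) / (k * dt)
    x1, x2 = x1[:-k], x2[:-k]                # align with the derivative series
    c11, c22, c12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    c1d1, c2d1 = cov(x1, dx1), cov(x2, dx1)  # C_{1,d1}, C_{2,d1}
    return (c11 * c12 * c2d1 - c12**2 * c1d1) / (c11**2 * c22 - c11 * c12**2)
```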

Recently, (5) was generalized, resulting in a simple formula for causality analysis among multiple variables (Liang 2021). In detail, given a dataset of d time-series variables, the IF from \(X_{2}\) to \(X_{1}\) is

$$\begin{aligned} {\hat{T}}_{2 \rightarrow 1} = \frac{1}{\det C} \cdot \displaystyle \sum \limits _{j=1}^d \Delta _{2j} C_{j,d1} \cdot \frac{C_{12}}{C_{11}}, \end{aligned}$$
(7)

where \(C_{j,d1}\) is the sample covariance between \(X_{j}\) and \({\dot{X}}_{1}\), and \(\Delta _{ij}\) are the cofactors of the covariance matrix C. An algorithm (Algorithm 1) for multivariate time-series causality analysis is developed based on (7). As observed from the algorithm, a statistical significance test is conducted to draw safe conclusions about the actual causal relationships for each pair of variables, estimated by \({\hat{T}}_{i \rightarrow j}\).

Algorithm 1 Causality analysis among multiple time series based on the estimator (7)
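The core of Algorithm 1, the estimator (7), can be sketched as follows; the statistical significance test is omitted for brevity, and the cofactor sum is evaluated through the matrix inverse via the identity \(\Delta _{ij}/\det C = (C^{-1})_{ji}\):

```python
import numpy as np

def multivariate_if(X, cause, effect, k=1, dt=1.0):
    """IF rate T_{cause -> effect} among d series, eq. (7).

    X: array of shape (d, N) whose rows are the time series. The statistical
    significance test of Algorithm 1 is not included in this sketch.
    """
    d, _ = X.shape
    dXe = (X[effect, k:] - X[effect, :-k]) / (k * dt)   # Euler forward derivative
    Xt = X[:, :-k]
    C = np.array([[np.mean((Xt[i] - Xt[i].mean()) * (Xt[j] - Xt[j].mean()))
                   for j in range(d)] for i in range(d)])
    Cd = np.array([np.mean((Xt[j] - Xt[j].mean()) * (dXe - dXe.mean()))
                   for j in range(d)])                   # C_{j,d(effect)}
    # Delta_{cause,j} / det(C) equals the (j, cause) entry of C^{-1}
    return (np.linalg.inv(C)[:, cause] @ Cd) * C[effect, cause] / C[effect, effect]
```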

Nevertheless, the importance of a causal relationship must be assessed beyond merely detecting the presence of causality between variables. For this purpose, normalization of the estimated significant IF rates has been proposed, with the normalizer of \({\hat{T}}_{2 \rightarrow 1}\) being

$$\begin{aligned} {\hat{Z}}= \Bigg |\widehat{\left( \frac{\text {d}H_1^*}{\text {d}t}\right) } \Bigg |+ \displaystyle \sum \limits _{j=2}^d |{\hat{T}}_{j \rightarrow 1} |+ \Bigg |\widehat{\left( \frac{\text {d}H_1^{noise}}{\text {d}t}\right) } \Bigg |, \end{aligned}$$
(8)

where

$$\begin{aligned}{} & {} \widehat{\left( \frac{\text {d}H_1^*}{\text {d}t}\right) } = \frac{1}{\det C} \cdot \displaystyle \sum \limits _{j=1}^d \Delta _{1j} C_{j,d1}, \end{aligned}$$
(9)
$$\begin{aligned}{} & {} \widehat{\left( \frac{\text {d}H_1^{noise}}{\text {d}t}\right) } = \frac{1}{2} \frac{\hat{\rm{g}}_{11}}{C_{11}}, \end{aligned}$$
(10)

and \(\hat{\rm{g}}_{11} = \frac{Q_{N,1}\Delta t}{N}\). Finally, the normalized IF from \(X_{2}\) to \(X_{1}\) is:

$$\begin{aligned} \tau _{2 \rightarrow 1} = \frac{T_{2 \rightarrow 1}}{{\hat{Z}}} \end{aligned}$$
(11)

which lies in \([-1,1]\). When \(|\tau _{2 \rightarrow 1} |= 1\), \(X_{2}\) has the greatest causal impact on \(X_{1}\). Furthermore, simply swapping the indices in the above equations yields \(\tau _{1 \rightarrow 2}\).

2.3 L-K IF analysis on binary time series

Previous studies utilizing L-K IF analysis to identify causal relations have not focused on discrete-valued signals that take a few values, such as a binary time series. However, real-world datasets, particularly in industrial settings, often comprise binary variables such as the state of a proximity sensor or button. Consequently, an experiment was conducted to verify the efficiency of the causal inference tool in effectively handling binary data.

Consider three series \(X_{1}\), \(X_{2}\), and \(X_{3}\), generated by three autoregressive processes:

$$\begin{aligned}{} & {} X_{1}(n+1) = 0.1 + 0.4 X_{1}(n) - 0.8 X_{3}(n) + e_{1}(n+1), \end{aligned}$$
(12a)
$$\begin{aligned}{} & {} X_{2}(n+1) = 0.7 + 0.7 X_{3}(n) - 0.8 X_{2}(n) + e_{2}(n+1), \end{aligned}$$
(12b)
$$\begin{aligned}{} & {} X_{3}(n+1) = 0.5 + 0.5 X_{3}(n) + e_{3}(n+1), \end{aligned}$$
(12c)

where \(X_{3}\) is the confounder of the other two (\(X_{3} \rightarrow X_{1}\) and \(X_{3} \rightarrow X_{2}\)) without any other causality, and the errors \(e_{1} \sim N(0,1)\), \(e_{2} \sim N(0,1)\), and \(e_{3} \sim N(0,1)\) are independent. After initializing the variables with random values and generating 10,000 samples for each, L-K IF analysis was performed. Table 1a depicts the derived IF rates and their respective confidence intervals at the 99% confidence level. The results demonstrate that the only significant IF rates are \(T_{3 \rightarrow 1}\) and \(T_{3 \rightarrow 2}\), as they lie within the intervals [0.1975, 0.2091] and [0.0613, 0.0657], respectively, which is in agreement with the actual relations. The remaining IF rates take both negative and positive values within their confidence intervals; thus, they cannot be distinguished from zero. It is noteworthy that generating pseudorandom values can lead to slightly different results for different series. Nevertheless, the mean is expected to converge to the same value when an ensemble of series is examined. Subsequently, the experiment was repeated using the binarized time series, that is, the series discretized into 0 or 1. After repeating the L-K IF analysis (Table 1b), it is concluded that the proposed technique reliably captures the causal relations in a qualitative sense, even when the time series have been binarized.

Table 1 IF rates for the series generated with (12) and their respective confidence intervals (99% confidence level)
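This experiment can be reproduced along the following lines, reusing the multivariate_if sketch from Sect. 2.2 (the random seed and the median-based binarization threshold are illustrative choices; the confidence intervals of Table 1 are not computed here):

```python
import numpy as np

rng = np.random.default_rng(42)              # arbitrary seed
N = 10_000
X = np.zeros((3, N))
X[:, 0] = rng.random(3)                      # random initialization
for n in range(N - 1):                       # eqs. (12a)-(12c)
    e = rng.standard_normal(3)
    X[0, n + 1] = 0.1 + 0.4 * X[0, n] - 0.8 * X[2, n] + e[0]
    X[1, n + 1] = 0.7 + 0.7 * X[2, n] - 0.8 * X[1, n] + e[1]
    X[2, n + 1] = 0.5 + 0.5 * X[2, n] + e[2]

pairs = [(2, 0), (2, 1), (0, 1), (1, 0), (0, 2), (1, 2)]
for cause, effect in pairs:                  # only T_{3->1}, T_{3->2} should stand out
    print(f"T_{cause + 1}->{effect + 1} = {multivariate_if(X, cause, effect):+.4f}")

# repeat on the binarized series (thresholded here at the per-series median)
Xb = (X > np.median(X, axis=1, keepdims=True)).astype(float)
for cause, effect in pairs:
    print(f"binary T_{cause + 1}->{effect + 1} = {multivariate_if(Xb, cause, effect):+.4f}")
```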

3 Proposed methodology

Figure 1 illustrates the proposed methodology, outlining the major phases of constructing an FCM-based model and interpreting its predictions.

Fig. 1 Proposed methodology scheme

3.1 Data pre-processing

Once data are collected from the target system, such as a manufacturing system, suitable data pre-processing techniques are employed. Initially, since FCMs can only handle numeric data, categorical variables, including class attributes in a classification problem, need to be encoded. The numerical representative (\(a_{j} \in [0,1]\)) for each class label (\(class_{j}\)) is calculated using the following formula:

$$\begin{aligned} a_{j} = \frac{j-1}{m-1}, \end{aligned}$$
(13)

where \(j \in \{1,\dotsc ,m\}\) and \(m \ge 2\) is the number of class labels.

In the context of FCMs, an essential step is the assignment of fuzzy values to concepts, known as data fuzzification. Fuzzification is practically considered a data normalization procedure that computes the concepts’ initial activation values for each data observation. Traditional normalization techniques include min-max and \(z\)-score normalization; however, they present some weak points, such as out-of-range errors when a new value falls outside the range observed during training, and susceptibility to outliers. Furthermore, min-max normalization yields different normalizations for different data separations, such as in cross-validation. To address these issues, the Generalized Logistic (GL) algorithm was utilized in this study for data normalization (Cao et al. 2016). This algorithm makes no assumptions about the distribution of variables but instead uses a generalized logistic function to approximate the cumulative distribution function (CDF) of each variable. The main advantage of this method is its robustness against outliers. The algorithm maps values from \((-\infty , \infty )\) to the interval [0, 1].
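As a rough illustration of the idea (not the exact algorithm of Cao et al. 2016, which fits a generalized logistic function to each variable's empirical CDF), the following sketch maps values to [0, 1] with a standard logistic CDF parameterized by robust location and scale estimates; both parameterizations are assumptions made for illustration:

```python
import numpy as np

def gl_normalize(x, ref=None):
    """Map values from (-inf, inf) to [0, 1] with a logistic CDF.

    Simplified stand-in for the GL algorithm (Cao et al. 2016): a standard
    logistic CDF with robust location/scale (median and IQR) replaces the
    fitted generalized logistic function. Fitting on a fixed reference
    sample `ref` keeps the mapping identical across data splits, unlike
    min-max normalization.
    """
    ref = x if ref is None else ref
    q25, q75 = np.percentile(ref, [25, 75])
    scale = (q75 - q25) / 1.349 if q75 > q25 else 1.0   # IQR -> std (normal)
    return 1.0 / (1.0 + np.exp(-(np.asarray(x) - np.median(ref)) / scale))
```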

3.2 Information flow-based fuzzy cognitive map (IF-FCM)

After preparing the data, the next step is to define the FCM architecture, which determines the type and number of concepts. In the context of classification, the literature presents two main FCM architectures, which differ both in the number of output concepts (\(\rm{OCs}\)) and in how they assign a class label to each data instance. The first architecture, known as the class-per-output architecture (CpO), maps each class label to a separate \(\rm{OC}\), with m total outputs. The predicted class is then indicated by the \(\rm{OC}\) with the highest activation value in the last iteration of the reasoning process. In contrast, in the second architecture, referred to as the single-output architecture (SO), the class attribute is mapped to a single \(\rm{OC}\) \(C_{n}\). The estimated activation value of \(C_{n}\) is then assigned to one of the class labels by dividing the activation interval ([0, 1] or \([-1,1]\)) into partitions, each corresponding to a class label (Papakostas et al. 2008). The classification process in an FCM-SO can be summarized as follows:

  • Step 1: Consider the kth data observation in the dataset as the initial state vector

    $$\begin{aligned} {\textbf{A}}_{k}^{(0)} = [A_{1k}^{(0)}, A_{2k}^{(0)}, \dotsc , A_{nk}^{(0)}=0], \end{aligned}$$
    (14)

    where \(A_{ik}^{(0)} \in [0,1]\), \(i \in \{1,2,\dotsc ,n-1\}\), are the initial activation values of the input concepts, and \(A_{nk}^{(0)}\) is the initial activation value of the \(\rm{OC}\).

  • Step 2: Applying the employed reasoning rule recurrently, calculate the state vector

    $$\begin{aligned} {\textbf{A}}_{k}^{(l)} = [A_{1k}^{(l)}, A_{2k}^{(l)}, \dotsc , A_{nk}^{(l)}], \end{aligned}$$
    (15)

    in the steady state l, where \(|A_{ik}^{(l)}-A_{ik}^{(l-1)}|<\varepsilon\), with \(\varepsilon\) being a small positive number (usually \(10^{-5}\)) and \(i ~\in ~\{1,2,\dotsc ,n\}\). The maximum number of iterations is denoted by T and is defined by the user. \(A_{nk}^{(l)}\) is the activation value of the \(\rm{OC}\) in the last iteration.

  • Step 3: Once the reasoning process is complete, assign \(A_{nk}^{(l)}\) to one of the numerical representatives of the class labels. This is accomplished using \(m-1\) defined decision thresholds that divide the activation interval into m partitions. Therefore, depending on the range \(A_{nk}^{(l)}\) falls into, the FCM predicts the corresponding class label. To determine the decision thresholds, a "threshold-moving" approach is employed, which identifies the optimal value based on a predefined evaluation metric; a sketch of this step follows the list. In this paper, we locate the decision threshold at the maximum value of the Geometric Mean (16), which describes the balance of classification performance between the majority and minority classes, allowing us to determine the ideal position of the classification hyperplane (Kubat et al. 1997).

    $$\begin{aligned} \text {G-mean} = \sqrt{TPR \times TNR} \end{aligned}$$
    (16)
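A minimal sketch of the threshold-moving step for the binary case (m = 2) is given below; here, scores are the steady-state output-concept activations \(A_{nk}^{(l)}\) over the training observations, and the grid resolution is an arbitrary choice:

```python
import numpy as np

def best_threshold(scores, y_true, grid=np.linspace(0, 1, 101)):
    """Threshold moving: pick the decision threshold maximizing G-mean, eq. (16)."""
    best_thr, best_g = 0.5, -1.0
    for thr in grid:
        y_pred = (scores >= thr).astype(int)
        tpr = np.mean(y_pred[y_true == 1] == 1)   # true positive rate
        tnr = np.mean(y_pred[y_true == 0] == 0)   # true negative rate
        g = np.sqrt(tpr * tnr)
        if g > best_g:
            best_thr, best_g = thr, g
    return best_thr
```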

In this study, the SO architecture was selected because of its lower parameter count and computational requirements. Additionally, a comprehensive comparison of the two architectures in prior research (Papakostas et al. 2012) concluded that the SO architecture outperformed the CpO architecture on seven of the eight datasets analyzed.

3.3 IF-FCM learning

After determining the architecture, a learning procedure is performed to adapt the FCM behavior based on the collected data (Fig. 1). The proposed approach is divided into two phases. In the first phase (Training phase 1), Algorithm 1 is executed to determine the causal relationships between the dataset variables. The algorithm is computationally efficient, even when the scales of the original variables differ greatly; therefore, raw encoded data are used.

In the second phase (Training phase 2), the parameters that define the FCM response are tuned, including the weights and the parameters associated with the activation function and reasoning rule. Consequently, a challenging question arises regarding the choice of the appropriate reasoning rule and activation function. Previous studies have shown that using (1) in conjunction with the activation functions mentioned in Sect. 2.1 often leads the FCM to converge to the same equilibrium point regardless of the initial state vector (Boutalis et al. 2009). However, this behavior is undesirable in forecasting tasks, such as anomaly detection, as the model predicts only one class label. Furthermore, the use of bounded activation functions can lead to saturation issues, where the activation values of concepts tend to approach the lower or upper boundary of the specified interval when they receive a strong negative or positive influence, respectively (Nápoles et al. 2022a). Finally, the sigmoid function distorts the simulation results by activating unexpected concepts based on their received influence, as it returns 0.5 when its argument is zero (Mpelogianni and Groumpos 2018).

Recently, to solve the issues mentioned above, a new rule called quasi nonlinear reasoning rule was proposed, which involves a re-scaled activation function acting as a normalizer (Nápoles et al. 2022b), and is mathematically expressed as

$$\begin{aligned} A_{i}^{(t+1)}=\underbrace{ \varphi f \left( \displaystyle \sum \limits _{\begin{array}{c} j=1 \\ j\ne i \end{array}}^n A_{j}^{(t)}w_{ji}\right) }_\text {nonlinear component} + \underbrace{(1-\varphi )A_{i}^{(0)}}_\text {linear component}, \end{aligned}$$
(17)

where the parameter \(\varphi \in [0,1]\) controls the nonlinearity of the reasoning rule, and \(f(\cdot ): {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{n}\) is the activation function defined as

$$\begin{aligned} f({\textbf{X}})= \left\{ \begin{array}{ll} \frac{{\textbf{X}}}{{\Vert {\textbf{X}} \Vert }_{2}}, &{} {\textbf{X}} \ne \overrightarrow{0} \\ 0, &{} otherwise \\ \end{array} \right. \end{aligned}$$
(18)

such that \({\Vert \cdot \Vert }_{2}\) denotes the Euclidean norm. Using a matrix-like notation, (17) is rewritten as

$$\begin{aligned} {\textbf{A}}^{(t)}=\varphi f \left( {\textbf{A}}^{(t-1)}{\textbf{W}}\right) + (1-\varphi ){\textbf{A}}^{(0)}. \end{aligned}$$
(19)
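A sketch of the quasi nonlinear reasoning rule, combining (18) and (19), is given below (tolerance and iteration cap are illustrative choices):

```python
import numpy as np

def quasi_nonlinear_reason(A0, W, phi, eps=1e-5, max_iter=100):
    """Quasi nonlinear reasoning rule, eqs. (18)-(19).

    phi in [0, 1] controls the nonlinearity; phi < 1 mixes the initial
    state vector back in at every step.
    """
    def f(X):                                      # eq. (18): Euclidean normalizer
        norm = np.linalg.norm(X)
        return X / norm if norm > 0 else np.zeros_like(X)

    A0 = np.asarray(A0, dtype=float)
    A = A0.copy()
    for _ in range(max_iter):
        A_next = phi * f(A @ W) + (1 - phi) * A0   # eq. (19)
        if np.max(np.abs(A_next - A)) < eps:
            return A_next, True
        A = A_next
    return A, False
```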

In the study conducted by Nápoles et al. (2022a), the convergence properties of the above reasoning mechanism were thoroughly examined. Through a mathematical proof by contradiction, it was concluded that an FCM employing (18) and (19) does not have a unique equilibrium point for all initial state vectors when \(\varphi \in [0, 1)\). They also explored the case of \(\varphi = 1\) based on the symmetry and diagonalizability of the derived \({\textbf{W}}\). In more detail, using the appropriate matrix properties, the authors equated the reasoning rule with the power iteration method formula and concluded that for a diagonalizable weight matrix \({\textbf{W}}\) with eigenvalues \(|\lambda _{1}|\ge |\lambda _{2}|\ge \cdots \ge |\lambda _{n}|\), if an initial stimulus \(u_{0}\) has a nonzero projection along an eigenvector associated with \(\lambda _{1}\), then \(u_{k}\) converges to such an eigenvector as \(k \rightarrow \infty\) (Mises and Pollaczek-Geiringer 1929). In particular, when \(\lambda _{1}\) is real, the method converges to a unique fixed point. Nevertheless, because asymmetry is a distinguishing characteristic of causation, the convergence of the FCM for \(\varphi = 1\) should be analyzed without relying on the diagonalizability of \({\textbf{W}}\). In the context of the power iteration method, studies have demonstrated that even if \({\textbf{W}}\) is not diagonalizable, the same outcomes are achieved, albeit with slower convergence (Leader 1991). Therefore, the case of \(\varphi = 1\) enables modeling scenarios in which the FCM converges to a unique fixed-point attractor without the need for symmetry in \({\textbf{W}}\).

In this paper, we utilize the reasoning rule presented in (19) and the activation function of (18). The learning algorithm eventually adjusts the weights of the FCM and the controllable parameter \(\varphi\). However, the difference between the proposed method and existing methods is that the normalized significant IF rates computed in Training phase 1 are imposed as constraints in Training phase 2 to avoid capturing spurious correlations. In detail, only the weights of edges with significant estimated IF rates are tunable parameters, while the remaining weights are set to zero a priori. This approach improves the generalizability and interpretability of the developed FCM while also shortening the training time by shrinking the dimensionality of the optimization problem. Therefore, a candidate solution is encoded as a (\(\rm{SIFs}+1\))-dimensional vector, where \(\rm{SIFs}\) is the number of significant IF rates and the extra dimension corresponds to the parameter \(\varphi\):

$$\begin{aligned} x = [\varphi , w^{(1)}, w^{(2)}, \dotsc , w^{(\rm{SIFs})}]. \end{aligned}$$
(20)

For FCM learning, which involves determining the optimal weight values and \(\varphi\), we have chosen the Particle Swarm Optimization (PSO) metaheuristic algorithm due to its effectiveness in the literature (Papageorgiou et al. 2005; Bas et al. 2022). PSO starts with a random population of candidate solutions called particles. Through iterations, the particles are evaluated using a defined cost function and updated accordingly. The process continues until a satisfactory solution is found or a stopping criterion is met, such as the maximum number of function evaluations. The cost function used in this study is defined as follows:

$$\begin{aligned} {\mathcal {E}}(x) = \alpha _{1} G(x) + \alpha _{2} H(x), \end{aligned}$$
(21)

where x represents a candidate solution, \(0 \le G(\cdot ) \le 1\) denotes the FCM’s mean absolute prediction error (22), and \(0 \le H(\cdot ) \le 1\) denotes the accumulated dissimilarity between two consecutive FCM state vectors (23). The parameters \(\alpha _{1}, \alpha _{2} \in [0, 1]\) indicate the relevance of the FCM’s prediction accuracy versus stability, for which \(\alpha _{1} + \alpha _{2} = 1\), ensuring that the cost function is always bounded in the interval [0, 1].

$$\begin{aligned}{} & {} G(x) = \frac{1}{K} \displaystyle \sum \limits _{k=1}^K \left| Y_{k}-A_{n,k}^{(l)} \right| \end{aligned}$$
(22)
$$\begin{aligned}{} & {} H(x) = \displaystyle \sum \limits _{k=1}^K \displaystyle \sum \limits _{i=1}^n \displaystyle \sum \limits _{t=1}^l \frac{2\,\omega _{t} (A_{ik}^{(t)}-A_{ik}^{(t-1)})^{2}}{K \, n \, (T-1)} \end{aligned}$$
(23)

In (22) and (23), K represents the number of training observations, n is the number of FCM concepts, \(Y_{k}\) is the expected value of the output concept in k-th data observation, and \(\omega _{t}=\frac{t}{T}\) is the importance of the t-th iteration in the reasoning process, which increases linearly with the number of iterations. The rationale behind \(\omega _{t}\) is that the learning algorithm should focus primarily on stabilizing the last iterations, allowing greater flexibility at the beginning (Nápoles et al. 2016).
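For clarity, the cost function can be sketched as follows, assuming the full reasoning trajectories \(A_{ik}^{(t)}\) are recorded for every training observation and the output concept is the last one, matching the SO architecture:

```python
import numpy as np

def fcm_cost(history, Y, a1=0.8, a2=0.2):
    """Cost E(x) = a1*G(x) + a2*H(x), eqs. (21)-(23); assumes T >= 2.

    history: (K, T+1, n) recorded state vectors A_k^(t) for K training
    observations over T reasoning iterations; Y: (K,) expected values of
    the output concept.
    """
    K, T_plus_1, n = history.shape
    T = T_plus_1 - 1
    G = np.mean(np.abs(Y - history[:, -1, -1]))              # eq. (22)
    diffs = np.diff(history, axis=1) ** 2                     # (A^(t)-A^(t-1))^2
    w_t = np.arange(1, T + 1) / T                             # omega_t = t/T
    H = np.sum(2.0 * w_t[None, :, None] * diffs) / (K * n * (T - 1))  # eq. (23)
    return a1 * G + a2 * H
```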

3.4 Interpretation of FCM’s predictions

The proposed FCM can explain its predictions, supporting two levels of interpretability: (a) global and (b) local. At the global level, IF-FCM provides a holistic view of the influence of each input variable in the decision-making process, whereas, at the local level, it provides numeric explanations for individual predictions by calculating the importance of each input feature to this particular decision. The ability to find the features that play a critical role in classifying a sample as an anomaly enables root cause analysis (Brito et al. 2022).

3.4.1 Global interpretability

The relevant literature demonstrates several methods for examining the overall contribution of each feature to the decision-making process of an FCM. The most widespread method is based on graph theory and states that the concept’s importance can be measured via its degree of centrality (Kosko 1986):

$$\begin{aligned} CEN(C_{i})=in(C_{i})+out(C_{i}), \end{aligned}$$
(24)

where \(in(C_{i})\) and \(out(C_{i})\) refer to the number of incoming and outgoing edges of each concept \(C_{i}\), respectively. The most significant feature of the FCM is the one with the largest total number of concepts acting on it and concepts affected by it.
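Given the learned weight matrix \({\textbf{W}}\), (24) reduces to counting nonzero entries (a minimal sketch; weighting edges by \(|w_{ij}|\) is a common variant, but the plain edge count shown here follows the definition above):

```python
import numpy as np

def centrality(W):
    """Degree centrality CEN(C_i) = in(C_i) + out(C_i), eq. (24)."""
    edges = W != 0                                  # learned edges (zero diagonal)
    return edges.sum(axis=0) + edges.sum(axis=1)    # incoming + outgoing counts
```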

3.4.2 Local interpretability

To explain the decision for a given data instance, FCMs provide a dynamic, semi-quantitative method that analyzes the propagation of effects from one concept to another using a plot of the activation values of all concepts across iterations (Barbrook-Johnson and Penn 2022). The final activation values of the input concepts after FCM stabilization reflect their contribution to the prediction; concepts with larger absolute values are interpreted as more important or more influenced/influential (Soler et al. 2012; Liu et al. 2020). Furthermore, this plot enables the investigation of how relative changes in the initial concept values impact the reasoning process, providing insights into whether changes accelerate, stabilize, or diminish over time.

4 Experimental results

In this section, we present the results of numerical simulations designed to assess the efficacy of the proposed methodology. First, we provide a detailed description of the dataset used in the simulation. Next, we outline the application of the proposed methodology to the dataset. Finally, we compare our model with state-of-the-art FCM-based models in terms of their prediction accuracy, interpretability, and aggregate power.

4.1 Dataset description

One of the standard datasets used in industrial process anomaly detection is Matzka’s PMAI4I dataset, which we adopted in our experimental evaluation (Matzka 2020). This synthetic yet realistic dataset represents industrial predictive maintenance data and has been widely recognized as a reliable benchmark for evaluating various XAI methods (Ghasemkhani et al. 2023; Mylonas et al. 2023). It contains 10,000 samples covering a diverse range of variables to provide a holistic view of the industrial data. These variables include one categorical variable (product quality \(\in\) {"low", "medium", "high"}), five numerical variables (air temperature, process temperature, rotational speed, torque, and tool wear), and a binary target variable indicating machine failure ("0" = healthy, "1" = faulty). For each faulty sample, the failure type is known to be one of the following:

  1. Tool wear failure (TWF): the tool fails at a random tool wear time between 200 and 240 min.

  2. Heat dissipation failure (HDF): if the difference between the air and the process temperature is less than 8.6 K while the tool’s rotational speed is less than 1380 rpm, a failure is caused.

  3. Power failure (PWF): if the required power (i.e., the product of torque and rotational speed in rad/s) is less than 3500 W or greater than 9000 W, the system fails.

  4. Overstrain failure (OSF): the process fails by overstrain when the product of tool wear and torque exceeds 11,000 minNm for low quality (L) products, 12,000 minNm for medium quality (M), and 13,000 minNm for high quality (H).

  5. Random failures (RNF): regardless of process parameter values, there is a 0.1% probability of failure.

If any of the mentioned failure modes is present, the process fails, and the machine failure value is set to one. However, during training, the FCM receives only the input variable values and system condition information without knowing the root cause of the fault. Hence, the FCM-based classifier aims to achieve two objectives: first, detecting the presence of anomalies in the analyzed manufacturing system, and second, identifying the most significant input variable(s) for each true positive prediction, which are likely to be responsible for the fault. This is accomplished by leveraging the inherent interpretability characteristics of the FCM.
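For reference, the deterministic failure rules described above can be transcribed as follows (a sketch; RNF is omitted because it is purely random, and TWF is flagged as the at-risk wear window rather than the random failure event itself):

```python
import numpy as np

def failure_modes(air_T_K, proc_T_K, speed_rpm, torque_Nm, wear_min, quality):
    """Deterministic failure rules of the PMAI4I dataset.

    `quality` is 'L', 'M', or 'H' (the product quality variable).
    """
    power_W = torque_Nm * speed_rpm * 2.0 * np.pi / 60.0   # rpm -> rad/s
    osf_limit_minNm = {"L": 11_000, "M": 12_000, "H": 13_000}[quality]
    return {
        "TWF": 200 <= wear_min <= 240,
        "HDF": (proc_T_K - air_T_K) < 8.6 and speed_rpm < 1380,
        "PWF": power_W < 3500 or power_W > 9000,
        "OSF": wear_min * torque_Nm > osf_limit_minNm,
    }
```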

4.2 Simulations execution

Following the methodology described in Sect. 3, the data are first pre-processed. This includes encoding categorical features, such as product quality, where each category value is assigned an integer (e.g., "low" is represented as 0, "medium" as 1, and "high" as 2). In addition, because the dataset is imbalanced, an SMOTE-based algorithm is used to address this issue, generating artificial instances of the minority class "1" (Sridhar and Sanagavarapu 2021). In particular, this study applies the hybrid algorithm SMOTE-ENN, which merges undersampling and oversampling using Edited Nearest Neighbors and SMOTE, respectively. This combination strengthens the bias towards the minority class while weakening it towards the majority class, resulting in improved overall performance compared to using these techniques individually. Finally, data fuzzification is performed to prepare the data for training and decision-making.

4.2.1 Training phase 1

By applying Algorithm 1 to the dataset and subsequently normalizing the significant IF rates, we obtain the results presented in Table 2. These findings align with the dataset description since:

  1. The product quality influences the wear time of the tool, leading to the occurrence of TWF and OSF.

  2. The process temperature at each time step is derived from the air temperature samples, indicating a causal relationship between these variables.

  3. Both air temperature and process temperature contribute to the emergence of HDF in the system, establishing an information flow from these variables to the target variable.

Table 2 Significant IF rates in the PMAI4I dataset

Nonetheless, beyond the obvious links, the algorithm discovered that:

  1. Rotational speed and torque do not directly affect machine failure but act indirectly through tool wear.

  2. Air temperature has a causal influence on tool wear.

  3. There exists a feedback loop from machine failure to tool wear.

4.2.2 Training phase 2

As mentioned previously, during Training phase 2, the weights of all FCM edges with insignificant IF rates were set to zero before starting the PSO execution. According to Table 2, the final number of tunable parameters is nine weights plus the reasoning rule parameter \(\varphi\). For the PSO parameter initialization, a population size of 100 was chosen, and the cost function parameters \(\alpha _{1}\) and \(\alpha _{2}\) were set to 0.8 and 0.2, respectively. Additionally, to achieve a more accurate solution, a hybrid function was employed to continue the optimization after the termination of the original solver. The algorithm was implemented using the MATLAB Global Optimization Toolbox.

4.2.3 Experimental setup

After training the IF-FCM, we compared its predictive and explanatory power with several state-of-the-art FCM-based models. These include FCM-A (Froelich 2017), FCMBinaryClassifier (FCMB) (Szwed 2021), FCMMulticlassClassifier (FCMMC) (Szwed 2021), Long-Term Cognitive Network (LTCN) (Nápoles et al. 2022b), and a Fuzzy Cognitive Map using the "Stability based on Sigmoid Functions" method (FCM-SSF) (Nápoles et al. 2017). Furthermore, to emphasize the significance of L-K IF analysis in enhancing FCM performance, we developed two additional models that utilize (18) and (19) for decision-making while being trained through PSO. However, their Training phase 1 varies. In the first model, called correlation coefficient-based FCM (CCFCM), the weights correspond to the correlation coefficients between variables with a p value less than 0.05. In the second model (FCM-FC), all weights are included without performing initial data analysis to determine the relationships between concepts.

To avoid possible issues such as overfitting and ensure the generalizability of the models, stratified 10-fold cross-validation was used for the simulations. Simultaneously, hyper-parameter tuning was conducted to optimize the performance of each model, considering the variables displayed in Table 3. As for FCM-SSF, 91 maps were randomly generated, varying in network density from 10% to 100%, and the model delivering the best performance was selected.

Table 3 List of hyper-parameters for model tuning

4.2.4 IF-FCM’s predictive power

Table 4 displays the average prediction accuracy, the "Area under the ROC Curve" (AUC) score, and the Cohen’s kappa coefficient for each model across all folds. According to the results, the LTCN, FCMB, and FCMMC exhibit the highest performance among the cognitive networks, followed by FCM-FC and IF-FCM. In contrast, FCM-A demonstrates the lowest performance. The poor performance of FCM-A can be attributed to the algorithm proposed by Froelich (2017), where the loop for computing the classification error of each candidate threshold focuses only on minimizing false negatives, rather than achieving an optimal balance between false negatives and false positives. This issue also explains the discrepancy between its accuracy and AUC score. Since AUC is threshold-invariant and provides an aggregate performance measure across all possible decision thresholds, a combination of low accuracy and high AUC suggests that the selected decision threshold is not optimal.

The FCM-SSF model is based on the CpO architecture. As mentioned in Sect. 3.2, this architecture selects the \(\rm{OC}\) with the highest activation value in the final iteration to make its decision. This mechanism inherently lacks a decision threshold, which is a pivotal component in computing the AUC score. Specifically, the AUC score is a performance metric that evaluates the ability of a model to discriminate between the positive and negative classes. The ROC curve is constructed by plotting the True Positive Rate (sensitivity) against the False Positive Rate (1-specificity) at various decision threshold levels, typically ranging from 0 to 1. Different points on the ROC curve are obtained by varying this threshold, and the AUC score is the area under this curve. The essence of the AUC score lies in its ability to assess the model’s performance across all possible thresholds. Therefore, given that the CpO architecture of the FCM-SSF does not utilize a decision threshold, it is inherently incompatible with the ROC curve. Without the ability to vary the decision threshold, generating the ROC curve and, by extension, computing the AUC score is impossible. Therefore, in Table 4, we employed the symbol "N/A" (not applicable) to indicate that the AUC score cannot be computed for the FCM-SSF model.

Table 4 Average accuracy, AUC, and kappa coefficient for each FCM-based model

4.2.5 IF-FCM’s explanatory power

Regarding interpretability, IF-FCM calculates the global feature importance using (24). Based on the causal structure presented in Table 2, tool wear is the most important feature with six incoming and outgoing edges, air and process temperature follow with three and two edges, respectively, while the rest have only one outgoing edge. To validate this finding, the LOFO (Leave-One-Feature-Out) method was employed, which is an XAI technique that iteratively removes each feature, retrains the model, and compares the resulting model error to a baseline model consisting of all features. This analysis assesses the mean feature importance value and standard deviation (Erdem 2023). LOFO was chosen due to its ability to handle correlated features, unlike linear models, and its robust generalization as it calculates feature importance across cross-validation splits. LOFO analysis was conducted for various black box machine learning (ML) models, such as Light Gradient-Boosting Machine (LightGBM), K-Nearest Neighbour (KNN), Decision Tree (DT), Multilayer Perceptron (MLP) classifier, Gaussian Naïve Bayes (NB), Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGB). In addition, an intrinsic interpretable Logistic Regression (LR) model was developed, where the feature coefficients provided insights into feature importance. The results presented in Fig. 2 indicate that tool wear was identified as the most impactful feature by seven out of the nine ML models, six models ranked torque as the second most influential feature, and so on. However, variations were observed among the models. For instance, XGB considered process temperature as the most significant feature and tool wear as the second, while LR highlighted torque as the most impactful, followed by air temperature and tool wear. These discrepancies underscore the potential differences in global interpretability across models, aligning with findings from previous studies (Li et al. 2022).
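The LOFO procedure itself can be re-implemented in a few lines with scikit-learn (a manual sketch rather than a dedicated package; the model, scoring metric, and CV scheme are placeholders to be set per experiment):

```python
import numpy as np
from sklearn.model_selection import cross_val_score

def lofo_importance(model, X, y, feature_names, cv=10, scoring="accuracy"):
    """Leave-One-Feature-Out: drop each feature, re-evaluate via CV, and
    report the mean drop in score relative to the all-features baseline."""
    baseline = cross_val_score(model, X, y, cv=cv, scoring=scoring).mean()
    importance = {}
    for j, name in enumerate(feature_names):
        scores = cross_val_score(model, np.delete(X, j, axis=1),
                                 y, cv=cv, scoring=scoring)
        importance[name] = (baseline - scores.mean(), scores.std())
    return importance   # larger mean drop => more important feature
```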

Regarding the examined FCM-based models, the rankings of the input features’ importance varied. In the LTCN model, the importance rankings of the input features were as follows: (1) torque, (2) tool wear, (3) rotational speed, (4) product quality, (5) process temperature, and (6) air temperature. Similarly, in FCM-SSF, product quality and tool wear were considered the most important features, each with three edges. Torque followed with two edges, while all other variables had only one edge. In contrast, the CCFCM assigned the highest importance to air temperature and rotational speed, with six edges each. Torque and process temperature had four edges each, while product quality and tool wear had only two edges. However, the FCM-A model, which is based on SO architecture, does not represent the actual causal relationships and cannot calculate the degree of centrality for each concept. In this model, input concepts are connected only to the \(\rm{OC}\) without feedback (Froelich 2017, Fig. 1). A similar issue arises in FCMB, FCMMC, and FCM-FC models, where the fully connected map structure suggests the presence of spurious correlations and does not allow for the calculation of each concept’s centrality, as all concepts have the same number of edges. Determining the global feature importance of the PMAI4I dataset has also been a concern for other researchers. In particular, in the study conducted by Sridhar and Sanagavarapu (2021), the authors reached the conclusion that tool wear had the greatest effect, followed by torque and rotational speed, thereby providing additional assurance for the accuracy of the results.

Fig. 2 Ranking of the input features based on their global importance for the examined ML models

The assessment of local interpretability was conducted by computing the success rate of the local explanations provided by IF-FCM for each failure mode. This rate indicates the percentage of correctly predicted anomalous data instances specific to each failure mode, in which the model effectively highlights the appropriate input features as the most important. This success rate serves as a measure of the model’s accuracy, consistency, and coherence in attributing the correct input features to the detected fault. A higher success rate suggests an increased number of data observations, where the model precisely identifies the actual causal parameters of the failure, thus providing a measure of the correctness of the generated explanations. For instance, as shown in Fig. 3, the model should identify tool wear as the most significant input variable for a detected TWF. Conversely, in the event of a PWF, the model should underscore either the torque or the rotational speed.
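This success rate can be formalized along the following lines (a sketch reflecting our reading of the metric: for each true positive, the top-ranked input feature must belong to the feature set that actually causes the instance's failure mode, cf. Fig. 3; the `relevant` mapping is assumed known from the dataset description):

```python
import numpy as np

def explanation_success_rate(importances, y_pred, y_true, modes, relevant):
    """Share of true positives whose top-ranked input feature is among the
    features actually responsible for the instance's failure mode.

    importances: (K, n_inputs) per-instance feature importance scores;
    modes: length-K failure-mode labels (e.g., "TWF", "HDF", ...);
    relevant: dict mapping each mode to the set of causal feature indices.
    """
    tp = np.flatnonzero((y_pred == 1) & (y_true == 1))
    hits = [np.argmax(importances[k]) in relevant[modes[k]] for k in tp]
    return float(np.mean(hits)) if len(hits) else float("nan")
```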

Table 5 presents both the average success rate for each failure mode and the overall average success rate across all four types of faults. As can be observed, the proposed model possesses the highest degree of interpretability with a success rate of 87.49%, outperforming all other FCM-based models. FCM-SSF is the second most interpretable FCM-based model, with an 83.55% success rate, whereas FCM-FC has a success rate of 76.68%. Among the other models, LTCN, FCMMC, CCFCM, and FCMB follow, whose results (74.27%, 58.12%, 50.49%, and 38.74%, respectively) suggest that they provide confusing explanations for the modeled system. The issues with FCMMC and FCMB are twofold: (a) the class label is extracted after a predetermined number of iterations (i.e., hyper-parameter depth) without the models being stabilized, and (b) the fully connected structure results in the unintentional absorption of spurious correlations. Moreover, the fully connected structure problem plagues FCM-FC, leading to poor interpretability. Regarding the CCFCM, the correlation coefficient is unreliable because its value is significant in the case of spurious correlations, even if the two variables are not causally related. Finally, due to the capture of spurious correlations, 1-step reasoning, the employed reasoning rule, and the sigmoid activation function, FCM-A cannot interpret individual predictions, making it an inappropriate model. Notably, the simulation results revealed that all input concepts had an activation value of 0.5 in the final iteration. This uniformity implies that the influence of individual input concepts on the model’s prediction for a specific data instance remains indeterminate. Consequently, neither the average success rate for each failure mode nor the overall success rate across all four fault types could be calculated. To represent this lack of local interpretability in Table 5, the symbol "N/A" (not applicable) was used for the FCM-A metrics.

Table 5 Success rate of local explanations for each FCM-based model

Figure 4 summarizes the performance analysis results, visually representing the interpretability and accuracy of each model. The primary goal is to evaluate the trade-off between accuracy and interpretability, along with the aggregate predictive and explanatory power of the models. The trade-off is measured as the absolute difference between the average accuracy and the average success rate score, while the aggregate power is quantified as their combined sum. In the scatter plot, the diagonal line \(x=y\) represents the optimal trade-off, and the model positioned closer to the upper-right corner demonstrates the highest aggregate power. According to Fig. 4, IF-FCM has the maximum overall power (1.69723), surpassing all the other FCM-based models, whereas it has the second-best trade-off (0.05253), following FCM-SSF (0.01590). LTCN has the second-best aggregate power (1.69441); however, there is an imbalance between its prediction and interpretation scores (0.20895). Among the considered models, excluding FCM-A because of its lack of interpretability, CCFCM exhibited the poorest overall performance (1.31517).

Based on the conducted experiments and comparisons, it can be concluded that IF-FCM is a reliable predictor of machine failures. The model’s enhanced explanatory power can be attributed to its ability to capture authentic causal relationships among problem variables. The global interpretability results of IF-FCM align with those of the examined models and previous research works. However, it is important to note that different models emphasize different features. In terms of local interpretability, IF-FCM outperformed the other models, providing more coherent explanations. Overall, the method’s capability to rule out spurious correlations enhances the overall power of FCM, establishing it as a robust and interpretable model.

Fig. 3 Important input features for each failure mode

Fig. 4 Accuracy-interpretability trade-off and aggregate power for each FCM-based model: a proposed model, b Nápoles et al. (2022b), c Nápoles et al. (2017), d Szwed (2021), e Froelich (2017)

5 Conclusions

In this paper, a novel approach is presented for constructing FCMs using Liang-Kleeman Information Flow (L-K IF) analysis, an effective tool for causal inference. The motivation for this study stems from the pressing challenge of spurious correlations present in previous expert-based and data-driven FCM construction approaches. Our primary contribution is the formulation of a strategy that effectively mitigates spurious correlations, thereby enhancing the aggregate predictive and explanatory capabilities of FCM. By integrating L-K IF analysis into FCMs, we introduced an automated causal search algorithm that reliably identifies authentic causal relationships from the data. These identified relationships subsequently served as constraints in the FCM learning process. To validate our approach, we applied it to a synthetic dataset tailored for industrial anomaly detection and root cause analysis as a proof-of-concept, resulting in improved performance of the developed FCM compared to other FCM-based models. While we acknowledge the potential value of incorporating additional datasets, our focus on a single dataset effectively highlights the unique contributions, innovations, and advantages of our method within a specific context, thus paving the way for future studies to explore its generalizability across multiple datasets. Moving forward, we plan to extend this study to real-world industrial data experiments, while investigating the challenges associated with metaheuristic learning algorithms.