1 Introduction

Machine learning (ML) models are widely used today due to their ability to handle a wide variety of real-world problems by analyzing, discovering, and interpreting patterns and structures in large amounts of data. Machine learning has primarily driven digital transformation in recent years, allowing businesses to reduce costs, mitigate cybersecurity risks by detecting advanced, unknown cyberattacks, or provide personalized recommendations to customers. However, using artificial intelligence (AI) to make decisions that affect people’s lives raises numerous concerns about ML models’ bias, transparency, and explainability. For example, the European Commission (EC) proposed the AI Act regulation in April 2021 to regulate the use of AI and machine learning. The regulation establishes different risk levels based on the risk that AI systems may pose to individuals’ fundamental rights, prohibiting certain practices classified as unacceptable risks. It also creates a list of requirements that high-risk practices must meet before they are released to the market. One such requirement for high-risk AI systems is that they must be developed with transparency in mind so that the final user can interpret the system’s output (European Commission 2021).

In recent years, there has been a growing effort to increase trust in AI systems through interpretability and explainability, i.e., the ability to justify and interpret an AI model’s decision. Post-hoc methods use a trained model and try to mimic or explain its behavior by using an external model at testing time (Došilovíc et al. 2018). Methods such as LIME (Ribeiro et al. 2016a), Anchor-LIME (Ribeiro et al. 2016b), or SHAP (Lundberg and Lee 2017) fall into this category, and are focused on developing a separate “white-box” ML model that is trained to mimic the behavior of the “black-box” model based on a Deep Neural Network (DNN). Since the white-box model is explainable, the method aims to identify the rules or input characteristics influencing the DNN (Ras et al. 2022). However, these methods typically cannot provide a global explanation of the model, or their explanations are dependent on the set of hyperparameters. Alvarez-Melis and Jaakkola (2018) also demonstrated that these methods lack robustness: similar inputs should produce similar explanations, a condition that LIME and SHAP do not satisfy.

On the other hand, ante-hoc techniques seek to make a model inherently explainable and interpretable, while still attempting to attain the highest level of accuracy by taking explainability into account from the model’s design and during training (Došilovíc et al. 2018). In line with this concept, this paper introduces a neural network topology with full interpretability and high accuracy. We build an ensemble of neural networks to learn each feature’s contribution to the response variable, resulting in a highly accurate yet fully interpretable neural network-based Generalized Additive Model (GAM) (Hastie and Tibshirani 1990). Backpropagation (Goodfellow et al. 2016) is used to estimate each of the feature networks, and the GAM model is fitted using the local scoring (Hastie and Tibshirani 1990) and backfitting (Breiman and Friedman 1985) algorithms to ensure that it converges and is additive. Because the partial effects of each learned function can be visualized independently, our proposal is a “white-box” deep learning model: the functions learned are an exact description of how a covariate affects the response variable; the partial effects of each function can show if the relationship between the response variable and each covariate is linear, monotonic or a complex function. The partial function describes, for a particular value of a feature, what is the marginal effect on the prediction of the model. This is not a typical feature of neural networks, which are usually “black-box” models that are hard to interpret. These characteristics make it an appropriate algorithm for high-risk AI applications such as medical decision-making. Moreover, by observing the learned partial effects, the model can be used to protect against bias introduced in the training dataset: for example, in binary classification problems, partial functions with a positive slope increase the probability of observing that class.

There are not many methods in the literature that use neural networks to implement a Generalized Additive Model. The topology of neural networks inspired by GAM, named Generalized Additive Neural Networks (GANNs), was introduced by Potts (1999). GANNs use a univariate Multilayer Perceptron, do not use backpropagation for network training, and require human evaluation during training to add or remove hidden units to the neural network architecture based on the study of the plotted partial effects. More recently, Brás-Geraldes et al. (2019, 2020) updated the GANN architecture to enable the use of flexible parametric link functions based on the Aranda-Ordaz transformations family, and use resilient backpropagation (RPROP) (Braun and Riedmiller 1992). Nevertheless, our proposal takes advantage of recent advances in deep learning and does not impose any restrictions on the neural network architecture. The model can learn complex functions with a large number of hidden units and more than one hidden layer, making it suitable for real-world problems.

Chang et al. (2021) introduced NODE-GAM, a GAM based on a deep learning version of tree-based GAMs and \(\text {GA}^2\text {M}\) (Lou et al. 2013). NODE-GAM is based on tree ensembles and is able to outperform the results of \(\text {GA}^2\text {M}\), but at the expense of being computationally expensive since tree ensembles require calculating millions of trees to accurately learn the partial effects of each feature. Since NODE-GAM is based on tree ensembles, it differs conceptually from the proposed algorithm.

Finally, Neural Additive Models (NAMs) were recently introduced by Agarwal et al. (2021). A NAM is a single neural network made up of a linear combination of sub-networks, each of which is dedicated to learning the contribution of a single input feature. Backpropagation (Linnainmaa 1970) is used to train all of the sub-networks jointly. This is one of the differences with our procedure, which trains a separate network for each feature, allowing us to train the networks independently (and in parallel if required for high-dimensional datasets). Additionally, this new GANN implementation can have different network architectures for each input feature, allowing for greater flexibility (using shallow networks to model simple or linear features and more complex DNNs to fit complex input features).

To the best of our knowledge, we are the first to propose a GAM implementation that uses Deep Neural Networks to estimate the smooth functions independently. Although previous works use backpropagation and neural networks to estimate the partial functions, we propose to estimate each partial function independently leveraging the local scoring and backfitting algorithms to ensure that the GAM model converges and is additive, providing a flexible framework for training GANNs which does not impose any restriction on the neural network architecture.

The paper is organized as follows. Section 2 formally introduces Generalized Additive Models and describes how the model is fitted to provide an interpretable Deep Neural Network. Section 3 describes the simulation studies conducted to evaluate the performance of the proposed algorithm, comparing our results with Neural Additive Models. Section 4 showcases its performance on a real dataset of Denial of Service attacks. Finally, Sect. 5 summarizes the main research conclusions and outcomes, establishing the prospects for future research.

2 Methodology

Regression models are one of the most powerful tools in statistics and data analysis. Their use allows us to understand the dependence between a response variable Y and a set of covariates \({\textbf {X}} = X_1,\ldots , X_p\). The general multivariate model can be defined as

$$\begin{aligned} Y = m({\textbf {X}}) + \varepsilon , \end{aligned}$$

where \(m(\cdot )\) is the mean function and \(\varepsilon \) is the regression error.

The linear regression model is the simplest form of this model, where the mean regression function can be expressed as \(E[Y\mid {\textbf {X}}] = m({\textbf {X}}) = \alpha _0 + \sum _{j=1}^p \alpha _j X_j\). However, in this type of model, the effect of the \(X_1,\ldots , X_p\) covariates on the response is assumed to be linear, and the response variable is assumed to follow a Normal distribution. Generalized Linear Models (GLM) (Nelder and Wedderburn 1972; McCullagh and Nelder 2019) extend the linear regression model by allowing the response variable to follow other distribution families. In this case, the relationship between the mean value of the assumed distribution and the covariates is modelled by \(E[Y\mid {\textbf {X}}] = m({\textbf {X}}) = h(\alpha _0 + \sum _{j=1}^p \alpha _j X_j)\), where \(h(\cdot )\) is a monotonic known function (the inverse of the link function).

The recent trend has been to abandon parametric and linear functions in favor of modeling the relationship between the response variable and the covariates in a nonparametric manner. Generalized Additive Models (Hastie and Tibshirani 1990) extend the GLM by replacing the linear predictor \(\eta = \alpha +\sum _{j=1}^p\alpha _jX_j\) with an additive predictor of unknown and smooth functions \(\eta = \alpha +\sum _{j=1}^p f_j(X_j)\), which allows the dependence between the response and the covariates to be modelled without specifying in advance the function that links them. This yields the model defined by:

$$\begin{aligned} E[Y\mid {\textbf {X}}] = m({\textbf {X}}) = h(\alpha + \sum _{j=1}^p f_j(X_j)), \end{aligned}$$
(1)

where \(h(\cdot )\) is a monotonic known function (the inverse of the link function) and \(f_1,\ldots , f_p\) are smooth and unknown functions. To guarantee the identification of the model, a constant \(\alpha \) is introduced, and it is required that the partial functions satisfy \(E[f_j(X_j)] = 0, j = 1, \ldots ,p\) which implies that \(E[Y] = \alpha \) (Hastie and Tibshirani 1990).
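As a minimal illustration of Eq. (1), the sketch below evaluates an additive predictor from a list of partial functions, centers each partial function on the sample so that its mean is zero, and applies an inverse link. The names `additive_predictor` and `inverse_logit`, as well as the toy partial functions, are hypothetical and chosen for illustration only; they are not taken from the proposed implementation.

```python
import numpy as np

def inverse_logit(eta):
    """Inverse of the logit link: maps the predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))

def additive_predictor(X, partial_functions, alpha):
    """Return eta = alpha + sum_j f_j(X_j) with empirically centered f_j.

    X has shape (n, p); partial_functions is a list of p callables.
    """
    eta = np.full(X.shape[0], alpha, dtype=float)
    for j, f_j in enumerate(partial_functions):
        contribution = f_j(X[:, j])
        # Centering enforces the identifiability constraint E[f_j(X_j)] = 0
        # on the sample, so the constant is carried by alpha alone.
        eta += contribution - contribution.mean()
    return eta

rng = np.random.default_rng(0)
X = rng.uniform(-2.5, 2.5, size=(1000, 3))
toy_functions = [np.square, lambda x: 2.0 * x, np.sin]  # illustrative only
eta = additive_predictor(X, toy_functions, alpha=2.0)
mu = inverse_logit(eta)  # E[Y | X] under a logit link
```

Because every contribution is centered, the sample mean of `eta` equals `alpha`, which mirrors the role of the constant in the identifiability condition.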

Table 1 Adjusted dependent variable and weights at the local scoring algorithm for common models (Hastie and Tibshirani 1990)

GAMs have the advantage of being completely nonparametric, allowing for a smooth fit for all covariates, while also avoiding the curse of dimensionality (Bellman 1961). Additionally, they allow us to force a linear fit for some of them (allowing for the fitting of categorical covariates), leading to a semiparametric model. Furthermore, their ability to model complex nonlinear relationships between covariates and the response variable qualifies them for real-world problems. However, the main disadvantage of classical implementations of Generalized Additive Models is their computational complexity (as with other nonparametric methods) and their proclivity to overfit the training data. Nevertheless, these issues can be overcome in neural network-based GAM implementations by using techniques such as regularization, early stopping, or dropout (Prechelt 1998).

2.1 Fitting GANNs with independent neural network training

Given an independent random sample \(\{{\textbf {X}}_i, Y_i\}_{i=1}^{n}\) with \({\textbf {X}}= X_1, \ldots , X_p\), the model in (1) is fitted using the local scoring algorithm from Hastie and Tibshirani (1990), which we replicate in Algorithm 1. Backfitting is used to fit a weighted additive model, where each \(f_{j}\) \((j = 1, \ldots , p)\) is estimated by regressing the adjusted dependent variable \(Z_i\) on each \(X_j\) with weights \(W_i\). The adjustment of the dependent variable and the weights is determined by the distribution of \(Y_i\) (see Table 1). The backfitting algorithm (described in Algorithm 2) iteratively estimates the contribution of each covariate \(X_1,\ldots , X_p\) to the adjusted dependent variable \(Z_i\) updated at each step of the local scoring algorithm. At each iteration of the backfitting algorithm, we obtain a set of estimated smooth functions \(\hat{f}_j(\cdot )\) that explain the dependent variable \(Z_i\).
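The update of the adjusted dependent variable and the weights can be sketched as follows for two common cases. The expressions used here are the standard local scoring ones for a Gaussian response with identity link and a Bernoulli response with logit link; they are stated as an assumption, since the contents of Table 1 remain the authoritative reference. The function name `adjusted_variable_and_weights` is hypothetical.

```python
import numpy as np

def adjusted_variable_and_weights(y, eta, family):
    """One local scoring update of (Z, W) given the current predictor eta."""
    if family == "gaussian":            # identity link: mu = eta
        return y.astype(float), np.ones_like(eta)
    if family == "binomial":            # logit link: mu = 1 / (1 + exp(-eta))
        mu = 1.0 / (1.0 + np.exp(-eta))
        variance = mu * (1.0 - mu)
        z = eta + (y - mu) / variance   # working response
        return z, variance              # weights equal the variance function
    raise ValueError(f"unsupported family: {family}")

y = np.array([0.0, 1.0, 1.0, 0.0])
eta = np.zeros(4)  # starting point: mu = 0.5 for every observation
z, w = adjusted_variable_and_weights(y, eta, "binomial")
```

With an identity link the adjusted variable is simply the response and the weights are constant, which is why only the inner backfitting loop is needed in that case.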

There are several methods for estimating \(f_j\), including methods based on regression splines (De Boor and De Boor 1978), Bayesian approaches (Lang and Brezger 2004) or local polynomial kernel smoothers (Wand and Jones 1994; Fan and Gijbels 2018). In this work, we propose to use independent neural networks (which are universal function estimators (Hornik et al. 1989)) to learn the contribution of each covariate to the dependent variable: at each iteration of the backfitting algorithm, we train a neural network for each covariate \(X_j\) with the entire set of training data for one epoch. The dependent variable \(Z_i\) and the weights \(W_i\) are updated at each iteration of the local scoring algorithm, and the networks are trained for one more epoch until the backfitting algorithm converges. The networks are improved at each epoch as the learned adjusted dependent variable \(Z_i\) approaches \(Y_i\) at each step of the local scoring algorithm.
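The following sketch illustrates the structure of a weighted backfitting loop. To keep it self-contained, a weighted cubic polynomial fit stands in for the per-feature neural network; in the approach described above, each network would instead be trained for one additional epoch on the partial residuals. The function name `backfit`, the polynomial smoother, and the stopping rule are assumptions made for illustration and are not necessarily those of Algorithm 2.

```python
import numpy as np

def backfit(X, z, w, degree=3, max_iter=10, tol=1e-6):
    """Weighted backfitting with a polynomial smoother per covariate.

    ASSUMPTION: the polynomial fit is a stand-in for the per-feature
    neural network described in the text.
    """
    n, p = X.shape
    alpha = np.average(z, weights=w)
    f_hat = np.zeros((n, p))          # f_hat[:, j] holds f_j evaluated at X_j
    for _ in range(max_iter):
        f_prev = f_hat.copy()
        for j in range(p):
            # Partial residuals: remove the intercept and all other components.
            others = f_hat.sum(axis=1) - f_hat[:, j]
            residual = z - alpha - others
            # np.polyfit weights multiply the unsquared residuals, hence sqrt.
            coef = np.polyfit(X[:, j], residual, degree, w=np.sqrt(w))
            fitted = np.polyval(coef, X[:, j])
            f_hat[:, j] = fitted - np.average(fitted, weights=w)  # centering
        # Generic stopping rule: relative change of the fitted functions.
        change = np.sum((f_hat - f_prev) ** 2) / (np.sum(f_prev ** 2) + 1e-12)
        if change < tol:
            break
    return alpha, f_hat

rng = np.random.default_rng(1)
X = rng.uniform(-2.5, 2.5, size=(2000, 2))
z = 2.0 + X[:, 0] ** 2 + 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=2000)
alpha_hat, f_hat = backfit(X, z, w=np.ones(2000))
```

Each column of `f_hat` can be plotted against the corresponding covariate to inspect the estimated partial effect.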

Table 2 Deviance for common response variables
Algorithm 1 Local scoring algorithm

Algorithm 2 Backfitting algorithm with neural networks

This process is repeated until the model converges. If the link function is the identity, there is only a loop over the additive components. This loop stops when, for iteration m, the updated functions \(\hat{f}_j^m (\cdot )\) comply with the criterion in step 7 of Algorithm 2. At this point, the sum of the functions learned by the neural networks approximates the original response variable \(Y_i\). If the link is not the identity, then there is also an outer loop carrying out the local scoring iteration. This outer loop stops when, for iteration l, the convergence criterion shown in step 6 of Algorithm 1 is met. Note that we use deviance because it is an appropriate measure of discrepancy between observed and fitted values. In particular, given the fitted mean response \(\hat{\mu }_i = \hat{E}[Y_i \mid {\textbf {X}}_i]\), the relative change in deviance between iterations \(l-1\) and \(l\) is defined as \(\text {DEV} = \frac{\sum _{i=1}^n \left[ \text {DEV}_i(Y_i, \hat{\mu }_i^{l-1}) - \text {DEV}_i(Y_i, \hat{\mu }_i^{l})\right] }{\sum _{i=1}^n \text {DEV}_i(Y_i, \hat{\mu }_i^{l-1})}\), with \(\text {DEV}_i\) depending on the link (see Table 2).
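A stopping rule based on the relative change in deviance can be sketched as follows for a Bernoulli response. The per-observation deviance used here is the standard binomial one, \(-2[y\log \mu + (1-y)\log (1-\mu )]\); the function names, the example fitted values, and the tolerance are hypothetical.

```python
import numpy as np

def binomial_deviance(y, mu, eps=1e-12):
    """Per-observation deviance for a Bernoulli response."""
    mu = np.clip(mu, eps, 1.0 - eps)   # avoid log(0)
    return -2.0 * (y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))

def relative_deviance_change(y, mu_prev, mu_curr, deviance=binomial_deviance):
    """Relative decrease in total deviance between two local scoring steps."""
    dev_prev = deviance(y, mu_prev).sum()
    dev_curr = deviance(y, mu_curr).sum()
    return (dev_prev - dev_curr) / dev_prev

y = np.array([1.0, 0.0, 1.0, 1.0])
mu_prev = np.full(4, 0.5)                 # uninformative starting fit
mu_curr = np.array([0.8, 0.3, 0.7, 0.9])  # hypothetical improved fit
change = relative_deviance_change(y, mu_prev, mu_curr)
converged = abs(change) < 1e-3            # hypothetical tolerance
```

A value close to zero indicates that an additional local scoring iteration no longer improves the fit appreciably.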

3 Simulation studies

This section reports the results of four simulation studies conducted to evaluate the practical performance of the proposed method. Firstly, we show the results of our own scenario, which considers two different response distributions and two different variance functions, and secondly, we replicate the scenario described in section A.8.1 of Agarwal et al. (2021) where Neural Additive Models (NAMs) are proposed, comparing both methods.

The hyperparameters for NAMs are the recommended values in section A.6 of Agarwal et al. (2021), while our GANN uses a basic hyperparameter setting, with no regularization. Our method is implemented in Python using Keras (Chollet et al. 2015), while NAM is implemented using TensorFlow (Abadi et al. 2015). All the experiments were executed on a device with 16 cores and 16 GB of RAM.

We used a sample size of \(n = 30625\) for both studies, which was split into 80% for training the model and 20% for testing. Mean Squared Error (MSE), bias and variance from the test set were calculated based on 1000 trials.
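One common way to summarize replicated simulations is sketched below, under the assumption that the predictions from all trials are evaluated at a common set of test points. The function name `summarize_trials` and the synthetic predictions are hypothetical; the exact definitions used to produce the reported tables may differ.

```python
import numpy as np

def summarize_trials(predictions, truth):
    """Average bias, variance and MSE over test points.

    predictions: array of shape (n_trials, n_test), one row per replication.
    truth:       array of shape (n_test,), the noise-free target values.
    """
    mean_prediction = predictions.mean(axis=0)
    pointwise_bias = mean_prediction - truth
    pointwise_variance = predictions.var(axis=0)
    pointwise_mse = ((predictions - truth) ** 2).mean(axis=0)
    return {
        "bias": pointwise_bias.mean(),
        "variance": pointwise_variance.mean(),
        "mse": pointwise_mse.mean(),
        # Kept to check the decomposition MSE = bias^2 + variance.
        "squared_bias": (pointwise_bias ** 2).mean(),
    }

rng = np.random.default_rng(0)
truth = np.sin(np.linspace(-2.5, 2.5, 50))
# Hypothetical predictions: truth plus a small offset and random error.
predictions = truth + 0.1 + rng.normal(0.0, 0.2, size=(1000, 50))
summary = summarize_trials(predictions, truth)
```

With these definitions the averaged MSE equals the averaged squared bias plus the averaged variance, which is a useful sanity check on reported values.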

3.1 Scenario I

For the first scenario, we considered the following predictor

$$\begin{aligned} \eta = \alpha + \sum _{j=1}^3 f_j(X_j), \end{aligned}$$

with

$$\begin{aligned} f_j(X_j) = {\left\{ \begin{array}{ll} X_j^2 \hspace{1.4cm} \text { if } j=1 \\ 2X_j \hspace{1.3cm} \text { if } j=2 \\ \sin {X_j} \hspace{1cm} \text { if } j=3, \end{array}\right. } \end{aligned}$$

\(\alpha = 2\), and covariates \(X_j\) drawn from a uniform distribution \(U\left[ -2.5, 2.5\right] \).

Table 3 Hyperparameter setting on Scenario I. Hidden units shows the number of neurons on the hidden layer

Based on this scenario, three different simulations were carried out considering two different response distributions:

  • \(Y = \eta + \varepsilon \), where \(\varepsilon \) is the error, distributed according to a \(N(0,\sigma (x))\). In this case, we consider a homoscedastic situation with \(\sigma (x) = 0.5\) (R1) and a heteroscedastic one with \(\sigma (x) = 0.5 + \mid 0.25 \times \eta \mid \) (R2).

  • \(Y \sim \text {Bernoulli}(p)\), with \(p = \exp (\eta ) / (1 + \exp (\eta ))\) (R3).
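The data-generating process of this scenario can be sketched as follows. Two points are assumptions rather than statements from the text: \(\sigma (x)\) is treated as the standard deviation of the Gaussian error, and the Bernoulli probability is obtained with the standard logistic transform of \(\eta \). The function name `simulate_scenario_one` is hypothetical.

```python
import numpy as np

def simulate_scenario_one(n, response, seed=0):
    """Draw (X, Y, eta) for response in {"R1", "R2", "R3"}."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-2.5, 2.5, size=(n, 3))
    eta = 2.0 + X[:, 0] ** 2 + 2.0 * X[:, 1] + np.sin(X[:, 2])
    if response == "R1":      # homoscedastic Gaussian
        y = eta + rng.normal(0.0, 0.5, size=n)
    elif response == "R2":    # heteroscedastic Gaussian
        sigma = 0.5 + np.abs(0.25 * eta)   # ASSUMPTION: sigma is a std. dev.
        y = eta + rng.normal(0.0, sigma)
    elif response == "R3":    # Bernoulli with logistic success probability
        p = 1.0 / (1.0 + np.exp(-eta))
        y = rng.binomial(1, p).astype(float)
    else:
        raise ValueError(f"unknown response: {response}")
    return X, y, eta

X, y, eta = simulate_scenario_one(30625, "R2")
```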

We used shallow neural networks (with a single hidden layer) with 1024 hidden units. The complete set of hyperparameters for this scenario (I) can be found in Table 3. The achieved MSE, bias and variance on the test set are presented in Table 4, while Fig. 1 shows the learned partial effect plots from the training set.

Table 4 Simulation results achieved on the test set averaged across 1000 replications of Scenario I. Results for R3 are based on differences in the predictor \(\eta \)

We can observe that our model obtains low MSE and bias values. As expected, the MSE increases in the heteroscedastic situation (R2). The binomial scenario (R3) has a higher variance, but the achieved MSE is similar to that of the homoscedastic scenario (R1), indicating that our proposal fits the training data well and generalizes to the test set.

Figure 1 shows the obtained partial effects from the training set. We can observe how it is able to estimate the partial effect of each function accurately in all the studied scenarios. The estimation of the \(f_j\) functions in the binomial simulation (R3) seems to be slightly different from the true function. Nevertheless, for all j, the interval obtained using the simulation quantiles recovers the true function along the whole support.

Fig. 1 Theoretical functions, mean estimation on the training set across 1000 iterations, and simulation quantiles achieved on Scenario I. Each row corresponds to a simulation scenario (R1, R2, R3)

3.2 Scenario II

Agarwal et al. (2021) proposed the following simulation study to validate their algorithm. We focused on \({Task}_0\), where the response Y is given by

$$\begin{aligned} Y = \sum _{j=1}^3 f_j(X_j) + \varepsilon , \end{aligned}$$

with

$$\begin{aligned} f_j(X_j) = {\left\{ \begin{array}{ll} \frac{1}{3}\log (100X_j+ 101) \hspace{0.5cm}\text { if } j=1 \\ \frac{-4}{3}e^{-4\mid X_j \mid } \hspace{1.7cm} \text { if } j=2 \\ \sin (10X_j) \hspace{1.9cm} \text { if } j=3, \end{array}\right. } \end{aligned}$$

covariates \(X_j\) drawn from a uniform distribution \(U\left[ -1, 1\right] \) and \(\varepsilon \) sampled from a Normal distribution \(N(0,\frac{5}{6})\). We will use this scenario to compare our proposal to NAM. The hyperparameter settings for both algorithms are summarized in Table 5. For this scenario, both algorithms were configured with a DNN with three hidden layers with 1024, 512 and 256 hidden units, respectively, and the ReLU activation function. Note that NAM was configured to train for 10 epochs with early stopping, while the proposed model trains for at most 10 iterations, stopping earlier if the backfitting algorithm converges.
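For reference, the data of this scenario can be generated along the following lines. The sketch reads the first partial function as \(\frac{1}{3}\log (100X_1 + 101)\), which keeps the logarithm defined on the whole support, and treats \(\frac{5}{6}\) as the standard deviation of the noise; both readings are assumptions. The function names and the simple 80/20 split are illustrative.

```python
import numpy as np

def scenario_two_functions(X):
    """Evaluate the three partial functions column by column."""
    f1 = np.log(100.0 * X[:, 0] + 101.0) / 3.0
    f2 = -4.0 / 3.0 * np.exp(-4.0 * np.abs(X[:, 1]))
    f3 = np.sin(10.0 * X[:, 2])
    return np.column_stack([f1, f2, f3])

def simulate_scenario_two(n, noise_scale=5.0 / 6.0, seed=0):
    # ASSUMPTION: 5/6 is used as the standard deviation of the noise.
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, 3))
    partial_effects = scenario_two_functions(X)
    y = partial_effects.sum(axis=1) + rng.normal(0.0, noise_scale, size=n)
    return X, y, partial_effects

X, y, partial_effects = simulate_scenario_two(30625)
n_train = (4 * len(y)) // 5             # 80% training, 20% testing
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]
```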

Table 6 shows how both algorithms achieve low MSE, bias and variance values, and demonstrate a good capacity for understanding the training data and generalizing to unseen values from the test set. We can observe that NAM achieves slightly lower MSE values but has a higher bias than our proposal. Moreover, focusing on Fig. 2, we can observe how NAM is not as accurate in recovering the partial effects of each covariate. The partial effects learned by NAM are less accurate than those obtained by the proposed GANN, showing a jagged shape in \(\hat{f}_3\) which does not correspond to the true function shape. Moreover, the simulation quantiles for \(\hat{f}_1\), \(\hat{f}_2\) and \(\hat{f}_3\) do not recover the true function shape along the whole support.

This is not the case with our GANN implementation: it obtains a higher variance, but a similar MSE and lower bias while preserving the true function’s shape, providing greater interpretability.

This different behavior could be based on NAM being composed of a single neural network that learns the contribution of all the covariates to the response variable jointly. The neural network objective is focused on recovering accurately the predicted response \(\hat{Y}\) but at the expense of estimating the partial effect of each individual covariate less accurately. This is not the case with the proposed model, which still obtains low MSE while estimating the partial effects accurately because it is composed of independent neural networks, each devoted to learning the contribution of a single feature to the response variable.

Furthermore, in order to be plotted, the feature function vectors generated by the NAM algorithm must be post-processed (such as reverting the applied feature scaling and mean-center of the estimated functions). By contrast, our model returns the learned functions without any additional post-processing. In addition, there is no need to standardize or normalize the features of the dataset before using them in this GANN: since each network is committed to learning only one feature, feature scaling is not required, making it a simpler approach, which does not require feature engineering.

Table 5 Hyperparameter setting for the proposed method and NAM on Scenario II. Hidden units shows the number of neurons on each hidden layer
Table 6 Simulation results achieved on the test set by our method and NAM across 1000 replications of Scenario II

4 Application to real data

This section presents the results obtained when applying our algorithm to real data, where we aim to detect cyberattacks in an Industrial Control System (ICS). To do so, we used the Cybersecurity dataset from the University of Coimbra (Frazão et al. 2019). This dataset was created on a small-scale process automation scenario that uses Modbus/TCP to simulate a real Industrial Control System. It includes normal operation data as well as a collection of Distributed Denial of Service (DDoS) attacks. In a DDoS attack, multiple distributed nodes attack a single victim concurrently, trying to overwhelm its resources.

We analyzed the model’s ability to detect and explain three different DDoS attacks, each lasting 1 min within a 30-min traffic capture: Modbus Query Flooding, TCP SYN Flooding and Ping Flooding. The characteristics of these attacks are the following:

  • Modbus Query Flooding: the attacker tries to overwhelm the network by sending fake Modbus packets to the server which controls the ICS. The attack is easy to implement since Modbus protocol does not implement any authentication mechanism (Bhatia et al. 2014).

  • TCP SYN Flooding: exploits the TCP three-way handshake; the victim receives a TCP SYN message to start a new TCP connection, and replies with an SYN-ACK packet, keeping an open port to finalize the connection. The attacker never replies with the ACK packet to confirm the connection, forcing the victim to keep the port open until the connection expires. If the attacker sends millions of SYN requests, it will exhaust all available ports, preventing legitimate devices from connecting to the victim (Horak et al. 2021).

  • Ping Flooding: the attacker tries to interrupt legitimate device communication by sending bogus Internet Control Message Protocol (ICMP) messages to the victim, usually using spoofed IP addresses. If the amount of sent ICMP communications is sufficiently high, the victim will not be able to handle and reply to the number of received requests, causing traffic overflow and unavailability (Horak et al. 2021).

Fig. 2 Theoretical functions, mean estimation on the train set across 1000 iterations, and simulation quantiles achieved by NAM (first row) and the proposed method (second row) on Scenario II

In order to model the probability of a cyberattack, we used the proposed GANN with a binomial response and five covariates. The covariates used were the duration of the communication between two devices (dur), the number of packets (pkts) and bytes (bytes) exchanged, and the source and destination inter-packet arrival times (sintpkt and dintpkt, respectively), that measure the mean arrival times between consecutive packets in a communication. We used a GANN model with a single hidden layer and 1024 hidden units per feature. Table 7 shows the model hyperparameters and sample size for each DDoS attack. The dataset was split into 80% for training the model and 20% for testing.

Table 8 summarizes the numerical results obtained on the detection of the three types of cyberattacks. The decision threshold was calculated using the Youden Index (Youden 1950). In all cyberattack situations, the model achieves high performance, with AUC-ROC values above 0.89. Precision and recall are also considerably high, showing that the model performs well on real data.
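The Youden Index selects the threshold that maximizes \(J = \text {sensitivity} + \text {specificity} - 1\). A minimal sketch of this selection is given below; the function name `youden_threshold` and the toy labels and probabilities are hypothetical and unrelated to the dataset used here.

```python
import numpy as np

def youden_threshold(y_true, scores):
    """Return the score threshold maximizing J = sensitivity + specificity - 1."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best_j, best_threshold = -1.0, None
    for threshold in np.unique(scores):
        predicted = scores >= threshold
        sensitivity = np.mean(predicted[y_true])      # true positive rate
        specificity = np.mean(~predicted[~y_true])    # true negative rate
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_j, best_threshold = j, threshold
    return best_threshold, best_j

labels = [0, 0, 0, 1, 0, 1, 1, 1]
probabilities = [0.05, 0.10, 0.30, 0.35, 0.40, 0.70, 0.80, 0.95]
threshold, j_statistic = youden_threshold(labels, probabilities)
```

In this toy example two thresholds reach the same value of \(J\); the loop keeps the lowest one, which favors sensitivity over specificity when there is a tie.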

Figure 3 shows the partial effect plots of each feature obtained in each cyberattack situation. We will use the plots to interpret the results of the model and demonstrate its capability to provide an explanation of how each feature influences the probability of cyberattack.

Firstly, we can observe how the proposed DL model is able to reveal both linear and non-linear relationships between the covariates and the response variable, providing high flexibility.

Secondly, we can focus on obtaining an explanation of the attack type by analyzing the effects of each covariate on the response. When the duration of the communications (dur) and the number of exchanged packets (pkts) and bytes (bytes) increase, the probability of all three DDoS attacks decreases. This is a typical characteristic of flooding attacks, where an attacker sends a limited number of packets (and bytes) in multiple short communications. Therefore, having long communications with a high number of packets and data increases the probability of normal communication, while short conversations with few packets reveal the presence of all DDoS attacks. In the case of the number of bytes, we can observe how first the probability of a TCP SYN Flood attack increases with the number of bytes exchanged, with the maximum probability with \( bytes = 200\), which indicates the number of bytes used in the attack. Then, the probability of cyberattack decreases, showing how packets with \(bytes > 200\) have a lower probability of TCP SYN Flood attack and indicating a normal communication.

Focusing on the inter-packet arrival times, we can see how an increase in the mean arrival times in packets coming from the victim of the attack (dintpkt) increases the probability of all DDoS attacks. For Ping Flooding and Modbus Query Flooding attacks, since the network is flooded with fake requests, the communication between legitimate devices is affected by the network congestion caused by the attack, increasing the packet arrival times between consecutive packets. For TCP SYN Flood attacks this behavior can be explained by the nature of the TCP Handshake: since the victim is forced to keep an open port waiting for an ACK packet which never arrives (until the connection expires and the victim sends a FIN packet to end the handshake), the mean arrival times between packets sent from the victim increase.

In a similar way, an increase in the mean arrival times of packets coming from the attacker (sintpkt) decreases the probability of cyberattacks: DDoS attacks send multiple packets in short intervals, and therefore smaller mean arrival times in packets coming from the attacker are an indicator of DDoS attacks, while higher values of this metric increase the probability of legitimate communication. We can see that the partial effect plot of sintpkt for TCP SYN is monotonically decreasing, revealing the short inter-packet intervals used by the attacker. In the case of Modbus Query Flooding and Ping Flooding, we can see how the probability of cyberattack first increases (since in this case, the attacker sent packets with variable delivery rates), and then decreases in the case of normal communications.

Table 7 Hyperparameter setting and sample size for each attack scenario
Table 8 Experimental results achieved in the test set on Cyber-security ICS dataset

5 Conclusions and future work

This study presents a new method for fitting Generalized Additive Models using neural networks, which provides an interpretable yet highly accurate deep learning method, by training a linear combination of neural networks to learn the contribution of each covariate to the response variable. Unlike other GAMs based on neural networks, we train each network independently while leveraging the local scoring and backfitting algorithms to ensure that the resultant GAM converges and is additive. Our proposal provides high flexibility, being able to estimate both linear and non-linear functions. Unlike methods based on tree-ensembles, which usually require millions of trees to accurately learn the partial effects of each feature function, this new GANN implementation only requires one neural network per feature with a limited number of hidden units and layers.

Our experiments on both synthetic and real datasets demonstrate that our proposal can achieve high accuracy and interpretability using both deep and shallow neural networks, allowing for flexible settings. It performs similarly in terms of accuracy to other GAM implementations using neural networks, but it provides greater interpretability. Moreover, since it is composed of independent neural networks, the training can be performed in parallel and executed on specialized hardware, allowing greater scalability to larger datasets. In addition, its implementation is not as complex as other neural-network-based GAMs like NAM, since it does not require feature scaling, data normalization or data post-processing to obtain the learned partial effects or fit the GAM model.

Fig. 3 Learned feature functions on Cyber-security ICS dataset. Each column represents an attack: from left to right, Modbus Query Flooding, TCP SYN Flooding and Ping Flooding

The results achieved on a real dataset of DDoS cyberattacks in an Industrial Control System demonstrate that the proposed method can be used not only to detect cyberattacks with high accuracy but also to provide a robust interpretation of the reasons that lead to cyberattack detection. The explainable DL model was able to reveal both linear and non-linear relationships between the covariates and the response. The partial effect plots of each covariate were analyzed to see how each feature influences the response variable. The functions learned by the DL model are consistent with expert knowledge from the cybersecurity field and can be used to explain the reasoning behind the model prediction.

Revealing non-linear patterns in the partial effect plots is one of the advantages of GAM-based models. Our method has this property and is able to learn both monotonic and non-monotonic functions, demonstrating that the algorithm is not biased to learn monotonic functions only, and can learn arbitrarily shaped functions. Therefore, it is a “white-box” deep learning method which can be used to provide an exact interpretation of the decision of the model.

Future work includes the exploration of the use of the proposed algorithm for other tasks, such as multinomial logistic regression or feature interactions. We also plan to implement this methodology both in an R package and a Python library.