Adversarial attacks on graph-level embedding methods: a case study

As the number of graph-level embedding techniques increases at an unprecedented speed, questions arise about their behavior and performance when training data undergo perturbations. This is the case when an external entity maliciously alters training data to invalidate the embedding. This paper explores the effects of such attacks on some graph datasets by applying different graph-level embedding techniques. The main attack strategy involves manipulating training data to produce an altered model. In this context, our goal is to examine in depth the methods, resources, experimental settings, and performance results, in order to observe and study all the aspects that derive from the attack stage.


Introduction
Graphs are playing an important role in many real-world applications, and related data analysis techniques are showing their feasibility also on large-scale problems, including drug screening [1], cancer metabolism [2], protein analysis [3], resource deployment [4], cooperative control strategies [5], and knowledge graph completion [6].
Graph embedding is an emergent research area in machine learning (ML) that includes techniques and methods to find "latent vector representations" of graphs to capture their topology and preserve relevant network properties [7][8][9]. The resulting graph representations can be made rich by considering several information sources, such as vertex-vertex relationships and vertex/edge attributes. Graph embedding techniques attract significant interest in the ML community since vector spaces are more amenable to data science than graphs. Indeed, while network relationships can only be processed by specific techniques from mathematics, statistics, and ML, vector spaces have a richer toolset from those disciplines. In addition, vector operations are often simpler and faster than the equivalent graph operations.
In everyday real-life applications, due to the increasing pervasiveness of ML, deep learning, and AI algorithms, the robustness and vulnerability of these algorithms have become crucial aspects of research. In the specific context of AI applied to networked data processing, application domains include cybersecurity, online financial trading, social media, big-data analytics, and bioinformatics.
Here, the question arises of how to handle situations in which input data are altered, due to acquisition noise or intentional modifications (the so-called adversarial attacks), leading to misleading conclusions or reduced algorithm performance. In these contexts, the goal of an attack is to cause (intentionally or not) the malfunctioning and/or fraudulent behavior of algorithms operating on data structured as graphs. This occurs as a result of perturbations of the graphs performed by the attacker, where the changes can be more or less significant and targeted based on knowledge of the intrinsic functioning of these algorithms (parameters, implementation logic, etc.). This is the assumption of Adversarial Machine Learning (AML) [10]. This new research area studies and proposes solutions to tackle the vulnerability of ML models when their performance in different tasks is compromised through adversarial perturbations on their input. Indeed, neural networks and many other ML techniques suffer from this problem when input modifications occur during the training, testing, or deployment phases. Notable application domains of AML are Computer Vision [11,12], Natural Language Processing (NLP) [13], and Cybersecurity [14].
Among the reviews on graph adversarial attacks, [15,16] mainly focus on GNN-based methods, while [17] also covers attack and defense models for non-GNN methods. The Graph Adversarial Learning Literature repository, produced by Sun et al. [17], has a curated selection of more than 110 adversarial attack and defense studies on graph-structured data, as well as links to downloadable programs. The Awesome Graph Adversarial Learning repository, built and maintained by Chen et al. [18], includes links to 271 relevant publications published in the last five years. Literature on adversarial attacks is mainly focused on poisoning strategies, such as backdoor attacks [19], where the training stage of the target model is perturbed and its behavior is normal unless a specific trigger is present in the test samples, with the aim of misleading graph prediction [20]. The general aim is to affect the performance of graph-level tasks, and none of these works shares our goal, which is to compare the robustness to adversarial attacks of different graph-level embedding methods.
In this work, we address a specific application domain for AML: the study of the vulnerability of ML models applied to the classification/prediction of biological networks. In our assumption, an "adversarial attack" on a biological network concerns any type of perturbation to the structure of the graph, due either to the noise introduced by the experimental environment from which the biological data are extracted or to the lack of information caused by corrupted sources or incomplete pre-processing of raw data. We therefore consider less likely, but still possible, a scenario in which a "real" attacker intentionally and fraudulently modifies the biological networks processed by the ML models.
In this context, we aim to study and measure the robustness of some graph embedding methods, two based on neural networks and another based on statistics, when we alter the training stage of these models through adversarial perturbations to the input data. The altered models of graph-to-vector transformers (embedders) are evaluated with respect to their robustness by measuring classification performance on unperturbed test graphs. In this work, we continue the research activities carried out in [21,22], which, as far as we know, are the only literature contributions to the robustness analysis of graph embedding techniques.
The paper is structured as follows. Section 2 introduces definitions, types, and properties of graph embedding and adversarial attack taxonomies. Section 3 discusses our approach to adversarial attacks on graph-embedding methods and the related experimental study by delving into methods, resources, experimental settings, and performance results. Section 4 reports concluding considerations and future directions for the work.

Graph embedding definitions
The general term of graph embedding methods denotes a plethora of techniques and methodologies to translate large and complex graphs into a reduced vector space, which is often called latent space. In other words, any procedure that constructs a vector representation of a graph in order to simplify and/or make a certain machine learning task more efficient is called graph embedding.

Definition 1 (Graph Embedding)
A graph embedding is a mapping from a collection of graph substructures (most commonly either all nodes, or all edges, or certain subgraphs, or even the whole graph) to $\mathbb{R}^d$.
Graph embedding techniques differ in which aspects of the graph we try to represent:
• Node-level embeddings describe the connectivity of the graph. Each node in the graph is associated with a vector representation. Node-level embeddings target node prediction, reconstruction, and graph clustering.
• Edge-level embeddings describe traversals across the graph. Each edge in the graph is associated with a vector representation. Edge-level embeddings target edge prediction, reconstruction, and graph clustering.
• Graph-level embeddings encode the entire graph into a single vector. Each element in a set of graphs is associated with a vector representation. Graph-level embeddings target graph classification and graph matching.
In this work, we focus on graph-level embedding (see Fig. 1), more precisely in the realm of graph classification tasks in the biological networks domain. Graph-level embedding, also known as whole-graph embedding, can be formally defined as follows.

Definition 2 (Graph-level Embedding)
Given a set of graphs $\mathcal{G} = \{G_1, \dots, G_m\}$, a graph-level embedding is a mapping function $\phi : \mathcal{G} \to \mathbb{R}^d$, where $d \in \mathbb{N}$, such that $\phi$ preserves some proximity measure defined on $\mathcal{G}$.
Choosing an appropriate embedding dimension d is challenging but crucial to generate embeddings applicable to a multitude of tasks. A general rule of thumb is "small enough to be efficient and large enough to be effective". The critical requirement on the dimension of the final latent space is that it must be able to express all the valuable information needed to accomplish the machine learning task on graphs.
When talking about graph embedding techniques, it is important to be aware of another distinction:
• In transductive embedding, the vector representation for a new graph is produced by the embedding function (or model) by processing the new graph jointly with the previous graphs. Each time a new graph is embedded, the vectors of the previous graphs change.
• In inductive embedding, the vector representation (embedding) for a new graph is produced by the embedding function (or model) by processing only the new graph. The embeddings of older graphs do not change when a new graph is embedded.
The embedding process can be unsupervised or supervised. Note that transductive supervised embedding methods cannot be used as models for prediction and classification tasks on unknown graphs, since embedding a new graph jointly with the previous ones would require its label.

Adversarial attack taxonomies
Several surveys in the literature [15][16][17] propose different taxonomies for graph/network attacks based on the goals, knowledge, and resources of the attackers. Adversarial samples of graph data can be produced either through node-level perturbations or by edge-level perturbations. Node-level attacks may consist in adding/removing nodes and/or modifying target node features. Edge-level attacks may consist in adding/removing edges between nodes and/or modifying target edge features. In both cases, the number of modified nodes/edges is often referred to as the perturbation budget, which is used to evaluate the magnitude of the perturbation.
Evasion attacks refer to modifications to only the testing data on which a model is applied to accomplish the requested task (e.g., classification, regression, clustering, matching, etc.). In evasion attacks, there is no need to know the model insights (architecture, parameters, and so on). Poisoning attacks aim to affect the model's performance by adding adversarial samples into the training dataset. The majority of adversarial attacks in graph-based machine learning are of this type. In addition, when the task of the trained model is performed in the transductive learning setting, any evasion attack results in a poisoning attack, since the model is re-trained after testing.
Concerning the amount of knowledge the attacker has about the target models, the types of attacks are classified as: 1) white-box attack, in which the attacker can retrieve all the useful information about the target system to successfully complete the attack, such as the underlying model, its architecture and parameters, etc.; 2) gray-box attack, in which the attacker can only obtain limited information about the target system to perform the attack. This type of attack is more dangerous to the system than white-box attacks, as it only needs partial information to work; 3) black-box attack, in which the attacker has no information about the system. Generally, it is only allowed to do black-box queries on limited samples at most, and thus it cannot make poisoning attacks on the trained model. However, if a black-box attack works, it is more dangerous than the other two since the attacker succeeds with no information at all.
Regarding the goal, the attack can be targeted if the attacker pursues a specific goal, such as the prediction of wrong labels in a graph classification task accomplished by a trained model. On the other hand, the attack is untargeted when the attacker aims at a general malfunctioning or degradation in performance of the model under attack.

Graph embedding adversarial attack
The present experimental analysis compares the behavior of three graph-level embedding methods under attack conditions for the graph classification task. This section is organized as follows. In Section 3.1, we describe the datasets considered in the experiments for the robustness evaluation of embedding models. In Section 3.2, we introduce the adopted attack strategies and their rationale. Then, in Section 3.3, we briefly describe the embedding methods considered in the experiments. In Section 3.4, we present the proposed experimental pipeline that trains the embedding models on attacked graphs and evaluates their performance on test graphs. Finally, in Section 3.5, we discuss the experimental results.

Datasets
We consider three graph datasets having varying properties, as detailed in Table 1.
MUTAG is a popular benchmark dataset composed of networks of 188 mutagenic aromatic and heteroaromatic nitro compounds [23]. The two classes indicate whether or not the compound has mutagenic effects on a bacterium. The nodes represent the atoms of the compound, while the edges represent the chemical bonds between them. The graphs contain both vertex and edge labels.
The PROTEINS dataset consists of 1113 graphs corresponding to protein molecules, subdivided into two classes according to whether or not they are enzymes [24]. The nodes represent Secondary Structure Elements of three different types (helix, sheet, or turn). Edges connect two nodes if they are neighbors in the amino-acid sequence or in the 3D space.
The Kidney dataset includes tissue-specific metabolic networks created for validating related research [25][26][27]. It contains networks representing 299 patients divided into three disease classes: 159 clear cell Renal Cell Carcinoma (KIRC), 90 Papillary Renal Cell Carcinoma (KIRP), and 50 Solid Tissue Normal samples. We obtained the networks by mapping gene expression data coming from the Genomic Data Commons portal (GDC, https://portal.gdc.cancer.gov; Projects TCGA-KIRC and TCGA-KIRP) on the biochemical reactions extracted from the kidney tissue metabolic model [28] (https://metabolicatlas.org). Specifically, given the stoichiometric matrix of the metabolic model, the graph nodes represent the metabolites, and the edges connect reagent and product metabolites in the same reaction, weighted by the average of the expression values of the genes/enzymes catalyzing that reaction [25]. This results in graphs having the same topology (dictated by the metabolic model) but different edge weights (dictated by the gene expression values for each patient). The simplification procedure described in [26] is applied to reduce the complexity of the network, reducing the number of nodes from 4022 to 1034. For our experimental study, we augmented the Kidney graphs with additional numerical labels storing the weighted degree of each node. Indeed, Kidney graphs all have the same structure (same set of vertices and edges), while they differ in the weights associated with edges among different graph samples. Since two of the embedding methods under study cannot process edge weights, we decided to provide this information as node labels. With this choice, we could apply these embedding methods efficiently in this particular graph domain.
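The construction of the Kidney graphs described above can be illustrated with a minimal Python sketch. The reactions, metabolite names, and expression values below are hypothetical toy data, not taken from the metabolic model; the sketch also assumes undirected edges and, when the same metabolite pair occurs in more than one reaction, simply keeps the last weight, which is an arbitrary choice not specified in the text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy reactions: (reactants, products, expression of catalyzing genes).
reactions = [
    (["glucose"], ["g6p"], [4.0, 6.0]),
    (["g6p"], ["f6p"], [2.0]),
    (["f6p", "atp"], ["f16bp", "adp"], [1.0, 3.0]),
]

def build_weighted_edges(reactions):
    """Link every reactant to every product of a reaction.

    The edge weight is the average expression of the genes catalyzing the reaction.
    """
    edges = {}
    for reactants, products, expression in reactions:
        weight = mean(expression)
        for r in reactants:
            for p in products:
                # Assumption of this sketch: a repeated pair keeps the last weight.
                edges[(r, p)] = weight
    return edges

def weighted_degree_labels(edges):
    """Node label = sum of the weights of the edges incident to the node."""
    labels = defaultdict(float)
    for (u, v), w in edges.items():
        labels[u] += w
        labels[v] += w
    return dict(labels)

edges = build_weighted_edges(reactions)
labels = weighted_degree_labels(edges)
```

In this form, two patients share the same edge set but obtain different `labels`, which is the information made available to embedding methods that cannot read edge weights.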
While MUTAG is a small dataset of small graphs (few nodes and edges each), PROTEINS is a much larger set of small graphs, and Kidney is a medium-sized dataset of large graphs. The first two datasets can be considered historical benchmarks since they have been widely adopted in several works on graph classification in the last decade. In contrast, the larger-scale Kidney dataset is here adopted as a more challenging benchmark.

Attack strategies
The attack strategies we consider are edge-level perturbations of the input graphs consisting in the removal of a chosen budget of their edges. Which edges are to be removed is decided according to four different criteria:

Random - It is the baseline removal strategy, where edges to be removed are randomly chosen.

Betweenness Centrality - It measures the centrality of an edge $e$, defined [29] as the sum of the fraction of all-pairs shortest paths that pass through $e$:

$$c_B(e) = \sum_{i,j \in V} \frac{\sigma(i,j \mid e)}{\sigma(i,j)} \tag{1}$$

where $V$ is the set of nodes, $\sigma(i,j)$ is the number of shortest paths connecting vertices $i$ and $j$, and $\sigma(i,j \mid e)$ is the number of those paths passing through the edge $e$.

Eigenvector Centrality - Known also as eigen-centrality [30], it describes the importance of a node in a graph based on that of its adjacent nodes. Let $A = (a_{i,j})$ be the adjacency matrix of a graph $G$. We can compute a weight $x_i$ for node $i$ in terms of the weights $x_j$ for the other nodes $j$ as

$$x_i = \frac{1}{\lambda} \sum_{j} a_{i,j}\, x_j \tag{2}$$

where $\lambda$ is a constant. Equation 2 can be rewritten as $Ax = \lambda x$, so that $x$ is an eigenvector of the adjacency matrix $A$. The components of the eigenvector associated with the maximum eigenvalue measure the relative importance among the nodes, providing their ranking. To obtain the centrality of edges, rather than nodes, we use the eigenvector centrality computed on the line graph of $G$ (i.e., the graph where nodes represent the edges of $G$ and two vertices are adjacent if their corresponding edges in $G$ are incident).

PageRank - The PageRank algorithm is a variant of the Eigenvector Centrality originally designed for ranking web content, using hyperlinks between pages as a measure of importance [31]. Let $A = (a_{i,j})$ be the adjacency matrix of a directed graph. The PageRank centrality $x_i$ of node $i$ is given by:

$$x_i = \alpha \sum_{k} \frac{a_{k,i}}{d_k}\, x_k + \beta_i \tag{3}$$

where $\alpha$ and $\beta_i$ are user-defined constants and $d_k = \max(o_k, 1)$, with $o_k$ denoting the out-degree of node $k$. Thus, PageRank is determined by an endogenous component, namely the so-called damping factor $\alpha$, that considers the network topology, and an exogenous component $\beta = (\beta_i)$, the so-called personalization vector, that is independent of the network structure. As for the eigenvector centrality, we compute the PageRank centrality of edges using the line graphs of the original graphs. In the experiments, we used $\alpha = 0.85$ and a null vector $\beta$.
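As an illustration of how node centralities can be turned into edge centralities through the line graph, the following Python sketch builds the line graph of a small undirected toy graph and ranks its edges by eigenvector centrality and by PageRank. It is a sketch under stated assumptions, not the implementation used in the experiments: the function names and the toy graph are hypothetical, the eigenvector is obtained by power iteration on the shifted matrix A + I (which has the same eigenvectors as A), and PageRank is written in its common normalized form with uniform teleportation, which may differ from the personalization setting reported above.

```python
from itertools import combinations

def line_graph(edges):
    """Line graph: one node per edge of G; two nodes are adjacent if the edges share an endpoint."""
    edges = [tuple(sorted(e)) for e in edges]
    adj = {e: set() for e in edges}
    for e, f in combinations(edges, 2):
        if set(e) & set(f):
            adj[e].add(f)
            adj[f].add(e)
    return adj

def eigenvector_centrality(adj, iters=200):
    """Power iteration on A + I; the shift avoids oscillations on bipartite graphs."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sum(val * val for val in nxt.values()) ** 0.5
        x = {v: val / norm for v, val in nxt.items()}
    return x

def pagerank(adj, alpha=0.85, iters=200):
    """PageRank with damping factor alpha and uniform teleportation (assumption of this sketch)."""
    n = len(adj)
    x = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        nxt = {v: (1.0 - alpha) / n for v in adj}
        for u in adj:
            if adj[u]:
                share = alpha * x[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # Dangling node: spread its score uniformly.
                for v in adj:
                    nxt[v] += alpha * x[u] / n
        x = nxt
    return x

# Hypothetical toy graph.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d"), ("d", "e")]
L = line_graph(edges)
ev = eigenvector_centrality(L)
pr = pagerank(L)
ranked_by_ev = sorted(ev, key=ev.get, reverse=True)  # most central edges first
```

In the toy graph, the edge between b and d is incident to all the other edges, so it is ranked first by both measures and would be the first one removed by a centrality-based attack.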
The motivation for removing edges based on their centrality measure is based on the assumption that the most significant information in biological networks relies on some relevant links, where the relevance is measured in terms of the link centrality measure. For example, in the Kidney dataset (see Section 3.1), metabolites are linked if they appear in the same metabolic reaction. Therefore, high centrality links identify metabolites that are involved in many metabolic reactions and thus represent significant information for the networks. Similarly, in the MUTAG and PROTEINS datasets, edges with high centrality identify significant backbone structures of chemical compounds and proteins, respectively (Fig. 2). Our focus is to study the effects of targeted attacks on the graph embedding process. Indeed, the choice of an edge removal criterion based on edge relevance reflects the fact that we envision a data degradation or an attack causing a significant information loss in biological networks. Figures 3 and 4 show a Kidney graph before and after edge removal, with the removal of the edges with the highest betweenness centrality score (see Fig. 3) or of randomly chosen edges (see Fig. 4). It can be noted how the former edge selection strategy, when compared to the baseline edge removal, focuses more on the deletion of entire clusters of nodes characterized by a great number of high centrality connections (see the rightmost node cluster of Fig. 3a, which is broken up into isolated nodes after edge removal in Fig. 3b).
The described adversarial attacks on the embedding models under study are poisoning attacks. Indeed, we limited graph perturbations only to the sets of graphs adopted to build the embedding models, which are then used to infer embeddings of testing graphs in an inductive learning environment, as detailed in Section 3.4.
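The difference between centrality-driven and random edge removal can be sketched in a few lines of Python. The code below is a minimal illustration, assuming an unweighted undirected graph, a Brandes-style computation of edge betweenness, centrality scores computed once before removal (not recomputed after each deletion), and a budget rounded to the nearest integer number of edges; none of these choices is stated in the text, and the function names and the toy graph are hypothetical.

```python
import random
from collections import deque

def edge_betweenness(nodes, edges):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    score = {frozenset(e): 0.0 for e in edges}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths.
        sigma = {v: 0 for v in nodes}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered pair of endpoints was visited from both sides.
    return {e: val / 2.0 for e, val in score.items()}

def remove_edges(nodes, edges, budget, strategy="betweenness", seed=0):
    """Return the edge list after removing a fraction `budget` of the edges."""
    k = int(round(budget * len(edges)))
    if strategy == "random":
        removed = set(map(frozenset, random.Random(seed).sample(edges, k)))
    else:
        score = edge_betweenness(nodes, edges)
        ranked = sorted(edges, key=lambda e: score[frozenset(e)], reverse=True)
        removed = set(map(frozenset, ranked[:k]))
    return [e for e in edges if frozenset(e) not in removed]

# Hypothetical toy graph: two triangles joined by the bridge (2, 3).
nodes = list(range(6))
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
bridge_score = edge_betweenness(nodes, edges)[frozenset((2, 3))]
attacked = remove_edges(nodes, edges, budget=0.15)
randomly_attacked = remove_edges(nodes, edges, budget=0.3, strategy="random")
```

On this toy graph the betweenness-based attack removes the bridge first and disconnects the two triangles, whereas a random attack with the same budget usually leaves the graph connected; this mirrors, at a small scale, the cluster disruption visible in the Kidney example.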

Embedding methods
Two of the embedding methods considered in the experiments rely on neural network models: Netpro2vec [32] and Graph2Vec [33]. These models learn a function that maps graphs into a numerical lower-dimensional space. This mapping is optimized in a learning process that processes the training graph samples one at a time. The third method, FEATHER [34], is a probabilistic embedding model. Probabilistic models exploit the extraction of random walks in the graph to learn its global structure together with the local neighborhood connectivity. FEATHER behaves as an embedding function performing graph-level embedding on each graph separately. Details are described in the following subsections.

Inductive Netpro2vec
Netpro2vec [32] is an unsupervised graph-level embedding method that exploits node proximity information (under different metrics) to transform graphs into textual documents while preserving their significant structural properties. Netpro2vec relies on an NLP learning model, called SkipGram [35], to extract, from each document-based graph, the meaningful features in terms of vectors, i.e., the embeddings. Such a new graph representation can be used for several machine learning tasks, such as unsupervised clustering and supervised classification of graphs. The main advantage of Netpro2vec is that it provides efficient embeddings completely independent of the task and nature of the data.
The current Netpro2vec implementation cannot be directly used in our experimental study since its programming interface supports only transductive embedding. Nonetheless, by exploiting the Doc2Vec [36] facility to infer vector representations of new documents (in our case, graphs) based on a pre-trained embedding model, we developed a new API for the method, which we call iNetpro2vec, to also support inductive embedding.

Fig. 3 A graph from the Kidney dataset before (a) and after (b) a 30% budget of edge removal based on betweenness centrality (red edges in the left picture indicate those removed in the right picture)
Fig. 4 A graph from the Kidney dataset (same as Fig. 3) before (a) and after (b) a 30% budget of random edge removal (red edges in the left picture indicate those removed in the right picture)

Graph2Vec
Graph2Vec [33] is a neural method for learning graph-level embeddings in an unsupervised manner. First, the method relabels nodes through a recursive node relabeling algorithm assigning to each node a label uniquely representing the node's rooted subgraph (neighborhood). After recursion, the final node labels form a vocabulary of words, and graphs are represented as a set of words (a document) in this vocabulary. Like Netpro2vec, Graph2Vec relies on the Doc2Vec learning model to learn the graph embeddings. The initial labels of nodes are, by default, the node degrees, although the user can specify them as an additional input. Graph2Vec is a popular method among graph-level embedding techniques, and it has proved to have good performance throughout many graph domains.
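The relabeling step can be illustrated with a short Python sketch in the style of Weisfeiler-Lehman relabeling, where at each iteration a node label is replaced by a compressed signature of its own label and the sorted labels of its neighbors. This is a simplified illustration, not the Graph2Vec code: the function name `wl_words`, the use of a truncated MD5 digest as label compression, and the toy graphs are assumptions made here, and truncated digests can in principle collide.

```python
import hashlib

def wl_words(adj, labels=None, depth=2):
    """Return the 'document' of a graph: node labels collected over `depth` relabeling rounds.

    adj maps each node to the list of its neighbors.
    """
    if labels is None:
        # Default initial label: the node degree.
        labels = {v: str(len(adj[v])) for v in adj}
    words = list(labels.values())
    for _ in range(depth):
        new_labels = {}
        for v in adj:
            signature = labels[v] + "|" + ",".join(sorted(labels[u] for u in adj[v]))
            new_labels[v] = hashlib.md5(signature.encode()).hexdigest()[:8]
        labels = new_labels
        words.extend(labels.values())
    return words

# Hypothetical toy graphs: two isomorphic paths and a star.
path_a = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
path_b = {"d": ["c"], "c": ["d", "b"], "b": ["c", "a"], "a": ["b"]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

doc_a = sorted(wl_words(path_a))
doc_b = sorted(wl_words(path_b))
doc_s = sorted(wl_words(star))
```

The two paths produce the same bag of words regardless of node naming, while the star produces a different one; a document embedding model trained on such documents can then place structurally similar graphs close to each other.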
Graph2Vec graph-level embeddings are learned in a transductive manner. Since this method shares with Netpro2vec the same NLP processing technique, also in this case, it is possible to use the Doc2Vec facility to infer embeddings of new samples based on a pre-trained neural model. Thus, in the current work, we developed a new API that we call iGraph2Vec, to enable the method to operate in inductive learning mode.

FEATHER
FEATHER [34] is a method that uses an r-scale random walk weighted characteristic function to describe the distribution of graph node features at multiple scales. Assuming the neighborhood of a node u at scale r consists of nodes that can be reached by a random walk in r steps from source node u, this characteristic function has probability weights defined by the transition probabilities of random walks in r steps from source node u. FEATHER is a probabilistic embedding method: by exploiting random walks, it learns multi-scale node features of the graph that are aggregated by mean pooling to obtain a numerical vector (embedding) representing the entire graph structure and its local neighborhood connectivity. Therefore, the resulting embedder can be applied to each graph sample separately.
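A minimal Python sketch of the idea follows, assuming a single scalar node feature, a row-normalized transition matrix, and a small set of evaluation points; the function name, the evaluation points `thetas`, and the log-degree feature are illustrative choices made here and are not taken from the reference implementation. For each scale r and evaluation point θ, the real and imaginary parts of the characteristic function at node u are the r-step transition-probability-weighted sums of cos(θ x_w) and sin(θ x_w), which are then mean-pooled over the nodes.

```python
import math

def feather_graph_embedding(adj, feature, thetas=(0.5, 1.0), scales=2):
    """Mean-pooled random-walk weighted characteristic function of one node feature.

    adj maps each node to the list of its neighbors (no isolated nodes assumed).
    """
    nodes = sorted(adj)
    # One-step transition probabilities of the random walk.
    P = {u: {v: 1.0 / len(adj[u]) for v in adj[u]} for u in nodes}
    walk = {u: {u: 1.0} for u in nodes}  # scale 0: the walker sits on its source
    embedding = []
    for _ in range(scales):
        # Advance every walk distribution by one step.
        advanced = {}
        for u in nodes:
            row = {}
            for w, p_uw in walk[u].items():
                for v, p_wv in P[w].items():
                    row[v] = row.get(v, 0.0) + p_uw * p_wv
            advanced[u] = row
        walk = advanced
        for theta in thetas:
            real = [sum(p * math.cos(theta * feature[w]) for w, p in walk[u].items())
                    for u in nodes]
            imag = [sum(p * math.sin(theta * feature[w]) for w, p in walk[u].items())
                    for u in nodes]
            # Mean pooling over nodes gives two graph-level coordinates.
            embedding.append(sum(real) / len(nodes))
            embedding.append(sum(imag) / len(nodes))
    return embedding

# Hypothetical toy graph: a triangle with a pendant node.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
feature = {v: math.log(len(adj[v])) for v in adj}  # illustrative feature choice
emb = feather_graph_embedding(adj, feature)
check = feather_graph_embedding(adj, feature, thetas=(0.0,), scales=2)
```

Since the function only reads the graph it is given, the same embedder can be applied to a new graph without revisiting the others. Evaluating at θ = 0 is a simple sanity check: the real parts equal 1 and the imaginary parts equal 0, because the transition probabilities from each source sum to one.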

Experimental pipeline
The experimental pipeline is summarized in the pseudo-code of Algorithm 1. The graph dataset is loaded (line 1) and split ten times into a training set with ninety percent of the samples and a test set used for evaluation (line 2). The test partitions are non-overlapping, thus ensuring that all graphs in the dataset are used for testing exactly once. After dataset splitting, the training samples are attacked according to the chosen adversarial attack strategy and parameter (budget of the attack) (line 3), while the test samples are unaltered. The embedding method is initialized, and its parameters are set (line 4). The embedding model is built (trained) on the altered samples (line 5). The so-trained model is applied to produce embeddings of both training (line 6) and testing (line 7) samples. Once the embedding vectors are obtained, an SVM classifier with a linear kernel is fitted on the training vectors and used to predict the test vectors (line 8). Scores for all cross-validation folds are collected, and performances are computed.
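The structure of the pipeline can be sketched in self-contained Python as follows. This is only a skeleton under explicit assumptions: the degree-histogram embedder and the nearest-centroid classifier are toy stand-ins for the embedding methods and the linear-kernel SVM of the paper, the training vectors are computed on the attacked training graphs, accuracy replaces the full set of scores, and the toy dataset (paths versus cycles) and the attack `drop_first_edge` are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, n_folds, seed=0):
    """Assign every sample index to exactly one test fold, class by class."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds, pos = [[] for _ in range(n_folds)], 0
    for idx in by_class.values():
        rng.shuffle(idx)
        for i in idx:
            folds[pos % n_folds].append(i)
            pos += 1
    return folds

class DegreeHistogramEmbedder:
    """Toy inductive embedder (stand-in): normalized degree histogram."""
    def fit(self, graphs):
        self.max_degree = max(len(nbrs) for g in graphs for nbrs in g.values())
        return self
    def transform(self, graphs):
        vectors = []
        for g in graphs:
            hist = [0.0] * (self.max_degree + 1)
            for nbrs in g.values():
                hist[min(len(nbrs), self.max_degree)] += 1.0 / len(g)
            vectors.append(hist)
        return vectors

class NearestCentroid:
    """Toy classifier (stand-in for the linear-kernel SVM)."""
    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, val in enumerate(x):
                acc[j] += val
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self
    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]

def run_pipeline(graphs, labels, attack, n_folds=10):
    """Poison only the training graphs of each fold, then test on clean graphs."""
    correct = 0
    for test_idx in stratified_folds(labels, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(graphs)) if i not in held_out]
        attacked = [attack(graphs[i]) for i in train_idx]
        embedder = DegreeHistogramEmbedder().fit(attacked)
        X_train = embedder.transform(attacked)
        X_test = embedder.transform([graphs[i] for i in test_idx])
        clf = NearestCentroid().fit(X_train, [labels[i] for i in train_idx])
        predictions = clf.predict(X_test)
        correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
    return correct / len(graphs)

def path(n):
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def cycle(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def drop_first_edge(g):
    """Toy attack: delete the edge between node 0 and its smallest neighbor."""
    h = {v: set(nbrs) for v, nbrs in g.items()}
    if h[0]:
        u = min(h[0])
        h[0].discard(u)
        h[u].discard(0)
    return h

graphs = [path(n) for n in range(4, 14)] + [cycle(n) for n in range(4, 14)]
labels = [0] * 10 + [1] * 10
clean_accuracy = run_pipeline(graphs, labels, attack=lambda g: g)
attacked_accuracy = run_pipeline(graphs, labels, attack=drop_first_edge)
```

The key design point is that `attack` is applied inside the cross-validation loop and only to the training indices, so the test graphs of each fold are always unaltered.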

Performance evaluation
The experimental results are all reported in Tables 3, 4, 5, 6, 7, 8, 9, 10 and 11 in the Appendix in terms of accuracy, precision, F-measure, recall, and Matthews Correlation Coefficient (MCC) [37]. In Figs. 5, 6 and 7, we plotted the MCC scores obtained by the stratified 10-fold cross-validation on the embeddings produced by all methods when applied to each dataset and by varying the type of attack (random, betweenness, eigenvector, and PageRank centrality-based attack) and the budget of poisoning (percentage of edge removal).
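For reference, the MCC can be computed directly from true and predicted labels, also in the multiclass case needed for the three-class Kidney dataset. The Python sketch below uses the standard multiclass generalization based on the confusion matrix; the function name and the example labels are chosen here for illustration, and the convention of returning 0 when the denominator vanishes is an assumption of the sketch.

```python
from collections import Counter
from math import sqrt

def matthews_corrcoef(y_true, y_pred):
    """Multiclass Matthews Correlation Coefficient."""
    s = len(y_true)                                   # number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly predicted samples
    t_k = Counter(y_true)                             # occurrences of each true class
    p_k = Counter(y_pred)                             # occurrences of each predicted class
    classes = set(t_k) | set(p_k)
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0

# Hypothetical labels.
y_true = [1, 1, 1, 0, 0, 0]
perfect = matthews_corrcoef(y_true, y_true)
constant = matthews_corrcoef(y_true, [1, 1, 1, 1, 1, 1])
partial = matthews_corrcoef(y_true, [1, 1, 0, 0, 0, 1])
```

A perfect prediction gives an MCC of 1, while a classifier that always predicts the same class gives 0, which is the value that a degraded model tends towards under heavy poisoning.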
In the MUTAG benchmark, all the methods show a performance degradation towards the null classification for attack budgets higher than 20%. This is due to the small size of the original graphs and the consequent scarcity of graph information that survives the edge removal attacks. The worst behavior is observed for FEATHER, and the best for iNetpro2vec, which partially succeeds in extracting distinctive information from the built vocabulary under moderate attacks.
In the case of PROTEINS, the performance of all the methods is low and similar, even with no attacks. iNetpro2vec is the only method showing less degradation of MCC scores when increasing the poisoning budget to the maximum. In our interpretation, this effect is partially due to the inherent robustness of the method. Indeed, if we look at the structure of a graph from the PROTEINS dataset (e.g., Fig. 2), we observe that edge removal attacks lead to the division of the protein graph into disjoint groups of nodes, leaving intra-group connectivity unaltered. In our understanding, iNetpro2vec relies more on intra-group than inter-group connections to characterize the graph embedding. This is a plausible explanation for the almost flat trend of MCCs in the plots.

Algorithm 1 The experimental pipeline
In the case of the more challenging Kidney dataset, iNetpro2vec always performs better than FEATHER. FEATHER clearly suffers in robustness under increasing edge removal attacks, and its performance under all the attack strategies soon degrades towards the null classification. iNetpro2vec also outperforms iGraph2Vec in the unattacked case and with attack budgets lower than 20%. For the other poisoning percentages, iGraph2Vec and iNetpro2vec show similar robustness as the attack budget increases and across the different strategies. In particular, the two methods show good robustness within a range of 20% for random and betweenness strategies and within a larger range of budgets in the case of eigenvector and PageRank centrality-based edge removal.
Overall, iNetpro2vec appears more robust to edge removal attacks than the other methods. This is particularly evident in the PROTEINS benchmark, where the gap is larger as the attack budget increases. The same holds in the Kidney benchmark, although, for this domain of large-scale and weighted graphs, iGraph2Vec performs slightly worse with low-budget attacks but similarly with larger budgets.
It should be observed that some of the compared methods seem to improve, rather than decrease, their performance under small budgets (generally 5%) of edge removal attacks. This is the case of FEATHER on the MUTAG and PROTEINS datasets and of iNetpro2vec and iGraph2Vec on the Kidney dataset. However, by applying the two-sample t-test to compare the mean MCC scores in the case of unattacked graphs and of a 5% poisoning budget, and by examining the corresponding p-value, it turns out that the reported unexpected increase is not statistically significant and therefore cannot be considered a real improvement in performance.
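The significance check can be sketched in Python as follows. The per-fold MCC scores are hypothetical values invented for illustration, and the sketch assumes the pooled-variance (Student) form of the two-sample t-test on ten scores per condition; whether the original analysis used this or another variant is not stated. Instead of a p-value, the statistic is compared with 2.101, the two-sided 5% critical value of Student's t distribution with 18 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns the statistic and the degrees of freedom.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2

# Hypothetical per-fold MCC scores (not taken from the experiments).
mcc_clean = [0.60, 0.55, 0.70, 0.65, 0.50, 0.62, 0.58, 0.66, 0.54, 0.60]
mcc_attacked = [0.62, 0.57, 0.69, 0.68, 0.52, 0.60, 0.61, 0.67, 0.55, 0.63]

t, dof = two_sample_t(mcc_attacked, mcc_clean)
T_CRIT_18 = 2.101  # two-sided 5% critical value for 18 degrees of freedom
significant = abs(t) > T_CRIT_18
```

With these illustrative scores the attacked mean is slightly higher than the clean one, yet the statistic stays well below the critical value, so the apparent improvement would not be regarded as significant.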
As a general comment on the attack strategies, the eigenvector and PageRank centrality-based edge removal strategies have similar effects on the methods' performance. This was expected since, as already discussed in Section 3.2, the PageRank centrality is a variant of the eigenvector one.
To conclude the performance evaluation of the compared methods, for all the embedding methods we report the parameter settings in Table 12 and the execution times recorded during the experiments in Table 2. We measured the average execution times of the experimental pipeline when applied to each dataset/method pair on an iMac Retina 5K with a 4 GHz Intel Core i7 quad-core processor and 32 GB of 1600 MHz DDR3 RAM. We observe that the FEATHER algorithm is much faster than the other two methods in the case of small graphs (MUTAG and PROTEINS datasets). However, it is the slowest when dealing with the much larger Kidney graphs, for which the two neural network-based methods require approximately 2/3 of the time.

Conclusions and future work
As a general conclusion, in our experimental study iNetpro2vec shows good robustness of its embedding models across all the considered benchmarks, even when the model training set is poisoned with targeted attacks involving the more central connections of nodes. It behaves similarly to iGraph2Vec in the Kidney benchmark, consisting of large-scale weighted and highly connected graphs. In this domain, the FEATHER method has no success. This further supports the claim that iNetpro2vec provides efficient embeddings independently of the nature of the data and for different tasks (graph classification, graph similarity matching, and so on). Future work will proceed in the following directions: first, considering datasets with different characteristics in terms of density, structure, and position of nodes and edges, since the attack strategies act above all on these aspects; second, applying additional attack strategies in order to evaluate the behavior of graph embedding methods.

Appendix A: Performance measures of graph-embedding methods
In this appendix, we include tables reporting measures of 10-fold classification accuracy (acc), precision (prec), F-measure (f1), recall, and Matthews Correlation Coefficient (MCC) obtained in all the experiments. One table is reported for each group of experiments, referring to the classification performance of one graph-embedding method (iNetpro2vec, iGraph2Vec, or FEATHER) when applied to one dataset (MUTAG, PROTEINS, or Kidney). In each table, we report the performance results when the dataset is unattacked (first row) and in the case of different percentages of edge removal (budget). The rows are grouped according to the criterion adopted for edge removal (random, betweenness, eigenvector, or PageRank) (Tables 3, 4, 5, 6, 7, 8, 9, 10 and 11).

Appendix B: Parameter settings of graph-embedding methods
Table 12 reports the parameter settings for the software implementations of Netpro2vec, Graph2Vec, and FEATHER adopted in the experiments. These parameters have been experimentally chosen to optimize MCC performance.

Funding This work has been partially funded by the BiBiNet project (H35F21000430002) within POR-Lazio FESR 2014-2020. It was carried out also within the activities of the authors as members of the ICAR-CNR INdAM Research Unit and partially supported by the INdAM research project "Computational Intelligence methods for Digital Health". The work of Mario R. Guarracino was conducted within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE).
Data availability Data and algorithms used in the current work are all available as open source.
Code availability The software used in the current experimental study is publicly available for reproducibility of results.

Declarations
Ethics approval and consent to participate Datasets used in the current work are all from secondary sources, where primary ethics approval had been obtained for data acquisition.

Conflicts of interest
The authors declare that they have no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.