Improving Network Delay Predictions Using GNNs

Autonomous network management is crucial for Fifth Generation (5G) and Beyond 5G (B5G) networks, where a constantly changing environment is expected and network configuration must adapt accordingly. Modeling tools are required to predict the impact on performance (packet and delay loss) when new traffic demands arrives and when changes in routing paths are applied in the network. Mathematical analysis and network simulators are techniques for modeling networks but both have limitations, as the former provides low accuracy and the latter requires high execution times. To overcome these limitations, machine learning (ML) algorithms, and more specifically, graph neural networks (GNNs), are proposed for network modeling due to their ability to capture complex relationships from graph-like data while predicting network properties with high accuracy and low computational requirements. However, one of the main issues when using GNNs is their lack of generalization capability to larger networks, i.e., when trained in small networks (in number of nodes, paths length, links capacity), the accuracy of predictions on larger networks is poor. This paper addresses the GNN generalization problem by the use of fundamental networking concepts. Our solution is built from a baseline GNN model called RouteNet (developed by Barcelona Neural Networking Center-Universitat Politècnica de Catalunya (BNN-UPC)) that predicts the average delay in network paths, and through a number of simple additions significantly improves the prediction accuracy in larger networks. The improvement ratio compared to the baseline model is 101, from a 187.28% to a 1.828%, measured by the Mean Average Percentage Error (MAPE). In addition, we propose a closed-loop control context where the resulting GNN model could be potentially used in different use cases.


Introduction
Fifth Generation (5G) and Beyond 5G (B5G) networks require comprehensive performance analysis and monitoring to meet the strict demands of Quality of Service (QoS) and Quality of Experience (QoE) for new services, such as Ultra-Reliable Low-Latency Communication (URLLC), Massive Machine Type Communication (mMTC) and Enhanced Mobile Broadband (eMBB) [1].To realize such services, network operators create and deploy network slices.Network slices are multiple logical virtual networks on top of a shared network domain, using a shared physical network and shared computing resources.However, one of the biggest challenges for network automation is the dynamic resource allocation of network slices [2].
Such slices must be proactively and dynamically created, deployed, and optimized in (near)-real-time [3] to guarantee the expected QoS/QoE (e.g., latency below a maximum, bandwidth above a minimum, and a minimum of connected devices) for demanding applications such as virtual reality, smart stadiums, etc.According to [4], forthcoming infrastructures are expected to incorporate technologies such as Software Defined Networking (SDN), Network Function Virtualization (NFV), Service Function Chaining (SFC), and Multi-access Edge Computing (MEC).The integration of these advanced features requires a quick analysis of the Key Performance Indicators (KPIs).For example, new usages like Internet of Vehicles (IoV) are demanding a high level of adaptation to the networks [5].Therefore, a fast and highly automated management system is required to trade between demand, benefits, and capacity for existing and newer slices.
Several tools can aid the management system in achieving automation.Traditional network modeling is based on mathematical analysis of the network behavior, but they rely on simplifications that often do not reflect the complexity and dynamism of current network models [6].In contrast, network simulators offer highly accurate results at the packet level.However, their processing time and usage of computer resources grow exponentially with the network size and parameter configuration (e.g., constant change of network conditions).Consequently, existing networks are challenging to analyze, monitor, and manage in almost real time.
Recently, Machine Learning (ML)-based predicting models are trained with data to improve network decision-making.These models play a fundamental role in network automation since they can swiftly respond to new data or values with minimal human intervention.This way, data-driven approaches can help improve prediction times and accuracy.There are currently proposals for Artificial Intelligence (AI)/ML algorithms to produce the following generation of network models that learn from data and abstract the underlying network complexities.These models can quickly predict the network KPIs and can be adjusted by retraining them with newly available data [7].Furthermore, the usage of AI is becoming a key enabler in the current 5G and B5G network slicing landscape [8].The Neural Networks (NNs) to approach the implementation of Digital Twin Networks (DTNs) are starting to be widely used in the research and development of this type of networks, as presented in [9][10][11].
It has been shown that NNs are excellent function approximators and can be used to represent complex relationships from data [12].However, current NN designs are not made to learn from data with a graph structure.To overcome this problem, GNNs are proposed and recently employed to precisely estimate performance in networking domains, respectively [13].GNNs learn from graph features and structures, creating more accurate data-based models.As the complexity of the GNNs increases with network size, training with larger networks takes longer and uses more resources [14].Therefore, it is convenient to obtain a GNN-based model capable of generalizing.This model is trained using data from networks with a small number of nodes, providing fast training.At the same time, this model should produce high-accuracy results when predicting network KPIs in larger unseen networks, exhibiting good scalability properties.
This paper describes our step-by-step approach to outperform RouteNet [15], a well-known and state-of-the-art GNN model designed to predict network performance metrics.This baseline was proposed during the 2021 Graph Neural Networking Challenge: Creating a Scalable Network Digital Twin, which was part of the ITU AI/ML in 5G Challenge. 1 Our contributions can be summarized as follows: 1 https:// aifor good.itu.int/ about-ai-for-good/ aiml-in-5g-chall enge/ • We propose a GNN model that predicts the network delay with high accuracy and is scalable compared to the baseline model RouteNet by exploiting feature engineering techniques and fundamental networking concepts.• We provide a detailed description of our approach toward the model design and an exhaustive analysis of the network properties related to the dataset used to train the model.This aspect is fundamental to support our decisions on the different models we design and can be used as a methodology to create GNN to solve related problems.• We perform an extensive set of experimental evaluations where we compare our approach with recent state-of-the-art GNN approaches and analytical models, and discuss how different changes to the baseline model affect the model's performance.Real and synthetic datasets are used for a comprehensive prediction accuracy evaluation.
The rest of the paper is structured as follows.In Sect.2, we describe the use cases where a GNN used for network KPIs predictions can be helpful.In Sect.3, we review the main methods for predicting network delay.We introduce the RouteNet baseline in Sect. 4. After that, we present Girona Antwerp Intelligence for Networks (GAIN), an enhanced RouteNet-based architecture in Sect.6, which can accurately estimate the average per-path delay in large networks while being trained in small networks.We evaluate GAIN and show the obtained results in Sect.7. Finally, we conclude the paper in Sect.8.

Problem Context
Predictive models are an adequate tool to study how various network configurations and decisions might affect 5G and B5G networks performance before deployment.
For instance, a network management system can measure the impact of accepting new traffic on the network throughput or delay.If the existing users are not affected, the request is accepted.Otherwise, the request is rejected.To do this automatically, prediction models should be fast (e.g., in milliseconds) and accurate enough so the decision is timely and adequate.
As discussed in Sect. 1, AI/ML, particularly GNNs, are being proposed as the next generation of predictive models.GNNs have gained significant attention in the networking community due to their ability to learn complex relationships among nodes and links [16].This makes them more suitable for this domain compared to classical ML techniques or Deep Learning (DL) [7].
Recently, GNNs have also been suggested as DTNs [17].A DTN represents a virtual real-time depiction of a physical network in the digital world [18].Digital twins have sparked a revolution in various industries, providing an enhanced view of physical entities to support diagnosis and what-if analyses.The requirements for near-real-time network analysis are closely linked to the concept of DTNs, which is still in the early stages of adoption for 5G and B5G networks [19,20].
In general, ML is becoming a crucial enabler for developing new DTNs for these networks, as the demands of new services grow increasingly stringent.For instance, in smart cities, DTNs are utilized to design networks that can ensure QoS without wasting energy, as discussed in [21].Moreover, the emerging IoV services also rely on IoV for network planning, which can incorporate NNs and even incorporate techniques like Federated Learning [22].
Another interesting use case in 5G and B5G networks is a dynamic, real-time, routing optimization tool that accomplishes the QoS requirements.These networks continuously evolve and require an adaptive behavior, where routing can be decisive in achieving the Service Level Agreement (SLA) requirements.In this case, different routing configurations can be tested, selecting the best result according to the target KPIs.This tool should have the following functionalities: • Monitor the current resource allocations and performance over the entire network.• Detect or be notified by an external actor, of new network events that might affect the KPIs.• Test multiple routing configurations and choose the optimal one for the real network.• Minimize the used resources while satisfying the SLAs.
• Apply new network routing configurations.
The applicability of DTNs for routing is demonstrated in the state-of-the-art literature, where the usage of AI facilitates the implementation of these solutions.For example, in [23], AI is employed to implement an energy-efficient routing protocol in Wireless Sensor Networks (WSNs).In [24], a cognitive approach for routing is utilized in the context of reliable Internet of Things (IoT) networks, leveraging the capabilities of DTNs.Additionally, in the context of IoV and B5G networks, a message routing scheme is developed in [25], utilizing AI techniques.
The solutions to the use cases described above can be designed following a closed-loop control and leveraging the prediction model.As depicted in Fig. 1, network policies (i.e., SLAs) define the system's constraints or optimal operation points.All the decisions coming from the decision-making algorithm should ensure such optimal operation.A monitor block is introduced, which detects changes in the real network and formats the data to train the prediction model.Then, if a change is detected, the decision-making algorithm is triggered.The prediction model evaluates the algorithm's potential decisions and returns its impact on network policies.
Thus, the decision-making algorithm can select the decision that keeps the network operating closer to the agreed operation point.The changes, if any, are deployed in the network via an executor.The main blocks of the solution in Fig. 1 can be easily mapped to the Monitor-Analyze-Plan-Execute over a shared Knowledge (MAPE-K) framework, one of the most influential reference control models for autonomous and self-adaptive systems [26].MAPE-K defines five main blocks: the monitor, the analysis, the plan, the executor, and the knowledge block.The monitor and the executor are clearly represented in the figure, while the decision-making algorithm and the prediction model represent the plan and analysis block, respectively.The knowledge represents the data shared by all the other four blocks.More concretely, in the case of network slicing, once a new slice request arrives at the network, the decision-making algorithm should evaluate whether the slice request should be accepted.Then, the decision to accept or reject the request is simulated in the prediction model.According to the evaluations, the decision that allows fulfilling the network policy is then taken.In the case of routing optimization, the same procedure is followed, but instead of adding/rejecting slices, a routing possibility is assessed.Then, the best decision is selected and applied to the network.

Related Work
A predictive model and a decision-making algorithm are the main blocks needed to achieve the automated network management tasks described in Sect. 2. In this work, we focus on improving a prediction model called RouteNet, which predicts the network performance according to a given topology, routing, and offered traffic.However, other techniques can and have been used in predicting network performance.In the literature, network models are used for this objective and are typically created using analytical modeling, which includes queuing theory [27], Markov chains [28], and other similar techniques.Such models can produce fast but poor predictions as a result of relying on unrealistic and static assumptions about real-world networks.Packet-level network simulators are often utilized to model network behavior under specific cases and get more realistic results.Nevertheless, their longer computations and execution times when dealing with larger networks [29] make them unsuitable for the required fast predictions.Despite this, network simulators produce highly accurate data which can be employed to train different ML models.AI/ML solutions such as shallow Feed Forward Neural Networks, Support Vector Machine (SVM), Random Forest (RF), Reinforcement Learning (RL), Multi-Layer Perceptrons (MLP), or Convolutional Neural Networks (CNN) have been used to resolve similar 5G challenges.Some state-of-the-art solutions apply these methods for predicting network delay, throughput, or other KPIs.In [30], 3D-CNNs are applied to the problem of provisioning resources for network slices in real-time, predicting network capacity requirements depending on the time of the week.
In [31], a zero-touch control for network slicing is proposed, predicting each slice's network capacity needs.The authors in [32] implement a Logistic Regression (LR), an SVM, and a Decision Tree (DT) to predict the delays experienced by the users.Following a DL approach, [33] predicts the traffic that a network will support at a specific time, similarly to [30,31], but using a Recurrent Neural Network (RNN).[34] also uses RNNs to predict delays in 5G networks for IoT and Tactile Internet.All these ML solutions work well with static scenarios but are less adapted to create graph models in comparison to GNNs.The main advantage of GNNs over other ML methods is their expressive power [35], allowing to model different graphs (e.g., different topology, number of nodes) with higher accuracy.Some additional motivations to apply GNNs to the network performance prediction are: • GNNs have been successfully applied to combinatorial optimization problems [36], and can achieve relational reasoning [37].• A graph representation can better capture the relationship of the nodes in a network, i.e., there is a one-to-one mapping between the network topology and the graph representation.
In that sense, RouteNet [15] is a GNN model that can directly learn from graphlike data, including network topologies, routing configurations, and offered traffic, to predict network KPIs.In recent research, other GNN solutions have been applied to solve autonomous network management [38], network slicing monitoring [39], network slicing control [40], or SDN end-to-end delay prediction [41].However, GNNs, as ML algorithms, are sensitive to the mismatch between training data and testing data, i.e., training and testing data do not come from the same distribution.Therefore, one of the main challenges in using GNNs for predicting network performance is to improve the out-of-distribution generalization, understood as the difference in accuracy between training and testing data [42].
Several works already attempt to improve the generalization capabilities of GNNs.The authors of [43,44] presented PARANA, a GNN model that extracts path and link-level features related to Queuing Theory (QT).The model trains two message-passing procedures.The first procedure focused on learning features from larger networks, while the second focused on learning features from smaller networks.The output is the average of both procedures.SOFGNN team [45] averaged two models and used data augmentation and feature engineering to solve the out-ofdistribution problem.To obtain a higher amount of samples for the training dataset, data augmentation is applied, creating new samples similar to the ones found in the training sets.Regarding feature engineering, they created a new feature called link load (%), defined as the sum of the traffic of all flows traversing the link and divided by link capacity.The average result of two trained models was used, one using as features the square value of link load, and the other model including both the square and cube values of link load.Finally, the hyperparameters were fine-tuned.
65 Page 8 of 36 BNN-UPC also proposed a solution to improve RouteNet's out-of-distribution generalization [46].The proposed model implements an extra message-passing procedure and introduced a scaling factor combined with the data augmentation to estimate the delay values of higher-capacity links.Unfortunately, this model required a long time to train, in part due to applying only feature selection and not applying feature engineering to improve the model generalization.
Table 1 provides a summary of the mentioned recent methods for using ML, DL, and GNNs for network modeling.Many works have tried to address the network performance prediction problem.Among the most used solutions, GNNs are the ML models that are best suited to the graph representation seen in networks.Nevertheless, GNNs suffer an out-of-distribution generalization problem and some works have tried to resolve it.As shown in the next sections, the main differences in our model are fast training, high accuracy, the usage of a simple and single model, and the public availability of the code and the datasets.

Background
Text, social networks, photographs, molecules, and computer networks are just a few examples of daily life whose interactions can be represented as graphs.Recently, GNNs [13], a scope of deep learning, have been used to solve various issues with graphs as an input [48,49].In a GNN, a graph is defined as G = (V, E) , with V representing the set of nodes and E representing the set of edges.The edges represent the interactions between nodes.Each node has an associated set of features represented by a one-dimensional vector.The vector's length is the number of input features of the node.Edge features are also represented in a one-dimensional vector whose length is the number of input features of the edge.
Like other neural networks, a GNN is a layered architecture whose one principle is the message-passing.The idea is that neighboring nodes affect the behavior of a concurrent node.Then, a message-passing method is applied to predict the target values based on neighbor nodes.Firstly, the information of the neighbor nodes is collected, then combined, and finally, the target node information is updated.These three steps are message transformation, aggregation, and update.Message transformation functions define how each node, as well as its neighbors and edges, produce messages.The aggregation function determines how messages are grouped using the vertices' maximum, average, or element-wise sum.Simpler aggregations are possible inside a GNN, but more sophisticated aggregations, like Long Short-Term Memory (LSTM) [50], are also possible.The updating function describes how messages are updated in the target node.Figure 2 represents a basic GNN, where the colored nodes include their features in the vectors.The target A node values are predicted in this case.The transformation phase to each message is applied first, then the aggregation function, and finally, the values update for the target node.When one of the attributes of a specific node is missing, the pooling method is used to obtain information from it [51].
A sequential neural network is typically used in the readout phase to compute the output for the predictions.Depending on the prediction needs, the readout functions  RouteNet was used as a baseline model for this work.RouteNet is a Recurrent Neural Network (RNN) model whose principal function is to predict per-sourcedestination network KPIs given a specific network configuration.An RNN GNN exchanges information of the graph and saves the states to the memory cell of each recurrently until convergence.In particular, as shown in Fig. 3, RouteNet learns the network model from the data collected over different network topologies, such as their traffic and routing configurations, which can later be used to predict their performance.The resulting model captures the complex relationships between the properties of links and the source-destination paths in topologies.
The representation of the graph that will feed the GNN is the first issue to be addressed in its design.Edge prediction is a limitation of current state-of-the-art GNNs, despite GNNs being able naturally to support them.Then, the input graph has to be adapted to read and predict link features.The input graph of RouteNet converts links into nodes, creating a hyper-graph for each sample, where two node types exist: path and link.The main working principle of RouteNet is the differentiation of the data between the path level (e.g., end-to-end delay, end-to-end packet loss) and the link level (e.g., link delay, link utilization).This information is encoded in learnable vectors.Based on this assumption, the message-passing procedure used by RouteNet [15] follows the principles: 1.The state of a path depends on the state of all the links on the path.Fig. 2 GNN basic structure with message transformation, aggregation and update [52] Fig. 3 RouteNet model inputs and output [15] 2. The state of a link depends on the state of all the paths that traverse the link.
The learnable vectors are used to execute a message-passing procedure (with a determined number of rounds) to collect messages from all the links and paths.This message-passing procedure, combined with the usage of RNNs, makes the GNN architecture capable of inferring path or link-level metrics.In RouteNet, each RNN's hidden state represents a function with the information of the path or link.These RNNs have modifiable hyperparameters to adapt to the specific use case.
In this paper, we use Mean Average Percentage Error (MAPE) as the prediction error.The MAPE is the average value of the relative errors, in percentage, as shown in Eq. ( 1).The relative error is calculated by the difference between the predicted value ŷl and the real value y i , divided by y i and obtaining the result in absolute value.The lower the MAPE value, the better the predictions.
Despite the efforts to enhance RouteNet, the mismatch in accuracy between the training and testing data is still significant, which indicates room for improvement.For instance, when trained with small networks and then used to predict results for larger networks, RouteNet shows a very high error level in the per-path average delay, in terms of MAPE (cf.Sect 7).Therefore, the BNN-UPC proposed a problem statement in the context of the ITU AI/ML for 5G Challenge 2021 [53].The challenge's goal was to create a scalable network model by tackling a fundamental limitation of existing GNNs: the lack of generalization capabilities to larger network topologies.Participants in the challenge needed to modify the RouteNet baseline or create a new GNN-based model.Such a model should be trained with a dataset containing small network topologies and later validated with larger ones, maintaining a low prediction error.This paper shows the step-by-step approach to outperform RouteNet by leveraging networking concepts and analysis.

Dataset
The dataset for training and testing the GNN model was provided by BNN-UPC [54], and was generated with OMNeT++, a packet-level network simulator.The model receives the following inputs per each sample in the dataset: 1. Topology: graph including node and link-level properties (e.g., nodes, links, queue sizes, link capacity).2. Flow performance: flow level and aggregate source-destination measurements (e.g., dropped packets, average delay).3. Flow traffic: flow level and aggregate source-destination time and size distributions used to generate traffic (e.g., average bandwidth, packets generated, average packet size).4. Routing: paths connecting source-destination pairs. (1) 65 Page 12 of 36 5. Link performance: metrics of output ports (e.g., utilization and losses, average port occupancy).
The dataset also includes a preset split into training, validation, and testing as follows: • Training: small networks between 25 and 50 nodes.
• Validation: larger networks from 50 to 300 nodes to analyze the ability of the models to generalize.• Test: equivalent characteristics to the validation dataset.
Moreover, the testing dataset is subdivided into three subsets called settings, having different features each: 1. Setting 1 ( S 1 ).Longer paths.Focused on the artificially generated longer paths regarding the training dataset.However, the link capacities have the same value ranges as in the training dataset.Some source-destination pairs do not transmit traffic.2. Setting 2 ( S 2 ).Increased link capacity.Focus on the larger link capacity with respect to the training dataset.All the source-destination pairs transmit traffic using shortest-path routing.Moreover, there is higher aggregated traffic and higher capacity links than in the training dataset.3. Setting 3 ( S 3 ).Both S 1 and S 2 properties are mixed.All source-destination pairs transmit traffic using longer paths with higher capacity links than the training dataset.
Dataset class balancing is relevant to train, validate and test the model with all the cases that it might have to predict [55].In addition, the number of samples predicted later has also to be balanced, to check the accuracy stability in different model use cases.In the following subsections, the training, validation, and test datasets are analyzed, in order to check the feature distributions and the dataset balancing.

Training Dataset Analysis
The training dataset consists of approximately 173 million path samples.The graph size represents the number of nodes for each network.As evidenced in Fig. 4, each graph size is directly related to the number of paths (i.e., the number of samples to train the model).The paths on bigger graphs are proportionally longer.The paths are mainly of lengths 3 and 4, as illustrated in Fig. 5.The traffic values in bits per second are balanced through the entire dataset for each graph size and path length, as displayed in Fig. 6 and Fig. 7. Packets per second are correlated with traffic in bits per second, as they depend directly on constant packet size.Other path features values, including packets dropped and the predictor variable, delay, are more dispersed in terms of value distributions.The selected link features of utilization, maximum queue occupancy, and offered traffic intensity ( O t ), a variable created in Sect.6.4, are also correctly balanced in the dataset, as shown in Fig. 8.

Validation Dataset Analysis
The validation dataset contains approximately 52 million path samples and follows the structure of the training dataset.The main difference is the increased number of graph nodes, ranging from 50 to 300 nodes, to test the generalization of the trained  9, the paths on bigger graphs are proportionally longer.The paths mainly have a length between 3 and 10, as shown in Fig. 10.
Also similarly to the training dataset, the values of traffic in bits per second are also balanced through all the dataset for each graph size and path length as detailed in Fig. 11 and Fig. 12.In the case of longer paths, a larger deviation in the values can be observed.In terms of the analyzed link features, utilization, maximum queue occupancy, and O t , the validation dataset is also correctly balanced, as shown in Fig. 13.

Testing Dataset Analysis
The testing dataset consists of approximately 26 million path samples and follows the structure of the train and validation datasets.As in validation, it includes the increased number of graph nodes, from 50 to 300 nodes.The main difference, in this case, is the partition in the three different subsets, S 1 , S 2 , and S 3 .For S 2 and S 3 , the paths on the bigger graphs are proportionally longer, as shown in Fig. 16 and  The paths mainly have a length between 9 and 11 in S 1 , between 3 and 6 in S 2 and between 6 and 10 in S 3 , as shown in Fig. 14.However, S 1 has significantly fewer path samples than the other settings.None of the datasets were modified, as the testing and validation datasets were already balanced.The subsets S 2 and S 3 of the testing dataset were also found to be balanced.On the other hand, the S 1 test dataset had very few samples and with higher deviation.These features were related to the link properties, including link utilization, maximum queue occupancy, and O t , as shown in Fig. 18.These fea- tures were out of the training and validation distributions, mainly in the graph samples with a higher number of nodes.The cause of this deviation was that not all the source-destination paths transmitted traffic, only the particularly long ones.These characteristics lead to slightly higher errors in the S 1 predictions, as described in Sect.7, compared to S 2 and S 3 .However, these errors were specific to this small subset designed for particular testing purposes (i.e., longer path testing with lower traffic).

Improving RouteNet for Generalization
In this section, the improvements to the RouteNet baseline implemented in Ten-sorFlow [47] are described step by step, including the improvement achieved.Our work follows step-by-step testing of individual modifications and measuring the enhancement of each applied change, one after the other.

Inferring Per-Path Delay from Predicted Queue Occupancy (GAIN-1 Solution)
Since the validation and test datasets contain topologies with higher link capacities, the delay values in larger topologies are lower than in smaller topologies.This implies that the model is trained with data following a particular data distribution but has to predict values that follow another distribution, which is problematic for NNs [56].
The initial idea to solve this problem consisted in finding an indirect metric that keeps a similar distribution among all the datasets and can be used, followed by a post-processing step, to predict the metric of path delay with low error.Based on this idea, our first approach was to use the occupancy of a link ( Q o ) as an indirect metric, representing the average utilization of a queue, as it encapsulates local relationships between offered traffic, queue size, and link capacity.The first step, titled GAIN-1, was to test the improvement of the original RouteNet implementation, replacing the initial prediction of delay with the prediction of the indirect metric Q o and inferring the path delay from that prediction.
After predicting Q o , we estimate each flow's delay by adding the queue delays belonging to a path in a post-processing step.Let's assume a path with three nodes, as depicted in Fig. 19.When the packets are sent from the src node, they are queued in Q 1 accumulating a delay.Depending on the queuing policy (assumed to be the same in each queue), the queue occupancy, and the link capacity, the packets will suffer more or less delay.If a link has more capacity, the waiting time of a packet in the queue will be lower.Once the packets are sent to the second node using Link 1 , they are queued in Q 2 , adding another delay.Finally, the packets are sent through Link 2 to the dst node.Therefore, the delay of a source- destination flow can be obtained as the sum of the delays in every flow link, as shown in Eq. (2).
Each queue delay of this flow was calculated using Eq. ( 3), where Q o is the occupancy of the queue, Q s is the queue size in number of packets, A p is the aver- age packet size in number of bits, and C o is the capacity of the outgoing link of the queue.The link delay and flow delay formulations are presented below.The post-processing code was optimized using the appropriate TensorFlow data structures to increase the processing speed and reduce code complexity.The resulting MAPE of the GAIN-1 solution was 44.73 %.

Normalization of Predictor Features (GAIN-2 Solution)
After analyzing the training and validation datasets, it was observed that the data distributions of the scalar features used in the model (detailed in Table 3) were different for training, compared to the validation and test subsets, as shown in Table 2. Specifically, the traffic of a flow ( T f ) and the capacity of a link ( C l ) had different value ranges for each dataset and between the training and validation/test datasets, respectively.
Therefore, in GAIN-2, a min-max normalization was applied as a pre-processing in the transformation function, using the training dataset min-max values.The maxmin normalization was calculated using the formula in Eq. ( 4).Normalized value N v (2)  The result of the GAIN-2 solution was a 28.739 % MAPE.

Feature Selection (GAIN-3 Solution)
The features included in the baseline were T f and C l , as described in Table 3.After a correlation test between the features available and the expected delay results in the datasets, packets ( P f ) revealed a high correlation to path delay.Furthermore, this feature had value ranges very similar in the three datasets, enhancing generalization.Consequently, it was added to the GAIN-3 path state, also including the min-max normalization preprocessing.The MAPE of the GAIN-3 solution was 18.471 %.

Offered Traffic Intensity (GAIN-4 Solution)
When approaching the total link capacity, increasing the total T f in a link directly impacts the queue delay by increasing it too.Figure 20 shows an example where a source src sends traffic to dst.If the sum of T f of each flow in the link is close to or exceeds C l , the Q 1 delay will increase.
(4) The dependence between T f demand in a link and the queuing delay [57], leads to the creation of a new feature named offered traffic intensity ( O t ), as shown in Eq. ( 5).It was defined as the sum of T f of all flows f passing through the net- work link N l divided by the bandwidth C l of that link, resulting in a scalar feature assigned to the link state.
Finally, only the feature O t was used in GAIN-4.For that reason, C l did not require normalization, as it was removed from the model, and the original values were used to calculate O t .The resulting MAPE of the GAIN-4 solution was 2.612 %.

Hyperparameters Optimization (GAIN-5 Solution)
The hyperparameters, i.e., number of neurons for the path-state and link-state, readout units, message-passing iterations, and epochs, were optimized for our model, testing different combined values using a grid search, defining and combining values for each parameter, as detailed in Table 4.For each combination of hyperparameters, a new training was performed followed by a check of the MAPE result.
One of the effects of optimizing the hyperparameters is the reduction of the training epochs, as the model stabilizes the validation results much earlier than the original 100 epochs.For reference, the best model achieved the best result after seven epochs.An early stopping epoch limiter is a technique that stops the training after a number of epochs where the model does not improve its results.The number of epochs monitored to do the early stopping is called patience.It was used in the GAIN-4 and GAIN-5 solutions, with a patience of 5 epochs, to speed up the hyperparameter tests.
After hyperparameter tuning, the GAIN-5 model was trained during seven epochs.We found that reducing the dimensions of the link state and the path state and increasing the readout units along the iterations of the message passing procedure significantly improved the results from 2.612 % to 1.838 %, the best result of all the GAIN solutions, also improving for the different subsets.
Another interesting effect of keeping hyperparameters and epochs at a low value is the reduced Graphics Processing Unit (GPU) power consumption, temperature, compute, and memory utilization as exposed in Sect.7.

Results
This section reports the results obtained following the steps described in the previous section.The GAIN-2 solution was the one presented to the challenge, achieving a 28.739% MAPE.GAIN-3, 4, and 5 are the results of additional improvements after the competition.In addition, a dataset with real traces of traffic is included for further evaluation.

Additional Models for Evaluation
For further analysis and comparison, two models are added to the predictions phase.Apart from the RouteNet baseline and the GAIN-5, two other delay prediction models are used: RouteNet-Erlang [58] and Queuinx [59].
RouteNet-Erlang also uses RouteNet as a baseline, applying improvements to increase the prediction accuracy in larger networks ( ≈10x), longer paths, and larger link capacities, using queue status to improve the prediction accuracy.The model is trained with the same dataset used with the baseline and GAIN-5.
Queuinx is a model that uses queuing theory, being focused on finite queues, using a fixed point algorithm.This model is used to validate the RouteNet [15] results in comparison to queuing theory computations.Both models are used to predict the test dataset and are included in the analysis of the results.

Models Evaluation with Test Dataset
The gradual improvement in each step is shown in Table 5, where the RouteNet baseline is shown to have a bad generalization for larger graphs (as an example, S 1 , S 2 , S 3 with 300 nodes) resulting in a MAPE of 187.28% for the full testing data- set.This is mainly caused by the out-of-distribution values when predicting in the validation and testing dataset, especially in S 2 and S 3 .The MAPE was calculated separately for each column and row in Table 5.The GAIN solutions gradually reduced MAPE, applying improvements step by step as described in Sect.6, using the three settings with the graph sizes 50 and 300 as a reference.From the RouteNet Baseline and GAIN-1 solution, which was the first change over the baseline, until the GAIN-5 solution, we achieved an improvement by a factor of 24 (from 44.73 % to 1.838%) and by a factor of 101 (from 187.28% to 1.828%) lower MAPE in all settings, respectively.The improvement between the baseline and GAIN-5 can be compared in Fig. 21, where the subsets S 1 , S 2 and S 3 are represented.The absolute error is represented in a logarithmic scale for comparison reasons.For GAIN-5, the error is 32x smaller than in the baseline.GAIN-5 is compared to the baseline in terms of graph size and path length, two features of a graph that scale up when the sample is bigger, as shown in Sect. 5.It can be observed that in setting S 2 and S 3 , the tendency of the prediction error is to become lower when the graph increases the number of nodes, improving the generalization compared to the baseline.
As introduced in the test analysis in Sect.5.3, the S 1 subset had a lower num- ber of samples including features with high deviation compared to S 2 and S 3 .These characteristics lead to increased errors in the S 1 predictions, both in baseline and GAIN-5 implementations, but still much lower in GAIN-5.The small datasets make it hard to rigorously evaluate GNNs, as they are data-hungry models [60], requiring significant amounts of data to check the correctness of the prediction accuracy.This is the main problem in analyzing the results of S 1 , where the high deviation paired with a low amount of samples resulted in more variability in the accuracy of the predictions.On the other hand, the predictions on S 2 with higher capacity links and on S 3 , with long paths mixed with higher capacity links, resulted in a low overall error for all the graph sizes, becoming lower as the graph was bigger.As shown on the error scales, the absolute error variability range on the GAIN-5 solution on all settings is much smaller than in the baseline results.
Also in Fig. 21 the prediction results for the Queuinx and RouteNet-Erlang models are shown.Queuinx reveals similar results compared to the baseline.It can be concluded that both models exhibit a higher error than GAIN-5, with RouteNet-Erlang being better than the challenge baseline in every setting.Comparing the errors linked to the path lengths in Fig. 22, it can be observed that with longer paths, the error deviation is larger.This is due to the low amount of samples and their particularity in the S 1 of the dataset.In Fig. 23 this problem is shown to have a small impact on the solution precision.On the left, all settings are mixed to obtain the average absolute error per each graph size.GAIN-5 is shown to have a lower error on all settings mixed in comparison to the other tested models, lowering the error as Fig. 23 Baseline, Queuinx, RouteNet-Erlang and GAIN-5 all settings, logarithmic absolute errors compared to path length Fig. 24 GAIN-5 absolute error on S 1 compared to path length the graph size increased.On the right, all settings compared to the path length are checked, where a higher error with the out-of-distribution samples is observed, as a small number of samples were above 15 hops.
In a further error analysis to confirm the generalization of the model, heat maps for the three settings were generated.In Fig. 24, S 1 was found to scale slightly worse than the other settings as detected in the previous analysis, especially with the graphs of 200 and 220 nodes.Inversely, Fig. 25 and Fig. 26 show the generalization of the model, keeping the errors low and even lowering the absolute errors as the networks become bigger.

Models Evaluation with Real Traffic Traces Dataset
To validate the accuracy of the tested models, a real traffic dataset is also used.This real traces dataset [61] is generated using real-world traffic matrices and realistic packet inter-arrival times.In Fig. 27 the MAPE results for each graph and model are shown.For ABILENE and GEANT networks, the results are similar, with RouteNet-Erlang achieving the best results, followed closely by GAIN-5.Meanwhile, testing the models with the GBN network, which contains smaller links and lower traffic, the results are clearly better using GAIN-5, which demonstrates a better generalization maintaining a lower overall error on the three scenarios.

Resource Utilization
Concerning the GPU usage, it can be observed that in the GAIN-4 solution is ∼ 4% while in the baseline it is ∼19%, with the added benefit of a much faster training.One of the causes of this reduction is the better convergence of the model, as illustrated in Fig. 28.In GAIN-5 the GPU consumption is higher than in GAIN-4, because of the higher hyperparameters, but GPU consumption is still kept under 15%.The hardware of the testbed used for the training was composed of two GPUs Nvidia GeForce GTX 1070 with 8 GB GDDR5 memory, paired with an AMD Ryzen 5 5600X CPU, 32 GB of DDR4 RAM, and a 500 GB M.2 SSD.The time to train and test the GAIN-5 model was about 3 h 30 min, 2 h 20 min for training, which is a significant reduction compared to the original 12 h training of the baseline.
Finally, to compare GAIN-5 with other solutions, Table 6 provides a summary of the best different approaches and MAPE results.The uniqueness of our solution (GAIN-5) [63] is the preservation of the architecture and data of the original RouteNet model, as the message passing procedure was not modified and data augmentation was not used, resulting in an implementation with lower overhead than in the other solutions.Additionally, our approach did not use the average of multiple models to maintain the simplicity of the baseline.The use of the new feature offered traffic intensity, designed from the knowledge of traditional queuing theory, significantly improved the accuracy of the predictions.GAIN-5 demonstrates that it is possible to obtain a low MAPE utilizing a simpler approach, just improving the original RouteNet model, and avoiding the usage of multiple tuned models.This simplicity allows fast training and easier management of the model, enhancing its usage in new environments by just retraining and deploying it.No other solution in the BNN-UPC challenge has fast training and good precision without using multiple models.The fine-tuning of GNN hyperparameters set smaller in our solution, allowed lower training and prediction times, and less memory and power usage, thus making our solution more suitable in case of frequent retraining requirements.

Conclusions and Future Work
The network models based on GNNs are being suggested as a good approach for building an autonomous 5G and B5G networks, as they can both properly forecast network KPIs and easily be modified and updated to account for the current networks rising complexity and dynamism.Out-of-the-box GNNs are unable to generalize and scale to larger topologies, which causes their accuracy to decline.In this paper, we introduced GAIN, a GNN-based model that accurately forecasts the average per-path network delay.The good generalization of the model, tested with heterogeneous datasets, enables it for predicting per-path delay in many scenarios, thereby proving it useful for managing 5G and B5G services.
In order to achieve good generalization, GAIN builds on RouteNet's GNN and makes a number of additions.These additions include the inference of per-path delay from the predicted link queue occupancy, feature normalization, feature selection, feature engineering (offered traffic intensity), and hyperparameter optimization.Additionally, compared to other solutions currently available, GAIN has less implementation complexity, lower resource requirements, and faster training.
A closer analysis, filtering, and preparation of the datasets is key for achieving improved results in a ML approach.In this case, the training and validation datasets were well generated for the training task.The testing dataset had particularities discovered during the test phase, where a closer analysis helped to know the cause of the slightly higher errors in a specific case.
As part of our ongoing research, we intend to examine additional GNN architectures, frameworks, and performance improvements.We also intend to use these developments in the suggested or another closed-loop control context, where GNN predictions will be used by decision-making algorithms (such as traffic admission control) deployed in 5G and B5G networks.
65 Page 32 of 36 material.If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.To view a copy of this licence, visit http:// creat iveco mmons.org/ licen ses/ by/4.0/.

Fig. 1
Fig. 1 Closed-loop control context for network optimization, represented by the bold arrows

65
Page 10 of 36 can be modified as classifiers or regression functions.The output of the readout function converts the values into the desired prediction for each node.

Fig. 4 Fig. 5 Fig. 6
Fig. 4 Number of samples per graph size, and proportion of path lengths per graph size in training dataset

Fig. 11 Fig. 12 Fig. 13
Fig. 11 Average path bandwidth distribution per graph size in validation dataset

Fig. 16
Fig. 16 Number of samples per graph size, and proportion of path lengths per graph size in test dataset, S 2

Fig. 17 3 65
Fig. 17 Number of samples per graph size, and proportion of path lengths per graph size in test dataset, S 3

Fig. 18
Fig. 18 Maximum queue occupancy distribution per each graph size in S 1

GAIN 4, 5 65
Page 22 of 36 equals the original value V subtracting the minimum value of the feature for all the training dataset V min , and dividing the result by the difference between the maximum V max and V min of the training dataset.The TensorFlow functions were used to find the maximum and minimum values on the training dataset's tensor, which efficiently reduce the tensor to one dimension to return the desired value.

Fig. 19
Fig. 19 Queue occupancy example to infer delay

Fig. 25
Fig. 25 GAIN-5 absolute error on S 2 compared to path length

Table 1
Comparison of our work and other contributions focusing on GNN

Table 3
Features used in the GAIN solutions

Table 4
Values tested for optimization of the hyper-parameters

Table 5
Test dataset MAPE (%) results and performance for baseline and each GAIN solution

Table 6
Comparison of the best solutions for the BNN-UPC challenge