1 Introduction

Network statistics and measurements have been at the core of network management since its inception. How to model and retrieve this data from network devices in an open way has been a driving force behind the evolution of network management standards.

Network statistics have traditionally been used for managing the network layer and have driven tasks like network provisioning, routing, and fault detection. In this work, we use network-level, local statistics to estimate service-level, end-to-end metrics. We give examples and show scenarios where this can be achieved with an accuracy that matches methods which rely on both network and backend statistics.

We believe that our results are significant, for two reasons. First, they include mappings from low-level device statistics, such as packet rates, onto higher-level application statistics like frame rates or response times, which is hard to do with traditional network engineering techniques. Second, while application statistics depend strongly on the configuration and the state of the backend system that runs the service [1, 2], we compute mappings without considering backend statistics. We thus conclude that end-to-end application statistics must be “encoded” in network device statistics in the scenarios we investigated.

Our approach to mapping network device statistics to service-level metrics is based on statistical, supervised learning, whereby the mapping function is learned from observations, i.e., through monitoring the system.

Our approach enables end-to-end performance prediction without requiring an explicit model of the system. The method is different from the traditional engineering approach to performance prediction, which is based on stochastic modeling and simulation. Traditional engineering requires detailed knowledge of the architecture of the system and of its components, as well as their functionality and interactions. The reason we advocate statistical learning for end-to-end performance estimation is that a stochastic model would become too complex for many application scenarios and thus would be infeasible to develop and apply in practice.

The work reported in this paper is largely experimental, and its findings have been obtained from testbed measurements. We have deployed two services, namely an HTTP Video-on-Demand (VoD) service with VLC [3] clients and Voldemort [4], a Key-Value store (KV), on a cluster and measured service-level QoS metrics on clients which access these services over an OpenFlow network. We have built generators that load the testbed with service requests and cross traffic, and we have devised a framework for collecting device statistics from servers and switches during test runs. The collected traces are then used as input to train models that estimate service-level QoS metrics from infrastructure statistics.

The main contributions of this paper are:

  • We demonstrate through experimentation that service-level metrics can be learned from network device statistics using standard machine-learning methods. In our case, the metrics are KPIs from two services, streaming video and KV store. The device statistics are low-level, coarse-grained metrics from the OpenFlow switches of the network that enables communication between a server cluster and a client;

  • We show that the set of network statistics needed for estimation can be reduced using feature reduction techniques, which decreases the overhead both for data collection and model computation. It turns out that features (i.e., metrics) which score high with univariate feature selection tend to lie on the network path between the server cluster that runs the service and the client for which the prediction is made.

Preliminary results of this work appeared in [5]. This paper includes a revised presentation of the problem, firmer results, and additional experiments involving cross traffic. All model computation and evaluations reported in this paper have been performed with Python libraries instead of R, which was used in [5]. Further, we include a naïve learning method as a baseline to assess the effectiveness of the approach. Also, we apply an additional feature reduction method, which is computationally efficient and allows us to rank features.

A word on terms we will be using throughout the paper. We apply terminology from network management and machine learning. The expression “we estimate service metrics from device measurements” translates into a phrase like “we predict target variables from features” in machine learning. Therefore, with “we estimate a metric” we mean the same as “we predict a metric” in the machine-learning sense. (Note also that, in machine learning terminology, “prediction” normally does not refer to a future time.) Furthermore, we use the terms “application” and “service” in the same sense.

The remainder of this paper is organized as follows. Section 2 formulates the learning problem and discusses the machine learning methods used in this work. Section 3 describes the infrastructure statistics and the service-level metrics collected for QoS estimation, and it explains how traces are generated. Section 4 details the testbed and load generation. Section 5 describes the experiments, the model computation process, and the evaluation results. Section 6 provides an assessment of the experimental results. Section 7 surveys related work. Finally, Sect. 8 presents conclusions and future work.

2 End-to-End Estimation of Service Metrics as a Learning Problem

Figure 1 outlines the system under investigation. It is composed of a backend in the form of a server cluster, a network, and a client population. Clients access the services running on the cluster through the network. The services we are considering in this work are streaming video and KV store. Statistics collected from the infrastructure are used to train models for end-to-end metrics estimation.

Fig. 1 Learning service-level metrics Y from network and cluster statistics X

We are interested in how the infrastructure statistics X relate to the service-level metrics Y on the client side. The infrastructure statistics X include measurements from the network and from the server cluster. The performance indicators Y on the client side refer to service-level metrics, for example, frame rate and response time. Details regarding the composition of X and Y are given in Sect. 3.

The metrics X and Y evolve over time, influenced, e.g., by the load on the servers, operating-system dynamics, network traffic, and the number of active clients. Assuming a global clock that can be read on the machines in the server cluster, in the OpenFlow network controller, and in the clients, we model the evolution of the metrics X and Y as time series \(\{X_t\}\), \(\{Y_t\}\), and \(\{(X_t, Y_t)\}\).

Our objective is to estimate the service-level metric \(Y_{t}\) at time t on a client, based on knowing the infrastructure statistics \(X_{t}\). Using the framework of statistical learning, the problem is finding a model \(M: X_t \rightarrow \hat{Y}_t\), such that \(\hat{Y}_t\) closely approximates \(Y_t\) for a given \(X_t\). This is a regression problem, which we solve through supervised learning [6].

We apply two machine learning methods in this work: regression tree and random forest. The regression tree method recursively partitions the space of the input statistics (the feature space) into regions \(R_1, R_2,\ldots , R_M\), computing region boundaries with the objective of minimizing the residual sum of squares (RSS). For a given X, the metric Y is estimated as \(\hat{Y} = \frac{1}{|R_k|} \sum _{i \in R_k} Y_i\), where \(R_k\) is the region that X falls into, \(|R_k|\) is the number of training samples in \(R_k\), and i indexes the samples in \(R_k\). The regions are constructed using a greedy algorithm, whereby each construction step of a selected region identifies a feature and a threshold that fulfill the optimization criterion [7]. This method has a computational complexity of \(O(m^2n)\), where m is the number of samples and n is the number of features.

Random forest is an ensemble method: each estimated value of Y is an average of estimations from several regression trees [8]. Each of these trees is constructed from a random fraction of the training data, and each construction step uses a randomized, reduced feature set [7]. This method has a computational complexity of \(O(Tm^2n)\), where T is the number of trees in the forest.

As a baseline for the machine learning methods, we use a naïve method that relies on Y values only: for each \(x \in X\), it predicts a constant value \(\bar{y}\), which is simply the average of the samples \(y_t\) in the training set.
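
To make the estimators concrete, the following minimal sketch contrasts a regression tree with the naïve baseline; the feature matrix and target are synthetic stand-ins, not testbed traces.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.dummy import DummyRegressor

rng = np.random.RandomState(0)
X = rng.rand(1000, 4)                        # stand-in feature matrix
y = 30 * X[:, 0] + rng.normal(size=1000)    # stand-in service metric

# regression tree: recursive, RSS-minimizing partition of the feature space
tree = DecisionTreeRegressor().fit(X, y)

# naive baseline: predicts the sample mean of y for every input
naive = DummyRegressor(strategy='mean').fit(X, y)

print(tree.predict(X[:3]))   # region averages
print(naive.predict(X[:3]))  # the constant value y-bar
```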

To investigate to what extent a reduced, automatically selected set of input statistics can achieve accurate estimations, we apply feature selection techniques from machine learning. Computing a subset of features that minimizes the estimation error requires the evaluation of \(2^n\) subsets for n features and is thus infeasible for large n. For this reason, heuristic selection methods have been developed. One such method we use in this work is forward-stepwise-selection [7]. Starting from an empty feature set, the method incrementally grows the feature set by including, in each iteration, the new feature that minimizes the estimation error. The process stops when including an additional feature does not further decrease the estimation error. The method requires the evaluation of \(O(n^2)\) subsets of the full feature set. The second selection method we use is univariate-feature-selection, which computes the cross correlation between regressor and target for each feature, sorts the features according to these values, and selects the top k features as the reduced feature set. This method requires the evaluation of n subsets, each containing a single feature, and thus has lower computational complexity than forward-stepwise-selection. In this work, we refer to the feature set resulting from forward-stepwise-selection as the ‘minimal’ feature set.
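
The following sketch outlines forward-stepwise-selection as described above. It is one possible realization: the error criterion here is cross-validated mean squared error, whereas any held-out estimation error can be substituted.

```python
import numpy as np
from sklearn.model_selection import cross_val_score

def forward_stepwise(X, y, estimator):
    """Greedily grow a feature set until the error stops decreasing.

    X is an (m, n) array; returns the selected column indices.
    """
    selected, best_err = [], np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        # evaluate every candidate extension of the current feature set
        trials = []
        for j in remaining:
            err = -cross_val_score(estimator, X[:, selected + [j]], y,
                                   scoring='neg_mean_squared_error').mean()
            trials.append((err, j))
        err, j = min(trials)
        if err >= best_err:   # stop: no additional feature lowers the error
            break
        best_err = err
        selected.append(j)
        remaining.remove(j)
    return selected
```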

3 Infrastructure Statistics and Service-Level Metrics

This section describes the statistics of the input feature set \(X = X_{cluster} \cup X_{port}\) and its subsets. We refer to X also as the full feature set. Further, the section explains the specific service-level metrics \(Y_{VoD}\) and \(Y_{KV}\) and how traces for model computation are generated.

The \(X_{cluster}\) feature set is extracted from the kernel of the Linux operating system that runs on the servers executing the applications. The kernel gives applications access to resources, such as CPU, memory, and network, and it schedules requests to those resources. To access the kernel data structures, we use System Activity Report (SAR), a popular open-source Linux library [9]. Accessing kernel data through procfs [10], SAR computes various system statistics over a configurable interval; examples are CPU core utilization, memory utilization, and disk I/O. \(X_{cluster}\) includes only numeric features from SAR, about 1700 statistics per server.
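
As an illustration of the kind of procfs-derived statistic that SAR reports (this sketch reads /proc/stat directly and is not SAR itself), aggregate CPU utilization over a one-second interval can be computed as follows.

```python
import time

def cpu_counters():
    # first line of /proc/stat: cumulative jiffies spent in user, nice,
    # system, idle, iowait, irq, softirq, ... since boot
    with open('/proc/stat') as f:
        return [int(v) for v in f.readline().split()[1:]]

def cpu_utilization(interval=1.0):
    before = cpu_counters()
    time.sleep(interval)
    after = cpu_counters()
    deltas = [b - a for a, b in zip(before, after)]
    idle = deltas[3] + deltas[4]   # idle + iowait
    return 100.0 * (sum(deltas) - idle) / sum(deltas)

print('%.1f%% CPU utilization' % cpu_utilization())
```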

The \(X_{port}\) feature set is extracted from the OpenFlow switches at per port granularity level. It includes statistics from all switches in the network. We implemented a monitoring module in an OpenFlow controller, using standard OpenFlow statistic request and statistic reply messages [11] to periodically collect statistics regarding: (1) Total number of Bytes Transmitted per port; (2) Total number of Bytes Received per port; (3) Total number of Packets Transmitted per port; and (4) Total number of Packets Received per port.
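
Our monitoring module is built on Floodlight (Java, see Sect. 4.1). As an equivalent illustration in Python, the following sketch uses the Ryu controller framework to poll the same four per-port counters every second via standard OpenFlow port-statistics request/reply messages.

```python
from operator import attrgetter

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import (MAIN_DISPATCHER, DEAD_DISPATCHER,
                                    set_ev_cls)
from ryu.lib import hub
from ryu.ofproto import ofproto_v1_3


class PortStatsMonitor(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    def __init__(self, *args, **kwargs):
        super(PortStatsMonitor, self).__init__(*args, **kwargs)
        self.datapaths = {}
        self.monitor_thread = hub.spawn(self._monitor)

    @set_ev_cls(ofp_event.EventOFPStateChange,
                [MAIN_DISPATCHER, DEAD_DISPATCHER])
    def _state_change(self, ev):
        # keep track of connected switches
        dp = ev.datapath
        if ev.state == MAIN_DISPATCHER:
            self.datapaths[dp.id] = dp
        elif ev.state == DEAD_DISPATCHER:
            self.datapaths.pop(dp.id, None)

    def _monitor(self):
        # send an OpenFlow port-statistics request to every switch
        while True:
            for dp in self.datapaths.values():
                parser = dp.ofproto_parser
                dp.send_msg(parser.OFPPortStatsRequest(
                    dp, 0, dp.ofproto.OFPP_ANY))
            hub.sleep(1)   # one-second polling, matching the trace resolution

    @set_ev_cls(ofp_event.EventOFPPortStatsReply, MAIN_DISPATCHER)
    def _stats_reply(self, ev):
        for s in sorted(ev.msg.body, key=attrgetter('port_no')):
            # the four per-port counters used as X_port features
            print(ev.msg.datapath.id, s.port_no,
                  s.rx_bytes, s.tx_bytes, s.rx_packets, s.tx_packets)
```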

The \(X_{path}\) feature set is a subset of \(X_{port}\) containing only statistics from ports on the path between the server cluster and the client. During the experiments, the path is composed of 12 ports, which results in a feature set of 48 statistics.

The \(Y_{VoD}\) service-level metrics: for the VoD application, we chose the VLC media player software [3], which provides single-representation streaming with varying frame rate. The service-level metrics we are considering are measured on the client device. During an experiment, we capture the following metrics: (1) Display Frame Rate (frames/sec), i.e., the number of displayed video frames per second; (2) Audio Buffer Rate (buffers/sec), i.e., the number of played audio buffers per second. These metrics are not directly measured, but computed from VLC events like the display of a video frame at the client’s display unit. We have instrumented the VLC software to capture these events and log the metrics every second.

The \(Y_{KV}\) service-level metrics: for the KV store, we chose the Voldemort software [4]. The service-level metrics we are considering are measured on the client device. During an experiment, we capture the following metrics: (1) Read Response Time as the average read latency for obtaining responses over a set of operations performed per second; (2) Write Response Time as the average write latency for obtaining responses over a set of operations performed per second. These metrics are computed using a benchmark tool of Voldemort. The read and write operations follow the request-and-reply paradigm, which allows for tracking the latency of individual operations. We have instrumented the benchmark tool to log the metrics every second.

Generating the traces: the collected statistics for \(X_{cluster}\) and \(X_{port}\) are stored in csv files. A csv file contains m rows of n features. Each row represents one observation and has a timestamp t indicating when the statistics were measured. Collected service-level metrics for \(Y_{VoD}\) and \(Y_{KV}\) are also stored in csv files together with the observation time t. During experiments, X and Y statistics are collected every second on the testbed. For each application running on the testbed, the data collection framework produces a time series \(\{(X_t, Y_t)\}\). We interpret this time series as a set of samples \(\{(X_{1}, Y_{1}),\ldots , (X_{m}, Y_{m})\}\). Assuming that each sample \((X_{t}, Y_{t})\) in the set is drawn uniformly at random from a joint distribution (XY), we obtain models using methods from statistical learning.
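
A minimal sketch of how such traces can be assembled into samples is given below; the file names are hypothetical stand-ins, and each csv file is assumed to be indexed by the shared one-second timestamp.

```python
import pandas as pd

# hypothetical trace file names; one row per one-second observation
x_cluster = pd.read_csv('x_cluster.csv', index_col='timestamp')
x_port = pd.read_csv('x_port.csv', index_col='timestamp')
y_vod = pd.read_csv('y_vod.csv', index_col='timestamp')

# an inner join on the shared timestamps yields the sample set
# {(X_1, Y_1), ..., (X_m, Y_m)} used for model computation
samples = x_cluster.join(x_port, how='inner').join(y_vod, how='inner')
X = samples.drop(y_vod.columns, axis=1)
Y = samples[y_vod.columns]
```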

4 The Testbed

The testbed is deployed on a server rack in our laboratory at KTH. It includes ten high-performance machines interconnected by Gigabit Ethernet. Nine of them are Dell PowerEdge R715 2U servers, each with 64 GB RAM, two 12-core AMD Opteron processors, a 500 GB hard disk, and four 1 Gb/s network interfaces. The tenth machine is a Dell PowerEdge R630 2U machine with 256 GB RAM, two 12-core Intel Xeon E5-2680 processors, two 1.2 TB hard disks, and twelve 1 Gb/s network interfaces. All machines run 64-bit Ubuntu Server 14.04, and their clocks are synchronized through NTP [12].

The VoD application is deployed on six PowerEdge R715 machines: one HTTP load balancer, three web server and transcoding machines, and two network file storage machines. The load balancer runs HAProxy version 1.4.24 [13]. Each web server and transcoding machine runs Apache version 2.4.7 [14] and ffmpeg version 0.8.16 [15]. The network file storage machines run GlusterFS version 3.5.2 [16] and are populated with the ten most-viewed YouTube videos of 2013, whose lengths range from 33 s to 5 min. The VoD client is deployed on another PowerEdge R715 machine and runs VLC [3] version 2.1.6 over HTTP.

The Voldemort KV store runs on the same machines as the VoD application. Six of them act as KV store nodes in a peer-to-peer fashion, running Voldemort version 1.10.22 [4]. The store is first populated with 10 million unique keys, selected uniformly at random from a 32-bit key-space. The size of the stored values is 1024 bytes, the default for Voldemort. Each key-value pair is stored on three machines in the cluster. Consistent hashing is used to identify these machines. The KV client runs the Voldemort benchmark tool and uses the same machine as the VoD client.
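
For illustration, the sketch below implements a toy consistent-hashing ring in the spirit of Voldemort-style replica placement; it is not Voldemort's actual partitioning scheme, which differs in detail. Each key maps to the three distinct nodes found clockwise from its hash position on the ring.

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Toy consistent-hashing ring; not Voldemort's actual scheme."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2 ** 32

    def nodes_for(self, key):
        # walk clockwise from the key's position on the ring and
        # collect the first `replicas` distinct nodes
        points = [h for h, _ in self.ring]
        i = bisect_right(points, self._hash(key)) % len(self.ring)
        chosen = []
        while len(chosen) < self.replicas:
            node = self.ring[i][1]
            if node not in chosen:
                chosen.append(node)
            i = (i + 1) % len(self.ring)
        return chosen

ring = HashRing(['kv%d' % i for i in range(1, 7)])
print(ring.nodes_for('some-key'))  # three of the six KV store nodes
```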

By deploying VoD and the KV store on the same machines, the testbed allows for experiments with either a single application running or with both applications running concurrently. Additional details about the VoD and KV application setup and configuration can be found in [2, 17, 18].

4.1 Emulated OpenFlow Network

The OpenFlow network, including switches and controller, is virtualized on the PowerEdge R630 machine described above. We use VirtualBox as the hypervisor. Each OpenFlow switch and the OpenFlow controller run in separate virtual machines (VMs). The VMs run Ubuntu 14.04 and have 10 GB of disk space. The switch VMs have one core and 4 GB RAM; the controller VM has four cores and 8 GB RAM. We use 18 of the 24 physical cores available on this machine to run the VMs. We track the CPU steal time in all VMs to detect competition among them for access to physical cores. In all experiments we conducted, the observed CPU steal time was zero, which means that each VM had access to an entire physical core during an experiment.

Figure 2 shows the configuration of the OpenFlow network deployed on the PowerEdge R630 machine. It represents a three-tier network with border switches (SWB1...SWB4), aggregation switches (SWA1...SWA6), and core switches (SWC1...SWC4).

Fig. 2 Connectivity of the OpenFlow network testbed and the location of the client, load generators, cross-traffic generators, and application servers

An OpenFlow switch is emulated using Open vSwitch 2.3.2 (OVS), an open-source software switch for virtualized server environments [19]. Such a switch can forward traffic between different VMs on the same physical machine over a physical network. Open vSwitch supports standard management interfaces (e.g. sFlow, NetFlow, CLI) and is open to programmatic extension and control through the OpenFlow protocol.

The links between OpenFlow switches in Fig. 2 are layer-2 Ethernet links, emulated through the VirtualBox hypervisor and configured for 1 Gb/s. We use the netem package to control the communication delay between the switches [20]. The links between the physical machine emulating the OpenFlow network on the one side and the server cluster, client machine, load generator machines, and cross-traffic machines on the other side are physical.

The OpenFlow controller is implemented using the Floodlight 1.0 package [21], extended with the monitoring module we developed for collecting the \(X_{port}\) feature set. The network topology has 14 switches with a total of 44 ports, which are periodically polled. The OpenFlow controller maintains a connection to each OpenFlow switch using the layer-2 Ethernet links of VirtualBox, some of which appear in Fig. 2 as dashed lines.

Besides the OpenFlow network, Fig. 2 shows four other components of our testbed. First, the server cluster, which runs the VoD and KV applications. Second, the client component, which issues service requests and for which the service-level metrics are estimated. Third, three load generators, which emulate client communities and generate requests from different network locations. Load generator 1 runs on one PowerEdge R715 machine; load generators 2 and 3 run in VMs and share another PowerEdge R715. These two VMs are each configured with four cores, 8 GB RAM, and 10 GB disk space. Each application has its own client and load generators, as described in Sect. 4.2. Fourth, two VMs are dedicated to cross-traffic generation. Both run on the same machine as load generators 2 and 3 and share their configuration.

The routing topology is set up in such a way that different levels of traffic aggregation occur during experiments. All routes are bidirectional. The traffic between client and load generator 1 on the one side and the server cluster on the other side follows the path SWB3, SWA5, SWC4, SWC1, SWA1, and SWB1; the traffic between load generator 2 and the server cluster follows the path SWB2, SWA3, SWC2, SWA2, and SWB1; the traffic between the load generator 3 and the server cluster follows the path SWB4, SWA6, SWC1, SWA1, and SWB1; and the traffic between the cross traffic load generator and the cross traffic KV server (see below) follows the path SWB4, SWA6, SWC1, SWA1, SWB1, SWA2, SWC2, SWA3, and SWB2.

4.2 Generating Load Patterns

We built two types of load generators, one for the VoD application and another for the KV store. The VoD load generator dynamically controls the number of active VoD sessions, spawning and terminating VLC clients. The KV load generator controls the rate of KV operations issued per second. Both generators produce load according to two distinct load patterns.

The Periodic-load pattern: The load generator produces requests following a Poisson process whose arrival rate is modulated by a sinusoidal function, with start value \(P_{S}\), amplitude \(P_{A}\), and a period of 60 min.

Flashcrowd-load pattern: The load generator produces requests following a Poisson process whose arrival rate is controlled by the flashcrowd model described in [22]. The arrival rate starts at value \(F_{S}\) and peaks at flash events. \(F_{E}\) such events occur within an hour, distributed uniformly at random over this time period. At each flash event, the arrival rate increases within a minute to a peak value of \(F_{R}\), stays at this level for one minute, and then decreases to the initial rate within four minutes.
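
The exact modulation formula is not given above; under the natural reading that \(P_S\) is the start value and \(P_A\) the amplitude of the sinusoid, the periodic-load pattern can be sketched as follows (the parameter values are hypothetical). The flashcrowd pattern can be generated the same way by swapping in a rate function that implements the ramp-up/ramp-down profile of [22].

```python
import math
import numpy as np

def periodic_rate(t, p_s, p_a, period=3600.0):
    # sinusoidally modulated arrival rate (requests/s) at time t seconds;
    # the rate starts at p_s and oscillates with amplitude p_a
    return p_s + p_a * math.sin(2 * math.pi * t / period)

rng = np.random.RandomState(0)
# number of requests issued in each second of a one-hour run
arrivals = [rng.poisson(periodic_rate(t, p_s=20, p_a=10))
            for t in range(3600)]
```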

Table 1 gives the configurations of the load generators for the experiments reported in Sect. 5.

Table 1 Configuration parameters of VoD and KV load generators

To produce cross traffic, we use an instance of the KV load generator, together with a single KV server which runs in a VM outside the server cluster (see Fig. 2). This KV server is populated with 100K unique keys, selected uniformly at random from a 32-bit key space. The size of the stored values is 1024 bytes, the default for Voldemort. We use the flashcrowd-load pattern for generating cross traffic.

Table 2 shows the configurations of the cross-traffic load generator for the experiments reported in Sect. 5. The cross traffic is produced by a load generator attached to SWB4 (see Fig. 2). As explained above, it is realized as a KV request generator that sends a request stream towards a KV server that is attached to SWB2. The responses of the KV server are routed on the same path as the request stream, but in reverse direction. The KV load generator with parameters (200, 10, 500) produces a highly dynamic traffic pattern on the network. The same generator with parameters (500, 10, 1000) creates a slowly varying, almost constant traffic pattern, because the higher request load saturates the KV server.

Note that the two components, KV request generator and KV server, run on machines outside the server cluster and are independent of the services running on the cluster. The sole purpose of these two components is to produce interfering network traffic.

Figure 3 shows the utilization of the link from switch SWC1 towards switch SWA1 with and without cross traffic. The link carries VoD requests from the client and from the load generators 1 and 3 towards the server cluster. Figure 3a shows link utilization under dynamic cross traffic, Fig. 3b shows link utilization under constant cross traffic.

Table 2 Configuration parameters of KV cross-traffic load generator
Fig. 3 Traffic through the output port of SWC1 towards SWA1 (see Fig. 2): a VoD traffic without cross traffic and VoD traffic with dynamic cross traffic; b VoD traffic without cross traffic and VoD traffic with constant cross traffic

5 Experiments, Model Computation and Evaluation Results

The experiments on the platform produce trace files with structure \(\{(X_{t}, Y_{t})\}\) (see Sect. 3). Using concepts and methods from statistical learning, we train and evaluate the model M that fits a particular trace. We apply the well-known validation-set approach: we (1) randomly assign each sample \((X_t, Y_t)\) of a trace to either a training set or a test set; (2) compute the model from the training set; and (3) evaluate it using the test set [7]. Following standard practice, the training set contains 70% of the samples, and the test set 30%.

We compute two metrics to evaluate the learned models. The first metric is the Normalized Mean Absolute Error (NMAE), which is computed as \(\frac{1}{\bar{y}} \left( \frac{1}{m} \sum ^m_{i=1}|y_{i} - \hat{y}_i| \right)\), where \(\hat{y}_i\) is the model estimate for the measured service-level metric \(y_i\), \(\bar{y}\) is the mean of the samples \(y_i\), and m is the size of the test set. The second metric is the training time, which measures the time it takes to train a model on the training set. The models are computed on a PowerEdge R630 machine, using Python 2.7.6 and scikit-learn version 0.18.1, specifically the libraries RandomForestRegressor (with 120 trees) and DecisionTreeRegressor.
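
Putting the pieces together, the following sketch shows the validation-set evaluation, assuming the X and Y data frames from the trace-loading sketch in Sect. 3; the target column name is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def nmae(y_true, y_pred):
    # mean absolute error, normalized by the mean of the measured samples
    return np.mean(np.abs(y_true - y_pred)) / np.mean(y_true)

# 70/30 split into training and test set
X_train, X_test, y_train, y_test = train_test_split(
    X.values, Y['display_frame_rate'].values,
    test_size=0.3, random_state=0)

model = RandomForestRegressor(n_estimators=120, n_jobs=-1)
model.fit(X_train, y_train)
print('NMAE: %.3f' % nmae(y_test, model.predict(X_test)))
```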

All results below are from experiments with a running time of 12 h, which resulted in \(12\times 3600\) samples per run. Figure 4 shows two time windows of 4000 s each from such runs. The blue points indicate measurements, the red points model estimates. (Some points have been omitted for better visibility.)

Fig. 4 Measurements and estimates from testbed experiments. a Display Frame Rate for the VoD service under periodic load pattern. b Read Response Time for the KV service under periodic load pattern

In the following, we explain how an experimental run using the VoD application is performed on the testbed. At the beginning of a VoD experiment, the VLC client sends a VoD session request for playing a video to the HTTP load balancer machine. After the video has finished playing, the VLC client sends a new request for another video to the HTTP load balancer. Also, at the beginning of the run, the load generators start sending VoD session requests to the HTTP load balancer, according to the specified load pattern. Upon receiving a VoD session request from a client or a load generator, the HTTP load balancer forwards the request to the backend web server that has the fewest pending connections with the load balancer. After receiving a request, the web server spawns a transcoding instance for a selected video, whereby the raw video content is retrieved over the network from one of the network file storage machines.

At the beginning of an experiment with the KV application, the KV client starts sending requests, at the rate of 100 per second, to the server cluster, using randomly selected keys from the set of 10 million keys (see Sect. 4). At the same time, the KV load generators start sending requests to the server cluster at a varying rate, according to the specified load pattern. The client and load generators produce a ratio of 80% read to 20% write requests following a Bernoulli process.

For all the experiments, the communication delays in the emulated OpenFlow network are controlled as follows. The netem package is configured to introduce a delay per interface that is normally distributed with an average of 4 ms and a variance of 0.1 ms. Therefore, the end-to-end round-trip delay between client and server cluster is 50 ms on average. The end-to-end round-trip delay between the cross-traffic load generator and the cross-traffic KV server is 75 ms on average.

From the extensive set of experiments we performed on the testbed, we select eight which illustrate our key findings. The results of these experiments are described in the following subsections. Two experimental runs were performed using only the VoD application on the testbed, driven by the periodic and flashcrowd load patterns; two runs were performed using only the KV application, for both load patterns; two runs were performed using both the VoD and the KV applications running concurrently and independently on the testbed, for both load patterns; and, finally, two runs were performed using the VoD application under periodic load, together with cross traffic with both dynamic and constant load conditions. The traces of these eight experiments are available at a public data repository [23].

5.1 Estimating Service-Level Metrics Using the Full Feature Set X

We perform model computations using the traces collected during the eight experimental runs described above. All computations in this section are based on the full feature set \(X = X_{cluster} \cup X_{port}\) (see Sect. 3). Table 3 shows the evaluation results for the VoD application, Table 4 shows the evaluation results for the KV application, and Table 5 shows the evaluation results for the VoD application under cross traffic. The results displayed in these tables are consistent with measurement results from our earlier work that is based exclusively on \(X_{cluster}\) statistics [1, 2, 18].

Table 3 Estimation error and training time for VoD application using the full feature set X for model computation
Table 4 Estimation error and training time for KV application using the full feature set X for model computation
Table 5 Estimation error and training time for VoD application under periodic-load and cross traffic using the full feature set X for model computation

Based on the results in Tables 3, 4 and 5, we make the following observations. First, as expected, the random forest method consistently outperforms regression tree in terms of estimation accuracy for service-level metrics, across both load patterns and both applications. This is because random forest is an ensemble method that averages over a large set of regression trees. In exchange, the regression tree method allows for faster model computation than random forest. Even though the computation for random forest includes 120 single-tree computations in our configuration, the RandomForestRegressor library from scikit-learn takes advantage of the multi-core platform of the PowerEdge R630 machine and achieves an almost ideal speedup. This explains why the model computation time for random forest is only about three times longer than that for regression tree.

Second, running a single application allows for more accurate estimation, across regression methods and load patterns, than running both applications concurrently on the testbed. We explain this by the fact that the models for both applications share many aggregate features, for instance, aggregate CPU utilization, aggregate memory utilization, and aggregate OpenFlow statistics. The implication is that the dynamics of one application influence many features used for the model computation of the second application, which we can interpret as a source of noise, since both applications run independently. We generally observe an increase in estimation error when running both applications, compared to running a single application. This increase depends on the service-level metric and the load pattern, and it can reach 10% NMAE.

Third, service-level metric estimations for the KV application tend to be significantly more accurate than for the VoD application. Two factors seem to cause this difference. Regarding architecture and functionality, the KV application is much less complex than the VoD application, and estimation models for the KV application should thus be easier to learn. Furthermore, the Display Frame Rate and Audio Buffer Rate follow multi-modal distributions, while the Read and Write Response Times exhibit unimodal distributions. Figure 5 shows the distribution of the service-level metrics for runs with single applications on the testbed, driven by a periodic load pattern. Multi-modal distributions tend to have larger NMAE values than unimodal distributions.

Fig. 5 Histograms of service-level metrics for runs of the VoD application (a, b) and the KV application (c, d) under the periodic load pattern

Fourth, running VoD under the periodic-load pattern and cross traffic shows that the error increases slightly under dynamic cross traffic (\(1.3\%\) NMAE), while it decreases somewhat under constant cross traffic (\(1.5\%\) NMAE).

Overall, these experiments give evidence that it is possible, in many cases, to accurately estimate service-level metrics end-to-end, over a networking infrastructure, even under cross traffic.

5.2 Estimating Service-Level Metrics Using the Network Feature Set \(X_{port}\)

The purpose of this evaluation is to determine the estimation accuracy of the model computed from the \(X_{port}\) feature set (see Sect. 3), which includes only network statistics and has 176 features from 44 ports. The network enables the communication between the server cluster on the one side and the client and load generators on the other side, for the VoD and KV applications. In some experiments, it also carries cross traffic.

Tables 6 and 7 show the evaluation results for the scenarios described at the beginning of this section. To highlight the main results, we include in the tables only figures for the Display Frame Rate and Read Response Time, and for the random forest method. The other measurements are consistent with those reported in Sect. 5.1.

Compared to the results for the full feature set X (see Tables 3, 4 and 5), the estimation error increases slightly, between 0.2 and \(2\%\) NMAE, while the training time is reduced by an order of magnitude. Under cross traffic, the error for the \(X_{port}\) feature set is slightly higher than the error for the full feature set X; the increase is similar to the figures without cross traffic. We observe the same qualitative behavior as for the full feature set X: the estimation accuracy gets slightly worse under dynamic cross-traffic load and slightly better under constant load.

Table 6 Estimation error and training time for display frame rate and read response time using the \(X_{port}\) feature set, which includes all network statistics
Table 7 Estimation error and training time for display frame rate of VoD application under cross traffic using the \(X_{port}\) feature set

We draw the following conclusions from these experiments. First, we expected the estimation accuracy to decrease when moving from the full feature set X, which includes cluster and network features, to the network features \(X_{port}\) alone. What is surprising to us is that the accuracy achieved with the network features alone is very close to that of the full feature set X. In other words, it seems possible to learn end-to-end service-level metrics from network features alone.

Second, we see a significant reduction of model computation time, by a factor of more than 20, when comparing the time for the full feature set X to that for the network feature set \(X_{port}\). This can be expected, since X has more than 10K features while \(X_{port}\) has fewer than 200. Therefore, not only can we limit the features we need to network statistics, we also achieve a significant reduction in computational overhead.

5.3 Comparing Feature Reduction Techniques to Reduce the Network Feature Set \(X_{port}\)

The purpose of this evaluation is to determine whether we can further reduce the size of the network feature set \(X_{port}\) while maintaining similar levels of estimation accuracy. We apply two known feature selection methods on the traces from all eight reported experiments. The two methods are forward-stepwise-selection, which considers subsets of the network feature set, and univariate-feature-selection, which considers single features of the network feature set (see Sect. 2). In the following, we first discuss results obtained with forward-stepwise-selection, followed by results with univariate-feature-selection.

Forward-stepwise-selection takes as input a feature set and produces a ‘minimal’ subset. For our traces, these subsets have fewer than ten features, while the network feature set has 176 features. It turns out that these ‘minimal’ subsets are specific to the scenario and to the service metric.

Tables 8 and 9 show the evaluation results of the models computed with the subsets from forward-stepwise-selection. When comparing the results with those in Tables 6 and 7, we find that the difference in estimation error varies between 1.6 and \(3.2\%\) NMAE for Display Frame Rate and is almost zero for Read Response Time. As expected, the training times for the ‘minimal’ feature sets are significantly smaller than those for the network feature set \(X_{port}\). This time reduction has to be weighed against the cost of computing the ‘minimal’ subsets. For random forest, a ‘minimal’ subset computation takes between 900 and 1800 s.

Table 8 Estimation error and training time for display frame rate and read response time using the ‘minimal’ subsets created by forward-stepwise-selection
Table 9 Estimation error and training time for display frame rate using the ‘minimal’ subsets created by forward-stepwise-selection

The second method of feature selection is univariate-feature-selection. We apply the method to all network features, which gives a score to each feature and produces a ranked list of features. The top k features form the subset for model computation.
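
A sketch of how such top-k subsets can be formed and evaluated is given below, reusing the split and the nmae helper from the sketch at the beginning of this section; scikit-learn's f_regression scores each feature by its correlation with the target.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression

# rank features, train on the top-k subset, and record the NMAE
for k in (1, 2, 4, 8):
    selector = SelectKBest(f_regression, k=k).fit(X_train, y_train)
    model = RandomForestRegressor(n_estimators=120, n_jobs=-1)
    model.fit(selector.transform(X_train), y_train)
    err = nmae(y_test, model.predict(selector.transform(X_test)))
    print(k, round(err, 3))
```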

In order to compare the effectiveness of the two feature reduction methods, we compute the accuracy of model estimation for subsets of the same size; this means that k is below ten. Tables 10 and 11 show the results for univariate-feature-selection.

Table 10 Estimation error and training time for display frame rate and read response time using the ‘minimal’ subsets created by univariate-feature-selection
Table 11 Estimation error and training time for display frame rate using the ‘minimal’ subsets created by univariate-feature-selection

Comparing the two feature reduction techniques, we observe the following. Regarding estimation error, both methods produce similar accuracy levels, with univariate-feature-selection performing slightly better. When it comes to model training time, both techniques use feature sets of the same size and the same number of measurements; hence the training time is practically the same. However, the cost of computing the ‘minimal’ subsets differs widely: univariate-feature-selection takes around 10 s per subset, while forward-stepwise-selection takes 900–1800 s, two orders of magnitude longer.

From these experiments we conclude that univariate-feature-selection outperforms forward-stepwise-selection for our purpose, and therefore univariate-feature-selection is our method of choice.

5.4 Reducing the Network Feature Set \(X_{port}\) Using Univariate-Feature-Selection

The purpose of this evaluation is to assess the performance of univariate-feature-selection for subsets of different sizes of the network feature set \(X_{port}\).

Figures 6 and 7 show the evaluation results of the models computed with univariate-feature-selection for different subsets of size k. A subset of size k includes the top k features. The application running for the experiments is VoD, the service-level metric is Display Frame Rate, and the model computation method is random forest.

Fig. 6 Estimation error versus feature rank k. k indicates the size of the feature set, which comprises the top k features. Application is VoD; service metric is Display Frame Rate; experiments are without cross traffic; model computation uses random forest

Fig. 7 Estimation error versus feature rank k. k indicates the size of the feature set, which comprises the top k features. Application is VoD; service metric is Display Frame Rate; experiments are conducted under cross traffic; model computation uses random forest

Figure 6 shows the assessment for experiments without cross traffic. As expected, the estimation error decreases with increasing subset size k. For \(k=176\), the result is that of the network feature set \(X_{port}\) given in Table 6. For k larger than 100, the gain in accuracy becomes minimal. Consistent with earlier observations, single-application scenarios allow for more accurate estimation than two-application scenarios, and scenarios driven by flashcrowd load give better estimations than those driven by periodic load.

Figure 7 shows the effect of cross traffic on VoD under the periodic load pattern for Display Frame Rate. Again, the minimal k for the best NMAE values is around 100. Consistent with other results reported before, dynamic cross traffic seems to increase the estimation error, while constant cross traffic seems to decrease it.

The curves in Figs. 6 and 7 show steps with sizes as large as \(2\%\) NMAE. We speculate that this is due to high correlation among features with similar rank, a hypothesis that needs further investigation. In any case, the curves fall monotonically with increasing k, as expected.

The curves reveal the minimum number of features needed for the most accurate estimation in a particular scenario, for a given service-level metric and learning method. As a consequence, we conclude that the network feature set \(X_{port}\) can be further reduced without losing accuracy. Unfortunately, as pointed out earlier, this minimum number is scenario and metric dependent.

We performed the same evaluation for the KV application, with very similar results. One difference is that the step sizes of the curves tend to be smaller; another is that the minimal number of features k needed for accurate estimation is smaller as well.

In the context of feature selection, it is important to understand in which part of the network topology the top-ranked features can be found. A close inspection shows that, in scenarios where the response times of the KV store are estimated, the top 48 features all belong to switch ports along the path between the cluster and the client for which the estimation is performed. (Recall from Fig. 2 that the traffic between the client and the cluster traverses 12 ports, each of them contributing four features.) In the case of VoD, averaged over all scenarios we investigated, more than 50% of the top-ranked features lie on the path.

This observation motivates our approach to consider the features along the path for model computation. In other words, we reduce the feature set \(X_{port}\) with 176 features to the feature set \(X_{path}\) with 48 features, a reduction of about 73%.

5.5 Estimating Service-Level Metrics Using the \(X_{path}\) Feature Set

The purpose of this evaluation is to determine the estimation accuracy of the model using the network features along the path between the server cluster and the client (see Fig. 2).

Table 12 shows the evaluation results for VoD and KV scenarios without cross traffic; Table 13 shows the results with cross traffic. Compared to the findings with the network feature set \(X_{port}\) (Tables 6 and 7), the estimation accuracy is similar: the increase in NMAE is around \(1\%\) (at most \(2\%\)) for the Display Frame Rate, and much smaller for the Read Response Time. In addition, the training time is cut in half.

Table 12 Estimation error and training time for display frame rate and read response time using the 48 features along the path between server cluster and client
Table 13 Estimation error and training time for display frame rate of VoD application under cross traffic, using the 48 features along the path between server cluster and client

The question is whether the set of features along the path can be further reduced without sacrificing accuracy. Figure 8 provides the answer. We apply univariate-feature-selection on the feature set \(X_{path}\) and create 48 subsets containing between 1 and 48 features. The figure shows the estimation accuracy versus the feature rank k for features along the path with respect to the VoD service and Display Frame Rate. The experiments are without cross traffic and the evaluation method used is random forest.

The figure suggests that the feature set \(X_{path}\) can be cut in half without losing accuracy and that univariate-feature-selection is an effective method for doing so. However, as pointed out before, the minimal rank k as well as the identity of the features depend on the scenario, the service-level metric, and the model computation method.

Figure 8 also shows the curve for the network feature set \(X_{port}\). This curve corresponds to the segment \(k=1,\ldots,48\) of the corresponding curve in Fig. 6 and confirms that selecting features along the path is an important step in reducing the feature set.

Fig. 8 Estimation error versus feature rank k. k indicates the size of the feature set, which comprises the top k features on the path between server cluster and client. Application is VoD; service metric is Display Frame Rate; experiments are without cross traffic; model computation uses random forest

6 Assessment of Evaluation Results

In this section, we review some key measurement results and draw conclusions. We first assess the effectiveness of the applied learning methods against a naïve method that predicts \(\hat{Y}\) values as the sample mean of the Y values in the training set (see Sect. 2). This method thus predicts the same \(\hat{Y}\) value for all possible X values.

Tables 14, 15 and 16 show the results of the naïve method for all experiments reported in Sect. 5. As expected, the naïve method produces estimations with a larger error than what we achieve with the regression tree or random forest methods.

Table 14 Estimation error using the naïve method: VoD without cross traffic
Table 15 Estimation error using the naïve method: KV without cross traffic
Table 16 Estimation error using the naïve method: VoD under cross traffic

In order to focus the discussion, we restrict ourselves to the case of the VoD application, the estimation of Display Frame Rate, and random forest for model computation. Table 17 provides a summary of the measurement results presented in Sect. 5 and allows the comparison with the naïve method.

Table 17 Summary results: estimation error of display frame rate for VoD application; learning methods are random forest and naïve estimator

Comparing random forest with the naïve method gives a difference of 3.2–\(6.8\%\) NMAE for the full feature set X. For the feature set \(X_{path}\), the difference varies from 1.2 to \(4.4\%\) NMAE. As mentioned above, random forest estimations are always better than estimations with the naïve method. The relative superiority of the random forest method varies, though. In the case of running both applications simultaneously under the flashcrowd-load pattern, the difference is \(6.8\%\) NMAE (for the X feature set), which is half of the error of the naïve method. On the other hand, when running VoD under constant cross traffic, the difference is \(1.2\%\) NMAE (for the \(X_{path}\) feature set), which reduces the error of the naïve method by only about 10%. We conclude that our method significantly outperforms the naïve method in most cases.

Figure 9 uses data from Table 17 and illustrates the difference in NMAE between the learning method we used and the naïve method for all reported scenarios running the VoD application on our testbed. We observe that the difference in accuracy monotonically decreases as the models used for estimation are based on X, \(X_{port}\), and \(X_{path}\), respectively. One can see that, in the case of cross traffic, the estimation based on path features only is clearly less accurate than that relying on features from the entire network.

Fig. 9 Difference in accuracy between the naïve estimation method and random forest for different feature sets. The figure uses data from Table 17

The results in Table 17 show that reducing a feature set comes at the expense of estimation accuracy. However, the increase in estimation error is small and still supports our claim that service-level metrics can be estimated from network statistics alone. As demonstrated in Sect. 5.5, the feature set \(X_{path}\) can be further reduced using univariate-feature-selection without losing accuracy.

7 Related Work

Research close to this paper is presented in [24], where the authors discuss an approach for learning from a set of network-level metrics, including delay, loss, and jitter measurements, in order to estimate QoS metrics for IPTV streaming clients. The authors conclude that their estimation method is accurate as long as the packet loss ratio is low. In order to produce the statistics, they instrument their application on both sides and inject probes into the network, which add extra traffic and are themselves affected by packet loss. Because our method relies on local (network) statistics only, it is service independent and does not require active probing. If packet loss statistics were available at OpenFlow switches, our method would use them as input features for model computation.

While the features we adopt for model computation are service independent, other works propose methods that are engineered for a specific service and metric. Examples in the context of cloud and statistical learning are [25,26,27,28]. Other works like [29,30,31,32] use statistical learning models to estimate quality-of-experience metrics of a multimedia service. The authors in [33] describe a method that dynamically allocates run-time resources for MapReduce tasks under unbalanced data distribution. In particular, the work applies linear regression to predict partition sizes for reduce tasks and uses the predictions to guide run-time resource allocation.

Research that applies machine learning concepts in the context of OpenFlow networks is limited and often focuses on traffic classification. In [34] the authors present an architecture to collect traffic data from OpenFlow switches and use the collected data to classify such traffic as belonging to certain well-known applications. In [35] the author uses machine learning and OpenFlow in the design and implementation of a traffic classification system that accurately classifies traffic without affecting the latency or bandwidth in the data plane. In [36] the authors use machine learning algorithms to predict potential target host attacks based on historical network attack data for SDNs. In [37] the authors describe machine learning techniques to handle intrusions and Distributed Denial of Service (DDoS) attacks in SDNs. In [38] the author discusses different anomaly detection mechanisms for SDNs.

In the context of sensor networks, [39] predicts network-level metrics for network paths and performs feature selection in order to reduce the communication overhead when collecting node and link statistics.

8 Conclusions and Future Work

A key aspect of this work has been to reduce the feature set of infrastructure statistics that we use to estimate service-level metrics. We started out with the full feature set X of infrastructure statistics, which include device statistics from the server cluster and the network. We then reduced X to the set of network features \(X_{port}\) and, finally, to the set of network features along the path between server cluster and client, namely \(X_{path}\). From the point of view of machine learning, the reason for feature reduction is to shorten the model computation time and to decrease the number of observations needed for a statistically accurate model. From the point of view of network management, reducing the feature set as described above means (a) estimating service-level metrics based on network device statistics only and (b) further reducing the monitoring overhead by collecting measurements only along a network path.

We demonstrated through experimentation that our method of estimating service-level metrics using statistical learning is effective for two different applications, across two different load patterns and for the case of interfering cross traffic. The achieved accuracy differs between the scenarios, as Table 17 shows. However, across all scenarios we investigated, a clear pattern has emerged. For instance, predicting Read Response Times consistently achieves higher accuracy than predicting Display Frame Rates. The same is true for predicting metrics for a scenario involving a flashcrowd-load pattern versus one involving a periodic-load pattern, for a scenario with a single application versus one with two concurrent applications, for a prediction using random forest versus one using regression tree, and, finally, for a prediction using either of those two learning methods versus one using the naïve method.

The network in this work has been emulated, and the results must thus be confirmed in a real network environment with OpenFlow hardware switches. In addition, the question arises whether our approach is suitable for prediction in a virtualized network or an environment built with NFV (Network Function Virtualization) technology. We believe this to be the case if streams of statistics from the lower layers of such systems can be extracted and processed. Our belief is based partly on the results of a study that investigated the influence of a virtualization layer between operating system and application on the mapping from operating-system statistics to application-layer metrics [40]. The study used Docker containers and measured metrics prediction on a server cluster. It found a minor degradation in prediction accuracy compared to a bare-metal environment without a virtualization layer. Interestingly, it showed that if generic container metrics are included in the feature set, the degradation becomes negligible.

For our study, we deliberately selected two applications with very different characteristics regarding service logic, system architecture, and resource requirements for serving requests. Since the features (i.e., device statistics) we use for prediction are not application-specific, our method is, in principle, applicable to a wide range of applications and network services. For instance, we chose for our experiments the VLC media player software, which provides single-representation streaming with varying frame rate. More advanced, rate-adaptive streaming solutions (e.g., DASH [41]) can be considered similarly, in which case the predicted metrics will relate to the level of video quality, such as frame resolution or bitrate.

Statistical learning has been shown to be an effective approach to estimating quality-of-experience metrics of multimedia services (see, e.g., [29, 30, 42]). We anticipate that the path taken in this paper is applicable to quality of experience and that such metrics can be learned from network device statistics. So far, we have restricted ourselves to application-level QoS metrics that can be measured in a straightforward manner. Extending this work towards quality of experience is an option worth pursuing, given the emphasis of many technologies on improving user satisfaction and experience.

We foresee our future work along several directions. We plan to develop adaptive learning methods that allow for model updates as new measurements become available and thus are suited for real-time, adaptive estimation. One of the challenges will be to dynamically adapt the feature set, i.e., the set of monitored network ports, to achieve a target level of accuracy with minimal overhead. Further, we see the need for a more fundamental understanding of the best methods and the achievable accuracy for predicting high-level end-to-end metrics from low-level local statistics. Questions that arise in the context of this paper relate to the type of service-level metrics that can be accurately predicted; the location of the data that must be collected and the collection rate; the strategies for mitigating the influence of cross traffic and other services that share the infrastructure on the prediction process; and the opportunities provided by new networking paradigms like softwarization and in-network computation for building a real-time learning infrastructure.