AutoSAC: automatic scaling and admission control of forwarding graphs

Millnert, Victor; Bini, Enrico; Eker, Johan

doi:10.1007/s12243-017-0597-0

Open access
Published: 03 August 2017

Volume 73, pages 193–204, (2018)

Annals of Telecommunications

Abstract

There is a strong industrial drive to use cloud computing technologies and concepts for providing timing sensitive services in the networking domain since it would provide the means to share the physical resources among multiple users and thus increase the elasticity and reduce the costs. In this work, we develop a mathematical model for user-stateless virtual network functions forming a forwarding graph. The model captures uncertainties of the performance of these virtual resources as well as the time-overhead needed to instantiate them. The model is used to derive a service controller for horizontal scaling of the virtual resources as well as an admission controller that guarantees that packets exiting the forwarding graph meet their end-to-end deadline. The Automatic Service and Admission Controller (AutoSAC) developed in this work uses feedback and feedforward making it robust against uncertainties of the underlying infrastructure. Also, it has a fast reaction time to changes in the input.

1 Introduction

Over the last years, cloud computing has swiftly transformed the IT infrastructure landscape, leading to large cost-savings for deployment of a wide range of IT applications. Physical resources such as compute nodes, storage nodes, and network fabrics are shared among tenants through the use of virtual resources. This makes it possible to dynamically change the amount of resources allocated to a tenant, for example as a function of workload or cost. Initially, the cloud technology was mostly used for IT applications, e.g., web servers, databases, etc., but has now found its way into new domains. One of these domains is the processing of packets by a chain of network functions.

In this work, we are considering a chain of network functions through which packets are flowing. Every packet must be processed by each function in the chain within some specific end-to-end deadline. The goal is to ensure that as many packets as possible meet their deadline, while at the same time using as few resources as possible.

The goal is thus to derive a method for controlling the amount of resources allocated to each network function in the chain. Previously, this was usually done by statically allocating some amount of resources to each network function. Since the input is time-varying (see Fig. 1 for a trace of traffic flowing through a switch in the Swedish university network, SUNET), such a strategy usually leads to over-allocation of resources for long periods of time (yielding high costs and environmental footprint) as well as overload for shorter periods, when the input is large. To ensure that at least some packets meet their deadlines when the network function is overloaded, one has to use admission control, i.e., reject some packets.

Recently, a new option became available through the advances of virtualization technology for networking services. The standardization body ETSI (European Telecommunications Standards Institute) addresses the standardization of these virtual network services under the name Network Functions Virtualization (NFV) [1]. These Virtual Network Functions (VNFs) consist of virtual resources, such as virtual machines (VMs), containers, or even processes running in the OS. Using such VNFs, it is possible to change the resources allocated to a network function by either vertical scaling (i.e., changing the capacity of the allocated VMs) or horizontal scaling (i.e., changing the number of allocated VMs). Horizontal scaling is considered in this work. These VNFs are connected in a graph topology (commonly called a Forwarding Graph), as illustrated in Fig. 2. In this figure, there are two forwarding graphs (corresponding to the blue and red arrows). The blue forwarding graph consists of VNF₁, VNF₂, VNF₃, and VNF₅ and the red forwarding graph consists of VNF₁, VNF₂, VNF₄, and VNF₅. Each VNF is given a number $m_{i} \in \mathbb {Z}^{+}$ of VMs, which are mapped onto the network function virtual infrastructure.

While the benefit of using NFV technologies is scalability and resource sharing, there are two drawbacks as follows:

a) Starting a new virtual resource takes time, since it has to be deployed to a physical server and it requires the execution of several initialization scripts and push/pulls before it is ready to serve packets,
b) The true performance of the virtual resource differs from the expected performance, since one does not know what else is running on the physical machines [2].

In this work, we

develop a model of a service-chain of network functions and use it to derive a service-controller and admission-controller for the network functions,
derive a service-controller controlling the number of virtual resources (e.g., VMs or containers) allocated to each network function by using feedback from the true performance of the instances as well as feedforward between the network functions,
derive an admission-controller that is aware of the actions of the service-controller which it uses in order to reject as few packets as possible,
evaluate the service and admission controller using a real-world traffic trace from the Swedish University Network (SUNET).

1.1 Related works

There are a number of works considering the problem of controlling virtual resources within data centers, and specifically for virtual network functions. However, many of them focus on orchestration, i.e., how the virtual resources should be mapped onto the physical hardware. Shen et al. [3] develop a management framework, vConductor, for realizing end-to-end virtual network services. In [4], Moens and De Turck develop a formal model for resource allocation of virtual network functions. A slightly different approach is taken by Mehraghdam et al. [5] where they define a model for formalizing the chaining of forwarding graphs using a context-free language. They solve the mapping of the forwarding graphs onto the hardware by posing it as a MIQCP.

Scaling of virtual network functions is however studied by Mao et al. [6] where they develop a mechanism for auto-scaling resources in order to meet some user specified performance goal. Recently, Wang et al. [7] developed a fast online algorithm for scaling and provisioning VNFs in a data center. However, they are not considering timing-sensitive applications with deadlines for the packets moving through the chain, which is done by Li et al. [8] where they present a design and implementation of NFV-RT that aims at controlling NFVs with soft real-time guarantees, allowing packets to have deadlines.

The enforcement of an end-to-end deadline for a sequence of jobs is however addressed by several works, possibly under different terminologies. Di Natale and Stankovic [9] propose to split the E2E deadline proportionally to the local computation time or to divide equally the slack time. Later, Jiang [10] used time slices to decouple the schedulability analysis of each node, reducing the complexity of the analysis. Such an approach improves the robustness of the schedule, and allows each pipeline to be analyzed in isolation. Serreli et al. [11, 12] proposed to assign local deadlines to minimize a linear upper bound of the resulting local demand bound functions. More recently, Hong et al. [13] formulated the local deadline assignment problem as a MILP with the goal of maximizing the slack time.

An alternate analysis was proposed by Jayachandran and Abdelzaher [14], who developed several transformations to reduce the analysis of a distributed system to the single processor case. In [15], Henriksson et al. proposed a feedforward/feedback controller to adjust the processing speed to match a given delay target.

2 Modeling the service-chain

In this section, we present a general model of the forwarding graph and virtual network functions presented in Section 1. We consider a service-chain consisting of n functions F_1,…,F_n, as illustrated in Fig. 3. Packets are flowing through the service-chain and they must be processed by each function in the chain within some end-to-end deadline. A fluid model is used to approximate the packet flow and at time t there are $r_{i}(t)\in \mathbb {R}^{+}$ packets per second (pps) entering the ith function. In a recent benchmarking study, it was shown that a typical virtual machine can process around 0.1–2.8 million packets per second [16]. Hence, in this work, the number of packets flowing through the functions is assumed to be in the order of millions of packets per second, supporting the use of a fluid model.

A function consists of several parts, as illustrated in Fig. 4: an admission controller, a service controller, m_i(t) instances, a buffer, and a load balancer. It is assumed that all the parts of a function are located at the same location, e.g., the same data center or rack. In [17], Google showed that less than 1 μs of the latency in a data center was due to the propagation in the network fabric. Hence, communication delay within a function is neglected.

2.1 Admission controller

Every packet that enters the service-chain must be processed by all of the functions in the chain within a certain end-to-end (E2E) deadline, denoted D ^max. This deadline can be split into local deadlines D _i(t), one for each function in the chain, such that the packet should not spend more than D _i(t) time-units in the i th function. Should a packet miss its E2E deadline, it is considered useless. It is thus favorable to use admission control to drop packets that have a high probability of missing their deadline in order to make room for following packets. The goal of the admission controller is to guarantee that the packets that make it through the service-chain do meet their E2E deadline. It is assumed to be possible to do admission control at the entry of every function in the chain.

Packets are admitted into the buffer of function F _i based on the admittance flag α _i(t) ∈{0,1}. If α _i(t) = 1 incoming packets are admitted into the buffer, and if α _i(t) = 0 they are rejected. We define the residual rate ρ _i(t) to be the rate by which packets are admitted into the buffer:

$$ \rho_{i}(t) = r_{i}(t)\times \alpha_{i}(t). $$

(1)

2.2 Service controller

At any time instance, function F _i has $m_{i}(t) \in \mathbb {Z}^{+}$ instances up and running. Each instance is capable of processing packets and corresponds to a virtual machine, a container, or a process running in the OS. It is possible to control the number of running instances by sending a reference signal $m_{i}^{\text {ref}}(t) \in \mathbb {Z}^{+}$ to the service controller. However, as explained in Section 1, it takes some time to start/stop instances since an instantiation of the service is always needed. We denote this as the time overhead Δ_i. Hence, the number of instances running in the i’th function at time t is

$$ m_{i}(t) = m_{i}^{\text{ref}}(t-{\Delta}_{i}). $$

(2)

The time-overhead is assumed to be symmetric here, but in practice it is usually faster to start an instance than it is to stop one. However, for increased readability they are considered equal in this work. It should be noted that it is straightforward to extend the theory to account for an asymmetric time-overhead.

An instance is expected to be able to process packets at an expected service rate of ${\bar {s}_{i}}$ pps. However, as described in Section 1, the true capacity of the instance will differ from the expected one since there might be other loads running on the infrastructure (i.e., the physical machine). Hence, the true capacity of the j th instance in the i th function is given by

$$s^{\text{cap}}_{i,j}(t) = {\bar{s}_{i}} + \xi_{i,j}(t), $$

where ξ _i,j(t) is the machine uncertainty for the j th instance in the i th function. It is given by

$$\xi_{i,j}(t) \in [{{\xi}_{i}^{\text{lb}}},\, {{\xi}_{i}^{\text{ub}}} ] \text{ pps},\quad -{\bar{s}_{i}} < {{\xi}_{i}^{\text{lb}}} \leq {{\xi}_{i}^{\text{ub}}} < \infty, $$

where ${{\xi }_{i}^{\text {lb}}}$ and ${{\xi }_{i}^{\text {ub}}}$ are lower and upper bounds of this machine uncertainty, assumed to be known. The machine uncertainty is also assumed to be fairly constant during the lifetime of the instance. Using this, one can express the true capacity of the i th function in the service-chain as

$$ {s^{\text{cap}}_{i}}(t) = \sum\limits_{j=1}^{m_{i}(t)} \left({\bar{s}_{i}} + \xi_{i,j}(t)\right), $$

(3)

which together with the average machine uncertainty

$$ {\hat{\xi}_{i}}(t) = \frac{1}{m_{i}(t)}\sum\limits_{j=1}^{m_{i}(t)}\xi_{i,j}(t), $$

(4)

can be written as ${s^{\text {cap}}_{i}}(t) = m_{i}(t)\times ({\bar {s}_{i}} + {\hat {\xi }_{i}}(t))$. Note that it would be natural to allow the time-overhead Δ_i to also have some uncertainty. However, such uncertainty can be translated into a machine uncertainty.
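The capacity model of Eqs. 3 and 4 can be illustrated with a small sketch. The function names and the numerical values below are hypothetical and chosen only for illustration; they are not taken from the evaluation.

```python
from statistics import fmean


def average_uncertainty(xi):
    """Eq. 4: mean machine uncertainty over the running instances (pps)."""
    return fmean(xi)


def function_capacity(s_bar, xi):
    """Eq. 3: sum of the true per-instance capacities s_bar + xi_j (pps)."""
    return sum(s_bar + x for x in xi)


# Hypothetical example: three instances with a nominal rate of 1.0 Mpps each
# and per-instance uncertainties inside some assumed [xi_lb, xi_ub].
s_bar = 1.0e6
xi = [-0.2e6, 0.1e6, -0.05e6]

cap = function_capacity(s_bar, xi)
xi_hat = average_uncertainty(xi)

# The compact form m_i * (s_bar + xi_hat) gives the same capacity.
compact = len(xi) * (s_bar + xi_hat)
```

Both expressions evaluate to the same capacity, which is why the service controller only needs the average uncertainty rather than the per-instance values.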

2.3 Processing of packets

The packets in the buffer are stored and processed in a FIFO manner. Once a packet reaches the head of the queue the load balancer will distribute it to one of the instances in the function. Note that this is done continuously due to the fluid approximation. The rate by which the load balancer is distributing packets, and thus by which the function is processing packets, is defined as the service rate

$$ s_{i}(t) = \left\{\begin{array}{lll} \rho_{i}(t) \quad & \quad \text{if } q_{i}(t) = 0 \text{ and } \rho_{i}(t) \leq {s^{\text{cap}}_{i}}(t)\\ {s^{\text{cap}}_{i}}(t) \quad & \quad \text{else} \end{array}\right. $$

(5)

where ρ _i(t) is residual rate given by Eq. 1 and q _i(t) is the number of packets in the buffer:

$$ q_{i}(t) = P_{i}(t) - S_{i}(t), \quad q_{i}(t) \in \mathbb{R}^{+}, $$

(6)

where $P_{i}(t) = {{\int }_{0}^{t}} \rho _{i}(x) {\mathrm {d}x}$ is the total amount of packets that has been admitted into function F _i, and $S_{i}(t) = {{\int }_{0}^{t}}s_{i}(x) {\mathrm {d}x}$ is the total amount of packets that has been served by function F _i. Furthermore, the total amount of packets that has reached the i th function is given by $R_{i}(t) = {{\int }_{0}^{t}}r_{i}(x) {\mathrm {d}x}$.
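A minimal sketch of the buffer dynamics in Eqs. 1, 5, and 6, assuming a forward-Euler discretization with a fixed step `dt` (the discretization, the function names, and the numbers are assumptions of this sketch; the model itself is continuous-time):

```python
def service_rate(rho, q, s_cap):
    """Eq. 5: serve at the admitted rate only if the buffer is empty
    and the capacity suffices; otherwise serve at full capacity."""
    if q == 0 and rho <= s_cap:
        return rho
    return s_cap


def step(q, r, alpha, s_cap, dt):
    """Advance the buffer content by dt seconds.

    Returns the new buffer content (packets) and the service rate (pps)
    applied during the step.
    """
    rho = r * alpha                     # Eq. 1: residual (admitted) rate
    s = service_rate(rho, q, s_cap)     # Eq. 5
    # Eq. 6 in differential form: dq/dt = rho - s. The clamp at zero is a
    # discretization guard, since a fluid queue cannot become negative.
    new_q = max(0.0, q + (rho - s) * dt)
    return new_q, s


# Hypothetical overload: 2.0 Mpps offered to a function with 1.5 Mpps capacity.
q_after, s_used = step(q=0.0, r=2.0e6, alpha=1, s_cap=1.5e6, dt=1e-3)
```

With these illustrative numbers the buffer grows by roughly 500 packets per millisecond, which is the backlog that the admission controller later has to bound.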

2.4 Function delay

The time that a packet that exits function F _i at time t has spent inside that function is denoted the function delay d _i(t):

$$ d_{i}(t) = \inf \{\tau \geq 0 :\, P_{i}(t-\tau) \leq S_{i}(t)\}. $$

(7)

The expected time that a packet entering the i th function at time t will spend in the function before exiting is defined as the expected function-delay ${\bar {d}_{i}}(t)$

$$ {\bar{d}_{i}}(t) = \inf \left\{ \tau \geq 0 :\, P_{i}(t) \leq S_{i}(t) + {\int}_{t}^{t+\tau} m_{i}(x)\times({\bar{s}_{i}}+{\hat{\xi}_{i}}(x))\, {\mathrm{d}x} \right\}. $$

(8)

Equation 8 can be interpreted as finding the minimum time τ ≥ 0 such that S_i(t + τ) = P_i(t), or in other words such that at time t + τ the function will have processed all the packets that have entered the function up to time t.
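As a quick sanity check on Eq. 8, assume (hypothetically) that both the number of instances and the average machine uncertainty stay constant over the interval of interest, i.e., m_i(x) = m_i and ξ̂_i(x) = ξ̂_i. With q_i(t) = P_i(t) − S_i(t) from Eq. 6, the infimum is then attained at

$$ {\bar{d}_{i}}(t) = \frac{q_{i}(t)}{m_{i}\times({\bar{s}_{i}}+{\hat{\xi}_{i}})}. $$

For illustrative numbers that are not taken from the evaluation, q_i(t) = 2000 packets, m_i = 2, and an effective per-instance rate of 10^6 pps give an expected function delay of 1 ms.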

Computing the expected function delay ${\bar {d}_{i}}(t)$ requires information about m_i(t) and ${\hat {\xi }_{i}}(t)$ for the future, whereas computing the upper bound on the expected function delay, ${{d}_{i}^{\text {ub}}}(t)$ (defined in Section 3.2), requires only information about m_i(t) for the future. Information about m_i(t) up until time t + Δ_i is always known since $m_{i}(t+{\Delta }_{i})=m_{i}^{\text{ref}}(t)$ and $m_{i}^{\text{ref}}(x)$ is known for x ∈ [0, t]. It is therefore possible to compute the expected function delay ${\bar {d}_{i}}(t)$ whenever it is shorter than the time-overhead Δ_i (which will be used later in Section 3 when deriving the admission controller and the service controller).

Note that the (expected) function delay does not distinguish between queueing delay and processing delay. In [17], Google profiled where the latency in a data center occurred and showed that 99% of the latency (≈85 μs) occurred somewhere in the kernel, the switches, the memory, or the application. It is very difficult to say exactly how much of this 99% is due to processing and how much to queueing, hence they are considered together as the function delay.

2.5 Concatenation of functions

The n functions in the service-chain are concatenated with the assumption of no loss of packets in the communication channel between them. Therefore, the input of function F _i is exactly the output of function F _i−1:

$$r_{i}(t) = s_{i-1}(t), \qquad \forall i=2,3,\ldots,n. $$

Finally, no communication latency between the functions is assumed. Accounting for it would be necessary should the different functions reside in different locations, e.g., different data centers, and doing so is straightforward: if such a communication latency (say C) were to be constant between the functions, one could account for it by decrementing the end-to-end deadline, $\tilde {D}^{\max }= {D^{\max }} - C$, and then use the framework developed in this paper.

2.6 Problem formulation

The goal of this paper is to derive a service-controller and an admission-controller that guarantee that packets that pass through the service-chain meet their E2E deadline. This should be done using as few resources as possible while still achieving as high throughput as possible. This is captured in a simple, yet intuitive utility function u_i(t). Later in Section 3, the utility function is used to derive an automatic service- and admission controller, denoted AutoSAC.

Utility function

The utility function measures the availability a _i(t) and the efficiency e _i(t) of each function in the service chain. The availability is defined as the ratio between the service-rate and the input-rate of the function, and the efficiency is defined as the ratio between service-rate and the capacity of the function:

$$\begin{array}{@{}rcl@{}} a_{i}(t) &=& \frac{\text{service}}{\text{demand}} = \frac{s_{i}(t)}{r_{i}(t-d_{i}(t))} \in [0,\,1+\epsilon], \end{array} $$

(9)

$$\begin{array}{@{}rcl@{}} e_{i}(t) &=& \frac{\text{service}}{\text{capacity}} = \frac{s_{i}(t)}{{s^{\text{cap}}_{i}}(t)} \in [0,\,1]. \end{array} $$

(10)

The reason why a_i(t) can grow greater than 1 is the buffer: it is possible to store packets for a short interval and then process them at a rate greater than the rate at which they arrived. However, it is not possible to have a_i(t) > 1 for an infinite amount of time. In practice, 𝜖 is very small, and it is not possible to achieve a_i(t) > 1 for any significant period of time.

A low availability corresponds to a large percentage of the incoming load being rejected by the admission controller, since there is not enough capacity to serve them. A low efficiency, on the other hand, corresponds to over-provisioning of resources. It is therefore difficult to achieve both high availability and high efficiency. The availability and efficiency are combined into a utility function u_i(t):

$$ u_{i}(t) = a_{i}(t)\times e_{i}(t) = \frac{{s_{i}^{2}}(t)}{{s^{\text{cap}}_{i}}(t)\times r_{i}(t-d_{i}(t))}. $$

(11)

Note that the utility function as well as the availability and efficiency function have the good property of being normalized making it easy to compare the performance of service-chains having different input load. To evaluate the performance between service-chains of different lengths and over different time-horizons the average utility U(t) is defined:

$$ u(t) = \frac{1}{n} \sum\limits_{i=1}^{n} u_{i}(t),\quad U(t) = \frac{1}{t}{{\int}_{0}^{t}}u(x){\mathrm{d}x}. $$

(12)

While the utility function (11) uses the product of the availability and efficiency one might argue that they should not have equal weight when computing the utility. A natural choice to achieve that would be to have a convex combination of them:

$$ \tilde{u}_{i}(t) = \lambda_{i} a_{i}(t) + (1-\lambda_{i}) e_{i}(t),\quad \lambda_{i} \in [0,1], $$

(13)

where λ_i corresponds to the relative importance of achieving a high availability or a high efficiency. The method used in Section 3 to derive a control-strategy using utility function (11) will also apply to the alternative utility function (13).
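The metrics of Eqs. 9, 10, 11, and 13 are simple ratios, as the following sketch shows. The function names and the sample values are hypothetical, and the sketch assumes strictly positive input rate and capacity.

```python
def availability(s, r_delayed):
    """Eq. 9: service rate over the input rate seen d_i(t) time units earlier."""
    return s / r_delayed


def efficiency(s, s_cap):
    """Eq. 10: service rate over the true capacity of the function."""
    return s / s_cap


def utility(s, s_cap, r_delayed):
    """Eq. 11: product of availability and efficiency."""
    return availability(s, r_delayed) * efficiency(s, s_cap)


def weighted_utility(s, s_cap, r_delayed, lam):
    """Eq. 13: convex combination with weight lam in [0, 1]."""
    return lam * availability(s, r_delayed) + (1.0 - lam) * efficiency(s, s_cap)


# Hypothetical operating point: 0.8 Mpps served, 1.0 Mpps offered, 1.0 Mpps capacity.
u = utility(0.8e6, 1.0e6, 1.0e6)
u_alt = weighted_utility(0.8e6, 1.0e6, 1.0e6, lam=0.5)
```

At this illustrative operating point both the availability and the efficiency are 0.8, so the product form penalizes the shortfall twice while the convex combination stays at 0.8.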

3 Controller design

In this section, an automatic service- and admission-controller (AutoSAC) is derived. Figure 5 illustrates an overview of the different parts of AutoSAC and the information flow it uses. It measures the incoming load, current queue size, and the true performance in order to estimate how much service rate it will need as well as to estimate how long it will take an incoming packet to pass through the function. It also uses feedforward to functions down the chain in order to make them react faster to changes in the input load. For instance, when the ith function increases its service rate, it sends a signal to the (i+1)th function letting it know that in Δ_i time-units, it will get an increase in incoming traffic rate. Finally, due to the time overhead needed to start new instances, there will be a need for admission control. In order not to discard unnecessarily many packets, AutoSAC uses feedback from the queue size and the true performance of the functions to estimate how much time it will take a new packet to pass through the function, and bases the admission decision on this estimate.

The difficulty when deriving AutoSAC lies in the different time-scales for starting/stopping instances, the E2E deadlines, and the rate-of-change of the input. They are all assumed to be of different orders of magnitude, as given in Table 1. These timing assumptions will be exploited when deriving AutoSAC later.

Table 1 Timing assumptions for the end-to-end deadline, the change-of-rate of the input, and the overhead for changing the service-rate. These timing assumptions are used when deriving the automatic service- and admission-controller

The admission controller is derived in Section 3.1 and the service controller in Section 3.3. In Section 3.4, a short discussion of the properties of AutoSAC is presented.

3.1 Admission controller

Every request that enters the service chain has an end-to-end deadline ${D^{\max }}$. It has to pass through every function in the chain within this time. Furthermore, each function can impose a local deadline D _i(t) for the packet entering the i th function at time t. One can therefore use either the local deadline to do a decentralized admission control at the entry of each of the functions in the chain, or the global deadline for a centralized admission control. In this work, we will use a decentralized approach, shown below, but will also derive a policy for a centralized admission control in Section 3.2.1; however, only the decentralized policy will be evaluated in Section 4.

3.2 Decentralized admission control

For the decentralized admission control, each function can compare the local deadline with the upper bound of the expected delay ${{d}_{i}^{\text {ub}}}(t)$ it will take a new packet to pass through the function. If the worst-case expected delay is larger than the local deadline the admission controller should drop the packet. This results in the following policy for the admittance-flag α _i(t):

$$ \alpha_{i}(t) = \left\{\begin{array}{ll} 1\qquad \text{if } D_{i}(t) \geq {{d}_{i}^{\text{ub}}}(t) \\ 0\qquad \text{if } D_{i}(t) < {{d}_{i}^{\text{ub}}}(t) \\ \end{array}\right. $$

(14)

where the upper bound on the expected function delay ${{d}_{i}^{\text {ub}}}(t)$ is given by

$$ {{d}_{i}^{\text{ub}}}(t) = \inf \left\{ \tau \geq 0 :\, P_{i}(t) \leq S_{i}(t) + {\int}_{t}^{t+\tau} m_{i}(x)\times({\bar{s}_{i}} + {{\xi}_{i}^{\text{lb}}})\, {\mathrm{d}x} \right\}. $$

(15)

This is the worst case of the expected delay given by Eq. 8, i.e., when every instance is processing packets at the lower bound of its possible service-rate, hence leading to the upper bound on the expected delay. One should note here that in order to compute the upper bound (15) one needs information about m_i(x) for x ∈ [t, t + τ]. However, as mentioned earlier, as long as τ ≤ Δ_i this information is available since the number of instances running at time t is decided by the control signal computed Δ_i time units ago, i.e., $m_{i}(t) = m_{i}^{\text{ref}}(t-{\Delta }_{i})$, implying that m_i(x) is known for x ∈ [0, t + Δ_i]. This is illustrated in Fig. 6 where P_i(t) shows the cumulative amount of packets that has been let into the function, and S_i(t) the cumulative amount of served packets up until time t. From time t until t + Δ_i, it shows a shaded blue region, highlighting that the exact service is uncertain in this area. However, it is possible to compute an upper bound $S_{i}^{\text{ub}}(t+\tau) = S_{i}(t) + {\int }_{t}^{t+\tau } m_{i}(x)\times ({\bar {s}_{i}} + {{\xi }_{i}^{\text {ub}}})\, \mathrm{d}x$ and a lower bound $S_{i}^{\text{lb}}(t+\tau) = S_{i}(t) + {\int }_{t}^{t+\tau } m_{i}(x)\times ({\bar {s}_{i}} + {{\xi }_{i}^{\text {lb}}})\, \mathrm{d}x$ of this uncertainty region and hence a lower bound and an upper bound on the expected delay.
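The decentralized policy of Eqs. 14 and 15 can be sketched numerically as follows. The sketch assumes that the known instance schedule m_i(x) is sampled on a uniform grid of width `dt` and is piecewise constant on each grid cell; these assumptions, the function names, and the choice to reject when the bound cannot be computed within the known horizon are all hypothetical and not prescribed by the text.

```python
def delay_upper_bound(q, m_schedule, s_bar, xi_lb, dt):
    """Eq. 15 by forward integration.

    q          -- current backlog P_i(t) - S_i(t) in packets
    m_schedule -- known instance counts m_i(x) for x = t, t+dt, t+2dt, ...
                  (available up to t + Delta_i)
    Returns the smallest tau such that the worst-case service covers the
    backlog, or None if that does not happen within the known horizon.
    """
    if q <= 0:
        return 0.0
    served = 0.0
    for k, m in enumerate(m_schedule):
        rate = m * (s_bar + xi_lb)          # worst-case service rate (pps)
        if rate > 0 and served + rate * dt >= q:
            # The backlog is cleared inside this grid cell.
            return k * dt + (q - served) / rate
        served += rate * dt
    return None


def admit(local_deadline, d_ub):
    """Eq. 14. Rejecting when the bound is unknown is an assumption of this sketch."""
    return 1 if d_ub is not None and local_deadline >= d_ub else 0


# Hypothetical case: 1000 packets queued, one instance now and two from the
# next millisecond on, each guaranteed at least 0.5 Mpps.
d_ub = delay_upper_bound(q=1000.0, m_schedule=[1, 2, 2],
                         s_bar=1.0e6, xi_lb=-0.5e6, dt=1e-3)
alpha = admit(local_deadline=2e-3, d_ub=d_ub)
```

With these illustrative numbers the bound is about 1.5 ms, so a packet with a 2 ms local deadline is admitted while one with a 1 ms local deadline would be rejected.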

3.2.1 Centralized admission control

In contrast to the decentralized admission control, it might be advantageous to drop packets as soon as possible in the service chain (in order to not waste any resources on packets that are dropped later) if there is a possibility that they will miss their global deadline. To do so, one has to compare the expected worst-case end-to-end delay D^ub(t) for a packet entering the chain at time t with the global deadline ${D^{\max }}$, leading to the following policy:

$$ \alpha_{1}(t) = \left\{\begin{array}{ll} 1 \qquad \text{if } {D^{\max}} \geq D^{\text{ub}}(t) \\ 0 \qquad \text{if } {D^{\max}} < D^{\text{ub}}(t) \end{array}\right.. $$

(16)

Computing D ^ub(t) in Eq. 16 is straightforward, but before doing so, one has to compute the worst-case service rates for all the functions down the chain. At any time x ≥ t (with t being the current time) the worst-case predicted service-rate for functions i = 1,2,…,n is:

$$ {s^{\text{lb}}_{i}}(x) = \left\{\begin{array}{ll} {s^{\text{lb}}_{i-1}}(x) \quad &\text{if } q_{i}(x)=0 \text{ and } {s^{\text{lb}}_{i-1}}(x) \leq m_{i}(x)\times({\bar{s}_{i}}+{{\xi}_{i}^{\text{lb}}}),\\ m_{i}(x)\times({\bar{s}_{i}}+{{\xi}_{i}^{\text{lb}}}) \quad &\text{else}. \end{array}\right. $$

(17)

where ${s^{\text {lb}}_{0}}(x) = 0$, since we cannot predict the future input-rate of the first function. With t being the current time, the worst-case predicted cumulative-service of function i at time x ≥ t is then given by:

$$ S_{i}^{\text{lb}}(x) = S_{i}(t) + {\int}_{t}^{x} {s^{\text{lb}}_{i}}(z)\,\mathrm{d}z,\qquad i=1,2,\ldots,n. $$

(18)

Using this, the expected worst-case end-to-end delay D ^ub(t) is given by

$$ D^{\text{ub}}(t) = \inf\{\tau\geq0 :\, P_{1}(t) \leq S_{n}^{\text{lb}}(t+\tau)\}, $$

(19)

where $P_{1}(t) = {{\int }_{0}^{t}} \rho _{1}(x)dx$ is the cumulative amount of requests that has been admitted into the first function. One should note that $S_{n}^{\mathsf {lb}}(x)$ in Eq. 19 could be expressed in a very neat way using Network Calculus [18, 19], but due to lack of space we decided to not introduce the theory of Network Calculus in this paper.

3.3 Service controller

The goal for the service-controller is to find $m_{i}^{ref}(t)$ such that the utility function is maximized once the reference signal is realized in Δ_i time-units, i.e., such that u _i(t +Δ_i) is maximized. In this section, it will be assumed that the utility function used is the one defined in Eq. 11; later in Section 3.3.1, it will be derived for the alternative utility function (13). Recall that the utility function (11) is given by

$$u_{i}(t) = a_{i}(t)\times e_{i}(t) = \frac{{s_{i}^{2}}(t)}{{s^{\text{cap}}_{i}}(t)\times r_{i}(t-d_{i}(t))}. $$

As explained in the introduction of this section, the input load is assumed to change relatively slowly over a time interval of a few milliseconds. Hence, one can approximate

$$ r_{i}(t-d_{i}(t)) \approx r_{i}(t), $$

(20)

since the goal of both the admission controller and the service controller is to keep d _i(t) in the order of milliseconds or less. Therefore, it is possible to approximate the utility function with

$$u_{i}(t) \approx \frac{{s_{i}^{2}}(t)}{{s^{\text{cap}}_{i}}(t)\times r_{i}(t)}. $$

Furthermore, the service rate s _i(t) can be approximated to be either at the capacity of the function, ${s^{\text {cap}}_{i}}(t)$, or at the input rate r _i(t)

$$ s_{i}(t) \approx \min\{{s^{\text{cap}}_{i}}(t),\,r_{i}(t)\}. $$

(21)

where the $\min $ is used since the function cannot process packets at a faster rate than the rate at which they enter the function for a prolonged period of time. Likewise, it cannot process packets at a rate higher than the capacity of the function when the input is higher than this. This leads to the utility function being approximated as

$$u_{i}(t) \approx \left\{\begin{array}{lll} \frac{({s^{\text{cap}}_{i}}(t))^{2}}{{s^{\text{cap}}_{i}}(t)\times r_{i}(t)} = \frac{{s^{\text{cap}}_{i}}(t)}{r_{i}(t)},\, &\text{ if } {s^{\text{cap}}_{i}}(t) \leq r_{i}(t) \\ \frac{{r_{i}^{2}}(t)}{{s^{\text{cap}}_{i}}(t)\times r_{i}(t)} = \frac{r_{i}(t)}{{s^{\text{cap}}_{i}}(t)}, \, &\text{ else} \end{array}\right. $$

With ${s^{\text {cap}}_{i}}(t)$ given by Eq. 3 and the average machine uncertainty ${\hat {\xi }_{i}}(t)$ given by Eq. 4 the utility function can finally be approximated as

$$ u_{i}(t) \approx \left\{\begin{array}{lll} \frac{m_{i}(t)({\bar{s}_{i}} + {\hat{\xi}_{i}}(t))}{r_{i}(t)},\, &\text{ if } m_{i}(t)({\bar{s}_{i}} + {\hat{\xi}_{i}}(t)) \leq r_{i}(t) \\ \frac{r_{i}(t)}{m_{i}(t)({\bar{s}_{i}} + {\hat{\xi}_{i}}(t))}, \, &\text{ else} \end{array}\right. $$

(22)

Since the goal is to find $m_{i}^{ref}(t)$ in order to maximize u _i(t +Δ_i), one needs knowledge of ${\hat {\xi }_{i}}(t+{\Delta }_{i})$ and r _i(t +Δ_i) which is not available. However, one can assume that the machine uncertainty will be fairly constant during Δ_i time-units such that ${\hat {\xi }_{i}}(t+{\Delta }_{i}) \approx {\hat {\xi }_{i}}(t)$. Furthermore, one has to estimate the future input-rate to the function. For the first function, F ₁, this can be done by using the derivative of the (preferably low-pass filtered) input-rate:

$${\hat{r}_{1}}(t) = r_{1}(t) + {\Delta}_{1}\frac{\mathrm{d}r_{1}(t)}{\mathrm{d}t}. $$

For the succeeding functions, i = 2,…,n, the input-rate will change in a step-wise fashion and can therefore not be approximated with the expression above. However, since r_i(t) = s_{i−1}(t) and m_{i−1}(x) is known for x ∈ [0, t + Δ_{i−1}] (with t being the current time), one could estimate the future input-rate ${\hat {r}_{i}}(t)$ with

$${\hat{r}_{i}}(t) \approx \min \left( {s^{\text{cap}}_{i-1}}(t+{\Delta}_{i-1}),\, {\hat{r}_{i-1}}(t) \right),\quad i=2,\ldots,n. $$

Note that ${s^{\text {cap}}_{i-1}}(t+{\Delta }_{i-1})$ is used here, instead of ${s^{\text {cap}}_{i-1}}(t+{\Delta }_{i})$. The reason is that if Δ_i > Δ_{i−1} one does not have enough information to compute ${s^{\text {cap}}_{i-1}}(t+{\Delta }_{i})$. However, one can use the assumption that Δ_i ≈ Δ_{i−1}. Furthermore, since

$${s^{\text{cap}}_{i-1}}(t+{\Delta}_{i-1}) \approx {m^{\text{ref}}_{i-1}}(t)\times ({\bar{s}_{i-1}}+{\hat{\xi}_{i-1}}(t)) $$

one can summarize the predicted input ${\hat {r}_{i}}(t)$ as

$$ {\hat{r}_{i}}(t)\! =\! \left\{\begin{array}{lll} r_{i}(t) + {\Delta}_{i}\frac{\mathrm{d}r_{i}(t)}{\mathrm{d}t}, & i=1 \\ \min\! \left\{ {m^{\text{ref}}_{i-1}}(t)\! \times \! ({\bar{s}_{i-1}}+{\hat{\xi}_{i-1}}(t)),\,\, {\hat{r}_{i-1}}(t) \right\}, &\text{ else} \end{array}\right. $$

(23)

With this, one can define $\kappa _{i}(t) \in \mathbb {R}^{+}$ to be the real number of instances needed to exactly match the predicted incoming rate:

$$ \kappa_{i}(t) = \frac{{\hat{r}_{i}}(t)}{{\bar{s}_{i}} + {\hat{\xi}_{i}}(t)}. $$

(24)

The control signal, i.e., the number of instances that should be started, $m_{i}^{ref}(t)$ can then be found by solving

$$m_{i}^{ref}(t) = \left\{\begin{array}{ll} \arg\max\limits_{x\in \mathbb{Z}^{+}} \{x / \kappa_{i}(t) \},\, &\text{if } x \leq \kappa_{i}(t) \\ \arg\max\limits_{x\in \mathbb{Z}^{+}} \{ \kappa_{i}(t) / x \},\, &\text{else} \end{array}\right. $$

where $x\in \mathbb {Z}^{+}$ is the number of instances and $\kappa_{i}(t)$ is given by Eq. 24. Here, one can see that the first case of the above equation is maximized when $x$ is as large as possible, but since this case is only valid when $x \leq \kappa_{i}(t)$ it leads to $x = \lfloor\kappa_{i}(t)\rfloor$. Similarly, the second case is maximized when $x$ is as small as possible, and since this case is valid for $x \geq \kappa_{i}(t)$ it leads to $x = \lceil\kappa_{i}(t)\rceil$, leading to the final control-law:

$$ m_{i}^{ref}(t) = \left\{\begin{array}{lll} \lfloor\kappa_{i}(t)\rfloor,\quad &\text{ if } \lfloor\kappa_{i}(t)\rfloor \lceil\kappa_{i}(t)\rceil \geq {\kappa_{i}^{2}}(t) \\ \lceil \kappa_{i}(t)\rceil,\, &\text{ else} \end{array}\right. $$

(25)

where again $\kappa _{i}(t) = \frac {{\hat {r}_{i}}(t)}{{\bar {s}_{i}}+{\hat {\xi }_{i}}(t)}$ is the real number of machines that is necessary to exactly match the predicted incoming traffic.
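Eqs. 24 and 25 translate almost directly into code. The condition in Eq. 25 follows from comparing the two candidate utilities: $\lfloor\kappa_{i}(t)\rfloor/\kappa_{i}(t) \geq \kappa_{i}(t)/\lceil\kappa_{i}(t)\rceil$ holds exactly when $\lfloor\kappa_{i}(t)\rfloor\lceil\kappa_{i}(t)\rceil \geq \kappa_{i}^{2}(t)$. The sketch below uses hypothetical function names (`instances_needed`, `m_ref`) and is meant as an illustration only.

```python
import math


def instances_needed(r_hat: float, s_bar: float, xi_hat: float) -> float:
    """kappa_i(t) of Eq. 24: real-valued number of instances matching r_hat."""
    return r_hat / (s_bar + xi_hat)


def m_ref(kappa: float) -> int:
    """Control law of Eq. 25: choose between floor(kappa) and ceil(kappa)."""
    lo, hi = math.floor(kappa), math.ceil(kappa)
    # floor is at least as good as ceil iff lo / kappa >= kappa / hi.
    if lo * hi >= kappa ** 2:
        return lo
    return hi


if __name__ == "__main__":
    kappa = instances_needed(r_hat=1_000_000, s_bar=150_000, xi_hat=-25_000)
    print(kappa, m_ref(kappa))
    print(m_ref(2.4), m_ref(2.5))
```

Between two consecutive integers, the law switches from rounding down to rounding up at their geometric mean, e.g., at $\sqrt{6} \approx 2.449$ for $\kappa_{i}(t) \in [2, 3]$, rather than at the midpoint.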

3.3.1 Alternative utility function

Using the same method described in Section 3.3, one can derive a control-law for the alternative utility function (13):

$$\begin{array}{@{}rcl@{}} \tilde{u}_{i}(t)& =& \lambda_{i}a_{i}(t) + (1-\lambda_{i})e_{i}(t)\\ &=& \lambda_{i} \frac{s_{i}(t)}{r_{i}(t-d_{i}(t))} + (1-\lambda_{i})\frac{s_{i}(t)}{{s^{\text{cap}}_{i}}(t)}. \end{array} $$

By using the approximation (20) for the input rate, (21) for the service rate, (23) for predicting the input rate, and finally (3) for estimating the maximum capacity along with Eq. 4 for the machine uncertainty, one arrives at the following control-law:

$$m_{i}^{ref}(t) = \left\{\begin{array}{lll} \arg\max \limits_{x\in \mathbb{Z^{+}}} \left\{\lambda_{i} \frac{x}{\kappa_{i}(t)} + (1-\lambda_{i})\right\}, \quad &\text{ if } x\leq \kappa_{i}(t)\\ \arg\max\limits_{x\in\mathbb{Z^{+}}} \left\{\lambda_{i} + (1-\lambda_{i})\times \frac{\kappa_{i}(t)}{x} \right\}, &\text{ else} \end{array}\right. $$

where $\kappa _{i}(t) = \frac {{\hat {r}_{i}}(t)}{{\bar {s}_{i}}+{\hat {\xi }_{i}}(t)}$.

One can see that the upper case is maximized when $x$ is as large as possible within that case, i.e., with $x = \lfloor\kappa_{i}(t)\rfloor$, while the lower case is maximized when $x$ is as small as possible, i.e., with $x = \lceil\kappa_{i}(t)\rceil$. The remaining question is then which of the two cases yields the largest utility. Fortunately, this is easy to evaluate, resulting in the final control-law for the alternative utility function:

$$ m_{i}^{ref}(t) = \left\{\begin{array}{lll} \lfloor \kappa_{i}(t) \rfloor,\quad & \text{if } \lambda_{i}\lfloor \kappa_{i}(t) \rfloor \lceil \kappa_{i}(t) \rceil + (1-2\lambda_{i}) \kappa_{i}(t) \lceil \kappa_{i}(t) \rceil \geq (1-\lambda_{i}) {\kappa_{i}^{2}}(t) \\ \lceil \kappa_{i}(t) \rceil, &\text{else} \end{array}\right. $$

(26)

where again $\kappa _{i}(t) = \frac {{\hat {r}_{i}}(t)}{{\bar {s}_{i}}+{\hat {\xi }_{i}}(t)}$. Comparing the two control-laws (25) and (26), one can see that the alternative control-law (26) is equivalent to (25) when the efficiency and availability are considered equally important, i.e., when $\lambda_{i} = 1/2$.
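The condition in Eq. 26 can be obtained by requiring $\lambda_{i}\lfloor\kappa_{i}(t)\rfloor/\kappa_{i}(t) + (1-\lambda_{i}) \geq \lambda_{i} + (1-\lambda_{i})\kappa_{i}(t)/\lceil\kappa_{i}(t)\rceil$ and multiplying both sides by $\kappa_{i}(t)\lceil\kappa_{i}(t)\rceil$. The following sketch implements Eq. 26 and the objective of the preceding case equation so that the two can be compared numerically; the names `m_ref_alt` and `alt_objective` are hypothetical.

```python
import math


def m_ref_alt(kappa: float, lam: float) -> int:
    """Control law of Eq. 26 for the weighted utility with weight lam."""
    lo, hi = math.floor(kappa), math.ceil(kappa)
    lhs = lam * lo * hi + (1 - 2 * lam) * kappa * hi
    rhs = (1 - lam) * kappa ** 2
    return lo if lhs >= rhs else hi


def alt_objective(x: int, kappa: float, lam: float) -> float:
    """Objective maximized in the case equation preceding Eq. 26."""
    if x <= kappa:
        return lam * x / kappa + (1 - lam)  # upper case
    return lam + (1 - lam) * kappa / x      # lower case


if __name__ == "__main__":
    for lam in (0.1, 0.5, 0.9):
        print(lam, m_ref_alt(2.45, lam))
```

A large $\lambda_{i}$ (availability weighted higher) pushes the decision towards rounding up, whereas a small $\lambda_{i}$ (efficiency weighted higher) pushes it towards rounding down.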

3.4 Properties of AutoSAC

There are several interesting properties captured by the admission controller and service controller presented in this section. First of all, the admission controller (14) ensures, by design, that every packet that is admitted into a function, and thus exits the function, meets its deadline. Therefore, no packets that exit the service-chain will miss their end-to-end deadline.

The service-controller given by Eq. 25 uses both feedback from the true performance of the instances (when computing ${\hat {\xi }_{i}}(t)$) and feedforward information about future input coming from functions earlier in the service-chain (when computing ${\hat {r}_{i}}(t)$). This makes it robust against machine uncertainties and also ensures that it reacts fast to sudden changes in the input. For instance, given a service-chain of six functions, function $F_{5}$ will know that in ${\Delta}_{4}$ time-units, $F_{4}$ will have ${m^{\text {ref}}_{4}}(t)$ instances running and can thus start as many instances as needed to process this new load.
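The interplay between feedback and feedforward can be sketched for a whole chain by combining Eqs. 23, 24, and 25: each function is scaled for its predicted input, and its resulting capacity bounds the predicted input of the next function. The class `ChainFunction` and the function `scale_chain` are hypothetical names used for illustration, and the prediction for $F_{1}$ is assumed to be given.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ChainFunction:
    s_bar: float   # expected service rate per instance [pps]
    xi_hat: float  # measured average machine uncertainty [pps] (feedback)


def control_law(kappa: float) -> int:
    """Eq. 25: choose between floor(kappa) and ceil(kappa)."""
    lo, hi = math.floor(kappa), math.ceil(kappa)
    return lo if lo * hi >= kappa ** 2 else hi


def scale_chain(r1_hat: float, chain: List[ChainFunction]) -> List[int]:
    """Return m_ref for every function, given the predicted input to F_1."""
    m_refs = []
    r_hat = r1_hat
    for f in chain:
        rate_per_instance = f.s_bar + f.xi_hat
        m = control_law(r_hat / rate_per_instance)  # Eqs. 24 and 25
        m_refs.append(m)
        # Eq. 23 (feedforward): the next function receives at most what
        # this function is able to serve with m instances.
        r_hat = min(m * rate_per_instance, r_hat)
    return m_refs


if __name__ == "__main__":
    chain = [ChainFunction(100.0, 0.0), ChainFunction(50.0, 10.0),
             ChainFunction(200.0, -50.0)]
    print(scale_chain(1000.0, chain))
```

If an upstream function is deliberately under-provisioned (rounded down), the downstream functions are scaled for the reduced rate instead of the original input.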

4 Evaluation

In this section, the automatic service- and admission-controller (AutoSAC) developed in Section 3 is evaluated. First, Section 4.1 illustrates how a randomly generated service chain of three functions performs when it is given a 5-h traffic trace. Then, in Section 4.2, AutoSAC is compared with two other “state-of-the-art” methods for scaling cloud services. The comparison is done using a Monte Carlo simulation where the parameters of a five function service chain are randomly generated and then simulated, again using a real traffic trace as input.

The real-world trace of traffic data used as input was gathered over 120 hours from a port in the Swedish University NETwork (SUNET) and then normalized to have a peak of 10,000,000 packets per second as shown in Fig. 1. The simulation was written in the open-source language Julia [20]. The code and traffic trace used for this simulation are provided on GitHub (see Notes).

4.1 Example chain

For this example, a service chain with three functions was used. The E2E deadline was set to 30 ms, which in turn was split into local deadlines of 10 ms for each function. The other parameters (i.e., ${\bar {s}_{i}}$, ${\Delta}_{i}$, ${{\xi }_{i}^{\text {lb}}}$, and ${{\xi }_{i}^{\text {ub}}}$) for every function in the service-chain are generated randomly. The expected service-rate ${\bar {s}_{i}}$ was chosen uniformly at random from the interval [100,000, 200,000] pps. The time-overhead ${\Delta}_{i}$ was drawn uniformly at random from the interval [30, 120] seconds. The machine uncertainty was chosen to be in the range of ± 30% of the expected service-rate $\bar {s}_{i}$. The lower bound of the machine uncertainty was drawn from the interval $[-0.3{\bar {s}_{i}},\,0]$ pps and likewise, the upper bound was drawn from $[0,\,0.3{\bar {s}_{i}}]$ pps.
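The parameter generation described above could be sketched as follows. This is not the authors' Julia code: the names `FunctionParams` and `random_chain` are hypothetical, the uncertainty bounds are assumed to be drawn uniformly (the text only states the intervals), and the end-to-end deadline is split equally among the functions as in the example.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class FunctionParams:
    s_bar: float     # expected service rate per instance [pps]
    delta: float     # time-overhead for starting/stopping an instance [s]
    xi_lb: float     # lower bound of the machine uncertainty [pps]
    xi_ub: float     # upper bound of the machine uncertainty [pps]
    deadline: float  # local deadline [s]


def random_chain(n: int, e2e_deadline: float,
                 rng: random.Random) -> List[FunctionParams]:
    """Draw the parameters of a service chain with n functions."""
    chain = []
    for _ in range(n):
        s_bar = rng.uniform(100_000, 200_000)
        chain.append(FunctionParams(
            s_bar=s_bar,
            delta=rng.uniform(30, 120),
            xi_lb=rng.uniform(-0.3 * s_bar, 0.0),  # uniform draw is assumed
            xi_ub=rng.uniform(0.0, 0.3 * s_bar),   # uniform draw is assumed
            deadline=e2e_deadline / n,             # equal split of the E2E deadline
        ))
    return chain


if __name__ == "__main__":
    for params in random_chain(3, 0.030, random.Random(1)):
        print(params)
```

Passing an explicit `random.Random` instance makes a run reproducible, which is convenient when many chains are generated in a Monte Carlo simulation.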

In Fig. 7, one can see how the service chain scales the number of instances up/down in order to react to the input load. In Fig. 8, one can see how the average utility changes over the course of the simulation. One thing to notice is that the average utility over the service chain remains stable above 0.95 despite large variations in the input.

4.2 Comparing AutoSAC with state-of-the-art

In this section, we will evaluate AutoSAC through a Monte Carlo simulation with 15 ⋅ 10⁴ runs where it is compared against two state-of-the-art methods for auto-scaling VMs in industry: dynamic auto-scaling (DAS) and dynamic over-provisioning (DOP). However, since these two methods do not use any admission control, they are also augmented with the admission controller presented in Section 3.1. The two augmented methods are denoted by “DAS with AC” and “DOP with AC.” Hence, in total, the method presented in Section 3 is compared with four other methods.

Dynamic auto-scaling (DAS)

This method is currently being offered to customers using Amazon Web Services [21]. It allows the user to monitor different metrics (e.g., CPU utilization) of their VMs using CloudWatch. One can then use it together with their auto-scaling solution to achieve dynamic auto-scaling. This allows the user to scale the number of VMs as a function of these metrics. One should note that the CPU utilization can be considered the same as the efficiency metric $e_{i}(t)$ defined in Eq. 10. For the Monte Carlo simulation, the following rules were used:

add a VM if the efficiency is above 99%,
remove a VM if the efficiency is below 95%,

which might seem like a high and tight interval, but it is necessary in order to achieve a high utility.
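The two threshold rules can be written as a single decision step, as in the sketch below. The function name `das_step` is hypothetical, and keeping at least one VM running is an assumption of this sketch. The time-overhead for starting and stopping VMs is not modeled here.

```python
def das_step(m: int, efficiency: float,
             upper: float = 0.99, lower: float = 0.95) -> int:
    """One decision of the DAS rules; returns the new number of VMs."""
    if efficiency > upper:
        return m + 1  # add a VM if the efficiency is above 99%
    if efficiency < lower and m > 1:
        return m - 1  # remove a VM if the efficiency is below 95%
    return m          # otherwise keep the current number of VMs


if __name__ == "__main__":
    print(das_step(10, 0.995), das_step(10, 0.97), das_step(10, 0.90))
```

Since each step changes the number of VMs by at most one, such a rule needs many steps to follow a large change in the input.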

Dynamic over-provisioning (DOP)

A downside of DAS is that it reacts slowly to sudden changes in the input. A natural alternative would therefore be to instead do dynamic over-provisioning, where one measures the input to each function and allocates virtual resources such that there is an expected over-provision by 10%.
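One possible reading of this rule is to choose the smallest number of instances whose expected capacity is at least 110% of the measured input. The sketch below follows that reading; the name `dop_instances` is hypothetical, and the use of the expected service rate $\bar{s}_{i}$ (without any feedback of the machine uncertainty) is an assumption.

```python
import math


def dop_instances(r_measured: float, s_bar: float, margin: float = 0.10) -> int:
    """Smallest m such that m * s_bar >= (1 + margin) * r_measured."""
    return math.ceil((1.0 + margin) * r_measured / s_bar)


if __name__ == "__main__":
    # 1,000,000 pps with instances serving 150,000 pps each.
    print(dop_instances(1_000_000, 150_000))
```

Because the result is rounded up, the actual margin is usually somewhat larger than 10%.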

Monte Carlo simulation

The five methods are compared using a Monte Carlo simulation with 15 ⋅ 10⁴ runs. For every run, 1 h of input data was randomly selected from the total of 120 h shown in Fig. 1. Furthermore, in every run, a new service-chain with five functions was generated using the method described in Section 4.1. The end-to-end deadline was set to 50 ms, which in turn was split into local deadlines of 10 ms for each function.

The evaluation of the Monte Carlo simulation is based on the average utility $U(t)=\frac {1}{t}{{\int }_{0}^{t}} {\sum }_{i=1}^{n} u_{i}(x) {\mathrm {d}x}$. Since a packet that misses its deadline (which is possible when using DAS or DOP) is considered useless, it is evaluated as a dropped packet when exiting the function. It therefore impacts the availability metric and in turn the utility. Should all packets miss their deadlines in function $F_{i}$ for a time interval $\tau$, then $a_{i}(t) = 0\ \forall t \in \tau$, i.e., the availability would be evaluated as 0 during this time-interval since the output of the function is considered useless.
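For sampled data, the integral in $U(t)$ can be approximated numerically. The sketch below assumes that the utilities are sampled with a fixed period and are piecewise constant between samples; the name `average_utility` is hypothetical. It follows the expression as printed, i.e., the sum over the $n$ functions is not divided by $n$, so the result lies between 0 and $n$.

```python
from typing import List, Sequence


def average_utility(samples: List[Sequence[float]], sample_period: float) -> float:
    """Approximate U(t) from samples, where samples[k][i] is u_i at step k."""
    if not samples:
        raise ValueError("at least one sample is required")
    total_time = len(samples) * sample_period
    # Rectangle rule: each sample is held for one sample period.
    integral = sum(sum(u_k) for u_k in samples) * sample_period
    return integral / total_time


if __name__ == "__main__":
    # Two functions observed at two sampling instants, 2 s apart.
    print(average_utility([[1.0, 0.5], [0.5, 0.5]], sample_period=2.0))
```

Dividing the result by $n$ would give a per-function average in $[0, 1]$.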

Results

The mean of the average utility U(t) for all the simulation runs is presented in Fig. 9 for each of the five methods. One can see that AutoSAC achieves a utility that is 30–40% better than that of DAS and DOP. The main reason for this is that DAS and DOP lack admission control, which leads to packets missing their deadlines and eventually results in a low utility.

When augmenting DAS and DOP with the admission controller derived in Section 3.1, the performance is increased by 20–40%, purely as a result of not having these sudden drops in performance. However, AutoSAC still performs 5–10% better, due to the feedforward property of AutoSAC which gives it a faster reaction time to changes in the input as well as the feedback property leading to better prediction and robustness against the machine uncertainties.

5 Summary

In this work, we have developed a mathematical model for NFV forwarding graphs residing in a cloud environment. The model captures, among other things, the time needed to start/stop virtual resources (e.g., virtual machines or containers), and the uncertainty of the performance of the virtual resources which can deviate from the expected performance due to other tenants running loads on the physical infrastructure. The packets that flow through the forwarding graph must be processed by each of the virtual network functions (VNFs) within some end-to-end deadline.

A utility function is defined to evaluate performance between different methods for controlling NFV Forwarding Graphs. The utility function is also used to derive an automatic service- and admission-controller (AutoSAC) in Section 3. It ensures that packets that exit the forwarding graph meet their end-to-end deadline. The service-controller uses feedback from the actual performance of the virtual resources making it robust against uncertainties and deviations from the expected performance. Furthermore, it uses feedforward between the VNFs making it fast to react to changes in the input load.

In Section 4, AutoSAC is evaluated and compared against four other methods in a Monte Carlo simulation with 15 ⋅ 10⁴ runs. The input load for the simulation is a real-world trace of traffic data gathered over 120 h. The traffic is normalized to have a peak of 10,000,000 packets per second. AutoSAC is shown to have better performance than what is offered in the cloud industry today. We also show that when augmenting the industry-methods with the admission controller derived in Section 3, they have a significant increase in performance.

It would be interesting to extend this work by investigating how to derive a controller when the true performance is unknown or when the time-overhead needed to start virtual resources is unknown. Moreover, it would be interesting to investigate how to control a forwarding graph that has forks and joins, i.e., a graph structure rather than just a chain.

Notes

https://github.com/vmillnert/ICC17simulation

References

[1] ETSI (2012) Network Functions Virtualization (NFV), https://portal.etsi.org/nfv/nfv_white_paper.pdf
[2] Leitner P, Cito J (2016) Patterns in the chaos: a study of performance variation and predictability in public IaaS clouds. ACM Trans Internet Technol 16(3):15
[3] Shen W, Yoshida M, Kawabata T, Minato K, Imajuku W (2014) vConductor: An NFV management solution for realizing end-to-end virtual network services. In: Network Operations and Management Symposium (APNOMS), 2014 16th Asia-Pacific. IEEE, pp 1–6
[4] Moens H, De Turck F (2014) VNF-P: A model for efficient placement of virtualized network functions. In: 10th International Conference on Network and Service Management (CNSM) and Workshop. IEEE, pp 418–423
[5] Mehraghdam S, Keller M, Karl H (2014) Specifying and placing chains of virtual network functions. In: 2014 IEEE 3rd International Conference on Cloud Networking (CloudNet). IEEE, pp 7–13
[6] Mao M, Li J, Humphrey M (2010) Cloud auto-scaling with deadline and budget constraints. In: 2010 11th IEEE/ACM International Conference on Grid Computing. IEEE, pp 41–48
[7] Wang X, Wu C, Le F, Liu A, Li Z, Lau F (2016) Online VNF scaling in datacenters, arXiv preprint arXiv:1604.01136
[8] Li Y, Phan L, Loo BT (2016) Network functions virtualization with soft real-time guarantees. In: IEEE International Conference on Computer Communications (INFOCOM)
[9] Di Natale M, Stankovic JA (1994) Dynamic end-to-end guarantees in distributed real time systems. In: Proceedings of the 15-th IEEE Real-Time Systems Symposium, pp 215–227
[10] Jiang S (2006) A decoupled scheduling approach for distributed real-time embedded automotive systems. In: Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium, pp 191–198
[11] Serreli N, Lipari G, Bini E (2009) Deadline assignment for component-based analysis of real-time transactions. In: 2nd Workshop on Compositional Real-Time Systems, Washington, DC, USA
[12] Serreli N, Lipari G, Bini E (2010) The demand bound function interface of distributed sporadic pipelines of tasks scheduled by EDF. In: Proceedings of the 22-nd Euromicro Conference on Real-Time Systems, Bruxelles, Belgium
[13] Hong S, Chantem T, Hu XS (2015) Local-deadline assignment for distributed real-time systems. IEEE Trans Comput 64(7):1983–1997
[14] Jayachandran P, Abdelzaher T (2008) Delay composition algebra: A reduction-based schedulability algebra for distributed real-time systems. In: Proceedings of the 29-th IEEE Real-Time Systems Symposium, Barcelona, Spain, pp 259–269
[15] Henriksson D, Lu Y, Abdelzaher T (2004). In: Proceedings of the 16th Euromicro Conference on Real-Time Systems, pp 61–68
[16] Bonafiglia R, Cerrato I, Ciaccia F, Nemirovsky M, Risso F (2015) Assessing the performance of virtualization technologies for NFV: a preliminary benchmarking. In: 2015 Fourth European Workshop on Software Defined Networks. IEEE, pp 67–72
[17] Kapoor R, Porter G, Tewari M, Voelker GM, Vahdat A (2012) Chronos: Predictable low latency for data center applications. In: Proceedings of the Third ACM Symposium on Cloud Computing, ser. SoCC ’12. New York, NY, USA: ACM. [Online]. Available: http://doi.acm.org/10.1145/2391229.2391238, pp 9:1–9:14
[18] Cruz RL (1991) A calculus for network delay, part I: Network elements in isolation. IEEE Trans Inf Theory 37(1):114–131
[19] Le Boudec J-Y, Thiran P Network Calculus: a theory of deterministic queuing systems for the internet, ser. Lecture Notes in Computer Science. Springer, 2001, vol. 2050
[20] Bezanson J, Edelman A, Karpinski S, Shah VB (2014) Julia: A fresh approach to numerical computing, arXiv preprint arXiv:1411.1607
[21] 2016, 10. [Online]. Available: https://aws.amazon.com/

Acknowledgments

The authors would like to thank Karl-Erik Årzén and Joao Monteiro Soares for the useful comments on early versions of this paper.

Author information

Authors and Affiliations

Lund University, Ole Römers väg 1, SE 223 63, Lund, Sweden
Victor Millnert & Johan Eker
Università degli Studi di Torino, Corso Svizzera 185, 10149, Torino, Italy
Enrico Bini
Ericsson Research, Lund, Sweden
Johan Eker

Corresponding author

Correspondence to Victor Millnert.

Additional information

Source code

The source code for the simulation in Section 4 can be found on Github at https://github.com/vmillnert/ICC17simulation.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Millnert, V., Bini, E. & Eker, J. AutoSAC: automatic scaling and admission control of forwarding graphs. Ann. Telecommun. 73, 193–204 (2018). https://doi.org/10.1007/s12243-017-0597-0

Received: 06 April 2017
Accepted: 18 July 2017
Published: 03 August 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s12243-017-0597-0
