Modelling large timescale and small timescale service variability

The performance of service units may depend on various randomly changing environmental effects. It is quite often the case that these effects vary on different timescales. In this paper, we consider small and large scale (short and long term) service variability, where the short term variability affects the instantaneous service speed of the service unit and a modulating background Markov chain characterizes the long term effect. The main modelling challenge in this work is that the considered short and long term variation results in randomness along different axes: short term variability along the time axis and long term variability along the work axis. We present a simulation approach and an explicit analytic formula for the service time distribution in the double transform domain that allows for the efficient computation of service time moments. Finally, we compare the simulation results with analytic ones.


Introduction
Service speed variability is a problem that has been observed in many practical application scenarios. For example, in Kimber and Daly (1986), it has been observed for vehicular traffic. More recently, this problem has been recognized in data centers (Guo et al. 2014). The effect of variability was also studied in Anjum and Perros (2015) with application to video streaming. Most of the previous literature, however, focused only on large-timescale variability, where Markov-modulating models represent the random fluctuations of the environment. This class of models is commonly referred to as reward models, and they have been studied for a long time (Howard 1971).
The variation in the service speed can be modelled by dividing the jobs into "infinitesimal quantities of work to be done" and considering the "speed at which this infinitesimal work is performed", i.e., the random amount of time needed to execute the infinitesimal amount of work. Then, once a model defines how speed changes over time, the complete system can be modelled in a straightforward way, where the amount of work increases gradually during the analysis and the time required to execute the given amount of work is a random process.
If the service process depends on a time-dependent random process, e.g., on a modulating background continuous time Markov chain (CTMC) representing the environmental state, whose "clock" evolves according to the time, then the natural performance analysis is based on the gradually increasing time and the randomly varying time dependent environment state.
However, in many real applications, variability is not easily predictable and works at different timescales. Modulating CTMCs (whose "clock" evolves according to the time) work very well for modelling variability where the parameters of the job execution remain constant for a longer random period of time, and there are few jumps during the execution of one job. Apart from this large scale variability, in this work, we also focus on variability that occurs at much smaller timescales, where the execution speed changes thousands, if not millions, of times during the execution of the main job, and combine it with the more classical modulation that works on a larger timescale.
The remainder of this paper is structured as follows. In Sect. 2 we start by considering only the small timescale variability. In Sect. 3 we additionally introduce the large timescale variability. Section 4 is devoted to the mathematical analysis of the obtained small and large timescale system. The effects of the considered variability are studied in Sect. 5 through numerical examples, and Sect. 6 concludes the paper.

Small timescale variability
In this section, we omit the large timescale variability and instead focus only on small timescale variability. So we assume that the environmental state is unchanged for now.
We introduce a second order fluid model for the short timescale variability: assuming that a job is composed of quantums of size x, each such quantum is served in a random amount of time with distribution N(μx, σ²x) (with μ > 0). Assuming that the service times of the different quantums are independent, the progress of service is modeled by a Brownian motion X(w) with parameters μ and σ². We emphasize that in this model, the Brownian motion corresponds to the time required to service a job as a function of the size of the job (see Fig. 1). A job of size x thus requires a random time T with distribution N(μx, σ²x), whose probability density function is

f_T(t) = (1/√(2πσ²x)) exp(−(t − μx)²/(2σ²x)).

The assumption that T may take negative values does not make sense physically. However, due to μ > 0, for macroscopic job sizes, the probability of T < 0 is negligible, so the proposed mathematical model is a close approximation of the physical system. In the mathematical analysis, T < 0 does not cause any issues, and the performance measures of interest can be calculated accurately. The moments E[N(μx, σ²x)^k] can be expressed as a polynomial in xμ and xσ²,

E[N(μx, σ²x)^k] = Σ_{j=0}^{k} u_{k,j} (xμ)^j (xσ²)^{(k−j)/2},   (1)

where the coefficients u_{k,j} are such that u_{k,j} = 0 if k + j is odd. Note that the Brownian motion itself may take negative values as well, which does not make sense physically; but, since μ > 0, for macroscopic values of w, the probability that X(w) is negative is negligible.
We focus on the service of a job in a queue whose work requirement, W , is generally distributed according to probability density function f W (x).
Using the second order fluid model assumption, the probability density function of the service time of a job, denoted by f_T(t), can be computed as:

f_T(t) = ∫_{x=0}^{∞} f_W(x) (1/√(2πσ²x)) exp(−(t − μx)²/(2σ²x)) dx.   (2)

Moments of the scaled distribution
There are also some interesting relations between the moments of W, the moments of T, and the parameters μ and σ². In particular, the k-th moment of T can be expressed as:

E[T^k] = ∫_{x=0}^{∞} f_W(x) E[N(μx, σ²x)^k] dx.

Since E[N(μx, σ²x)^k] can be expressed as a polynomial in xμ and xσ² with coefficients u_{k,j} according to (1), we can compute the moments of T as:

E[T^k] = Σ_{j=0}^{k} u_{k,j} μ^j (σ²)^{(k−j)/2} E[W^{(k+j)/2}].

Since u_{k,j} is nonzero only when k + j is even, E[W^{(k+j)/2}] is always an integer moment of W. For example, for the first and second moments of T we have:

E[T] = μ E[W],   E[T²] = μ² E[W²] + σ² E[W].
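The moment relations above can be sanity-checked numerically. The sketch below is our own illustration (the parameter values μ = 2, σ = 0.5, ϑ = 1 and the choice of an exponentially distributed W are assumptions for the example, not taken from the paper): it samples T by first drawing W and then drawing a normal service time with mean μW and variance σ²W, and compares the empirical moments with E[T] = μE[W] and E[T²] = μ²E[W²] + σ²E[W].

```python
import math
import random

random.seed(42)

mu, sigma, theta = 2.0, 0.5, 1.0   # illustrative parameters, not from the paper
n = 200_000

t1 = t2 = 0.0
for _ in range(n):
    w = random.expovariate(theta)                    # job size W ~ Exp(theta)
    t = random.gauss(mu * w, sigma * math.sqrt(w))   # T | W=w ~ N(mu*w, sigma^2*w)
    t1 += t
    t2 += t * t
t1 /= n
t2 /= n

ew, ew2 = 1.0 / theta, 2.0 / theta**2                # E[W], E[W^2] for Exp(theta)
print(t1, mu * ew)                                   # empirical vs. mu*E[W]
print(t2, mu**2 * ew2 + sigma**2 * ew)               # empirical vs. mu^2*E[W^2] + sigma^2*E[W]
```

For the chosen parameters the theoretical values are E[T] = 2 and E[T²] = 4·2 + 0.25·1 = 8.25, and the empirical moments should agree up to Monte Carlo noise.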

Combining large and small timescale variability
Large scale variability can be considered using a discrete state continuous time Markov modulating process (MMP), denoted by M(t). We assume the MMP is a continuous time Markov chain (CTMC) on a finite state space with infinitesimal generator matrix Q. In state i, the service is characterised by rate μ i and variance σ 2 i . Only considering large scale variability (that is, assuming σ i ≡ 0, ∀i) would lead to a standard Markov reward model. However, including small-scale variability makes for an interesting and complex model.
Assume that a job of size W = u starts service at time t = 0, with the MMP M(t) in state i. Then the evolution of the service time X(w), 0 ≤ w ≤ u, as a function of the job size is the following:
-Let a_1 denote the time of the first transition of M(t). As long as X(w) is smaller than a_1, X(w) evolves according to a Brownian motion with parameters μ_i and σ²_i [denoted by BM(μ_i, σ²_i)].
-At time a_1, M(t) changes to some state j. Accordingly, assuming that the first passage of X(w) to a_1 occurs at work amount w_1, for w ≥ w_1, X(w) evolves according to a BM(μ_j, σ²_j) (starting from the point w_1 and from level a_1).
-This is repeated for further possible transitions of M(t) at times a_2, a_3, . . ., up to the point u.
Note that in the visualization, the horizontal axis denotes the job size and the vertical axis denotes time, see Fig. 2. Thus the behaviour of X(w) can be described as a type of level-dependent Brownian motion. This model is essentially different from second order Markov-modulated fluid models (also referred to as Markov modulated Brownian motion) (Breuer 2012; Karandikar and Kulkarni 1995). The main difference between the two approaches is that in second order fluid models, it is the amount of work performed per unit of time that is assumed to have normal distribution; in the present paper, it is the amount of time required to perform a unit of work that is assumed to have normal distribution.
Keeping in mind that M(t) is a CTMC, the entire distribution of X(w) is determined by the initial points t = 0 and w = 0 and the initial state of the modulating process M(0) = i. The process X(w) can be simulated as follows:
-If W is random, generate the value of W, denoted by u.
-X (w) runs as a BM(μ i , σ 2 i ) until either the value of X (w) reaches a 1 or w reaches u, whichever occurs first.
-If w reached u first, then the simulation is finished.
-If X(w_1) = a_1 for some w_1 < u, then we generate the next state j and also the next transition time a_2 according to the MMP M(t), then continue X(w) as a Brownian motion with parameters (μ_j, σ²_j) starting from the point (w_1, a_1) until either the value of X(w) reaches a_2 or w reaches u, whichever occurs first.
-We keep generating new transitions and new Brownian motion sections until we reach u. The service time of the job is T = X (u) = X (W ).
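The simulation steps above can be sketched in code. The version below is a minimal quantum-discretised sketch (all function and parameter names are ours; the discretisation in work quanta of size dx follows the approach also used for the numerical experiments later in the paper): within a quantum in state i, the time increment is drawn from N(μ_i dx, σ²_i dx), and the MMP changes state whenever the accumulated time crosses the next transition epoch.

```python
import math
import random

def simulate_service(u, mu, sigma2, rates, next_state, i0=0, dx=1e-3, rng=random):
    """Simulate T = X(u), the time needed to serve a job of size u.
    mu[i], sigma2[i] : BM parameters in MMP state i
    rates[i]         : -q_ii, the rate of leaving state i
    next_state(i)    : samples the next MMP state from state i
    """
    i = i0
    x = 0.0                                 # accumulated time X(w)
    a = rng.expovariate(rates[i])           # first MMP transition epoch (time axis)
    n_steps = max(1, round(u / dx))
    for _ in range(n_steps):
        x += rng.gauss(mu[i] * dx, math.sqrt(sigma2[i] * dx))
        while x >= a:                       # discretised BM crossed the transition epoch
            i = next_state(i)
            a += rng.expovariate(rates[i])  # sojourn in the newly entered state
    return x

random.seed(1)
# two-state MMP that alternates deterministically between states 0 and 1
T = simulate_service(100.0, mu=[2.0, 4.0], sigma2=[0.05, 0.05],
                     rates=[0.01, 0.01], next_state=lambda i: 1 - i)
print(T)
```

As a quick consistency check, setting σ²_i = 0 and equal rates μ_i in both states must give T = μ·u regardless of the MMP, since the modulation then has no effect.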
The main question, similar to Sect. 2, is the distribution of T and the performance measures derived from T. The main contribution of this paper is the analytical evaluation of the distribution of T in the double transform domain. Several related performance measures can then be obtained numerically from this transform domain description.
The analytical problem can be formulated through cumulative-distribution-type functions of the service time (for fixed w),

G_{ij}(x, w) = Pr(X(w) < x, M(X(w)) = j | M(0) = i),

which include information about the initial and final background state of M(t) along with the distribution of the service time. In accordance with the mathematical model, G_{ij}(x, w) is defined for both positive and negative values of x, but G_{ij}(0, w) is typically negligible.
Based on G_{ij}(x, w), the corresponding cumulative distribution function in case of a random W with probability density function f_W(w) is

G_{ij}(x) = ∫_{w=0}^{∞} f_W(w) G_{ij}(x, w) dw.

The next section provides the mathematical analysis of G_{ij}(x, w).

Job completion in small and large timescale variable environment
Let X (w) denote the time needed to service a job of fixed size w. We aim to analyse the entire process {X (w), w ≥ 0}, and, based on that, derive performance measures for X (w) (for a fixed job size w), and also for X (W ), where W is possibly random.
The system operates in a random environment characterized by the MMP M(t), t ≥ 0, which is a Markov chain with generator Q (the variable t denotes the time of the MMP). The process X(w) starts from 0 at w = 0. When the MMP is in state i, the main process X(w) is a Brownian motion with parameters μ_i > 0 and σ_i > 0 (given for each state i). Whenever the MMP makes a transition at time t = a, i.e., when X(w) reaches level a, the MMP switches to a new state k and the main process continues as a Brownian motion with parameters μ_k > 0 and σ_k > 0, starting at level a. The same procedure continues until the job of size w is completed.
The main process X(w) starts from level 0 at w = 0, i.e., X(0) = 0. We are interested in the distribution of X(u), where u is the size of the workload, and introduce the notation g_{ij}(x, w) for the associated density with respect to the level x. We aim to compute

g*_{ij}(v, s) = ∫_{x=−∞}^{∞} e^{−vx} ∫_{w=0}^{∞} e^{−sw} g_{ij}(x, w) dw dx,   (7)

where the transform in x is a double-sided Laplace transform and the transform in w is a single-sided Laplace transform. (7) is convergent when Re(s) > 0 and |v| is small enough (depending on Re(s), that is, |v| < ε(Re(s)) for some positive function ε(·)). Convergence of the inner integral for Re(s) > 0 follows directly from the fact that g_{ij}(x, w) is a probability density function. Convergence in v will be addressed during the proof of Theorem 1, and the function ε(·) is made explicit in (16). We remark that calculating g*_{ij}(v, s) in a region where Re(s) > 0 and |v| is small enough is sufficient for the further calculation of the performance measures of interest.
where Q is the generator of the MMP and I is the identity matrix (Re(s) > 0 and |v| is sufficiently small in all formulas).

Proof Let W_a be the first passage point along the horizontal axis where the BM(μ_i, σ²_i) starting from level 0 reaches level a (a > 0). The PDF of W_a, denoted by f_i(a, w), is given explicitly [using Girsanov's theorem and the mirror principle, see e.g. Theorem 6.9 in Schilling and Partzsch (2012)] as

f_i(a, w) = (a / (σ_i √(2πw³))) exp(−(a − μ_i w)² / (2σ_i² w)).   (11)

When the process starts in state i, two things may happen (cf. Fig. 3): the main process either reaches level a (along the vertical axis) before w (on the horizontal axis), i.e., W_a < w, or it does not. If the main process reaches level a before w, then the MMP switches from state i to another state k at W_a, and the main process continues similarly with parameters (μ_k, σ²_k), albeit starting from level a.
If the main process does not reach level a before completing w amount of work, i.e., W_a > w, then we need the conditional distribution of the level at w given that X(u) < a for all u < w. To obtain it, we introduce a CDF-type notation describing an incomplete distribution concentrated on (−∞, a), which satisfies a two-term decomposition: the first term corresponds to the probability that the BM(μ_i, σ²_i) hits level a before w (W_a < w), while the second term corresponds to the probability that the BM(μ_i, σ²_i) hits the vertical line at w without reaching level a (W_a > w). To calculate the density b_i(x, a, w), we first note that the position of a BM(μ_i, σ²_i) at point w has normal distribution with parameters (μ_i w, σ_i √w), so its probability density function is φ(x; μ_i w, σ_i² w), where φ(x; μ, σ²) denotes the PDF of the normal distribution with parameters μ and σ², i.e.,

φ(x; μ, σ²) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)).

To compute b_i(x, a, w), we need to subtract the density of the paths that hit level a first, which we calculate using the law of total probability according to the first passage time at level a, W_a. Altogether, b_i(x, a, w) can be calculated as

b_i(x, a, w) = φ(x; μ_i w, σ_i² w) − ∫_{u=0}^{w} f_i(a, u) φ(x − a; μ_i (w − u), σ_i² (w − u)) du.

The level at which the MMP changes its state, a, is exponentially distributed with parameter −q_ii. Using that and the probability of moving from state i to state k at a state transition of the MMP, −q_ik/q_ii, we obtain (14), where x⁺ = max(0, x) and δ_ij denotes the Kronecker delta. Remarks:
-The first term in (14) is the probability that the main process completes the work amount w before hitting level a, averaged out according to the distribution of a.
-In the second term, the MMP switches to state k at some u < w, with the main process at level X(u) = a.
-Even though the general idea is that the main process is increasing, the second order approach means that in the short term the main process may decrease as well. Hence negative values of x must be taken into account.
Formula (14) is consistent with the possibility that the process X(w) may decrease, and it is valid for negative values of x as well.
The second integral in (14) is essentially a convolution in both variables x and w; to simplify it, we take the Laplace transform in the variable w, and the double-sided Laplace transform in the variable x.
We now take the Laplace transform with respect to w of all of the important functions in (14). Denoting the transform variable by s, the Laplace transform of f_i(a, w) in (11) is explicit:

f*_i(a, s) = ∫_{w=0}^{∞} e^{−sw} f_i(a, w) dw = e^{−a z_i(s)},

where z_i(s) is given in (9). Similarly, we have an explicit formula for the Laplace transform of φ. Then, from (13) and (14), we obtain (15). To transform the level variable x as well using the two-sided Laplace transform, we use the functions φ−*(v, s, μ, σ) and φ+*(v, s, μ, σ) as defined in (10), where convergence in either integral holds when the real part of the associated denominator is positive. For Re(s) > 0 we have μ < Re√(μ² + 2sσ²), from which the denominators are positive when (16) holds; from this we obtain the convergence of (7) and (8). To compute g*_{ij}(v, s) = ∫_{x=−∞}^{∞} e^{−vx} g*_{ij}(x, s) dx, we start by investigating the transform of the first term, a*_i(x, s), on the right hand side of (15), which yields (17). The first term on the right hand side of (17) gives (18), and the second term gives (19). Altogether, expressing (17) as the sum of (18) and (19), we get (20). Now we focus on the second term on the right hand side of (15).
from which we get (22). Finally, (22) can be written in matrix form, whose solution is (8).
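The first-passage density f_i(a, w) used in the proof is, for a Brownian motion with positive drift, the inverse Gaussian density, whose mean is a/μ_i (a standard fact for the first passage of a BM with drift μ_i > 0). A quick Monte Carlo sketch (our own illustration with arbitrarily chosen parameters, not part of the paper) checks this mean against a discretised simulation:

```python
import math
import random

random.seed(7)

def first_passage(a, mu, sigma, dw=2e-3, rng=random):
    """Work amount at which a discretised BM(mu, sigma^2) started at 0 first reaches level a."""
    x, w = 0.0, 0.0
    sd = sigma * math.sqrt(dw)
    while x < a:
        x += rng.gauss(mu * dw, sd)   # time increment over one work quantum
        w += dw
    return w

a, mu, sigma = 1.0, 1.0, 0.5          # illustrative parameters
n = 5_000
mean_w = sum(first_passage(a, mu, sigma) for _ in range(n)) / n
print(mean_w, a / mu)                 # empirical vs. theoretical mean a/mu
```

The discretisation introduces a small upward bias (crossings between grid points are detected late), which shrinks as dw is refined, mirroring the behaviour of the simulator discussed in Sect. 5.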
Theorem 1 provides an explicit expression (involving a matrix inversion) for the double transform domain description of the service time distribution. For the s → w one-sided inverse Laplace transformation, we applied different approaches depending on the distribution of the work requirement W. To simplify calculations, we avoided the v → x two-sided inverse Laplace transformation and instead calculated the moments of X(w) explicitly, which can be obtained by differentiating the transform with respect to v at v = 0 and applying L−1 s→w, where L−1 s→w denotes inverse Laplace transformation from parameter s to parameter w. To compute these moments, it is important to note that the eigenvalues of Z(s) − Q are all positive, which guarantees invertibility and allows a symbolic derivative according to v also for the matrix inverse, e.g., to compute the mean of X(w). The explicit formulas for ∂^k/∂v^k A*(v, s)|_{v=0} with higher k values are omitted here, but symbolic mathematical packages can compute them easily. For a deterministic work requirement (W = w), we applied the numerical inverse Laplace transformation method from Horváth et al. (2018) with order 24. This numerical inverse Laplace transform procedure evaluates the Laplace transform function only at points with a positive real part.
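The specific inversion method of Horváth et al. (2018) is not reproduced here. As a generic illustration of numerical inverse Laplace transformation, a minimal sketch of the classical fixed-Talbot scheme of Abate and Valkó is shown below (our own sketch; unlike the paper's chosen method, the Talbot contour also visits points with negative real part):

```python
import cmath
import math

def talbot_inverse(F, t, M=32):
    """Fixed-Talbot numerical inverse Laplace transform (Abate-Valko).
    F : the Laplace transform, callable on complex s
    t : time point (t > 0)
    """
    r = 2.0 * M / (5.0 * t)
    # theta = 0 endpoint contributes half weight
    total = 0.5 * (F(complex(r, 0.0)) * cmath.exp(r * t)).real
    for k in range(1, M):
        theta = k * math.pi / M
        cot = math.cos(theta) / math.sin(theta)
        s = r * theta * complex(cot, 1.0)             # Talbot contour point
        sigma = theta + (theta * cot - 1.0) * cot     # derivative factor
        total += (cmath.exp(t * s) * F(s) * complex(1.0, sigma)).real
    return (r / M) * total

# sanity check: F(s) = 1/(s+1) is the transform of exp(-t)
val = talbot_inverse(lambda s: 1.0 / (s + 1.0), 1.0)
print(val)
```

For smooth transforms such as the one above, the fixed-Talbot scheme in double precision recovers the time-domain value to many digits with a moderate number of contour points.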
For an exponentially distributed work requirement (W exponentially distributed with rate ϑ), we use an explicit inversion formula based on the observation that, for f_W(w) = ϑe^{−ϑw}, the integration with respect to the job size distribution amounts to evaluating the w → s Laplace transform at s = ϑ. Consequently, the kth moment of the service time for an exponentially distributed work requirement W with rate ϑ can be computed explicitly.

Numerical examples

Simulation results
To study the effects of variability, we have applied the procedure outlined in Sect. 3 to simulate the behaviour of the queue with short and long scale variability. In particular, to find the intersection between the Brownian motion and the level determined by the time at which the modulating process changes state, we have discretised the work with a quantum x, and during the period when the MMP stays in state i, for each quantum we have set the evolution of the time according to a normal distribution N (μ i x, σ 2 i x) (following the procedure outlined at the beginning of Sect. 2). The MMP leaves state i at the first time instant in which the discretised BM crosses the level T n , where T n is the time of the nth state transition of the MMP. When the nth state transition occurs in state i, then T n = T n−1 + τ i , where T n−1 is the time of the previous state transition and τ i is exponentially distributed with parameter −q ii (the ith diagonal element of the generator matrix Q of the MMP). This simulation approach is indeed an approximation, but it can be made arbitrarily precise by choosing appropriately small values of x (at the cost of simulation time). Simulations were run for several choices of x to examine the error of this approximation.
In our numerical experiment, we have considered a two-state modulating process with jump rates q_12 and q_21, and studied the effects of different service speed and variability parameters μ_i and σ_i (i = 1, 2). Apart from computing performance measures related to the service time distribution, we also included simulation results for the response time in an M/G/1 queue, where jobs arrive according to a Poisson process of rate λ and are served by a single server subject to short and long term variability according to a first-come-first-served discipline. Job sizes may be either deterministic or random. λ is set so that the queue is stable.
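The M/G/1 response-time experiments can be reproduced with the Lindley recursion, plugging in service times drawn from any service model. The sketch below is our own (function names and parameter values are assumptions for the example); it validates the recursion with a deterministic service sampler against the Pollaczek-Khinchine mean waiting time for the M/D/1 queue.

```python
import random

random.seed(3)

def mg1_waits(lam, sample_service, n_jobs, burn_in=1000, rng=random):
    """FCFS M/G/1 waiting times via the Lindley recursion:
    W_{k+1} = max(0, W_k + S_k - A_{k+1}), interarrivals A ~ Exp(lam)."""
    w, waits = 0.0, []
    for k in range(n_jobs):
        if k >= burn_in:                       # discard the initial transient
            waits.append(w)
        s = sample_service()                   # plug-in service time sampler
        w = max(0.0, w + s - rng.expovariate(lam))
    return waits

# validation: lam = 0.5, S = 1 (M/D/1, rho = 0.5)
# Pollaczek-Khinchine: mean wait = lam * E[S^2] / (2 * (1 - rho)) = 0.5
waits = mg1_waits(0.5, lambda: 1.0, 50_000)
mean_wait = sum(waits) / len(waits)
print(mean_wait)
```

The response time of a job is its waiting time plus its service time, so the same recursion directly yields response-time samples once `sample_service` draws from the variable service model of Sect. 3.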
For the first batch of simulations, we examine the effect of short and long term variability of the server by changing μ and σ while leaving the other parameters fixed. We compare the following cases:
-Base: no variability, μ_1 = μ_2 = 2.4848 and σ_1 = σ_2 = 0.
(For this set of simulation results, the discretization interval x is set to 0.05 ms, so that on average the BM for each job requires 2000 samples. Each simulation considers the execution of N = 10,000 jobs.) Figure 4a shows the service time distribution for the different server variability configurations. For the Base case, as expected, all the probability mass is concentrated at a single point. The jumps of the Large curve in Fig. 4a at 200 ms and 400 ms are associated with the cases when the MMP stays in state 1 (respectively, state 2) for the whole period of the service. The cases when the MMP experiences a state transition during the service are represented by the continuously increasing part of the Large curve. The case that combines both small and large scale variability (Small + Large) further smooths the curves, and the effect is most evident near the two probability masses at 200 ms and 400 ms. Figure 4b shows the response time distribution of the corresponding queuing models. It is interesting to see that in the cases where small variability is considered, there are no jumps, due to its perturbation effect.
The second batch of simulations focuses on the MMP by changing the speed of the background MMP. Figure 5 shows simulation results for the second batch of simulations. When the sojourn times are very large, the service time distribution tends to concentrate the probability mass near the times required in either state (200 ms and 400 ms). On the other hand, when the MMP changes rapidly, the distribution tends to concentrate on the average case, producing results very similar to those seen in Fig. 4 for the cases with small variability only: in this case, there is almost no difference between large scale and small scale variability, because the quick alternation of the modulating process eliminates the large scale effect. As a final remark, to consider the case with sojourn times 1.25 ms and 0.8 ms, the sampling time was reduced to x = 0.01 to allow a sufficient number of samples during the sojourn in a modulating state. The corresponding response times are shown in Fig. 5c.
For the next batch of simulations, we examine the following job size distributions for W:
-Deterministic: W = 100 ms,
-Exponential with mean 100 ms,
-Erlang with 4 stages with mean 100 ms,
-Hyper-exponential with rate parameters λ_1 = 1/(100(1 + √0.6)) and λ_2 = 1/(100(1 − √0.6)),
-Pareto with probability density function f(x) = (5/4) · 20^{5/4} · x^{−9/4} for x > 20 and 0 for x ≤ 20.
(To make the results easily comparable, E[W] = 100 ms is identical in each case.) In particular, Fig. 6a shows the service time distribution for each job size distribution. The effect of service variability is more evident on job length distributions with a lower coefficient of variation. Figure 6b shows the effect on response time: indeed, combining the effect of service variability with a heavy tailed distribution, as in the Pareto case, can create very long queues, which can lead to extremely long response times.

Comparison of analytical and simulation results
For the last batch of simulations, we compare empirical moments from the simulation to moments calculated using the double transform method of Sect. 4.
The system parameters are the same as in (25). Two different job size distributions are examined: deterministic and exponential. To assess the inaccuracy of the simulation with finite discretization steps, we ran each simulation with two different choices of x: x = 0.05 ms and x = 0.005 ms. Table 1 presents the moments of the service time distribution obtained from the simulator and from the transform domain description; x = 0.05 ms corresponds to sim. 1 and x = 0.005 ms corresponds to sim. 2.
From Table 1, we observe increasing relative error for higher moments. For the mean, the relative error is around or smaller than 2%. The relative error of the mean decreases as x is refined from x = 0.05 ms (sim. 1) to x = 0.005 ms (sim. 2). We note that for the exponential case, the service time moments were calculated using an analytic formula, while for deterministic job size, some inaccuracy might also come from the numerical inverse Laplace transformation method.

Conclusions
In this work, we have introduced a queue with a service model in which a modulating background Markov process models the large timescale variability, and a second-order fluid process models the service capacity at small timescales. The resulting service model can be interpreted as a certain type of level-dependent Brownian motion.