Priority queueing system with many types of requests and restricted processor sharing

A priority queueing model with many types of requests and restricted processor sharing is considered. A novel discipline of request admission and service is proposed. This discipline restricts both the bandwidth (capacity) of the server and the number of requests that can receive service in the system at the same time. It is a realistic hybrid of the traditional service discipline of a multi-server system and the limited processor sharing discipline. Requests of the highest priority can push low-priority requests out of service. Therefore, an important problem is matching the number of requests that can receive service at the same time to the bandwidth of the server. This problem is solved via construction and analysis of a multi-dimensional Markov chain describing the operation of the system under any fixed set of the system parameters.


Introduction
Queueing theory is a widely acknowledged mathematical tool for optimally distributing a restricted resource among competing user requests. The simplest models assume that customers are served sequentially, one by one. More general models allow some kind of resource sharing and the simultaneous service of several requests. The two most popular disciplines for managing the simultaneous service of several requests are as follows: A-The resource is divided into several parts (called servers) and each request receiving service uses the server assigned to it. The service times of requests are independent. We call a system with such a discipline a multi-server system; B-The resource is jointly used by all requests and the service rate is inversely proportional to the number of requests receiving service. This discipline is called processor sharing (PS). For surveys of the research related to this discipline and some of its generalizations, see (Yashkov 1987; Yashkov and Yashkova 2007; Altman et al. 2006).
The overwhelming majority of the existing research, starting from the pioneering works by A.K. Erlang, assumes discipline A. Queueing models of the GI∕PH∕N∕K and BMAP∕PH∕N∕K types (in D.G. Kendall's notation) with infinite or finite buffers, losses, retrials, and their particular cases have been investigated quite thoroughly, especially when the service time has an exponential distribution. Models with an arbitrary (G) distribution of the service time are investigated only approximately or asymptotically. In particular, certain bounds are obtained for some performance characteristics.
An advantage of discipline A is the relative ease of its practical realization. E.g., in a call center, several operators can provide service to users using separate workstations and communication channels. In information transmission systems, the physical resource, e.g., the bandwidth of the channels, can be divided (using various technical schemes like frequency division, time division, code division multiplexing, etc.) into logical channels (servers), each of which is assigned to the service of a separate request. The evident disadvantage of discipline A is possible under-utilization of the resource. Situations occur when only a few logical channels are busy, while the rest stay idle.
The PS discipline is free from this disadvantage. The resource (which we further call the bandwidth) is always fully used if there are requests for service. However, besides a more difficult technical implementation related to the necessity of dynamic redistribution of the bandwidth, the PS discipline has two essential disadvantages in many concrete applications: (i) situations are possible when the requests currently present in the system do not need the whole bandwidth in total. E.g., if five users each need to transmit their information at a rate of 10 megabits per second (Mbps), they do not need to fully share the available bandwidth of a 100 Mbps channel. They will use only 50 Mbps in total; (ii) a user may have some minimal requirement for the bandwidth assigned to him/her. E.g., users may require a bandwidth of 12 Mbps for video-on-demand transmission of HD content in an MPEG2 transport stream and do not agree to use a smaller bandwidth due to the poor quality of service. Therefore, the number of simultaneously serviced users in a 100 Mbps channel must be less than 9. Thus, the pure PS discipline, which assumes that all arriving requests are accepted for service, is not applicable for modeling the considered transmission process.
As a tool to overcome disadvantage (ii), the discipline of limited PS (LPS) was proposed in the literature. This discipline assumes that the number of users that simultaneously use the bandwidth is limited by some finite number, say N. This number is sometimes called a multiprogramming level, see, e.g., (Nair et al. 2010), or a concurrency limit, see, e.g., (Gupta and Zhang 2022). For relevant references, see also, e.g., (Alencar et al. 2021; Telek and Van Houdt 2018; Samouylov et al. 2016; Dudin et al. 2017; Masuyama and Takine 2003; Dudin et al. 2021; Ghosh and Banik 2017; Bocharov et al. 2007; Brugno et al. 2017, 2018; D'Arienzo et al. 2020; Kim et al. 2019).
Contributions of our paper consist of the following.
• We propose a new hybrid discipline called restricted processor sharing. This discipline combines the positive features of disciplines A and B. As in both the A and LPS disciplines, we suppose that the maximum number of requests that can receive service at the same time is an integer N, N < ∞. In discipline A, N corresponds to the number of servers. In the LPS discipline, N corresponds to the concurrency limit. In this paper, we assume that an arriving request that meets N requests in service is lost. The variants in which such a request is queued in an infinite or finite buffer or makes retrials are left for future research based on the analysis of the system with loss of requests presented below. The derived expressions for the blocks of the generator of the multidimensional Markov chain describing the behavior of the system can be used as building blocks for deriving the generator of the Markov chain that describes the dynamics of the system with buffers and retrials. The hybrid discipline assumes that a required amount of work (amount of information) and a required (nominal) rate of service are associated with each request. The total rate of service of all requests staying in the system is restricted by the parameter B called the bandwidth of the server. If the sum of the nominal service rates of all requests staying in the system does not exceed the bandwidth, all requests receive service independently of each other at the nominal rate, as in discipline A. If this sum exceeds the bandwidth, all requests receive service at a proportionally reduced rate, as in the LPS discipline.
• We suggest that the requests are heterogeneous with respect to their importance, the required bandwidth, and the nominal service rate. There is a finite number M of types of requests. Different types of requests have different priorities. One of the types of requests has a preemptive priority over requests of the other types. The arrival of such a request when the number of requests obtaining service is equal to N implies the loss of one of the requests having the lowest priority among those present in the system, if any. To reduce the probability of interruption of service of low-priority requests, they are not accepted to the system when the number of requests receiving service is less than N but exceeds a certain threshold value. The considered model can have a wide field of applications. The particular case with only two types of requests is well suited for modelling a cognitive radio system. Type-1 requests are sent by the primary (licensed) users. Type-2 requests are sent by the cognitive (secondary) users. It is worth noting that the models of cognitive radio systems existing in the literature, see, e.g., (El-Toukhy and Arslan 2019; Goel and Kulshrestha 2022; Sun et al. 2014a, b; Lee et al. 2022), are special cases of our model with M = 2 and without the possibility of service of requests at a reduced rate.
• While the overwhelming majority of more or less relevant papers assume that the flows of requests are stationary Poisson arrival processes, here we assume that the arriving heterogeneous flow is described by the Marked Markovian Arrival Process (MMAP), see, e.g., (He 1996). This makes it possible to adequately account for the bursty nature (high variability and dependence of consecutive inter-arrival times) that is an inherent feature of information flows in various modern telecommunication networks, contact centers, etc., see, e.g., (Chen et al. 2022), where information about traces of real flows is presented. It is worth noting that the use of the stationary Poisson arrival process as a model of a real-life process usually yields overly optimistic estimates of the system performance indicators.
The remainder of the paper is organized as follows. In Sect. 2, the considered mathematical model is described. The Markovian process describing the behavior of the model under study is defined and analysed in Sect. 3. Expressions for computation of the basic performance indicators of the system are given in Sect. 4. In Sect. 5, a numerical example is presented. Section 6 contains a brief conclusion of the paper.

Mathematical model
We consider a queueing system with a restricted processor sharing discipline. The scheme of the system is shown in Fig. 1. Requests arriving to the system are divided into M types. The arrival of requests is described by the MMAP, see (He 1996). Arrivals can occur only at the epochs of transitions of the underlying Markov process denoted by $\nu_t$, $t \ge 0$. The mean intensity $\lambda_m$, $m = \overline{1,M}$, of arrival of requests of type m is given by $\lambda_m = \boldsymbol{\theta} D_m \mathbf{e}$, where $\boldsymbol{\theta}$ is a row vector of invariant probabilities of the process $\nu_t$. This vector is defined as the unique solution of the system of linear algebraic equations $\boldsymbol{\theta} D = \mathbf{0}$, where $D$ is the generator of the process $\nu_t$, with the normalization condition $\boldsymbol{\theta} \mathbf{e} = 1$. Here $\mathbf{e}$ is a column vector of a proper size consisting of 1s, and $\mathbf{0}$ is a row vector consisting of 0s. The notation $m = \overline{1,M}$ means that the parameter m admits the values 1, … , M.

The total intensity of requests is defined as $\lambda = \sum_{m=1}^{M} \lambda_m$.
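The computation of the invariant vector and of the arrival intensities can be sketched numerically. The block below is a minimal illustration under stated assumptions: the two-phase MMAP with two request types, the names `D0`, `D_marked`, `invariant_vector`, and all numbers are hypothetical and are not the matrices used in this paper. It solves the invariant equations together with the normalization condition and then evaluates the per-type and total intensities.

```python
import numpy as np

# Hypothetical MMAP with 2 phases and 2 request types (illustration only).
# D0 holds transition rates without arrivals,
# D_marked[m - 1] holds transition rates accompanied by a type-m arrival.
D0 = np.array([[-5.0, 1.0],
               [0.5, -2.0]])
D_marked = [
    np.array([[2.0, 0.5],
              [0.2, 0.3]]),  # type 1
    np.array([[1.0, 0.5],
              [0.4, 0.6]]),  # type 2
]


def invariant_vector(generator):
    """Solve theta @ generator = 0 together with theta @ e = 1."""
    w = generator.shape[0]
    # Replace one (redundant) balance equation by the normalization condition.
    a = np.vstack([generator.T[:-1], np.ones(w)])
    b = np.zeros(w)
    b[-1] = 1.0
    return np.linalg.solve(a, b)


D = D0 + sum(D_marked)      # generator of the underlying process
theta = invariant_vector(D)
e = np.ones(D.shape[0])
lam_m = [float(theta @ Dm @ e) for Dm in D_marked]  # intensity of each type
lam = sum(lam_m)                                    # total intensity
```

A useful sanity check for any such set of matrices is that the total intensity also equals `-theta @ D0 @ e`, because the rows of the generator sum to zero.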
We interpret the service of requests as the transfer of a certain amount of information. The bandwidth of the server, defined as the maximum number of megabits that can be transmitted per unit of time, is denoted by B. We assume that the bandwidth of the server is used by all requests. The maximum possible number of simultaneously served requests is limited by the parameter N. It is assumed that the amount of information to be transmitted to serve a single request of type m has an exponential distribution with rate $\gamma_m$, $m = \overline{1,M}$. The value $\gamma_m^{-1}$ represents the average data volume of a request of type m, $m = \overline{1,M}$. We assume that requests of different types require different service intensities. Denote by $\hat{\beta}_m$ the bitrate desired for requests of type m (the nominal bitrate). Therefore, the mean nominal service time of a request of type m is $(\hat{\beta}_m \gamma_m)^{-1}$. Accordingly, the nominal service intensity $\mu_m$ of a request of type m is calculated as $\mu_m = \hat{\beta}_m \gamma_m$, $m = \overline{1,M}$. The desired bitrate (nominal service intensity) is provided to any request when there is no shortage of bandwidth of the server, i.e., when the sum of the desired bitrates of all requests that receive service does not exceed the bandwidth of the server. Otherwise, the bitrates provided to all requests are correspondingly reduced.
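The rate-sharing rule described above can be sketched as a small function. This is a minimal illustration, assuming that "correspondingly reduced" means scaling every rate by the ratio of the bandwidth to the total demanded bitrate, as in proportional sharing; the function name `actual_service_rates` and the chosen counts and bandwidths are hypothetical, while the bitrates and volume rates are those of the numerical example given later in the paper.

```python
def actual_service_rates(counts, bitrates, volume_rates, bandwidth):
    """Per-type service intensities under restricted processor sharing.

    counts[m]       number of type-(m+1) requests in service
    bitrates[m]     nominal bitrate of such a request (Mbps)
    volume_rates[m] rate of the exponential data-volume distribution (1/Mb)
    bandwidth       server bandwidth B (Mbps)
    """
    demand = sum(s * b for s, b in zip(counts, bitrates))
    # Nominal intensity is the nominal bitrate times the volume rate.
    nominal = [b * g for b, g in zip(bitrates, volume_rates)]
    if demand <= bandwidth:
        return nominal
    # Assumption: every request is slowed down by the same factor.
    factor = bandwidth / demand
    return [mu * factor for mu in nominal]


bitrates = (20, 15, 10)
volume_rates = (0.025, 1 / 75, 0.01)
counts = (2, 2, 3)  # hypothetical state: total demand is 100 Mbps

rates_enough = actual_service_rates(counts, bitrates, volume_rates, 100)
rates_shortage = actual_service_rates(counts, bitrates, volume_rates, 50)
```

With a bandwidth of 100 Mbps the demand is met and the nominal intensities are returned; with 50 Mbps every intensity is halved.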
We assume that requests have different priorities. Requests of the first type have the highest priority, ..., requests of type M have the lowest priority. This means the following. First of all, we assume that requests of type m, $m = \overline{2,M}$, are not accepted into the system if the number of requests already in service is equal to or exceeds the parameter $N_1$. This means that $N - N_1$ places are reserved specifically for servicing requests of the first type. If a type-1 request arrives when the number of requests receiving service is equal to N, or a request of type m, $m = \overline{2,M}$, arrives when the number of requests receiving service is not less than $N_1$, and there are requests with a lower priority in service, then the arriving request displaces one of the serviced requests with the lowest priority and starts service. The displaced request is lost.
Let $n_t$, $n_t = \overline{0,N}$, be the number of requests in service, and $s_t^{(m)}$ be the number of requests of type m receiving service at moment t, such that $s_t^{(m)} = \overline{0,n_t}$ and $\sum_{m=1}^{M} s_t^{(m)} = n_t$. Because the bandwidth sharing discipline is applied, the actual service intensity of a request is equal to its nominal service rate only if the bandwidth used at time t, which is defined as $\sum_{k=1}^{M} s_t^{(k)} \hat{\beta}_k$, does not exceed the bandwidth B of the server. Otherwise, the service rate of a type-m request is cut and equals $\mu_m B \big/ \sum_{k=1}^{M} s_t^{(k)} \hat{\beta}_k$. It is obvious that the (M + 2)-dimensional random process $\xi_t = \{n_t, \nu_t, s_t^{(1)}, \dots, s_t^{(M)}\}$, $t \ge 0$, completely describes the behavior of the considered queueing system and is a regular continuous-time Markov chain.
Since this Markov chain is irreducible and has a finite state space, the limits $\pi(n, \nu, s^{(1)}, \dots, s^{(M)}) = \lim_{t \to \infty} P\{n_t = n, \nu_t = \nu, s_t^{(1)} = s^{(1)}, \dots, s_t^{(M)} = s^{(M)}\}$ exist for any values of the system parameters. They are called the stationary probabilities of the system states or steady-state probabilities.
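The admission and displacement rules can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the authors' implementation: the function name `admit`, the outcome labels, and the example values N = 3, N_1 = 2 are hypothetical, and it is assumed that N_1 does not exceed N. The state is the tuple of per-type counts of requests in service, with type 1 stored first.

```python
def admit(state, m, N, N1):
    """Apply the arrival of a type-m request (m is 1-based) to a state.

    Returns (new_state, outcome), where outcome is one of
    "accepted", "displaced" (a lower-priority request was pushed out),
    or "lost" (the arriving request itself is lost).
    """
    n = sum(state)
    limit = N if m == 1 else N1   # type 1 may use all N places
    counts = list(state)
    if n < limit:
        counts[m - 1] += 1
        return tuple(counts), "accepted"
    # No free place for this type: look for a lower-priority request.
    lower = [k for k in range(m, len(counts)) if counts[k] > 0]
    if not lower:
        return tuple(counts), "lost"
    victim = max(lower)           # largest index means lowest priority
    counts[victim] -= 1
    counts[m - 1] += 1
    return tuple(counts), "displaced"


example_accept = admit((0, 0, 1), 3, N=3, N1=2)
example_displace = admit((1, 1, 1), 1, N=3, N1=2)
example_lost = admit((1, 1, 0), 2, N=3, N1=2)
```

Note that a displacement leaves the total number of requests in service unchanged, which is why such events correspond to transitions inside a level of the Markov chain.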
To simplify the analysis of the multi-dimensional Markov chain, it is useful to combine the set of states of the process $\xi_t$ having the value n of the component $n_t$ into the so-called level n, $n = \overline{0,N}$. For certainty, we number the states that belong to the level n in the lexicographic order of the component $\nu_t$ and the reverse lexicographic order of the components $s_t^{(1)}, \dots, s_t^{(M)}$. In accordance with this enumeration, we combine the stationary probabilities of the states that belong to the level n into the row vectors $\boldsymbol{\pi}_n$, $n = \overline{0,N}$. These vectors satisfy the system of linear algebraic equations (balance equations) $(\boldsymbol{\pi}_0, \boldsymbol{\pi}_1, \dots, \boldsymbol{\pi}_N) A = \mathbf{0}$, where A is the infinitesimal generator of the Markov chain $\xi_t$, and the normalization condition $\sum_{n=0}^{N} \boldsymbol{\pi}_n \mathbf{e} = 1$.
To solve this system, it is necessary to obtain the generator A. The most difficult part of this task is to describe the transition intensities of the components of the M-dimensional process $\zeta_t = \{s_t^{(1)}, \dots, s_t^{(M)}\}$, which determines the current number of requests of each type in the system. To compute these intensities, we first need to formally define the process of service of a request when the system is not overcrowded and the request permanently receives the nominal required service rate. Analyzing various scenarios, one can verify that the service time of such a request has the so-called generalized phase-type (GPH) distribution, see (Dudin et al. 2016). Such a distribution is a generalization of the phase-type distribution, well known in the literature (see (Neuts 1981)), to the case of service of heterogeneous requests. The basic idea of the GPH distribution is to avoid monitoring the type of each request during its service. This is achieved by using different probability vectors for choosing the initial state of the underlying process of service of requests of different types and a common sub-generator for describing the transitions of the underlying process of service within its state space.
For more details about the GPH distribution and examples of its applications, see (Dudin et al. 2016).
We define the underlying process $s_t$, $t \ge 0$, of service of an arbitrary request as the following continuous-time Markov chain. The state space of this chain is the set of integers {1, … , M}. The initial state of the chain $s_t$ at the epoch of the beginning of a request's service is randomly chosen with the probabilities defined as the components of the probability vector $\boldsymbol{\tau}_m = (0, \dots, 0, 1, 0, \dots, 0)$, whose single non-zero entry is in position m, if the request is of type m. The rates of transition of the Markov chain $s_t$ to the absorbing state are determined by the column vector $-S\mathbf{e}$, where the sub-generator S is defined by the formula $S = -\mathrm{diag}\{\mu_m, m = \overline{1,M}\}$, and $\mathrm{diag}\{\dots\}$ denotes a diagonal matrix having the diagonal elements specified in the braces.
Having defined the service time distribution of a single request, we can describe the intensity of transitions of the multidimensional process $\zeta_t = \{s_t^{(1)}, \dots, s_t^{(M)}\}$. For this purpose, we extend the approach going back to the paper (Ramaswami and Lucantoni 1985). We use the following notation. Conditional on all n requests staying in the system receiving service at the nominal (not reduced) rate, let $E_m^{(n)}$, $m = \overline{2,M}$ if $n = \overline{N_1,N}$, and m = 1 if n = N, be the square matrices of size $T_n$ whose elements determine the transition probabilities of the process $\zeta_t$ at the epochs when a type-m request, $m = \overline{2,M}$, arrives while n, $n = \overline{N_1,N}$, requests receive service, or a type-1 request arrives while N requests are in service, and the arriving request tries to displace a request with a lower priority from service. Only one element in each row of the matrix $E_m^{(n)}$ is different from zero, and it equals 1. To define which entry is equal to 1, we note that each row and column of the matrix $E_m^{(n)}$ corresponds to a certain state $\{s_1, s_2, \dots, s_M\}$ of the process $\zeta_t$, $t \ge 0$.
Recall that all states of the process $\zeta_t$, $t \ge 0$, are numbered in the reverse lexicographic order of the entries $\{s_1, s_2, \dots, s_M\}$. When a displacement occurs, the displaced request is of the type $m^*$ that has the lowest priority among the requests in service: an arriving type-m request displaces any type-$m^*$ request, which leaves the system (is lost). A more detailed description of these matrices and the algorithms elaborated to calculate them are presented, for example, in (Kim et al. 2013) and (Kim et al. 2021).
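The enumeration of the states within a level can be illustrated by a brief sketch. Two assumptions are made here and are not confirmed by the text: that "reverse lexicographic order" means descending lexicographic order of the tuples, and that the size T_n equals the number of ways to distribute n requests among M types, that is, the binomial coefficient C(n + M - 1, M - 1). The function name `level_states` is hypothetical.

```python
from itertools import product
from math import comb


def level_states(n, M):
    """All tuples (s_1, ..., s_M) of non-negative integers with sum n,
    listed in descending lexicographic order (assumed reading of
    'reverse lexicographic order')."""
    states = [s for s in product(range(n + 1), repeat=M) if sum(s) == n]
    return sorted(states, reverse=True)


states_2_3 = level_states(2, 3)
size_2_3 = len(states_2_3)           # equals comb(2 + 3 - 1, 3 - 1)
size_50_3 = comb(50 + 3 - 1, 3 - 1)  # level n = 50 with M = 3 types
```

The position of a tuple in this list gives the row or column index of the corresponding state in the matrices of size T_n. Under these assumptions, the largest level of a system with M = 3 and N = 50 contains 1326 such tuples for each state of the arrival process.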
To take into account the reduced service rate received when the total bandwidth required by all requests present in the system is greater than the bandwidth B of the server, we need more notation, namely:
• $\boldsymbol{\phi}_n$, $n = \overline{1,N}$, are the column vectors of dimension $T_n$, whose elements $(\boldsymbol{\phi}_n)_i$ account for the reduction of the service rate in the i-th state of the level n;
• $\mathrm{diag}\{\boldsymbol{\phi}_n\}$ is a diagonal matrix with the diagonal elements given by the entries of the vector $\boldsymbol{\phi}_n$.
Now we are prepared to present the generator A. Since requests enter the system and depart only one at a time, it is clear that the matrix A has the block-tridiagonal structure:
$$A = \begin{pmatrix} A_{0,0} & A_{0,1} & O & \cdots & O & O \\ A_{1,0} & A_{1,1} & A_{1,2} & \cdots & O & O \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ O & O & O & \cdots & A_{N,N-1} & A_{N,N} \end{pmatrix}.$$
The diagonal elements of the diagonal blocks $A_{n,n}$, $n = \overline{0,N}$, are negative, and their absolute values determine the intensity of departure of the Markov chain from the corresponding states. The non-diagonal elements of these blocks are non-negative and determine the transition intensities of the Markov chain inside the level n. The elements of the matrices $A_{n,n-1}$, $n = \overline{1,N}$, and $A_{n,n+1}$, $n = \overline{0,N-1}$, are non-negative and determine the transition rates of the Markov chain from the level n to the levels n − 1 and n + 1, respectively.

Theorem 1
The explicit form of the blocks $A_{n,n'}$, $n, n' = \overline{0,N}$, $\max\{n - 1, 0\} \le n' \le n + 1$, is as follows: where $I_W$ is an identity matrix of size W, and ⊗ and ⊕ denote the Kronecker product and the Kronecker sum of matrices, see, for example, (Graham 2018).

The proof of Theorem 1 is carried out by analyzing the transitions of the Markov chain during an infinitesimal interval and is omitted here. Note that the use of the vectors of dimension $T_n$ introduced before the theorem makes it possible to take into account the decrease of the service rate in the case of a shortage of bandwidth.
Chains with a block-tridiagonal structure of the generator are called Level Dependent Quasi-Birth-and-Death processes in the literature. The size of system (1) can be large. To solve such systems, it is recommended to exploit the sparse structure of the generator A. E.g., the algorithm from (Baumann and Sandmann 2010) can be used.
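The overall solution step can be sketched on a toy example. The block below is a minimal dense illustration, not the algorithm of (Baumann and Sandmann 2010) and not the generator of this paper: it assembles a block-tridiagonal generator from lists of blocks and solves the balance equations with the normalization condition directly. The function names and the toy data (an Erlang loss system with two servers, unit arrival rate and unit service rate, so that every block is 1 × 1) are assumptions made for illustration.

```python
import numpy as np


def assemble_generator(diag, up, down):
    """Build a block-tridiagonal generator.

    diag[n] is the block of level n, up[n] leads from level n to n + 1,
    down[n] leads from level n + 1 to n.
    """
    sizes = [b.shape[0] for b in diag]
    offsets = np.concatenate([[0], np.cumsum(sizes)])
    A = np.zeros((offsets[-1], offsets[-1]))
    for n, block in enumerate(diag):
        A[offsets[n]:offsets[n + 1], offsets[n]:offsets[n + 1]] = block
    for n, block in enumerate(up):
        A[offsets[n]:offsets[n + 1], offsets[n + 1]:offsets[n + 2]] = block
    for n, block in enumerate(down):
        A[offsets[n + 1]:offsets[n + 2], offsets[n]:offsets[n + 1]] = block
    return A, offsets


def stationary_vectors(A, offsets):
    """Solve pi A = 0, pi e = 1 and split pi into per-level vectors."""
    size = A.shape[0]
    # One balance equation is redundant; replace it by the normalization.
    system = np.vstack([A.T[:-1], np.ones(size)])
    rhs = np.zeros(size)
    rhs[-1] = 1.0
    pi = np.linalg.solve(system, rhs)
    return [pi[offsets[n]:offsets[n + 1]] for n in range(len(offsets) - 1)]


# Toy data: levels 0, 1, 2 of an Erlang loss system.
diag = [np.array([[-1.0]]), np.array([[-2.0]]), np.array([[-2.0]])]
up = [np.array([[1.0]]), np.array([[1.0]])]
down = [np.array([[1.0]]), np.array([[2.0]])]

A, offsets = assemble_generator(diag, up, down)
levels = stationary_vectors(A, offsets)
```

For generators of realistic size, a dense solve like this one becomes expensive, which is the reason for using algorithms that work level by level on the sparse block structure.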

Performance characteristics
Once the vectors $\boldsymbol{\pi}_n$, $n = \overline{0,N}$, are calculated, they can be used for computing the values of various performance indicators of the analyzed queueing system. Formulas for computation of some performance indicators are presented below.
The mean number of requests in the system is $N^{requests} = \sum_{n=1}^{N} n \boldsymbol{\pi}_n \mathbf{e}$. The rate of the output flow of requests that successfully received service is equal to $\sum_{n=1}^{N} \boldsymbol{\pi}_n A_{n,n-1} \mathbf{e}$. This formula follows from the formula of total probability and an equivalent form of formula (1): the row vectors $\boldsymbol{\pi}_n$, $n = \overline{0,N}$, define the stationary probabilities of the states of the Markov chain in which the number $n_t$ of requests in the system is equal to n, and the components of the column vectors $A_{n,n-1}\mathbf{e}$ define the rates of successful service completions during the stay of the Markov chain in these states. The rate of the output flow of type-m requests that received service and the mean number $N_m^{requests}$ of type-m requests in the system, formula (2), are expressed in terms of multipliers of the form $(I_W \otimes L_n(\cdot))$. The proof of formula (2) is similar to the proof of formula (1). It follows from the formula of total probability, taking into account that this multiplier selects only the components of the vector $\boldsymbol{\pi}_n$ that account for the number of requests of type m, with the departure rate of these requests set equal to 1. As a result, the sum on the right-hand side of (2) defines the mean number of type-m requests in the system.
The probability of an arbitrary request loss at its arrival moment is denoted by $P^{arrival-loss}$; its expression uses $\tilde{E}_m^{(n)}$, the diagonal matrix having the same diagonal elements as the matrix $E_m^{(n)}$. The probability of an arbitrary type-1 request loss is denoted by $P_1^{arrival-loss}$, and the probability of an arbitrary type-m request loss upon arrival by $P_m^{arrival-loss}$. We also consider the probability that at an arbitrary moment there is a shortage of bandwidth and the probability $P^{no-sharing}$ that all requests at an arbitrary moment receive the required service rate.
Let the square matrix $E_{m,l}^{(n)}$ of size $T_n$, where $l = \overline{2,M}$, n = N if m = 1, and $l = \overline{m+1,M}$, $n = \overline{N_1,N}$ if $m = \overline{2,M-1}$, define the transition probabilities of the process $\zeta_t = \{s_t^{(1)}, \dots, s_t^{(M)}\}$, $t \ge 0$, at the moment at which a type-m request arrives to the system and displaces a type-l request when the number of requests receiving service is n. The definition of this matrix is similar to the definition of the matrix $E_m^{(n)}$ given above. In each row of this matrix, at most one element is non-zero, and it equals 1. As in the definition of the matrix $E_m^{(n)}$, each row and column of the matrix $E_{m,l}^{(n)}$ corresponds to a certain state $\{s_1, s_2, \dots, s_M\}$ of the process. In the row of the matrix $E_{m,l}^{(n)}$ that corresponds to the state $\{s_1, s_2, \dots, s_M\}$, the element 1 is placed in the column that corresponds to the state $\{s_1, \dots, s_{m-1}, s_m + 1, s_{m+1}, \dots, s_{l-1}, s_l - 1, 0, \dots, 0\}$ only if type l is the lowest-priority type present in this state, i.e., $s_l > 0$ and $s_{l+1} = \dots = s_M = 0$. The loss probability of an arbitrary request is denoted by $P^{loss}$.
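Two of the indicators above can be evaluated with a few lines of code once the level vectors are known. The sketch below is an illustration under stated assumptions: it reads the mean number of requests as the sum over levels of n times the total probability of level n, and the output rate as the sum over levels of the stationary vector of level n multiplied by the block leading to level n − 1 and by a column of ones, which is how the surrounding text describes them. The function names and the toy data (an Erlang loss system with two servers and unit rates) are hypothetical.

```python
import numpy as np


def mean_number(level_vectors):
    """Sum over levels of n times the probability mass of level n."""
    return sum(n * float(np.sum(p)) for n, p in enumerate(level_vectors))


def output_rate(level_vectors, down_blocks):
    """Rate of service completions; down_blocks[n - 1] leads from n to n - 1."""
    return sum(
        float(p @ block @ np.ones(block.shape[1]))
        for p, block in zip(level_vectors[1:], down_blocks)
    )


# Toy stationary vectors of levels 0, 1, 2 and the blocks to the lower level.
toy_levels = [np.array([0.4]), np.array([0.4]), np.array([0.2])]
toy_down = [np.array([[1.0]]), np.array([[2.0]])]

toy_mean = mean_number(toy_levels)
toy_output = output_rate(toy_levels, toy_down)
```

In this toy system the output rate coincides with the arrival rate multiplied by the probability that an arrival is not lost, which provides a simple consistency check.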

Numerical example
Let us assume that there are three types of requests (M = 3). The size of a request is measured in megabits (Mb). The size of a type-m request has the exponential distribution with the rate $\gamma_m$, $m = \overline{1,3}$. We set $\gamma_1 = 0.025$. Thus, the average size of a type-1 request is 40 Mb. The nominal bitrate $\hat{\beta}_1$ of a type-1 request is 20 Mb per second (Mbps). Correspondingly, the service rate of a type-1 request in the absence of a deficit of bandwidth is $\mu_1 = 0.5$. For requests of type 2 and type 3, $\gamma_2 = 1/75$, $\hat{\beta}_2 = 15$ Mbps, $\mu_2 = 0.2$, and $\gamma_3 = 1/100$, $\hat{\beta}_3 = 10$ Mbps, $\mu_3 = 0.1$.
We assume that the arrival flow of requests is the MMAP defined by matrices for which the average total arrival intensity of requests is $\lambda = 3.90335$ and the average arrival intensities of type-m requests are $\lambda_1 = 1.12506$, $\lambda_2 = 1.95346$, $\lambda_3 = 0.824833$. The coefficient of variation of inter-arrival times is 1.52387, and the coefficients of variation of inter-arrival times of type-m requests are 2.32208, 1.13619, and 1.50726, correspondingly. The coefficient of correlation of two consecutive inter-arrival times is 0.159857, and the coefficients of correlation of two consecutive inter-arrival times of type-m requests are 0.236901, 0.0466208, and 0.128633, correspondingly.
We fix the maximum number of requests that can obtain service at the same time as N = 50.
In this numerical example, we intend to investigate the impact of the bandwidth B of the server and the parameter $N_1$, which defines the acceptance of lower-priority requests, on the main performance measures of the system. For this purpose, we vary the bandwidth B in the range [50, 300] with step 50, and the parameter $N_1$ over the interval [1, 50] with step 1. The computations were implemented in Wolfram Mathematica 12.1 on a PC with an Intel Core i7-8700 CPU and 16 GB RAM. The run time is about 80 minutes for the 300 different pairs (B, $N_1$), or 16 seconds per pair on average. Figure 2 shows the dependence of the average total number $N^{requests}$ of requests and the mean number $N_m^{requests}$, $m = \overline{1,3}$, of type-m requests in the system on the parameters $N_1$ and B.
As seen from Fig. 2, in the considered case the average total number $N^{requests}$ of requests decreases with the increase of the bandwidth B of the server and increases with the increase of the parameter $N_1$. Under a fixed $N_1$, the decrease of $N^{requests}$ with the increase in bandwidth B stems from the fact that the service rates increase with the growth of B and, therefore, the requests depart from the system faster. Under a fixed B, the increase of $N^{requests}$ with the increase in $N_1$ occurs because increasing $N_1$ leads to a more tolerant acceptance policy. More requests are admitted to the system, which can potentially lead to a decrease of the service rates due to the lack of the server's bandwidth. The number $N_1^{requests}$ of type-1 requests in the system behaves the same way as the total number $N^{requests}$ of requests in the system. The mean numbers $N_2^{requests}$ and $N_3^{requests}$ of type-2 and type-3 requests also increase with the increase in $N_1$, but behave non-monotonically with the growth in bandwidth B. They first grow, but with the further growth in B the server becomes less overcrowded, which leads to a decrease in the mean number of type-2 and type-3 requests in the system.
Figure 3 illustrates the influence of the parameters $N_1$ and B on the loss probabilities $P^{arrival-loss}$ and $P_m^{arrival-loss}$, $m = \overline{1,3}$. As seen from Fig. 3, with the growth of the bandwidth B the loss probability of a request of any type decreases, because a larger bandwidth implies higher average service rates, and requests leave the server faster, freeing up places for arriving requests. The increase in $N_1$ implies a decrease in the loss probabilities $P^{arrival-loss}$, $P_2^{arrival-loss}$, and $P_3^{arrival-loss}$, and an increase in the loss probability $P_1^{arrival-loss}$.
The increase in the loss probability $P_1^{arrival-loss}$ despite the preemptive priority over type-2 and type-3 requests can be explained as follows. When $N_1$ increases, evidently more such requests are accepted to the system. The server becomes more loaded and, due to sharing, the speed of service of type-1 requests decreases, so the situation when an arriving type-1 request meets N type-1 requests obtaining service occurs more often.
The dependence of the probability P no−sharing that all requests at an arbitrary moment obtain required service rate on the parameters N 1 and B is presented in Fig. 4. This figure confirms that the probability P no−sharing is large when B is large and N 1 is small. Correspondingly, this probability is small when B is small and N 1 is large.
These observations, as well as some of the dependencies given by Figs. 5, 6, 7, are obvious. However, the behavior of some curves, e.g., figures for P
Here $A_m$ is the profit earned by service of one type-m request; $B_m$ is the charge for the loss upon arrival of one type-m request; $C_m$ is the charge for the loss of one type-m request due to pushing out. Let the introduced costs be defined by $A_1 = 10$, $A_2 = 5$, $A_3 = 3$, $B_1 = 4$, $B_2 = 2$, $B_3 = 1$, $C_2 = 20$, $C_3 = 5$, $D = 0.05$.
The shape of the function $E(B, N_1)$ is presented in Fig. 8. The optimal value of the cost criterion $E(B, N_1)$ is equal to 7.82733, and the optimal values of the bandwidth B and the threshold $N_1$ are equal to 200 and 27, correspondingly.

Conclusion
In this paper, we introduced and analyzed a novel discipline of simultaneous service of multiple requests. This discipline appears realistic for application in real-world systems. It restricts both the bandwidth of the server and the number of requests that can receive service at the same time. When the number of requests present in the system is relatively small, each of them receives a permanent share of the bandwidth and their service processes are mutually independent, as in the standard multi-server queueing system. However, when the sum of the bandwidths of the requests admitted to the system exceeds the bandwidth of the server, service to requests is provided at proportionally reduced rates. Requests are heterogeneous with respect to their requirements for the service rates and have different priorities. One of the types of requests has a preemptive priority over the requests of all other types and no restriction on admission until the number of requests present in the system reaches the maximum admissible value. The other types of requests have a stricter restriction on admission and preemptive priorities over each other.
The analysis of the model is performed under a realistic assumption about correlation and possible high variability of inter-arrival times. This is achieved via the assumption that the arrivals occur according to the MMAP, which is an essentially more general arrival process than the superposition of stationary Poisson processes. The feasibility of the proposed method of analysis is illustrated by the numerical example. In particular, the results of computing the optimal values of the bandwidth of the server and the number of requests that can receive service simultaneously are presented. Due to the application of the technique going back to the works by D. Lucantoni and V. Ramaswami, it is possible to implement computations not only for a relatively small number of requests receiving service at the same time.
The considered model assumes the loss of requests arriving when the number of requests in service has the maximum value. The presented analysis is planned to be extended to the scenarios in which such requests can be stored in an infinite or finite buffer or can make repeated attempts to enter service. In these scenarios, the operation of the system can be described by the Markov chain $\tilde{\xi}_t$ of the form $\tilde{\xi}_t = \{i_t, \xi_t\}$, where $i_t$ is the number of requests in the buffer or orbit and $\xi_t$ is the Markov chain analysed in this paper. If the states of the chain $\tilde{\xi}_t$ are enumerated in the direct lexicographic order and the levels of the chain are defined by the fixed values of the component $i_t$, then the blocks $A_{n,n'}$ of the generator of the Markov chain $\xi_t$ analysed in this paper can be used as the sub-blocks of the blocks of the generator of the Markov chain $\tilde{\xi}_t$. The case of assigning unequal shares to competing flows of requests, see, e.g., (Chen et al. 2022), can be considered as well. The problem of application of the obtained