Load-balancing for multi-skilled servers with Bernoulli routing

We study the optimal Bernoulli routing in a multiclass queueing system with a dedicated server for each class as well as a common (or multi-skilled) server that can serve jobs of all classes. Jobs of each class arrive according to a Poisson process. Each server has a holding cost per customer and uses the processor sharing discipline for service. The objective is to minimize the weighted mean holding cost. First, we provide conditions under which classes send their traffic only to their dedicated server, only to the common server, or to both. A fixed-point algorithm is given for the computation of the optimal solution. We then specialize to two classes and give explicit expressions for the optimal loads. Finally, we compare the cost of the system with a multi-skilled server with that of systems with only dedicated servers or with all common servers. The theoretical results are complemented by numerical examples that illustrate the various structural results as well as the convergence of the fixed-point algorithm.


Motivation
We investigate the performance of a multi-skilled queueing system formed by parallel servers with Processor Sharing (PS) queues. Jobs of different classes of customers arrive to the system following a Poisson process. There is one dedicated server for each class of customer and one multi-skilled server that can execute jobs of all classes. Furthermore, we assume that jobs are assigned to the servers according to the Bernoulli policy. Our goal is to find the optimal load balancing so as to minimize the weighted mean number of jobs in the system.
The main application of our model comes from wireless networks. Consider a region divided into different subregions. Each dedicated server models an antenna that provides service to a unique subregion and the multi-skilled server models a central antenna that provides service to all the subregions. Using the results of this article, one can determine how the traffic of each subregion must be shared between the antenna of that subregion and the central one in order to optimize the performance of the system. This architecture has been previously considered by Taboada et al. (2017) in a different context where dedicated servers (or microcells in their model) can be switched on and off so as to minimize the weighted sum of the mean delay and the mean power consumption in the system.

Related work
Load balancing has been widely investigated in different contexts. In data centers, for example, various policies depending upon the information available to the dispatcher have been proposed. In general, optimal policies for the typical performance measures such as mean processing times are not easy to determine except in some specific cases. For example, when no information on the state of the servers is available, the optimal Bernoulli routing policy was determined in Altman et al. (2011) for mono-skilled servers only. For FCFS servers, a policy based on Sturm sequences (Gaujal et al. 2006) is known to be optimal. With more information on the server state, a number of heuristics such as Join the Shorter of d queues (Mitzenmacher 2001; Vvedenskaya et al. 1996) and Join the Shortest Queue (Graham 2000) have been analyzed in the large server asymptotic case. In addition, there are various pull-based policies such as Join the Idle Queue (Lu et al. 2011) that are known to work well in practice. Another important routing policy is the Size Interval Task Assignment (Harchol-Balter et al. 1999) where jobs of different sizes are executed in different servers and, therefore, the service requirement of incoming tasks needs to be known. This policy has been further studied in Feng et al. (2005), and Harchol-Balter (2000) presented a variation of this policy in which the size of jobs does not need to be known.
Load balancing has also been investigated for balancing energy costs in data centers using the Energy Packet Networks model (Fourneau 2020), whereas in Liu et al. (2015) the data centers are assumed to be located in different geographical zones. The above works are mostly concerned with mono-skilled or homogeneous servers.
In networks with multi-skilled agents or servers, skill-based routing policies have been proposed and investigated (Koole et al. 2003;Wallace and Whitt 2005). These works are mainly oriented towards call-center architectures with Erlang-B or Erlang-C type of queues. An illustrative example is an overflow-type policy, where each incoming call has a list of agents ordered by priority, with highest priority given to mono-skilled ones, and is routed to the first available agent of this list. If no agent is available, the call can be queued or blocked depending on the architecture. These routing policies are usually difficult to analyze and the cited works are interested in approximations for the various performance measures for a given policy. In these models, obtaining the optimal policy analytically is not easy. We refer to Chen et al. (2020) for a recent survey on multi-skilled systems.
Multi-skilled queues also appear in the analysis of redundancy systems (Gardner et al. 2015; Bonald et al. 2017) in which incoming requests can be sent simultaneously to a subset of queues. We do not investigate the redundancy aspect.
The network topology we consider makes our model different from Altman et al. (2011) in which all servers can execute all types of tasks.

Contributions
The main contributions of the article are summarized as follows:
• We provide a necessary and sufficient condition for the stability of the system.
• We show that the optimization problem in terms of routing probabilities, which we call (PROB-OPT), can be reformulated in terms of the loads of the servers. The reformulated problem, which we call (LOAD-OPT), is a convex problem with linear constraints. We also show that if there are routing probabilities that satisfy the constraints of (PROB-OPT), then it is possible to find loads that satisfy the constraints of (LOAD-OPT).
• We fully characterize the optimal loads on the servers for two classes of customers. For more than two classes, we provide in Proposition 2 conditions under which each class of traffic satisfies one of the following: (i) it sends all its traffic to its dedicated server, (ii) it sends all its traffic to the multi-skilled server, or (iii) it shares its traffic between the multi-skilled server and its dedicated server.
• Using the result of Proposition 2, we present a fixed-point algorithm whose convergence ensures that the optimal loads on the servers are achieved. This algorithm starts with an initial partition of the set of servers (according to the three possible traffic sharing policies of Proposition 2) and its fixed point is given by the partition of the set of servers. Providing an analytical proof of this convergence seems to be an extremely difficult task. However, we illustrate the convergence of this algorithm using numerical experiments.
• We compare the performance of our model with the performance of two other models. The first model consists of a system where all the servers are multi-skilled and we show the existence of a switching curve, i.e., when the arrival rate of one of the traffic classes increases, the model whose performance is better changes. The second model consists of a system with no sharing, that is, all the servers are dedicated or mono-skilled, and we provide conditions on the arrival rates such that the cost of the no-sharing model is larger than the cost of our model.
• We delve into the comparison of the aforementioned models using numerical experiments. First, we observe the uniqueness of the switching curve when we compare our model with a system where all the servers are multi-skilled. We also observe that, in a system formed by servers with equal capacity and different (but not extremely large) holding costs, the region where our model outperforms the all-sharing system is very large.

Organization
In the next section, we describe the network model and define the optimization problem. Section 3 gives the stability condition and presents an equivalent problem in terms of loads on the servers. In Sect. 4, the main results on the structure of the optimal policy are provided for the model under consideration. We compare in Sect. 5 the performance of our model with the performance of models with other network topologies. We present our numerical experiments in Sect. 6. Finally, we discuss our main conclusions in Sect. 7.

Notation
We consider a server farm with Processor Sharing (PS) queues and an input traffic of different classes. Let $K = \{1, 2, \ldots, C\}$ be the set of classes. We assume that jobs of class $i \in K$ arrive to the system according to a Poisson process and have generally distributed service times. Let $\eta_i$ be the traffic intensity of jobs of class $i$. The class of a job defines the set of servers that can be assigned to this job (Fig. 1). We consider a system with $C + 1$ servers. Let $S = \{0, 1, \ldots, C\}$ be the set of servers. For a server $j \in S$, we denote by $r_j$ the capacity or speed of Server $j$ (i.e., the amount of traffic that can be served per unit of time) and by $c_j$ its holding cost. We denote by $S_i$ the set of servers that can execute jobs of class $i$ and, for $A \subset K$, $S_A = \cup_{i \in A} S_i$. For $j = 1, \ldots, C$, Server $j$ executes only jobs of class $j$, i.e., it is a dedicated server. On the other hand, Server 0 executes jobs of all the classes, i.e., it is a multi-skilled server.
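For $i = 1, \ldots, C$, we denote by $p_i$ the probability that a class $i$ job is executed in its dedicated server, i.e., Server $i$. For $j = 1, \ldots, C$, the load of Server $j$ is defined as
$$\rho_j(p) = \frac{\eta_j p_j}{r_j}, \tag{1}$$
whereas the load of Server 0 is defined as
$$\rho_0(p) = \frac{1}{r_0} \sum_{i \in K} \eta_i (1 - p_i). \tag{2}$$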
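The load definitions can be illustrated with a minimal Python sketch. It assumes that the load of a dedicated server is $\eta_j p_j / r_j$ and that Server 0 receives the remaining traffic $\sum_i \eta_i (1 - p_i)$, which is how the loads are used in the proofs of the paper; the function name `server_loads` and the list-based indexing are illustrative choices, not part of the original text.

```python
from typing import List, Sequence


def server_loads(eta: Sequence[float], p: Sequence[float],
                 r: Sequence[float]) -> List[float]:
    """Return [rho_0, rho_1, ..., rho_C] for a Bernoulli routing vector p.

    eta[i-1] and p[i-1] refer to class i (i = 1..C); r[j] is the capacity
    of Server j, with r[0] the multi-skilled server. (Hypothetical helper.)
    """
    C = len(eta)
    if len(p) != C or len(r) != C + 1:
        raise ValueError("expected C classes, C probabilities and C+1 capacities")
    rho = [0.0] * (C + 1)
    # Dedicated Server i only sees the fraction p_i of class-i traffic.
    for i in range(1, C + 1):
        rho[i] = eta[i - 1] * p[i - 1] / r[i]
    # Server 0 sees everything that is not sent to a dedicated server.
    rho[0] = sum(eta[i] * (1.0 - p[i]) for i in range(C)) / r[0]
    return rho
```

For instance, with `eta = [0.5, 0.9]`, `p = [1.0, 0.8]` and unit capacities, class 1 stays on Server 1 and one fifth of the class 2 traffic is diverted to Server 0.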

Problem formulation
For a given routing strategy $p = (p_i)$, the mean number of jobs in Server $j$ is denoted by $E[N_j(p)]$. In this article, we aim to find the routing strategy that minimizes the total cost of the system. More specifically, we analyze the following optimization problem:
(PROB-OPT): $\min_{p} \sum_{j \in S} c_j E[N_j(p)]$ subject to
$$0 \le p_i \le 1, \quad i \in K, \tag{3}$$
$$\rho_j(p) < 1, \quad j \in K, \tag{4}$$
$$\rho_0(p) < 1. \tag{5}$$
The first constraint ensures that the $p_i$'s are probabilities. The second and third constraints ensure that all the servers are stable, that is, that the total incoming traffic into a server is smaller than its service capacity.

Preliminary results
We first study the existence of a feasible solution of (PROB-OPT). This is the same as characterizing the conditions under which the system can be stabilized. In the following proposition, we provide this result.

Proposition 1 (Stability) The system under consideration can be stabilized if and only if
$$\sum_{i \in A} \eta_i < r_0 + \sum_{i \in A} r_i \quad \text{for all } A \subseteq K. \tag{6}$$
Proof See "Appendix A".
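The stability condition can be checked numerically. The sketch below assumes the condition reads $\sum_{i \in A} \eta_i < r_0 + \sum_{i \in A} r_i$ for every non-empty $A \subseteq K$, which is the form used in Appendix A. The brute-force check enumerates all subsets; the second function relies on the observation that the most demanding subset is the set of classes with $\eta_i > r_i$, so that a single sum suffices. Both function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def can_be_stabilized(eta: Sequence[float], r: Sequence[float]) -> bool:
    """Brute-force check of the subset condition (exponential in C)."""
    C = len(eta)
    for size in range(1, C + 1):
        for A in combinations(range(1, C + 1), size):
            demand = sum(eta[i - 1] for i in A)
            capacity = r[0] + sum(r[i] for i in A)
            if demand >= capacity:
                return False
    return True


def can_be_stabilized_fast(eta: Sequence[float], r: Sequence[float]) -> bool:
    """Equivalent O(C) check: only the excess traffic eta_i - r_i > 0
    has to fit into the multi-skilled server."""
    excess = sum(max(eta[i - 1] - r[i], 0.0) for i in range(1, len(eta) + 1))
    return excess < r[0]
```

The linear-time variant is preferable for large $C$; the brute-force version is kept only because it mirrors the statement of the condition directly.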
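Since servers are M/G/1-PS queues, we know from Thm 3.8 and Thm 3.9 of Kelly (1979) that the probability that there are $n$ jobs in Server $i$ is $\rho_i^n (1 - \rho_i)$. Therefore, it follows directly that $E[N_j(p)] = \frac{\rho_j}{1 - \rho_j}$ and, as a consequence, we can reformulate (PROB-OPT) in terms of the loads on the servers as follows:
(LOAD-OPT): $\min_{\rho} \sum_{j \in S} c_j \frac{\rho_j}{1 - \rho_j}$ subject to
$$\sum_{i \in K} \eta_i = r_0 \rho_0 + \sum_{j \in K} r_j \rho_j, \tag{7}$$
$$0 \le \rho_j < 1, \quad j \in S, \tag{8}$$
$$\sum_{i \in A} \eta_i \le r_0 \rho_0 + \sum_{j \in A} r_j \rho_j, \quad \forall A \subset K. \tag{9}$$
We now show that the optimization problems we have considered so far are related. More precisely, we show that, if there are routing probabilities that satisfy the constraints of (PROB-OPT), then it is possible to find loads that satisfy the constraints of (LOAD-OPT).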

Lemma 1 Let p be a routing strategy that satisfies (3)-(5).
Then, for all j ∈ S, ρ j (p) also satisfies the constraints of (LOAD-OPT).
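We now show that $\sum_{i \in K} \eta_i = r_0 \rho_0 + \sum_{j \in K} r_j \rho_j$ in the following way:
$$\sum_{i \in K} \eta_i = \sum_{i \in K} \eta_i p_i + \sum_{i \in K} \eta_i (1 - p_i) = \sum_{j \in K} r_j \rho_j + r_0 \rho_0,$$
where the last equality is given using (1) and (2). Finally, we focus on the constraint $\sum_{i \in A} \eta_i \le r_0 \rho_0 + \sum_{j \in A} r_j \rho_j$, $\forall A \subset K$. Using again (1) and (2), we have for all $A \subset K$ that
$$\sum_{i \in A} \eta_i = \sum_{j \in A} r_j \rho_j + \sum_{i \in A} \eta_i (1 - p_i) \le \sum_{j \in A} r_j \rho_j + r_0 \rho_0.$$
And the desired result follows.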
Note that (LOAD-OPT) is a convex problem with linear constraints and has a unique solution as long as the stability condition in Proposition 1 holds. Moreover, from the above lemma, the solution of (PROB-OPT) can be obtained by optimizing directly over the loads. The optimal routing probabilities can then be recovered from (1) once the optimal load on each server is determined.
Remark 1 Let us remark that we assume that servers are M/G/1-PS queues. However, the results of this article are also valid if we assume that servers are M/M/1 queues with any work-conserving queueing discipline (note that the mean number of customers of a server j in both cases is ρ j /(1 − ρ j )).
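To make the link between loads, cost and routing probabilities concrete, the following sketch evaluates the objective $\sum_j c_j \rho_j / (1 - \rho_j)$ and recovers the routing probabilities from the loads. It assumes that the load of dedicated Server $i$ is $\eta_i p_i / r_i$, so that $p_i = r_i \rho_i / \eta_i$; the function names and the convention $p_i = 1$ for a class with no traffic are assumptions made for this illustration.

```python
from typing import List, Sequence


def holding_cost(rho: Sequence[float], c: Sequence[float]) -> float:
    """Weighted mean number of jobs: sum_j c_j * rho_j / (1 - rho_j)."""
    if any(not (0.0 <= x < 1.0) for x in rho):
        raise ValueError("every load must lie in [0, 1)")
    return sum(cj * x / (1.0 - x) for cj, x in zip(c, rho))


def routing_from_loads(rho: Sequence[float], eta: Sequence[float],
                       r: Sequence[float]) -> List[float]:
    """Recover p_i = r_i * rho_i / eta_i for i = 1..C.

    rho and r are indexed by server (0..C), eta by class (eta[i-1]).
    A class with eta_i = 0 is given p_i = 1 by convention (assumption).
    """
    return [r[i] * rho[i] / eta[i - 1] if eta[i - 1] > 0 else 1.0
            for i in range(1, len(eta) + 1)]
```

Optimizing over loads and mapping back to probabilities in this way is what allows the rest of the analysis to work with (LOAD-OPT) only.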

Analysis of the solution of (LOAD-OPT)
Let $\delta_j = \sqrt{\frac{c_j / r_j}{c_0 / r_0}}$ for all $j \in K$. We denote by $C_b$ the set of classes that route traffic to two servers, by $C_0$ the set of classes that route all their traffic to Server 0 and by $C_d$ the set of classes that send all their traffic to their dedicated server.
In the following proposition, we present the first result of this section. It gives the conditions under which a class of traffic belongs to C b , C 0 or C d .

and all the traffic of class i is routed to Server 0 if and only if
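where $\rho_0^*$ is the optimal load at Server 0 and is given by
$$\rho_0^* = 1 - \frac{r_0 + \sum_{j \in C_b} r_j - \sum_{i \in C_b \cup C_0} \eta_i}{r_0 + \sum_{j \in C_b} \delta_j r_j}. \tag{10}$$
Besides, if $j \in C_d$ the optimal load of Server $j$ is $\eta_j / r_j$, if $j \in C_b$ the optimal load of Server $j$ is
$$\rho_j^* = 1 - \delta_j (1 - \rho_0^*), \tag{11}$$
and if $j \in C_0$ the optimal load of Server $j$ is zero.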
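For a given partition, the loads can be computed directly. The sketch below is written under two assumptions that are consistent with the two-class expressions and with Appendix B: $\delta_j$ is the square root of $(c_j/r_j)/(c_0/r_0)$, and the load of Server 0 is $1 - \bigl(r_0 + \sum_{j \in C_b} r_j - \sum_{i \in C_b \cup C_0} \eta_i\bigr) / \bigl(r_0 + \sum_{j \in C_b} \delta_j r_j\bigr)$, with $\rho_j = 1 - \delta_j (1 - \rho_0)$ for $j \in C_b$. The function name `loads_for_partition` is hypothetical.

```python
import math
from typing import List, Sequence, Set


def loads_for_partition(eta: Sequence[float], r: Sequence[float],
                        c: Sequence[float], C_b: Set[int],
                        C_0: Set[int]) -> List[float]:
    """Loads [rho_0, ..., rho_C] induced by a partition of the classes.

    Classes are numbered 1..C; every class outside C_b and C_0 is
    treated as a member of C_d (all traffic on its dedicated server).
    """
    C = len(eta)
    # delta[j] compares the cost per unit of capacity of Server j and Server 0.
    delta = [math.sqrt((c[j] / r[j]) / (c[0] / r[0])) for j in range(C + 1)]
    num = r[0] + sum(r[j] for j in C_b) - sum(eta[i - 1] for i in C_b | C_0)
    den = r[0] + sum(delta[j] * r[j] for j in C_b)
    rho0 = 1.0 - num / den
    rho = [rho0]
    for j in range(1, C + 1):
        if j in C_0:
            rho.append(0.0)                          # dedicated server unused
        elif j in C_b:
            rho.append(1.0 - delta[j] * (1.0 - rho0))
        else:
            rho.append(eta[j - 1] / r[j])            # class j never leaves Server j
    return rho
```

The function does not check that the partition is the optimal one; it only evaluates the loads that the partition would imply.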
The above result leads to this corollary which gives a simple sufficient condition to determine when a given class will not send all its traffic to the multi-skilled server.
is always satisfied and this implies that $j \notin C_0$ according to Proposition 2.
From the above corollary, another interesting property follows: if $\delta_j < 1$ for all $j \in K$, then $C_0 = \emptyset$.
The next result gives an ordering which can help identify classes that use both the dedicated and the multi-skilled server. This can be seen as a way to determine, for a given set of input parameters (arrival rate, server speeds, holding costs, etc.), the skills for which we need to train the multi-skilled servers in order for the system to be optimal.
We note that (b) of the above result can be stated as follows: if class $j$ routes all its traffic to Server $j$, then class $i$ routes all its traffic to Server $i$ when $\frac{\eta_i}{r_i} \le \frac{\eta_j}{r_j}$ and $\delta_i \le \delta_j$. In the following result, we show that, under similar conditions, the set of classes that send traffic to two servers can never be $\{i, j\}$.
Proof We assume that C b = {i, j}. For this case, it follows from (10) that where the above inequality is an equality if C 0 = ∅.
Let T i denote the sojourn time of jobs of class i. We now provide an interesting result related to the sojourn time of jobs.
Proof We know that the sojourn time of jobs of class i and of class j follow an exponential distribution with rate And the desired result follows.

Characterization of the solution of (LOAD-OPT) with C = 2
We now focus on the case C = 2. Throughout this article, we refer to this case as the M model. Without loss of generality, we assume that δ 1 ≤ δ 2 . The goal of this section is to fully characterize the solution of (LOAD-OPT) with C = 2.
We first note that, from Proposition 3, the following cases can never occur: (i)
For the remaining cases, we have the following options:
1. $C_d = \{1, 2\}$. In this case, each class sends all its traffic to its dedicated server. Therefore, $\rho_i^* = \frac{\eta_i}{r_i}$ for $i = 1, 2$ and $\rho_0^* = 0$. According to Proposition 2, this occurs when
2. $C_d = \{1\}$ and $C_b = \{2\}$. In this case, all the traffic of class 1 is sent to Server 1 and the traffic of class 2 is split between Server 0 and Server 2. As a result, $\rho_1^* = \frac{\eta_1}{r_1}$ and, from (10) and (11), we obtain that $\rho_2^* = 1 - \delta_2 \frac{r_0 + r_2 - \eta_2}{r_0 + \delta_2 r_2}$ and $\rho_0^* = 1 - \frac{r_0 + r_2 - \eta_2}{r_0 + \delta_2 r_2}$. According to Proposition 2, this case occurs when $\delta_1 \le$ which simplifying gives
3. $C_d = \{1\}$ and $C_0 = \{2\}$. In this case, all the traffic of class 1 is sent to Server 1 and all the traffic of class 2 is sent to Server 0. As a result, $\rho_1^* = \frac{\eta_1}{r_1}$, $\rho_0^* = \frac{\eta_2}{r_0}$ and $\rho_2^* = 0$. According to Proposition 2, this case occurs when
4. $C_0 = \{1, 2\}$. In this case, the traffic of both classes is sent to Server 0. Hence, $\rho_i^* = 0$ for $i = 1, 2$ and, from (10), $\rho_0^* = \frac{\eta_1 + \eta_2}{r_0}$. According to Proposition 2 and using that $\delta_1 \le \delta_2$, this case occurs when $\delta_1 \ge \frac{1}{1 - (\eta_1 + \eta_2)/r_0}$.
5. $C_b = \{1, 2\}$. In this case, the traffic of class $i$ is split between Server 0 and Server $i$, for $i = 1, 2$. From (10) and (11), it results that $\rho_0^* = 1 - \frac{r_0 + r_1 + r_2 - \eta_1 - \eta_2}{r_0 + \delta_1 r_1 + \delta_2 r_2}$ and, for $i = 1, 2$, $\rho_i^* = 1 - \delta_i \frac{r_0 + r_1 + r_2 - \eta_1 - \eta_2}{r_0 + \delta_1 r_1 + \delta_2 r_2}$. Moreover, we conclude from Proposition 2 that this occurs when, for i = 1, 2, 1 , which using that $\delta_2 \ge \delta_1$ gives We simplify the above expressions and we obtain
6. $C_b = \{1\}$ and $C_d = \{2\}$. We observe that this case is symmetric to case 2 (where $C_b = \{2\}$ and $C_d = \{1\}$) and, using the same arguments, we get the following conditions
7. $C_b = \{1\}$ and $C_0 = \{2\}$. In this case, the traffic of class 1 is split between Server 0 and Server 1, whereas all the traffic of class 2 is sent to Server 0. As a result, we have that $\rho_2^* = 0$ and, from (10) and (11), we obtain that $\rho_0^* = 1 - \frac{r_0 + r_1 - \eta_1 - \eta_2}{r_0 + \delta_1 r_1}$ and $\rho_1^* = 1 - \delta_1 \frac{r_0 + r_1 - \eta_1 - \eta_2}{r_0 + \delta_1 r_1}$. According to Proposition 2, we conclude that this case occurs when 1 which after some simplification results in
The conditions described in items 1-7 split the feasible half-plane $0 \le \delta_1 \le \delta_2$ into, at most, 7 disjoint regions. A detailed example of such a partition is shown in Fig. 2.
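As a consistency check on the case in which both classes split their traffic, the expression for $\rho_0^*$ can be recovered from the flow-conservation constraint of (LOAD-OPT), $\eta_1 + \eta_2 = r_0 \rho_0 + r_1 \rho_1 + r_2 \rho_2$, together with the relation $\rho_i^* = 1 - \delta_i (1 - \rho_0^*)$ used in Appendix B. The short derivation below is a sketch that relies on these two ingredients only:

```latex
\begin{align*}
\eta_1 + \eta_2 &= r_0 \rho_0^* + r_1 \rho_1^* + r_2 \rho_2^* \\
 &= r_0 \rho_0^* + \sum_{i=1}^{2} r_i \bigl(1 - \delta_i (1 - \rho_0^*)\bigr) \\
 &= (r_0 + r_1 + r_2) - (1 - \rho_0^*)\,(r_0 + \delta_1 r_1 + \delta_2 r_2), \\
\text{so}\quad \rho_0^* &= 1 - \frac{r_0 + r_1 + r_2 - \eta_1 - \eta_2}{r_0 + \delta_1 r_1 + \delta_2 r_2}.
\end{align*}
```

The other cases follow in the same way after dropping from the sums the servers whose load is fixed at $\eta_i / r_i$ or at zero.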

Computation of the solution of (LOAD-OPT) with C > 2
As we saw before, the characterization of the solution of (LOAD-OPT) with C = 2 requires distinguishing seven different cases. This suggests that the characterization of the solution of (LOAD-OPT) with an arbitrary number of classes is out of reach. However, we provide a fixed-point algorithm using the result of Proposition 2. The pseudocode of this algorithm is shown in Algorithm 1. The main idea of this algorithm is that it starts from an initial partition $C_0$, $C_b$ and $C_d$ that is used to compute $\rho_0$ (see Lines 2 and 19). This value of $\rho_0$ is then used to determine the set of classes that belong respectively to $C_0$ (see Lines 9-10), to $C_b$ (see Lines 12-13) and to $C_d$ (see Lines 14-15). The algorithm stops in the first iteration where $C_0$, $C_b$ and $C_d$ do not change. When this occurs, according to Proposition 2, the optimal loads are obtained using (10) and (11) with the resulting partition. Unfortunately, we did not succeed in showing the convergence of this algorithm. However, we study its convergence in the numerical section and, in all the experiments we have carried out, the algorithm converged in a very small number of steps.
Algorithm 1 Fixed-point Algorithm to compute the solution of (LOAD-OPT).
1: INITIALIZE traffic intensities: $\eta_1, \ldots, \eta_C$; capacity of the servers: $r_0, r_1, \ldots, r_C$; holding costs of the servers: $c_0, c_1, \ldots, c_C$; partition of $K$: $C_0$, $C_d$ and $C_b$
2: COMPUTE $\rho_0$ using $C_0$, $C_d$ and $C_b$.
19: UPDATE $\rho_0$ using $C_0$, $C_d$ and $C_b$.
20: end while
21: COMPUTE $\rho_1, \ldots, \rho_C$ using $C_0$, $C_d$ and $C_b$.
22: return $\rho_0, \rho_1, \ldots, \rho_C$.
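The fixed-point idea can be sketched in Python. Because the classification steps of the pseudocode are not reproduced here, the membership tests below are an assumption: they are the first-order optimality conditions of the convex cost $\sum_j c_j \rho_j/(1-\rho_j)$, namely class $i$ goes to $C_d$ if $\delta_i (1 - \rho_0) \le 1 - \eta_i / r_i$, to $C_0$ if $\delta_i (1 - \rho_0) \ge 1$, and to $C_b$ otherwise, with $\delta_i = \sqrt{(c_i/r_i)/(c_0/r_0)}$. The name `fixed_point_loads`, the iteration cap and the start from $C_d = K$ are likewise illustrative choices, and convergence is not guaranteed.

```python
import math
from typing import List, Sequence, Set, Tuple


def fixed_point_loads(eta: Sequence[float], r: Sequence[float],
                      c: Sequence[float], max_iter: int = 1000
                      ) -> Tuple[List[float], Tuple[Set[int], Set[int], Set[int]], int]:
    """Iterate on the partition (C_d, C_b, C_0) until it stops changing.

    Returns (loads [rho_0..rho_C], final partition, number of iterations).
    """
    C = len(eta)
    delta = [math.sqrt((c[j] / r[j]) / (c[0] / r[0])) for j in range(C + 1)]

    def rho0_of(C_b: Set[int], C_0: Set[int]) -> float:
        num = r[0] + sum(r[j] for j in C_b) - sum(eta[i - 1] for i in C_b | C_0)
        den = r[0] + sum(delta[j] * r[j] for j in C_b)
        return 1.0 - num / den

    # Assumed initial condition: every class uses only its dedicated server.
    C_d, C_b, C_0 = set(range(1, C + 1)), set(), set()
    rho0 = rho0_of(C_b, C_0)
    for iteration in range(1, max_iter + 1):
        new_d, new_b, new_0 = set(), set(), set()
        for i in range(1, C + 1):
            level = delta[i] * (1.0 - rho0)
            if level <= 1.0 - eta[i - 1] / r[i]:
                new_d.add(i)      # dedicated server is cheap enough on its own
            elif level >= 1.0:
                new_0.add(i)      # even an empty dedicated server is too costly
            else:
                new_b.add(i)      # traffic is split between the two servers
        if (new_d, new_b, new_0) == (C_d, C_b, C_0):
            break                 # partition unchanged: fixed point reached
        C_d, C_b, C_0 = new_d, new_b, new_0
        rho0 = rho0_of(C_b, C_0)
    else:
        raise RuntimeError("partition did not stabilize within max_iter iterations")

    rho = [rho0]
    for j in range(1, C + 1):
        if j in C_0:
            rho.append(0.0)
        elif j in C_b:
            rho.append(1.0 - delta[j] * (1.0 - rho0))
        else:
            rho.append(eta[j - 1] / r[j])
    return rho, (C_d, C_b, C_0), iteration
```

With unit capacities, holding costs `c = [20, 1, 2]` and `eta = [0.5, 0.9]`, this sketch keeps class 1 on its dedicated server and splits class 2 between Server 2 and Server 0.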
We remark that this algorithm can also be used to analyze the savings obtained by including a multi-skilled server in a system with C dedicated servers. For this purpose, we need to start the algorithm with an initial partition such that $C_d = K$ and $C_0 = C_b = \emptyset$ and with some values of $\eta_1, \ldots, \eta_C$ and $r_1, \ldots, r_C$ such that the system with only dedicated servers is stable. In that case, there are two possibilities: (i) the algorithm stops after the first iteration or (ii) the algorithm does not stop after the first iteration. In the former case, we can conclude that it is not beneficial to add a multi-skilled server, whereas in the latter one we can compare the cost at the initial state with the cost when the algorithm stops in order to compare the performance of both systems.
servers to that with also a multi-skilled one, as well as the case in which all the servers are multi-skilled and can serve all the jobs.

Comparison with all full-skilled servers system
First, we compare the cost of the model with C dedicated servers and a single full-skilled server with the cost of a system formed by C + 1 servers with the same values of the holding costs and capacities as Server 0, but where all the servers can serve jobs of all the classes. We call the latter model ASSAC (All Servers Serve All Classes).
Lemma 2 Consider that $\eta_j \to 0$ for all $j > 1$ and $\frac{\delta_1}{1 - \eta_1 / r_1} < 1$. Then, the cost of the system with dedicated servers is $\delta_1^2$ times the cost of the ASSAC model when $\eta_1$ is small enough. Proof In the system with dedicated servers, when $\frac{\delta_1}{1 - \eta_1 / r_1} < 1$ and $\eta_j \to 0$ for all $j > 1$, all the jobs of class 1 are executed in Server 1 and the load of the rest of the servers is zero. Hence, the cost of this system is
$$c_1 \frac{\eta_1 / r_1}{1 - \eta_1 / r_1}. \tag{12}$$
In the ASSAC model, the traffic is uniformly shared among all the servers and, therefore, the cost of this system when δ 1 When $\eta_1$ is small enough, (12) and (13) are approximately $c_1 \frac{\eta_1}{r_1}$ and $c_0 \frac{\eta_1}{r_0}$, respectively. The desired result thus follows since the ratio of the former to the latter is $\delta_1^2$. From the above lemma, we have that the optimal cost of the system with dedicated servers is smaller than that of ASSAC in the considered regime.
Proposition 6 Consider that $\eta_j \to 0$ for all $j > 1$ and $\frac{\delta_1}{1 - \eta_1 / r_1} < 1$. Then, the optimal cost of (LOAD-OPT) is smaller than the optimal cost of ASSAC when $\eta_1$ is small enough.
We now show that the optimal cost of (LOAD-OPT) can be larger than that of ASSAC. The intuition behind this result is that the stability region of the ASSAC model is wider than the stability region for the model with one dedicated server. By taking the load close to the boundary of the stability region of the model with one dedicated server, the cost can be made to go to infinity. For the ASSAC model, however, the availability of spare capacity means that the cost remains finite.
Proposition 7 Consider that η j → 0 for all j = 2, . . . , C and η 1 → r 0 + r 1 . Then, the optimal cost of ASSAC is smaller than the optimal cost of (LOAD-OPT).
Proof We first observe that the cost of ASSAC when $\eta_j \to 0$ for all $j > 1$ and $\eta_1 \to r_0 + r_1$ is given by which is clearly finite. However, for the model with dedicated servers, class-1 jobs are served by Server 0 and Server 1, whose load tends to one when $\eta_1 \to r_0 + r_1$. Therefore, its cost tends to infinity.
From the above propositions, the existence of a switching curve follows when $\delta_1 < 1$. In Sect. 6, we study this curve numerically.

Comparison with no sharing system
We consider a system formed by C + 1 dedicated servers, where Server 0 can only serve jobs of class 1, whereas for i ≥ 2 Server i can serve only jobs of class i. We call this model the system without sharing, since jobs of different classes are not served in the same server. We compare the optimal cost of this system with the optimal cost of (LOAD-OPT).
In Proposition 7, we have shown that the optimal cost of (LOAD-OPT) tends to infinity when $\eta_1 \to r_0 + r_1$ and $\eta_j \to 0$ for j = 2, . . . , C, whereas the optimal cost of ASSAC is finite. In the following result, we show that there is a regime where the optimal cost of the system without sharing is infinite, whereas the optimal cost of (LOAD-OPT) is finite. The intuition is similar: the stability region of the no-sharing model is included in that of the model with one shared server.
Proposition 8 Consider that η j → 0 for all j = 1, . . . , C − 1 and η C → r C . Then, the optimal cost of the system without sharing is larger than the optimal cost of (LOAD-OPT).

Numerical experiments
In this section, we present the numerical experiments that complement the main theoretical findings of this article.

Comparison with ASSAC
We first focus on the performance comparison of Sect. 5.1, where we showed the existence of a switching curve when δ 1 < 1. For C = 2, we analyze the value of the objective function of the models under comparison in Sect. 5.1, which are the M model (i.e., the model we consider in Sect. 4.1) and the ASSAC model with three servers. In Fig. 3, we fix the values of the capacities and holding costs and we consider η 1 and η 2 such that both models are stable, that is, η i < r 0 + r i for i = 1, 2 and η 1 + η 2 < r 1 + r 2 + r 0 . We set r 1 = r 2 = r 0 = 1 and c 0 = 20, c 1 = 1 and c 2 = 2. We mark with 'x' the region where the cost of the ASSAC model is smaller and with a filled 'o' the region where the value of the objective function of the M model is smaller. As can be observed in Fig. 3, the switching curve is unique, that is, when we increase η 1 (or η 2 ) there is a single value at which the model that performs better changes. Another interesting conclusion of this experiment is that the region where the M model outperforms the ASSAC model is very large.

Comparison with no-sharing system
Fig. 3 Comparison of optimal costs of Sect. 5.1 when δ i < 1 for i = 1, 2
In the next set of experiments, we concentrate on the performance comparison of Sect. 5.2. We compare the value of the objective function of both models for the values of η 1 and η 2 such that both systems are stable, i.e., η 1 < r 0 + r 1 and η 2 < r 2 , considering the same values of the parameters as in Fig. 3. As we said in Sect. 5.2, the stability region of the system without sharing is smaller than that of the M model. This implies that the M model outperforms the system without sharing when η 2 → 1. This phenomenon can be clearly observed in Fig. 4. We are also interested in comparing these models away from the boundary. For this purpose, we present in Fig. 5 a zoomed version of Fig. 4. From this illustration, we conclude that the performance of both models is very similar when η 1 ∈ (0, 2) and η 2 ∈ (0, 0.8).

The solution of (LOAD-OPT) for C > 2
We now study Algorithm 1 since, as we said before, its convergence ensures that the solution of (LOAD-OPT) is obtained. We first consider a system with C = 5 classes of traffic and the values of the parameters presented in Table 1. We have chosen these parameters since the solution of (LOAD-OPT) for these values satisfies C d = {3}, C 0 = {4} and C b = {1, 2, 5}, i.e., all the sets of the partition are non-empty.
We consider three different initial conditions: first, all the classes belong to C d (see solid line in Fig. 6); second, classes 1, 3 and 4 belong to C d , whereas classes 2 and 5 to C b (see dashed line in Fig. 6); and finally, classes 1, 2, 3 and 4 belong to C d and class 5 to C b (see dotted line in Fig. 6).
We illustrate in Fig. 6 the evolution of the loads of each server over the iterations of the algorithm. In the upper row of Fig. 6 we show the loads of Server 0, Server 1 and Server 2, whereas in the bottom row we show the loads of Server 3, Server 4 and Server 5. We observe that, when the algorithm converges, the load of Server 4 is zero, which means that, for this case, $\rho_4^* = 0$. We also see that, for Server 3, the initial load in the scenario that is represented by the solid line (that is, the scenario where all the classes send all their traffic to their dedicated server) equals the load when the algorithm converges. This means that, for class 3, we have that $\rho_3^* = \frac{\eta_3}{r_3}$. It is important to remark that, as we can also observe in Fig. 6, the algorithm converges to the same values for the three different initial partitions under consideration. We have also started the system with other initial partitions and the obtained results confirmed that the algorithm always converges. Another interesting property of this algorithm is that the number of iterations required to reach convergence is very small. Indeed, when the initial partition of K is such that all the classes belong to C d , the algorithm converges after 12 iterations. Moreover, for the rest of the cases, the algorithm converges in fewer iterations. When the initial partition of K is such that classes 1, 3 and 4 belong to C d and classes 2 and 5 to C b , it converges after 2 iterations and when the initial partition of K is such that classes 1, 2, 3 and 4 belong to C d and class 5 to C b , it converges after 3 iterations.
Fig. 4 Ratio of the optimal costs under comparison in Sect. 5.2 (η 1 ∈ (0, 2) and η 2 ∈ (0, 1))
Fig. 5 Ratio of the optimal costs under comparison in Sect. 5.2 (η 1 ∈ (0, 2) and η 2 ∈ (0, 0.8))
Fig. 6 Convergence of the fixed-point algorithm presented in Sect. 4.2. The x-axis represents the iterations of the algorithm and the y-axis the load of each server
Fig. 7 Minimum and maximum number of iterations until convergence for systems of different size
We now present further numerical work we have performed to analyze the convergence of Algorithm 1 for larger systems. For this set of experiments, we consider that the number of dedicated servers, C, varies from 10 to 200 with step 10. For each case we run our algorithm 10 times where, in each run, the parameters of the system are randomly chosen (but satisfying the stability condition); the results are depicted in Fig. 7, where the blue bars represent the minimum number of iterations required for convergence and the yellow bars the difference up to the maximum. The main conclusions of these experiments are twofold: first, we observe that the algorithm converges in all the cases; and second, that the number of iterations required to converge varies between 2 and 6 in all the cases. This means that the convergence of this algorithm is very fast even for large systems with 200 dedicated servers.

Conclusions
We study the optimal Bernoulli routing in a system with C dedicated servers and a single multi-skilled server. We first provide a necessary and sufficient condition for the stability of the system. We then reformulate this problem as an optimization problem in terms of the loads of the system and we show the equivalence of both problems. We provide structural properties of the solution of the derived problem, which allow us to fully characterize the optimal loads of the system when C = 2 and also to present a fixed-point algorithm whose convergence ensures that the optimal loads are obtained. We compare the performance of this system under optimal loads with a system where all the servers are multi-skilled and also with a system where all the servers are dedicated. Finally, we explore numerically the convergence of the fixed-point algorithm and show that, in all the considered cases, the algorithm converges in very few steps.
For future work, we are interested in generalizing the results of this article to systems with a more complex topology. Besides, we think that an interesting extension of the performance analysis of this work would be to consider other popular load balancing policies such as Power of Two and Join the Shortest Queue.
Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

A Proof of Proposition 1
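We first show that if there exists a subset $A \subset K$ such that $\sum_{i \in A} \eta_i > r_0 + \sum_{i \in A} r_i$, then the system is not stable.
We know from (1) and (2) that
$$\sum_{i \in A} \eta_i = \sum_{i \in A} \eta_i p_i + \sum_{i \in A} \eta_i (1 - p_i) \le \sum_{i \in A} r_i \rho_i + r_0 \rho_0.$$
Therefore, we have obtained that $r_0 \rho_0 + \sum_{i \in A} r_i \rho_i > r_0 + \sum_{i \in A} r_i$, which requires that the load of at least one server is larger than one, i.e., that the system is not stable.
Let $\varepsilon > 0$ be small. We now show that, if (6) holds, then the system is stable. For this purpose, we define the following routing strategy: for all $i \in K$ such that $\eta_i < r_i$, $p_i = 1 - \varepsilon$ and for all $i \in K$ such that $\eta_i \ge r_i$, $p_i = \frac{r_i}{\eta_i} (1 - \varepsilon)$. For this choice, it is clear that $\rho_j < 1$ for all $j \in K$. We now focus on Server 0 and we aim to show that $\sum_{i \in K} \eta_i (1 - p_i) < r_0$.
We denote by $K^*$ the set of classes such that $\eta_i \ge r_i$. Hence, the above expression is satisfied if and only if
$$\varepsilon \sum_{i \in K \setminus K^*} \eta_i + \sum_{i \in K^*} \bigl(\eta_i - r_i (1 - \varepsilon)\bigr) < r_0.$$
We know from (6) that $\sum_{i \in K^*} (\eta_i - r_i) < r_0$ and, therefore, the above inequality is satisfied if and only if
$$\varepsilon \left( \sum_{i \in K \setminus K^*} \eta_i + \sum_{i \in K^*} r_i \right) < r_0 - \sum_{i \in K^*} (\eta_i - r_i).$$
In other words, the desired result follows if we choose $\varepsilon > 0$ such that
$$\varepsilon < \frac{r_0 - \sum_{i \in K^*} (\eta_i - r_i)}{\sum_{i \in K \setminus K^*} \eta_i + \sum_{i \in K^*} r_i}.$$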
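The routing strategy used in this proof can be built explicitly. The sketch below assumes the construction $p_i = 1 - \varepsilon$ when $\eta_i < r_i$ and $p_i = (r_i / \eta_i)(1 - \varepsilon)$ otherwise; the bound on $\varepsilon$ is the one obtained by requiring $\sum_i \eta_i (1 - p_i) < r_0$ under that construction. The function name `stabilizing_routing` and the `shrink` factor, which keeps $\varepsilon$ strictly inside the admissible range, are illustrative.

```python
from typing import List, Sequence, Tuple


def stabilizing_routing(eta: Sequence[float], r: Sequence[float],
                        shrink: float = 0.5) -> Tuple[List[float], float]:
    """Return (p, eps) such that every server load is below one.

    eta[i-1] is the traffic of class i; r[j] is the capacity of Server j.
    Raises ValueError when the overflow of the heavy classes does not
    fit into Server 0, i.e. when the stability condition fails.
    """
    C = len(eta)
    heavy = [i for i in range(1, C + 1) if eta[i - 1] >= r[i]]
    light = [i for i in range(1, C + 1) if eta[i - 1] < r[i]]
    # Capacity left at Server 0 once it absorbs the excess of heavy classes.
    slack = r[0] - sum(eta[i - 1] - r[i] for i in heavy)
    if slack <= 0:
        raise ValueError("the system cannot be stabilized")
    weight = sum(eta[i - 1] for i in light) + sum(r[i] for i in heavy)
    bound = slack / weight if weight > 0 else 1.0
    eps = shrink * min(1.0, bound)          # strictly inside (0, bound)
    p = []
    for i in range(1, C + 1):
        if i in light:
            p.append(1.0 - eps)
        else:
            p.append(r[i] / eta[i - 1] * (1.0 - eps))
    return p, eps
```

For example, with `eta = [1.5, 0.4]` and unit capacities, class 1 alone would overload Server 1, and the construction diverts just enough of its traffic to Server 0.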

B Proof of Proposition 2
In the following result, we provide a property that will be useful to show the result of Proposition 2.
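Proof To simplify the notation, we write $D = C_b \cup C_0$. If $D = \emptyset$, then $\rho_0 = 0$ and $\rho_j = \eta_j / r_j$ for $j = 1, 2, \ldots, C$, which clearly implies that $\sum_{i \in A} \eta_i = \sum_{j \in S_A} \rho_j r_j$ for all $A \subset K$. We now focus on the case $D \neq \emptyset$. We know that $\rho_j = \eta_j / r_j$ for all $j \in K \setminus D$. We now observe that $K = D \cup (K \setminus D)$ and therefore from (7)
$$\sum_{i \in D} \eta_i = r_0 \rho_0 + \sum_{j \in D} r_j \rho_j. \tag{15}$$
Therefore, for any $\tilde{C} \subseteq K$ such that $D \subseteq \tilde{C}$ the constraint (9) is satisfied as an equality. Besides, we now show that for any subset that does not contain D, the constraint (9) is satisfied as an inequality. For all $B \subset D$, (15) can be written as follows:
$$\sum_{i \in B} \eta_i = r_0 \rho_0 + \sum_{j \in B} r_j \rho_j - \sum_{j \in D \setminus B} (\eta_j - r_j \rho_j),$$
which, by $\rho_j < \eta_j / r_j$ $\forall j \in D$, gives that
$$\sum_{i \in B} \eta_i < r_0 \rho_0 + \sum_{j \in B} r_j \rho_j.$$
And the desired result follows.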
We now prove the result of Proposition 2.
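Using the last expression and that, for $j \in C_b$, $0 < \rho_j^* < \eta_j / r_j$, the desired result follows, i.e., To finish, we compute the loads of all the servers. First, for $j \in C_d$, we clearly have that $\rho_j^* = \frac{\eta_j}{r_j}$. Besides, we use that for all $j \in C_b$, $\rho_j^* = 1 - \delta_j (1 - \rho_0^*)$, and from the expression (7), it follows that
$$\sum_{i \in C_b \cup C_0} \eta_i = r_0 \rho_0^* + \sum_{j \in C_b} r_j \bigl(1 - \delta_j (1 - \rho_0^*)\bigr).$$
Rearranging both sides of the above expression, we obtain that
$$\rho_0^* = 1 - \frac{r_0 + \sum_{j \in C_b} r_j - \sum_{i \in C_b \cup C_0} \eta_i}{r_0 + \sum_{j \in C_b} \delta_j r_j}.$$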