Pigouvian Tolls and Welfare Optimality with Parallel Servers and Heterogeneous Customers

Congestion externalities are a well-known phenomenon in transportation and communication networks, healthcare etc. Optimization by self-interested agents in such settings typically results in equilibria which are sub-optimal for social welfare. Pigouvian taxes or tolls, which impose a user charge equal to the negative externality caused by the marginal user to other users, are a mechanism for combating this problem. In this paper, we study a non-atomic congestion game in which heterogeneous agents choose amongst a finite set of heterogeneous servers. The delay at a server is an increasing function of its load. Agents differ in their sensitivity to delay. We show that, while selfish optimisation by agents is sub-optimal for social welfare, imposing admission charges at the servers equal to the Pigouvian tax causes the user equilibrium to maximize social welfare. In addition, we characterize the structure of welfare optimal and of equilibrium allocations.

I. INTRODUCTION
We study service systems in which customers or agents can be served by any one of several heterogeneous servers. Customers arrive into the system according to a random process, reside in the system while being served, and then depart. Customers differ in their aversion to some congestion-based metric such as their sojourn-time in the system or the number of other customers with whom they share the server. We seek to determine how customers may be assigned to servers in such a way as to optimize some social welfare function, and also how pricing may be used to incentivize selfish customers to achieve the same social optimum.
Examples of such systems include web server farms, cloud and grid computing clusters, communication networks and cognitive radio systems. In these examples, customers may differ in the quality of service they require, and in their willingness to pay for it. The quality of service of a customer may depend on the share of bandwidth or other resources it receives, or the service latency or the sojourn time in the system. Another example arises in transportation, where users may have a choice of tolled and toll-free routes, or between multiple modes of transport. Further examples include healthcare, where patients may be choosing between different service providers. Our modeling framework is quite general in this regard and encompasses all the above examples.
A common feature of the above examples is that the more customers choose a particular server, the worse their individual experience. For example, if more drivers choose a certain road, the slower the flow of traffic on it (above a certain utilization) and hence the longer the journey time. Similarly, if more patients choose a certain hospital, then they may have to wait longer for treatment, at least in the short run, when service capacities cannot be changed. This is known as a congestion externality.
Customer preferences are captured by a cost function that could depend on the system occupancy or sojourn time in a fairly arbitrary way. For example, in a transportation network, the cost function could be the expectation of a given function of the travel time, e.g., the probability that the travel time exceeds a certain threshold value. In a communication network, it could be a function of the bandwidth received, or the latency, or a combination of the two. We allow for customer heterogeneity by applying a suitable multiplier to the congestion cost. We call this multiplier its delay-sensitivity (but emphasise that congestion costs can take account of factors other than delay).
We do not constrain service policies except to insist that they be non-discriminatory and agnostic of customer characteristics. Thus, for instance, one server may adopt a first-come first-served (FCFS) policy while another splits its capacity equally amongst all its customers (processor-sharing or PS). Servers may charge a fixed admission price to each customer choosing that server; these can be different between servers but must be the same for each customer. In particular, servers cannot charge for priority or preferential treatment.
Customers choose a server so as to optimize their individual expected utility, i.e., to minimize the sum of the admission price and the expected congestion cost (weighted by their own delay-sensitivity). As the congestion cost depends on the choices of other customers, the interaction between them constitutes a game. The payoff structure makes this a congestion game [21], [27]. We assume in addition that customers are infinitesimal, i.e., that the impact of a marginal customer on the congestion cost at any server is negligible. This assumption renders the congestion game non-atomic. Nash equilibria in non-atomic congestion games are also known as Wardrop equilibria, from their origins in transportation networks [32]; see [23,Chapter 18] for an overview of congestion games.
The goal of this paper is to study the social cost, i.e., the sum of congestion costs incurred by different customers weighted by their sensitivity to congestion, of a Wardrop equilibrium. In particular, we want to know if admission prices can be set in such a way as to ensure that the social cost at equilibrium is the minimum achievable by a central planner who could assign customers to servers. We answer this question in the affirmative. One set of such prices admits an interpretation as Pigouvian taxes associated with congestion externalities at the servers. While the welfare optimality of Pigouvian taxes is known in general, our contribution in this paper is to show that these depend only on the server, and not on the customer type. In other words, all customers using the same server are charged the same levy (which may depend on the mix of customer types choosing that particular server).
A second contribution of the paper is a characterisation of the structure of socially optimal allocations and of Wardrop equilibria. Specifically, we show that in an optimal allocation, the server with the smallest congestion cost serves the most delay-sensitive customers, the one with the next smallest congestion cost serves the next most sensitive set of customers, and so on. We show that, for arbitrary admission prices at the servers, Wardrop equilibria have the same structure. Furthermore, the higher the admission price at a server, the lower its congestion cost (among servers that are utilized by some customer).
We survey some related work in the remainder of this section, before presenting a formal statement of the model and problem in the next, and stating our main results. Proofs are presented in the following section, and we conclude with a discussion of limitations of the current work and some open problems.

A. Related Work
The notion of a congestion externality was first formalized by Pigou [25], who proposed the use of a charge or levy to internalize the congestion externality in transportation networks, thereby guiding the system to a social optimum. Such charges are known as Pigouvian taxes and have since been studied in a wide variety of contexts including queueing systems [8], [18], transportation networks [30], [33], matching markets [15] and climate change [20]. While much of the work on Pigouvian taxes focuses on achieving socially optimal levels of consumption of a good associated with externalities, the work in this paper is most relevant when demand is inelastic (i.e., the quantity of demand does not depend on the price), but there is a choice between substitutes which generate different externalities. This is the case in many queueing and transportation applications. Secondly, our work considers heterogeneous agents, with different delay-sensitivities. In the following, we refer to them as multiclass customers, with "class" being used as a synonym for "delay-sensitivity".
There is a substantial literature on the allocation of multiclass customers to parallel queues in both centralized and decentralized settings, including a variety of pricing schemes and game-theoretic formulations. Much of this work looks at specific cost functions arising from those models, whereas we consider a more general and abstract formulation. Below, we describe some of the work more closely related to the approach taken in this paper and delineate these from the results we present. We use Kendall's notation for queueing models, which we now briefly describe. A queue is described by a triple X/Y/n, where X describes the arrival process, Y the job size distribution, and n the number of servers. Common choices for X are M, denoting Markovian and referring to a Poisson arrival process, and G, denoting "general" and referring to an arrival process in which the inter-arrival times are independent and identically distributed (i.i.d.), but with a general distribution. (Some authors prefer GI to emphasise the assumption of independence.) Common choices for Y are M, denoting Markovian and referring to job sizes with an exponential distribution, G, denoting i.i.d. job sizes with a general distribution, and D, denoting fixed, deterministic job sizes. If the service discipline is not the default FCFS discipline, it is added to the notation. Thus, for example, an M/G/1-LCFS queue has Poisson arrivals, i.i.d. job sizes with a general distribution, and a single server which adopts a last-come-first-served policy.
There are several works that study the use of admission prices to reduce congestion. Naor [22], Edelson and Hildebrand [10] and Littlechild [18] studied M/M/1 queues with identical customers who must choose between paying an admission price to enter the queue, incurring a random delay and receiving a fixed reward for service, or balking (i.e., leaving without being served). Admission prices are set by an operator who seeks to maximize revenue. If customers can observe the queue length on arrival and base their balking decision on it, then the revenue-maximizing admission price exceeds the one that maximizes social welfare [22]. However, if customers cannot observe the queue but must base their decision on only the known arrival and service rates, then these two admission prices coincide [10], [18]. In the latter setting, Littlechild [18] obtained the admission fee as a Pigouvian tax and showed that this will induce a socially optimal arrival rate. Bradford [8] extended the results to multiclass customers, each with their own delay cost function and reward for service, and obtained the Pigouvian admission charge for each class that achieves the socially optimal allocation. The admission charge is independent of the queue from which the customer receives service but depends on its class, which means that the system needs to elicit information about the customer class. In contrast, admission charges in our model are calculated for each queue but are agnostic of the customer class.
The equilibrium allocation of customers in multiqueue systems was studied by Bell and Stidham [5], and Haviv and Roughgarden [14]. Both works focused on homogeneous customers, i.e., a single customer class. Bell and Stidham [5] studied a set of parallel M/G/1 queues which differ in their holding cost per unit time and in their mean service time. They established structural properties of a socially optimal allocation as well as of Wardrop equilibria. Restricting their attention to parallel M/M/1 queues, Haviv and Roughgarden [14] obtained an upper bound on the price of anarchy (PoA), defined as the ratio of the total cost at the Wardrop equilibrium to that at the social optimum. In comparison, we consider multiclass customer populations and general cost functions.
Borst [7] studied the probabilistic allocation of multiclass traffic to parallel M/G/1 queues so as to minimize a specific social cost function, namely the total mean waiting cost per unit of time. He established a structural property of the optimal allocation. The structure we obtain for the optimal allocation is essentially the same, but our results apply to a very general class of queueing models and cost functions; we also do not restrict to finitely many customer classes. In addition, we consider a game-theoretic setting of selfish optimization and determine a pricing mechanism that will achieve social optimality with selfish optimization.
Sethuraman and Squillante [29] considered a variant of this problem where, in addition to optimal routing, servers decide the order in which customers in a queue are served, depending on their class, so as to optimise social welfare. An alternative approach is to allow customers to purchase priorities [2], [3], [17], [19], [26]; a comprehensive survey of these and other similar models is presented by Hassin and Haviv [13]. Our work differs in that we do not allow servers to discriminate between customers, as a consequence of which they do not need to elicit information about customer class. This may be more realistic in certain applications.
A number of works have studied specific applications in which pricing is used to achieve service differentiation by incentivising end users to segregate themselves on the basis of their willingness to pay for higher quality or lower delay. In particular, there is a substantial body of work proposing charging for differentiated services (Diffserv) in the Internet, and studying the resulting user strategies and equilibria; see [6], [9], [16], [24], for example. Additional examples include queues [31] and transport networks [34]. There has also been work on models in which prices are dynamically adapted in response to observed demands [12]; it is shown that if prices adapt sufficiently slowly, then the system converges to a Nash equilibrium. Finally, while the work presented in this paper focuses on parallel queues, there has been considerable work on general networks; see Roughgarden [28] for a detailed discussion of selfish routing and the PoA, and Fleischer et al. [11] for the analysis of equilibria in a very general network model.

II. MODEL AND RESULTS
Consider a system with N parallel channels for service, which we refer to as servers or queues. Customers arrive into the system according to a marked Poisson process with intensity η × F ; here, η denotes the arrival rate, and F the distribution of the arriving customer's class or delay-sensitivity. The only assumption we make about the distribution F is that its support is bounded away from zero and infinity, i.e., that there are constants β min > 0 and β max < ∞ such that F (x) = 0 for all x < β min , and F (β max ) = 1. Arriving customers must either select or be allocated to one of the queues upon arrival. We assume that the allocation has to be made with no knowledge of current or past queue occupancies, or past arrival times or routing decisions. Such an assumption may be less realistic for centralized allocation than when customers make individual decisions. Nevertheless, imposing this assumption uniformly permits clearer comparison of the two settings. The structure of Wardrop equilibria can be very different if queue occupancies are known, and requires a separate analysis, which is a topic for future research. In general, providing additional information can make the Wardrop equilibrium worse for all agents [1]! Under the assumption that queue occupancies are unknown, it is natural to restrict attention to Markovian policies, which route customers to queues according to some fixed probability vector that may depend on the customer's class, but not on history. (If queue occupancies are known, policies are Markovian with respect to a larger state space which includes that information.) We assume that customers of all classes have the same job size distributions, and that, once they join a queue, they are treated identically within it. Consequently, we assume that the congestion cost associated with a queue depends only on the aggregate arrival rate into that queue (and its service capacity and policies), but not on the composition of those arrivals. 
We make this precise below.
Let $\eta$ denote the Borel measure on $[0, \infty)$ defined by
\[ \eta((a, b]) = \eta \, \bigl( F(b) - F(a) \bigr), \quad 0 \le a < b, \tag{1} \]
where $\eta$ on the right-hand side is the scalar arrival rate. In other words, the measure of an interval $(a, b]$ is defined as the total arrival rate of customers whose class lies in this interval. As usual, the measures of all Borel sets are determined by those of intervals. All measures in this paper are non-negative, finite Borel measures. Now, Markovian routing corresponds to a decomposition of the measure $\eta$ as
\[ \eta = \sum_{j=1}^{N} \lambda_j, \tag{2} \]
where $\lambda_j$ is a measure on $[\beta_{\min}, \beta_{\max}]$ for each $j = 1, \ldots, N$; arrivals into the $j$th queue of customers with classes in $(a, b]$ constitute a Poisson process of rate $\lambda_j((a, b])$. We denote the total arrival rate into the $j$th queue, and the aggregate delay-sensitivity of arrivals into this queue (the arrival rate weighted by delay-sensitivity), by
\[ \bar\lambda_j = \int d\lambda_j(\beta) \quad \text{and} \quad \hat\lambda_j = \int \beta \, d\lambda_j(\beta), \tag{3} \]
respectively. Next, we associate with each queue $j$ a cost function $D_j(\cdot)$ which specifies the congestion cost generated by a given aggregate arrival rate; thus, $D_j(\lambda)$ is the congestion cost incurred by each customer when the arrival rate into queue $j$ is $\lambda$. The cost could be the mean sojourn time, or some higher moment of it, or the probability of the sojourn time exceeding a specified threshold. Our only assumption is that each function $D_j$ be monotone increasing, continuous, and continuously differentiable in the interior of its domain (the set of arrival rates for which $D_j$ is finite), with strictly positive derivative. In particular, we assume that the domain of each $D_j$ is either $\mathbb{R}_+$ or an interval of the form $[0, a)$, and that in the latter case, $\lim_{x \uparrow a} D_j(x) = +\infty$.
The assumptions above are rather mild. We do not restrict the number of servers at a queue or the service discipline. Indeed, different queues may have different numbers of servers and employ different service disciplines. They can also be associated with different cost functions, for example the mean sojourn time at one queue and the second moment at another. The only requirement is that each queue treat all customers alike, irrespective of their class. In addition to traditional queueing models, our set-up also encompasses transportation models, where the mean journey time on a road may be some increasing function of the traffic intensity on it. The main motivation for the assumption of Poisson arrivals is that it makes each D j a function of a single real variable. It is not obvious how the monotonicity and differentiability assumptions would generalize if D j were to be a function of the law of a stochastic process.
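We are now ready to state the social welfare maximization problem. The objective is
\[ \inf_{(\lambda_1, \ldots, \lambda_N) \,:\, \sum_{j=1}^{N} \lambda_j = \eta} U(\lambda_1, \ldots, \lambda_N), \quad \text{where} \quad U(\lambda_1, \ldots, \lambda_N) = \sum_{j=1}^{N} \int \beta \, D_j(\bar\lambda_j) \, d\lambda_j(\beta) = \sum_{j=1}^{N} \hat\lambda_j D_j(\bar\lambda_j). \tag{4} \]
Here $\bar\lambda_j = \int d\lambda_j(\beta)$ is the total arrival rate into queue $j$ and $\hat\lambda_j = \int \beta \, d\lambda_j(\beta)$ is the arrival rate weighted by delay-sensitivity. Thus, the social cost is defined as the sum of the expected costs incurred by customers of different classes at different queues, weighted by the corresponding flow rates.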
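As an illustration of the social cost, the following minimal Python sketch evaluates it when there are finitely many classes, so that each measure reduces to a vector of per-class arrival rates. The function names `social_cost` and `mm1_delay`, the M/M/1 cost function and the numerical values are assumptions made for the example; they are not part of the model above.

```python
def mm1_delay(service_rate=1.0):
    """Return the mean sojourn time of an M/M/1 queue as a function of its load."""
    def delay(load):
        if load >= service_rate:
            return float("inf")  # outside the domain of the cost function
        return 1.0 / (service_rate - load)
    return delay


def social_cost(rates, betas, delay_fns):
    """Social cost of an allocation with finitely many classes.

    rates[j][m] is the arrival rate of class m routed to queue j,
    betas[m] is the delay sensitivity of class m,
    delay_fns[j] maps the total arrival rate at queue j to its congestion cost.
    """
    total = 0.0
    for queue_rates, delay in zip(rates, delay_fns):
        load = sum(queue_rates)  # total arrival rate into the queue
        weighted = sum(b * r for b, r in zip(betas, queue_rates))  # rate weighted by sensitivity
        total += weighted * delay(load)
    return total


# Hypothetical instance: two classes, two unit-rate M/M/1 queues.
example_rates = [[0.2, 0.1], [0.1, 0.2]]
example_cost = social_cost(example_rates, [1.0, 2.0], [mm1_delay(), mm1_delay()])
```

Each queue contributes its sensitivity-weighted arrival rate multiplied by the congestion cost at its total load, which is all the objective depends on.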
Our first result states that, if the social cost minimization problem is feasible, then it has a solution, i.e., the minimum is attained.
Lemma 1: Let $\eta$ be a finite measure with bounded support. Suppose that the cost functions $D_j$, $j = 1, \ldots, N$, satisfy the assumptions stated above. If the optimization problem in (4) is feasible, i.e., there is some decomposition $(\lambda_1, \ldots, \lambda_N)$ of $\eta$ such that $D_j(\bar\lambda_j)$ is finite for all $j = 1, \ldots, N$, where $\bar\lambda_j$ denotes the total mass of $\lambda_j$, then (4) has a solution $(\lambda^*_1, \ldots, \lambda^*_N)$.
Next, we consider the formulation of a game between customers. Here, we allow the queues to charge admission prices, denoted by $c_j$ at queue $j$. The goal of a class $\beta$ customer entering the system is to choose a queue $j$ so as to minimize $c_j + \beta D_j(\bar\lambda_j)$, where $\bar\lambda_j$ is determined through the strategies of all customers. We assume that the arrival intensity measure $\eta$ and the cost functions $D_j(\cdot)$, $j = 1, \ldots, N$, are common knowledge. As we assumed that customers do not have access to current or past queue occupancies, or the history of arrival times or routing choices, they are necessarily restricted to choosing a server according to a fixed probability distribution, albeit one that may depend on their class. Thus, once again, the joint strategies may be represented by a decomposition of the measure $\eta$ into measures $\lambda_1, \ldots, \lambda_N$. We want to know when such a decomposition corresponds to a Wardrop equilibrium of the game.
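The condition for a decomposition $(\lambda_1, \ldots, \lambda_N)$ of $\eta$ to be a Wardrop equilibrium is that, for all $j = 1, \ldots, N$,
\[ \beta \in \operatorname{supp}(\lambda_j) \implies c_j + \beta D_j(\bar\lambda_j) \le c_k + \beta D_k(\bar\lambda_k) \quad \text{for all } k = 1, \ldots, N, \tag{5} \]
where $\bar\lambda_j$ denotes the total mass of $\lambda_j$ and $\operatorname{supp}(\eta)$ denotes the support of the measure $\eta$, namely the smallest closed set $F$ such that $\eta(F^c) = 0$. Here, $F^c$ denotes the complement of $F$. The condition in (5) roughly says that, if a positive mass of customers of class $\beta$, or in an arbitrarily small neighbourhood of it, use queue $j$, then the expected cost of a class $\beta$ customer in that queue must be no higher than its expected cost in any other queue.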
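With finitely many classes, the equilibrium condition can be checked directly: every class that sends positive flow to a queue must find that queue among its cheapest options. The sketch below is one way to do this; the function name `is_wardrop`, the tolerance `tol` and the unit-rate M/M/1 cost used in the example are assumptions made for illustration.

```python
def is_wardrop(rates, betas, prices, delay_fns, tol=1e-9):
    """Check the Wardrop condition for an allocation with finitely many classes.

    rates[j][m] is the arrival rate of class m routed to queue j,
    prices[j] is the admission price at queue j,
    delay_fns[j] maps the total arrival rate at queue j to its congestion cost.
    """
    loads = [sum(queue_rates) for queue_rates in rates]
    delays = [delay(load) for delay, load in zip(delay_fns, loads)]
    for m, beta in enumerate(betas):
        # Price plus weighted congestion cost faced by class m at each queue.
        costs = [c + beta * d for c, d in zip(prices, delays)]
        cheapest = min(costs)
        for j, queue_rates in enumerate(rates):
            if queue_rates[m] > tol and costs[j] > cheapest + tol:
                return False  # class m uses queue j although another queue is cheaper
    return True


def unit_mm1(load):
    """Mean sojourn time of an M/M/1 queue with unit service rate (assumed cost)."""
    return 1.0 / (1.0 - load)


# Hypothetical single-class instance with two identical queues and no prices.
balanced = is_wardrop([[0.3], [0.3]], [1.0], [0.0, 0.0], [unit_mm1, unit_mm1])
unbalanced = is_wardrop([[0.5], [0.1]], [1.0], [0.0, 0.0], [unit_mm1, unit_mm1])
```

In the single-class example, the even split equalizes the delays and so satisfies the condition, whereas the uneven split leaves customers at the more heavily loaded queue with an incentive to switch.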
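The existence of a Wardrop equilibrium can be shown by looking at an auxiliary optimization problem, following Beckmann et al. [4] in the single-class setting, and Yang and Huang [34] in the multiclass setting with a finite number of classes. Consider the optimization problem
\[ \inf_{(\lambda_1, \ldots, \lambda_N) \,:\, \sum_{j=1}^{N} \lambda_j = \eta} \; \sum_{j=1}^{N} \left[ c_j \int \frac{1}{\beta} \, d\lambda_j(\beta) + \int_0^{\bar\lambda_j} D_j(x) \, dx \right], \tag{6} \]
in which costs are expressed in units of delay by dividing each admission price by the delay-sensitivity $\beta$; this is possible because $\beta \ge \beta_{\min} > 0$ on the support of $\eta$. The existence of a solution follows as in Lemma 1. It can easily be shown that any solution satisfies (5), which are essentially first-order conditions for optimality in the auxiliary problem. We include a formal statement and proof for completeness.
Lemma 2: The infimum in the optimization problem (6) is attained. Moreover, any minimizer $(\lambda^W_1, \ldots, \lambda^W_N)$ is a Wardrop equilibrium, i.e., it satisfies the condition in (5).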
A natural mechanism design question is whether we can set admission prices in such a way that selfish users reacting to these prices would assign themselves to queues in the proportions required for optimizing social welfare. Our main result affirms that this is indeed the case if admission prices are set equal to Pigouvian taxes corresponding to a welfare-optimal allocation.
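Theorem 1: Let $(\lambda^*_1, \ldots, \lambda^*_N)$ be a solution of the social cost minimization problem, (4). Set the admission price $c_j$ at queue $j$ to be
\[ c_j = \hat\lambda^*_j \, D'_j(\bar\lambda^*_j), \quad j = 1, \ldots, N, \tag{7} \]
where $D'_j$ denotes the derivative of $D_j$, $\bar\lambda^*_j$ is the total arrival rate into queue $j$ and $\hat\lambda^*_j = \int \beta \, d\lambda^*_j(\beta)$ is that arrival rate weighted by delay-sensitivity. Then, $(\lambda^*_1, \ldots, \lambda^*_N)$ is a Wardrop equilibrium, i.e., it satisfies (5) with these admission prices.
Notice that $c_j$ given in (7) is precisely the total negative externality imposed on existing customers at this queue by the admission of a marginal customer, and is hence the Pigouvian toll for this queue: a marginal arrival raises the congestion cost at queue $j$ at rate $D'_j(\bar\lambda^*_j)$, and this increase is borne by existing customers in proportion to their delay-sensitivities, which sum to $\hat\lambda^*_j$.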
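The following Python sketch illustrates the Pigouvian toll on a small hypothetical instance: two M/M/1 queues with unit service rate and two classes with delay sensitivities 1 and 3, each arriving at rate 0.4. All numerical values and function names are assumptions made for the example. The sketch restricts attention to allocations in which queue 1 serves all of class 1 plus a rate t of class 2, and chooses t so that class 2 is indifferent between the queues once the tolls are charged; for this family, that indifference coincides with the first-order condition for the social cost. The code does not verify global optimality.

```python
import math

BETA = (1.0, 3.0)  # delay sensitivities of the two classes (assumed values)
ETA = (0.4, 0.4)   # arrival rates of the two classes (assumed values)


def delay(load):
    """Mean sojourn time of an M/M/1 queue with unit service rate."""
    return 1.0 / (1.0 - load)


def delay_prime(load):
    """Derivative of the M/M/1 mean sojourn time with respect to the load."""
    return 1.0 / (1.0 - load) ** 2


def allocation(t):
    """Queue 1 takes all of class 1 plus rate t of class 2; queue 2 takes the rest."""
    return [[ETA[0], t], [0.0, ETA[1] - t]]


def pigouvian_tolls(rates):
    """Toll at each queue: sensitivity-weighted arrival rate times D'(load)."""
    tolls = []
    for queue_rates in rates:
        load = sum(queue_rates)
        weighted = sum(b * r for b, r in zip(BETA, queue_rates))
        tolls.append(weighted * delay_prime(load))
    return tolls


def full_cost(rates, beta):
    """Toll plus weighted delay faced by a class-beta customer at each queue."""
    tolls = pigouvian_tolls(rates)
    return [c + beta * delay(sum(r)) for c, r in zip(tolls, rates)]


def indifference_gap(t):
    """Cost at queue 1 minus cost at queue 2 for the split class (class 2)."""
    costs = full_cost(allocation(t), BETA[1])
    return costs[0] - costs[1]


def solve_split(lo=0.0, hi=ETA[1], iters=200):
    """Bisection on the gap, which is negative at lo and positive at hi here."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if indifference_gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t_star = solve_split()
rates_star = allocation(t_star)
tolls_star = pigouvian_tolls(rates_star)
```

At the computed split, class 2 is indifferent between the two queues, class 1 strictly prefers queue 1, and the less congested queue 2 carries the higher toll, in line with the equilibrium condition (5).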
We now turn to the question of computing the optimal decomposition of a given measure η. If we can compute the optimal allocation, then we can also compute the corresponding Pigouvian taxes. Note that we start by assuming that the measure η is given. In practice, one of the major challenges of implementing Pigouvian taxes is eliciting utility functions; in our context, that corresponds to eliciting the true delay sensitivities β of different agents. Getting agents to truthfully reveal their preferences is a major challenge in mechanism design, and one which we do not address in this paper. Instead, we restrict ourselves to computing the optimal allocation given the true distribution of delay sensitivities.
The constraint on $(\lambda_1, \ldots, \lambda_N)$ in the optimization problem (4) is linear, and so the set of measures satisfying the constraint is convex. If the cost function $\sum_{j=1}^{N} \hat\lambda_j D_j(\bar\lambda_j)$ were a convex function of $(\lambda_1, \ldots, \lambda_N)$, then the optimization problem would be convex, and could be solved using gradient descent methods. Unfortunately, this is not necessarily the case, as illustrated by the following counterexample.
Consider a system with two classes of customers and two M/M/1 queues. Class $i$ customers arrive according to a Poisson process of rate $\eta_i$ and have delay sensitivity $\beta_i$. Thus, the arrival intensity measure is $\eta = \eta_1 \delta_{\beta_1} + \eta_2 \delta_{\beta_2}$, where $\delta_x$ denotes the Dirac delta which puts unit mass at $x$. The job sizes for both classes are assumed to be i.i.d. exponential random variables with unit mean. Both servers have a unit service rate. We assume that $\eta_1 + \eta_2 < 1$, so that all allocations are feasible.
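Recall that the mean delay in an M/M/1 queue with arrival rate $\lambda$ and service rate 1 is $1/(1 - \lambda)$. Hence, the (class-weighted) congestion cost corresponding to a decomposition $(\lambda_1, \lambda_2)$ of $\eta$ is given by
\[ U(\lambda_1, \lambda_2) = \frac{\hat\lambda_1}{1 - \bar\lambda_1} + \frac{\hat\lambda_2}{1 - \bar\lambda_2}, \]
where $\bar\lambda_j$ is the total arrival rate into queue $j$ and $\hat\lambda_j$ is that arrival rate weighted by delay-sensitivity. The constraint that $\lambda_1$ and $\lambda_2$ are non-negative and decompose $\eta$ is equivalent to the constraints that $\bar\lambda_1 + \bar\lambda_2 = \eta_1 + \eta_2$, $\hat\lambda_1 + \hat\lambda_2 = \beta_1 \eta_1 + \beta_2 \eta_2$, and that they are all non-negative. Thus, the welfare optimization problem (4) can be rewritten as
\[ \min \; \frac{\hat\lambda_1}{1 - \bar\lambda_1} + \frac{\hat\lambda_2}{1 - \bar\lambda_2} \quad \text{subject to} \quad \bar\lambda_1 + \bar\lambda_2 = \eta_1 + \eta_2, \quad \hat\lambda_1 + \hat\lambda_2 = \beta_1 \eta_1 + \beta_2 \eta_2, \quad \bar\lambda_j, \hat\lambda_j \ge 0. \tag{8} \]
We now have the following negative result.
Lemma 3: The optimization problem in (8) is not convex.
In view of the above lemma, it is not obvious how to numerically compute socially optimal allocations in general. Nevertheless, we show below that both socially optimal allocations and Wardrop equilibria possess nice structural properties. These might suggest efficient algorithms for finding optima and equilibria in the model studied here.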
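The failure of convexity asserted in Lemma 3 can be observed numerically. The sketch below uses a hypothetical instance of the two-class, two-queue example (class rates 0.4 and 0.4, delay sensitivities 1 and 3; these values and all names are assumptions) and exhibits two feasible allocations whose midpoint has a strictly higher social cost than the average of their costs, which a convex function would not allow.

```python
def two_queue_cost(a, b, eta=(0.4, 0.4), beta=(1.0, 3.0)):
    """Social cost when queue 1 receives rate a of class 1 and rate b of class 2.

    Both queues are M/M/1 with unit service rate; queue 2 receives the remainder.
    """
    load1 = a + b
    load2 = (eta[0] - a) + (eta[1] - b)
    weighted1 = beta[0] * a + beta[1] * b
    weighted2 = beta[0] * (eta[0] - a) + beta[1] * (eta[1] - b)
    return weighted1 / (1.0 - load1) + weighted2 / (1.0 - load2)


# Two feasible allocations (class-1 rate, class-2 rate sent to queue 1).
point_p = (0.15, 0.245)
point_q = (0.25, 0.155)
midpoint = ((point_p[0] + point_q[0]) / 2.0, (point_p[1] + point_q[1]) / 2.0)

# A convex function would make this gap non-positive.
convexity_gap = two_queue_cost(*midpoint) - 0.5 * (
    two_queue_cost(*point_p) + two_queue_cost(*point_q)
)
```

The two endpoints differ by a small change in load and a larger, opposite change in sensitivity-weighted load at queue 1, which is the kind of direction along which the Hessian of the cost fails to be positive semi-definite.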
Theorem 2: Let $(\lambda^*_1, \ldots, \lambda^*_N)$ achieve the minimum in (4). Suppose $i$ and $j$ are distinct queues, $\beta_2 > \beta_1 \ge 0$, and $\lambda^*_i([0, \beta_1]) > 0$ and $\lambda^*_j([\beta_2, \infty)) > 0$. Then $D_i(\bar\lambda^*_i) > D_j(\bar\lambda^*_j)$, where $\bar\lambda^*_k$ denotes the total mass of $\lambda^*_k$. This inequality also holds if $\bar\lambda^*_i = 0$ and $\bar\lambda^*_j > 0$.
The theorem says that if some of the customers served at queue $j$ have higher delay sensitivity than some of the customers served at queue $i$ (where "some" is to be interpreted as "a set of positive measure"), then the congestion cost at queue $j$ must be smaller. Moreover, any queue which serves no customers (or a set of measure zero) must have larger congestion cost than any queue which serves some customers. The theorem implies that the queues segregate traffic by class as follows:
Corollary 1: Suppose $(\lambda^*_1, \ldots, \lambda^*_N)$ solves the optimization problem (4). Re-order the queues (permute their labels) such that $D_1(\bar\lambda^*_1) \ge D_2(\bar\lambda^*_2) \ge \ldots \ge D_N(\bar\lambda^*_N)$. Then, there exist $0 = \beta_0 \le \beta_1 \le \ldots \le \beta_N = \beta_{\max}$ such that $\operatorname{supp}(\lambda^*_j) \subseteq [\beta_{j-1}, \beta_j]$ for all $j = 1, \ldots, N$.
The corollary says that customers are almost segregated by class, i.e., that each queue serves a set of customer classes that is nearly disjoint from those served in other queues. By nearly disjoint, we mean that the customer classes served at distinct queues constitute intervals (closed, open or neither), which may only intersect at their boundaries. If the measure $\eta$ has atoms (e.g., if there are only finitely many classes), then it is possible that customers belonging to some of these atoms are split across two or more queues. In routing terms, this would imply probabilistic routing to the corresponding queues. Secondly, the congestion costs at the queues are ordered such that more delay-sensitive customers incur smaller delays. Note that we are not claiming that queues with smaller delays have faster servers. Indeed, all servers may be identical, or the servers in less congested queues may even be slower! The differentiation in congestion costs is an emergent property of the optimal solution rather than a consequence of intrinsic differences between servers.
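For finitely many classes, the segregation property described above can be tested mechanically: after sorting the queues in use by decreasing congestion cost, the most sensitive class served at one queue must not exceed the least sensitive class served at the next. The sketch below is an illustrative check under these assumptions; the function name `is_segregated` and the example allocations are hypothetical.

```python
def is_segregated(rates, betas, delays, tol=1e-12):
    """Check that classes are served in intervals ordered by congestion cost.

    rates[j][m] is the arrival rate of class m routed to queue j,
    betas[m] is the delay sensitivity of class m,
    delays[j] is the congestion cost at queue j under this allocation.
    Idle queues are ignored.
    """
    used = []
    for queue_rates, d in zip(rates, delays):
        served = [b for b, r in zip(betas, queue_rates) if r > tol]
        if served:
            used.append((d, min(served), max(served)))
    used.sort(key=lambda item: -item[0])  # decreasing congestion cost
    # Adjacent queues may share at most a boundary class.
    return all(
        high <= next_low
        for (_, _, high), (_, next_low, _) in zip(used, used[1:])
    )


def unit_mm1_delays(rates):
    """Congestion costs for unit-rate M/M/1 queues (assumed cost function)."""
    return [1.0 / (1.0 - sum(queue_rates)) for queue_rates in rates]


sensitivities = [1.0, 3.0]
split_atom = [[0.4, 0.05], [0.0, 0.35]]  # class 2 is split across both queues
mixed = [[0.3, 0.2], [0.1, 0.2]]         # both classes use both queues

split_atom_ok = is_segregated(split_atom, sensitivities, unit_mm1_delays(split_atom))
mixed_ok = is_segregated(mixed, sensitivities, unit_mm1_delays(mixed))
```

The first allocation has the structure of Corollary 1, with the atom at the higher sensitivity split between the two queues, while the second does not, since the more congested queue also serves the more sensitive class alongside a less congested queue serving the less sensitive one.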
Next, we consider the same model, augmented with admission prices. Without loss of generality, we take $c_1 < c_2 < \ldots < c_N$; if $c_i = c_j$, then we can collapse these two queues into a single queue whose delay function is the inf-convolution of the delay functions of its constituent queues. Each customer seeks to join a queue that minimizes the sum of the admission price, which is common to all classes, and the expected congestion cost, which is weighted by its own delay-sensitivity. We wrote down conditions in (5) for a decomposition of the arrival intensity measure $\eta$ to be a Wardrop equilibrium. We now show that any Wardrop equilibrium has the same structure that we demonstrated above for a social optimum.
The theorem says that if some of the customers served at queue $j$ have higher delay sensitivity than some of the customers served at queue $i$, then the admission price at queue $j$ must be larger. Whereas the social optimum does not use queues whose congestion cost at zero load is too high, a queue could remain unused in a Wardrop equilibrium either because its congestion cost at zero load is too high, or because its admission price is too high, or a combination of the two. The theorem implies that the queues segregate traffic by class as follows:
Corollary 2: Suppose $(\lambda^W_1, \ldots, \lambda^W_N)$ satisfy the conditions in (5), with admission prices $c_1 < c_2 < \ldots < c_N$. Then, there exist $0 = \beta_0 \le \beta_1 \le \ldots \le \beta_N = \beta_{\max}$ such that $\operatorname{supp}(\lambda^W_j) \subseteq [\beta_{j-1}, \beta_j]$ for all $j = 1, \ldots, N$.
An important difference with the social optimum is that the ordering of queues by congestion cost at the social optimum is not obvious a priori. Hence, we do not know which queue will serve more delay-sensitive customers and which will serve less delay-sensitive ones. On the other hand, at a Wardrop equilibrium, queues which charge a higher admission price (and are not idle) will serve more delay-sensitive customers than ones which charge a lower admission price.

III. PROOFS
We now present proofs of the various results stated in the previous section.
Proof: of Lemma 1. Let $\beta_{\max} = \sup\{\operatorname{supp}(\eta)\}$. Then $\beta_{\max}$ is finite by assumption. Hence, the support of $\lambda_j$ is also restricted to $[0, \beta_{\max}]$ for all $j$, and the maps $\lambda_j \mapsto \hat\lambda_j = \int \beta \, d\lambda_j(\beta)$ are continuous in the weak topology; so, too, are the maps $\lambda_j \mapsto \bar\lambda_j = \int d\lambda_j(\beta)$, even without requiring bounded support. Finally, since the optimization problem (4) is feasible, we can restrict the minimization to a set of $(\lambda_1, \ldots, \lambda_N)$ on which $U$ is bounded; in particular, each $\bar\lambda_j$ is in the domain of $D_j(\cdot)$. On this set, $U$ is continuous in the product topology. Thus, (4) involves the minimization of a continuous function over a compact set. Therefore, the minimum is attained.
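Proof: of Lemma 2. The constrained optimization problem (6) seeks the minimum of a continuous function over a compact set; this follows along the same lines as the proof of Lemma 1. Hence, a minimizer exists.
Let $\lambda^W = (\lambda^W_1, \ldots, \lambda^W_N)$ be one such minimizer. Suppose by way of contradiction that it is not a Wardrop equilibrium, i.e., that it does not satisfy (5). Then, there exist queues $j$ and $k$, and $\beta \in \operatorname{supp}(\lambda^W_j)$, such that
\[ c_j + \beta D_j(\bar\lambda^W_j) > c_k + \beta D_k(\bar\lambda^W_k). \tag{9} \]
By definition of the support, $\lambda^W_j((\beta - \delta, \beta + \delta)) > 0$ for any $\delta > 0$. We now define a new decomposition of $\eta$ which corresponds to shifting a fraction of the mass in $(\beta - \delta, \beta + \delta)$ from queue $j$ to queue $k$. More formally, denote the restriction of a measure $\mu$ to a set $A$ by $\mu|_A$. Define $\mu = \lambda^W_j|_{(\beta - \delta, \beta + \delta)}$, with total mass $\bar\mu > 0$. For $\epsilon \in (0, 1)$, define
\[ \lambda^{\epsilon}_j = \lambda^W_j - \epsilon \mu, \quad \lambda^{\epsilon}_k = \lambda^W_k + \epsilon \mu, \quad \lambda^{\epsilon}_i = \lambda^W_i \text{ for } i \notin \{j, k\}. \]
Clearly, $\lambda^{\epsilon}_i$, $i = 1, \ldots, N$, are non-negative measures and decompose $\eta$, for any $\epsilon \in (0, 1)$. We see from (6) that replacing $\lambda^W$ by $\lambda^{\epsilon}$ changes the objective by
\[ \epsilon \int \frac{c_k - c_j}{\beta'} \, d\mu(\beta') + \int_{\bar\lambda^W_k}^{\bar\lambda^W_k + \epsilon \bar\mu} D_k(x) \, dx - \int_{\bar\lambda^W_j - \epsilon \bar\mu}^{\bar\lambda^W_j} D_j(x) \, dx = \epsilon \int \frac{1}{\beta'} \Bigl[ \bigl( c_k + \beta' D_k(\bar\lambda^W_k) \bigr) - \bigl( c_j + \beta' D_j(\bar\lambda^W_j) \bigr) \Bigr] d\mu(\beta') + o(\epsilon). \]
By (9), and since the integrand is continuous in $\beta'$, the quantity in the last line above is negative, for small enough $\delta$ and $\epsilon$. This contradicts the optimality of $\lambda^W$. The lemma is proved by contradiction.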
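Proof: of Theorem 1. The proof is by contradiction. Suppose $\lambda^* = (\lambda^*_1, \ldots, \lambda^*_N)$ solves the welfare optimization problem, (4), and that the admission prices $c_j$ are set equal to the corresponding Pigouvian taxes, defined in (7). Suppose that $(\lambda^*_1, \ldots, \lambda^*_N)$ do not satisfy (5), i.e., are not a Wardrop equilibrium for these prices. Then, there exist queues $j$ and $k$, and $\beta \in \operatorname{supp}(\lambda^*_j)$, such that
\[ c_j + \beta D_j(\bar\lambda^*_j) > c_k + \beta D_k(\bar\lambda^*_k). \tag{10} \]
By definition of the support, for any $\delta > 0$, there is an $\epsilon > 0$ such that $\lambda^*_j((\beta - \delta, \beta + \delta)) = \epsilon$. We now define a new decomposition of $\eta$ which corresponds to shifting the mass in $(\beta - \delta, \beta + \delta)$ from queue $j$ to queue $k$. Denoting the restriction of a measure $\mu$ to a set $A$ by $\mu|_A$, we define
\[ \lambda^{\beta,\delta}_j = \lambda^*_j - \lambda^*_j|_{(\beta - \delta, \beta + \delta)}, \quad \lambda^{\beta,\delta}_k = \lambda^*_k + \lambda^*_j|_{(\beta - \delta, \beta + \delta)}, \quad \lambda^{\beta,\delta}_i = \lambda^*_i \text{ for } i \notin \{j, k\}. \]
Clearly, $\lambda^{\beta,\delta}_i$, $i = 1, \ldots, N$, are non-negative measures, and decompose $\eta$. Write $\hat\mu = \int_{(\beta - \delta, \beta + \delta)} \beta' \, d\lambda^*_j(\beta')$ for the shifted mass weighted by delay-sensitivity, so that $|\hat\mu - \beta\epsilon| \le \delta\epsilon$. We see from (4) that
\[ U(\lambda^{\beta,\delta}_1, \ldots, \lambda^{\beta,\delta}_N) - U(\lambda^*_1, \ldots, \lambda^*_N) = \hat\mu \bigl[ D_k(\bar\lambda^*_k) - D_j(\bar\lambda^*_j) \bigr] + \epsilon \bigl[ \hat\lambda^*_k D'_k(\bar\lambda^*_k) - \hat\lambda^*_j D'_j(\bar\lambda^*_j) \bigr] + o(\epsilon). \]
Substituting the expression for the Pigouvian taxes $c_j$ and $c_k$ from (7) in the above, we get
\[ U(\lambda^{\beta,\delta}_1, \ldots, \lambda^{\beta,\delta}_N) - U(\lambda^*_1, \ldots, \lambda^*_N) = \epsilon \Bigl[ \bigl( c_k + \beta D_k(\bar\lambda^*_k) \bigr) - \bigl( c_j + \beta D_j(\bar\lambda^*_j) \bigr) \Bigr] + (\hat\mu - \beta\epsilon) \bigl[ D_k(\bar\lambda^*_k) - D_j(\bar\lambda^*_j) \bigr] + o(\epsilon). \]
If we let $\delta$ decrease to zero, then so does $\epsilon$, and the last two terms in the expression above are negligible compared to the first. Hence, it follows from the above and (10) that $U(\lambda^{\beta,\delta}_1, \ldots, \lambda^{\beta,\delta}_N) - U(\lambda^*_1, \ldots, \lambda^*_N) < 0$ for $\delta$ sufficiently small. This contradicts the assumed optimality of $(\lambda^*_1, \ldots, \lambda^*_N)$. We have thus shown by contradiction that the conditions, (5), for a Wardrop equilibrium must be satisfied at a socially optimal allocation when the admission prices are given by Pigouvian taxes.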
Denoting the Hessian by [D 2 U ], we consider the quadratic form where we have used the fact that x 1 = −x 3 and x 2 = −x 4 on the subspace of interest to obtain the second equality. Now, it is is clear that the expression above can be made negative by choosing x 1 and x 2 non-zero and of opposite signs, and x 1 sufficiently small in absolute value.
In other words, the quadratic form is not always nonnegative, i.e., the Hessian is not positive semi-definite on the subspace of interest. Therefore, the objective function U is not convex on the feasible set.
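The failure of convexity can also be checked numerically on a small instance. The following Python sketch uses a hypothetical example that is not taken from the paper: two customer classes with assumed delay sensitivities 1 and 3 and unit mass each, and two servers with assumed linear delays D_0(x) = x and D_1(x) = 1 + x. The social cost is assumed to be the sensitivity-weighted total delay. The sketch compares the cost at the midpoint of two feasible allocations with the average of the costs at the endpoints; for a convex function the midpoint value could not be larger.

```python
# Hypothetical instance (not from the paper): two classes, two servers.
BETAS = (1.0, 3.0)     # assumed delay sensitivities
MASSES = (1.0, 1.0)    # assumed arrival mass per class


def social_cost(p, q):
    """Cost when mass p of class 0 and mass q of class 1 use server 0."""
    load0 = p + q
    load1 = (MASSES[0] - p) + (MASSES[1] - q)
    d0 = load0            # assumed delay D_0(x) = x
    d1 = 1.0 + load1      # assumed delay D_1(x) = 1 + x
    weighted0 = BETAS[0] * p + BETAS[1] * q
    weighted1 = BETAS[0] * (MASSES[0] - p) + BETAS[1] * (MASSES[1] - q)
    return d0 * weighted0 + d1 * weighted1


# Two feasible allocations and their midpoint.
a = (1.0, 0.25)
b = (0.0, 0.75)
mid = (0.5 * (a[0] + b[0]), 0.5 * (a[1] + b[1]))

chord_average = 0.5 * (social_cost(*a) + social_cost(*b))
gap = social_cost(*mid) - chord_average
print("cost at midpoint:", social_cost(*mid))
print("average of endpoint costs:", chord_average)
print("midpoint minus average:", gap)
```

Here the midpoint cost is 6, while the endpoint costs are 6.125 and 5.625, so the midpoint exceeds the chord average by 0.125. A positive gap of this kind is incompatible with convexity of the cost on the feasible set, at least for this assumed instance.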
Suppose first that $\lambda^*_i > 0$ and that $D_i(\lambda^*_i) < D_j(\lambda^*_j)$. We shall show that shifting a small mass of customers from queue $j$ to queue $i$ and an equal mass from $i$ to $j$ reduces the social cost, contradicting the optimality of $\lambda^*$. Let $\mu_i$ and $\mu_j$ be measures such that It is clear from the assumptions that such measures exist. Since $\beta_j > \beta_i$, we also have $\mu_j > \mu_i$.
Consider the measures $\tilde{\lambda}$ defined as follows: Then, $\tilde{\lambda}_k = \lambda^*_k$ for all $k$, since equal masses are swapped between queues $i$ and $j$ while flows into all other queues are unchanged. Hence, the congestion costs $D_k$ at all queues remain unchanged. Thus, we get by assumption. But this contradicts the optimality of $\lambda^*$. Thus, we cannot have $D_i(\lambda^*_i) < D_j(\lambda^*_j)$ and $\lambda^*_i > 0$. Suppose next that $\lambda^*_i > 0$ and $D_i(\lambda^*_i) = D_j(\lambda^*_j)$. Let $\tilde{\lambda}$ be as above, and define , which implies that $(\lambda^\alpha_1, \ldots, \lambda^\alpha_N)$ solves the welfare optimization problem (4) for every $\alpha \in [0, 1]$. Now, for $\alpha \in (0, 1)$ and small enough $|\epsilon|$, define the measures $\nu^{\alpha,\epsilon}_k$, $k = 1, \ldots, N$, by If $|\epsilon|$ is sufficiently small, depending on $\alpha$, then these are nonnegative measures. We now have For $U(\lambda^\alpha)$ to be a global minimum, the coefficient of $\epsilon$ in the above expression must be zero. Thus, But $\lambda^\alpha_k = \lambda^*_k$ for all $\alpha \in [0, 1]$ and $k = 1, \ldots, N$. Combining this with the fact that $D_i(\lambda^*_i) = D_j(\lambda^*_j)$ by assumption, we can rewrite the last equation as Now, $\lambda^\alpha_i$ is strictly increasing in $\alpha$ and $\lambda^\alpha_j$ is strictly decreasing, as $\lambda^\alpha$ is obtained by swapping a volume of more delay-sensitive traffic in queue $j$ for an equal volume of less delay-sensitive traffic in queue $i$, and these volumes are increasing in $\alpha$. Moreover, $D'_i(\lambda^*_i)$ and $D'_j(\lambda^*_j)$ are strictly positive, and hence non-zero, by assumption. It follows that (11) cannot hold for all $\alpha \in (0, 1)$, or even for two distinct values of $\alpha$.
Thus, we have shown by contradiction that we cannot have $\lambda^*_i > 0$ and $D_i(\lambda^*_i) = D_j(\lambda^*_j)$. It only remains to consider the possibility that $\lambda^*_i = 0$. Let $\mu_j$ be as above. Fix $\epsilon > 0$ sufficiently small, and define the measures $\nu^\epsilon$ as follows: j (λ * j ) > 0, the above quantity is negative, contradicting the optimality of $\lambda^*$, unless $D_i(0) > D_j(\lambda^*_j)$. This completes the proof of the theorem.
Proof of Theorem 3: Suppose $\lambda^W = (\lambda^W_1, \ldots, \lambda^W_N)$ satisfies the conditions in (5). Suppose $i$ and $j$ are distinct queues and $\beta_2 > \beta_1 \ge 0$ are such that Pick $\beta \in \mathrm{supp}(\lambda^W_i)$ with $\beta \le \beta_1$, and $\gamma \in \mathrm{supp}(\lambda^W_j)$ with $\gamma \ge \beta_2$. We have by (5) that It follows from these inequalities that . Substituting this in (12), we obtain that $c_i \le c_j$. As it was assumed that admission prices are all distinct, we have $c_i < c_j$, as claimed.

IV. SUMMARY AND DISCUSSION
We considered a very general model of multiple parallel queues serving a heterogeneous customer population, and studied the problem of routing customers to queues so as to maximize social welfare. We characterized certain structural properties of the welfare-optimizing allocation. We also considered selfish routing decisions made by individual customers when the queues charge admission prices, and characterized the structure of Wardrop equilibria. Finally, we showed that, if the admission prices at the queues are set equal to the congestion externalities at a socially optimal allocation, then the social optimum coincides with a Wardrop equilibrium.
The setting we studied was very general, and encompassed a variety of applications with congestion externalities. Nevertheless, some of the assumptions are restrictive. We model customer heterogeneity by applying different multipliers to a common measure of congestion cost at each queue. But it might be the case that some customers care about mean delay, while others care about the probability of exceeding a certain threshold. In that case, no multiplier on the congestion cost would be appropriate for capturing this diversity. Another restrictive assumption is that customers may differ in delay sensitivity, but not in the distribution of the workload they bring into the system. Indeed, this is why Pigouvian tolls depend on the queue, but not on the customer class. If this assumption were relaxed, the externality imposed by a customer would depend on its workload, and hence on its class; this would need to be taken into account in setting Pigouvian tolls.
We briefly discussed the difficulty of determining the optimal allocation. We showed that the optimization problem is non-convex, but did not prove that it is hard. The structural properties of the optimal allocation that we established do not resolve this question, as the optimal ordering of the queues is unknown. Even if the optimal ordering were given, it is not entirely obvious that the thresholds can be computed efficiently. Likewise, the computational complexity of determining the Wardrop equilibria is also unknown. Note that the ordering of queues in this case is determined by the given prices. Thus, one open problem for future research is developing efficient algorithms for these problems, or proving that they are hard.
A second question concerns the informational constraints on the model. We have assumed that the arrival intensity measure is known, and available as input to determining a socially optimal allocation or setting admission prices. In practice, this information is unlikely to be available, but needs to be inferred from observation. If a customer's delay sensitivity is revealed upon arrival, then the arrival distribution can easily be measured. But eliciting delay sensitivities truthfully can be a challenge in practice. It is an open question whether it is still possible to set admission prices in such a way as to ensure that the Wardrop equilibrium either coincides with the welfare optimizing allocation, or approximates it to within some factor.
Finally, we have assumed that a benevolent mechanism designer sets admission prices to maximize social welfare; it is interesting to ask what happens if the admission prices are set by a revenue maximizing service provider. Further, in such a revenue maximizing scenario it would be interesting to see if competing service providers can sustain differentiated services.