1 Introduction

Priority queues are prevalent in service operations. For example, theme parks and ski resorts often allow customers to purchase premium tickets to join express lines. Everything Everywhere (EE), a leading telecommunications company in the UK, once offered “Priority Answer” that enabled customers to pay £0.50 to jump the queue for a service call. E-commerce platforms and shipping companies charge higher prices for faster deliveries. The US Citizenship and Immigration Services (USCIS) expedites case processing for an extra fee. The celebrated \(c\mu \) rule establishes that giving priorities to customers with higher waiting costs and shorter processing times minimizes the total holding costs. Therefore, when done right, priority scheduling can increase revenue and social welfare. However, customers are delay- and price-sensitive. Offering priorities will impact customers’ self-interested joining behavior, and when customers self-select into priorities, service providers must ensure that customers choose the “right” priority by charging them the “right” priority prices or, more generally, by designing the “right” priority mechanisms. This chapter reviews the theoretical literature on priority queues of self-interested customers, with a focus on pay-for-priority schemes.

Earlier research in this space typically assumes that priorities can be assigned to different classes of customers based on some publicly known attributes, but customers in each class can freely decide whether to join the queue based on the price and expected wait time of the class. Hence, the focus is on the pricing of a multi-class priority queue and control of arrival rates (see, e.g., Chapter 4 of Stidham 2009). The assumption of assigning priorities according to known customer attributes still fits various practical applications today. For example, in COVID-19 testing, priorities can be based on symptoms (Yang et al. 2022); in government services, priorities can be given to those who travel from afar (such distance-based priorities are studied in Wang et al. 2022a and will be reviewed in Chap. 6).

In many other applications, however, such customer attributes are unavailable, and priorities must be self-selected. This calls for a pay-for-priority scheme in which customers who pay a higher price receive a higher priority. We review the literature on pay-for-priority schemes in this chapter. In particular, Sect. 2 reviews papers that assume an unobservable queue. The vast majority of the literature adopts this assumption, partly because of its tractability. It is also assumed that customers differ in their delay sensitivity, on the basis of which they choose their priority level. This literature can be further divided into two streams in terms of implementation: one on priority pricing and the other on priority auctions. The commonly investigated performance measures are revenue and social welfare. Section 3 reviews papers that assume an observable queue. Unlike its unobservable-queue counterpart, this observable-queue literature typically assumes that customers have the same delay sensitivity (for tractability) and decide whether to purchase priority based on the ever-evolving real-time queue length. Section 4 concludes the chapter by highlighting new research trends in this area of rational priority queueing.

2 Unobservable Queues

2.1 Priority Pricing

This literature studies the following mechanism: the service provider posts a menu of prices and expected wait times; a higher price is associated with a shortened expected wait time (and therefore a higher priority class). Each arriving customer decides whether to join the queue and, if so, which price to pay (and therefore which priority class to join).
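The menu mechanism above can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of the surveyed papers' general setup: a two-class M/M/1 queue with non-preemptive priorities and a common service rate, for which the expected queueing delays follow Cobham's classical formula, and a customer whose utility is linear in price and delay. The function names `priority_waits` and `best_option` are hypothetical and chosen for illustration only.

```python
def priority_waits(lam_high, lam_low, mu):
    """Expected queueing delays (excluding service) of the high- and
    low-priority classes in a non-preemptive M/M/1 priority queue.

    Assumption: Poisson arrivals at rates lam_high and lam_low, and
    exponential service at a common rate mu (Cobham's formula).
    """
    rho_high = lam_high / mu
    rho = (lam_high + lam_low) / mu
    if rho >= 1:
        raise ValueError("total load must be below 1 for stability")
    w0 = rho / mu  # expected residual service time seen by an arrival
    wq_high = w0 / (1 - rho_high)
    wq_low = w0 / ((1 - rho_high) * (1 - rho))
    return wq_high, wq_low


def best_option(value, delay_cost, menu):
    """Return (option name, utility) for the utility-maximizing choice.

    menu is a list of (name, price, expected_wait) tuples. Assumed
    utility: value - price - delay_cost * expected_wait. The customer
    balks (utility 0) if no menu item yields positive utility.
    """
    best = ("balk", 0.0)
    for name, price, wait in menu:
        utility = value - price - delay_cost * wait
        if utility > best[1]:
            best = (name, utility)
    return best
```

With a menu built from `priority_waits`, a customer with a high `delay_cost` tends to pick the expensive, short-wait option, one with a low `delay_cost` tends to pick the cheap option, and one with a low `value` balks. In an equilibrium analysis the arrival rates themselves would depend on these choices; the sketch takes them as given.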

The seminal paper of Mendelson and Whang (1990) studies the incentive-compatible socially optimal priority pricing scheme. Their paper employs a model of a discrete set of customer classes. Within each class, customers have the same delay sensitivity and the same expected service requirement, but still differ in service valuation (drawn from a continuous distribution). Mendelson and Whang (1990) show that if all customers have the same expected service requirement, the socially optimal price charged to each class should be equal to the externalities a customer imposes on others, and it is also incentive-compatible in the sense that customers prefer the price and wait time of their own class to those of other classes. The socially optimal pricing scheme implements the \(c \mu \) rule, with customers with higher delay sensitivity being charged a higher price and receiving a higher priority.
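To make the \(c\mu \) rule concrete, the following Python sketch orders classes by the product of delay cost and service rate and evaluates the resulting holding-cost rate. It is an illustration under assumptions of my own choosing rather than the model of Mendelson and Whang (1990): a non-preemptive M/M/1 priority queue in which each class has its own exponential service rate, with delays computed from Cobham's formula. The helper names and the numbers in the usage note are hypothetical.

```python
from itertools import permutations


def holding_cost_rate(classes, order):
    """Long-run holding cost per unit time under a strict priority order.

    classes maps a class name to (lam, mu, c): arrival rate, service
    rate, and delay cost per unit time. order lists class names from
    highest to lowest priority. Assumes a stable non-preemptive M/M/1
    priority queue (Cobham's formula).
    """
    # Expected residual service time: sum of lam_i * E[S_i^2] / 2.
    w0 = sum(lam / mu**2 for lam, mu, _ in classes.values())
    sigma_prev = 0.0  # load of strictly higher-priority classes
    total = 0.0
    for name in order:
        lam, mu, c = classes[name]
        sigma = sigma_prev + lam / mu
        if sigma >= 1:
            raise ValueError("total load must be below 1 for stability")
        wq = w0 / ((1 - sigma_prev) * (1 - sigma))
        total += c * lam * (wq + 1 / mu)  # cost accrues over the sojourn
        sigma_prev = sigma
    return total


def c_mu_order(classes):
    """Priority order by decreasing c * mu (the c-mu index)."""
    return tuple(sorted(classes,
                        key=lambda n: classes[n][2] * classes[n][1],
                        reverse=True))


def best_order_by_enumeration(classes):
    """Brute-force check: the cheapest of all strict priority orders."""
    return min(permutations(classes),
               key=lambda order: holding_cost_rate(classes, order))
```

For instance, with `classes = {"A": (0.2, 1.0, 3.0), "B": (0.3, 2.0, 1.0), "C": (0.1, 0.5, 5.0)}`, the indices \(c\mu \) are 3, 2, and 2.5, so `c_mu_order` ranks A first, then C, then B, and `best_order_by_enumeration` selects the same order.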

In a similar model, Afèche (2013) studies the priority pricing problem from a revenue-maximizing service provider’s perspective. Instead of prespecifying a particular scheduling policy such as strict priority, Afèche (2013) applies the achievable region method due to Coffman and Mitrani (1980) and casts the revenue-maximization problem into a mechanism design framework. The paper shows that the service provider should sometimes artificially inflate the waiting time of low-priority customers in the optimal mechanism to stimulate demand for high priority. This strategy is referred to as “strategic delay.”

Further, Afèche and Pavlin (2016) study a model in which customer-type rankings are lead-time dependent and find that the optimal mechanism not only may involve strategic delay but also pricing out the middle and pooling some customers into a single FIFO (first-in-first-out) class despite their differences in delay sensitivity.

Gavirneni and Kulkarni (2016) study how a continuum of customers self-select into priority when two priority classes are available. Nazerzadeh and Randhawa (2018) further show that in this setting, a coarse service grade of granting only two priority classes is asymptotically optimal for a revenue-maximizing service provider. Their model assumes that customers’ unit waiting cost depends on their service valuation. Gurvich et al. (2019) compare how revenue maximization differs from social-welfare maximization. They find that in the asymptotic regime, the two objective functions will not lead to any significant difference in coverage (i.e., the total throughput is similar across the two) or coarseness (i.e., in both cases, having two priority classes suffices), but classification (the proportion of customers who purchase priority) is markedly different. They also find that selling priority can reduce consumer surplus and make all customers worse off despite its ability to improve social welfare (as the service provider appropriates the welfare gain).

Wang and Fang (2022) consider the effect of customer awareness on priority queues, and they find that social welfare and customer surplus are both non-monotone in the level of customer awareness. In particular, full or no customer awareness can be suboptimal from a social-welfare standpoint. Wang and Wang (2021) study priority-purchasing behavior in retrial queues, and they show that the service provider’s revenue is bimodal in the priority price. Wang et al. (2022b) propose a pay-to-activate-service (PTAS) scheme in a vacation queue where customers can pay to instantaneously end the server’s vacation. By comparing it with the pay-for-priority scheme, they show that selling priority generates more revenue than PTAS when the system workload is high. Afèche et al. (2019) consider a pricing-and-prioritization problem when customers have heterogeneous demand rates. They show that prioritizing customers with higher demand rates may be revenue-maximizing even when customers are homogeneous in delay sensitivity.

While all the papers above study the case of a single service provider, a few papers study competition among service providers who practice priority pricing (Lederer and Li 1997; Allon and Federgruen 2009; Sainathan 2020). For a monopoly service provider, selling priority always improves revenue, and therefore, in principle, priority pricing will always be implemented (barring practical constraints). However, Sainathan (2020) shows that in a duopoly setting, priority pricing may not always arise in equilibrium even though it is a more desirable outcome for both service providers.

2.2 Priority Auctions

This literature studies the following mechanism: the service provider runs an auction in which each arriving customer submits a bid that she is willing to pay to join the queue, and a customer with a higher bid is served ahead of all customers with a lower bid. The early literature sometimes refers to priority auctions as queue bribery, as they enable customers to bribe the service provider in exchange for expedited service. While priority auctions sound rather different from priority pricing, they are equivalent from a mechanism design perspective, as both can be translated into a direct revelation mechanism that induces customers to truthfully report their types. However, models of priority auctions often assume a continuum of customer types and therefore a continuum of priorities, whereas models of priority pricing usually (and justifiably) consider a finite number of priority classes (even when customer types are continuous).

Kleinrock (1967) lays the foundation for the literature on priority auctions by deriving the expected waiting time expressions under continuous priorities. However, in his work, payment functions are exogenously given. Lui (1985) and Glazer and Hassin (1986) extend Kleinrock’s model by deriving customers’ endogenous bid functions in equilibrium. They show that the \(c\mu \) rule can be achieved in this priority auction. In particular, Lui (1985) shows that allowing bribery (i.e., priority auctions) can induce faster service as it motivates the service provider to increase capacity. Hassin (1995) shows that the bidding mechanism in the priority auction is self-regulating in that it achieves both the socially optimal service order and arrival rate. This is because customers’ bids factor in externalities, similar to the socially optimal priority prices chosen by Mendelson and Whang (1990). Hassin (1995) also shows that while running a priority auction can induce faster service, a revenue-maximizing service provider does not invest as much capacity as a social planner would. Afèche and Mendelson (2004) study priority auctions under a generalized delay cost structure that augments the standard additive model with a multiplicative component. They find that priority auctions perform better under multiplicative compared to additive delay costs. In the auction of Kittsteiner and Moldovanu (2005), customers possess private information on job processing time. They show how the convexity/concavity of the function expressing the costs of delay determines the queue discipline (i.e., shortest-processing-time-first (SPT), longest-processing-time-first (LPT)) arising in a bidding equilibrium.
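As a rough illustration of waiting times under continuous priorities, the sketch below evaluates the continuum limit of Cobham's formula for a non-preemptive M/M/1 queue: a customer whose bid sits at quantile \(q\) of the bid distribution waits behind a load \(\rho (1-q)\) of higher bidders. The M/M/1 and non-preemptive assumptions, the function names, and the parameterization by bid quantile are simplifications chosen here and are not taken from the papers reviewed above.

```python
def bid_wait(bid_quantile, lam, mu):
    """Expected queueing delay of a customer whose bid is at the given
    quantile of the bid distribution (1.0 = highest bid).

    Assumption: non-preemptive M/M/1 queue with arrival rate lam and
    service rate mu, served in decreasing order of bids.
    """
    if not 0.0 <= bid_quantile <= 1.0:
        raise ValueError("bid_quantile must lie in [0, 1]")
    rho = lam / mu
    if rho >= 1:
        raise ValueError("load must be below 1 for stability")
    w0 = rho / mu  # expected residual service time seen by an arrival
    higher_load = rho * (1.0 - bid_quantile)  # load from higher bidders
    return w0 / (1.0 - higher_load) ** 2


def mean_wait(lam, mu, steps=20000):
    """Average delay across all bidders (midpoint rule over quantiles)."""
    return sum(bid_wait((k + 0.5) / steps, lam, mu)
               for k in range(steps)) / steps
```

Under these assumptions the delay is decreasing in the bid quantile, and the average over all bidders coincides with the FIFO delay \(\rho /(\mu (1-\rho ))\): bidding reorders customers without changing the total amount of waiting.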

Unlike the literature above that studies a single service provider, Gao et al. (2019) study two competing firms, one running a priority auction and the other charging a fixed price (this setup contrasts with that of Sainathan 2020, who allows each firm to choose which scheme to adopt). Gao et al. (2019) show that in equilibrium, customers with either high or low waiting costs seek service from the priority-auction firm, whereas those with intermediate waiting costs choose the fixed-price firm. They also find that the priority-auction firm can be inherently favored in such a competition.

3 Observable Queues

This literature typically assumes that customers have the same waiting cost. Upon arrival, customers observe the queue length and decide whether to purchase priority (and more generally, which priority class to purchase); they may also decide whether to join the queue or balk.

Adiri and Yechiali (1974) study such a model and identify a threshold equilibrium whereby customers will purchase priority if and only if the total number of customers in the system is above a certain threshold. Hassin and Haviv (1997) extend Adiri and Yechiali (1974) to a model that incorporates mixed strategies. The authors show that multiple equilibria may exist. This is driven by the “follow the crowd” (FTC) behavior, i.e., a customer is more likely to purchase priority if more other customers purchase priority. Purchasing priority has dual purposes: it not only helps a buyer overtake non-buyers but also prevents a customer from being overtaken by future buyers. It is the latter incentive that induces the FTC behavior.

Alperstein (1988) identifies the optimal number of priority classes and the set of prices for a revenue-maximizing service provider. The paper shows that under full optimality, an arriving customer chooses the lowest priority price that is higher than any price chosen by existing customers, and balks if the queue length reaches a threshold. As a result, while the model starts out with the FIFO discipline, full optimality results in a pure LIFO (last-in-first-out) discipline, which achieves the maximum social welfare (as shown by Hassin 1985). This result echoes Hassin (1995) and demonstrates that pay-for-priority can regulate the queue by letting customers internalize their externalities.

Wang et al. (2019) compare the revenue of a pay-for-priority queue with balking in the observable and unobservable settings. The authors find that the service provider is better off in the observable setting when the system load is either low or high, but benefits from withholding queue information when the system load is intermediate. Note that when the service provider in an unobservable queue can also charge an admission fee in addition to the priority price, then with homogeneous customers, priority pricing yields the same maximum revenue as FIFO pricing and achieves social optimality. In this case, the service provider entirely captures social welfare, leaving customers with zero surplus. On the other hand, in an observable queue, priority pricing attains social optimality when the number of priority classes is optimized, in which case the service provider again extracts all surplus (Alperstein 1988). Since the maximum social welfare is higher when the queue is observable (as shown by Hassin 1986), this implies that the globally optimal strategy for the service provider is to disclose the queue length and practice optimal priority pricing.

The literature has also studied alternative mechanisms. Erlichman and Hassin (2015) analyze a strategic-overtaking scheme in which customers observe the queue length and have the option of overtaking some of the customers already present in the queue by paying a fixed amount per overtaken customer. They show that implementing strategic overtaking can be more profitable for the service provider than selling priority. While all the papers above require customers to make a priority choice upon arrival, Wang et al. (2021) let customers pay and upgrade to priority at any time during their stay in the queue, even if they choose not to do so initially. Wang et al. (2021) will be reviewed in more detail in Chap. 7.

4 Emerging Research Directions

We see two emerging trends in the research on rational priority queueing. The first trend is an increasing integration of queueing models with (sophisticated) economic tools, such as mechanism design/auction theory (Afèche 2013; Afèche and Pavlin 2016; Yang et al. 2017), cheap talk/information design (Yu et al. 2018), and dynamic games (Wang et al. 2021). These methodologies lend themselves to the setting of priority queues, which are often characterized by customers having private information about their priority preferences (i.e., waiting cost) and the service provider having private information about the dynamically evolving queue length.

Second, new applications, often driven by technological advances and societal concerns, also open up abundant new opportunities in this area of research. For example, Baron et al. (2022) show that prioritizing offline orders in omnichannel services eliminates channel interference and improves social welfare. Hu et al. (2022) explore privacy regulation when the service provider can use customers’ disclosed information to offer different priorities. The phenomena of decentralized priority provisioning, whereby priority is not sold by the service provider but emerges as a result of peer-to-peer or third-party transactions, are seen in trading auctions (Yang et al. 2017), line sitting (Cui et al. 2020), and queue scalping (Yang et al. 2021). These papers will be reviewed in more detail in Chaps. 2–4 of this book. Finally, referral priority programs, where priority is obtained not from purchase but by exerting referral effort, are increasingly used by technology startups (Yang and Debo 2019; Yang 2021) and will be reviewed in Chap. 5.