1 Introduction

During the past decade, bi-level programming problems have received considerable attention in decentralized planning problems whose decision process has a hierarchical structure. They differ from classical optimization problems in that two levels of optimization tasks must be solved, namely the upper level (the leader problem) and the lower level (the follower problem). Two lines of research have dominated the study of bi-level programming. The first dates back to the Stackelberg game [1], a well-known model in game theory. The second stems from mathematical programming, in which one optimization problem appears as a constraint of another optimization problem [2]. Besides these papers, a variety of literature has contributed to this topic, from theoretical aspects to computational ones; see [3–6] and the references therein.

Many practical applications can be reformulated as bi-level programming problems; see, for instance, [7–9]. Despite their extensive use in practice, bi-level programming problems remain difficult to solve, since they require considerable computation time even for small problems [5, 10]. If the lower-level problem is convex, it is commonly replaced by its Karush-Kuhn-Tucker (KKT) optimality conditions, so that the whole model becomes a mathematical program with complementarity constraints (MPCC). Because of the complementarity structure, such a problem does not satisfy the Mangasarian-Fromovitz constraint qualification (MFCQ), so Lagrange multipliers for its KKT optimality conditions may fail to exist [11]. As a result, many state-of-the-art algorithms for nonlinear programming cannot be applied to the MPCC directly. Instead, various methods from different branches of numerical optimization have been proposed for bi-level programming problems, such as penalty-type approaches [12], the branch-and-bound approach [13], the Taylor approach [14], and the neural network approach [15]. A comprehensive review of the use of standard nonlinear programming (NLP) methods for solving MPCCs can be found in Fletcher and Leyffer [16].

In this paper, we consider the combined road toll pricing and capacity expansion problem, which is formulated as a bi-level programming problem [17]. The upper level, regarded as the leader, aims to minimize the total system travel time/cost. The lower level, regarded as the follower, determines the individual users' travel times subject to user equilibrium conditions. Because our lower-level problem is convex, the whole model can be transformed into an MPCC and further converted into a nonlinear programming (NLP) problem. By combining the idea of smoothing methods with techniques from variational analysis, we propose a perturbation-based approach that relaxes the difficulty of solving the MPCC. In the resulting perturbed NLP problem, the objective function and constraints are differentiable, and the linear independence constraint qualification (LICQ) holds at every feasible point. Consequently, a sequential quadratic programming (SQP) solver in MATLAB is adopted to solve the smoothing subproblems. The numerical results show that, compared with other solvers, the smoothing approach is efficient for solving the combined road toll pricing and capacity expansion problem.

This paper is organized as follows. Section 2 establishes a bi-level optimization model for the capacity expansion problem with a road toll pricing strategy under user equilibrium conditions. Section 3 presents the proposed perturbation approach for solving the MPCC formulation of the combined pricing and capacity expansion problem. In Section 4, we report numerical results of the proposed method for capacity expansion problems of different scales. Some concluding remarks are presented in the last section.

2 Bi-level programming problems

A bi-level program involves two competing decision-making parties acting at different levels: the upper-level decision maker (the leader) and the lower-level decision maker (the follower). Although the two levels interact, each has its own decision variables and objective and attempts to optimize its goal in sequence, with the leader deciding first and the follower responding. The leader can therefore adjust the performance of the overall system by setting parameters that influence the follower's decisions, which in our application are the route choices of road users.

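A general bi-level programming problem can be formulated as follows, where, for the upper-level decision x, the lower-level variable y is an optimal solution of problem (LP):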

$$\begin{aligned} (\mathrm{UP})&\quad \begin{aligned} &\min _{x\in{X}} F(x,y) \\ &\quad \mbox{s.t. } G(x,y)\leq0, \end{aligned} \end{aligned}$$
(1)
$$\begin{aligned} (\mathrm{LP})&\quad \begin{aligned} &\min _{y\in{Y}} f(x,y) \\ &\quad \mbox{s.t. } g(x,y)\leq0. \end{aligned} \end{aligned}$$
(2)
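The following Python sketch makes the nested structure of (1) and (2) concrete. It solves a deliberately tiny toll-setting instance by enumeration. The leader chooses a toll on one of two routes, and the follower's user-equilibrium response is available in closed form. The two-route network, the linear cost functions, the demand of 2, and the names `follower_response` and `leader_objective` are illustrative assumptions, not part of the model developed below. Enumeration is viable only because the leader has a single scalar decision.

```python
# Toy leader-follower (Stackelberg) instance of (1)-(2); all data are assumed.
DEMAND = 2.0  # fixed O-D demand split over two routes


def route_times(x1, x2):
    """Untolled travel times on route 1 and route 2 (hypothetical linear costs)."""
    return 1.0 + x1, 2.0 + 0.5 * x2


def follower_response(toll):
    """Lower level: user-equilibrium flows when route 1 carries the given toll.

    Equal generalized costs 1 + x1 + toll = 2 + 0.5 * (DEMAND - x1) give
    x1 = (1 + 0.5 * DEMAND - toll) / 1.5; clipping to [0, DEMAND] covers
    the case in which one route is unused.
    """
    x1 = (1.0 + 0.5 * DEMAND - toll) / 1.5
    x1 = min(max(x1, 0.0), DEMAND)
    return x1, DEMAND - x1


def leader_objective(toll):
    """Upper level: total travel time at the follower's response (tolls are transfers)."""
    x1, x2 = follower_response(toll)
    t1, t2 = route_times(x1, x2)
    return x1 * t1 + x2 * t2


def best_toll(max_toll=3.0, steps=300):
    """Enumerate tolls on a grid; each evaluation re-solves the lower level."""
    candidates = [max_toll * i / steps for i in range(steps + 1)]
    return min(candidates, key=leader_objective)


toll = best_toll()
print(toll, leader_objective(toll))  # prints 0.5 4.5
```

For this instance, the best toll on the grid is 0.5, which gives a total travel time of 4.5. The untolled equilibrium gives 14/3 ≈ 4.67, so the toll shifts the route split from (4/3, 2/3) to (1, 1). Excluding tolls from the leader's objective, on the grounds that they are transfers, is a modelling choice of this sketch.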

Recently, many problems in the transportation literature have been formulated as bi-level programs, particularly discrete network design problems [17, 18]. Gao et al. [19] introduced a traditional bi-level programming model for the discrete network design problem, together with a new solution algorithm, to analyze the relationship between the improved flows and the newly added links in an existing urban network. Their numerical results showed that the proposed algorithm produced better solutions and performed efficiently in practice. Marcotte [20] conducted an extensive study of a continuous, nonlinear network design problem formulated as a bi-level programming problem and found that heuristics can produce near-optimal solutions. Suh and Kim [21] presented specific issues in solving a bi-level transportation planning model with public-private interaction. They also discussed issues in solving large bi-level programming problems, which contribute to building the normative theory needed for resource allocation in a mixed economy.

In addition, Friesz et al. [22] studied the application of bi-level programming to the network design problem formulated as a nonlinear problem, in which the lower-level problem was replaced by an equivalent variational inequality problem. LeBlanc and Boyce [23] investigated a nonlinear bi-level network design problem with the user equilibrium route choice problem as the lower level. Apart from these references, many researchers have formulated second-best toll pricing as a bi-level programming problem or a mathematical program with equilibrium constraints (MPEC) [9, 24]. In these references, the upper level represents the leaders/managers, who decide where to add new links, how to time signals, and how much to charge road users. The lower level models the individual route choices of road users under user equilibrium conditions [25, 26] in response to these controls. A bi-level model provides a flexible platform that captures both the upper-level and the lower-level problems and seeks an optimal solution for both simultaneously. Even so, such problems are difficult to solve because most of them are nonlinear and nonconvex.

One advantage of a bi-level programming problem with a convex lower level is that, under a mild constraint qualification, the lower-level problem can be replaced by its KKT optimality conditions to obtain an equivalent single-level mathematical programming problem. Although bi-level programming has been used in various applications, one of the essential conditions for applying it to design problems is the availability of efficient algorithms. For transportation road networks, various algorithms have been proposed for solving bi-level programming problems, such as simulated annealing [27], genetic algorithms [28], and ant colony algorithms [29]. These algorithms have also succeeded in solving other classes of network design problems [30–32] and [33]. However, in traffic assignment problems, even when the upper-level and lower-level problems are both convex, the resulting bi-level program itself may be nonconvex [34]. For this reason, most of the approaches proposed so far become impractical as the problem size grows. It is therefore important to develop theoretical and methodological tools for handling such bi-level problems efficiently. In this study, we adopt a smoothing approach based on the Fischer-Burmeister (F-B) function to solve a combined road toll pricing and capacity expansion problem.

2.1 Mathematical formulation

Consider a road network \(G= (N, A) \) with node set N and link set A. Let r and s denote an origin and a destination of the network, respectively. An origin-destination (O-D) pair (r, s) is denoted by w, and W is the set of all O-D pairs. Each O-D pair w is connected by a set of paths (routes) \(K_{w}\). Let \(q_{w}\) and \(u_{w}\) be the demand and the minimum travel time/cost between O-D pair w, respectively. The flow and travel time/cost on link a are denoted by \(x_{a}\) and \(t_{a}\), respectively. Similarly, \(f^{w}_{k}\) and \(c^{w}_{k}\) are the flow and travel time/cost experienced by travelers along path \(k\in K_{w}\). Finally, \(\delta^{w}_{a, k} = 1\) if link a is part of path k connecting O-D pair w, and \(\delta^{w}_{a, k} = 0\) otherwise.

We formulate the combined road toll pricing and capacity expansion problem as a bi-level programming problem under budget constraints.

Here is a list of additional notations used in this paper:

  • \(q_{w}\) is the travel demand between O-D pair w,

  • \(D_{w}(u_{w})\) is the demand function between O-D pair w,

  • \(D^{-1}_{w}(q_{w})\) is the inverse of the demand function, where \(D^{-1}_{w}(q_{w})=h_{w}(q_{w})\), \(\forall{w\in{W}}\),

  • \(t_{a}(x_{a},y_{a})\) is the unit cost of travel on link a, and t denotes the vector of \(t_{a}(x_{a},y_{a})\), \(\forall{a\in{A}}\),

  • \(c_{a}\) is the existing capacity of link a, \(\forall{a\in{A}}\),

  • \({\bar{y}_{a}}\) is the upper bound for the link capacity expansion, \(\forall{a\in{A}}\),

  • \(g_{a}(y_{a})\) is the cost of improving link a,

  • \(g_{a}\) is a twice continuously differentiable and nondecreasing function,

  • F is the upper-level objective function,

  • f is the lower-level objective function,

  • \(y_{a}\) is the capacity expansion of link a, \(\forall{a\in{A}}\),

  • \(\tau_{a}\) is the toll charged on link a, \(\forall{a\in{A}}\),

  • θ is the conversion coefficient converting investment cost to travel cost,

  • \(\delta_{\overline{K}}(\overline{E}(z, u,v))\) is the indicator function of the set \(\overline{K}\), evaluated at \(\overline{E}(z, u,v)\),

  • \(\mathbf{B}_{\delta}(\bar{z}, \bar{u},\bar{v})\) is the open ball with center \((\bar{z},\bar{u},\bar{v})\) and radius δ.

The bi-level model consists of two problems, the leader (upper-level) problem and the follower (lower-level) problem, which can be written as follows.

The leader problem

$$\begin{aligned}& \max_{\tau, y}F(x,y)= \sum_{w\in{W}}\int^{q_{w}(\tau)}_{0}D^{-1}_{w}(s) \, ds - \sum_{a\in{A}}t_{a} \bigl(x_{a}(\tau)\bigr)x_{a}(\tau) -\theta \sum _{a\in{A}}{g_{a}(y_{a})} \\& \quad \mbox{s.t. } \tau_{a}\geq0, \quad \forall a\in{A}, \\& \hphantom{\quad \mbox{s.t. }}x=x(y), \end{aligned}$$
(3)

where the link flow pattern \(x=x(y)\) is determined by solving the following network equilibrium problem.

The follower problem

$$\begin{aligned}& \min_{x,q} f= \sum _{a\in{A}}\int^{x_{a}}_{0}t_{a}(s, \tau_{a})\, ds-\sum_{w\in{W}}\int ^{q_{w}}_{0}D^{-1}_{w}(s)\, ds \\& \quad \mbox{s.t. } \sum_{k\in{K_{w}}}{f^{w}_{k}}= q_{w},\quad \forall w\in{W}, \\& \hphantom{\quad \mbox{s.t. }}f^{w}_{k}\geq0,\quad \forall w\in{W}, \forall k\in{K_{w}}, \\& \hphantom{\quad \mbox{s.t. }}q_{w}\geq0, \quad \forall w\in{W}, \\& \hphantom{\quad \mbox{s.t. }}x_{a}= \sum_{w\in{W}}\sum _{k\in{K_{w}}}f^{w}_{k}\delta ^{w}_{a,k}\leq{c_{a}+y_{a}}, \quad \forall a\in{A}, \end{aligned}$$
(4)

where \(t_{a}(s,\tau_{a})=t_{a}(s)+\tau_{a}\), \(a\in{A}\).

The lower level of the above bi-level programming problem is a convex optimization problem with linear constraints. It is therefore equivalent to its Karush-Kuhn-Tucker (KKT) optimality conditions, which read as follows:

$$\begin{aligned}& f^{w}_{k}\biggl(c^{w}_{k}+ \sum_{a\in{A}}(\lambda_{a}+\tau_{a})\delta^{w}_{a,k}-u_{w}\biggr)=0, \quad \forall w\in{W}, \forall k\in{K_{w}}, \\& c^{w}_{k}+ \sum_{a\in{A}}(\lambda_{a}+\tau_{a})\delta^{w}_{a,k}-u_{w}\geq0, \quad \forall w\in{W}, \forall k\in{K_{w}}, \\& q_{w}\bigl(u_{w}-D^{-1}_{w}(q_{w}) \bigr)=0, \quad \forall w\in{W}, \\& u_{w}-D^{-1}_{w}(q_{w})\geq0,\quad \forall w\in{W}, \\& \lambda_{a} \biggl(c_{a}+y_{a}- \sum_{w\in{W}} \sum_{k\in{K_{w}}}f^{w}_{k}\delta^{w}_{a,k} \biggr)=0, \quad \forall a\in{A}, \\& c_{a}+y_{a}- \sum_{w\in{W}} \sum_{k\in{K_{w}}}f^{w}_{k}\delta^{w}_{a,k}\geq0, \quad \forall a\in{A}, \\& q_{w}- \sum_{k\in{K_{w}}}f^{w}_{k}=0, \quad \forall w\in{W}, \\& q_{w}\geq0,\quad \forall w\in{W}, \\& f^{w}_{k}\geq0,\quad \forall w\in{W}, \forall k\in{K_{w}}, \\& \lambda_{a}\geq0, \quad \forall a\in{A}, \\& u_{w}\geq0,\quad \forall w\in{W}, \end{aligned}$$
(5)

which can be written in compact form as follows:

$$ \begin{aligned} &0\leq f^{w}_{k}\perp \biggl(c^{w}_{k}+\sum_{a\in{A}}(\lambda_{a}+\tau_{a})\delta^{w}_{a,k}-u_{w} \biggr)\geq0, \quad \forall w\in{W}, \forall k\in{K_{w}}, \\ &0\leq q_{w}\perp \bigl(u_{w}-D^{-1}_{w}(q_{w}) \bigr)\geq0, \quad \forall w\in{W}, \\ &0\leq \lambda_{a}\perp \biggl(c_{a}+y_{a}- \sum_{w\in{W}} \sum_{k\in{K_{w}}}f^{w}_{k}\delta^{w}_{a,k} \biggr)\geq0,\quad \forall a\in{A}, \\ &q_{w}- \sum_{k\in{K_{w}}}f^{w}_{k}=0, \quad \forall w\in{W}, \\ &q_{w}\geq0, \quad \forall w\in{W}, \\ &f^{w}_{k}\geq0, \quad \forall w\in{W}, \forall k\in{K_{w}}. \end{aligned} $$
(6)
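The conditions (6) can be checked numerically on a small instance. The Python sketch below computes the lower-level equilibrium for an assumed network with one O-D pair served by two parallel single-link paths. It uses link times of the form used later in (22), without expansion, and a linear inverse demand. The sketch finds the minimum cost \(u_{w}\) by bisection and then evaluates each complementarity product in (6). All parameter values are hypothetical. The capacity constraints are assumed to be slack, so \(\lambda_{a}=0\), and the sketch reports the capacity slack so that this assumption can be checked.

```python
# Lower-level equilibrium check for conditions (6) on a hypothetical
# one-O-D, two-path network (each path is a single link).
# Assumed data; capacity constraints are assumed slack, so lambda_a = 0.
A = [1.0, 1.2]          # free-flow times A_a
B = [1.0, 1.0]          # link constants B_a
CAP = [10.0, 10.0]      # c_a + y_a with y_a = 0
TOLL = [0.1, 0.0]       # tolls tau_a
PHI, ALPHA = 3.0, 0.2   # inverse demand D^{-1}(q) = PHI - ALPHA * q


def link_time(a, x):
    """Untolled link time A_a + B_a * (x / cap_a)^4."""
    return A[a] + B[a] * (x / CAP[a]) ** 4


def inverse_demand(q):
    return PHI - ALPHA * q


def path_flow(a, u):
    """Flow that makes the generalized cost of path a equal to u (0 if unused)."""
    slack = u - A[a] - TOLL[a]
    if slack <= 0.0:
        return 0.0
    return CAP[a] * (slack / B[a]) ** 0.25


def solve_equilibrium(iterations=200):
    """Bisection on u: path supply increases in u, demand (PHI - u)/ALPHA decreases."""
    lo = min(A[a] + TOLL[a] for a in range(2))   # no path is used below this cost
    hi = PHI                                     # demand is zero at this cost
    for _ in range(iterations):
        u = 0.5 * (lo + hi)
        supply = sum(path_flow(a, u) for a in range(2))
        demand = max(0.0, (PHI - u) / ALPHA)
        if supply < demand:
            lo = u
        else:
            hi = u
    u = 0.5 * (lo + hi)
    f = [path_flow(a, u) for a in range(2)]
    return f, sum(f), u   # path flows f_k, demand q_w, minimum cost u_w


def complementarity_report(f, q, u):
    """Residuals of the complementarity conditions (6) with lambda_a = 0."""
    gaps = [link_time(a, f[a]) + TOLL[a] - u for a in range(2)]
    return {
        "path_comp": max(abs(fk * g) for fk, g in zip(f, gaps)),
        "min_gap": min(gaps),
        "demand_comp": abs(q * (u - inverse_demand(q))),
        "demand_gap": u - inverse_demand(q),
        "cap_slack": min(CAP[a] - f[a] for a in range(2)),
    }


f, q, u = solve_equilibrium()
report = complementarity_report(f, q, u)
print(f, q, u, report)
```

With these data, both paths carry flow at equilibrium, so each path's generalized cost equals \(u_{w}\) up to rounding. The capacity slack stays positive, which is consistent with \(\lambda_{a}=0\).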

Replacing the lower-level problem in (3) by the KKT conditions (6), and rewriting the maximization of F as the minimization of \(-F\), we obtain a mathematical program with complementarity constraints (MPCC):

$$\begin{aligned}& \min \sum_{a\in{A}}t_{a} \bigl(x_{a}(\tau)\bigr)x_{a}(\tau) +\theta \sum _{a\in{A}}{g_{a}(y_{a})}- \sum _{w\in {W}}\int^{q_{w}}_{0}D^{-1}_{w}(s) \, ds \\& \quad \mbox{s.t. } {\tau_{a}}\geq0, \quad \forall{a\in{A}}, \\& \hphantom{\quad \mbox{s.t. }}0\leq f^{w}_{k}\perp\biggl(c^{w}_{k}+ \sum_{a\in{A}}(\lambda _{a}+ \tau_{a})\delta^{w}_{a,k}-u_{w}\biggr) \geq0,\quad \forall w\in {W}, \forall k\in{K_{w}}, \\& \hphantom{\quad \mbox{s.t. }} 0\leq{q_{w}}\perp{\bigl(u_{w}-D^{-1}_{w}(q_{w}) \bigr)}\geq0, \quad \forall w\in {W}, \\& \hphantom{\quad \mbox{s.t. }} 0\leq{\lambda_{a}}\perp{ \biggl(c_{a}+y_{a}- \sum_{w\in {W}} \sum_{k\in{K_{w}}}{f^{w}_{k} \delta^{w}_{a,k}} \biggr)}\geq 0,\quad \forall a\in{A}, \\& \hphantom{\quad \mbox{s.t. }} q_{w}- \sum_{k\in{K_{w}}}f^{w}_{k}=0, \quad \forall w\in{W}, \\& \hphantom{\quad \mbox{s.t. }}x_{a}- \sum_{w\in{W}} \sum _{k\in {K_{w}}}{f^{w}_{k} \delta^{w}_{a,k}}=0, \quad \forall a\in{A}, \\& \hphantom{\quad \mbox{s.t. }}x_{a} \geq0, \quad a\in{A}, \\& \hphantom{\quad \mbox{s.t. }}u_{w} \geq0, \quad \forall w\in{W}, \\& \hphantom{\quad \mbox{s.t. }} q_{w}\geq0, \quad \forall w\in{W}, \\& \hphantom{\quad \mbox{s.t. }} f^{w}_{k}\geq0, \quad \forall w\in{W}, \forall k\in{K_{w}}. \end{aligned}$$
(7)

Let \(l(\upsilon)\) denote the number of components of a vector υ. Define

$$\begin{aligned}& z=(y; q; f; x; \lambda; u; \tau), \\& f_{0}(z)= \sum_{a\in{A}}t_{a}\bigl(x_{a}(\tau)\bigr)x_{a}(\tau) + \theta \sum_{a\in{A}}g_{a}(y_{a}) - \sum_{w\in{W}}\int^{q_{w}}_{0}D^{-1}_{w}(s) \, ds, \\& K= \{0_{l(q)+l(x)+l(\lambda)}\}\times[0,\bar{y}_{a}]\times\Re ^{l(f)+l(u)+l(\tau)}_{+}, \\& E(z)= \left[ \begin{aligned} &q_{w}- \sum _{k\in{K_{w}}}f^{w}_{k}, \quad \forall w\in{W}, \\ &x_{a}- \sum_{w\in{W}} \sum _{k\in {K_{w}}}{f^{w}_{k}\delta^{w}_{a,k}}, \quad \forall a\in{A} \end{aligned} \right], \\& G(z)= \left[ \begin{aligned} &f^{w}_{k},\quad \forall w\in{W}, \forall{k\in{K_{w}}}, \\ &q_{w}, \quad \forall w\in{W}, \\ &x_{a},u_{w},\tau_{a},\quad \forall a\in{A}, \forall w\in{W} \end{aligned} \right], \\& H(z)= \left[ \begin{aligned} &\biggl(c^{w}_{k}-u_{w}+ \sum_{a\in{A}}(\lambda_{a}+ \tau_{a})\delta ^{w}_{a,k}\biggr),\quad \forall w\in{W}, \forall{k\in{K_{w}}}, \\ &\bigl(u_{w}-D^{-1}_{w}(q_{w})\bigr), \quad \forall w\in{W}, \\ &\biggl(c_{a}+y_{a}- \sum_{w\in{W}} \sum_{k\in {K_{w}}}f^{w}_{k} \delta^{w}_{a,k}\biggr), \quad \forall a\in{A} \end{aligned} \right]. \end{aligned}$$

where \(l(q)\), \(l(x)\), \(l(\lambda)\), \(l(f)\), \(l(u)\), and \(l(\tau)\) denote the lengths of q, x, λ, f, u, and τ, respectively.

Then problem (7) can be put in the general framework of mathematical programs with complementarity constraints (MPCC) in the following standard form:

$$ (\mathrm{MPCC})\quad \begin{aligned} &\min f_{0}(z) \\ &\quad \mbox{s.t. } 0\leq G(z)\perp H(z)\geq0, \\ & \hphantom{\quad \mbox{s.t. }}E(z) \in K, \end{aligned} $$
(8)

where \(f_{0}:\mathbb{R}^{n}\rightarrow\mathbb{R}\), \(G, H:\mathbb {R}^{n}\rightarrow\mathbb{R}^{m}\), \(E: \mathbb{R}^{n}\rightarrow\mathbb{R}^{p}\) are smooth functions, and \(K \subset\Re^{p}\) is a closed convex set.

3 A perturbation approach for MPCC

To solve the MPCC problem (8), in this section we propose a smoothing-based approach. First, let us rewrite problem (8) in the following form:

$$\begin{aligned}& \min_{z,u,v} f_{0}(z) \\& \quad \mbox{s.t. } 0\leq u\perp v\geq0, \\& \hphantom{\quad \mbox{s.t. }} G(z)-u=0, \\& \hphantom{\quad \mbox{s.t. }} H(z)-v=0, \\& \hphantom{\quad \mbox{s.t. }} E(z) \in K. \end{aligned}$$
(9)

Define

$$\overline{K}=\{0_{m}\}\times\{0_{m}\}\times K, \qquad \overline{E}(z, u,v)=\bigl[G(z)-u;H(z)-v;E(z)\bigr]. $$

One can observe that problem (9) is equivalent to

$$ (\mathrm{P})\quad \begin{aligned} &\min _{z,u,v} \bar{f}(z,u,v) \\ &\quad \mbox{s.t. } 0\leq u\perp v\geq0, \end{aligned} $$
(10)

where

$$\bar{f}(z,u,v):=f_{0}(z)+\delta_{\overline{K}}\bigl(\overline{E}(z, u,v)\bigr) $$

and

$$\delta_{\overline{K}}\bigl(\overline{E}(z, u,v)\bigr)= \textstyle\begin{cases} 0, &\overline{E}(z,u,v)\in{\overline{K}}, \\ \infty, &\overline{E}(z,u,v)\notin{\overline{K}}. \end{cases} $$

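Hence, with the complementarity constraint kept in both problems, \((z,u,v)\) minimizes \(f_{0}(z)\) subject to \(\overline{E}(z,u,v)\in\overline{K}\) if and only if it minimizes \(f_{0}(z)+\delta_{\overline{K}}(\overline{E}(z, u,v))\) over \(\mathbb{R}^{n}\times\mathbb{R}^{m}\times\mathbb{R}^{m}\).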

Now, let us focus on solving problem (10), which is still an MPCC problem. For such a problem, even if \(f_{0}\) is smooth, it is not suitable to treat it as a traditional nonlinear programming problem. As explained in Examples 3.1.1 and 3.1.2 of Luo et al. [35], even the basic constraint qualification (namely, that the tangent cone coincides with the linearized cone at an optimal solution) may fail to hold. In these examples, the Mangasarian-Fromovitz constraint qualification may fail near the optimal solution, and the boundedness of the set of Lagrange multipliers is not guaranteed. To overcome this difficulty, various relaxation approaches have been proposed for the complementarity constraints. Facchinei et al. [36] and Fukushima and Pang [37] used \(\phi_{\mu} (a,b)=0\) to approximate the complementarity relation \(0 \leq a\), \(0 \leq b\), \(ab=0\), where \(\phi_{\mu}(a,b)\) is the smoothed Fischer-Burmeister function defined by

$$ \phi_{\mu}(a,b)=a+b-\sqrt{a^{2}+b^{2}+2 \mu^{2}}. $$
(11)

Scholtes [38] used

$$a \geq0,\qquad b \geq0,\qquad ab \leq\mu, $$

and recently Lin and Fukushima [39] proposed the following:

$$(a+\mu) (b+\mu) \geq\mu^{2}\quad \mbox{and}\quad ab\leq \mu^{2} $$

to relax the complementarity relationship between a and b.
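The three relaxations can be compared directly as membership tests. In the Python sketch below, the function names are hypothetical, and `phi_fb` implements (11). For \(\mu>0\), the equation \(\phi_{\mu}(a,b)=0\) holds exactly when \(a>0\), \(b>0\) and \(ab=\mu^{2}\). To see this, square \(a+b=\sqrt{a^{2}+b^{2}+2\mu^{2}}\) to obtain \(ab=\mu^{2}\). For \(\mu=0\), the same argument reduces the equation to the complementarity relation itself.

```python
import math


def phi_fb(a, b, mu):
    """Smoothed Fischer-Burmeister function (11)."""
    return a + b - math.sqrt(a * a + b * b + 2.0 * mu * mu)


def on_fb_curve(a, b, mu, tol=1e-12):
    """phi_mu(a, b) = 0, i.e. a > 0, b > 0, ab = mu^2 when mu > 0."""
    return abs(phi_fb(a, b, mu)) <= tol


def in_scholtes(a, b, mu):
    """Scholtes relaxation [38]: a >= 0, b >= 0, ab <= mu."""
    return a >= 0.0 and b >= 0.0 and a * b <= mu


def in_lin_fukushima(a, b, mu):
    """Lin-Fukushima relaxation [39]: (a + mu)(b + mu) >= mu^2 and ab <= mu^2."""
    return (a + mu) * (b + mu) >= mu * mu and a * b <= mu * mu


mu = 0.1
for point in [(0.1, 0.1), (1.0, 0.0), (-0.05, 1.0)]:
    print(point, on_fb_curve(*point, mu), in_scholtes(*point, mu),
          in_lin_fukushima(*point, mu))
```

The sample points show how the three relaxations differ:

- With \(\mu=0.1\), the point (1, 0) satisfies the Scholtes and Lin-Fukushima relaxations but not the smoothed F-B equation.
- The point (-0.05, 1) is admitted only by the Lin-Fukushima relaxation, whose feasible set contains points with slightly negative components.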

In this research, we adopt the smoothed Fischer-Burmeister function to handle the complementarity constraints, so the perturbed version of problem (10) is defined as follows:

$$ (\mathrm{P}_{\mu})\quad \begin{aligned} & \min _{z,u,v} \bar{f}(z,u,v) \\ &\quad \mbox{s.t. } \Psi_{\mu}(u,v)=0, \end{aligned} $$
(12)

where

$$\Psi_{\mu}(u,v)=\left [ \textstyle\begin{array}{@{}c@{}} \psi_{\mu}(u_{1},v_{1}) \\ \vdots \\ \psi_{\mu}(u_{m},v_{m}) \end{array}\displaystyle \right ], $$

and \(\psi_{\mu}\) is the function \(\phi_{\mu}\) defined by (11). Our methodology differs from that of Facchinei et al. [36] and Fukushima and Pang [37] in that we use the variational analysis techniques of Rockafellar and Wets [40] to establish the convergence of the solution set \(\operatorname {SOL}(P_{\mu})\) to \(\operatorname{SOL}(P)\). Let us write the feasible region of problem (\(\mathrm{P}_{\mu}\)) as

$$\Omega(\mu):= \bigl\{ (z,u,v) \in\mathbb{R}^{n}\times \mathbb{R}^{m}\times \mathbb{R}^{m}: \Psi_{\mu}(u,v)=0 \bigr\} . $$

Clearly, \(\psi_{0} (a, b)=0\) if and only if \(0 \leq a\), \(0 \leq b\), \(ab=0\). Therefore, \(\Omega(0)\) is the feasible set of the MPCC problem (10). Since z does not appear in the constraint of problem (\(\mathrm{P}_{\mu}\)), for simplicity we can rewrite \(\Omega(\mu)\) in the following form:

$$ \Omega(\mu):= \bigl\{ (u,v) \in\mathbb{R}^{m}\times \mathbb{R}^{m}: \Psi_{\mu}(u,v)=0 \bigr\} . $$
(13)

We first analyze the convergence of the smoothing perturbation-based approach by demonstrating the convergence of \(\Omega(\mu)\) to \(\Omega (0)\) as \(\mu\searrow0\).

Lemma 3.1

For \(\Omega(\mu)\) defined by (13), we have

$$\lim_{\mu\searrow0} \Omega(\mu)=\Omega(0). $$

Proof

For any \((u,v) \in\limsup_{\mu\searrow 0} \Omega(\mu)\), there exist \(\mu_{k} \searrow0\) and \((u^{k},v^{k}) \in \Omega(\mu_{k})\) such that \((u^{k},v^{k}) \rightarrow(u,v)\). The inclusion \((u^{k},v^{k}) \in\Omega(\mu_{k})\) implies

$$u^{k}+v^{k}-\sqrt{\bigl(u^{k} \bigr)^{2}+\bigl(v^{k}\bigr)^{2}+2 \mu_{k}^{2}}=0. $$

Then, letting \(k \rightarrow\infty\), we have

$$u+v-\sqrt{u^{2}+v^{2}}=0, $$

namely \(\Psi_{0}(u, v)=0\) and \((u, v) \in\Omega(0)\). Therefore we have

$$\limsup_{\mu\searrow0} \Omega(\mu) \subset\Omega (0). $$

For any \((u, v) \in\Omega(0)\), let

$$I_{+}=\{i: u_{i} >0\},\qquad J_{+}=\{i: v_{i} >0\}, \qquad I_{0}=\{1,\ldots,m\}\setminus (I_{+}\cup J_{+} ). $$

For any \(\mu>0\), define \((u(\mu),v(\mu))\) by

$$\bigl(u_{i}(\mu),v_{i}(\mu)\bigr)= \textstyle\begin{cases} \bigl(u_{i}, \mu^{2}/u_{i}\bigr) & \mbox{if } i \in I_{+}, \\ \bigl( \mu^{2}/v_{i},v_{i}\bigr) & \mbox{if } i \in J_{+}, \\ (\mu,\mu) & \mbox{if } i \in I_{0}. \end{cases} $$

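Then \(\psi_{\mu}(u_{i}(\mu),v_{i}(\mu))=0\) for \(i=1,\ldots, m\), that is, \(\Psi_{\mu}(u(\mu),v(\mu))=0\), or equivalently \((u(\mu), v(\mu)) \in \Omega(\mu)\). Indeed, for \(i\in I_{+}\cup J_{+}\), both components are positive with product \(\mu^{2}\), and for \(i\in I_{0}\), we have \(\psi_{\mu}(\mu,\mu)=2\mu-\sqrt{4\mu^{2}}=0\). Moreover, \((u(\mu),v(\mu)) \rightarrow(u,v)\) as \(\mu\searrow0\), which implies that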

$$\liminf_{\mu\searrow0} \Omega(\mu) \supset\Omega (0). $$

Therefore \(\Omega(\mu) \rightarrow\Omega(0)\) as \(\mu\searrow0\). □
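The construction in the second half of the proof can be checked numerically. The following Python sketch, whose function names are hypothetical, maps a point of \(\Omega(0)\) to the point \((u(\mu),v(\mu))\) defined above. It then verifies that \(\Psi_{\mu}\) vanishes there and that the max-norm distance to \((u,v)\) shrinks with \(\mu\). The sample point is an assumption chosen to have one index in each of \(I_{+}\), \(J_{+}\) and \(I_{0}\).

```python
import math


def psi(a, b, mu):
    """Smoothed Fischer-Burmeister function (11)."""
    return a + b - math.sqrt(a * a + b * b + 2.0 * mu * mu)


def perturbed_point(u, v, mu):
    """Map (u, v) in Omega(0) to (u(mu), v(mu)) in Omega(mu), as in the proof of Lemma 3.1."""
    if any(ui < 0 or vi < 0 or ui * vi != 0 for ui, vi in zip(u, v)):
        raise ValueError("(u, v) must satisfy 0 <= u, 0 <= v and u_i * v_i = 0")
    u_mu, v_mu = [], []
    for ui, vi in zip(u, v):
        if ui > 0.0:            # i in I_+
            u_mu.append(ui)
            v_mu.append(mu * mu / ui)
        elif vi > 0.0:          # i in J_+
            u_mu.append(mu * mu / vi)
            v_mu.append(vi)
        else:                   # i in I_0
            u_mu.append(mu)
            v_mu.append(mu)
    return u_mu, v_mu


u, v = [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]
for mu in (1e-1, 1e-2, 1e-3):
    u_mu, v_mu = perturbed_point(u, v, mu)
    residual = max(abs(psi(a, b, mu)) for a, b in zip(u_mu, v_mu))
    distance = max(abs(x - y) for x, y in zip(u_mu + v_mu, u + v))
    print(mu, residual, distance)
```

For this sample point, the distance equals \(\mu\). The \(I_{0}\) component moves from 0 to \(\mu\), whereas the other components move by only \(\mu^{2}/u_{i}\) or \(\mu^{2}/v_{i}\).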

Let us introduce the following notations:

$$\kappa(\mu):=\inf\bigl\{ \bar{f}(z,u,v) \mid (u,v) \in\Omega(\mu)\bigr\} \quad \mbox{and}\quad S(\mu):=\operatorname{Argmin}\bigl\{ \bar{f}(z,u,v) \mid (u,v) \in \Omega(\mu)\bigr\} . $$

The following theorem, stated in the terminology of variational analysis, establishes the convergence of the smoothing approach for solving the MPCC problem.

Theorem 3.1

Assume that \(\bar{f}\) is level-bounded. Then the function \(\kappa(\mu)\) is continuous at 0 with respect to \(\mathbb{R}_{+}\) and the set-valued mapping \(S(\mu)\) is outer semi-continuous at 0 with respect to \(\mathbb{R}_{+}\).

Proof

As \(\bar{f}\) is level-bounded, we see that \(\kappa(\mu)\) is finite and \(S(\mu) \ne\emptyset\) for any \(\mu\geq0\). Let

$$\hat{f}_{\mu}(z,u,v)=\bar{f}(z,u,v)+\delta_{\Omega(\mu)}(u,v), $$

where \(\delta_{\Omega(\mu)}\) is the indicator function of \(\Omega(\mu )\). Since \(\Omega(\mu)\rightarrow\Omega(0)\) as \(\mu\searrow0\) by Lemma 3.1, \(\hat{f}_{\mu}\) epi-converges to \(\hat {f}_{0}\). The level-boundedness of \(\hat{f}_{\mu}\) is easily verified for \(\mu\geq0\). Therefore, it follows from Theorem 7.41 of Rockafellar and Wets [40] that the function \(\kappa(\mu)\) is continuous at 0 with respect to \(\mathbb{R}_{+}\) and the set-valued mapping \(S(\mu)\) is outer semi-continuous at 0 with respect to \(\mathbb{R}_{+}\). □

Now, we discuss the computational aspects of problem (\(\mathrm{P}_{\mu}\)) when \(\mu>0\) is sufficiently small. For any \(\mu>0\) and \((u,v) \in{\mathbf{R}}^{2m}\), we have

$${\mathcal{J}}_{u,v}\Psi_{\mu}(u,v)=\bigl[{ \mathcal{J}}_{u}\Psi_{\mu}(u,v) {\mathcal{J}}_{v} \Psi_{\mu}(u,v)\bigr], $$

where

$${\mathcal{J}}_{u}\Psi_{\mu}(u,v)= \left [ \textstyle\begin{array}{@{}c@{\quad}c@{\quad}c@{}} 1- \frac{u_{1}}{\sqrt{u_{1}^{2}+v_{1}^{2}+2\mu^{2}}} & & \\ & \ddots& \\ & & 1- \frac{u_{m}}{\sqrt{u_{m}^{2}+v_{m}^{2}+2\mu^{2}}} \end{array}\displaystyle \right ] $$

and

$${\mathcal{J}}_{v}\Psi_{\mu}(u,v)= \left [ \textstyle\begin{array}{@{}c@{\quad}c@{\quad}c@{}} 1- \frac{v_{1}}{\sqrt{u_{1}^{2}+v_{1}^{2}+2\mu^{2}}} & & \\ & \ddots& \\ & & 1- \frac{v_{m}}{\sqrt{u_{m}^{2}+v_{m}^{2}+2\mu^{2}}} \end{array}\displaystyle \right ]. $$

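For any \(\mu>0\) and \((u,v) \in{\mathbf{R}}^{2m}\), each diagonal entry \(1-u_{i}/\sqrt{u_{i}^{2}+v_{i}^{2}+2\mu^{2}}\) is strictly positive, because \(|u_{i}|<\sqrt{u_{i}^{2}+v_{i}^{2}+2\mu^{2}}\). The same holds for each entry \(1-v_{i}/\sqrt{u_{i}^{2}+v_{i}^{2}+2\mu^{2}}\). Hence both \({\mathcal {J}}_{u}\Psi_{\mu}(u,v)\) and \({\mathcal{J}}_{v}\Psi_{\mu}(u,v)\) are nonsingular, so \({\mathcal{J}}_{u,v}\Psi_{\mu}(u,v)\) has full row rank m, and we obtain the following conclusion.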
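Because \(\Psi_{\mu}\) acts componentwise, both Jacobian blocks are diagonal, and their entries can be verified against central finite differences. The Python sketch below does this and also confirms that every diagonal entry is positive, which is the property used above to conclude nonsingularity. The helper names, the sample point and the step size h are assumptions.

```python
import math


def psi(a, b, mu):
    """Smoothed Fischer-Burmeister function (11)."""
    return a + b - math.sqrt(a * a + b * b + 2.0 * mu * mu)


def jacobian_diagonals(u, v, mu):
    """Diagonal entries of J_u Psi_mu and J_v Psi_mu (off-diagonal entries are zero)."""
    du, dv = [], []
    for ui, vi in zip(u, v):
        s = math.sqrt(ui * ui + vi * vi + 2.0 * mu * mu)
        du.append(1.0 - ui / s)
        dv.append(1.0 - vi / s)
    return du, dv


def max_fd_error(u, v, mu, h=1e-6):
    """Largest deviation of the analytic entries from central finite differences."""
    du, dv = jacobian_diagonals(u, v, mu)
    err = 0.0
    for i, (ui, vi) in enumerate(zip(u, v)):
        fd_u = (psi(ui + h, vi, mu) - psi(ui - h, vi, mu)) / (2.0 * h)
        fd_v = (psi(ui, vi + h, mu) - psi(ui, vi - h, mu)) / (2.0 * h)
        err = max(err, abs(fd_u - du[i]), abs(fd_v - dv[i]))
    return err


u, v, mu = [3.0, -1.0, 0.0], [0.0, 2.0, 0.0], 1e-2
du, dv = jacobian_diagonals(u, v, mu)
print(min(du + dv) > 0.0, max_fd_error(u, v, mu))
```

At (3, 0), the entry \(1-u_{i}/\sqrt{u_{i}^{2}+v_{i}^{2}+2\mu^{2}}\) is only about \(10^{-5}\). The entry stays positive, but it approaches zero as \(\mu\searrow0\), which is how the degeneracy of the unsmoothed MPCC reappears in the limit.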

Corollary 3.2

Let \(\mu>0\). Then for any \((u,v) \in\Omega(\mu)\) the linear independence constraint qualification (LICQ) holds and the tangent cone of \(\Omega(\mu)\) at \((u,v)\) is

$$T_{\Omega(\mu)}(u,v)= \bigl\{ (\triangle u, \triangle v) \in{ \mathbf{R}}^{2m}: {\mathcal{J}}_{u,v}\Psi_{\mu}(u,v) ( \triangle u, \triangle v)=0 \bigr\} , $$

and the normal cone of \(\Omega(\mu)\) at \((u,v)\) is

$$N_{\Omega(\mu)}(u,v)={\mathcal{J}}_{u,v}\Psi_{\mu}(u,v)^{T}{ \mathbf{R}}^{m}. $$

Now we rewrite problem (\(\mathrm{P}_{\mu}\)) as follows:

$$\begin{aligned}& \min_{z,u,v} f_{0}(z) \\& \quad \mbox{s.t. } \overline{E}(z,u,v) \in\overline{K}, \\& \hphantom{\quad \mbox{s.t. }} \Psi_{\mu}(u,v)=0. \end{aligned}$$
(14)

The Lagrangian for problem (\(\mathrm{P}_{\mu}\)) is defined as

$$L(z,u,v,\lambda,\chi)=f_{0}(z)+\bigl\langle \lambda, \Psi_{\mu}(u,v) \bigr\rangle +\bigl\langle \chi, \overline{E}(z,u,v) \bigr\rangle . $$

If \((\bar{z},\bar{u},\bar{v})\) is a local minimizer for problem (\(\mathrm {P}_{\mu}\)) and the basic constraint qualification (from Rockafellar and Wets [40]) holds, namely

$$ \left . \textstyle\begin{array}{l} 0\in{\mathcal{J}}_{z,u,v} \overline{E}(\bar{z},\bar{u},\bar{v})^{T} \chi +\{0_{n}\} \times N_{\Omega(\mu)}(\bar{u},\bar{v}), \\ \chi\in N_{\overline{E}}(\bar{z},\bar{u},\bar{v}) \end{array}\displaystyle \right \} \quad \Longrightarrow\quad \chi=0, $$
(15)

then the Karush-Kuhn-Tucker (KKT) conditions for (\(\mathrm{P}_{\mu}\)) are satisfied, namely

$$ {\mathcal{J}}_{z,u,v}L(\bar{z},\bar{u},\bar{v},\bar{ \lambda},\bar{\chi })=0, \qquad \Psi_{\mu}(\bar{u},\bar{v})=0, \quad \bar{\chi} \in N_{\overline {E}}(\bar{z},\bar{u},\bar{v}) $$
(16)

and

$$ {\mathcal{J}}_{z,u,v}L(\bar{z},\bar{u},\bar{v},\bar{ \lambda},\bar{\chi })=0, \qquad \Psi_{\mu}(\bar{u},\bar{v})=0, \quad \bar{\chi} \in N_{\overline {E}}(\bar{z},\bar{u},\bar{v}). $$
(17)

As the linear independence constraint qualification holds at any feasible solution of (\(\mathrm{P}_{\mu}\)), we see that if there exists \(\bar{\lambda}\) such that the above KKT condition holds, then \(\bar {\lambda}\) is unique. The following proposition gives the second-order sufficient conditions at a KKT point of (\(\mathrm{P}_{\mu}\)).

Proposition 3.3

Let \((\bar{z}, \bar{u},\bar{v},\bar{\lambda},\bar{\chi})\) be a Karush-Kuhn-Tucker point for (\(\mathrm{P}_{\mu}\)). Suppose the following condition holds:

$$ \bigl\langle d, \nabla^{2}_{(z, u,v)}L(\bar{z}, \bar{u},\bar{v},\bar {\lambda})d \bigr\rangle > 0 $$
(18)

for

$$\forall d\ne0 \textit{ satisfying } {\mathcal{J}}_{z,u,v} \Psi_{\mu}(\bar {u},\bar{v}) (d_{u};d_{v})=0 \textit{ and } {\mathcal{J}}_{z,u,v} \overline {E}(\bar{z}, \bar{u}, \bar{v})d\in T_{\overline{K}}\bigl(\overline{E}(\bar{z}, \bar{u},\bar{v})\bigr). $$

Then the second-order growth condition holds at \((\bar{z},\bar{u},\bar{v})\); namely, there exist positive numbers \(\gamma\) and \(\delta\) such that

$$f_{0}(z)-f_{0}(\bar{z}) \geq\gamma\bigl\Vert (z,u,v)-( \bar{z},\bar{u},\bar{v})\bigr\Vert ^{2},\quad \forall(z, u,v) \in \bigl[ \Re^{n} \times\Omega(\mu) \bigr] \cap\mathbf {B}_{\delta}( \bar{z}, \bar{u},\bar{v}), $$

where

$$\mathbf{B}_{\delta}(\bar{z}, \bar{u},\bar{v})=\bigl\{ (z,u,v) \in\mathbb{R}^{n}\times\mathbb{R}^{m}\times\mathbb{R}^{m}\mid\bigl\Vert (z,u,v)-(\bar{z},\bar{u},\bar{v})\bigr\Vert < \delta\bigr\} . $$

Proof

First we need the notion of the second subderivative, which is studied extensively in Chapter 13 of Rockafellar and Wets [40]. For an extended-real-valued function \(f_{0}:R^{n}\rightarrow\bar{R}\) with \(f_{0}(z)\) finite and \(u,v\in{R^{n}}\), the second subderivative of \(f_{0}\) at z for u and v is defined by

$$d^{2}f_{0}(z|u) (v)= \liminf_{\tau\rightarrow0+,v'\rightarrow {v}} \frac{f_{0}(z+\tau{v'})-f_{0}(z)-\tau\langle u,v'\rangle}{\frac {1}{2}\tau^{2}}. $$

By definition, it is easy to verify that

$$ d^{2}f_{0}(\bar{z}|0) (v)=2\bigl[df_{0}^{\frac{1}{2}}(\bar{z}) (v)\bigr]^{2}, \quad \forall v\in{R^{n}}. $$
(19)

In view of (16) and (17), and by applying Example 13.6 and Proposition 13.19 of Rockafellar and Wets [40], we have, for each \(u\in\hat{\partial}f_{0}(\bar{z})\) and \(v\in \ker df_{0}(\bar{z})\cap{u}^{\perp}\),

$$ d^{2}f_{0}(\bar{z}|u) (v)=\max \left \{ \bigl\langle {v,\nabla^{2}_{zz}L_{0}{(\bar {z}, \lambda)}v}\bigr\rangle \Biggm| \textstyle\begin{array}{l} \nabla_{z}L_{0}{(\bar{z},\lambda)}=u, \\ \lambda_{i}=0, \forall i\notin{I(\bar{z})}, 0\leq\lambda_{i}\leq1, \forall i\in{I(\bar{z})}, \\ -1\leq\lambda_{j}\leq1, \forall j\in{J} \end{array}\displaystyle \right \}. $$
(20)

In view of (19) and (20), the result follows immediately. This completes the proof. □
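The sketch below illustrates Theorem 3.1 and the KKT system (16) on a problem small enough to inspect. It applies the smoothing scheme to a toy MPCC that is not from this paper: minimize \((a-1)^{2}+(b-1)^{2}\) subject to \(0\leq a\perp b\geq0\), whose optimal value is 1. For each \(\mu\) in a decreasing sequence, the Python code solves the KKT system of the smoothed problem with Newton's method, warm-starting from the previous solution. For an equality-constrained problem, an SQP step with the exact Hessian of the Lagrangian coincides with this Newton step. The sketch is therefore a stand-in for, not a reproduction of, the MATLAB SQP solver used in Section 4. The starting point, the \(\mu\) sequence and all function names are assumptions.

```python
import math


def kkt_system(a, b, lam, mu):
    """Residual F and Jacobian J of the KKT system of the smoothed toy problem
    min (a - 1)^2 + (b - 1)^2  s.t.  phi_mu(a, b) = 0,
    with Lagrangian (a - 1)^2 + (b - 1)^2 + lam * phi_mu(a, b)."""
    s = math.sqrt(a * a + b * b + 2.0 * mu * mu)
    phi = a + b - s
    pa, pb = 1.0 - a / s, 1.0 - b / s          # gradient of phi_mu
    s3 = s ** 3
    paa = -(b * b + 2.0 * mu * mu) / s3        # Hessian of phi_mu
    pab = a * b / s3
    pbb = -(a * a + 2.0 * mu * mu) / s3
    F = [2.0 * (a - 1.0) + lam * pa, 2.0 * (b - 1.0) + lam * pb, phi]
    J = [[2.0 + lam * paa, lam * pab, pa],
         [lam * pab, 2.0 + lam * pbb, pb],
         [pa, pb, 0.0]]
    return F, J


def solve3(J, r):
    """Solve the 3x3 system J x = r by Gaussian elimination with partial pivoting."""
    M = [row[:] + [ri] for row, ri in zip(J, r)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, 3):
            factor = M[i][col] / M[col][col]
            for j in range(col, 4):
                M[i][j] -= factor * M[col][j]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return x


def newton_kkt(a, b, lam, mu, tol=1e-12, max_iter=50):
    """Newton's method on the KKT system (the exact-Hessian SQP step for this problem)."""
    for _ in range(max_iter):
        F, J = kkt_system(a, b, lam, mu)
        if max(abs(val) for val in F) < tol:
            break
        da, db, dl = solve3(J, [-val for val in F])
        a, b, lam = a + da, b + db, lam + dl
    return a, b, lam


def smoothing_continuation(mus=(1e-1, 1e-2, 1e-3, 1e-4), start=(0.9, 0.1, 2.0)):
    """Solve the smoothed problems for decreasing mu, warm-starting each solve."""
    a, b, lam = start
    path = []
    for mu in mus:
        a, b, lam = newton_kkt(a, b, lam, mu)
        path.append((mu, a, b, (a - 1.0) ** 2 + (b - 1.0) ** 2))
    return path


for mu, a, b, value in smoothing_continuation():
    print(f"mu={mu:.0e}  a={a:.8f}  b={b:.2e}  objective={value:.8f}")
```

Started from (0.9, 0.1), the iterates follow the branch near (1, 0). Along this branch, b shrinks roughly like \(\mu^{2}\), and the objective values increase toward the unsmoothed optimal value 1, which is consistent with the continuity of \(\kappa(\mu)\) at 0. A symmetric start with a = b would keep the Newton iterates symmetric, so they could not reach either optimal branch. This is why the sketch uses an asymmetric start.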

4 Numerical examples

The perturbed NLP (\(\mathrm{P}_{\mu}\)) using the smoothing F-B function for the combined road toll pricing and capacity expansion is expressed as follows:

$$\begin{aligned}& \min F= \sum _{a\in{A}}t_{a}\bigl(x_{a}(\tau) \bigr)x_{a}(\tau) +\theta \sum_{a\in{A}}{g_{a}(y_{a})}- \sum_{w\in {W}}h_{w}(q_{w}) \\ & \quad \mbox{s.t. } {\tau_{a}}\geq0, \\ & \hphantom{\quad \mbox{s.t. }}\psi_{FB}\bigl(f^{w}_{k}, \eta_{a},\mu\bigr)=0, \quad \forall a\in{A}, \forall w\in{W}, \forall k \in{K_{w}}, \\ & \hphantom{\quad \mbox{s.t. }}\psi_{FB}(\lambda_{a},\zeta_{a}, \mu)=0, \quad \forall a\in{A}, \\ & \hphantom{\quad \mbox{s.t. }}q_{w}= \sum_{k\in{K_{w}}}f^{w}_{k}, \quad \forall w\in{W}, \\ & \hphantom{\quad \mbox{s.t. }}x_{a}= \sum_{w\in{W}} \sum _{k\in {K_{w}}}{f^{w}_{k} \delta^{w}_{a,k}}, \quad \forall a\in{A}, \\ & \hphantom{\quad \mbox{s.t. }}x_{a},y_{a},\tau_{a}, \lambda_{a},\eta_{a},\zeta_{a},f^{w}_{k},u_{w},q_{w} \geq 0,\quad \forall a\in{A}, \forall w\in{W}, \forall k\in {K_{w}}, \end{aligned}$$
(21)

where

$$\eta_{a}=c^{w}_{k}-u_{w}+ \sum _{a\in{A}}(\lambda_{a}+\tau _{a}) \delta^{w}_{a,k},\qquad \zeta_{a}=c_{a}+y_{a}- \sum_{w\in{W}} \sum_{k\in{K_{w}}}{f^{w}_{k} \delta^{w}_{a,k}}. $$

In this section, three numerical examples of combined road toll pricing and capacity expansion problems with homogeneous road users are analyzed to illustrate the applicability of the proposed model. In these examples, we adopt the SQP solver in MATLAB to solve the smoothing subproblem NLP (\(\mathrm{P}_{\mu}\)). All experiments are carried out using MATLAB 8.3.0 (R2014a), 64-bit, on a desktop computer with an Intel(R) Core(TM) 2 3.3 GHz CPU and 4 GB of RAM running Windows 7.

The link travel time function \(t_{a}\) is defined by

$$ t_{a}(x_{a},y_{a}, \tau_{a})= \biggl(A_{a}+B_{a} \biggl( \frac{x_{a}}{c_{a}+y_{a}} \biggr)^{4} \biggr)+\tau_{a}. $$
(22)

The functions \(g_{a}(y_{a})\) and \(h_{w}(q_{w})\) are defined by

$$ g_{a}(y_{a})=y_{a}, \quad \forall a\in \mathcal{A},\qquad h_{w}(q_{w})=\varphi _{w}- \alpha_{w}q_{w},\quad \forall w\in{W}, $$

where \(\varphi_{w}\), \(\alpha_{w}\) are the parameters of function \(h_{w}(q_{w})\).
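The cost functions of the examples translate directly into code. The Python sketch below implements (22) together with \(g_{a}\) and \(h_{w}\). The parameter values in the usage lines (\(A_{a}=1\), \(B_{a}=0.15\), \(c_{a}=10\) and a flow of 10) are illustrative assumptions, not the data of Tables 1 or 3.

```python
def link_travel_time(x, y, tau, A, B, c):
    """Equation (22): A_a + B_a * (x_a / (c_a + y_a))**4 + tau_a."""
    return A + B * (x / (c + y)) ** 4 + tau


def improvement_cost(y):
    """g_a(y_a) = y_a, the linear capacity-improvement cost used in the examples."""
    return y


def h(q, phi_w, alpha_w):
    """h_w(q_w) = phi_w - alpha_w * q_w."""
    return phi_w - alpha_w * q


# Illustrative (assumed) link: A_a = 1, B_a = 0.15, c_a = 10, flow x_a = 10.
base = link_travel_time(10.0, 0.0, 0.0, A=1.0, B=0.15, c=10.0)       # about 1.15
expanded = link_travel_time(10.0, 10.0, 0.0, A=1.0, B=0.15, c=10.0)  # about 1.0094
tolled = link_travel_time(10.0, 10.0, 0.5, A=1.0, B=0.15, c=10.0)    # about 1.5094
print(base, expanded, tolled)
```

Doubling the effective capacity \(c_{a}+y_{a}\) divides the congestion term by \(2^{4}=16\). The toll \(\tau_{a}\), by contrast, enters additively and does not depend on the flow.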

Example 1

(A 5-link network)

This network consists of five links and four nodes, as shown in Figure 1. The total travel demand and the corresponding parameters are given in Table 1. There, \(A_{a}\) denotes the free-flow travel time on link a, \(x_{a}\) represents the link flow, \(B_{a}\) is a link-specific constant, \(y_{a}\) is the link capacity expansion variable, which is set to 20, and \(c_{a}\) is the capacity of link a. In this study, we adopt a flat road toll policy, under which road users are charged the same fee at every access point of the tolled links.

Figure 1

A 5-link network.

Table 1 Parameters for the 5-link network

The numerical results indicate that social welfare decreases slightly from case 1 to case 3 and increases slightly in case 4. When road tolls and capacity expansion are combined, the general trend is that charging or expanding a larger number of links can worsen network performance, because adding more toll locations allows travelers to shift their routes at undesirable points, which increases the total system travel cost. On the other hand, increasing the capacity of many links may improve network performance, because, unlike when only road tolls are used, the flow on an expanded link may either fall or rise (see Table 2).

Table 2 Results for the 5-link network for different link tolls

Example 2

(A 16-link network)

The network in the second numerical example consists of 16 links and six nodes, as shown in Figure 2. The parameters used in this example are presented in Table 3.

Figure 2

A 16-link network.

Table 3 Parameters for a 16-link network

In this numerical experiment, some links carry zero flow, as observed in Table 4. Although these links carry no flow at the optimum, they can still be considered for investment, because optimizing these links individually can yield a suboptimal solution. The results further show that road users spend less time on routes whose links carry non-zero flow than on routes containing zero-flow links. When road capacity is determined prior to road pricing, there is an additional indirect channel through which capacity investment affects traffic: investment affects road pricing, which in turn affects traffic flow.

Table 4 Results for a 16-link network for different link tolls

Table 4 shows that charging a single link and expanding the other links gives a better system optimum and hence improved network performance. It can be seen from this result that it is not always advantageous to toll or expand a large number of links when improving a network. The result also indicates that road tolls can discourage trips, reducing traffic congestion, in which case additional capacity investment can be wasteful. The type of road toll pricing, together with other factors such as the quality of public transport services and time-dependent induced demand, may significantly affect the capacity expansion strategy when road usage is underpriced.

Example 3

(A 17-link network)

The network in this example consists of 17 links and 12 nodes, as shown in Figure 3. The parameters used in this example are presented in Table 5.

Figure 3

A 17-link network.

Table 5 Parameters for a 17-link network

It is observed in Table 6 that adding more road toll locations initially decreases and later increases the total system travel time (cost). These results show the importance of choosing the number of toll locations as well as the toll rates, because adding more toll locations tends to shift travelers onto undesirable routes, increasing the total travel time. These findings are consistent with the Braess paradox [41], in which closing some roads improves the performance of the road network and increases social welfare.
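The Braess paradox cited above can be checked on the standard textbook example, which is not one of the paper's networks: 4000 travelers go from S to E; links S→A and B→E take \(T/100\) minutes when \(T\) vehicles use them; links A→E and S→B take 45 minutes; and an optional shortcut A→B takes 0 minutes. The sketch evaluates path costs at the candidate equilibria and verifies Wardrop's user-equilibrium condition (all used paths have equal cost and no path is cheaper).

```python
DEMAND = 4000.0


def link_cost(link, flow):
    """Textbook Braess example: S-A and B-E are congestible, the rest fixed."""
    if link in ("SA", "BE"):
        return flow / 100.0
    if link in ("AE", "SB"):
        return 45.0
    if link == "AB":
        return 0.0
    raise KeyError(link)


def path_costs(paths, path_flows):
    """Cost of every path (used or not) under the given path flows."""
    flows = {}
    for p, f in path_flows.items():
        for link in paths[p]:
            flows[link] = flows.get(link, 0.0) + f
    return {p: sum(link_cost(ln, flows.get(ln, 0.0)) for ln in links)
            for p, links in paths.items()}


def is_wardrop_equilibrium(paths, path_flows, tol=1e-9):
    """True if every used path costs no more than the cheapest path."""
    costs = path_costs(paths, path_flows)
    used = [costs[p] for p, f in path_flows.items() if f > 0]
    return max(used) - min(costs.values()) <= tol


def average_time(paths, path_flows):
    """Average travel time per traveler."""
    costs = path_costs(paths, path_flows)
    return sum(f * costs[p] for p, f in path_flows.items()) / DEMAND


without_ab = {"SAE": ["SA", "AE"], "SBE": ["SB", "BE"]}
with_ab = dict(without_ab, SABE=["SA", "AB", "BE"])

flows_before = {"SAE": DEMAND / 2, "SBE": DEMAND / 2}
flows_after = {"SAE": 0.0, "SBE": 0.0, "SABE": DEMAND}

for name, paths, flows in (("without A->B", without_ab, flows_before),
                           ("with A->B", with_ab, flows_after)):
    print(name, is_wardrop_equilibrium(paths, flows), average_time(paths, flows))
```

Both flow patterns are user equilibria, yet adding the zero-cost link raises the average travel time from 65 to 80 minutes; closing it therefore improves the network, which is the effect the comparison above refers to.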

Table 6 Comparison results for a 17-link network with tolls

The overall implications of the results for combined road toll pricing and capacity expansion are twofold:

  • Capacity expansion relieves congestion, and the resulting lower congestion charges have a negative effect on the price.

  • Capacity expansion improves transportation service; in particular, the relief of traffic congestion would lead to a higher willingness to pay by road users, which has a positive effect on the price.

These findings call for planners to properly coordinate all decisions regarding capacity expansion and congestion pricing in order to improve transportation systems.

5 Conclusions

In this paper, we have formulated the combined road pricing and capacity expansion problem as a bi-level program, where the upper level optimizes the link capacity expansion vector to maximize social welfare, while the lower level determines the demand and the flows satisfying the Wardrop principles. The bi-level program is then transformed into an MPCC model. A smoothing approach is proposed to solve the MPCC problem; this approach overcomes the lack of a suitable set of constraint qualifications. Under mild conditions, the convergence analysis in this paper shows that the global optimal solutions of the perturbed problems converge to a solution of the original MPCC problem.

The perturbation-based approach and the established model were tested on 5-link, 16-link, and 17-link road networks, which are widely used to analyze transportation networks. The numerical experiments indicate that the proposed model can efficiently solve various user equilibrium transportation problems. The proposed model could also be employed to analyze multi-modal transportation networks with a view to reducing the environmental pollution caused by transport emissions.

The proposed model and its findings can be used by planners to select the links for pricing and expansion under budget constraints. Although the model may be computationally demanding, and finding the optimal solution of a large network design problem may take time, it can easily be converted into a smaller-dimensional problem and solved. The numerical examples show that the proposed model can produce a better solution to the combined road toll pricing and capacity expansion problem after the model is solved several times with different parameter values.