1 Problem description

In this paper, we consider the following principal–agent bilevel programming problem, in which the principal can neither observe nor verify the agent’s action:

$$ \begin{gathered} \max_{s(x),a\in A} U \bigl(s(x),a\bigr), \\ \begin{aligned} \text{s.t.} \quad & V\bigl(s(x),a\bigr)\geq V_{0} \\ & a\in \arg \max V\bigl(s(x),a\bigr), \end{aligned} \end{gathered} $$
(1)

where \(U(s(x),a)\) and \(V(s(x),a)\) denote the expected utilities of the principal and the agent, respectively, and \(s(x)\) is the contract paid to the agent when the realized output is \(x\in X=[\underline{x}, \overline{x}]\subset {\mathbb{R}}_{++}\), the set of possible outcomes. The agent chooses an action \(a\in A=[\underline{a}, \overline{a}]\) on the basis of the agreed payment schedule \(s(x)\) and has a separable von Neumann–Morgenstern utility \(v(s(x),a)=v(s(x))-c(a)\), which is concave. Let \(F(x,a)\) and \(f(x,a)\) denote the distribution and density function of the outcome x given that action a is undertaken by the agent; both are assumed to be sufficiently continuously differentiable for all x and a.

Since the principal–agent model was proposed, an efficient and conventional method for analyzing problem (1) has been the so-called first-order approach. Mirrlees [1, 2] was the first to point out that this approach is generally invalid. In a subsequent paper, Mirrlees [3] gave conditions under which the first-order approach is valid, namely the monotone likelihood ratio condition (MLRC) and the convexity of the distribution function condition (CDFC). Rogerson [4] then offered a correct and much simpler proof than Mirrlees. Jewitt [5] provided conditions that justify the first-order approach in the multi-statistic case. For more literature on the first-order approach for solving principal–agent problems see, e.g., [6,7,8,9,10,11,12,13,14] and the references therein.

However, since problem (1) is an infinite-dimensional nonconvex bilevel program, most papers have focused on the theoretical analysis of the validity of the first-order approach, and few have addressed directly solving the principal–agent model. Prescott [15] computed solutions to discrete moral-hazard programs using the Dantzig–Wolfe decomposition algorithm, but the program is required to be block angular in order to turn it into a linear program (LP), and as the cardinalities of the underlying sets increase, the LP quickly grows in size, making computation infeasible. Su and Judd [16] studied computational aspects of formulating moral-hazard problems as a mathematical program with equilibrium constraints (MPEC) and proposed a hybrid procedure that combines the LP approach with lotteries, but this algorithm achieves only local convergence. Armstrong et al. [17] formulated two complementary generalized principal–agent models that incorporate features observed in real-world contracting environments as MPECs and solved the resulting models by state-of-the-art numerical algorithms. Cecchini et al. [18] numerically solved principal–agent problems written as a linear-exponential-normal model, formulated as bilevel programs, using the ellipsoid algorithm under the assumption that the performance measures are linear. Zhu et al. [19] first proposed a modified constraint shifting homotopy method for the problem by designing a piecewise linear contractual function under some typical risk-averse utility functions and typical distribution functions.
Renner and Schmedders [20] reformulated the agent’s utility maximization problem as an equivalent system of equations and inequalities under the assumption that the agent’s expected utility is a rational function of the action, and computed an approximate solution to the nonpolynomial problems obtained from the principal’s utility maximization problem by the polynomial optimization approach. For general distribution functions, Zhu and Yu [21] presented a constraint set swelling homotopy method for computing a solution to the Karush–Kuhn–Tucker (KKT) system when the contractual function is designed to be piecewise linear, using the composite Simpson’s rule to approximate the integrals, and they proved the global convergence of the homotopy path under much weaker conditions. Moreover, Zhu and Yu [22] proposed another modified homotopy method for computing a solution to the KKT system that requires only an interior point, not necessarily a feasible initial approximation, for the constraint shifting set. However, these methods usually assume that the discrete nodes are known.

To solve problem (1) directly by designing the contractual function, as in the literature, MLRC and CDFC are also assumed to hold in this paper: 1. \(f(x,a)\) satisfies MLRC, i.e., \(\frac{f_{a}}{f}\) is nondecreasing in x for every a; and 2. \(f(x,a)\) satisfies CDFC, i.e., \(F_{aa}(x, a)\geq 0\). Under these conditions, the incentive compatibility constraint can be replaced by the first-order condition, and the principal–agent problem can be turned into the following equivalent single-level nonconvex program:

$$ \begin{gathered} \max_{s(x),a\in A} U \bigl(s(x),a\bigr), \\ \begin{aligned} \text{s.t.} \quad & V\bigl(s(x),a\bigr)\geq V_{0} \\ & V_{a}\bigl(s(x),a\bigr)= 0. \end{aligned} \end{gathered} $$
(2)

Then a solution to the principal–agent problem can be computed by solving the equivalent problem (2).
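As a concrete illustration, MLRC and CDFC can be checked numerically for a given distribution family. The following Python sketch does so on a grid for the truncated-power family \(F(x,a)=((x-\underline{x})/(\overline{x}-\underline{x}))^{a}\); this family, the interval and the action values are our own illustrative assumptions, not part of the model above:

```python
import numpy as np

# Illustrative distribution family F(x, a) = ((x - lo)/(hi - lo))**a on
# X = [0.1, 1] with actions a in [1, 2]; all values are assumptions chosen
# only to demonstrate the MLRC and CDFC checks.
lo, hi = 0.1, 1.0

def F(x, a):                 # distribution function of output x given action a
    return ((x - lo) / (hi - lo)) ** a

def f(x, a):                 # density: partial derivative of F w.r.t. x
    return a * (x - lo) ** (a - 1) / (hi - lo) ** a

def f_a(x, a, h=1e-6):       # partial derivative of f w.r.t. a (central difference)
    return (f(x, a + h) - f(x, a - h)) / (2 * h)

def F_aa(x, a, h=1e-3):      # second partial derivative of F w.r.t. a
    return (F(x, a + h) - 2 * F(x, a) + F(x, a - h)) / h ** 2

xs = np.linspace(lo + 1e-3, hi, 200)
for a in (1.2, 1.5, 1.8):
    ratio = f_a(xs, a) / f(xs, a)            # likelihood ratio f_a / f
    assert np.all(np.diff(ratio) >= -1e-8)   # MLRC: nondecreasing in x
    assert np.all(F_aa(xs, a) >= -1e-7)      # CDFC: F_aa(x, a) >= 0
print("MLRC and CDFC hold for this family on the test grid")
```

For this family one can verify analytically that \(f_{a}/f=1/a+\ln ((x-\underline{x})/(\overline{x}-\underline{x}))\), which is nondecreasing in x, and \(F_{aa}=F\cdot \bigl(\ln ((x-\underline{x})/(\overline{x}-\underline{x}))\bigr)^{2}\geq 0\), so both checks pass.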

Since the combined homotopy method has global convergence and can be efficiently implemented for solving nonlinear programming, fixed point problems, variational inequalities, bilevel programming and other nonlinear problems, it has received much attention since the 1990s; see, e.g., [23,24,25,26,27,28,29,30,31,32,33]. The existing combined homotopy methods for solving the equivalent problem (2) usually assume that the discrete nodes are known and design the contract as a piecewise linear function. To design a quadratic spline contractual function with unknown discrete nodes, this paper constructs a new constraint shifting combined homotopy for solving the KKT system of the principal–agent problem and proves its global convergence.

In this paper, the standard principal–agent problem with a quadratic spline contractual function is considered. In Sect. 2, the constraint shifting combined homotopy for the basic principal–agent problem is constructed and some lemmas from differential topology are introduced. In Sect. 3, the main results are presented and the existence of a smooth path from any given initial point in the shifted feasible set to a solution of the KKT system is proven. In Sect. 4, a detailed predictor–corrector algorithm is presented.

2 Preliminaries

We assume that the wage contract is the following quadratic spline function. Let the discrete nodes, which are unknown, in \(X=[\underline{x},\overline{x}]\) satisfy \(\underline{x}=x_{1}\leq x_{2}\leq \cdots \leq x_{m}\leq x_{m+1}=\overline{x}\); then the wage contract is

$$\begin{aligned}& s(x)= \textstyle\begin{cases} p_{1} x^{2}+q_{1} x+r_{1}, & x_{1}\leq x\leq x_{2}, \\ p_{2} x^{2}+q_{2} x+r_{2}, & x_{2}\leq x\leq x_{3}, \\ \vdots \\ p_{m} x^{2}+q_{m} x+r_{m}, & x_{m}\leq x\leq x_{m+1}, \end{cases}\displaystyle \end{aligned}$$

which satisfies \(p_{i} x_{i+1}^{2}+q_{i} x_{i+1}+r_{i}= p_{i+1}x_{i+1} ^{2}+q_{i+1}x_{i+1}+r_{i+1}\), \(2p_{i}x_{i+1}+q_{i}=2p_{i+1}x_{i+1}+q _{i+1}\), \(i=1,2,\ldots,m-1\), and \(p_{i}\leq 0\), \(i=1,2,\ldots,m\).
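The matching conditions above determine \(q_{i+1}\) and \(r_{i+1}\) once \(p_{1},\ldots,p_{m}\), \(q_{1}\) and \(r_{1}\) are fixed. A minimal Python sketch of this propagation, with hypothetical nodes and coefficients (all numerical values are our own assumptions):

```python
import numpy as np

# Sketch: the C^1 conditions propagate (q_{i+1}, r_{i+1}) from the free
# parameters (p_1..p_m, q_1, r_1).  Node positions and all values are
# hypothetical, chosen only to illustrate the construction.
nodes = np.array([1.0, 1.5, 2.2, 3.0])    # x_1 <= x_2 <= ... <= x_{m+1}
p = np.array([-0.10, -0.05, -0.02])       # p_i <= 0
q = np.empty_like(p)
r = np.empty_like(p)
q[0], r[0] = 0.8, 0.1                     # free parameters of the first piece

for i in range(len(p) - 1):
    xk = nodes[i + 1]
    # slope matching  2 p_i x_{i+1} + q_i = 2 p_{i+1} x_{i+1} + q_{i+1}:
    q[i + 1] = 2 * (p[i] - p[i + 1]) * xk + q[i]
    # value matching  s_i(x_{i+1}) = s_{i+1}(x_{i+1}):
    r[i + 1] = (p[i] - p[i + 1]) * xk**2 + (q[i] - q[i + 1]) * xk + r[i]

def s(x):
    """Evaluate the quadratic spline contract at a scalar x."""
    i = int(np.clip(np.searchsorted(nodes, x, side="right") - 1, 0, len(p) - 1))
    return p[i] * x**2 + q[i] * x + r[i]

for i, xk in enumerate(nodes[1:-1]):      # value and slope agree at each node
    assert abs((p[i]-p[i+1])*xk**2 + (q[i]-q[i+1])*xk + (r[i]-r[i+1])) < 1e-12
    assert abs(2*(p[i]-p[i+1])*xk + (q[i]-q[i+1])) < 1e-12
```

In the homotopy method itself the nodes \(x_{2},\ldots,x_{m}\) are unknowns as well; the sketch only shows how the continuity and smoothness constraints couple the coefficients for fixed nodes.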

Then we can get the following expected utilities of the principal and agent, respectively:

$$\begin{aligned} U\bigl(s(x),a\bigr) =& \int _{x\in X} u\bigl(x-s(x)\bigr)f(x,a)\,dx \\ =& \int _{x\in X}\bigl(x-s(x)\bigr)f(x,a)\,dx \\ =&\sum_{i=1}^{m} \int _{x_{i}}^{x_{i+1}}\bigl(x-p_{i} x^{2}-q_{i}x-r _{i}\bigr)f(x,a)\,dx \end{aligned}$$

and

$$\begin{aligned} V\bigl(s(x),a\bigr) =& \int _{x\in X}v\bigl(s(x)\bigr)f(x,a)\,dx-a \\ =& \int _{x\in X}s(x)f(x,a)\,dx-a \\ =&\sum_{i=1}^{m} \int _{x_{i}}^{x_{i+1}}\bigl(p_{i} x^{2}+q_{i}x+r _{i}\bigr)f(x,a)\,dx-a. \end{aligned}$$

Since integration does not reduce differentiability, the expected utilities of the principal and the agent are also sufficiently smooth for all x and a.
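Numerically, each expected utility is a sum of piecewise integrals of a quadratic against \(f(x,a)\), so it can be evaluated with standard quadrature (the references cited above use the composite Simpson’s rule; Gauss–Legendre works equally well). A sketch with an assumed density family and hypothetical spline coefficients:

```python
import numpy as np

# Sketch: evaluate U(s(x), a) and V(s(x), a) piece by piece with Gauss-Legendre
# quadrature.  The density family f(x, a) and all spline coefficients below are
# illustrative assumptions (the coefficients satisfy the C^1 conditions).
lo, hi = 1.0, 3.0
def f(x, a):                                  # assumed density on [lo, hi]
    return a * (x - lo) ** (a - 1) / (hi - lo) ** a

nodes = np.array([1.0, 1.5, 2.2, 3.0])        # x_1, ..., x_{m+1}
p = np.array([-0.10, -0.05, -0.02])
q = np.array([0.80, 0.65, 0.518])
r = np.array([0.10, 0.2125, 0.3577])

def piecewise_expect(g, a, n=64):
    """Sum over spline pieces of the integral of g(x, i) * f(x, a)."""
    t, w = np.polynomial.legendre.leggauss(n)  # nodes/weights on [-1, 1]
    total = 0.0
    for i in range(len(p)):
        xl, xr = nodes[i], nodes[i + 1]
        x = 0.5 * (xr - xl) * t + 0.5 * (xr + xl)
        total += 0.5 * (xr - xl) * np.sum(w * g(x, i) * f(x, a))
    return total

def U(a):   # principal: integral of (x - s(x)) f(x, a) dx
    return piecewise_expect(lambda x, i: x - (p[i]*x**2 + q[i]*x + r[i]), a)

def V(a):   # agent with v(s) = s and c(a) = a: integral of s(x) f(x, a) dx - a
    return piecewise_expect(lambda x, i: p[i]*x**2 + q[i]*x + r[i], a) - a
```

Since the density integrates to one, `piecewise_expect` applied to the constant 1 returns 1, and \(U+V+a=\int _{X} x f(x,a)\,dx\), which gives a quick consistency check; the derivatives with respect to a needed later can be approximated the same way with \(f_{a}\) in place of f.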

Under CDFC, the expected utilities of the principal and the agent are concave in the action a. Therefore, the relaxed Pareto-optimization program of the principal–agent problem, with the wage contract designed as a quadratic spline function, can be reformulated as follows:

$$ \begin{gathered} \min_{(p_{i},q_{i},r_{i})\in \mathbb{R}} \sum _{i=1} ^{m} \int _{x_{i}}^{x_{i+1}}\bigl[p_{i} x^{2}+(q_{i}-1)x+r_{i}\bigr]f(x,a)\,dx \\ \begin{aligned} \text{s.t.} \quad &-\sum_{i=1}^{m} \int _{x_{i}}^{x_{i+1}}\bigl(p_{i} x^{2}+q_{i}x+r _{i}\bigr)f(x,a) \,dx+a+V_{0}\leq 0, \\ & a-\overline{a}\leq 0, \\ & -a+\underline{a}\leq 0, \\ & p_{i}\leq 0, \quad i=1,2,\ldots,m, \\ & p_{i} x_{i+1}^{2}+q_{i} x_{i+1}+r_{i}- p_{i+1}x_{i+1}^{2}-q_{i+1}x _{i+1}-r_{i+1}=0,\quad i=1,2,\ldots,m-1, \\ & 2p_{i}x_{i+1}+q_{i}-2p_{i+1}x_{i+1}-q_{i+1}=0, \quad i=1,2,\ldots,m-1, \\ & \sum_{i=1}^{m} \int _{x_{i}}^{x_{i+1}}\bigl(p_{i} x^{2}+q_{i}x+r _{i}\bigr)f_{a}(x,a) \,dx-1=0. \end{aligned} \end{gathered} $$
(3)

For convenience, we introduce the following notation:

$$\begin{aligned}& F(p,q,r,a,\zeta )=\sum_{i=1}^{m} \int _{x_{i}}^{x_{i+1}}\bigl(p _{i} x^{2}+q_{i}x+r_{i}-x\bigr)f(x,a)\,dx, \\& g_{1}(p,q,r,a,\zeta )=p_{1}, \\& g_{2}(p,q,r,a,\zeta )=p_{2}, \\& \vdots \\& g_{m}(p,q,r,a,\zeta )=p_{m}, \\& g_{m+1}(p,q,r,a,\zeta )=-\sum_{i=1}^{m} \int _{x_{i}}^{x_{i+1}}\bigl(p _{i} x^{2}+q_{i}x+r_{i}\bigr)f(x,a) \,dx+a+V_{0}, \\& g_{m+2}(p,q,r,a,\zeta )=a-\overline{a}, \\& g_{m+3}(p,q,r,a,\zeta )=-a+\underline{a}, \\& h_{1}(p,q,r,a,\zeta )=p_{1} x_{2}^{2}+q_{1}x_{2}+r_{1}-p_{2}x_{2} ^{2}-q_{2}x_{2}-r_{2}, \\& h_{2}(p,q,r,a,\zeta )=p_{2} x_{3}^{2}+q_{2}x_{3}+r_{2}-p_{3}x_{3} ^{2}-q_{3}x_{3}-r_{3}, \\& \vdots \\& h_{m-1}(p,q,r,a,\zeta )=p_{m-1} x_{m}^{2}+q_{m-1}x_{m}+r_{m-1}- p _{m}x_{m}^{2}-q_{m}x_{m}-r_{m}, \\& h_{m}(p,q,r,a,\zeta )=2p_{1}x_{2}+q_{1}-2p_{2}x_{2}-q_{2}, \\& h_{m+1}(p,q,r,a,\zeta )=2p_{2}x_{3}+q_{2}-2p_{3}x_{3}-q_{3}, \\& \vdots \\& h_{2m-2}(p,q,r,a,\zeta )=2p_{m-1}x_{m}+q_{m-1}-2p_{m}x_{m}-q_{m}, \\& h_{2m-1}(p,q,r,a,\zeta )=\sum_{i=1}^{m} \int _{x_{i}}^{x_{i+1}}\bigl(p _{i} x^{2}+q_{i}x+r_{i}\bigr)f_{a}(x,a) \,dx-1, \end{aligned}$$

where \(p=(p_{1},p_{2},\ldots,p_{m})^{T}\), \(q=(q_{1},q_{2},\ldots,q_{m})^{T}\), \(r=(r_{1},r_{2},\ldots,r_{m})^{T}\), and \(\zeta =(x_{2},x_{3},\ldots,x_{m})^{T}\).

Then, when the wage contract is a quadratic spline, the relaxed Pareto-optimization program (3) can be written as follows:

$$ \begin{gathered} \min F(p,q,r,a,\zeta ), \\ \begin{aligned} \text{s.t.} \quad & g_{i}(p,q,r,a,\zeta )\leq 0, \quad i\in \{1,2,\ldots,m+3\}, \\ & h_{j}(p,q,r,a,\zeta )=0, \quad j\in \{1,2,\ldots,2m-1\}. \end{aligned} \end{gathered} $$
(4)

Let \({\mathbb{R}}^{n}\), \({\mathbb{R}}_{+}^{n}\), \({\mathbb{R}}_{++}^{n}\) denote the n-dimensional Euclidean space, the nonnegative orthant and the positive orthant of \({\mathbb{R}}^{n}\), respectively. It is well known that, for a convex nonlinear programming problem, a solution of the optimization problem can be obtained from its KKT system. For a nonconvex nonlinear programming problem, however, we can only obtain a solution of its KKT system.

Therefore, the aim is to solve the following KKT system of the relaxed Pareto-optimization programming (4):

$$ \begin{gathered} \nabla F(p,q,r,a,\zeta )+\sum _{i=1}^{m+3}\nabla g_{i}(p,q,r,a, \zeta )y_{i}+\sum_{j=1}^{2m-1} \nabla h_{j}(p,q,r,a,\zeta )z _{j}=0, \\ h_{j}(p,q,r,a,\zeta )=0,\quad j=1,2,\ldots,2m-1, \\ y_{i}g_{i}(p,q,r,a,\zeta )=0,\quad y_{i}\geq 0,\quad\quad g_{i}(p,q,r,a,\zeta )\leq 0, \quad i=1,2,\ldots,m+3, \end{gathered} $$
(5)

where \(y\in {\mathbb{R}}{_{+}}^{m+3}\), \(z\in {\mathbb{R}}^{2m-1}\), \(\nabla =(\frac{\partial }{\partial p},\frac{\partial }{\partial q},\frac{ \partial }{\partial r},\frac{\partial }{\partial a},\frac{\partial }{ \partial \zeta })^{T}\).

Let the shifted constraint functions be \(\widetilde{g_{i}}(\theta ,\mu )=g_{i}(\theta )-\mu ^{\sigma }\tau _{i}\), with \(\tau \in {\mathbb{R}}_{++}^{m+3}\), and \(\widetilde{h_{j}}(\theta ,\mu )=h_{j}(\theta )-\mu h_{j}(\theta ^{0})\), so that \(\widetilde{g_{i}}(\theta ,0)=g_{i}(\theta )\) and \(\widetilde{h_{j}}(\theta ,0)=h_{j}(\theta )\). Therefore, \(\nabla \widetilde{g_{i}}(\theta ,\mu )=\nabla g_{i}(\theta )\) and \(\nabla \widetilde{h_{j}}(\theta ,\mu )=\nabla h_{j}(\theta )\). By the smoothness of \(g_{i}(\theta )\) and \(h_{j}(\theta )\), the functions \(\widetilde{g_{i}}(\theta ,\mu )\) and \(\widetilde{h_{j}}(\theta ,\mu )\) are also sufficiently smooth.
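A toy numerical illustration of the shift (the constraint functions and parameter values below are our own assumptions, unrelated to the principal–agent constraints): a starting point violating \(g(\theta )\leq 0\) becomes strictly feasible for the shifted constraints at \(\mu =1\), while the original constraints are recovered at \(\mu =0\):

```python
import numpy as np

# Toy illustration of the shift g~(theta, mu) = g(theta) - mu**sigma * tau:
# the chosen starting point violates g <= 0 but is strictly feasible for the
# shifted constraints at mu = 1; at mu = 0 the original constraints reappear.
# The functions and all numbers are illustrative assumptions.
sigma = 2.0
tau = np.array([3.0, 3.0])                     # a vector in R_{++}^2 here

def g(theta):                                  # toy inequality constraints g <= 0
    return np.array([theta[0] ** 2 - 1.0, theta[0] + theta[1]])

def g_shift(theta, mu):
    return g(theta) - mu ** sigma * tau

theta0 = np.array([1.2, 0.5])
assert np.any(g(theta0) > 0)                   # infeasible for the original set
assert np.all(g_shift(theta0, 1.0) < 0)        # strictly feasible at mu = 1
assert np.allclose(g_shift(theta0, 0.0), g(theta0))   # shift vanishes at mu = 0
```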

For convenience, the following notations will be used in the paper:

$$\begin{aligned}& \theta =(p,q,r,a,\zeta )^{T} , \\& \varOmega =\bigl\{ \theta \vert g_{i}(\theta )\leq 0,h_{j}(\theta )=0,i\in \{1,2,\ldots,m+3\},j\in \{1,2,\ldots,2m-1\} \bigr\} , \\& \varOmega _{\mu }=\bigl\{ \theta \vert \widetilde{g_{i}}( \theta ,\mu )\leq 0, \widetilde{h_{j}}(\theta ,\mu )=0,i\in \{1, \ldots,m+3\},j\in \{1,\ldots,2m-1\}\bigr\} , \\& \varOmega _{\mu }^{0}=\bigl\{ \theta \vert \widetilde{g_{i}}(\theta ,\mu )< 0, \widetilde{h_{j}}( \theta ,\mu )=0,i\in \{1,\ldots,m+3\},j\in \{1,\ldots,2m-1\}\bigr\} , \\& \partial \varOmega _{\mu }=\varOmega _{\mu }\backslash \varOmega _{\mu }^{0} , \quad \text{and} \\& I_{\mu }(\theta )=\bigl\{ i\vert \widetilde{g_{i}}(\theta ,\mu )=0,i=1,2,\ldots,m+3\bigr\} . \end{aligned}$$

The following assumptions will be used in this paper. There exist a vector \(\tau \in {\mathbb{R}}_{++}^{m+3}\) and an open subset V of the inequality constraint set \(\{\theta : \widetilde{g}_{i}(\theta ,1)<0\}\) satisfying the following conditions:

  1. (A1)

    \(\forall \mu \in [0,1]\), \(\varOmega _{\mu }\) is bounded and \(\varOmega _{\mu }^{0}\neq \emptyset \).

  2. (A2)

    \(\forall \theta \in V\), \(\nabla h(\theta )\) is a matrix of full column rank. For any \(\mu \in [0,1]\) and \(\theta \in \varOmega _{\mu }\), \((\{\nabla \widetilde{g_{i}}(\theta ,\mu )\}_{i\in I_{\mu }(\theta )},\nabla h(\theta ))\) is positively linearly independent at θ, i.e., \(\sum_{i\in I_{\mu }(\theta )}\alpha _{i}\nabla \widetilde{g}_{i}(\theta ,\mu )+\sum_{j=1}^{2m-1}\beta _{j}\nabla h_{j}(\theta )=0\), \(\alpha _{i}\geq 0\), \(\beta _{j}\in {\mathbb{R}} \Rightarrow \alpha _{i}=\beta _{j}=0 \).

  3. (A3)

    (The normal cone condition) \(\forall \theta \in \partial \varOmega _{1}\), the normal cone of \(\varOmega _{1}\) at θ meets \(\varOmega _{1}\) only at θ, i.e.,

    $$ \biggl\{ \theta +\sum_{i\in I_{1}(\theta )}\nabla \widetilde{g_{i}}(\theta ,1)y _{i}+\nabla h(\theta ) z \Big\vert y_{i}\geq 0, z\in { \mathbb{R}} ^{2m-1}\biggr\} \cap \varOmega _{1}=\{\theta \}. $$

In order to compute a solution to the KKT system of the relaxed Pareto-optimization programming (4), for any randomly chosen vector \((\theta ^{0}, \xi )\in V\times {\mathbb{R}}^{4m}\) and any given vector \(\eta \in {\mathbb{R}}_{++}^{m+3}\), we construct the constraint shifting combined homotopy as follows:

$$ H\bigl(w,w^{0},\mu \bigr)= \begin{pmatrix} (1-\mu )(\nabla F(\theta )+\nabla \widetilde{g}(\theta ,\mu )y)+ \nabla h(\theta )z+\mu (\theta -\theta ^{0})+\mu (1-\mu )\xi \\ Y \widetilde{g}(\theta ,\mu )+\mu \eta \\ h(\theta )-\mu h(\theta ^{0}) \end{pmatrix}, $$
(6)

where \(w=(\theta ,y,z)^{T}\in {\mathbb{R}}^{4m}\times {\mathbb{R}} _{+}^{m+3}\times {\mathbb{R}}^{2m-1}\), \(w^{0}=(\theta ^{0},y^{0},z^{0})^{T}\), and \(Y=\operatorname{diag}(y_{1},y _{2},\ldots,y_{m+3})\).

When \(\mu =0\), the homotopy equation \(H(w,w^{0},0)=0\) reduces to the KKT system (5) of the relaxed Pareto-optimization program (4).
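To make the structure of (6) concrete, the following sketch assembles H for a toy problem with one inequality and one equality constraint (the functions F, g, h and all parameter values are illustrative assumptions, not the principal–agent data) and verifies that \(w^{0}=(\theta ^{0},-[\operatorname{diag}(\widetilde{g}(\theta ^{0},1))]^{-1}\eta ,0)\) solves \(H(w,w^{0},1)=0\), in line with Lemma 2.1 below:

```python
import numpy as np

# Sketch of the homotopy map (6) for a toy problem min F s.t. g <= 0, h = 0
# (one inequality, one equality); F, g, h and all parameters are illustrative
# assumptions, with gradients taken by central differences.
def grad(fun, x, h=1e-6):
    e = np.eye(len(x))
    return np.array([(fun(x + h*ei) - fun(x - h*ei)) / (2*h) for ei in e])

F  = lambda th: th[0]**2 + th[1]**2
g  = lambda th: np.array([th[0] - 1.0])           # inequality constraint
hc = lambda th: np.array([th[0] + th[1] - 1.0])   # equality constraint
tau, eta, sigma = np.array([2.0]), np.array([1.0]), 2.0

def g_shift(th, mu):
    return g(th) - mu**sigma * tau

def H(w, w0, mu, xi):
    th, y, z = w[:2], w[2:3], w[3:4]
    th0 = w0[:2]
    Jg = grad(lambda t: g_shift(t, mu)[0], th).reshape(2, 1)
    Jh = grad(lambda t: hc(t)[0], th).reshape(2, 1)
    r1 = ((1 - mu) * (grad(F, th) + Jg @ y) + Jh @ z
          + mu * (th - th0) + mu * (1 - mu) * xi)
    r2 = y * g_shift(th, mu) + mu * eta           # Y g~(theta, mu) + mu eta
    r3 = hc(th) - mu * hc(th0)
    return np.concatenate([r1, r2, r3])

# Starting solution at mu = 1: theta = theta0, z = 0, y = -eta / g~(theta0, 1).
th0, xi = np.array([0.3, 0.2]), np.zeros(2)
y0 = -eta / g_shift(th0, 1.0)
w0 = np.concatenate([th0, y0, [0.0]])
assert np.linalg.norm(H(w0, w0, 1.0, xi)) < 1e-8
```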

Lemma 2.1

Suppose that \(H(w, w^{0}, \mu )\) is defined as in (6) and that assumptions (A1), (A2) and (A3) hold. Then the homotopy equation \(H(w,w^{0},1)=0\) has a unique solution

$$ (\theta ,y,z)=\bigl(\theta ^{0},y^{0},z^{0} \bigr)=\bigl(\theta ^{0},-\bigl[\operatorname{diag}\bigl( \widetilde{g}\bigl(\theta ^{0},1\bigr)\bigr)\bigr]^{-1} \eta ,0\bigr). $$

Proof

Let \(\bar{w}=(\bar{\theta },\bar{y},\bar{z})\) be a solution of \(H(w,w^{0},1)=0\). From \(H(\bar{w},w^{0},1)=0\) and \(\bar{y}\geq 0\), we get \(\bar{\theta }\in \varOmega _{1}\). Next, we prove \(\bar{\theta }=\theta ^{0}\) by contradiction. Assume \(\bar{\theta }\neq \theta ^{0}\); then, by the first equation of \(H(\bar{w},w^{0},1)=0\), we have \(\theta ^{0}=\bar{\theta }+\nabla h(\bar{\theta })\bar{z}\) with \(\bar{z}\neq 0\), which contradicts assumption (A2). Hence, \(\bar{\theta }=\theta ^{0}\). From assumption (A2) and \(\nabla h(\bar{\theta })\bar{z}=0\), we get \(\bar{z}=0\). From the second equation of \(H(\bar{w},w^{0},1)=0\) and \(\theta ^{0}\in \varOmega _{1}^{0}\), we have \(\bar{y}=-[\operatorname{diag}(\widetilde{g}(\theta ^{0},1))]^{-1}\eta \). Therefore, we obtain the result. □

The following lemmas from differential topology, which can be found in [34,35,36], will be used in the next section. First, let \(U\subset {\mathbb{R}}^{n}\) be an open set and let \(\phi \colon U\rightarrow {\mathbb{R}}^{p}\) be a \(C^{\alpha } \) (\(\alpha >\max \{0,n-p\}\)) mapping; we say that \(y\in {\mathbb{R}}^{p}\) is a regular value of ϕ if

$$ \operatorname{Range}\bigl[\partial \phi (x)/\partial x\bigr]={ \mathbb{R}}^{p}, \quad \forall x\in \phi ^{-1}(y). $$

Lemma 2.2

Let \(V\subset {\mathbb{R}}^{n}\), \(U\subset {\mathbb{R}}^{m}\) be open sets, and let \(\phi \colon V\times U\rightarrow {\mathbb{R}}^{k}\) be a \(C^{\alpha }\) mapping, where \(\alpha >\max \{0,m-k\}\). If \(0\in {\mathbb{R}}^{k}\) is a regular value of ϕ, then, for almost all \(a\in V\), 0 is a regular value of \(\phi _{a}=\phi (a,\cdot )\).

Lemma 2.3

Let \(\phi \colon U\subset {\mathbb{R}}^{n}\rightarrow {\mathbb{R}}^{p}\) be a \(C^{\alpha }\) mapping (\(\alpha >\max \{0,n-p\}\)). If 0 is a regular value of ϕ, then \(\phi ^{-1}(0)\) consists of some \((n-p)\)-dimensional \(C^{\alpha }\) manifolds.

Lemma 2.4

A one-dimensional smooth manifold is diffeomorphic to a unit circle or a unit interval.

3 Main result

For the sake of convenience, for any given \(w^{0}\in V\times {\mathbb{R}} _{++}^{m+3}\times {\mathbb{R}}^{2m-1}\), \(H(w, w^{0}, \mu )\) in (6) is rewritten as \(H_{w^{0}}(w, \mu )=H(w, w^{0}, \mu )\), and the zero-point set of \(H_{w^{0}}(w,\mu )\) is written as follows:

$$ H_{w^{0}}^{-1}(0)=\bigl\{ (w,\mu )\in \varOmega _{\mu }\times {\mathbb{R}}_{++} ^{m+3}\times { \mathbb{R}}^{2m-1}\times (0,1]:H_{w^{0}}(w,\mu )=0\bigr\} . $$

Theorem 3.1

Suppose that assumptions (A1) and (A2) hold, the vector \(w^{0}\in V\times {\mathbb{R}}_{++}^{m+3}\times {\mathbb{R}}^{2m-1}\) is chosen, and the homotopy is defined by (6). Then, for almost all \((w^{0},\xi )\in V\times {\mathbb{R}}_{++}^{m+3}\times {\mathbb{R}}^{2m-1}\times {\mathbb{R}}^{4m}\), the zero-point set \(H_{w^{0}}^{-1}(0)\) contains a smooth curve \(\varGamma _{w^{0}}\) starting from \((\theta ^{0},y^{0},z^{0},1)\). Moreover, if assumption (A3) holds, then the smooth curve \(\varGamma _{w^{0}}\) terminates in or approaches the hyperplane \(\mu = 0\). If \((\bar{\theta }, \bar{y}, \bar{z}, 0)\) is an ending limit point of the smooth curve \(\varGamma _{w^{0}}\), then \(\bar{w}=(\bar{\theta }, \bar{y}, \bar{z})\) is a solution to the KKT system (5) of the relaxed Pareto-optimization program (4).

Proof

Let \(\tilde{H}(w,\theta ^{0},\xi ,\mu )\colon \varOmega _{\mu }\times {\mathbb{R}}_{+}^{m+3}\times {\mathbb{R}}^{2m-1}\times V\times {\mathbb{R}}^{4m}\times (0,1]\rightarrow \varOmega _{\mu }\times {\mathbb{R}}_{+}^{m+3}\times {\mathbb{R}}^{2m-1}\) be the same map as \(H(w,w^{0},\mu )\) but with \((\theta ^{0},\xi )\) regarded as variables. Consider the following submatrix of the Jacobian \(D\tilde{H}(w,\theta ^{0},\xi ,\mu )\):

$$\begin{aligned}& \frac{\partial \tilde{H}(w,\theta ^{0},\xi ,\mu )}{\partial (\theta ^{0},y,\xi )}= \begin{pmatrix} -\mu I & (1-\mu )\nabla \widetilde{g}(\theta ,\mu ) & \mu (1-\mu ) \\ 0 & \operatorname{diag}(\widetilde{g}(\theta ,\mu )) & 0 \\ -\mu \nabla h(\theta ^{0})^{T} & 0 & 0 \end{pmatrix}. \end{aligned}$$

For all \(\mu \in (0,1)\), by the fact that \(\eta >0\) and \(Y \widetilde{g}(\theta ,\mu )+\mu \eta =0\), we see that \(\operatorname{diag}( \widetilde{g}(\theta ,\mu ))\) is nonsingular. By assumption (A2), we see that the matrix \(\nabla h(\theta ^{0})^{T}\) is full row rank, which implies that \(\frac{\partial \tilde{H}(w,\theta ^{0}, \xi ,\mu )}{\partial (\theta ^{0},y,\xi )}\) is a matrix of full row rank for any \(\mu \in (0,1)\). Hence, the matrix \(D\tilde{H}(w,\theta ^{0}, \xi ,\mu )\) is full row rank for any solution of the equation \(\tilde{H}(w,\theta ^{0},\xi ,\mu )=0\) in \({\mathbb{R}}^{4m}\times {\mathbb{R}}_{++}^{m+3}\times {\mathbb{R}}^{2m-1}\times V\times {\mathbb{R}} ^{4m}\times (0,1)\).

Now consider the matrix

$$\begin{aligned}& \frac{\partial H(w,w^{0},\mu )}{\partial w}= \begin{pmatrix} \varPi & (1-\mu )\nabla \widetilde{g}(\theta ,\mu ) & \nabla h(\theta ) \\ Y\nabla \widetilde{g}(\theta ,\mu )^{T} & \operatorname{diag}(\widetilde{g}( \theta ,\mu )) & 0 \\ \nabla h(\theta )^{T} & 0 & 0 \end{pmatrix}, \end{aligned}$$

where \(\varPi =(1-\mu )(\nabla ^{2} F(\theta )+\sum_{i=1}^{m+3}y _{i}\nabla ^{2}\widetilde{g}_{i}(\theta ,\mu ))+\sum_{j=1}^{2m-1}z _{j}\nabla ^{2} h_{j}(\theta )+\mu I\). Evaluating at \(\mu =1\) and the chosen initial point \(w^{0}=(\theta ^{0}, y^{0}, z^{0})\), using \(z^{0}=0\), we obtain the following matrix:

$$\begin{aligned}& \frac{\partial H(w^{0},w^{0},1)}{\partial w}= \begin{pmatrix} I & 0 & \nabla h(\theta ^{0}) \\ Y^{0}\nabla \widetilde{g}(\theta ^{0},1)^{T} & \operatorname{diag}( \widetilde{g}(\theta ^{0},1)) & 0 \\ \nabla h(\theta ^{0})^{T} & 0 & 0 \end{pmatrix}, \end{aligned}$$

which is also nonsingular. Hence, the matrix \(D\tilde{H}(w,\theta ^{0},\xi ,\mu )\) is of full row rank for any \(\mu \in (0,1]\). Thus, 0 is a regular value of \(\tilde{H}(w,\theta ^{0},\xi ,\mu )\). By Lemma 2.2, for almost all \((\theta ^{0},\xi )\in V\times {\mathbb{R}}^{4m}\), 0 is a regular value of \(H(w, w^{0},\mu )\). By Lemma 2.3, since 0 is a regular value of \(H(w, w^{0}, \mu )\), \(\frac{\partial H(w^{0},w^{0},1)}{\partial w}\) is nonsingular and \(H(w^{0},w^{0},1)=0\), the set \(H_{w^{0}}^{-1}(0)\) must contain a smooth curve \(\varGamma _{w^{0}}\subset \varOmega _{\mu }^{0}\times {\mathbb{R}}_{+}^{m+3}\times {\mathbb{R}}^{2m-1}\times (0,1]\) starting from \((\theta ^{0},y^{0},z^{0},1)\), going into \(\varOmega _{1}^{0}\times {\mathbb{R}}_{+}^{m+3}\times {\mathbb{R}}^{2m-1}\times (0,1)\) and terminating in the boundary of \(\varOmega _{\mu }\times {\mathbb{R}}_{+}^{m+3}\times {\mathbb{R}}^{2m-1}\times [0,1]\). By Lemma 2.4, \(\varGamma _{w^{0}}\) must be diffeomorphic to a unit circle or the unit interval \([0,1)\). Since the matrix \(\frac{\partial H(w^{0},w^{0},1)}{\partial w}\) is nonsingular, the smooth curve \(\varGamma _{w^{0}}\) cannot be diffeomorphic to a unit circle. Therefore, \(\varGamma _{w^{0}}\) must be diffeomorphic to \([0,1)\).

The limit points of \(\varGamma _{w^{0}}\) belong to \(\partial (\varOmega _{\mu }\times {\mathbb{R}}_{+}^{m+3}\times {\mathbb{R}}^{2m-1}\times (0,1])\). Let \((\bar{\theta }, \bar{y}, \bar{z},\bar{\mu })\) be a limit point of \(\varGamma _{w^{0}}\). Only the following five cases are possible:

  1. (i)

    \((\bar{\theta }, \bar{y}, \bar{z})\in \varOmega _{1}\times {\mathbb{R}} _{+}^{m+3}\times {\mathbb{R}}^{2m-1}\), \(\bar{\mu }=1\), \(\Vert (\bar{y}, \bar{z}) \Vert <\infty \);

  2. (ii)

    \((\bar{\theta }, \bar{y}, \bar{z})\in \varOmega _{\bar{\mu }}\times {\mathbb{R}}_{+}^{m+3}\times {\mathbb{R}}^{2m-1}\), \(\bar{\mu }\in [0,1]\), \(\Vert (\bar{y},\bar{z}) \Vert =\infty \);

  3. (iii)

    \((\bar{\theta }, \bar{y}, \bar{z})\in \varOmega _{\bar{\mu }}\times \partial {\mathbb{R}}_{+}^{m+3}\times {\mathbb{R}}^{2m-1}\), \(\bar{\mu }\in (0,1)\), \(\Vert (\bar{y},\bar{z}) \Vert <\infty \);

  4. (iv)

    \((\bar{\theta }, \bar{y}, \bar{z})\in \partial \varOmega _{\bar{ \mu }}\times {\mathbb{R}}_{++}^{m+3}\times {\mathbb{R}}^{2m-1}\), \(\bar{\mu }\in (0,1)\), \(\Vert (\bar{y},\bar{z}) \Vert <\infty \);

  5. (v)

    \((\bar{\theta }, \bar{y}, \bar{z})\in \varOmega \times {\mathbb{R}} _{+}^{m+3}\times {\mathbb{R}}^{2m-1}\), \(\bar{\mu }=0\), \(\Vert (\bar{y}, \bar{z}) \Vert <\infty \).

From Lemma 2.1, the homotopy equation \(H(w,w^{0},1)=0\) has only one solution \(w^{0}=(\theta ^{0},y^{0},z^{0})\) in \(\varOmega _{1}^{0}\times {\mathbb{R}}_{++}^{m+3}\times {\mathbb{R}}^{2m-1}\), and since the matrix \(\frac{\partial H(w^{0},w^{0},1)}{\partial w}\) is nonsingular, case (i) is impossible.

If case (ii) happens, by assumption (A1), there must be a subsequence \((\theta ^{k},y^{k},z^{k},\mu _{k})\subset \varGamma _{w^{0}}\) such that \(\Vert (y^{k},z^{k}) \Vert \rightarrow \infty \), \(\mu _{k}\rightarrow \bar{\mu }\) and \(\theta ^{k}\rightarrow \bar{\theta }\) as \(k\rightarrow \infty \). Only the following three subcases are possible: (1) \(\bar{\mu }=1\); (2) \(\bar{\mu }\in (0,1)\); (3) \(\bar{\mu }=0\).

From the first equation of (6), we have

$$ \begin{aligned}[b] & (1-\mu _{k}) \Biggl(\nabla F\bigl(\theta ^{k}\bigr)+\sum _{i=1}^{m+3} y_{i}^{k} \nabla \widetilde{g}_{i}\bigl(\theta ^{k},\mu _{k}\bigr)\Biggr)+\bigtriangledown h\bigl( \theta ^{k} \bigr)z^{k} \\ &\quad{} +\mu _{k}\bigl(\theta ^{k}-\theta ^{0}\bigr)+\mu _{k}(1-\mu _{k})\xi =0. \end{aligned} $$
(7)

(1) When \(\bar{\mu }=1\), if \(\Vert ((1-\mu _{k})y^{k},z^{k}) \Vert <\infty \), without loss of generality we can suppose that \(((1-\mu _{k})y^{k},z^{k})\rightarrow (\bar{y},\bar{z})\). Then \(\bar{y}_{i}=0\) for \(i\notin I_{1}(\bar{\theta })\) by the second equation of (6). Taking the limit in (7) as \(k\rightarrow +\infty \), we have

$$\begin{aligned} \theta ^{0}&=\bar{ \theta }+\lim_{k\rightarrow \infty }\bigl[(1-\mu _{k}) \bigl( \nabla F\bigl(\theta ^{k}\bigr)+\nabla \widetilde{g}\bigl(\theta ^{k},\mu _{k}\bigr)y ^{k}\bigr)+\nabla h \bigl(\theta ^{k}\bigr)z^{k}\bigr] \\ &=\bar{\theta }+\nabla h(\bar{\theta })\bar{z}+\lim_{k\rightarrow \infty }\sum _{i\in I_{1}(\bar{\theta })}(1- \mu _{k})y_{i}^{k} \nabla \widetilde{g}_{i}\bigl(\theta ^{k},\mu _{k}\bigr) \\ &=\bar{\theta }+\nabla h(\bar{\theta })\bar{z}+\sum_{i\in I_{1}(\bar{\theta })} \bar{y_{i}}\nabla \widetilde{g} _{i}(\bar{\theta },1), \end{aligned}$$
(8)

which contradicts assumption (A3).

If \(\Vert ((1-\mu _{k})y^{k},z^{k}) \Vert \rightarrow \infty \), the discussion is the same as in subcase (2).

(2) When \(\bar{\mu }\in (0,1)\), without loss of generality, we can suppose that \(((1-\mu _{k})y^{k},z^{k})/ \Vert ((1-\mu _{k})y^{k},z^{k}) \Vert \rightarrow (\bar{\alpha },\bar{\beta })\) with \(\Vert (\bar{\alpha },\bar{\beta }) \Vert =1\) and \(\bar{\alpha _{i}}=0\) for \(i\notin I_{\bar{\mu }}(\bar{\theta })\). Dividing both sides of Eq. (7) by \(\Vert ((1-\mu _{k})y^{k},z^{k}) \Vert \) and taking the limit, we have

$$\begin{aligned}& \sum_{i\in I_{\bar{\mu }}(\bar{\theta })}\bar{\alpha _{i}} \nabla \widetilde{g}_{i}(\bar{\theta },\bar{\mu })+\sum _{j=1} ^{2m-1}\bar{\beta _{j}}\nabla h_{j}(\bar{\theta })=0, \end{aligned}$$

which contradicts assumption (A2).

(3) When \(\bar{\mu }=0\), without loss of generality, suppose that \((y^{k},z^{k})/ \Vert (y^{k},z^{k}) \Vert \rightarrow (\bar{\alpha },\bar{\beta })\) with \(\Vert (\bar{\alpha },\bar{\beta }) \Vert =1\) and \(\bar{\alpha _{i}}=0\) for \(i\notin I_{\bar{\mu }}(\bar{\theta })\). Dividing both sides of Eq. (7) by \(\Vert (y^{k},z^{k}) \Vert \) and taking the limit, we have

$$\begin{aligned}& \sum_{i\in I_{\bar{\mu }}(\bar{\theta })}\bar{\alpha _{i}} \nabla \widetilde{g}_{i}(\bar{\theta },\bar{\mu })+\sum _{j=1} ^{2m-1}\bar{\beta _{j}}\nabla h_{j}(\bar{\theta })=0, \end{aligned}$$

which contradicts assumption (A2).

Therefore, from the discussions of (1), (2) and (3), we see that case (ii) is also impossible.

Now, we show that cases (iii) and (iv) are also impossible. For any given \(\eta \in {\mathbb{R}}_{++}^{m+3}\), from the second equation \(\operatorname{diag}(\widetilde{g}(\bar{\theta },\bar{\mu }))\bar{y}+\bar{\mu }\eta =0\) of \(H(\bar{w},w^{0},\bar{\mu })=0\), we see that \(\bar{\mu }>0\) and \(\bar{y}\in \partial {\mathbb{R}}_{+}^{m+3}\), i.e., \(\bar{y}_{i}=0\) for some \(1\leq i\leq m+3\), cannot hold simultaneously. Therefore, case (iii) is impossible. If \(\bar{y}>0\) and \(\bar{\mu }>0\), from \(\operatorname{diag}(\widetilde{g}(\bar{\theta },\bar{\mu }))\bar{y}+\bar{\mu }\eta =0\), we get \(\widetilde{g}_{i}(\bar{\theta },\bar{\mu })<0\) for every i, which implies that case (iv) is impossible.

In conclusion, case (v) is the only possible case. That is, \(\varGamma _{w^{0}}\) must terminate in or approach the hyperplane \(\bar{\mu }=0\), and \(\bar{w}=(\bar{\theta },\bar{y},\bar{z})\) is a solution to the KKT system (5) of the principal–agent problem. □

4 Numerical algorithm

By Theorem 3.1, the homotopy equation (6) generates a smooth curve \(\varGamma _{w^{0}}\) for almost all \((w^{0},\xi )\in \varOmega _{1}^{0}\times {\mathbb{R}}_{++}^{m+3}\times {\mathbb{R}}^{2m-1}\times {\mathbb{R}}^{4m}\); by tracing \(\varGamma _{w^{0}}\) as \(\mu \rightarrow 0\), one can find a solution to (5). Letting s be the arc length of \(\varGamma _{w^{0}}\), we can parameterize the smooth curve \(\varGamma _{w^{0}}\) with respect to s, i.e.,

$$\begin{aligned} \begin{gathered} H\bigl(w(s),w^{0},\mu (s)\bigr)=0, \\ w(0)=w^{0}, \quad\quad \mu (0)=1. \end{gathered} \end{aligned}$$
(9)

Differentiating Eq. (9) with respect to the arc length s, we get the following theorem.

Theorem 4.1

The smooth homotopy path \(\varGamma _{w^{0}}\) is determined by the following initial value problem for an ordinary differential equation:

$$\begin{aligned} \begin{gathered} \mathit{DH}\bigl(w(s),w^{0}, \mu (s)\bigr) \begin{pmatrix} \dot{w} \\ \dot{\mu } \end{pmatrix}=0, \\ \bigl(w(0),\mu (0)\bigr)=\bigl(w^{0},1\bigr), \end{gathered} \end{aligned}$$
(10)

where \(\mathit{DH}\) is the derivative of H, s is the arc length of the curve \(\varGamma _{w^{0}}\), and the w-component of the solution point \((w(s^{*}),\mu (s^{*}))\) of Eq. (10) is the solution to the KKT system (5) when \(\mu (s^{*})=0\).

To trace the homotopy path \(\varGamma _{w^{0}}\) numerically, we can use the standard predictor–corrector procedure, which contains three kinds of steps: the first predictor step, the midway predictor steps and the corrector steps. The first predictor step is taken by computing the tangent direction, the midway predictor steps are taken by using secant directions, and the corrector steps are taken by Newton iterations for solving an augmented system.
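The mechanics can be illustrated on a one-dimensional toy homotopy (the target equation and the step length below are our own assumptions): an Euler predictor follows the tangent \(\dot{x}=-H_{\mu }/H_{x}\), and a few Newton corrector iterations return to the path at each fixed μ:

```python
# Toy homotopy H(x, mu) = (1 - mu)*(x**3 + 2*x - 5) + mu*(x - x0), traced from
# the known solution (x0, mu = 1) down to mu = 0; the cubic target equation and
# the fixed step length are illustrative assumptions, standing in for (10).
x0 = 0.0
Hfun = lambda x, mu: (1 - mu) * (x**3 + 2*x - 5) + mu * (x - x0)
Hx   = lambda x, mu: (1 - mu) * (3*x**2 + 2) + mu          # dH/dx
Hmu  = lambda x, mu: -(x**3 + 2*x - 5) + (x - x0)          # dH/dmu

x, mu, dmu = x0, 1.0, -0.05
while mu > 0.0:
    step = max(dmu, -mu)                  # never step past mu = 0
    x += -Hmu(x, mu) / Hx(x, mu) * step   # Euler predictor along the path
    mu += step
    for _ in range(5):                    # Newton corrector at fixed mu
        x -= Hfun(x, mu) / Hx(x, mu)

# At mu = 0 the path ends at a root of the target equation x**3 + 2*x - 5 = 0.
assert abs(x**3 + 2*x - 5) < 1e-10
```

A practical implementation for (6) additionally adapts the step length and replaces the scalar Newton update by Newton iterations on the augmented system mentioned above.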

For the first predictor step, the tangent vector at a point on the homotopy path \(\varGamma _{w^{0}}\) has two directions: a positive direction, which makes the arc length s increase, and a negative direction, which makes the arc length s decrease. Since the negative direction leads the homotopy path \(\varGamma _{w^{0}}\) back to the initial point \((w(0),\mu (0))=(w^{0},1)\), we must trace the homotopy path along the positive direction. By the basic theory of homotopy methods, the positive direction ν at any point \((w,\mu )\) of \(\varGamma _{w^{0}}\) keeps the sign of the following determinant invariant:

$$\begin{aligned}& \begin{vmatrix} \mathit{DH}(w(s),w^{0},\mu (s)) \\ \nu ^{T} \end{vmatrix}. \end{aligned}$$

Proposition 4.1

If the homotopy path \(\varGamma _{w^{0}}\) is smooth, then the positive direction \(\nu ^{0}\) at the initial point \(w^{0}\) satisfies

$$\begin{aligned}& \operatorname{sign} \begin{vmatrix} \mathit{DH}(w^{0},w^{0},1) \\ \nu {^{0}}^{T} \end{vmatrix}=(-1)^{(m+3)+(2m-1)+1}. \end{aligned}$$

Proof

Recall the shifted constraint functions \(\widetilde{g_{i}}(\theta , \mu )=g_{i}(\theta )-\mu ^{\sigma }\tau \), \(\tau \in {\mathbb{R}}_{++}\), and \(\widetilde{h_{j}}(\theta ,\mu )=h_{j}(\theta )-\mu h_{j}(\theta ^{0})\); hence \(\nabla \widetilde{g_{i}}(\theta ,\mu )=\nabla g_{i}(\theta )\) and \(\nabla \widetilde{h_{j}}(\theta ,\mu )=\nabla h_{j}(\theta )\). From the matrix

$$\begin{aligned}& \mathit{DH}\bigl(w,w^{0},\mu \bigr) =\frac{\partial H(w,w^{0},\mu )}{\partial (w, \mu )}= \begin{pmatrix} \varPi & (1-\mu )\nabla \widetilde{g}(\theta ,\mu ) & \nabla h(\theta )& \varLambda \\ Y\nabla \widetilde{g}(\theta ,\mu )^{T} & \operatorname{diag}(\widetilde{g}( \theta ,\mu )) & 0 & \varXi \\ \nabla h(\theta )^{T} & 0 & 0 & -h(\theta ^{0}) \end{pmatrix}, \end{aligned}$$

where \(\varPi =(1-\mu )(\nabla ^{2} F(\theta )+\sum_{i=1}^{m+3}y _{i}\nabla ^{2}\widetilde{g}_{i}(\theta ,\mu ))+\sum_{j=1}^{2m-1}z _{j}\nabla ^{2} h_{j}(\theta )+\mu I\), \(\varLambda =-\nabla F(\theta )- \nabla g(\theta )y+\theta -\theta ^{0}+(1-2\mu )\xi \) and \(\varXi =- \sigma \mu ^{\sigma -1}Y\tau +\eta \), for the initial point \(w^{0}=( \theta ^{0}, y^{0}, z^{0})\), and by using \(z^{0}=0\), we can obtain

$$\begin{aligned}& \mathit{DH}\bigl(w^{0},w^{0},1\bigr)= \begin{pmatrix} I & 0 & \nabla h(\theta ^{0}) & \varPi ^{0} \\ Y^{0}\nabla \widetilde{g}(\theta ^{0},1)^{T} & \operatorname{diag}( \widetilde{g}(\theta ^{0},1)) & 0 & \varXi ^{0} \\ \nabla h(\theta ^{0})^{T} & 0 & 0 & -h(\theta ^{0}) \end{pmatrix} =(\varUpsilon _{1},\varUpsilon _{2}), \end{aligned}$$

where \(\varUpsilon _{1}\in {\mathbb{R}}^{[4m+(m+3)+(2m-1)]\times [4m+(m+3)+(2m-1)]}\), \(\varUpsilon _{2}\in {\mathbb{R}}^{[4m+(m+3)+(2m-1)]\times 1}\). The tangent vector \(\nu {^{0}}^{T}=(\nu _{1}{^{0}}^{T},\nu _{2}^{0})\) of \(\varGamma _{w^{0}}\) at the point \((w^{0},1)\) satisfies

$$\begin{aligned}& (\varUpsilon _{1},\varUpsilon _{2}) \begin{pmatrix} \nu _{1}^{0} \\ \nu _{2}^{0} \end{pmatrix}=0, \end{aligned}$$

where \(\nu _{1}^{0}\in {\mathbb{R}}^{[4m+(m+3)+(2m-1)]\times 1}\) and \(\nu _{2}^{0}\in {\mathbb{R}}\). By a simple computation, we can get \(\nu _{1}^{0}=- \varUpsilon _{1}^{-1}\varUpsilon _{2}\nu _{2}^{0}\). Then we can obtain the following determinant:

$$\begin{aligned}& \begin{vmatrix} \mathit{DH}(w^{0},w^{0},1) \\ \nu {^{0}}^{T} \end{vmatrix} \\& \quad = \begin{vmatrix} \varUpsilon _{1} & \varUpsilon _{2} \\ \nu _{1}{^{0}}^{T} & \nu _{2}^{0} \end{vmatrix}= \begin{vmatrix} \varUpsilon _{1} & \varUpsilon _{2} \\ -\varUpsilon _{2}^{T}\varUpsilon _{1}^{-T} & 1 \end{vmatrix}\nu _{2}^{0} \\& \quad = \begin{vmatrix} \varUpsilon _{1} & \varUpsilon _{2} \\ 0 & \varUpsilon _{2}^{T}\varUpsilon _{1}^{-T}\varUpsilon _{1}^{-1}\varUpsilon _{2}+1 \end{vmatrix}\nu _{2}^{0}= \vert \varUpsilon _{1} \vert \nu _{2}^{0} \bigl(\varUpsilon _{2}^{T}\varUpsilon _{1}^{-T} \varUpsilon _{1}^{-1}\varUpsilon _{2}+1\bigr) \\& \quad = \begin{vmatrix} I & 0 & \nabla h(\theta ^{0}) \\ Y^{0}\nabla \widetilde{g}(\theta ^{0},1)^{T} & \operatorname{diag}( \widetilde{g}(\theta ^{0},1)) & 0 \\ \nabla h(\theta ^{0})^{T} & 0 & 0 \end{vmatrix}\nu _{2}^{0} \bigl(\varUpsilon _{2}^{T}\varUpsilon _{1}^{-T} \varUpsilon _{1}^{-1} \varUpsilon _{2}+1 \bigr) \\& \quad = \begin{vmatrix} I & 0 & \nabla h(\theta ^{0}) \\ Y^{0}\nabla \widetilde{g}(\theta ^{0},1)^{T} & \operatorname{diag}( \widetilde{g}(\theta ^{0},1)) & 0 \\ 0 & 0 & -\nabla h(\theta ^{0})^{T}\nabla h(\theta ^{0}) \end{vmatrix}\nu _{2}^{0} \bigl(\varUpsilon _{2}^{T}\varUpsilon _{1}^{-T} \varUpsilon _{1}^{-1} \varUpsilon _{2}+1 \bigr) \\& \quad = \bigl\vert -\nabla h\bigl(\theta ^{0}\bigr)^{T} \nabla h\bigl(\theta ^{0}\bigr) \bigr\vert \begin{vmatrix} I & 0 \\ Y^{0}\nabla \widetilde{g}(\theta ^{0},1)^{T} & \operatorname{diag}( \widetilde{g}(\theta ^{0},1)) \end{vmatrix}\nu _{2}^{0}\bigl(\varUpsilon _{2}^{T}\varUpsilon _{1}^{-T} \varUpsilon _{1}^{-1} \varUpsilon _{2}+1 \bigr) \\& \quad =(-1)^{2m-1} \bigl\vert \nabla h\bigl(\theta ^{0} \bigr)^{T}\nabla h\bigl(\theta ^{0}\bigr) \bigr\vert \bigl\vert \operatorname{diag}\bigl(\widetilde{g}\bigl(\theta ^{0}, 1\bigr)\bigr) \bigr\vert \nu _{2} ^{0}\bigl( \varUpsilon _{2}^{T}\varUpsilon _{1}^{-T} \varUpsilon _{1}^{-1}\varUpsilon _{2}+1 \bigr). \end{aligned}$$

Since \(\widetilde{g}(\theta ^{0},1)<0\), \(\varUpsilon _{2}^{T}\varUpsilon _{1} ^{-T}\varUpsilon _{1}^{-1}\varUpsilon _{2}+1>0\), and the last element \(\nu _{2}^{0}\) of the positive direction \(\nu ^{0}\) should be negative, the sign of the determinant

$$\begin{aligned}& \begin{vmatrix} \mathit{DH}(w^{0},w^{0},1) \\ \nu {^{0}}^{T} \end{vmatrix} \text{ is }(-1)^{(m+3)+(2m-1)+1}. \end{aligned}$$

The proof is complete. □
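The key algebraic step in the proof, the reduction of the bordered determinant to \(\vert \varUpsilon _{1}\vert \nu _{2}^{0}(\varUpsilon _{2}^{T}\varUpsilon _{1}^{-T}\varUpsilon _{1}^{-1}\varUpsilon _{2}+1)\), can be checked numerically on random data; the dimension below is arbitrary and the matrices are generic stand-ins for \(\varUpsilon _{1}\) and \(\varUpsilon _{2}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                                      # arbitrary dimension
U1 = rng.standard_normal((n, n))           # stand-in for Upsilon_1 (generic, nonsingular)
U2 = rng.standard_normal((n, 1))           # stand-in for Upsilon_2
nu2 = -0.7                                 # last tangent component (negative)
nu1 = -np.linalg.solve(U1, U2) * nu2       # nu_1 = -U1^{-1} U2 nu_2

# bordered determinant | U1, U2 ; nu1^T, nu2 |
M = np.vstack([np.hstack([U1, U2]), np.hstack([nu1.T, [[nu2]]])])
lhs = np.linalg.det(M)

# closed form |U1| nu2 (U2^T U1^{-T} U1^{-1} U2 + 1)
x = np.linalg.solve(U1, U2)                # U1^{-1} U2
rhs = np.linalg.det(U1) * nu2 * (float(x.T @ x) + 1.0)

print(np.isclose(lhs, rhs))               # True
```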

For completeness of this study, a detailed predictor–corrector algorithm is presented here.

Algorithm 4.1

  • Step 1. Initialization.

    Given the risk-averse utility \(v(\cdot )\), the density function \(f(x,a)\), the minimum expected utility \(V_{0}\), the number m of nodes and the vector \(\tau \in {\mathbb{R}}_{++}^{m+3}\), formulate \(F(\theta )\), \(g(\theta )\), \(h(\theta )\), the shifted constraint functions \(\tilde{g}(\theta ,\mu )\), the homotopy map H and its Jacobian. Given a randomly chosen vector \(\xi \in {\mathbb{R}}^{4m}\) and a positive vector \(\eta \in {\mathbb{R}}_{++}^{m+3}\), set the accuracy parameters \(\varepsilon _{1}\geq \varepsilon _{2}>0\), the initial point \(\theta ^{0}\in V\subset \varOmega ^{0}(1)\), the initial steplength \(\lambda _{0}>0\), the minimum steplength \(\lambda _{\min }\), the maximum steplength \(\lambda _{\max }\), the maximum number \(\bar{N}\) of corrector steps, the threshold value \(\epsilon _{\alpha }\) for the angle between two neighboring predictor directions, the step contraction factors \(\epsilon _{1}\), \(\epsilon _{2}\) and \(\epsilon _{3}\), the step expansion factors \(\epsilon _{4}\) and \(\epsilon _{5}\), and the threshold value \(0<\epsilon _{\mu }<1\) for starting the end game. Set \(k=0\).

  • Step 2. The first predictor step.

    • If \(k=0\), set \(\hat{\lambda }=\lambda _{0}\), \(\varepsilon = \varepsilon _{1}\);

      • Let \(\nu ^{-1}=(0,\ldots,0,-1)^{T}\in {\mathbb{R}}^{4m+(m+3)+(2m-1)+1}\) and compute the predictor step ν by

        $$\begin{aligned}& \begin{pmatrix} \mathit{DH}(w^{0},w^{0},1) \\ (\nu ^{-1})^{T} \end{pmatrix}\nu =-\nu ^{-1}; \end{aligned}$$
      • Set \(\nu ^{0}=\frac{\nu }{ \Vert \nu \Vert }\). Determine the smallest nonnegative integer i such that

        $$\begin{aligned}& \bigl(w^{1,0},\mu _{1,0}\bigr)=\bigl(w^{0},\mu _{0}\bigr)+\epsilon _{3}^{i} \hat{\lambda } \nu ^{0}\in \varOmega (\mu _{1,0})\times { \mathbb{R}}_{+} ^{m+3}\times {\mathbb{R}}^{2m-1} \times (0,1), \end{aligned}$$

        set \(\hat{\lambda }= \epsilon _{3}^{i} \hat{\lambda }\);

      • Go to Step 3;

    • Else, perform the midway predictor step.

      • Let \(\nu ^{k}=((w^{k},\mu _{k})-(w^{k-1},\mu _{k-1}))/ \Vert (w^{k}, \mu _{k})-(w^{k-1},\mu _{k-1}) \Vert \), determine the smallest nonnegative integer i such that

        $$\begin{aligned}& \bigl(w^{k+1,0},\mu _{k+1,0}\bigr)=\bigl(w^{k}, \mu _{k}\bigr)+\epsilon _{3}^{i} \hat{ \lambda } \nu ^{k}\in \varOmega (\mu _{k+1,0})\times { \mathbb{R}}_{+} ^{m+3}\times {\mathbb{R}}^{2m-1} \times (0,1); \end{aligned}$$
      • Set \(\hat{\lambda }=\epsilon _{3}^{i} \hat{\lambda }\);

    • Go to Step 3.

  • Step 3. The corrector step.

    • Set \(j=0\). Repeat: compute the Newton step ν̂ by solving the following augmented system:

      $$\begin{aligned}& \begin{pmatrix} \mathit{DH}(w^{k+1,j},w^{0},\mu _{k+1,j}) \\ (\nu ^{k})^{T} \end{pmatrix}\hat{\nu }= \begin{pmatrix} -H(w^{k+1,j},w^{0},\mu _{k+1,j}) \\ 0 \end{pmatrix} \end{aligned}$$
    • Determine the smallest nonnegative integer i such that

      $$\begin{aligned}& \bigl(w^{k+1,j+1},\mu _{k+1,j+1}\bigr)=\bigl(w^{k+1,j}, \mu _{k+1,j}\bigr)+\epsilon _{3}^{i} \hat{ \lambda }\hat{\nu }\in \varOmega (\mu _{k+1,j+1})\times {\mathbb{R}} _{+}^{m+3}\times {\mathbb{R}}^{2m-1}\times (0,1). \end{aligned}$$
    • If \(\Vert H(w^{k+1,j+1},w^{0},\mu _{k+1,j+1}) \Vert _{\infty }\leq \Vert H(w ^{k+1,j},w^{0},\mu _{k+1,j}) \Vert _{\infty }\)

      • Set \(j=j+1\).

    • Else

      • Set \(j=\bar{N}\), \((w^{k+1,j},\mu _{k+1,j})=(w^{k+1,0},\mu _{k+1,0})\), until

        $$\begin{aligned}& \bigl\Vert H\bigl(w^{k+1,j},w^{0},\mu _{k+1,j} \bigr) \bigr\Vert _{\infty } \leq \varepsilon \quad \text{or} \quad j=\bar{N}; \end{aligned}$$
      • Go to Step 4.

  • Step 4. The steplength strategy.

    • If \(j=\bar{N}\), \(\Vert H(w^{k+1,j},w^{0},\mu _{k+1,j}) \Vert _{\infty }>\varepsilon \),

      • Set \(\hat{\lambda }=\max \{\lambda _{\min },\epsilon _{2} \hat{\lambda }\}\) and \((w^{k+1,0},\mu _{k+1,0})=(w^{k},\mu _{k})+ \hat{\lambda }\nu ^{k}\);

      • Go to Step 3;

    • Else

      • Set \((w^{k+1},\mu _{k+1})=(w^{k+1,j},\mu _{k+1,j})\), adjust the steplength λ̂ as follows:

        • If \((\nu ^{k})^{T}\nu ^{k-1}<\epsilon _{\alpha }\), set \(\hat{\lambda }=\max \{\lambda _{\min }, \epsilon _{1}\hat{\lambda }\}\);

        • If \(j>3\), set \(\hat{\lambda }=\max \{\lambda _{\min }, \epsilon _{2}\hat{\lambda }\}\);

        • If \(j=2\), set \(\hat{\lambda }=\min \{\lambda _{\max }, \epsilon _{4}\hat{\lambda }\}\);

        • If \(j<2\), set \(\hat{\lambda }=\min \{\lambda _{\max }, \epsilon _{5}\hat{\lambda }\}\);

    • If \(\mu _{k+1}<\epsilon _{\mu }\), go to Step 5.

    • If \(\Vert H(w^{k+1},w^{0},0) \Vert _{\infty }\leq \varepsilon _{2}\) and \(w^{k+1}\) is feasible, then stop: \(\bar{w}=w^{k+1}\) is the computed solution to the KKT system (5). Thus, \(\bar{a}=a^{k+1}\) is the optimal action and \(s_{i}^{k+1}(x)=p_{i}^{k+1}x^{2}+q_{i}^{k+1}x+r_{i}^{k+1}\), \(i=1,2,\ldots,m\), is the quadratic spline contractual function.

    • Set \(\varepsilon =\min \{\mu _{k+1},\varepsilon _{1}\}\), \(k=k+1\).

  • Step 5. The end game.

    • Set \(j=0\), \(w^{k+1,0}=w^{k+1}\);

    • Repeat;

      • Compute the Newton step \(\nu _{\mathrm{end}}\) by solving the equation

        • \(\frac{\partial H}{\partial w}(w^{k+1,0},w^{0},0) \nu _{\mathrm{end}}=-H(w^{k+1,j},w^{0},0)\),

        • set \(w^{k+1,j+1}=w^{k+1,j}+\nu _{\mathrm{end}}\),

        • \(j=j+1\);

      • Until, \(\Vert H(w^{k+1,j},w^{0},0) \Vert _{\infty }\leq \varepsilon _{2}\) or \(j=\bar{N}\);

    • If \(\Vert H(w^{k+1,j},w^{0},0) \Vert _{\infty }\leq \varepsilon _{2}\) and \(w^{k+1,j}\) is feasible, then stop. \(\bar{w}=w^{k+1,j}\) is the computed solution to the KKT system (5). Thus, \(\bar{a}=a^{k+1,j}\) is the optimal action and \(s_{i}^{k+1,j}(x)=p_{i}^{k+1,j}x^{2}+q_{i}^{k+1,j}x+r _{i}^{k+1,j}\), \(i=1,2,\ldots,m\) is the quadratic spline contractual function.

    • Else, set \(\epsilon _{\mu }=0.1\epsilon _{\mu }\).
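The steplength strategy of Step 4 adapts λ̂ to the corrector effort just observed and to how sharply the path is turning. A minimal sketch of that heuristic follows; the parameter values are illustrative defaults, not ones prescribed by the paper:

```python
def adjust_steplength(lam, j, cos_angle,
                      lam_min=1e-6, lam_max=1.0, eps_alpha=0.9,
                      eps1=0.25, eps2=0.5, eps4=1.5, eps5=2.0):
    """Adapt the steplength after a corrector phase.

    lam       -- current steplength (hat-lambda)
    j         -- number of Newton corrector iterations just performed
    cos_angle -- (nu^k)^T nu^{k-1}, cosine of the angle between the two
                 most recent unit predictor directions
    """
    if cos_angle < eps_alpha:            # path turning sharply: contract
        lam = max(lam_min, eps1 * lam)
    if j > 3:                            # corrector worked hard: contract
        lam = max(lam_min, eps2 * lam)
    elif j == 2:                         # moderate effort: expand mildly
        lam = min(lam_max, eps4 * lam)
    elif j < 2:                          # easy correction: expand more
        lam = min(lam_max, eps5 * lam)
    return lam

print(adjust_steplength(0.1, 1, 0.99))   # easy step, path nearly straight -> 0.2
```

The contractions are floored at \(\lambda _{\min }\) and the expansions capped at \(\lambda _{\max }\), mirroring the max/min clamps in Step 4.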

5 Conclusions

To design a quadratic spline contractual function in the case of discretely unknown nodes, in this paper we constructed a modified constraint shifting homotopy algorithm for solving principal–agent problems. The existence of a homotopy path globally convergent to a solution of the KKT system for the principal–agent problem with a spline contractual function was then proved under mild conditions. The proposed algorithm only requires the initial point to lie in the shifted feasible set, not necessarily in the original feasible set. Our contribution in this paper is theoretical; numerical simulations of the performance of the CSCH method can be implemented as in Refs. [21, 22]. For more discussions of algorithms for solving optimization problems, see, e.g., [36,37,38].