Appendix 1: Omitted proofs
Proof of Theorem 4
Proof
Let \(\mathbf {f}^* = (f^*_1, \ldots , f^*_n)\) be any solution. We will construct a new solution that has the required representation and whose cost is no higher. We do this iteratively, beginning with \(f^*_1\).
Consider the subspace \(\mathcal {T} \subset {\mathcal {H}}_1\) defined by \(\mathcal {T} = \text {span}(k_{\mathbf {x}_1}, \ldots , k_{\mathbf {x}_N})\), and let \(\mathcal {T}^\bot \) be its orthogonal complement. It follows that \(f^*_1\) decomposes uniquely into \(f_1^* = f_0 + f_0^\bot \) with \(f_0 \in \mathcal {T}\) and \(f_0^\bot \in \mathcal {T}^\bot \). Consequently,
$$\begin{aligned} f^*_1(\mathbf {x}_j)&= \langle k_{\mathbf {x}_j}, f^*_1 \rangle ,&\text {[by (20)]}\\&=\langle k_{\mathbf {x}_j} , f_0 \rangle + \langle k_{\mathbf {x}_j}, f_0^\bot \rangle \\&=\langle k_{\mathbf {x}_j} , f_0 \rangle&\text {(since }f_0^\bot \in \mathcal {T}^\bot )\\&= f_0(\mathbf {x}_j)&\text {[by (20)]}. \end{aligned}$$
Thus, the solution \(\mathbf {f} = (f_0, f^*_2, \ldots , f^*_n)\) is feasible in (22). Furthermore, by orthogonality, \(\Vert f^*_1 \Vert ^2_{{\mathcal {H}}_1} = \Vert f_0 \Vert ^2_{{\mathcal {H}}_1} + \Vert f_0^\bot \Vert ^2_{{\mathcal {H}}_1} \ge \Vert f_0 \Vert ^2_{{\mathcal {H}}_1}.\) Since the objective is non-decreasing in \(\Vert f_1 \Vert _{{\mathcal {H}}_1}\), \(\mathbf {f}\) has an objective value no worse than that of \(\mathbf {f}^*\). We now proceed iteratively, considering each coordinate in turn. After at most \(n\) steps, we have constructed a solution with the required representation. \(\square \)
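As an informal numerical check of this projection argument (not part of the original proof), the following Python sketch uses a Gaussian kernel, an illustrative assumption, to verify that projecting an RKHS function onto \(\text {span}(k_{\mathbf {x}_1}, \ldots , k_{\mathbf {x}_N})\) leaves its values at the data points unchanged while never increasing its RKHS norm.

```python
# Sketch only: numerical check of the projection step in the proof of Theorem 4.
import numpy as np

def rbf(A, B, gamma=0.5):
    # Gram matrix of k(a, b) = exp(-gamma * ||a - b||^2) between rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))      # the observed points x_1, ..., x_N
Z = rng.normal(size=(5, 2))      # f* = sum_m c_m k_{z_m} is built from other points
c = rng.normal(size=5)

K_xx, K_xz, K_zz = rbf(X, X), rbf(X, Z), rbf(Z, Z)
f_star_at_X = K_xz @ c           # f*(x_j) = <k_{x_j}, f*>

# The projection f_0 of f* onto span(k_{x_1}, ..., k_{x_N}) has coefficients a
# solving K a = (f*(x_1), ..., f*(x_N)); a tiny jitter keeps the solve stable.
a = np.linalg.solve(K_xx + 1e-10 * np.eye(len(X)), f_star_at_X)

print(np.allclose(K_xx @ a, f_star_at_X))     # values at the data points are preserved
print(a @ K_xx @ a <= c @ K_zz @ c + 1e-8)    # the RKHS norm can only shrink
```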
Proof of Theorem 5
Proof
Suppose Problem (24) is feasible and let \({\varvec{\alpha }}\) be a feasible solution. Define \(\mathbf {f}\) via eq. (23). It is straightforward to check that \(\mathbf {f}\) is feasible in Problem (22) with the same objective value.
On the other hand, let \(\mathbf {f}\) be some feasible solution to Problem (22). By Theorem 4, there exists \({\varvec{\alpha }}\) such that \(f_i( \mathbf {x}_j) = \mathbf {e}_i^T {\varvec{\alpha }}{\mathbf {K}}\mathbf {e}_j\) and \(\Vert f_i \Vert ^2_{\mathcal {H}}= \mathbf {e}_i^T{\varvec{\alpha }}{\mathbf {K}}{\varvec{\alpha }}^T \mathbf {e}_i.\) It is straightforward to check that such an \({\varvec{\alpha }}\) is feasible in Problem (24) and yields the same objective value. Thus, Problem (22) is feasible if and only if Problem (24) is feasible, and we can construct an optimal solution to Problem (22) from an optimal solution to Problem (24) via (23). \(\square \)
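To make the correspondence concrete, the short sketch below (synthetic data, arbitrary Gaussian kernel) evaluates the finite-dimensional quantities appearing in (23) and in this proof: the function values \(\mathbf {e}_i^T {\varvec{\alpha }}{\mathbf {K}}\mathbf {e}_j\) and the squared norms \(\mathbf {e}_i^T{\varvec{\alpha }}{\mathbf {K}}{\varvec{\alpha }}^T \mathbf {e}_i\).

```python
# Sketch only: the finite-dimensional quantities used in Theorem 5.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))                                          # x_1, ..., x_N
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))    # Gram matrix
alpha = rng.normal(size=(3, 6))                                      # hypothetical n = 3, N = 6

F_vals = alpha @ K                        # F_vals[i, j] = f_i(x_j) = e_i^T alpha K e_j
norms_sq = np.diag(alpha @ K @ alpha.T)   # ||f_i||_H^2 = e_i^T alpha K alpha^T e_i
```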
Proof of Theorem 6
Proof
As mentioned in the text, the key idea in the proof is to relate (12) with a randomized uncertain convex program. To this end, notice that if \(z_N, {\varvec{\theta }}_N\) are an optimal solution to (12) with the \(\ell _\infty \)-norm, then \((z_N, {\varvec{\theta }}_N) \in \bigcap _{j=1}^N \mathcal {X}( \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j)\) where
$$\begin{aligned} \mathcal {X}( \mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) = \left\{ z, {\varvec{\theta }}\in \Theta : \exists \mathbf {y}\in {\mathbb {R}}^m \text { s.t. } \mathbf {A}^T \mathbf {y}\le \mathbf {f}(\mathbf {x}, {\varvec{\theta }}), \ \ \mathbf {x}^T\mathbf {f}(\mathbf {x}, {\varvec{\theta }}) - {\mathbf {b}}^T\mathbf {y}\le z \right\} . \end{aligned}$$
The sets \(\mathcal {X}( \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j)\) are convex. Consider then the problem
$$\begin{aligned} \min \limits _{z \ge 0, {\varvec{\theta }}} \quad z \ \ \text {s.t.} \quad (z, {\varvec{\theta }}) \in \bigcap _{j=1}^N \mathcal {X}( \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j) . \end{aligned}$$
This is exactly of the form of Eq. 2.1 in [18]. Applying Theorem 2.4 of that work shows that, with probability \(\beta (\alpha )\) with respect to the sampling, the “violation probability” of the pair \((z_N, {\varvec{\theta }}_N)\) is at most \(\alpha \). In our context, the violation probability is exactly the probability that \((\tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C})\) is not a \(z_N\)-approximate equilibrium. This proves the theorem. \(\square \)
Observe that the original proof in [18] requires that the solution \({\varvec{\theta }}_N\) be unique almost surely. However, as mentioned in discussion point 5 on p. 7 of that work, it suffices to pick a tie-breaking rule for \({\varvec{\theta }}_N\) in the case of multiple solutions. The tie-breaking rule discussed in the main text is one possible example.
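For a sense of the magnitudes involved, the sketch below evaluates one standard confidence expression from the scenario-optimization literature (the binomial tail bound of Campi and Garatti, with \(d\) decision variables and \(N\) scenarios). We stress that \(\beta (\alpha )\) is the function defined in the main text via [18]; the expression below is only a commonly used stand-in and may differ in its constants.

```python
# Sketch only: the standard scenario bound says the violation probability exceeds
# alpha with probability at most Binomial(N, alpha) mass on {0, ..., d-1}. This
# need not coincide with the paper's beta(alpha); it is shown for intuition.
from math import comb

def scenario_confidence(alpha, N, d):
    """Probability mass a Binomial(N, alpha) distribution places on {0, ..., d-1}."""
    return sum(comb(N, i) * alpha**i * (1 - alpha) ** (N - i) for i in range(d))

print(scenario_confidence(alpha=0.05, N=500, d=10))  # small value => high confidence
```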
Proof of Theorem 7
We first require some auxiliary results. Our treatment closely follows [7]. Let \(\zeta _1, \ldots , \zeta _N\) be i.i.d. random variables. For any class of functions \(\mathcal {S}\), define the empirical Rademacher complexity \(\mathcal {R}_N(\mathcal {S})\) by
$$\begin{aligned} \mathcal {R}_N(\mathcal {S}) = {\mathbb {E}}\left[ \sup _{f \in \mathcal {S}} \frac{2}{N} \left| \left. \sum _{i=1}^N \sigma _i f(\zeta _i) \right| \right| \zeta _1, \ldots , \zeta _N \right] , \end{aligned}$$
where \(\sigma _i\) are independent uniform \(\{ \pm 1 \}\)-valued random variables. Notice this quantity is random, because it depends on the data \(\zeta _1, \ldots , \zeta _N\).
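As an illustration of the definition (not needed for the proofs), the following sketch estimates \(\mathcal {R}_N(\mathcal {S})\) by Monte Carlo for a small, finite class of scalar functions; the classes used below are parametric families and RKHS balls rather than finite lists, but the computation conditional on the data is the same.

```python
# Sketch only: Monte Carlo estimate of the empirical Rademacher complexity above,
# for a toy finite class S of functions on R.
import numpy as np

def empirical_rademacher(S, data, n_draws=2000, seed=0):
    rng = np.random.default_rng(seed)
    vals = np.array([[f(z) for z in data] for f in S])   # |S| x N matrix of f(zeta_i)
    N = len(data)
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=N)           # i.i.d. Rademacher signs
        total += np.abs(vals @ sigma).max()               # sup_f |sum_i sigma_i f(zeta_i)|
    return (2.0 / N) * total / n_draws

S = [np.sin, np.cos, np.tanh]
data = np.linspace(-2, 2, 50)
print(empirical_rademacher(S, data))
```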
Our interest in Rademacher complexity stems from the following lemma.
Lemma 1
Let \(\mathcal {S}\) be a class of functions whose range is contained in \([0, M]\). Then, for any \(N\) and any \(0 < \beta < 1\), with probability at least \(1-\beta \) with respect to \({\mathbb {P}}\), every \(f \in \mathcal {S}\) simultaneously satisfies
$$\begin{aligned} {\mathbb {E}}[ f( \zeta ) ] \le \frac{1}{N} \sum \limits _{i=1}^N f(\zeta _i) + \mathcal {R}_N(\mathcal {S}) + \sqrt{ \frac{8 M \log (2/\beta ) }{N}} \end{aligned}$$
(34)
Proof
The result follows by specializing Theorem 8 of [7]. Namely, using the notation of that work, let \(\phi (y, a) = \mathcal {L}(y, a) = a / M\), \(\delta = \beta \) and then apply the theorem. Multiply the resulting inequality by \(M\) and use Theorem 12, part 3 of the same work to conclude that \(M \mathcal {R}_N( M^{-1} \mathcal {S} ) = \mathcal {R}_N(\mathcal {S})\) to finish the proof. \(\square \)
Remark 10
The constants in the above lemma are not tight. Indeed, modifying the proof of Theorem 8 in [7] to exclude the centering of \(\phi \) to \(\tilde{\phi }\), one can reduce the constant 8 in the above bound to 2. For simplicity in what follows, we will not be concerned with improvements at constant order.
Remark 11
Lemma 1 relates the empirical expectation of a function to its true expectation. If \(f \in \mathcal {S}\) were fixed a priori, stronger statements could be proven more simply by invoking the weak law of large numbers. The importance of Lemma 1 is that it asserts the inequality holds uniformly over all \(f \in \mathcal {S}\). This is important because, in what follows, we identify the relevant function \(f\) by an optimization; hence it is not known a priori but instead depends on the data.
Our goal is to use Lemma 1 to bound \({\mathbb {E}}[\epsilon (\mathbf {f}_N, \tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C})]\). To do so, we must compute an upper bound on the Rademacher complexity of a suitable class of functions. As a preliminary step, we have the following.
Lemma 2
For any \(\mathbf {f}\) which is feasible in (12) or (28), we have
$$\begin{aligned} \tilde{\epsilon }(\mathbf {f}, \tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C}) \le \overline{B} \quad \text { a.s. } \end{aligned}$$
(35)
Proof
Using strong duality as in Theorem 2,
$$\begin{aligned} \tilde{\epsilon }(\mathbf {f}, \tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C}) = \max _{\mathbf {x}\in \tilde{\mathcal {F}}} (\tilde{\mathbf {x}} - \mathbf {x})^T \mathbf {f}( \tilde{\mathbf {x}} ) \le 2R \sup _{\tilde{\mathbf {x}} \in \tilde{\mathcal {F}}} \Vert \mathbf {f}(\tilde{\mathbf {x}}) \Vert _2, \end{aligned}$$
(36)
by A6. For Problem (12), the result follows from the definition of \(\overline{B}\). For Problem (28), observe that for any \(\tilde{\mathbf {x}} \in \tilde{\mathcal {F}}\),
$$\begin{aligned} | f_i (\tilde{\mathbf {x}}) |^2 = \langle f_i, k_{\tilde{\mathbf {x}}} \rangle ^2 \le \Vert f_i \Vert _{\mathcal {H}}^2 \sup _{\Vert \mathbf {x}\Vert _2 \le R } k(\mathbf {x}, \mathbf {x}) = \Vert f_i \Vert _{\mathcal {H}}^2 \overline{K}^2\le \kappa _i^2 \overline{K}^2, \end{aligned}$$
(37)
where the middle inequality follows from Cauchy–Schwarz. Plugging this into Eq. (36) and using the definition of \(\overline{B}\) yields the result. \(\square \)
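The Cauchy–Schwarz step in (37) is easy to check numerically. The sketch below does so for a Gaussian kernel, for which \(k(\mathbf {x}, \mathbf {x}) = 1\) and hence \(\overline{K} = 1\); the kernel choice and data are illustrative assumptions.

```python
# Sketch only: numerical check of |f(x)| <= ||f||_H * K_bar for a Gaussian kernel.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 2))
c = rng.normal(size=10)
gamma = 0.5

def k(A, B):
    return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

K = k(X, X)
norm_f = np.sqrt(c @ K @ c)                    # RKHS norm of f = sum_j c_j k_{x_j}
grid = rng.uniform(-3, 3, size=(5000, 2))      # proxy for the ball {x : ||x||_2 <= R}
f_vals = k(grid, X) @ c
print(np.abs(f_vals).max() <= norm_f + 1e-9)   # K_bar = 1 for the Gaussian kernel
```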
Now consider the class of functions
$$\begin{aligned} F = {\left\{ \begin{array}{ll} \Big \{ (\mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) \mapsto \epsilon (\mathbf {f}, \mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) : \mathbf {f} = \mathbf {f}(\cdot , {\varvec{\theta }}), {\varvec{\theta }}\in \Theta \Big \} \quad \text {for Problem~(12)}\\ \Big \{ (\mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) \mapsto \epsilon (\mathbf {f}, \mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) : f_i \in {\mathcal {H}}, \ \ \Vert f_i \Vert _{\mathcal {H}}\le \kappa _i, \ \ i=1, \ldots , n \Big \}\\ \quad \text {for Problem~(28). } \end{array}\right. } \end{aligned}$$
Lemma 3
$$\begin{aligned} \mathcal {R}_N(F) \le \frac{2\overline{B}}{\sqrt{N}} \end{aligned}$$
Proof
We prove the lemma for Problem (12). The proof in the other case is identical. Let \(\mathcal {S} = \{ \mathbf {f}( \cdot , {\varvec{\theta }}) : {\varvec{\theta }}\in \Theta \}\). Then,
$$\begin{aligned} \mathcal {R}_N(F)&= \frac{2}{N} {\mathbb {E}}\left[ \sup _{\mathbf {f} \in \mathcal {S}} \left| \sum \limits _{j=1}^N \sigma _j \epsilon (\mathbf {f}, \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j) \right| \, \middle| \, (\mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j)_{j=1}^N \right] \\&\le \frac{2 \overline{B}}{N} {\mathbb {E}}\left[ \left( \sum \limits _{j=1}^N \sigma _j^2 \right) ^{\frac{1}{2}}\right] \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad [\text {using (35)}]\\&\le \frac{2 \overline{B}}{N} \sqrt{ {\mathbb {E}}\left[ \sum \limits _{j=1}^N \sigma _j^2 \right] }\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad (\text {Jensen's inequality})\\&=\frac{2 \overline{B}}{\sqrt{ N }}\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad ( \sigma _j^2 = 1\text { a.s.}). \end{aligned}$$
\(\square \)
We are now in a position to prove the theorem.
Proof
(Theorem 7) Observe that \(z_N = \frac{1}{N} \sum \nolimits _{j=1}^N (\epsilon ( \mathbf {f}_N, \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j))^p\). Next, the function \(\phi (z) = z^p\) satisfies \(\phi (0) = 0\) and is Lipschitz with constant \(L_\phi = p \overline{B}^{p-1}\) on the interval \([0, \overline{B} ]\). Consequently, from Theorem 12 part 4 of [7],
$$\begin{aligned} \mathcal {R}_N( \phi \circ F )&\le 2 L_\phi \mathcal {R}_N(F)\\&\le 2p \overline{B}^{p-1} \frac{2\overline{B}}{\sqrt{N}}\\&= \frac{4p \overline{B}^p }{\sqrt{N}}. \end{aligned}$$
Now applying Lemma 1 with \(\zeta \rightarrow (\tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C})\), \(f(\cdot ) \rightarrow \epsilon ( \cdot )^p\), and \(M = \overline{B}^p\) yields the first part of the theorem.
For the second part of the theorem, observe that, conditional on the sample, the event that \(\tilde{\mathbf {x}}\) is not a \((z_N + \alpha )\)-approximate equilibrium is equivalent to the event that \(\epsilon _N > z_N + \alpha \). Now use Markov’s inequality and apply the first part of the theorem. \(\square \)
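For concreteness, the sketch below assembles the bound exactly as the proof does: the empirical term \(z_N\), the Rademacher term \(4p\overline{B}^p/\sqrt{N}\), and the deviation term from (34) with \(M = \overline{B}^p\), followed by Markov’s inequality. All numerical inputs are made up, and the precise way \(z_N\) and \(\alpha \) enter the final statement should be read from Theorem 7 itself.

```python
# Sketch only: combine the pieces of the proof of Theorem 7 numerically.
import math

def expected_eps_p_bound(z_N, B_bar, p, N, beta):
    # z_N is the empirical average of epsilon^p over the N observations.
    rademacher_term = 4 * p * B_bar**p / math.sqrt(N)            # Lemma 3 + contraction
    deviation_term = math.sqrt(8 * B_bar**p * math.log(2 / beta) / N)  # (34) with M = B_bar^p
    return z_N + rademacher_term + deviation_term

def prob_not_approx_equilibrium(z_N, alpha, B_bar, p, N, beta):
    # Markov: P(epsilon > z_N + alpha) <= E[epsilon^p] / (z_N + alpha)^p.
    return expected_eps_p_bound(z_N, B_bar, p, N, beta) / (z_N + alpha) ** p

print(prob_not_approx_equilibrium(z_N=0.1, alpha=0.5, B_bar=1.0, p=2, N=10000, beta=0.05))
```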
Proof of Theorem 8
Proof
Consider the first part of the theorem.
By construction, \(\hat{\mathbf {x}}\) solves \({{\mathrm{\text {VI}}}}(\mathbf {f}(\cdot , {\varvec{\theta }}_N), \mathbf {A}_{N+1}, {\mathbf {b}}_{N+1}, C_{N+1})\). The theorem thus claims that \(\mathbf {x}_{N+1}\) is within \(\delta ^\prime \equiv \sqrt{\frac{z_N}{\gamma }}\) of a solution to this VI. From Theorem 1, if \(\mathbf {x}_{N+1}\) were not within \(\delta ^\prime \) of a solution, then it would have to be the case that \(\epsilon ( \mathbf {f}(\cdot , {\varvec{\theta }}_N), \mathbf {x}_{N+1}, \mathbf {A}_{N+1}, {\mathbf {b}}_{N+1}, C_{N+1}) > z_N\). By Theorem 6, this happens only with probability \(\beta (\alpha )\).
The second part is similar to the first with Theorem 6 replaced by Theorem 7. \(\square \)
Appendix 2: Casting structural estimation as an inverse variational inequality
In the spirit of structural estimation, assume there exists a true
\({\varvec{\theta }}^* \in \Theta \) that generates solutions \(\mathbf {x}_j^*\) to \({{\mathrm{\text {VI}}}}(\mathbf {f}(\cdot , {\varvec{\theta }}^*), \mathbf {A}^*_j, {\mathbf {b}}^*_j, C^*_j)\). We observe \((\mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j)\), which are noisy versions of these true parameters. We are additionally given a precise mechanism for the noise, e.g., that
$$\begin{aligned} \mathbf {x}_j = \mathbf {x}^*_j + \Delta \mathbf {x}_j, \quad \mathbf {A}_j = \mathbf {A}^*_j + \Delta \mathbf {A}_j, \quad {\mathbf {b}}_j = {\mathbf {b}}^*_j + \Delta {\mathbf {b}}_j, \quad C_j = C_j^*, \end{aligned}$$
where \((\Delta \mathbf {x}_j, \Delta \mathbf {A}_j, \Delta {\mathbf {b}}_j)\) are i.i.d. realizations of a random vector \((\tilde{\Delta \mathbf {x}}, \tilde{\Delta \mathbf {A}}, \tilde{\Delta {\mathbf {b}}})\) and \(\tilde{\Delta \mathbf {x}}, \tilde{\Delta \mathbf {A}}, \tilde{\Delta {\mathbf {b}}}\) are mutually uncorrelated.
We use Theorem 2 to estimate \({\varvec{\theta }}\) under these assumptions by solving
$$\begin{aligned} \min \limits _{\mathbf {y}_j \ge \mathbf {0}, {\varvec{\theta }}\in \Theta , \Delta \mathbf {x}_j, \Delta \mathbf {A}_j, \Delta {\mathbf {b}}_j} \quad&\left\| \begin{pmatrix} \Delta \mathbf {x}_j \\ \Delta \mathbf {A}_j \\ \Delta {\mathbf {b}}_j \end{pmatrix}_{j=1, \ldots , N} \right\| \nonumber \\ \text {s.t.} \quad&(\mathbf {A}_j - \Delta \mathbf {A}_j)^T \mathbf {y}_j \le _{C_j} \mathbf {f}(\mathbf {x}_j - \Delta \mathbf {x}_j, {\varvec{\theta }}), \quad j=1, \ldots , N,\nonumber \\&(\mathbf {x}_j - \Delta \mathbf {x}_j)^T \mathbf {f}(\mathbf {x}_j - \Delta \mathbf {x}_j, {\varvec{\theta }}) = ({\mathbf {b}}_j - \Delta {\mathbf {b}}_j)^T \mathbf {y}_j, \quad j =1, \ldots , N, \end{aligned}$$
(38)
where \(\Vert \cdot \Vert \) refers to some norm. Notice that this formulation also supports the case where some of the components of \(\mathbf {x}\) are unobserved; simply treat them as additional optimization variables in the above. In words, this formulation assumes that the “de-noised” data constitute a perfect equilibrium with respect to the fitted \({\varvec{\theta }}\).
We next claim that, if all equilibria occur on the strict interior of the feasible region, Problem (38) is equivalent to finding a least-squares-type approximate solution to the equations \(\mathbf {f}(\mathbf {x}^*) = \mathbf {0}\). Specifically, when \(\mathbf {x}^*\) occurs on the interior of \(\mathcal {F}\), the VI condition Eq. (1) is equivalent to the equations \(\mathbf {f}(\mathbf {x}^*) = \mathbf {0}.\) At the same time, by Theorem 2, Eq. (1) is equivalent to the system (8, 9) with \(\epsilon = 0\), which motivated the constraints in Problem (38). Thus, Problem (38) is equivalent to finding a minimal perturbation (with respect to the given norm) of the data that satisfies the structural equations.
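To illustrate this reduction in the simplest possible case, the sketch below solves a toy one-dimensional instance with a hypothetical linear map \(f(x, {\varvec{\theta }}) = \theta _0 + \theta _1 x\) and perturbations only in \(\mathbf {x}\), using a generic nonlinear solver; it is a sketch of the interior-equilibrium special case, not of the full Problem (38).

```python
# Sketch only: minimal perturbations that make the de-noised data exact
# (interior) equilibria, i.e., f(x_j - dx_j, theta) = 0 for all j.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
theta_true = np.array([1.0, -2.0])
x_star = -theta_true[0] / theta_true[1]          # interior equilibrium: f(x*) = 0
x_obs = x_star + 0.05 * rng.normal(size=20)      # noisy observations

def objective(v):
    dx = v[2:]                                   # v = (theta_0, theta_1, dx_1, ..., dx_N)
    return np.sum(dx ** 2)                       # size of the perturbation

def residuals(v):
    theta, dx = v[:2], v[2:]
    return theta[0] + theta[1] * (x_obs - dx)    # require f(x_j - dx_j, theta) = 0

v0 = np.concatenate([[0.5, -1.0], np.zeros_like(x_obs)])
res = minimize(objective, v0, constraints={"type": "eq", "fun": residuals})
print(res.x[:2])   # recovered theta (identified only up to scale in this toy model)
```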
We can relate this weighted least-squares problem to some structural estimation techniques. Indeed, [20] and [37] observed that many structural estimation techniques can be reinterpreted as a constrained optimization problem that minimizes the size of the perturbation needed to make the observed data satisfy the structural equations and, additionally, satisfy constraints motivated by orthogonality conditions and the generalized method of moments (GMM). In light of our previous comments, if we augment Problem (38) with the same orthogonality constraints, and all equilibria occur on the strict interior of the feasible region, the solutions to this problem will coincide with those of traditional estimators.
Of course, some structural estimation techniques incorporate even more sophisticated adaptations. They may pre-process the data (e.g., the two-stage least squares technique in econometrics), incorporate additional constraints (e.g., the orthogonality-of-instruments approach), or tune the choice of norm in the least-squares computation (two-stage GMM estimation). These application-specific adaptations improve the statistical properties of the estimator under certain assumptions about the data-generating process. What we would like to stress is that, provided we make the same adaptations to Problem (38) (i.e., preprocess the data, incorporate orthogonality of instruments, and tune the choice of norm), and provided that all equilibria occur on the interior, the solution to Problem (38) must coincide exactly with the estimates produced by these techniques. Thus, it necessarily inherits all of the same statistical properties.
Recasting (at least some) structural estimation techniques in our framework facilitates a number of comparisons with our proposed approach based on Problem (12). First, it is clear how our perspective on data alters the formulation. Problem (38) seeks minimal perturbations so that the observed data are exact equilibria with respect to \({\varvec{\theta }}\), while Problem (12) seeks a \({\varvec{\theta }}\) that makes the observed data approximate equilibria and minimizes the size of the approximation. Second, the complexity of the proposed optimization problems differs greatly. The complexity of Problem (38) depends on how \(\mathbf {f}\) depends on both \(\mathbf {x}\) and \({\varvec{\theta }}\) (as opposed to just \({\varvec{\theta }}\) for (12)), and there are unavoidable non-convex, bilinear terms like \(\Delta \mathbf {A}_j^T \mathbf {y}_j\). Such terms are well known to cause difficulties for numerical solvers. Thus, we expect solving this optimization problem to be significantly more difficult than solving Problem (12). Finally, as we will see in the next section, Problem (12) generalizes naturally to a nonparametric setting.
Appendix 3: Omitted formulations
Formulation from Sect. 8.1
Let \(\xi ^{med}\) be the median value of \(\xi \) over the dataset. Breaking ties arbitrarily, \(\xi ^{med}\) occurs for some observation \(j = j^{med}\). Let \(p_1^{med}, p_2^{med}, \xi ^{med}_1, \xi ^{med}_2\) be the corresponding prices and demand shocks at time \(j^{med}\). (Recall that in this section \(\xi = \xi _1 = \xi _2\).) These definitions make precise what we mean in the main text by “fixing other variables to the median observation.” Denote by \(\underline{p}_1, \underline{p}_2\) the minimum prices observed over the data set.
Our parametric formulation in Sect. 8.1 is
$$\begin{aligned}&\min \limits _{\mathbf {y}, \varvec{\epsilon }, {\varvec{\theta }}_1, {\varvec{\theta }}_2 } \quad \Vert \varvec{\epsilon }\Vert _\infty \end{aligned}$$
(39a)
$$\begin{aligned}&\text {s.t.} \quad \mathbf {y}^j \ge \mathbf {0}, \quad j = 1, \ldots , N,\nonumber \\&y_i^j \ge M_i(p_1^j, p_2^j, \xi ^j; {\varvec{\theta }}_i), \quad i = 1, 2, \ j = 1, \ldots , N,\nonumber \\&\quad \sum \limits _{i=1}^2 \left( \overline{p}^j y_i^j - p_i^j M_i(p_1^j, p_2^j, \xi ^j; {\varvec{\theta }}_i) \right) \le \epsilon _j, \quad j = 1, \ldots , N,\nonumber \\&\quad M_1(p_1^j, p_2^{med}, \xi ^{med}; {\varvec{\theta }}_1) \!\ge \! M_1(p_1^k, p_2^{med}, \xi ^{med}; {\varvec{\theta }}_1), \; \forall 1 \!\le \! j,k \!\le \! N \text { s.t. } p_1^j \le p_1^k,\end{aligned}$$
(39b)
$$\begin{aligned}&\quad M_2(p_1^{med}, p_2^j, \xi ^{med}; {\varvec{\theta }}_2) \!\ge \! M_2(p_1^{med}, p_2^k, \xi ^{med}; {\varvec{\theta }}_2), \; \forall 1 \!\le \! j,k \le N \text { s.t. } p_2^j \le p_2^k,\end{aligned}$$
(39c)
$$\begin{aligned}&\quad M_1(\underline{p}_1, p_2^{med}, \xi ^{med}; {\varvec{\theta }}_1) = M^*_1(\underline{p}_1, p_2^{med}, \xi ^{med}_1; {\varvec{\theta }}^*_1)\end{aligned}$$
(39d)
$$\begin{aligned}&\quad M_2(p_1^{med}, \underline{p}_2, \xi ^{med}; {\varvec{\theta }}_2) = M^*_2(p_1^{med}, \underline{p}_2, \xi ^{med}_2 ; {\varvec{\theta }}^*_2) \end{aligned}$$
(39e)
Here \(M_1\) and \(M_2\) are given by Eq. (32). Notice, for this choice, the optimization is a linear optimization problem.
Equations (39b) and (39c) constrain each fitted function to be non-increasing in the firm’s own price. Equations (39d) and (39e) are normalization conditions. We have chosen to normalize the functions to equal the true functions at this one point to make the visual comparisons easier. In principle, any suitable normalization can be used.
Our nonparametric formulation is similar to the above, but we replace
- The parametric \(M_1(\cdot , {\varvec{\theta }}_1), M_2(\cdot , {\varvec{\theta }}_2)\) with nonparametric \(M_1(\cdot ), M_2(\cdot ) \in \mathcal {H}\);
- The objective by \(\Vert \varvec{\epsilon }\Vert _1 + \lambda ( \Vert M_1 \Vert _{\mathcal {H}}+ \Vert M_2 \Vert _{\mathcal {H}})\).
By Theorem 4 and the discussion in Sect. 6, we can rewrite this optimization as a convex quadratic program.
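A minimal sketch of that reduction, assuming a Gaussian kernel and synthetic prices, is given below: by Theorem 4 we may write \(M_i(\cdot ) = \sum _j \alpha _{ij} k(\cdot , \mathbf {p}^j)\), so function values and RKHS norms become linear and second-order-cone expressions in \({\varvec{\alpha }}\). Only the own-price monotonicity constraints analogous to (39b) are shown; the \(\varvec{\epsilon }\)-equilibrium and normalization constraints enter through additional linear constraints in the same way.

```python
# Sketch only: the kernelized formulation as a finite-dimensional convex program.
import cvxpy as cp
import numpy as np

def rbf(A, B, gamma=1.0):
    return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

rng = np.random.default_rng(5)
P = rng.uniform(1, 2, size=(15, 2))                  # observed price pairs (p_1^j, p_2^j)
K = rbf(P, P)
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(P)))    # ||M_i||_H = ||L.T @ alpha_i||_2

alpha = cp.Variable((2, len(P)))                     # representer coefficients for M_1, M_2
p2_med = np.median(P[:, 1])
grid = np.column_stack([np.sort(P[:, 0]), np.full(len(P), p2_med)])
M1_on_grid = rbf(grid, P) @ alpha[0]                 # M_1(p_1, p_2^med) along sorted own prices

constraints = [M1_on_grid[:-1] >= M1_on_grid[1:]]    # lower own price => weakly larger M_1
# The ||eps||_1 data-fit term and the equilibrium/normalization constraints of
# (39b)-(39e) would be added here as further linear constraints; omitted for brevity.
objective = cp.Minimize(cp.norm(L.T @ alpha[0]) + cp.norm(L.T @ alpha[1]))
prob = cp.Problem(objective, constraints)
```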
Formulation from Sect. 8.2
Our parametric formulation is nearly identical to the parametric formulation in Appendix “Formulation from Sect. 8.1”, with the following changes:
- Replace Eq. (39a) by \(\Vert \varvec{\epsilon }\Vert _\infty + \lambda ( \Vert {\varvec{\theta }}_1 \Vert _1 + \Vert {\varvec{\theta }}_2\Vert _1)\);
- Replace the definition of \(M_1, M_2\) by Eq. (33).
Our nonparametric formulation is identical to the nonparametric formulation of the previous section.