Appendix 1: Mathematical derivation of \({\mathrm{OR}}^{{\mathrm{PNDE}}}_{x,x^{\star }}\) and \({\mathrm{OR}}^{{\mathrm{TNIE}}}_{x,x^{\star }}\)
Given the standard causal inference assumptions of Sect. 2, the natural effects can be non-parametrically identified by using Pearl’s mediation formula (Pearl 2001, 2010). For a binary mediator, the expression identifying the pure natural direct effect is
$$\begin{aligned} {\mathrm{OR}}_{x,x^{\star }}^{{\mathrm{PNDE}}} = \frac{\overbrace{\sum _{w}P(Y=1\mid X=x,W=w)P(W=w\mid X=x^{\star })/\sum _{w}P(Y=0\mid X=x,W=w)P(W=w\mid X=x^{\star })}^{Q_1}}{\underbrace{\sum _{w}P(Y=1\mid X=x^{\star },W=w)P(W=w\mid X=x^{\star })/\sum _{w}P(Y=0\mid X=x^{\star },W=w)P(W=w\mid X=x^{\star })}_{Q_2}}. \end{aligned}$$
Given the parametric models assumed, the numerator of the expression above can be written as
$$\begin{aligned} \begin{aligned} Q_1&= \frac{P(Y=1\mid X=x,W=1)P(W=1\mid X=x^{\star })+P(Y=1\mid X=x,W=0)P(W=0\mid X=x^{\star })}{P(Y=0\mid X=x,W=1)P(W=1\mid X=x^{\star })+P(Y=0\mid X=x,W=0)P(W=0\mid X=x^{\star })} \\&= \frac{\frac{\exp \{\beta _0+\beta _w+(\beta _x+\beta _{xw})x\}}{1+\exp \{\beta _0+\beta _w+(\beta _x+\beta _{xw})x\}}\times \frac{\exp (\gamma _0+\gamma _xx^{\star })}{1+\exp (\gamma _0+\gamma _xx^{\star })}+\frac{\exp (\beta _0+\beta _xx)}{1+\exp (\beta _0+\beta _xx)}\times \frac{1}{1+\exp (\gamma _0+\gamma _xx^{\star })}}{\frac{1}{1+\exp \{\beta _0+\beta _w+(\beta _x+\beta _{xw})x\}}\times \frac{\exp (\gamma _0+\gamma _xx^{\star })}{1+\exp (\gamma _0+\gamma _xx^{\star })}+\frac{1}{1+\exp (\beta _0+\beta _xx)}\times \frac{1}{1+\exp (\gamma _0+\gamma _xx^{\star })}} \\&= \frac{\exp \{\beta _0+\beta _w+(\beta _x+\beta _{xw})x\}\exp (\gamma _0+\gamma _xx^{\star })\{1+\exp (\beta _0+\beta _xx)\}+\exp (\beta _0+\beta _xx)[1+\exp \{\beta _0+\beta _w+(\beta _x+\beta _{xw})x \}]}{\exp (\gamma _0+\gamma _xx^{\star })\{1+\exp (\beta _0+\beta _xx)\}+1+\exp \{\beta _0+\beta _w+(\beta _x+\beta _{xw})x \}} \\&= \exp (\beta _0+\beta _xx)A_{x,x^{\star }}. \end{aligned} \end{aligned}$$
For the denominator, an analogous calculation leads to \(Q_2 = \exp (\beta _0+\beta _xx^{\star })A_{x^{\star },x^{\star }}\) and therefore to \(\log {\mathrm{OR}}_{x,x^{\star }}^{{\mathrm{PNDE}}}=\log Q_1 - \log Q_2 = \beta _x(x-x^{\star }) + \log (A_{x,x^{\star }}/A_{x^{\star },x^{\star }})\), which proves Eq. (9). Derivations for the total natural indirect effect are similar since we have
$$\begin{aligned} {\mathrm{OR}}_{x,x^{\star }}^{{\mathrm{TNIE}}} = \frac{\overbrace{\sum _{w}P(Y=1\mid X=x,W=w)P(W=w\mid X=x)/\sum _{w}P(Y=0\mid X=x,W=w)P(W=w\mid X=x)}^{Q_3}}{\underbrace{\sum _{w}P(Y=1\mid X=x,W=w)P(W=w\mid X=x^{\star })/\sum _{w}P(Y=0\mid X=x,W=w)P(W=w\mid X=x^{\star })}_{Q_1}}, \end{aligned}$$
with \(Q_3=\exp (\beta _0+\beta _xx)A_{x,x}\), leading to \(\log {\mathrm{OR}}_{x,x^{\star }}^{{\mathrm{TNIE}}}=\log Q_3 - \log Q_1 = \log (A_{x,x}/A_{x,x^{\star }})\), that is, Eq. (10).
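These closed forms are easy to check numerically. The sketch below (purely illustrative parameter values, chosen only for this check) evaluates \(Q_1\), \(Q_2\) and \(Q_3\) directly from Pearl's mediation formula under the two logistic models, and compares the resulting odds ratios with \(\exp \{\beta _x(x-x^{\star })\}A_{x,x^{\star }}/A_{x^{\star },x^{\star }}\) and \(A_{x,x}/A_{x,x^{\star }}\); the function `A` encodes the \(A_{x,x^{\star }}\) term read off the derivation above.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

# Purely illustrative parameter values
b0, bx, bw, bxw = -0.5, 0.8, 0.4, 0.3   # outcome model
g0, gx = -0.2, 0.6                      # mediator model
x, xs = 1.0, 0.0                        # x and x-star

def p_y(x, w):   # P(Y=1 | X=x, W=w)
    return expit(b0 + bx*x + bw*w + bxw*x*w)

def p_w1(x):     # P(W=1 | X=x)
    return expit(g0 + gx*x)

def Q(x_out, x_med):  # odds built from Pearl's mediation formula
    pw = {1: p_w1(x_med), 0: 1 - p_w1(x_med)}
    num = sum(p_y(x_out, w) * pw[w] for w in (0, 1))
    den = sum((1 - p_y(x_out, w)) * pw[w] for w in (0, 1))
    return num / den

def e_y(x, w):
    return math.exp(b0 + bx*x + bw*w + bxw*x*w)

def e_w(x):
    return math.exp(g0 + gx*x)

def A(x1, x2):   # A_{x1,x2} as read off the derivation above
    top = math.exp(bw + bxw*x1) * e_w(x2) * (1 + e_y(x1, 0)) + 1 + e_y(x1, 1)
    bot = e_w(x2) * (1 + e_y(x1, 0)) + 1 + e_y(x1, 1)
    return top / bot

pnde = Q(x, xs) / Q(xs, xs)   # Q1 / Q2
tnie = Q(x, x) / Q(x, xs)     # Q3 / Q1
print(abs(pnde - math.exp(bx*(x - xs)) * A(x, xs) / A(xs, xs)))  # ~0
print(abs(tnie - A(x, x) / A(x, xs)))                            # ~0
```

Both differences are at floating-point precision, confirming the factorisations \(Q_1=\exp (\beta _0+\beta _xx)A_{x,x^{\star }}\) and \(Q_3=\exp (\beta _0+\beta _xx)A_{x,x}\).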
To prove that \(A_{x,x}\) equals the inverse risk ratio term in (14), let
$$\begin{aligned} \begin{aligned} g_y(x)= y(\beta _{w}+\beta _{xw}x)+ \log \left( \frac{1+\exp (\beta _{0}+\beta _{x}x)}{1+\exp (\beta _{0}+\beta _{x}x+\beta _{w}+\beta _{xw}x)} \right) +\gamma _{0}+\gamma _{x}x \end{aligned} \end{aligned}$$
be the parametric expression for \(\log \frac{P(W=1\mid Y=y, X=x)}{P(W=0\mid Y=y, X=x)}\) given by Stanghellini and Doretti (2019). Then, it is straightforward to prove that
$$\begin{aligned} \begin{aligned} A_{x,x} = \frac{1+\exp \{g_1(x)\}}{1+\exp \{g_0(x)\}}, \end{aligned} \end{aligned}$$
whose right-hand side is indeed the probability ratio in (14).
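This identity can also be confirmed numerically. The sketch below (illustrative parameter values) evaluates \(g_y(x)\) as defined above and checks that \(\{1+\exp g_1(x)\}/\{1+\exp g_0(x)\}\) coincides with \(A_{x,x}\):

```python
import math

# Purely illustrative parameter values
b0, bx, bw, bxw = 0.3, -0.7, 0.5, 0.2
g0, gx = -0.4, 0.9
x = 1.0

def e_y(x, w):
    return math.exp(b0 + bx*x + bw*w + bxw*x*w)

def e_w(x):
    return math.exp(g0 + gx*x)

def g(y, x):
    # log-odds of W given Y=y, X=x (Stanghellini and Doretti 2019)
    return (y*(bw + bxw*x)
            + math.log((1 + e_y(x, 0)) / (1 + e_y(x, 1)))
            + g0 + gx*x)

def A(x1, x2):  # the A term of Appendix 1
    top = math.exp(bw + bxw*x1) * e_w(x2) * (1 + e_y(x1, 0)) + 1 + e_y(x1, 1)
    bot = e_w(x2) * (1 + e_y(x1, 0)) + 1 + e_y(x1, 1)
    return top / bot

lhs = A(x, x)
rhs = (1 + math.exp(g(1, x))) / (1 + math.exp(g(0, x)))
print(abs(lhs - rhs))  # ~0
```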
The algebraic developments above remain unchanged once the confounding-removing covariates C are added, provided that the linear predictors are suitably modified. Specifically, the conditional versions of \(Q_1\), \(Q_2\) and \(Q_3\) become
$$\begin{aligned} \begin{aligned} Q_{1\mid c}&= e_y(x,0,z)A_{x,x^{\star }\mid c} \\ Q_{2\mid c}&= e_y(x^{\star },0,z)A_{x^{\star },x^{\star }\mid c} \\ Q_{3\mid c}&= e_y(x,0,z)A_{x,x\mid c}, \end{aligned} \end{aligned}$$
where \(e_y(x,w,z)\) and \(A_{x,x^{\star }\mid c}\) are as in Sect. 2. The derivation of Eqs. (17) and (18) is then immediate. Notice that this approach can be immediately generalized to account for multiple confounders; it suffices to replace z with \(\varvec{z}=(z_{1},\dots ,z_{p})'\) and v with \(\varvec{v}=(v_1,\dots ,v_q)'\) in the formulas above, substituting every product involving z and v with the corresponding row-column product (for instance, \(\beta _{z}z\) is replaced by \(\varvec{\beta }_{z}'\varvec{z}\) with \(\varvec{\beta }_{z}=(\beta _{z_{1}},\dots ,\beta _{z_{p}})'\) and so on).
Appendix 2: Natural effects and the rare outcome assumption
Recalling that \(e_y(x,w)=\exp (\beta _0+\beta _xx+\beta _ww+\beta _{xw}xw)\) and \(e_w(x)=\exp (\gamma _0+\gamma _xx)\), the odds ratio pure natural direct effect is given by
$$\begin{aligned} \begin{aligned} {\mathrm{OR}}^{{\mathrm{PNDE}}}_{x,x^{\star }}&= \exp \{\beta _x(x-x^{\star })\} \\&\quad \times \frac{e(\beta _w+\beta _{xw}x)e_w(x^{\star })\{1+e_y(x,0)\}+1+e_y(x,1)}{e_w(x^{\star })\{1+e_y(x,0)\}+1+e_y(x,1)} \\&\quad \times \frac{e_w(x^{\star })\{1+e_y(x^{\star },0)\}+1+e_y(x^{\star },1)}{e(\beta _w+\beta _{xw}x^{\star })e_w(x^{\star })\{1+e_y(x^{\star },0)\}+1+e_y(x^{\star },1)}. \end{aligned} \end{aligned}$$
If all four terms \(e_y(x,0)\), \(e_y(x,1)\), \(e_y(x^{\star },0)\) and \(e_y(x^{\star },1)\) tend to zero, then the expression above reduces to
$$\begin{aligned} {\mathrm{OR}}^{{\mathrm{PNDE}}}_{x,x^{\star }} \approx \exp \{\beta _x(x-x^{\star })\}\frac{1+e(\beta _w+\beta _{xw}x)e_w(x^{\star })}{1+e(\beta _w+\beta _{xw}x^{\star })e_w(x^{\star })}, \end{aligned}$$ (20)
which is the expression in Valeri and VanderWeele (2013, p.150). The same logic applies to the odds ratio total natural direct effect, which is
$$\begin{aligned} \begin{aligned} {\mathrm{OR}}^{{\mathrm{TNDE}}}_{x,x^{\star }}&= \exp \{\beta _x(x-x^{\star })\} \\&\quad \times \frac{e(\beta _w+\beta _{xw}x)e_w(x)\{1+e_y(x,0)\}+1+e_y(x,1)}{e_w(x)\{1+e_y(x,0)\}+1+e_y(x,1)} \\&\quad \times \frac{e_w(x)\{1+e_y(x^{\star },0)\}+1+e_y(x^{\star },1)}{e(\beta _w+\beta _{xw}x^{\star })e_w(x)\{1+e_y(x^{\star },0)\}+1+e_y(x^{\star },1)}. \end{aligned} \end{aligned}$$
Again, if all four terms above tend to zero, this quantity tends to
$$\begin{aligned} {\mathrm{OR}}^{{\mathrm{TNDE}}}_{x,x^{\star }} \approx \exp \{\beta _x(x-x^{\star })\}\frac{1+e(\beta _w+\beta _{xw}x)e_w(x)}{1+e(\beta _w+\beta _{xw}x^{\star })e_w(x)} \end{aligned}$$ (21)
in analogy with (20). The second factors on the right-hand sides of (20) and (21) are equal to 1 if \(\beta _{xw}=0\). This means that, under the rare outcome assumption, the odds ratio natural direct effects are given by the product of \({\mathrm{OR}}^{{\mathrm{CDE}}}_{x,x^{\star }}(0)=\exp \{\beta _x(x-x^{\star })\}\) (governed by \(\beta _x\) only) and a residual quantity governed by the interaction coefficient \(\beta _{xw}\). This clear separation does not occur in the exact formulae, where the residual terms are in general different from 1 even when \(\beta _{xw}=0\).
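The convergence can be visualised with a small numerical sketch (hypothetical parameter values): letting the outcome intercept \(\beta _0\) decrease makes all four \(e_y\) terms vanish, and the exact \({\mathrm{OR}}^{{\mathrm{PNDE}}}_{x,x^{\star }}\), written as \(\exp \{\beta _x(x-x^{\star })\}A_{x,x^{\star }}/A_{x^{\star },x^{\star }}\) as in Appendix 1, approaches approximation (20):

```python
import math

# Hypothetical coefficient values; b0 is varied below
bx, bw, bxw = 0.8, 0.4, 0.3
g0, gx = -0.2, 0.6
x, xs = 1.0, 0.0

def exact_pnde(b0):
    def e_y(x, w):
        return math.exp(b0 + bx*x + bw*w + bxw*x*w)
    def A(x1, x2):
        ew = math.exp(g0 + gx*x2)
        top = math.exp(bw + bxw*x1) * ew * (1 + e_y(x1, 0)) + 1 + e_y(x1, 1)
        bot = ew * (1 + e_y(x1, 0)) + 1 + e_y(x1, 1)
        return top / bot
    return math.exp(bx*(x - xs)) * A(x, xs) / A(xs, xs)

# Rare-outcome approximation (20), which does not involve b0
ew_xs = math.exp(g0 + gx*xs)
approx = (math.exp(bx*(x - xs))
          * (1 + math.exp(bw + bxw*x) * ew_xs)
          / (1 + math.exp(bw + bxw*xs) * ew_xs))

errors = [abs(exact_pnde(b0) - approx) for b0 in (-1.0, -4.0, -8.0)]
print(errors)  # shrinking gap as the outcome becomes rarer
```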
The odds ratio total natural indirect effect is
$$\begin{aligned} \begin{aligned} {\mathrm{OR}}^{{\mathrm{TNIE}}}_{x,x^{\star }}&= \frac{e(\beta _w+\beta _{xw}x)e_w(x)\{1+e_y(x,0)\}+1+e_y(x,1)}{e_w(x)\{1+e_y(x,0)\}+1+e_y(x,1)} \\&\quad \times \frac{e_w(x^{\star })\{1+e_y(x,0)\}+1+e_y(x,1)}{e(\beta _w+\beta _{xw}x)e_w(x^{\star })\{1+e_y(x,0)\}+1+e_y(x,1)}. \end{aligned} \end{aligned}$$
In contrast to the natural direct effects, the terms \(e_y(x^{\star },0)\) and \(e_y(x^{\star },1)\) do not appear in the expression above. Thus, only \(e_y(x,0)\) and \(e_y(x,1)\) need to tend to zero in order to obtain the expression in Valeri and VanderWeele (2013, p. 150)
$$\begin{aligned} {\mathrm{OR}}^{{\mathrm{TNIE}}}_{x,x^{\star }} \approx \frac{\{1+e_w(x^{\star })\}\{1+e_w(x)e(\beta _w+\beta _{xw}x)\}}{\{1+e_w(x)\}\{1+e_w(x^{\star })e(\beta _w+\beta _{xw}x)\}}. \end{aligned}$$
This is particularly relevant when X is binary, since it means that we only need \(P(Y=1\mid X=1,W=0)\approx 0\) and \(P(Y=1\mid X=1,W=1)\approx 0\), but not necessarily \(P(Y=1\mid X=0,W=0)\approx 0\) and \(P(Y=1\mid X=0,W=1)\approx 0\).
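This partial requirement can be illustrated numerically. In the sketch below (hypothetical parameter values), a strongly negative \(\beta _x\) makes the outcome rare in the \(X=x\) arm only, yet the exact \({\mathrm{OR}}^{{\mathrm{TNIE}}}_{x,x^{\star }}=A_{x,x}/A_{x,x^{\star }}\) already matches the approximate formula:

```python
import math

# Hypothetical values making the outcome rare only in the X=x (=1) arm
b0, bx, bw, bxw = 0.0, -10.0, 0.4, 0.3
g0, gx = -0.2, 0.6
x, xs = 1.0, 0.0

def e_y(x, w):
    return math.exp(b0 + bx*x + bw*w + bxw*x*w)

def e_w(x):
    return math.exp(g0 + gx*x)

def A(x1, x2):  # the A term of Appendix 1
    top = math.exp(bw + bxw*x1) * e_w(x2) * (1 + e_y(x1, 0)) + 1 + e_y(x1, 1)
    bot = e_w(x2) * (1 + e_y(x1, 0)) + 1 + e_y(x1, 1)
    return top / bot

exact = A(x, x) / A(x, xs)   # exact OR^TNIE
r = math.exp(bw + bxw*x)
approx = ((1 + e_w(xs)) * (1 + e_w(x) * r)
          / ((1 + e_w(x)) * (1 + e_w(xs) * r)))

print(e_y(xs, 0))            # = 1: outcome common in the x* arm
print(abs(exact - approx))   # ~0 nonetheless
```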
Similarly, in the expression of the odds ratio pure natural indirect effect
$$\begin{aligned} \begin{aligned} {\mathrm{OR}}^{{\mathrm{PNIE}}}_{x,x^{\star }}&= \frac{e(\beta _w+\beta _{xw}x^{\star })e_w(x)\{1+e_y(x^{\star },0)\}+1+e_y(x^{\star },1)}{e_w(x)\{1+e_y(x^{\star },0)\}+1+e_y(x^{\star },1)} \\&\quad \times \frac{e_w(x^{\star })\{1+e_y(x^{\star },0)\}+1+e_y(x^{\star },1)}{e(\beta _w+\beta _{xw}x^{\star })e_w(x^{\star })\{1+e_y(x^{\star },0)\}+1+e_y(x^{\star },1)} \end{aligned} \end{aligned}$$
the terms \(e_y(x,0)\) and \(e_y(x,1)\) are not present, so it suffices that \(e_y(x^{\star },0)\) and \(e_y(x^{\star },1)\) tend to zero to obtain
$$\begin{aligned} {\mathrm{OR}}^{{\mathrm{PNIE}}}_{x,x^{\star }} \approx \frac{\{1+e_w(x^{\star })\}\{1+e_w(x)e(\beta _w+\beta _{xw}x^{\star })\}}{\{1+e_w(x)\}\{1+e_w(x^{\star })e(\beta _w+\beta _{xw}x^{\star })\}}. \end{aligned}$$
For the case of binary X, this means that the conditions \(P(Y=1\mid X=0,W=0)\approx 0\) and \(P(Y=1\mid X=0,W=1)\approx 0\) are needed, while \(P(Y=1\mid X=1,W=0)\approx 0\) and \(P(Y=1\mid X=1,W=1)\approx 0\) are not. Overall, each of the two natural indirect effects requires only its own subset of the rare outcome assumption (rather than the whole assumption as traditionally defined) in order to be identified by the approximate parametric formulae.
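A numerical sketch analogous to the one for the total natural indirect effect illustrates this (hypothetical parameter values): here a strongly negative \(\beta _0\) combined with a large positive \(\beta _x\) makes the outcome rare in the \(X=x^{\star }\) arm only, yet the exact \({\mathrm{OR}}^{{\mathrm{PNIE}}}_{x,x^{\star }}=A_{x^{\star },x}/A_{x^{\star },x^{\star }}\) matches the approximate formula:

```python
import math

# Hypothetical values making the outcome rare only in the X=x* (=0) arm
b0, bx, bw, bxw = -10.0, 10.0, 0.4, 0.3
g0, gx = -0.2, 0.6
x, xs = 1.0, 0.0

def e_y(x, w):
    return math.exp(b0 + bx*x + bw*w + bxw*x*w)

def e_w(x):
    return math.exp(g0 + gx*x)

def A(x1, x2):  # the A term of Appendix 1
    top = math.exp(bw + bxw*x1) * e_w(x2) * (1 + e_y(x1, 0)) + 1 + e_y(x1, 1)
    bot = e_w(x2) * (1 + e_y(x1, 0)) + 1 + e_y(x1, 1)
    return top / bot

exact = A(xs, x) / A(xs, xs)   # exact OR^PNIE
r = math.exp(bw + bxw*xs)
approx = ((1 + e_w(xs)) * (1 + e_w(x) * r)
          / ((1 + e_w(x)) * (1 + e_w(xs) * r)))

print(e_y(x, 1))             # outcome far from rare in the x arm
print(abs(exact - approx))   # ~0 nonetheless
```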
Appendix 3: Variance–covariance matrix of the estimated causal effects
Denoting by \(\varvec{\beta }=(\beta _{0},\beta _{x},\beta _{z},\beta _{xz},\beta _{w},\beta _{xw},\beta _{wz},\beta _{xwz})'\) and \(\varvec{\gamma }=(\gamma _{0},\gamma _{x},\gamma _{v},\gamma _{xv})'\) the two vectors of model parameters, by \(\varvec{\varSigma }_{\hat{\varvec{\beta }}}\) and \(\varvec{\varSigma }_{\hat{\varvec{\gamma }}}\) the variance–covariance matrices of their estimators \(\hat{\varvec{\beta }}\) and \(\hat{\varvec{\gamma }}\), and by
$$\begin{aligned} \varvec{e}=\left( {\mathrm{OR}}^{{\mathrm{PNDE}}}_{x,x^{\star }\mid c},{\mathrm{OR}}^{{\mathrm{TNIE}}}_{x,x^{\star }\mid c},{\mathrm{OR}}^{{\mathrm{TNDE}}}_{x,x^{\star }\mid c},{\mathrm{OR}}^{{\mathrm{PNIE}}}_{x,x^{\star }\mid c},{\mathrm{OR}}^{{\mathrm{TE}}}_{x,x^{\star }\mid c}\right) ^{\prime } \end{aligned}$$
the true causal effect vector, the first-order approximate variance–covariance matrix of the estimator
$$\begin{aligned} \hat{\varvec{e}}=\left( \hat{{\mathrm{OR}}}^{{\mathrm{PNDE}}}_{x,x^{\star }\mid c},\hat{{\mathrm{OR}}}^{{\mathrm{TNIE}}}_{x,x^{\star }\mid c},\hat{{\mathrm{OR}}}^{{\mathrm{TNDE}}}_{x,x^{\star }\mid c},\hat{{\mathrm{OR}}}^{{\mathrm{PNIE}}}_{x,x^{\star }\mid c},\hat{{\mathrm{OR}}}^{{\mathrm{TE}}}_{x,x^{\star }\mid c}\right) ^{\prime } \end{aligned}$$
can be obtained as \(V(\hat{\varvec{e}}) = \varvec{E}\varvec{D}\varvec{\varSigma }\varvec{D}'\varvec{E}'\), where \(\varvec{E}={\mathrm{diag}}(\varvec{e})\),
$$\begin{aligned} \varvec{\varSigma } = \left( \begin{array}{cc} \varvec{\varSigma }_{\hat{\varvec{\beta }}} &{}\quad \varvec{0} \\ \varvec{0} &{}\quad \varvec{\varSigma }_{\hat{\varvec{\gamma }}} \end{array}\right) \end{aligned}$$
and \(\varvec{D}\) is the matrix of derivatives \(\varvec{D} = \partial \log \varvec{e}/\partial \varvec{\theta }'\), with \(\varvec{\theta }=(\varvec{\beta }',\varvec{\gamma }')'\) denoting the whole parameter vector. To obtain \(\varvec{D}\), it is convenient to compute the row vector \(\varvec{d}_{x,x^{\star }\mid c}=\partial A_{x,x^{\star }\mid c}/\partial \varvec{\theta }'\) first. To this end, it is worth writing \(A_{x,x^{\star }\mid c}\) as
$$\begin{aligned} A_{x,x^{\star }\mid c} = \frac{p_1p_2p_3 + p_4}{p_2p_3 + p_4}, \end{aligned}$$
with \(p_1=\exp (\beta _w+\beta _{xw}x+\beta _{wz}z+\beta _{xwz}xz)\), \(p_2=e_{w}(x^{\star },v)\), \(p_3=1+e_{y}(x,0,z)\) and \(p_4=1+e_{y}(x,1,z)\). Under this notation, the three key derivatives to compute are
$$\begin{aligned} \begin{aligned} d_{\beta _{0}}(x,x^{\star }\mid c)&= \frac{\partial A_{x,x^{\star }\mid c}}{\partial \beta _0} \\&= \frac{\{p_1p_2(p_3-1)+p_4-1\}(p_2p_3+p_4)-(p_1p_2p_3 + p_4)\{p_2(p_3-1)+p_4-1\}}{(p_2p_3+p_4)^2} \\ d_{\beta _{w}}(x,x^{\star }\mid c) &= \frac{\partial A_{x,x^{\star }\mid c}}{\partial \beta _w}\\&= \frac{(p_1p_2p_3+p_4-1)(p_2p_3+p_4) - (p_1p_2p_3 + p_4)(p_4-1)}{(p_2p_3+p_4)^2} \\ d_{\gamma _{0}}(x,x^{\star }\mid c) &= \frac{\partial A_{x,x^{\star }\mid c}}{\partial \gamma _0}\\&= \frac{(p_1p_2p_3)(p_2p_3+p_4) - (p_1p_2p_3 + p_4)(p_2p_3)}{(p_2p_3+p_4)^2}, \\ \end{aligned} \end{aligned}$$
while the others can be written as functions thereof. Specifically, a compact form for \(\varvec{d}_{x,x^{\star }\mid c}\) is given by
$$\begin{aligned} \varvec{d}_{x,x^{\star }\mid c} = [(d_{\beta _{0}}(x,x^{\star }\mid c),d_{\beta _{w}}(x,x^{\star }\mid c))\otimes \varvec{d}(x,z)\,,\,d_{\gamma _{0}}(x,x^{\star }\mid c)\varvec{d}(x^{\star },v)], \end{aligned}$$
where \(\otimes\) denotes the Kronecker product and, letting \(\varvec{I}_{2}\) be the identity matrix of order 2, \(\varvec{d}(a,b)\) is the row vector returned by the vector-matrix multiplication \(\varvec{d}(a,b) = (1\,,\,a)[(1\,,\,b)\otimes \varvec{I}_{2}]\). The vectors \(\varvec{d}_{x,x\mid c}\), \(\varvec{d}_{x^{\star },x^{\star }\mid c}\) and \(\varvec{d}_{x^{\star },x\mid c}\) can be calculated by applying the same formulas above to \(A_{x,x\mid c}\), \(A_{x^{\star },x^{\star }\mid c}\) and \(A_{x^{\star },x\mid c}\) respectively. Then, the matrix \(\varvec{D}\) can be obtained as
$$\begin{aligned} \varvec{D}=\left( \begin{array}{c} \varvec{d}_1 + \varvec{d}_2 \\ \varvec{d}_3 \\ \varvec{d}_1+\varvec{d}_4 \\ \varvec{d}_5 \\ \varvec{d}_6 \end{array}\right) , \end{aligned}$$
where
$$\begin{aligned} \begin{aligned} \varvec{d}_2&= \varvec{d}_{x,x^{\star }}/A_{x,x^{\star }} - \varvec{d}_{x^{\star },x^{\star }}/A_{x^{\star },x^{\star }}\\ \varvec{d}_3&= \varvec{d}_{x,x}/A_{x,x} - \varvec{d}_{x,x^{\star }}/A_{x,x^{\star }}\\ \varvec{d}_4&= \varvec{d}_{x,x}/A_{x,x} - \varvec{d}_{x^{\star },x}/A_{x^{\star },x}\\ \varvec{d}_5&= \varvec{d}_{x^{\star },x}/A_{x^{\star },x} - \varvec{d}_{x^{\star },x^{\star }}/A_{x^{\star },x^{\star }}\\ \varvec{d}_6&= \varvec{d}_{x,x}/A_{x,x} - \varvec{d}_{x^{\star },x^{\star }}/A_{x^{\star },x^{\star }}\\ \end{aligned} \end{aligned}$$
while \(\varvec{d}_1\) is a row vector of the same length as \(\varvec{\theta }\) whose components are all zero except those in the positions of \(\beta _{x}\) and \(\beta _{xz}\), which equal \(x-x^{\star }\) and \(z(x-x^{\star })\) respectively. Again, the extension to multiple confounders is immediate provided that \(\varvec{\beta }\) and \(\varvec{\gamma }\) are extended as follows:
$$\begin{aligned} \begin{aligned} \varvec{\beta }&= (\beta _{0},\beta _{x},\beta _{z_{1}},\dots ,\beta _{z_{p}},\beta _{xz_{1}},\dots ,\beta _{xz_{p}},\beta _{w},\beta _{xw},\beta _{wz_{1}},\dots ,\beta _{wz_{p}},\beta _{xwz_{1}},\dots ,\beta _{xwz_{p}})'\\ \varvec{\gamma }&= (\gamma _{0},\gamma _{x},\gamma _{v_{1}},\dots ,\gamma _{v_{q}},\gamma _{xv_{1}},\dots ,\gamma _{xv_{q}})'. \end{aligned} \end{aligned}$$
Clearly, in finite-sample analyses one has to plug in the estimates \(\hat{\varvec{\beta }}\) and \(\hat{\varvec{\gamma }}\) everywhere in the formulae above to obtain the estimated variance–covariance matrix \({\hat{V}}(\hat{\varvec{e}})\).
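As a closing numerical check, the three key derivatives \(d_{\beta _{0}}\), \(d_{\beta _{w}}\) and \(d_{\gamma _{0}}\) displayed above can be validated against central finite differences. The sketch below (hypothetical parameter values, one confounder z in the outcome model and v in the mediator model) treats \(A_{x,x^{\star }\mid c}\) as a function of \((\beta _0,\beta _w,\gamma _0)\) with all other parameters held fixed:

```python
import math

# Hypothetical parameter values and covariate levels
bx, bz, bxz, bxw, bwz, bxwz = 0.8, 0.3, 0.1, 0.3, 0.2, 0.1
gx, gv, gxv = 0.6, 0.4, 0.2
x, xs, z, v = 1.0, 0.0, 0.5, 0.5

def parts(b0, bw, g0):
    # p1..p4 as defined in Appendix 3
    p1 = math.exp(bw + bxw*x + bwz*z + bxwz*x*z)
    p2 = math.exp(g0 + gx*xs + gv*v + gxv*xs*v)   # e_w(x*, v)
    e_y0 = math.exp(b0 + bx*x + bz*z + bxz*x*z)   # e_y(x, 0, z)
    e_y1 = e_y0 * p1                              # e_y(x, 1, z)
    return p1, p2, 1 + e_y0, 1 + e_y1

def A(b0, bw, g0):
    p1, p2, p3, p4 = parts(b0, bw, g0)
    return (p1*p2*p3 + p4) / (p2*p3 + p4)

b0, bw, g0 = -0.5, 0.4, -0.2
p1, p2, p3, p4 = parts(b0, bw, g0)
den = (p2*p3 + p4)**2

# Analytic derivatives from Appendix 3
d_b0 = (((p1*p2*(p3-1) + p4-1)*(p2*p3+p4)
         - (p1*p2*p3+p4)*(p2*(p3-1) + p4-1)) / den)
d_bw = (((p1*p2*p3 + p4-1)*(p2*p3+p4)
         - (p1*p2*p3+p4)*(p4-1)) / den)
d_g0 = ((p1*p2*p3*(p2*p3+p4) - (p1*p2*p3+p4)*p2*p3) / den)

# Central finite differences
h = 1e-6
num_b0 = (A(b0+h, bw, g0) - A(b0-h, bw, g0)) / (2*h)
num_bw = (A(b0, bw+h, g0) - A(b0, bw-h, g0)) / (2*h)
num_g0 = (A(b0, bw, g0+h) - A(b0, bw, g0-h)) / (2*h)

print(abs(d_b0 - num_b0), abs(d_bw - num_bw), abs(d_g0 - num_g0))  # all ~0
```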