Appendix
Proof of Proposition 3.1
(a) Let us assume that the general heterogeneous choice model with predictor
$$\begin{aligned} \eta _i = ({\alpha _{00}+x_{i0}\alpha _0+{\varvec{x}}_i^T\varvec{\alpha }}+ x_{i0}({\varvec{x}}_i^{S})^T \varvec{\alpha }^{S})/\exp (x_{i0}\gamma ) \end{aligned}$$
(13)
holds, where for \(S=\{j_1,\ldots ,j_m\}\) one has \(({\varvec{x}}_i^{S})^T=(x_{ij_1},\ldots ,x_{ij_m})\), and interaction effects \((\varvec{\alpha }^{S})^T=(\alpha _{0j_1},\ldots , \alpha _{0j_m})\). One can define new parameters
$$\begin{aligned} \beta _{00}&= \alpha _{00}, \quad \beta _{0} = \alpha _{00}\frac{1-e^{\gamma }}{e^{\gamma }}+ \frac{\alpha _{0}}{e^{\gamma }},\\ \beta _j&=\alpha _j,\quad j=1,\ldots ,p,\\ \beta _{0j}&=\alpha _{j} \frac{1-e^{\gamma }}{e^{\gamma }}+\frac{\alpha _{0j}}{e^{\gamma }}\;\; \text {for}\; j \in S, \quad \beta _{0j}=\alpha _{j} \frac{1-e^{\gamma }}{e^{\gamma }}\;\; \text {for}\; j \notin S. \end{aligned}$$
When using these parameters as parameters in the interaction model
$$\begin{aligned} {\text {logit}}(\pi _i)=\beta _{00}+x_{i0}\beta _0+x_{i1}\beta _1+\cdots +x_{ip}\beta _p+x_{i0}x_{i1}\beta _{01}+\cdots +x_{i0}x_{ip}\beta _{0p}, \end{aligned}$$
(14)
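one obtains that, for binary \(x_{i0} \in \{0,1\}\), the linear predictor in (14) is the same as the linear predictor in the general heterogeneous choice model (13), as can be checked by evaluating both predictors at \(x_{i0}=0\) and \(x_{i0}=1\). Thus, the interaction model holds.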
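As a numerical sanity check of part (a), the following Python sketch draws arbitrary parameter values for the predictor (13) with binary \(x_{i0}\), obtains interaction-model parameters by matching (13) and (14) at \(x_{i0}=0\) and \(x_{i0}=1\), and compares the two predictors on random covariate values. The function names, the dimension \(p=4\), the choice \(S=\{1,2\}\) and the random draws are illustrative assumptions and not part of the original argument.

```python
import math
import random

def hcm_predictor(x0, x, a00, a0, a, a_int, gamma):
    """Predictor (13); a_int[j] is the interaction effect, 0.0 for j not in S."""
    num = a00 + x0 * a0 + sum(xj * (aj + x0 * aij) for xj, aj, aij in zip(x, a, a_int))
    return num / math.exp(x0 * gamma)

def interaction_predictor(x0, x, b00, b0, b, b_int):
    """Predictor (14) of the logit model with all interaction terms."""
    return b00 + x0 * b0 + sum(xj * (bj + x0 * bij) for xj, bj, bij in zip(x, b, b_int))

def alpha_to_beta(a00, a0, a, a_int, gamma):
    """Match (13) and (14) at x0 = 0 and x0 = 1 (x0 is assumed binary)."""
    s = math.exp(-gamma)
    b_int = [(aj + aij) * s - aj for aj, aij in zip(a, a_int)]
    return a00, (a00 + a0) * s - a00, list(a), b_int

random.seed(0)
p = 4
in_S = [True, True, False, False]  # illustrative choice S = {1, 2}
a00, a0, gamma = (random.gauss(0, 1) for _ in range(3))
a = [random.gauss(0, 1) for _ in range(p)]
a_int = [random.gauss(0, 1) if flag else 0.0 for flag in in_S]

b00, b0, b, b_int = alpha_to_beta(a00, a0, a, a_int, gamma)

# Largest absolute difference between the two predictors over random inputs.
max_gap = 0.0
for _ in range(100):
    x0 = random.randint(0, 1)
    x = [random.gauss(0, 1) for _ in range(p)]
    gap = abs(hcm_predictor(x0, x, a00, a0, a, a_int, gamma)
              - interaction_predictor(x0, x, b00, b0, b, b_int))
    max_gap = max(max_gap, gap)

# For j not in S the ratio beta_0j / beta_j should equal (1 - e^gamma) / e^gamma.
ratios = [b_int[j] / b[j] for j in range(p) if not in_S[j]]
target = (1 - math.exp(gamma)) / math.exp(gamma)
print(max_gap, ratios, target)
```

Under these assumptions the gap should vanish up to floating-point error, and the ratios for the indices outside \(S\) should all agree with the printed target value.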
In addition, for \(j \notin S\) the relation \(\beta _{0j}/\beta _j= (1-e^{\gamma })/e^{\gamma }\) holds. Let \(\{1,\ldots ,p\}\) be partitioned into the disjoint subsets S and \({\tilde{S}}=\{1,\ldots ,p\} {\setminus } S\). Then the constraints
$$\begin{aligned} \beta _{0j}/\beta _{j}=\cdots =\beta _{0s}/\beta _{s}\quad \text {for all}\;\; j,s \in {\tilde{S}} \end{aligned}$$
(15)
hold. Of course, this is a genuine constraint only if \(|{\tilde{S}}| \ge 2\).
(b) Let us now assume that the interaction model (14) with constraints (15) holds.
Case 1 If \(|S|=p, |{\tilde{S}}|=0\), one obtains with the parameters defined by \(\alpha _{00}=\beta _{00}\), \(\alpha _{0}=\beta _{0}\), \(\alpha _{j}=\beta _{j}\), \(\alpha _{0j}=\beta _{0j}\), \(j=1,\ldots , p\), that the linear predictor \(\eta _i = ({\alpha _{00}+x_{i0}\alpha _0+{\varvec{x}}_i^T\varvec{\alpha }}+ x_{i0}{\varvec{x}}_i^T \varvec{\alpha }^{S})/\exp (x_{i0}\gamma )\) coincides with the predictor in (14) when \(\gamma =0\). Thus the heterogeneous choice model holds with \(\gamma \) fixed at \(\gamma =0\), since \(\gamma \) is not identified in this case.
Case 2 Let \(|S|=p-1, |{\tilde{S}}|=1\) hold and parameters be defined by
$$\begin{aligned} \alpha _{00}&=\beta _{00}, \quad \alpha _{0} = \beta _{00}(e^{\gamma }-1)+e^{\gamma }\beta _{0}, \\ \alpha _j&=\beta _j, \quad j=1,\ldots ,p-1, \quad \alpha _p=e^{\gamma }(\beta _{p}+\beta _{0p}),\\ \alpha _{0j}&=\beta _{j}(e^{\gamma }-1) + e^{\gamma }\beta _{0j}, \quad j=1,\ldots ,p-1, \\ e^{\gamma }&= \beta _{p}/(\beta _{p}+\beta _{0p}). \end{aligned}$$
Using these parameters in the predictor \(\eta _i = ({\alpha _{00}+x_{i0}\alpha _0+{\varvec{x}}_i^T\varvec{\alpha }}+ x_{i0}(x_{i1},\ldots ,x_{i,p-1}) \varvec{\alpha }^{S})/\exp (x_{i0}\gamma )\) yields the predictor in (14). Thus the interaction model is represented as a heterogeneous choice model in which \(\alpha _{0p}=0\). It should be noted that one could have omitted any other interaction parameter instead; without loss of generality we chose the parameter \(\alpha _{0p}\).
Case 3 Let \(|S| \le p-2, |{\tilde{S}}|=m \ge 2\) hold. Without loss of generality let \({\tilde{S}}=\{p-m+1,\ldots ,p\}\). Let parameters be defined by
$$\begin{aligned} \alpha _{00}&=\beta _{00}, \quad \alpha _{0} = \beta _{00}(e^{\gamma }-1)+e^{\gamma }\beta _{0}, \\ \alpha _j&=\beta _j, \quad j=1,\ldots ,p, \\ \alpha _{0j}&=\beta _{j}(e^{\gamma }-1) + e^{\gamma }\beta _{0j},\quad \text {for}\;\; j \in S. \end{aligned}$$
In addition, \(\gamma \) is defined by
$$\begin{aligned} e^{\gamma } = \beta _{j}/(\beta _{j}+\beta _{0j}) \quad \text {for}\;\; j \in {\tilde{S}}, \end{aligned}$$
which is possible since \((1-e^{\gamma })/e^{\gamma }= \beta _{0j}/\beta _{j}\), and \(\beta _{0j}/\beta _{j}\) has the same value for all \(j \in {\tilde{S}}\). Using these parameters in the predictor \(\eta _i = ({\alpha _{00}+x_{i0}\alpha _0+{\varvec{x}}_i^T\varvec{\alpha }}+ x_{i0}(x_{i1},\ldots ,x_{i,p-m}) \varvec{\alpha }^{S})/\exp (x_{i0}\gamma )\) yields the predictor in (14). Therefore, it is shown that the heterogeneous choice model with interactions \(\alpha _{0,p-m+1}=\cdots =\alpha _{0,p}=0\) holds.
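The construction in Case 3 can be checked numerically. The Python sketch below generates interaction-model parameters that satisfy the constraint (15) on \({\tilde{S}}\), maps them to \((\varvec{\alpha },\gamma )\) using the definitions given for Case 3, and compares the predictor of the heterogeneous choice model with the predictor (14). The function names, \(p=5\), \(m=2\), the common ratio \(c=0.5\) and the random draws are illustrative assumptions; the sketch additionally assumes that the common ratio exceeds \(-1\), so that \(e^{\gamma }\) is positive and \(\gamma \) is real.

```python
import math
import random

def hcm_predictor(x0, x, a00, a0, a, a_int, gamma):
    """Heterogeneous choice predictor; a_int[j] is 0.0 for j not in S."""
    num = a00 + x0 * a0 + sum(xj * (aj + x0 * aij) for xj, aj, aij in zip(x, a, a_int))
    return num / math.exp(x0 * gamma)

def interaction_predictor(x0, x, b00, b0, b, b_int):
    """Predictor (14) of the logit model with all interaction terms."""
    return b00 + x0 * b0 + sum(xj * (bj + x0 * bij) for xj, bj, bij in zip(x, b, b_int))

def beta_to_alpha(b00, b0, b, b_int, in_S):
    """Case 3 mapping; assumes b_int[j] / b[j] is constant over j not in S."""
    j0 = in_S.index(False)                    # any index in the complement of S
    e_gamma = b[j0] / (b[j0] + b_int[j0])
    if e_gamma <= 0:
        raise ValueError("the common ratio must exceed -1 for a real gamma")
    a0 = b00 * (e_gamma - 1) + e_gamma * b0
    a_int = [bj * (e_gamma - 1) + e_gamma * bij if flag else 0.0
             for bj, bij, flag in zip(b, b_int, in_S)]
    return b00, a0, list(b), a_int, math.log(e_gamma)

random.seed(1)
p, c = 5, 0.5                                 # c is the common ratio beta_0j / beta_j
in_S = [True, True, True, False, False]       # complement of S is {4, 5}, so m = 2
b00, b0 = random.gauss(0, 1), random.gauss(0, 1)
b = [random.gauss(0, 1) for _ in range(p)]
b_int = [random.gauss(0, 1) if flag else c * bj for bj, flag in zip(b, in_S)]

a00, a0, a, a_int, gamma = beta_to_alpha(b00, b0, b, b_int, in_S)

# Largest absolute difference between the two predictors over random inputs.
max_gap = 0.0
for _ in range(100):
    x0 = random.randint(0, 1)
    x = [random.gauss(0, 1) for _ in range(p)]
    gap = abs(hcm_predictor(x0, x, a00, a0, a, a_int, gamma)
              - interaction_predictor(x0, x, b00, b0, b, b_int))
    max_gap = max(max_gap, gap)
print(max_gap, gamma)
```

With a common ratio \(c\) on \({\tilde{S}}\), the sketch yields \(e^{\gamma }=1/(1+c)\), and the interaction effects of the variables in \({\tilde{S}}\) are set to zero by construction.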
Proof of Proposition 3.2
Let us consider the model (10) and assume that one of the interaction parameters is zero. Without loss of generality we assume \(\alpha _{0p}=0\). Then one has the model
$$\begin{aligned} {\text {logit}}(\pi _i)=\frac{\alpha _{00} +x_{i0}\alpha _0+x_{i1}\alpha _1+\cdots +x_{ip}\alpha _p+x_{i0}x_{i1}\alpha _{01}+\cdots +x_{i0}x_{i,p-1}\alpha _{0,p-1}}{\exp (x_{i0}\gamma )}. \end{aligned}$$
Let \(\alpha _{00},\ldots ,\alpha _{0,p-1},\gamma \) and \({{\tilde{\alpha }}}_{00},\ldots ,{{\tilde{\alpha }}}_{0,p-1},{{\tilde{\gamma }}}\) be two parameterizations of the model. It has to be shown that the two parameterizations are identical.
Let \(\pi (x_{ij})\) denote the probability of observing \(Y_i=1\) when the jth covariate has value \(x_{ij}\), and let \(\pi (x_{ij}+1)\) denote the probability when the jth covariate has value \(x_{ij}+1\); all other variables are kept fixed. In addition, let \(\pi (x_{ij}, x_{i0}=g)\) denote the probability of observing \(Y_i=1\) when the jth covariate has value \(x_{ij}\) and \(x_{i0}=g\); correspondingly, \(\pi (x_{ij}+1, x_{i0}=g)\) denotes the probability when the jth covariate has value \(x_{ij}+1\) and \(x_{i0}=g\), again with all other variables kept fixed.
(1) One obtains immediately
$$\begin{aligned} {\text {logit}}(\pi (x_{ip}+1))-{\text {logit}}(\pi (x_{ip}))= e^{-x_{i0}\gamma }\alpha _p \end{aligned}$$
and therefore, provided \(\alpha _p \ne 0\),
$$\begin{aligned} \frac{{\text {logit}}(\pi (x_{ip}+1,x_{i0}=1))-{\text {logit}}(\pi (x_{ip},x_{i0}=1))}{{\text {logit}}(\pi (x_{ip}+1,x_{i0}=0))-{\text {logit}}(\pi (x_{ip},x_{i0}=0))}= e^{-\gamma }. \end{aligned}$$
Since the equations hold for both parameterizations one obtains \(e^{\gamma }=e^{{{\tilde{\gamma }}}}\) and therefore \(\gamma ={{\tilde{\gamma }}}\).
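Step (1) can be illustrated numerically: the ratio of the unit effects of \(x_{ip}\) at \(x_{i0}=1\) and \(x_{i0}=0\) equals \(e^{-\gamma }\), so \(\gamma \) can be read off from logits alone. The following Python sketch uses made-up parameter values with \(p=3\) and \(\alpha _{0p}=0\); the function names and all numbers are illustrative assumptions.

```python
import math

def logit_hcm(x0, x, a00, a0, a, a_int, gamma):
    """Logit of the heterogeneous choice model with binary x0."""
    num = a00 + x0 * a0 + sum(xj * (aj + x0 * aij) for xj, aj, aij in zip(x, a, a_int))
    return num / math.exp(x0 * gamma)

# Illustrative values with p = 3 and alpha_{0p} = 0 (assumptions, not from the paper).
params = dict(a00=0.4, a0=-0.7, a=[1.2, -0.5, 0.8], a_int=[0.3, -0.9, 0.0], gamma=0.6)

def unit_effect(j, x0, x):
    """logit(pi(x_j + 1)) - logit(pi(x_j)) with x0 and all other covariates fixed."""
    x_plus = list(x)
    x_plus[j] += 1.0
    return logit_hcm(x0, x_plus, **params) - logit_hcm(x0, x, **params)

x = [0.2, -1.0, 0.5]      # arbitrary values at which the other covariates are held
p_idx = 2                 # zero-based index of the pth covariate
ratio = unit_effect(p_idx, 1, x) / unit_effect(p_idx, 0, x)
gamma_recovered = -math.log(ratio)
print(gamma_recovered)
```

The recovered value should agree with the value of \(\gamma \) used to generate the logits, which requires \(\alpha _p \ne 0\) as stated in the proof.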
(2) For all variables \(j \ne p\) one has
$$\begin{aligned} {\text {logit}}(\pi (x_{ij}+1))-{\text {logit}}(\pi (x_{ij}))= e^{-x_{i0}\gamma }(\alpha _j+x_{i0}\alpha _{0j}). \end{aligned}$$
For \(x_{i0}=0\) this yields \(\alpha _j={{\tilde{\alpha }}}_j\). For \(x_{i0}=1\), since \(\gamma ={{\tilde{\gamma }}}\) and \(\alpha _j={{\tilde{\alpha }}}_j\) have already been established, it yields \(\alpha _{0j}={\tilde{\alpha }}_{0j}\).
(3) The only remaining parameters to be investigated are \(\alpha _{00}\) and \(\alpha _0\). By using for \(x_{i0}=0\)
$$\begin{aligned} {\text {logit}}(\pi _i)=\alpha _{00}+x_{i1}\alpha _1+\cdots +x_{ip}\alpha _p \end{aligned}$$
and for \(x_{i0}=1\)
$$\begin{aligned} {\text {logit}}(\pi _i)=\frac{\alpha _{00} +\alpha _0+x_{i1}\alpha _1+\cdots +x_{ip}\alpha _p+x_{i1}\alpha _{01}+\cdots +x_{i,p-1}\alpha _{0,p-1}}{\exp (\gamma )} \end{aligned}$$
one obtains \(\alpha _{00}={\tilde{\alpha }}_{00}\) and \(\alpha _{0}={\tilde{\alpha }}_{0}\), which concludes the proof.
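Steps (2) and (3) can be read as a recipe for recovering the remaining parameters from logits once \(\gamma \) is known. The Python sketch below treats the model as a black-box logit function and recovers \(\alpha _{00}\), \(\alpha _0\), \(\alpha _j\) and \(\alpha _{0j}\). Evaluating the logit at the zero vector and at unit vectors is a convenience chosen for this sketch; the function names and parameter values are illustrative assumptions, and \(\gamma \) is assumed to have been identified already as in step (1).

```python
import math

def make_logit(a00, a0, a, a_int, gamma):
    """Return the logit of the heterogeneous choice model as a function of (x0, x)."""
    def logit(x0, x):
        num = a00 + x0 * a0 + sum(xj * (aj + x0 * aij) for xj, aj, aij in zip(x, a, a_int))
        return num / math.exp(x0 * gamma)
    return logit

def recover(logit, p, gamma):
    """Recover the alpha parameters from logits, given gamma from step (1)."""
    zero = [0.0] * p
    a00 = logit(0, zero)                          # step (3) with x0 = 0
    a0 = math.exp(gamma) * logit(1, zero) - a00   # step (3) with x0 = 1
    a, a_int = [], []
    for j in range(p):
        e_j = [1.0 if k == j else 0.0 for k in range(p)]
        d0 = logit(0, e_j) - logit(0, zero)       # equals alpha_j
        d1 = logit(1, e_j) - logit(1, zero)       # equals exp(-gamma) * (alpha_j + alpha_0j)
        a.append(d0)
        a_int.append(math.exp(gamma) * d1 - d0)
    return a00, a0, a, a_int

# Illustrative values with p = 3 and alpha_{0p} = 0 (assumptions, not from the paper).
true = dict(a00=0.4, a0=-0.7, a=[1.2, -0.5, 0.8], a_int=[0.3, -0.9, 0.0], gamma=0.6)
logit = make_logit(**true)
a00_r, a0_r, a_r, a_int_r = recover(logit, p=3, gamma=true["gamma"])
print(a00_r, a0_r, a_r, a_int_r)
```

Because every parameter is obtained as a function of observable logits, two parameterizations that produce the same logits must coincide, which is the content of the proposition.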