1 Preface

I felt very happy and privileged when invited to submit a paper for the special issue of Sankhya A, celebrating the 100th birth anniversary of Prof. C. R. Rao. Professor Rao contributed, indirectly, a great deal to my research. In 1993 I published an article in the International Statistical Review entitled, “The Role of Sampling Weights when Modeling Survey Data”, Pfeffermann (1993). While working on this article, I came across a short discussion written by the late Professor Steve Fienberg in 1989, stating that “the one exception in which the use of weights may be appropriate is outcome-based sampling, where the sampling plan may be informative for the model of interest.” Professor Fienberg referred to an earlier article by Patil and Rao (1978), which shows how the sampling weights (inverse of the sample inclusion probabilities) feature in the distribution of the sample data in such cases, and how the sample distribution differs from the corresponding population distribution. One of the examples in that article was probability proportional to size (PPS) sampling. This whole area was new to me at the time, but I got really interested in it, and since then I published many articles with colleagues on the relationship between the population distribution, the sample distribution and the non-sample distribution, and how the latter two distributions can be used for inference about the population distribution and for imputation of missing data. See Pfeffermann (2017) for a unified theory with applications to informative sampling and nonignorable nonresponse, small area estimation, observational studies, web panels and more. The present article is another extension of this general theory.

Professor Rao contributed more directly to my work in 2016, when inviting me to co-edit with him the 29th Handbook of Statistics on Survey Samples. This turned out to be a fascinating experience and ended up with a two-volume handbook, containing 41 chapters spread over 1300 pages.

I am very grateful to Professor Rao for his indirect and direct contributions to my career, and I wish him many more years of happy and productive life, with good health.

2 Introduction

In modern sample surveys, sampled units often have the choice of how to respond, whether by telephone, personal interview, mail, fax, or via the Internet. Such surveys are nowadays very popular in many countries, called mixed-mode surveys. See, e.g., De Leeuw (2018) for a recent comprehensive review. Sometimes, the different modes of response are offered sequentially. For example, when starting the survey, all the sampled units are encouraged to respond via the internet. Those who do not respond within a certain time period are approached by telephone and finally, those who couldn’t be contacted or refused to respond via the telephone, are approached for a personal interview.

The term mode-effect encompasses two confounded effects: selection effect - the effect of differences between characteristics of respondents preferring to respond with different modes and consequently, possible differences in the values of reported study variables of interest, and measurement effect - the effect of potentially responding differently by the same person, depending on the mode of response. The motivation behind the use of mixed mode surveys is to possibly increase the response rates and reduce measurement effects, by letting each person to reply by his preferred mode. Clearly, some modes are cheaper and simpler than other, notably, the use of the internet. The literature contains many examples illustrating that different modes of data collection can affect the responses. See also Table 4 in Section 8.1 of the present paper.

If all sampled units respond correctly by their preferred mode, no bias occurs and the use of a mixed-mode survey benefits from the advantages listed above. However, in practice, no responses are obtained from some of the sampled units, with the rates of nonresponse steadily increasing in recent years all over the world. In this case, the use of mixed-mode surveys may introduce large bias in sample estimators, if not accounted for properly. The situation can be even worse in the case of measurement effects. It is often recommended to reduce the measurement effects by a careful questionnaire design across the modes, see, e.g., Dillman and Christian (2003) and De Leeuw et al. (2018), but in the present article we assume a given sample with given responses. Notably, the two effects are confounded, and several studies in the literature attempt to disentangle them, see, e.g., de Leeuw (2005), Hox et al. (2017) and Vannieuwenhuyze et al. (2010, 2014).

In order to reduce the total mode effect, it is common to first determine whether the survey estimates produced from the different modes are indeed different and if they are, to infer which mode is the best in the sense of producing the smallest bias for the variable of interest. The selected mode is then used as a benchmark for correcting the other modes. Vannieuwenhuyze et al. (2014) assume the existence of a mode under which no bias occurs and develop bias corrected estimators by applying the observational study theory of Rosenbaum and Rubin (1983, 1984). In another approach, mode effects are conceptualized as a missing-data problem. Here again, one of the models is assumed to yield correct measurements and is used to impute values for the other modes. For example, Park et al. (2016) consider the case of two modes, use one of them as a benchmark and assume a linear relationship between the observations obtained under the two modes.

The mode comparisons are often based on heuristic arguments. For example, for questions on sensitive topics such as drug use and alcohol consumption, it is sometimes assumed that the mode which provides the highest prevalence of the illicit behavior produces the smallest bias, since the tendency of respondents would be to underreport such behavior, Tourangeau and Yan (2007). An obvious shortcoming of this approach is the underlying assumption that there consists a tendency to underreport sensitive questions. Turner et al. (1998).

An alternative approach to assess mode effects is to compare the estimates obtained from the different modes with known external data, which is assumed to be more accurate. For example, in an income study in Denmark, Kormendi (1988) estimated the bias obtained from the use of telephone and face to face modes by using income data of tax authorities. Biemer (1988, 2001) discusses several limitations of this approach, such as unavailability of appropriate external data for all variables of interest, or differences in definition between the survey measurements and the measurements in the external records.

Proxy surveys by which one member of the household (HH) responds for all the other members of the HH are in common use in HH surveys all over the world. The main motivation in this case is to increase the overall sample size, since information is obtained in principle for all the HH members (Moore, 1988). It also helps in theory to increase the response since if the designated sampled person of the HH cannot be reached or he refuses to respond, another member of the HH is contacted instead. On the other hand, information provided by one member of the HH about another member may be subject to large measurement error (supplying wrong information), and many missing items, (“I don’t know”). There seems to be a common perception that proxy-response is less accurate than self-response, Groves et al. (2004). Kalsbeek and Agans (2007) mention a possible cognitive basis for the better quality of self-response over proxy response. There are, however examples where proxy responses turned out to be more accurate, see e.g., O’Muircheartaigh (1991) and also Table 8 in Section 9 of the present paper.

Finally, we note that there exists an ethical problem with the use of proxy surveys, especially in non-mandatory surveys with no obligation to respond. Have the other members of the HH authorized the responding person to provide all the (possibly sensitive) information about them?

In this article we propose to deal with proxy surveys by considering them as a special case of mode effects with the two main modes defined as “direct response” and “indirect response”, where direct response defines that the person provides information about himself and indirect response defines that the response is obtained by another member of the HH. Within each of the two main modes other modes can be defined, like the mode of response, known characteristics of the responding unit and nonresponse, when no information is obtained from any member of the HH. See Section 9 for an example.

In the following sections we propose and illustrate a new model-based methodology for dealing with mode effects, which does not require a-priori knowledge of a mode providing unbiased estimators. We consider the case of not missing at random (NMAR) nonresponse, by allowing the mode selection probabilities and the probability of nonresponse to depend on the true variable of interest (unobserved under measurement effects) and other explanatory variables. These two parts of our model, the model for the true values and the model for the mode selection account for selection effects, with NMAR nonresponse. Nonresponse is considered as another mode. As stated before, ignoring the NMAR nonresponse already induces bias to the sample estimates even in the absence of measurement effects, i.e., when the responses are correct. In order to account for measurement effects, we further extend our model by modelling the observed responses as a function of the true target variable, the mode selected and known covariates. Note again that with the existence of measurement effects, the true values of the target variable are unknown. To the best of our knowledge, our approach has not been proposed in the literature. Furthermore, when the covariates are known for all the population values from a census or another register, our approach is applicable also for nonprobability samples.

To fit our three-part model we follow the frequency-based approach with the likelihood maximized by application of an EM algorithm. We discuss converges properties of the algorithm and develop the asymptotic properties of the resulting maximum likelihood estimators. Having estimated all the unknown model parameters, we use the estimated model for predicting the population target quantities. We illustrate our approach by use of simulated data and two real samples, for which the true population values of interest are actually known.

In Section 3, we introduce some notation and define our 3- parts model. In Section 4 we describe the estimation of the unknown model parameters and discuss their properties, which are proved in the A at the end of the article. Section 5 considers the estimation of the population parameters of interest, distinguishing between the case where the covariates are known for all the population values of interest and the case where they are known for only the sampled units. Model evaluation is considered in Section 6. In Section 7 we illustrate our approach by simulation experiments, followed by two applications with real data in Sections 8 and 9, with Section 8 focusing on mode-effects and Section 9 on a proxy survey. We conclude with a short summary in Section 10.

3 Models for Selection and Measurement Effects

Consider a finite population U of size N and denote by (Yi,Mi,Xi,Zi) the true outcome variable Y, the response mode M, the auxiliary variables (covariates) X explaining the variability of Y and the covariates Z explaining the variability of M, corresponding to unit i belonging to a sample S of size n, selected from U. In this article we consider the case where Y is binary, taking the values 0 and 1. Suppose first that no measurement effects exist such that every respondent reports his true outcome. Denote by \(\stackrel {\leftrightarrow }{M}\) the number of available modes, with the last mode defining the subsample of non-respondents for which only the covariates are known. For convenience, we assume noninformative sampling as defined in Pfeffermann and Sverchkov (1999), such that \(\Pr (Y_{i}|\mathrm {X}_{i},i\in S)=\Pr (Y_{i}|X_{i})\), but the nonresponse is allowed to be NMAR in the sense that \(\Pr (R_{i}=1|Y_{i}\text {,X}_{i},i\in S)\ne \Pr (R_{i}=1|\mathrm {X}_{i},i\in S)\), where Ri = 1 if sampled unit i responds and Ri = 0 otherwise. We further assume that the mode selection depends not only on the covariates Z, but also on the true outcome Y, in the sense that \(\Pr (M_{i}|Y_{i}\text {,Z}_{i})\ne \Pr (M_{i}|\mathrm {Z}_{i})\). Defining Wi = Xi ∪Zi,

$$ \begin{array}{@{}rcl@{}} \Pr(Y_{i}|{W}_{i},M_{i}) =\frac{\Pr(M_{i}|Y_{i},\mathrm{Z}_{i})}{\Pr(M_{i}|{W}_{i})}\Pr(Y_{i}|\mathrm{X}_{i}) \end{array} $$
(3.1)

where \(\Pr (M_{i}|{W}_{i})={\sum }_{j=0}^{1}\Pr (M_{i}|Y_{i}=j,\mathrm {Z}_{i})\Pr (Y_{i}|\mathrm {X}_{i})\). \(\Pr (Y_{i}|\mathrm {X}_{i})\) is our target population distribution of Y before sampling. It follows from Eq. 3.1 that unless Mi is independent of the outcome in the sense that \(\Pr (M_{i}|Y_{i},\mathrm {Z}_{i})=\Pr (M_{i}|\mathrm {Z}_{i})\), a mode selection effect is present and with the existence of NMAR nonresponse, the mode effects cannot be ignored in the inference process.

In our empirical study we assume a logistic model for \(\Pr (Y_{i}|X_{i})\), and a multivariate logistic model for \(\Pr (M_{i}|Y_{i},\mathrm {Z}_{i})\), with 4 possible values for Mi, including nonresponse.

So far, we assumed no measurement effects. Denote by yi the value measured for responding unit i, which in the case of measurement effects may differ from the true outcome Yi. We account for possible measurement effects by extending the model (3.1) as,

$$ \begin{array}{@{}rcl@{}} \Pr(y_{i}| {W}_{i},M_{i}) =\sum\limits_{j=0}^{1}\Pr(y_{i}| {Y}_{i}= {j},W_{i},M_{i})\frac{\Pr(M_{i}|Y_{i},\mathrm{Z}_{i})}{\Pr(M_{i}| {W}_{i})}\Pr(Y_{i}|\mathrm{X}_{i}). \end{array} $$
(3.2)

Denoting, Di = I(yi = Yi) where I(⋅) is the indicator function, and defining 00 = 1, Eq. 3.2 can be written alternatively as,

$$ \Pr(y_{i} = k| {W}_{i},M_{i}) = \sum\limits_{j=0}^{1}\Pr(D_{i} = j^{k}(1 - j)^{1-k}|Y_{i} = j, {W}_{i},M_{i})\Pr(Y_{i} = j|{W}_{i},M_{i}). $$
(3.3)

Application of Eq. 3.3 requires modelling additionally \(\Pr (D_{i}=1|Y_{i}=j,{W}_{i},M_{i})\) and in our empirical study we again assume a logistic model. Notice that unlike in Eq. 3.1, we now only observe the pair (yi,Mi) and for the non-respondents, only the mode. The target distribution remains \(\Pr (Y_{i}|\mathrm {X}_{i})\).

4 Model Estimation

4.1 The Case of No Measurement Effects

As before, consider first the case of no measurement effects. Adding parameter notation, Eq. 3.1 takes the form,

$$ \Pr(Y_{i}|{W}_{i},M_{i};\alpha_{0},\beta_{0})=\frac{\Pr(M_{i}|Y_{i},\mathrm{Z}_{i};\beta_{0})}{\Pr(M_{i}|{W}_{i};\alpha_{0},\beta_{0})}\Pr(Y_{i}|\mathrm{X}_{i};\alpha_{0}), $$
(4.1)

where α0 ∈A and β0 ∈B are the true parameter vectors. In what follows we assume that the triples (Yi,Wi,Mi) are independent, identically distributed (iid) random variables. Denoting \(\delta _{0}=(\alpha _{0}^{\prime },\beta _{0}^{\prime })'\in \mathrm {A}\times \mathrm {B}\equiv {\Delta }\subset \Re ^{k}\), a compact parameter set, the (full) log likelihood can be written as,

$$ \begin{array}{@{}rcl@{}} L_{n}(\delta)\!\!\!&=&\!\!\!n^{-1}\sum\limits_{i=1}^{n}\ell_{i}(\delta)=n^{-1}\sum\limits_{i=1}^{n}\left[I(M_{i}<\stackrel{\leftrightarrow}{M})\textit{log}\Pr(Y_{i},M_{i}|W_{i};\delta)\right. \\ &&\left.+I(M_{i}=\stackrel{\leftrightarrow}{M})\textit{log}\Pr(M_{i}|W_{i};\delta)\right]\\ \!\!\!&=&\!\!\!n^{-1}\sum\limits_{i=1}^{n}\left\{ I(M_{i}<\stackrel{\leftrightarrow}{M})\textit{log}[f_{i}(\delta)^{Y_{i}}g_{i}(\delta)^{1-Y_{i}}]\right.\\&&\quad\qquad\left.+I(M_{i}=\stackrel{\leftrightarrow}{M})\textit{log}[f_{i}(\delta)+g_{i}(\delta)]\right\} \end{array} $$
(4.2)

where \(f_{i}(\delta )=\Pr (Y_{i}=1,M_{i}|W_{i};\delta )\) and \(g_{i}(\delta )=\Pr (Y_{i}=0,M_{i}|W_{i};\delta )\).

There exist many optimization algorithms for maximizing the likelihood defined by Eq. 4.2. In our empirical study we applied the Newton-Raphson algorithm, which we describe briefly for later use. Denote, \(S_{n}(\delta )=n^{-1}{\sum }_{i=1}^{n}\nabla _{\delta i}\ell _{i}(\delta )\), where ∇δi is the gradient operator with respect to δ and let Dn(δ) \(=n^{-1}{\sum }_{i=1}^{n}\nabla _{\delta i}^{(2)}\ell _{i}(\delta )\), represent the corresponding k × k Hessian matrix. Starting with some initial value δ0, the Newton-Raphson recursive algorithm is,

$$ \delta^{j}=\delta^{j-1}+D_{n}^{-1}S_{n}(\delta^{j-1}), j=1,2... $$
(4.3)

The iterations continue until ||δj+ 1δj|| < ξ for some small positive value ξ, where ||⋅|| is the Euclidean norm. Denote, \(H_{1}(\alpha )=\Pr (Y|X;\alpha )\), \(H_{2}(\beta )=\Pr (M|Y,Z;\beta ), f_{M}(\delta )=\Pr (Y=1,M|W;\delta ), g_{M}(\delta )=\Pr (Y=0,M|W;\delta )\) and \(C_{n}(\delta )=n^{-1}{\sum }_{i=1}^{n}\nabla _{\delta i}\ell _{i}(\delta )\nabla ^{\prime }_{\delta i}\ell _{i}(\delta )\). Let, C0n = Cn(δ0) and D0n = Dn(δ0).

In what follows we define regularity conditions for the identifiability and asymptotic properties of the MLE \(\hat {\delta }_{n}\). For this, we assume that as the total sample size n increases, the number of sampled units in each of the modes, except possibly the nonresponse also increases, in the sense that \(n_{m}\ge q_{m}\textit {n} ; {\sum }_{m=1}^{\stackrel {\leftrightarrow }{M}}n_{m}=n\), for some fixed constants {qm}.

  • A1: The elements of W are bounded almost surely (a.s.) and the functions H1(α) and H2(β) are identifiable and positive uniformly over A and B a.s. (H1(α) is positive uniformly over A if \(\mathrm {P}[\min \limits _{\alpha \in \mathrm {A}}H_{1}(\alpha )>0]=1\)).

  • A2: The covariates in X (or in Z) contain at least one continuous variable Xν(or Zν) not contained in Z (X). The derivative of H1(α)[H2(β)] with respect to this covariate exists and is positive uniformly over A(B) a.s.

  • A3: Assuming the existence of Xν as above, if αα, then \(\log H_{1}(\alpha )-\log H_{1}(\alpha ^{*})=h_{1}(X,\alpha ,\alpha ^{*})\) satisfies h1(⋅)/Xv≠ 0 a.s. Similarly, Assuming the existence of Zν, if ββ, then \(\log H_{2}(\beta )-\log H_{2}^ {}(\beta ^{*})=h_{2}(Z,\beta ,\beta ^{*})\) satisfies h2(⋅)/Zv≠ 0 a.s. The functions \(f_{\stackrel {\leftrightarrow }{M}}(\delta )\) and \(g_{\stackrel {\leftrightarrow }{M}}(\delta )\) are linearly independent.

  • A4: δ0 is an interior vector of Δ, and D0n is positive definite for sufficiently large n.

The first three conditions are needed to prove the identifiability and strong consistency of \(\hat {\delta }_{n}\), the maximum likelihood estimator (MLE) of δ. A similar condition to A2 is used in Follmann and Lambert (1991) for the case of a mixture of logistic models with constant mixing probabilities, and in Pfeffermann and Landsman (2011) for modelling non-ignorable assignments in observational studies. The last condition is needed for showing that \(\hat {\delta }_{n}\) is \(\sqrt {n}\) consistent and asymptotically normal (CAN). Note that for the logit and probit functions, the conditions A1, A3 and the second part of A4 hold, if the elements of W are linearly independent.

In what follows, →a.s.defines convergence almost surely and →D defines convergence in distribution.

Theorem 1.

Under the conditions (A1)-(A3), (i) The likelihood (4.2) is identifiable and \(\hat {\delta }_{n}\to _{a.s.}\delta _{0}\). If, in addition, (A4) also holds, then (ii) \(C_{n}^{-1/2}D_{n}\sqrt {n}(\hat {\delta }_{n}-\delta _{0})\to _{D}N(0,I_{k})\) where Ik is the identity matrix of order k.

4.2 Model with Measurement Effects

Next consider the model defined by Eq. 3.2, which accounts also for measurement effects, in which case the observed measurement, y, may differ from the true outcome, Y. Denoting as before Di = I(yi = Yi), we may write,

$$ \Pr(y_{i},M_{i}|D_{i},W_{i};\delta)=\left\{ \begin{array}{ll} {\Pr(Y_{i},M_{i}|W_{i};\delta_{0})} & {\text{if} D_{i}=1}\\ {\Pr(1-Y_{i},M_{i}|W_{i};\delta_{0})} & {\text{if} D_{i}=0} \end{array}\right., $$
(4.4)

where δ0 ∈A ×B = Δ is the true parameter vector.

Notice that the models for \(\Pr (Y_{i}|\mathrm {X}_{i};\alpha _{0})\) and \(\Pr (M_{i}|Y_{i},\mathrm {Z}_{i};\beta _{0})\) are unchanged but we need to model,

$$ p_{ij}(\gamma_{0}) =\Pr(D_{i}=1|Y_{i}=\textit{j},W_{i},M_{i};\gamma_{0}), M_{i}\ne\stackrel{\leftrightarrow}{M}, j=0,1, $$
(4.5)

where γ0 ∈Γ is the true parameter vector.

Denote \(\theta =(\delta ^{\prime },\gamma ^{\prime })^{\prime }\) and 𝜃0 ∈Θ = Δ ×Γ⊂Rs, a compact set where \(s=k+\dim (\gamma _{0})\). By Eqs. 4.4 and 4.5, the log-likelihood is now given by,

$$ \begin{array}{lll} \tilde{L}_{n}(\theta)\!\!\!\!&=&\!\!\!\!n^{-1}{\sum}_{i=1}^{n}\tilde{\ell}_{i}(\theta)=n^{-1}{\sum}_{i=1}^{n}\left[I(M_{i}<\stackrel{\leftrightarrow}{M})y_{i}\{\log[p_{i1}(\gamma)f_{i}(\delta)\right.\\&&+(1-p_{i0}(\gamma))g_{i}(\delta)]\}+(1-y_{i})\{\log[p_{i0}(\gamma)g_{i}(\delta)\\&&\left.+(1-p_{i1}(\gamma))f_{i}(\delta)]\}+I(M_{i}=\stackrel{\leftrightarrow}{M})\log[f_{i}(\delta)+g_{i}(\delta)]\right]. \end{array} $$
(4.6)

For \(M_{i}\ne \stackrel {\leftrightarrow }{M}\), denote \(h_{3}(\gamma )=({h_{3}^{0}}(\gamma ),{h_{3}^{1}}(\gamma ) )' \); \({h_{3}^{j}}(\gamma ) =\Pr (D=1|Y=\textit {j},W,M;\gamma )\) j = 0,1 and define \(\tilde {C}_{n}(\theta )\) and \(\tilde {D}_{n}(\theta )\) in a similar manner as for the model with no measurement effects but with \(\tilde {\ell }_{i}(\theta )\) replacing i(δ). Let \(\tilde {C}_{0n}=\tilde {C}_{n}(\theta _{0})\), \(\tilde {D}_{0n}=\tilde {D}_{n}(\theta _{0})\) and \(\hat {\theta }_{n}\) the MLE maximizing the likelihood 4.6.

Suppose the following regularity conditions:

  • B1: Conditions (A1)-(A3) in Section 4.1 hold, the functions fM(δ) and gM(δ) are linearly independent also for \(M_{i}\ne \stackrel {\leftrightarrow }{M}\) and h3(γ) is identifiable over Γ.

  • B2: Condition (A4) holds; γ0 is interior to Γ and \(\tilde {D}_{0n}\) is positive definite for sufficiently large n.

Theorem 2.

Under B1, (i) The likelihood (4.6) is identifiable and \(\hat {\theta }_{n}\)a.s.𝜃0. If, in addition, B2 also holds, then (ii), \(\tilde {C}_{0n}^{-1/2}\tilde {D}_{0n}\sqrt {n}(\hat {\theta }_{n}-\theta _{0}) {\mathop {\to }\limits ^{D}}\) N(0,Is).

4.3 The EM Algorithm

To maximize the likelihood (4.6), we apply the iterative EM algorithm (Dempster et al. 1977) which, as shown below, is particularly convenient in the present context. The algorithm has been developed for computing the MLE in cases of incomplete data, which is what happens in our case in the presence of measurement effects, where the true outcomes are unknown.

The key idea underlying the EM algorithm is to add latent variables to the observed data and define a modified likelihood as a function of the observed data and the values of the latent variables. Denote by 𝜃l the parameter estimates at the l-th iteration. The algorithm cycles between two states. In the first state, it calculates the expected value of the latent variables given the observed data and 𝜃l, denoted by Si(𝜃l). Next, the latent variables in the modified likelihood are replaced by their expected values, thus resulting in a “likelihood” denoted by M(𝜃,𝜃l), which depends on Si(𝜃l) and is much easier to optimize than the original log likelihood (4.6). This step is called the estimation- E-step. In the second state, the likelihood M(𝜃,𝜃l) is maximized with respect to the unknown parameters, yielding the estimate, 𝜃l+ 1. This step is called the maximization- M-step.

In order to implement the algorithm in our case, we define Di = I(yi = Yi) to be the latent variable values. By Eqs. 4.5 and 4.6, the modified ith log-likelihood is,

$$ \begin{array}{l} \tilde{\tilde{\ell}}_{i}(y_{i},M_{i},D_{i}|W_{i};\theta)=I(M_{i}<\stackrel{\leftrightarrow}{M})\log[\left( p_{i1}(\gamma)f_{i}(\delta)\right)^{D_{i}y_{i}}\\ \left( (1-p_{i0}(\gamma))g_{i}(\delta)\right)^{(1-D_{i})y_{i}}\\ \times\left( p_{i0}(\gamma)g_{i}(\delta)\right)^{D_{i}(1-y_{i})}\left( (1-p_{i1}(\gamma))f_{i}(\delta)\right)^{(1-D_{i})(1-y_{i})}]\\+I(M_{i}=\stackrel{\leftrightarrow}{M})\log(f_{i}(\delta)+g_{i}(\delta)). \end{array} $$
(4.7)

Taking the expectation of Di given (yi,Mi) with 𝜃 = 𝜃l yields,

$$ \begin{array}{@{}rcl@{}} S_{i}(\theta^{l}) \!\!\!\!&=&\!\!\!\!\Pr(D_{i}|y_{i},M_{i},W_{i};\theta^{l})=\frac{\Pr(D_{i},y_{i}|M_{i},W_{i};\theta^{l})}{\Pr(y_{i}|M_{i},W_{i};\theta^{l})}.\\ \!\!\!\!&=&\!\!\!\!\frac{[p_{i1}(\gamma^{l})f_{i}(\delta^{l})]^{y_{i}}[p_{i0}(\gamma^{l})g_{i}(\delta^{l})]^{(1-y_{i})}}{\left[\left( p_{i1}(\gamma^{l})f_{i}(\delta^{l})\right)+\left( (1\! -\! p_{i0}(\gamma^{l}))g_{i}(\delta^{l})\right)\right]^{y_{i}}[p_{i0}(\gamma^{l})g_{i}(\delta^{l})+(1-p_{i1}(\gamma^{l}))f_{i}(\delta^{l})]^{(1-y_{i})}}.\\ \end{array} $$
(4.8)

This defines the E-step. For defining the M-step, denote the ith log likelihood in Eq. 4.2 for the case of no measurement effects as \(\ell _{i}(A_{i}^{(1)},A_{i}^{(2)},M_{i},W_{i},\delta ^{l})=\ell _{i}(\delta ^{l})\), where \(A_{i}^{(1)}=Y_{i}\) and \(A_{i}^{(2)}=1-Y_{i}\). By Eqs. 4.7 and 4.8 and some simple calculations,

$$ \begin{array}{@{}rcl@{}} M(\theta,\theta^{l}) &=&{\sum}_{i=1}^{n}E\tilde{\tilde{\ell}}(y_{i},M_{i},D_{i}|W_{i},\theta)\\ &=&\sum\limits_{i=1}^{n}\{\ell_{i}[A_{i}^{(1)}(\theta^{l}),A_{i}^{(2)}(\theta^{l}),M_{i},W_{i},\delta]+\stackrel{\frown}{\ell}_{i}(\gamma;\theta^{l})\} \end{array} $$
(4.9)

where, denoting Si = Si(𝜃l),

$$ \begin{array}{@{}rcl@{}} \overset{\frown}{\ell}_{i}(\gamma;\theta^{l})&=&I(M_{i}<\stackrel{\leftrightarrow}{M})\ln p_{i1}(\gamma)^{y_{i}S_{i}}(1-p_{i0}(\gamma))^{y_{i}(1-S_{i})}p_{i0}(\gamma)^{(1-y_{i})S_{i}}\\&&(1-p_{i1}(\gamma))^{(1-y_{i})(1-S_{i})},\\ A_{i}^{(1)}(\theta^{l})&=&y_{i}S_{i}+(1-y_{i})(1-S_{i}),A_{i}^{(2)}(\theta^{l})=y_{i}(1-S_{i})\\&&+(1-y_{i})S_{i}. \end{array} $$

The “likelihood” Eq. 4.9 is seen to be the sum of two terms, one of only δ and the other of only γ. Thus, the maximization over 𝜃 is obtained by maximizing each term separately, simplifying the maximization process substantially.

In our empirical study we set the initial values by drawing many values from a broad uniform prior distribution around \(\theta ^{0}=(\hat {\delta }_{n}^{A},0)\), where \(\hat {\delta }_{n}^{A}\) is the MLE of the model without measurement effects, and use the value that maximizes the log-likelihood as the starting initial value.

5 Prediction of Finite Population Means

Replacing the unknown model parameters by their MLE permits estimating the true population mean (proportion), \(\bar {Y}_{(P)}={\sum }_{j=1}^{N}Y_{j}\). Denote,

$$ \begin{array}{@{}rcl@{}} \hat{\rho}_{i} &=&\Pr(Y_{i}=1|\mathrm{X}_{i};\hat{\alpha}); \hat{\tau}_{1,im}=\Pr(Y_{i}=1,M_{i}=m|W_{i};\hat{\delta}), \hat{\tau}_{2,im}\\&&=\Pr(M_{i}=m|W_{i};\hat{\delta}), \end{array} $$
(5.1)

where \(\hat {\delta }^{\prime }=(\hat {\alpha }^{\prime },\hat {\beta }^{\prime })\) denotes the MLE. For the case where the covariates{Xi} are known for all the population units, we estimate,

$$ \hat{\bar{Y}}_{({Model})}=N^{-1}\sum\limits_{i=1}^{N}\hat{\rho}_{i}. $$
(5.2)

Remark 1.

As long as the model assumed for the population values holds also for the sample, the use of Eq. 5.2 does not require probability sampling. Furthermore, if the sample is deemed to be informative in the sense that the inclusion in the sample depends on the true outcome variable, one may extract the population model from the model holding for the sample data, by use of the relationship between the population and sample models, as developed in Pfeffermann and Sverchkov (1999). This requires to model also the probability \(\Pr (i\in S|Y_{i},X_{i};\eta )\).

When the covariates are known for only the sampled units and the population size is unknown, we may use a modification of the Horvitz-Thompson (HT) estimator, i.e.,

$$ \hat{\bar{Y}}_{({HT}, {Model})}=\sum\limits_{i=1}^{n}\pi_{i}^{-1}\hat{\rho}_{i}/\sum\limits_{i=1}^{n}\pi_{i}^{-1}. $$
(5.3)

Assuming that each population unit can be classified by his or her preferred mode, we can also estimate the population mean for each mode, which as discussed in the introduction, is often of important interest on its own. For the case where Wi is known for every jU, we estimate,

$$ \hat{\bar{Y}}_{(Pm)}=\hat{N}_{m}^{-1}\sum\limits_{i=1}^{N}\hat{\tau}_{1,im}; \hat{N}_{m}=\sum\limits_{i=1}^{N}\hat{\tau}_{2,im}. $$
(5.4)

Otherwise,

$$ \hat{\bar{Y}}_{Pm,HT}=\hat{N}_{m,HT}^{-1}\sum\limits_{i=1}^{n}\pi_{i}^{-1}\hat{\tau}_{1,im}; \hat{N}_{m,HT}=\sum\limits_{i=1}^{n}\pi_{i}^{-1}\hat{\tau}_{2,im}. $$
(5.5)

It is easy to show that under the conditions of Theorem 2, the predictors defined by Eqs. 5.15.4 are \(\sqrt {n}\)-consistent for the corresponding true population means, under an appropriate asymptotic framework for finite population sampling. See Isaki and Fuller (1982) for such a framework.

Remark 2.

The predictors 5.15.5 are computed the same way under both the models with, and without measurement effects.

6 Model Evaluation

6.1 The Hosmer-Lemeshow Test

The predictors developed in the previous sections are model-dependent and as such, the model used for their construction needs to be tested. Many goodness-of-fit test procedures under the frequentist approach for continuous outcomes have been proposed in the literature. See, e.g., Pfeffermann and Landsman (2011) and Pfeffermann and Sikov (2011) for review and references. In our empirical study we consider the case of binary outcomes generated from logistic models and we apply the Hosmer Lemeshow (HL, 1980, 2000) goodness-of-fit test, which is in common use for testing models for binary data.

The test statistic compares within pre-specified groups the number of observed successes (yi = 1), with the number of expected successes, as predicted under the estimated model. For this, the data are first ordered according to the predicted probability of success under the model evaluated. Next, the units are grouped based on the ordering into a certain number of groups of approximately equal size, (G = 10 groups is common), and within each group the estimated expected number of successes, (the sum of the predicted probabilities of success), is compared to the observed number of successes. The test statistic is,

$$ HL=\sum\limits_{g=1}^{G}\frac{(O_{g}-n_{g}\bar{\hat{p}}_{g})^{2}}{n_{g}\bar{\hat{p}}_{g}(1-\bar{\hat{p}}_{g})}\sim\chi_{(G-2)}^{2}, $$
(6.1)

where Og is the number of observed successes in group g, ng is the number of units in the group and \(\bar {\hat {p}}_{g}=n_{g}^{-1}{\sum }_{i=1}^{n_{g}}\hat {p}_{gi}\) is the mean of the estimated probabilities of success. Hosmer and Lemeshow (1980) found through empirical studies under a much simpler setup than in our case that the test statistic (6.1) follows approximately the χ2 distribution with (G − 2) degrees of freedom, under the null hypothesis that the model fits the data.

6.2 Normalized Likelihood Ratio (N-LR) Test for Model Selection

A standard test of the null hypothesis that two models, one nested within the other, fit the data “equally well”, is the likelihood ratio test. We apply the test (with certain correction, see below), for testing the null hypothesis that accounting for measurement effects in the extended model (3.2) does not improve the goodness-of-fit compared to the model (3.1) with only selection effects, or more formally, for testing that there are no measurement effects.

Distinguishing the models without and with measurement effects by the superscripts A and B respectively, the standard test statistic is,

$$ LR_{n}=-2[{l_{n}^{A}}(\hat{\delta})-{l_{n}^{B}}(\hat{\theta})], $$
(6.2)

where \({l_{n}^{A}}(\hat {\delta })={\sum }_{i=1}^{n}{\ell _{i}^{A}}(\hat {\delta })\) and \({l_{n}^{B}}(\hat {\theta })={\sum }_{i=1}^{n}{\ell _{i}^{B}}(\hat {\delta })\) are the corresponding log-likelihoods computed with the MLEs, as obtained under the two models (Eqs. 4.2 and 4.6). However, unlike in standard problems where the nested model is obtained from the extended model by nullifying certain parameters, this is not the case in our application where we restrict to logistic models, since \([1+\exp (-t)]^{-1}\ne 0\) for all t in a compact set and hence, the model A without measurement effects is not nested in the model B with them. To deal with this problem, Vuong (1989) proposed a normalized likelihood ratio test (N-LR) defined as,

$$ LR_{nor}=\frac{LR_{n}}{\hat{\omega}_{n}}, $$
(6.3)

where \(\hat {\omega }_{n}^{2}=4n\left [n^{-1}{\sum }_{i=1}^{n}[{l_{i}^{A}}(\hat {\delta })-{l_{i}^{B}}(\hat {\theta })]^{2}-(n^{-1}LR_{n})^{2}\right ]\) is an estimator of the variance of LRn. The author shows that under the assumption that the true parameter vector is an interior point and some other regularity conditions, the asymptotic distribution of LRnor is normal. However, when model A is correct, one would expect that some of the parameters γ which define the probability of the measurement effects lie on the boundary of the assumed parameter set. In this case, Vuong’s condition that the true parameter vector is an interior point is violated. A similar problem is demonstrated in Wilson (2015). In the empirical study we approximated the distribution of the statistic LRnor by parametric bootstrap.

7 Empirical Results Based on Simulations

7.1 Design of Simulation Study

In order to assess the performance of our proposed approach, we designed a simulation study which consists of the following sages:

  1. 1.

    Generate a population of size N = 105 with values of three covariates, X1i,X2i,X3i, generated independently from a Beta(2,5) distribution.

  2. 2.

    Generate a binary outcome Yi from the logistic distribution;

    $$ \Pr(Y_{i}=1|X_{1i};\alpha)=\text{logit}^{-1}(\alpha_{0}+\alpha_{1}X_{1i}), \textit{i}=1,...,\textit{N}. $$
    (7.1)
  3. 3.

    Classify the population units into four modes with probabilities,

    $$ \begin{array}{l} {\Pr(M_{i}=m|Y_{i},X_{2i};\beta_{m})=\frac{\exp(\beta_{0m}+\beta_{1m}X_{2i}+\beta_{2m}Y_{i})}{1+{\sum}_{m=1}^{3}\exp(\beta_{0m}+\beta_{1m}X_{2i}+\beta_{2m}Y_{i})}, m=1,2,3}\\ {\Pr(M_{i}=4|Y_{i},X_{2i};\beta_{m})=1-{\sum}_{m=1}^{3}\Pr(M_{i}=m|Y_{i},X_{i};\beta_{m}),} \end{array} $$
    (7.2)

where \(M_{i}=4=\stackrel {\leftrightarrow }{M}\) defines the “mode” of nonresponse. Notice that the model assumes NMAR nonresponse as the probability not to respond depends directly on the outcome. The parameter values in Eqs. 7.1 and 7.2 were set such that the relative population sizes in the 4 modes are approximately (10%, 25%, 40%, 25%).

The population in Stages 1-3 has been generated only once, such that the assessment of the performance of the proposed methodology is “design-based”, over all possible sample selections from a fixed population. Alternatively, one could generate many populations and draw samples from each population, but with a population of size N = 105, this would not make much difference.

  1. 4.

    Draw K = 1000 samples of size n = 5000 by simple random sampling without replacement from the population obtained in 1-3.

  2. 5.

    Generate measured values for the sampled units from the model,

    $$ \begin{array}{@{}rcl@{}} &&\Pr(D_{i}=1|Y_{i},M_{i}=m,X_{3i};\gamma_{m})\\&&=\text{logit}^{-1}(\gamma_{0m}+\gamma_{1m}X_{3i}+\gamma_{2m}Y_{i}) , m=1,2,3. \end{array} $$
    (7.3)
  3. 6.

    Estimate the model coefficients for each sample by maximizing the likelihood function for the respective model (with or without measurement effects). We used the Newton Raphson algorithm (4.3) for estimating the coefficients of the model without measurement effects (hereafter model A), and the EM algorithm described in section (4.3) for the model with measurement effects (hereafter model B). We used parametric bootstrap for estimating the standard errors (S.E.) of the model coefficients and of the estimators of the true population mean, with 100 bootstrap samples for each parent sample. Estimating the S.E. of the model coefficients by use of the inverse information matrix turned out to be unstable, with occasional very extreme and even negative variance estimators. The computer code for running the simulation study (and for the applications with real data in Sections 8 and 9) has been written in MATLAB version 9.5.

7.2 Simulation Results

Table 1 contains the mean of the estimated coefficients (Mean Est.), along with their empirical S.E., for the models without (model A) and with (model B) measurement effects. The empirical S.E. are the standard deviations of the estimates over the 1,000 samples divided by \(\sqrt {1,000}\). We also computed the means of the estimated bootstrap S.E. but they are not shown since they are very close to the empirical S.E. for all the coefficients, under both the models A and B. We note in this respect that the measurement effects under Model B are not negligible. We computed for the first of the 1,000 samples the probabilities of measurement effects; \(a_{i}=\Pr (D_{i}=0|Y_{i},\textit {X}_{2i}X_{3i})\) and found that Min(ai)= 0.01, Q1(ai)= 0.12, Mean (ai)= 0.36, Q3(ai)= 0.59, Max(ai)= 0.99. (Q1 and Q3 are the 1st and 3rd quarters.)

Table 1 True coefficients, means of estimated coefficients and empirical standard deviations (S.E.) of estimates

As seen in Table 1, the mean estimates under both models are very close to the corresponding true coefficients with very small standard errors, although it should be noted that in most cases the differences are significant when tested by use of the standard t-statistic. As expected, the empirical S.E. under Model B are somewhat higher than the corresponding S.E. under Model A, as results from the existence of measurement effects under Model B. As mentioned above, the means of the bootstrap S.E. estimators (not shown) are very close to the corresponding empirical S.E.

To save in space, in what follows we restrict to only the results obtained under Model B with the measurement effects. Similar (and somewhat better) results have been obtained under Model A for which the observed values are the same as the true values, with no errors.

Table 2 shows the results obtained when predicting the population means and sizes for the various modes, using the predictors defined by Eqs. 5.25.5. The predictor \(\bar {y}_{S}\) is the HT estimator when ignoring the mode effects, using all the measurements including for the non-respondents. (Reduces to the simple sample mean based on all the sample units under simple random sampling).

Table 2 Mean predictions of population and sub-population means and sizes, and empirical S.E. (in parenthesis). Model B

Table 2 shows very good performance of the predictors, despite the existence of nonignorable measurement effects. The HT estimators have slightly larger S.E. than the model-based predictors, which of course is expected since the latter predictors use the information about the population covariates, not used by the HT estimator. Notice how well the size and true mean of the non-respondents (m = 4) are predicted, even with only the HT estimator. Finally, the use of the estimator \(\bar {y}_{S}\) (simple sample mean in our case), which ignores the mode effects is seen to be biased, illustrating that mode effects cannot be ignored when estimating population means. (When ignoring also the nonresponse and computing the mean of only the observed measurements, the corresponding estimator is \(\bar {y}_{S}=0.52\).)

7.3 Model Evaluation

7.3.1 Distribution of Hosmer-Lemeshow Statistic Under Model B

To illustrate the distribution of the Hosmer-Lemeshow (HL) we drew 1,000 new samples of size 5,000 without replacement from the same population and subjected the true outcomes of the responding units to measurement effects via the model (7.3). Figure 1 shows the smoothed empirical density of the test statistic for G = 10 nearly equal-size groups over the 1,000 samples. Recall that the HL statistic is supposed to have a \(\chi ^{2}{~}_{(8)}\) distribution under correct model specification (Model B in our case). As can be seen, the two densities are indeed very close, supporting the conjecture that the true distribution is indeed \(\chi ^{2}{~}_{(8)}\).

Figure 1
figure 1

Empirical distribution of HL statistic under model B. Nonparametric density estimation. Comparison to \(\chi ^{2}{~}_{(8)}\) density

In order to study the power of HL statistic, we assume under the null hypothesis that the model for the mode selection probabilities is as defined by Eq. 7.2, where in fact we generated the modes using the model,

$$ \begin{array}{l} {\Pr(M_{i}=m|Y_{i},X_{2i};\beta_{m},\eta)=\frac{\exp(\beta_{0m}+\beta_{1m}Y_{i}+\beta_{2m}X_{2i}+\eta X_{2i}^{2})}{1+{\sum}_{j=1}^{M}\exp(\beta_{0m}+\beta_{1m}Y_{i}+\beta_{2m}X_{2i}+\eta X_{2i}^{2})},m=1,2,3}\\ {\Pr(M_{i}=4|Y_{i},X_{2i};\beta_{m},\eta)=1-{\sum}_{m=1}^{3}\Pr(M_{i}=m|Y_{i},X_{2i};\beta_{m},\eta).} \end{array} $$
(7.4)

(Compare with 7.2).

Figure 2 shows the empirical rejection rates when using the HL test for values η in the range η ∈ [0,0.2], based on 1000 samples of size n = 5,000 for each value η, generated by use of Eq. 7.4. For η = 0, (correct model specification), the rejection rate is close to the nominal size of 0.05, as it should be. As η increases (the assumed model is further away from the correct model), the rejection rates increase monotonically, reaching powers ≥ 0.8, already for η ≥ 0.15.

Figure 2
figure 2

Rejection rates with significance level of 0.05, when true mode selection probabilities are defined by Eq. 7.4. 1000 samples, η ∈ [0,0.2]

7.3.2 Distribution of the N-LR Statistic Under the Correct Model

The purpose of this section is to illustrate that we can approximate the distribution of LRnor, the N-LR test statistic as defined in Eq. 6.3 by parametric bootstrap. For this, we generated 1,000 samples of size 5,000 under Model A (Eqs. 7.17.2) with the same true parameters as before, and other 1,000 samples of size 5,000 under model B. The first set of samples allow us to assess the approximation of the distribution under the null hypothesis of no measurement effects, while the second set allows to assess the distribution of the test when there exist measurement effects. For this, we followed the following steps:

  1. 1)

    Estimate the models A and B for each of the 2,000 samples and compute the LRnor test statistic. At the end of this step we have 1,000 observations of the test statistic with data obeying the null hypothesis of no measurement effects, and 1,000 observations of the test statistic with data containing measurement effects as under the alternative hypothesis, allowing us to estimate the true distributions under the two hypotheses by the corresponding empirical distributions.

  2. 2)

    Generate for each of the first 10 samples generated under Model A, 1000 parametric bootstrap samples of size 5,000 under the model A, using the corresponding estimated coefficients from the parent sample. Re-estimate the two models for each bootstrap sample and compute the LRnor statistic. This step generates 10,000 realizations form the bootstrap distribution of the test statistic when the null hypothesis of no measurement effects holds.

  3. 3)

    Repeat Step 2 but this time by generating the bootstrap samples from Model B, as estimated for each of the first 10 parent samples generated under this model. This step generates 10,000 realizations form the bootstrap distribution of the test statistic under the alternative hypothesis that measurement effects exist.

Figures 3 and 4 compare the empirical distributions under the two hypotheses with the corresponding bootstrap approximations, showing sufficiently close fit in both cases. Notice that when the actual observed data conform with Model A, it is the first bootstrap distribution which will practically be used for testing the null hypothesis of no measurement effects. When the observed data conform with Model B, it is the second bootstrap distribution which will practically be used.

Figure 3
figure 3

Empirical and Bootstrap distributions of the normalized LR statistic under Model A

Figure 4
figure 4

Empirical and Bootstrap distributions of the normalized LR statistic under Model B

8 Empirical Results for a Real Mixed-Mode Survey

8.1 Example of Measurement Effects

As explained in the introduction, the term measurement effect refers to the case where a sampled unit responds differently, depending on the mode of response. In the Agriculture Census of Israel carried out in 2018, 210 farmers (out of about 17,000) happened to respond both by Telephone and via the internet, after receiving mistakenly a reminder to respond via the Internet, even though they already responded by telephone. Table 3 summarizes the results obtained for two of the questions asked in the census: number (#) of workers in the farm, and total cultivated area. Out of the 210 farmers, 131 farmers responded the same way on Question 1 and 139 farmers responded the same way on Question 2. The notation T>I (T<I) defines the farmers with the higher (lower) responses on telephone than on the internet.

Table 3 Mean responses obtained by farmers responding both by telephone and internet: overall and separately for farmers with T>I and T<I

The figures in the table indicate big differences in the answers of the farmers that provided different answers by the two modes (about one third of the farmers responding by the two modes). In this example the measurement effects cancel out when computing the means, but there is no guarantee that this always happens, and proper models need to be used to account successfully for the measurement effects. See next section.

8.2 Accounting for Mode Effects in a Real Mixed-Mode Survey

In this section we illustrate the performance of our proposed approach by using data collected as part of the annual crime victimization survey in 2017, administered by the Israel Central Bureau of Statistics (ICBS). The survey collects information on victimization with respect to a variety of crimes, as well as socio-demographic information. Similar surveys are carried out by national statistical offices throughout the world. The sample is drawn by probability sampling, and the sampled units can respond using either the internet, or by telephone, implying that we have 3 modes, with the third mode defined by non-respondents. The total sample size is n = 7035, with 11% responding via the internet (I), 60% by telephone (T) and 29% not responding (NR).

Although not a primary variable of interest in this survey, we chose as the target outcome variable the binary variable Yi, taking the value “1” if unit i has an academic degree (Bachelor or higher). The aim is to predict the true population proportion of persons with academic degree. The reason for this choice is that the ICBS has an extensive register of education, with population coverage of over than 95%, so that we can assess the performance of our method by comparing the predictors to the “truth”. The register is complete for only 2016, but we don’t expect significant differences between the two years. We confined our analysis to persons aged 20+ and based on the register, the true proportion is \(\bar {Y}_{(P)}=0.24\). On the other hand, 41.4% of the internet respondents and 23.5% of the telephone respondents in the sample have an academic degree, indicating the existence of a mode selection effect, and possibly also measurement effects.

In what follows we present and discuss the results of our study. We included among the covariates (X) the variables gender (gen; male= 1), age (in years), and country of birth (cob; Israel= 1, other countries= 0). In accordance with the general methodology outlined in Sections 35, and the models and notation in Section 7, we fitted the following models:

$$ \Pr(Y_{i}=1|X_{i};\alpha)=\text{logit}^{-1}(\alpha_{0}+\alpha_{1}age_{i}+\alpha_{2}cob_{i}+\alpha_{3}gen_{i}) $$
(8.1)
$$ \begin{array}{@{}rcl@{}} &&{}\Pr(M = m|X_{i},Y_{i};\beta) = \frac{\exp(\beta_{0m}+\beta_{1m}Y_{i}+\beta_{2m}cob_{i}+\beta_{3m}gen_{i})}{1+{\sum}_{m=1}^{2}\exp(\beta_{0m}+\beta_{1m}Y_{i}+\beta_{2m}Cob_{i}+\beta_{3m}gen_{i})},\\&&{}m=T,I \end{array} $$
(8.2)
$$ \Pr(M_{i}=NR|Y_{i},X_{i};\beta_{1},\beta_{2})=1-{\sum}_{m=I,T}^{~}\Pr(M_{i}=m|Y_{i},X_{i};\beta_{m}), $$
$$ \begin{array}{@{}rcl@{}} &&{}\Pr(D_{i} = 1|\textit{Y}_{i},M_{i} = m,X_{i};\gamma_{m}) = \text{logit}^{-1}\mathrm{(}\gamma_{0m}+\gamma_{1m}Y_{i}+\gamma_{2m} age_{i}+\gamma_{3m}gen_{i}),\\&&{}m=T,I \end{array} $$
(8.3)

Table 4 shows the estimated coefficients and the parametric bootstrap S.E. of the models (8.1) and (8.2), as obtained when fitting the models with- and without accounting for possible measurement effects (M.E.). Table 5 shows the estimated coefficients and S.E. of the model (8.3), which accounts for measurement effects. We also computed the means of the estimated coefficients over all the BS samples (not shown), and found that they are very close to the estimates obtained from the original sample, thus verifying the asymptotic unbiasedness of our ML estimators.

Table 4 Estimation of α and β coefficients (Eqs. 8.18.2), when fitting the model without- and with accounting for possible measurement effects (M.E.)
Table 5 Estimation of γ coefficients (8.3), when fitting the model which accounts for possible measurement effects

As can be seen, most of the coefficients are highly significant in both tables under the standard t-test. What we find a bit surprising is that the coefficient of Y (having academic degree) in the models for Mi = T|Xi,Yi;β and Mi = I|Xi,Yi;β is highly negative in all 4 models in Table 4, suggesting that with the other covariates held fixed, having an academic degree actually encourages nonresponse (the third mode). Also, for the model with measurement effects (Table 5), the coefficient of Y is positive and highly significant (but of much smaller magnitude), suggesting that having an academic degree increases the probability of misreporting. Including more covariates in the models could possibly resolve these, somehow unexpected outcomes.

Next, we study the performance of the model in predicting the true population proportion, which as noted in Section 8.1, is basically known for this application. We computed the following predictors:

  1. 1.

    \(\hat {\bar {Y}}_{(Model)}=N^{-1}{\sum }_{i=1}^{N}\hat {\rho }_{i}\); \(\hat {\rho }_{i}=\Pr (Y_{i}=1|\mathrm {X}_{i};\hat {\alpha })\) (uses the covariate information for all the population units) (5.2)

  2. 2.

    \(\hat {\bar {Y}}_{(\textit {HT,Model})}={\sum }_{i=1}^{n}\pi _{i}^{-1}\hat {\rho }_{i}/{\sum }_{i=1}^{n}\pi _{i}^{-1}\); \(\pi _{i}=\Pr (i\in S)\) (5.3)

  3. 3.

    \(\hat {\bar {Y}}_{HT,True}=\hat {N}^{-1}{\sum }_{i=1}^{n}\pi _{i}^{-1}Y_{i}^{True}; \hat {N}={\sum }_{i=1}^{n}\pi _{i}^{-1}\) (uses the true values Yi known from the education register).

  4. 4.

    \(\hat {\bar {Y}}_{HT,Adj}=\hat {N}_{Adj}^{-1}{\sum }_{i=1}^{n}\tilde {\pi }_{i}^{-1}y_{i}; \hat {N}_{Adj}={\sum }_{i=1}^{n}\tilde {\pi }_{i}^{-1}\), the HT estimator but with the standard base weights \(\{a_{i}=\pi _{i}^{-1}\}\) replaced by adjusted weights to account for the nonresponse.

  5. 5.

    \(\hat {\bar {Y}}_{HT,imp}=\hat {N}^{-1}{\sum }_{i=1}^{n}\pi _{i}^{-1}\tilde {Y}_{i}\); \(\tilde {Y}_{i}=y_{i}\) if unit i responds, \(\tilde {Y}_{i}=Y_{i,imp}\) if unit i does not respond. The imputation was carried out using the monotone imputation method of Rubin (1987, p. 172), based on the observed sample values yi.

The estimators obtained under the two models (with out and with the accounting for measurement errors), are shown in Table 6.

Table 6 Predictors of proportion of persons with academic degree

Computing the Hosmer-Lemeshow (HL) test discussed in Section 6.1 under the two models, and the normalized likelihood ratio (N-LR) test discussed in Section 6.2, yields (p-values in parentheses): HL(A) = 11.6(p-value = 0.17),HL(B) = 9.44(p-value = 0.31),LRnor = 0.21(p-value = 0.34).

The results of this study show very clearly that our proposed model-based predictors are much superior to the design-based estimators, which ignore the mode effects (\(\hat {\bar {Y}}_{HT,Adj},\hat {\bar {Y}}_{HT,imp}\)), despite the use of only three covariates for which the population values are known. (The estimator \(\hat {\bar {Y}}_{HT,True}=0.25\), which uses the correct values of the outcome variable indicates that the design-based estimator in the case of no measurement effects and nonresponse performs well.) Model B, which accounts for possible measurement effects seems to perform somewhat better than Model A, which assumes no measurement effects (note the relative high value of \(\hat {\bar {Y}}_{HT,Model}=0.28\) under Model A), although this is only partly reflected by the values of the two test statistics, suggesting that for the variable of having an academic degree in this survey, there are only small measurement effects, not detected by the two tests considered.

9 Dealing with Proxy Surveys as Mode Effects

As mentioned in the introduction, we propose dealing with the problem of proxy surveys via the methodology developed in the present article for dealing with mode effects. We illustrate our proposal using data collected as part of the Labor Force Survey (LFS), administered by ICBS. The LFS in Israel is a monthly survey with a 4- in, 8- out, 4- in, rotation pattern. The LFS is a proxy survey because every sampled person is asked to respond about himself (direct response), and about all other members of the household (proxy response). For the present illustration we use the data observed in all the months of 2018 for the first interview, which is carried out by a personal interview. To further reduce the overall sample size, we restrict to the Jewish population aged 20-40, yielding a sample of n= 19,820 persons.

We again use the binary variable Yi “having an academic degree”, as our target variable, thus allowing us to compare predictors of the population proportion of people with academic degree to the true proportion. Table 7 shows a few design-based estimates computed from the data, after modifying the base sampling weights to account for nonresponse. The estimates shown are:

Table 7 Proportion of persons with academic degree

\(\hat {\bar {Y}}_{NoM\text {odes}}\)- Standard HT estimator when ignoring the mode effects, but with the sampling weights adjusted for non-response,

\(\hat {\bar {Y}}_{Direct}\)- Design-based estimator using only the direct responses (with adjusted weights),

\(\hat {\bar {Y}}_{\text {Proxy}}\)- Design-based estimator using only the proxy responses (with adjusted weights).

As expected, the design-based estimator \(\hat {\bar {Y}}_{No\text {Modes}}\) that ignores the mode effects performs poorly. The estimator based on only the direct responses performs even worse but quite surprising, the estimator \(\hat {\bar {Y}}_{\text {Proxy}}\), which uses only the proxy responses performs relatively well. It seems therefore that when asked about the possession of an academic degree, the proxy responses are generally more accurate than the responses of interviewees responding about themselves. The importance of this outcome is in illustrating that it is not necessarily true that interviewees responding about themselves provide correct answers, or in a more general mode effects set up, that one can decide on a mode with correct answers. Recall from the Introduction that several methods proposed in the literature to deal with mode effects assume the existence (and knowledge) of a mode which provides unbiased estimators for the true population mean.

We now illustrate the use of our mode effects methodology to handle proxy survey problems. We consider 5 different “modes” as follows: proxy response- male (MP), proxy response- female (FP), direct response- male (MD), direct response- female (FD), nonresponse (NR). By MP we mean that the unit for which a proxy response is provided is a male and similarly for the other modes. Out of the total sample size n = 19,820, 16.8% responses have been obtianed by FD, 15% by MD, 30.1% by FP, 31.8 by MP and 6.3% did not respond. Denoting by Sm the sample of units responding by mode m, we estimated the population proportion for each of the modes using the ratio HT estimator with weights adjusted for nonresponse, \(\hat {\bar {Y}}_{HT,Adj}^{(m)}=\hat {N}_{Adj,m}^{-1}{\sum }_{i\in S_{m}}\tilde {\pi }_{i}^{-1}y_{i}; \hat {N}_{Adj,m}={\sum }_{i\in S_{m}}\tilde {\pi }_{i}^{-1}\) and found: \(\hat {\bar {Y}}_{HT,Adj}^{(MP)}=0.19\), \(\hat {\bar {Y}}_{HT,Adj}^{(FP)}=0.34,\hat {\bar {Y}}_{HT,Adj}^{(MD)}=0.37,\hat {\bar {Y}}_{HT,Adj}^{(FD)}=0.48\), suggesting the existence of mode effects. (Similar differences exist when using instead the true values of Y as known from the register.)

Next we consider our model-based predictors. We use as covariates age and years of study. (Years of study is known from the register, the gender of the interviewee is accounted for in the definition of the modes.) To save in space, we do not present the coefficients of the models (8.1)–(8.3) obtained in this case.

Table 8 shows again the true value and the different predictors obtained in this case. The notation is the same as before.

Table 8 Proportion of persons with academic degree. Model- and design-based estimators

The results in Table 8 show very good performance of the model-based predictors when fitting Model B, which accounts for possible measurement effects, with the estimator \(\hat {\bar {Y}}_{(Model)}\) that uses the population covariates yielding an almost perfect predictor. On the other hand, the predictors obtained under Model A, which assumes no measurement effects are clearly biased, indicating the existence of measurement effects in this application. This result is reinforced by the HL test statistics, rejecting the null hypothesis of Model A, HL(A) = 26.359(pvalue = 0.001) but not rejecting Model B, HL(B) = 7.24(pvalue = 0.51). Also, the N-LR test rejects Model A in favor of Model B, LRnor = 10.31(pvalue = 0.00).

The estimator \(\hat {\bar {Y}}_{HT,True}\), which uses the true Y -values from the education register performs very well, validating the sampling design and corresponding estimator, but the estimator \(\hat {\bar {Y}}_{HT,imp}\) which imputes the missing data for the nonrespondents based on the observed data and thus ignores the measurement effects, and the estimator \(\hat {\bar {Y}}_{HT,Adj}\), which attempts to correct for the nonresponse by modification of the sampling weights (but does not account for the measurement effects) perform poorly, over-estimating the true proportion by about 23%, the same as the model-based estimators under Model A. We conclude that in this application, there are large measurement effects, captured well under Model B, but not under Model A and the modified design-based estimators considered.

10 Summary

In this article we propose a new comprehensive model-based approach to deal with mode-effects, which is applied also to deal with proxy surveys; two major problems in survey sampling. Our approach addresses both selection- and measurement effects, underlying the possible mode-effects. Furthermore, we allow for not missing at random (NMAR) nonresponse, by considering the nonresponse as another mode. Unlike other approaches proposed in the literature, we do not assume that one of the modes provides unbiased predictors. The existence of such a mode is not guaranteed, and even if it exists, it is not clear how to determine which one it is. The approach is model-based but we cannot think of a proper design-based approach that can deal simultaneously with selection- and measurement effects and NMAR nonresponse, without very strong and generally untestable assumptions. In this article we restricted to binary outcome variables (fitting logistic models in the empirical illustrations), but the proposed approach can be extended to continuous outcomes, with proper modifications.

We propose simple test procedures for testing our model, and in particular, for testing the existence of measurement effects, which are seen to work well in the empirical studies, although more powerful tests can, and should be developed. When applied to proxy surveys, an interesting open question is how to define the different modes. In our empirical study we defined them in an “ad-hoc” manner, but a more founded methodology should be established. One possible way is to start with as many as possible modes, estimate the means or other characteristics of interest for each mode, and then collapse modes based on proper statistical analysis, so as to stabilize the final results.

The empirical results with the simulated and real data sets are promising and we encourage other researchers to test the approach with their data. We mention again that the approach is applicable in principle also to nonprobability samples, which become more and more popular in recent years with the availability of new “big data” sets.