Abstract
We propose a new, modelbased methodology to address two major problems in survey sampling: The first problem is known as mode effects, under which responses of sampled units possibly depend on the mode of response, whether by internet, telephone, personal interview, etc. The second problem is of proxy surveys, whereby sampled units respond not only about themselves but also for other sampled. For example, in many familiar household surveys, one member of the household provides information for all other members, possibly with measurement errors. Ignoring the existence of mode effects and/or possible measurement errors in proxy surveys could result in possible bias in point estimators and subsequent inference. Our approach accounts also for nonignorable nonresponse. We illustrate the proposed methodology by use of simulation experiments and real sample data, with known true population values.
Preface
I felt very happy and privileged when invited to submit a paper for the special issue of Sankhya A, celebrating the 100th birth anniversary of Prof. C. R. Rao. Professor Rao contributed, indirectly, a great deal to my research. In 1993 I published an article in the International Statistical Review entitled, “The Role of Sampling Weights when Modeling Survey Data”, Pfeffermann (1993). While working on this article, I came across a short discussion written by the late Professor Steve Fienberg in 1989, stating that “the one exception in which the use of weights may be appropriate is outcomebased sampling, where the sampling plan may be informative for the model of interest.” Professor Fienberg referred to an earlier article by Patil and Rao (1978), which shows how the sampling weights (inverse of the sample inclusion probabilities) feature in the distribution of the sample data in such cases, and how the sample distribution differs from the corresponding population distribution. One of the examples in that article was probability proportional to size (PPS) sampling. This whole area was new to me at the time, but I got really interested in it, and since then I published many articles with colleagues on the relationship between the population distribution, the sample distribution and the nonsample distribution, and how the latter two distributions can be used for inference about the population distribution and for imputation of missing data. See Pfeffermann (2017) for a unified theory with applications to informative sampling and nonignorable nonresponse, small area estimation, observational studies, web panels and more. The present article is another extension of this general theory.
Professor Rao contributed more directly to my work in 2016, when inviting me to coedit with him the 29^{th} Handbook of Statistics on Survey Samples. This turned out to be a fascinating experience and ended up with a twovolume handbook, containing 41 chapters spread over 1300 pages.
I am very grateful to Professor Rao for his indirect and direct contributions to my career, and I wish him many more years of happy and productive life, with good health.
Introduction
In modern sample surveys, sampled units often have the choice of how to respond, whether by telephone, personal interview, mail, fax, or via the Internet. Such surveys are nowadays very popular in many countries, called mixedmode surveys. See, e.g., De Leeuw (2018) for a recent comprehensive review. Sometimes, the different modes of response are offered sequentially. For example, when starting the survey, all the sampled units are encouraged to respond via the internet. Those who do not respond within a certain time period are approached by telephone and finally, those who couldn’t be contacted or refused to respond via the telephone, are approached for a personal interview.
The term modeeffect encompasses two confounded effects: selection effect  the effect of differences between characteristics of respondents preferring to respond with different modes and consequently, possible differences in the values of reported study variables of interest, and measurement effect  the effect of potentially responding differently by the same person, depending on the mode of response. The motivation behind the use of mixed mode surveys is to possibly increase the response rates and reduce measurement effects, by letting each person to reply by his preferred mode. Clearly, some modes are cheaper and simpler than other, notably, the use of the internet. The literature contains many examples illustrating that different modes of data collection can affect the responses. See also Table 4 in Section 8.1 of the present paper.
If all sampled units respond correctly by their preferred mode, no bias occurs and the use of a mixedmode survey benefits from the advantages listed above. However, in practice, no responses are obtained from some of the sampled units, with the rates of nonresponse steadily increasing in recent years all over the world. In this case, the use of mixedmode surveys may introduce large bias in sample estimators, if not accounted for properly. The situation can be even worse in the case of measurement effects. It is often recommended to reduce the measurement effects by a careful questionnaire design across the modes, see, e.g., Dillman and Christian (2003) and De Leeuw et al. (2018), but in the present article we assume a given sample with given responses. Notably, the two effects are confounded, and several studies in the literature attempt to disentangle them, see, e.g., de Leeuw (2005), Hox et al. (2017) and Vannieuwenhuyze et al. (2010, 2014).
In order to reduce the total mode effect, it is common to first determine whether the survey estimates produced from the different modes are indeed different and if they are, to infer which mode is the best in the sense of producing the smallest bias for the variable of interest. The selected mode is then used as a benchmark for correcting the other modes. Vannieuwenhuyze et al. (2014) assume the existence of a mode under which no bias occurs and develop bias corrected estimators by applying the observational study theory of Rosenbaum and Rubin (1983, 1984). In another approach, mode effects are conceptualized as a missingdata problem. Here again, one of the models is assumed to yield correct measurements and is used to impute values for the other modes. For example, Park et al. (2016) consider the case of two modes, use one of them as a benchmark and assume a linear relationship between the observations obtained under the two modes.
The mode comparisons are often based on heuristic arguments. For example, for questions on sensitive topics such as drug use and alcohol consumption, it is sometimes assumed that the mode which provides the highest prevalence of the illicit behavior produces the smallest bias, since the tendency of respondents would be to underreport such behavior, Tourangeau and Yan (2007). An obvious shortcoming of this approach is the underlying assumption that there consists a tendency to underreport sensitive questions. Turner et al. (1998).
An alternative approach to assess mode effects is to compare the estimates obtained from the different modes with known external data, which is assumed to be more accurate. For example, in an income study in Denmark, Kormendi (1988) estimated the bias obtained from the use of telephone and face to face modes by using income data of tax authorities. Biemer (1988, 2001) discusses several limitations of this approach, such as unavailability of appropriate external data for all variables of interest, or differences in definition between the survey measurements and the measurements in the external records.
Proxy surveys by which one member of the household (HH) responds for all the other members of the HH are in common use in HH surveys all over the world. The main motivation in this case is to increase the overall sample size, since information is obtained in principle for all the HH members (Moore, 1988). It also helps in theory to increase the response since if the designated sampled person of the HH cannot be reached or he refuses to respond, another member of the HH is contacted instead. On the other hand, information provided by one member of the HH about another member may be subject to large measurement error (supplying wrong information), and many missing items, (“I don’t know”). There seems to be a common perception that proxyresponse is less accurate than selfresponse, Groves et al. (2004). Kalsbeek and Agans (2007) mention a possible cognitive basis for the better quality of selfresponse over proxy response. There are, however examples where proxy responses turned out to be more accurate, see e.g., O’Muircheartaigh (1991) and also Table 8 in Section 9 of the present paper.
Finally, we note that there exists an ethical problem with the use of proxy surveys, especially in nonmandatory surveys with no obligation to respond. Have the other members of the HH authorized the responding person to provide all the (possibly sensitive) information about them?
In this article we propose to deal with proxy surveys by considering them as a special case of mode effects with the two main modes defined as “direct response” and “indirect response”, where direct response defines that the person provides information about himself and indirect response defines that the response is obtained by another member of the HH. Within each of the two main modes other modes can be defined, like the mode of response, known characteristics of the responding unit and nonresponse, when no information is obtained from any member of the HH. See Section 9 for an example.
In the following sections we propose and illustrate a new modelbased methodology for dealing with mode effects, which does not require apriori knowledge of a mode providing unbiased estimators. We consider the case of not missing at random (NMAR) nonresponse, by allowing the mode selection probabilities and the probability of nonresponse to depend on the true variable of interest (unobserved under measurement effects) and other explanatory variables. These two parts of our model, the model for the true values and the model for the mode selection account for selection effects, with NMAR nonresponse. Nonresponse is considered as another mode. As stated before, ignoring the NMAR nonresponse already induces bias to the sample estimates even in the absence of measurement effects, i.e., when the responses are correct. In order to account for measurement effects, we further extend our model by modelling the observed responses as a function of the true target variable, the mode selected and known covariates. Note again that with the existence of measurement effects, the true values of the target variable are unknown. To the best of our knowledge, our approach has not been proposed in the literature. Furthermore, when the covariates are known for all the population values from a census or another register, our approach is applicable also for nonprobability samples.
To fit our threepart model we follow the frequencybased approach with the likelihood maximized by application of an EM algorithm. We discuss converges properties of the algorithm and develop the asymptotic properties of the resulting maximum likelihood estimators. Having estimated all the unknown model parameters, we use the estimated model for predicting the population target quantities. We illustrate our approach by use of simulated data and two real samples, for which the true population values of interest are actually known.
In Section 3, we introduce some notation and define our 3 parts model. In Section 4 we describe the estimation of the unknown model parameters and discuss their properties, which are proved in the A at the end of the article. Section 5 considers the estimation of the population parameters of interest, distinguishing between the case where the covariates are known for all the population values of interest and the case where they are known for only the sampled units. Model evaluation is considered in Section 6. In Section 7 we illustrate our approach by simulation experiments, followed by two applications with real data in Sections 8 and 9, with Section 8 focusing on modeeffects and Section 9 on a proxy survey. We conclude with a short summary in Section 10.
Models for Selection and Measurement Effects
Consider a finite population U of size N and denote by (Y_{i},M_{i},X_{i},Z_{i}) the true outcome variable Y, the response mode M, the auxiliary variables (covariates) X explaining the variability of Y and the covariates Z explaining the variability of M, corresponding to unit i belonging to a sample S of size n, selected from U. In this article we consider the case where Y is binary, taking the values 0 and 1. Suppose first that no measurement effects exist such that every respondent reports his true outcome. Denote by \(\stackrel {\leftrightarrow }{M}\) the number of available modes, with the last mode defining the subsample of nonrespondents for which only the covariates are known. For convenience, we assume noninformative sampling as defined in Pfeffermann and Sverchkov (1999), such that \(\Pr (Y_{i}\mathrm {X}_{i},i\in S)=\Pr (Y_{i}X_{i})\), but the nonresponse is allowed to be NMAR in the sense that \(\Pr (R_{i}=1Y_{i}\text {,X}_{i},i\in S)\ne \Pr (R_{i}=1\mathrm {X}_{i},i\in S)\), where R_{i} = 1 if sampled unit i responds and R_{i} = 0 otherwise. We further assume that the mode selection depends not only on the covariates Z, but also on the true outcome Y, in the sense that \(\Pr (M_{i}Y_{i}\text {,Z}_{i})\ne \Pr (M_{i}\mathrm {Z}_{i})\). Defining W_{i} = X_{i} ∪Z_{i},
where \(\Pr (M_{i}{W}_{i})={\sum }_{j=0}^{1}\Pr (M_{i}Y_{i}=j,\mathrm {Z}_{i})\Pr (Y_{i}\mathrm {X}_{i})\). \(\Pr (Y_{i}\mathrm {X}_{i})\) is our target population distribution of Y before sampling. It follows from Eq. 3.1 that unless M_{i} is independent of the outcome in the sense that \(\Pr (M_{i}Y_{i},\mathrm {Z}_{i})=\Pr (M_{i}\mathrm {Z}_{i})\), a mode selection effect is present and with the existence of NMAR nonresponse, the mode effects cannot be ignored in the inference process.
In our empirical study we assume a logistic model for \(\Pr (Y_{i}X_{i})\), and a multivariate logistic model for \(\Pr (M_{i}Y_{i},\mathrm {Z}_{i})\), with 4 possible values for M_{i}, including nonresponse.
So far, we assumed no measurement effects. Denote by y_{i} the value measured for responding unit i, which in the case of measurement effects may differ from the true outcome Y_{i}. We account for possible measurement effects by extending the model (3.1) as,
Denoting, D_{i} = I(y_{i} = Y_{i}) where I(⋅) is the indicator function, and defining 0^{0} = 1, Eq. 3.2 can be written alternatively as,
Application of Eq. 3.3 requires modelling additionally \(\Pr (D_{i}=1Y_{i}=j,{W}_{i},M_{i})\) and in our empirical study we again assume a logistic model. Notice that unlike in Eq. 3.1, we now only observe the pair (y_{i},M_{i}) and for the nonrespondents, only the mode. The target distribution remains \(\Pr (Y_{i}\mathrm {X}_{i})\).
Model Estimation
The Case of No Measurement Effects
As before, consider first the case of no measurement effects. Adding parameter notation, Eq. 3.1 takes the form,
where α_{0} ∈A and β_{0} ∈B are the true parameter vectors. In what follows we assume that the triples (Y_{i},W_{i},M_{i}) are independent, identically distributed (iid) random variables. Denoting \(\delta _{0}=(\alpha _{0}^{\prime },\beta _{0}^{\prime })'\in \mathrm {A}\times \mathrm {B}\equiv {\Delta }\subset \Re ^{k}\), a compact parameter set, the (full) log likelihood can be written as,
where \(f_{i}(\delta )=\Pr (Y_{i}=1,M_{i}W_{i};\delta )\) and \(g_{i}(\delta )=\Pr (Y_{i}=0,M_{i}W_{i};\delta )\).
There exist many optimization algorithms for maximizing the likelihood defined by Eq. 4.2. In our empirical study we applied the NewtonRaphson algorithm, which we describe briefly for later use. Denote, \(S_{n}(\delta )=n^{1}{\sum }_{i=1}^{n}\nabla _{\delta i}\ell _{i}(\delta )\), where ∇_{δi} is the gradient operator with respect to δ and let D_{n}(δ) \(=n^{1}{\sum }_{i=1}^{n}\nabla _{\delta i}^{(2)}\ell _{i}(\delta )\), represent the corresponding k × k Hessian matrix. Starting with some initial value δ^{0}, the NewtonRaphson recursive algorithm is,
The iterations continue until δ^{j+ 1} − δ^{j} < ξ for some small positive value ξ, where ⋅ is the Euclidean norm. Denote, \(H_{1}(\alpha )=\Pr (YX;\alpha )\), \(H_{2}(\beta )=\Pr (MY,Z;\beta ), f_{M}(\delta )=\Pr (Y=1,MW;\delta ), g_{M}(\delta )=\Pr (Y=0,MW;\delta )\) and \(C_{n}(\delta )=n^{1}{\sum }_{i=1}^{n}\nabla _{\delta i}\ell _{i}(\delta )\nabla ^{\prime }_{\delta i}\ell _{i}(\delta )\). Let, C_{0n} = C_{n}(δ_{0}) and D_{0n} = D_{n}(δ_{0}).
In what follows we define regularity conditions for the identifiability and asymptotic properties of the MLE \(\hat {\delta }_{n}\). For this, we assume that as the total sample size n increases, the number of sampled units in each of the modes, except possibly the nonresponse also increases, in the sense that \(n_{m}\ge q_{m}\textit {n} ; {\sum }_{m=1}^{\stackrel {\leftrightarrow }{M}}n_{m}=n\), for some fixed constants {q_{m}}.

A1: The elements of W are bounded almost surely (a.s.) and the functions H_{1}(α) and H_{2}(β) are identifiable and positive uniformly over A and B a.s. (H_{1}(α) is positive uniformly over A if \(\mathrm {P}[\min \limits _{\alpha \in \mathrm {A}}H_{1}(\alpha )>0]=1\)).

A2: The covariates in X (or in Z) contain at least one continuous variable X_{ν}(or Z_{ν}) not contained in Z (X). The derivative of H_{1}(α)[H_{2}(β)] with respect to this covariate exists and is positive uniformly over A(B) a.s.

A3: Assuming the existence of X_{ν} as above, if α≠α^{∗}, then \(\log H_{1}(\alpha )\log H_{1}(\alpha ^{*})=h_{1}(X,\alpha ,\alpha ^{*})\) satisfies ∂h_{1}(⋅)/∂X_{v}≠ 0 a.s. Similarly, Assuming the existence of Z_{ν}, if β≠β^{∗}, then \(\log H_{2}(\beta )\log H_{2}^ {}(\beta ^{*})=h_{2}(Z,\beta ,\beta ^{*})\) satisfies ∂h_{2}(⋅)/∂Z_{v}≠ 0 a.s. The functions \(f_{\stackrel {\leftrightarrow }{M}}(\delta )\) and \(g_{\stackrel {\leftrightarrow }{M}}(\delta )\) are linearly independent.

A4: δ_{0} is an interior vector of Δ, and D_{0n} is positive definite for sufficiently large n.
The first three conditions are needed to prove the identifiability and strong consistency of \(\hat {\delta }_{n}\), the maximum likelihood estimator (MLE) of δ. A similar condition to A2 is used in Follmann and Lambert (1991) for the case of a mixture of logistic models with constant mixing probabilities, and in Pfeffermann and Landsman (2011) for modelling nonignorable assignments in observational studies. The last condition is needed for showing that \(\hat {\delta }_{n}\) is \(\sqrt {n}\) consistent and asymptotically normal (CAN). Note that for the logit and probit functions, the conditions A1, A3 and the second part of A4 hold, if the elements of W are linearly independent.
In what follows, →_{a.s.}defines convergence almost surely and →_{D} defines convergence in distribution.
Theorem 1.
Under the conditions (A1)(A3), (i) The likelihood (4.2) is identifiable and \(\hat {\delta }_{n}\to _{a.s.}\delta _{0}\). If, in addition, (A4) also holds, then (ii) \(C_{n}^{1/2}D_{n}\sqrt {n}(\hat {\delta }_{n}\delta _{0})\to _{D}N(0,I_{k})\) where I_{k} is the identity matrix of order k.
Model with Measurement Effects
Next consider the model defined by Eq. 3.2, which accounts also for measurement effects, in which case the observed measurement, y, may differ from the true outcome, Y. Denoting as before D_{i} = I(y_{i} = Y_{i}), we may write,
where δ_{0} ∈A ×B = Δ is the true parameter vector.
Notice that the models for \(\Pr (Y_{i}\mathrm {X}_{i};\alpha _{0})\) and \(\Pr (M_{i}Y_{i},\mathrm {Z}_{i};\beta _{0})\) are unchanged but we need to model,
where γ_{0} ∈Γ is the true parameter vector.
Denote \(\theta =(\delta ^{\prime },\gamma ^{\prime })^{\prime }\) and 𝜃_{0} ∈Θ = Δ ×Γ⊂R^{s}, a compact set where \(s=k+\dim (\gamma _{0})\). By Eqs. 4.4 and 4.5, the loglikelihood is now given by,
For \(M_{i}\ne \stackrel {\leftrightarrow }{M}\), denote \(h_{3}(\gamma )=({h_{3}^{0}}(\gamma ),{h_{3}^{1}}(\gamma ) )' \); \({h_{3}^{j}}(\gamma ) =\Pr (D=1Y=\textit {j},W,M;\gamma )\) j = 0,1 and define \(\tilde {C}_{n}(\theta )\) and \(\tilde {D}_{n}(\theta )\) in a similar manner as for the model with no measurement effects but with \(\tilde {\ell }_{i}(\theta )\) replacing ℓ_{i}(δ). Let \(\tilde {C}_{0n}=\tilde {C}_{n}(\theta _{0})\), \(\tilde {D}_{0n}=\tilde {D}_{n}(\theta _{0})\) and \(\hat {\theta }_{n}\) the MLE maximizing the likelihood 4.6.
Suppose the following regularity conditions:

B1: Conditions (A1)(A3) in Section 4.1 hold, the functions f_{M}(δ) and g_{M}(δ) are linearly independent also for \(M_{i}\ne \stackrel {\leftrightarrow }{M}\) and h_{3}(γ) is identifiable over Γ.

B2: Condition (A4) holds; γ_{0} is interior to Γ and \(\tilde {D}_{0n}\) is positive definite for sufficiently large n.
Theorem 2.
Under B1, (i) The likelihood (4.6) is identifiable and \(\hat {\theta }_{n}\) →_{a.s.}𝜃_{0}. If, in addition, B2 also holds, then (ii), \(\tilde {C}_{0n}^{1/2}\tilde {D}_{0n}\sqrt {n}(\hat {\theta }_{n}\theta _{0}) {\mathop {\to }\limits ^{D}}\) N(0,I_{s}).
The EM Algorithm
To maximize the likelihood (4.6), we apply the iterative EM algorithm (Dempster et al. 1977) which, as shown below, is particularly convenient in the present context. The algorithm has been developed for computing the MLE in cases of incomplete data, which is what happens in our case in the presence of measurement effects, where the true outcomes are unknown.
The key idea underlying the EM algorithm is to add latent variables to the observed data and define a modified likelihood as a function of the observed data and the values of the latent variables. Denote by 𝜃^{l} the parameter estimates at the lth iteration. The algorithm cycles between two states. In the first state, it calculates the expected value of the latent variables given the observed data and 𝜃^{l}, denoted by S_{i}(𝜃^{l}). Next, the latent variables in the modified likelihood are replaced by their expected values, thus resulting in a “likelihood” denoted by M(𝜃,𝜃^{l}), which depends on S_{i}(𝜃^{l}) and is much easier to optimize than the original log likelihood (4.6). This step is called the estimation Estep. In the second state, the likelihood M(𝜃,𝜃^{l}) is maximized with respect to the unknown parameters, yielding the estimate, 𝜃^{l+ 1}. This step is called the maximization Mstep.
In order to implement the algorithm in our case, we define D_{i} = I(y_{i} = Y_{i}) to be the latent variable values. By Eqs. 4.5 and 4.6, the modified i^{th} loglikelihood is,
Taking the expectation of D_{i} given (y_{i},M_{i}) with 𝜃 = 𝜃^{l} yields,
This defines the Estep. For defining the Mstep, denote the i^{th} log likelihood in Eq. 4.2 for the case of no measurement effects as \(\ell _{i}(A_{i}^{(1)},A_{i}^{(2)},M_{i},W_{i},\delta ^{l})=\ell _{i}(\delta ^{l})\), where \(A_{i}^{(1)}=Y_{i}\) and \(A_{i}^{(2)}=1Y_{i}\). By Eqs. 4.7 and 4.8 and some simple calculations,
where, denoting S_{i} = S_{i}(𝜃^{l}),
The “likelihood” Eq. 4.9 is seen to be the sum of two terms, one of only δ and the other of only γ. Thus, the maximization over 𝜃 is obtained by maximizing each term separately, simplifying the maximization process substantially.
In our empirical study we set the initial values by drawing many values from a broad uniform prior distribution around \(\theta ^{0}=(\hat {\delta }_{n}^{A},0)\), where \(\hat {\delta }_{n}^{A}\) is the MLE of the model without measurement effects, and use the value that maximizes the loglikelihood as the starting initial value.
Prediction of Finite Population Means
Replacing the unknown model parameters by their MLE permits estimating the true population mean (proportion), \(\bar {Y}_{(P)}={\sum }_{j=1}^{N}Y_{j}\). Denote,
where \(\hat {\delta }^{\prime }=(\hat {\alpha }^{\prime },\hat {\beta }^{\prime })\) denotes the MLE. For the case where the covariates{X_{i}} are known for all the population units, we estimate,
Remark 1.
As long as the model assumed for the population values holds also for the sample, the use of Eq. 5.2 does not require probability sampling. Furthermore, if the sample is deemed to be informative in the sense that the inclusion in the sample depends on the true outcome variable, one may extract the population model from the model holding for the sample data, by use of the relationship between the population and sample models, as developed in Pfeffermann and Sverchkov (1999). This requires to model also the probability \(\Pr (i\in SY_{i},X_{i};\eta )\).
When the covariates are known for only the sampled units and the population size is unknown, we may use a modification of the HorvitzThompson (HT) estimator, i.e.,
Assuming that each population unit can be classified by his or her preferred mode, we can also estimate the population mean for each mode, which as discussed in the introduction, is often of important interest on its own. For the case where W_{i} is known for every j ∈ U, we estimate,
Otherwise,
It is easy to show that under the conditions of Theorem 2, the predictors defined by Eqs. 5.1–5.4 are \(\sqrt {n}\)consistent for the corresponding true population means, under an appropriate asymptotic framework for finite population sampling. See Isaki and Fuller (1982) for such a framework.
Remark 2.
The predictors 5.1–5.5 are computed the same way under both the models with, and without measurement effects.
Model Evaluation
The HosmerLemeshow Test
The predictors developed in the previous sections are modeldependent and as such, the model used for their construction needs to be tested. Many goodnessoffit test procedures under the frequentist approach for continuous outcomes have been proposed in the literature. See, e.g., Pfeffermann and Landsman (2011) and Pfeffermann and Sikov (2011) for review and references. In our empirical study we consider the case of binary outcomes generated from logistic models and we apply the Hosmer Lemeshow (HL, 1980, 2000) goodnessoffit test, which is in common use for testing models for binary data.
The test statistic compares within prespecified groups the number of observed successes (y_{i} = 1), with the number of expected successes, as predicted under the estimated model. For this, the data are first ordered according to the predicted probability of success under the model evaluated. Next, the units are grouped based on the ordering into a certain number of groups of approximately equal size, (G = 10 groups is common), and within each group the estimated expected number of successes, (the sum of the predicted probabilities of success), is compared to the observed number of successes. The test statistic is,
where O_{g} is the number of observed successes in group g, n_{g} is the number of units in the group and \(\bar {\hat {p}}_{g}=n_{g}^{1}{\sum }_{i=1}^{n_{g}}\hat {p}_{gi}\) is the mean of the estimated probabilities of success. Hosmer and Lemeshow (1980) found through empirical studies under a much simpler setup than in our case that the test statistic (6.1) follows approximately the χ^{2} distribution with (G − 2) degrees of freedom, under the null hypothesis that the model fits the data.
Normalized Likelihood Ratio (NLR) Test for Model Selection
A standard test of the null hypothesis that two models, one nested within the other, fit the data “equally well”, is the likelihood ratio test. We apply the test (with certain correction, see below), for testing the null hypothesis that accounting for measurement effects in the extended model (3.2) does not improve the goodnessoffit compared to the model (3.1) with only selection effects, or more formally, for testing that there are no measurement effects.
Distinguishing the models without and with measurement effects by the superscripts A and B respectively, the standard test statistic is,
where \({l_{n}^{A}}(\hat {\delta })={\sum }_{i=1}^{n}{\ell _{i}^{A}}(\hat {\delta })\) and \({l_{n}^{B}}(\hat {\theta })={\sum }_{i=1}^{n}{\ell _{i}^{B}}(\hat {\delta })\) are the corresponding loglikelihoods computed with the MLEs, as obtained under the two models (Eqs. 4.2 and 4.6). However, unlike in standard problems where the nested model is obtained from the extended model by nullifying certain parameters, this is not the case in our application where we restrict to logistic models, since \([1+\exp (t)]^{1}\ne 0\) for all t in a compact set and hence, the model A without measurement effects is not nested in the model B with them. To deal with this problem, Vuong (1989) proposed a normalized likelihood ratio test (NLR) defined as,
where \(\hat {\omega }_{n}^{2}=4n\left [n^{1}{\sum }_{i=1}^{n}[{l_{i}^{A}}(\hat {\delta }){l_{i}^{B}}(\hat {\theta })]^{2}(n^{1}LR_{n})^{2}\right ]\) is an estimator of the variance of LR_{n}. The author shows that under the assumption that the true parameter vector is an interior point and some other regularity conditions, the asymptotic distribution of LR_{nor} is normal. However, when model A is correct, one would expect that some of the parameters γ which define the probability of the measurement effects lie on the boundary of the assumed parameter set. In this case, Vuong’s condition that the true parameter vector is an interior point is violated. A similar problem is demonstrated in Wilson (2015). In the empirical study we approximated the distribution of the statistic LR_{nor} by parametric bootstrap.
Empirical Results Based on Simulations
Design of Simulation Study
In order to assess the performance of our proposed approach, we designed a simulation study which consists of the following sages:

1.
Generate a population of size N = 10^{5} with values of three covariates, X_{1i},X_{2i},X_{3i}, generated independently from a Beta(2,5) distribution.

2.
Generate a binary outcome Y_{i} from the logistic distribution;
$$ \Pr(Y_{i}=1X_{1i};\alpha)=\text{logit}^{1}(\alpha_{0}+\alpha_{1}X_{1i}), \textit{i}=1,...,\textit{N}. $$(7.1) 
3.
Classify the population units into four modes with probabilities,
$$ \begin{array}{l} {\Pr(M_{i}=mY_{i},X_{2i};\beta_{m})=\frac{\exp(\beta_{0m}+\beta_{1m}X_{2i}+\beta_{2m}Y_{i})}{1+{\sum}_{m=1}^{3}\exp(\beta_{0m}+\beta_{1m}X_{2i}+\beta_{2m}Y_{i})}, m=1,2,3}\\ {\Pr(M_{i}=4Y_{i},X_{2i};\beta_{m})=1{\sum}_{m=1}^{3}\Pr(M_{i}=mY_{i},X_{i};\beta_{m}),} \end{array} $$(7.2)
where \(M_{i}=4=\stackrel {\leftrightarrow }{M}\) defines the “mode” of nonresponse. Notice that the model assumes NMAR nonresponse as the probability not to respond depends directly on the outcome. The parameter values in Eqs. 7.1 and 7.2 were set such that the relative population sizes in the 4 modes are approximately (10%, 25%, 40%, 25%).
The population in Stages 13 has been generated only once, such that the assessment of the performance of the proposed methodology is “designbased”, over all possible sample selections from a fixed population. Alternatively, one could generate many populations and draw samples from each population, but with a population of size N = 10^{5}, this would not make much difference.

4.
Draw K = 1000 samples of size n = 5000 by simple random sampling without replacement from the population obtained in 13.

5.
Generate measured values for the sampled units from the model,
$$ \begin{array}{@{}rcl@{}} &&\Pr(D_{i}=1Y_{i},M_{i}=m,X_{3i};\gamma_{m})\\&&=\text{logit}^{1}(\gamma_{0m}+\gamma_{1m}X_{3i}+\gamma_{2m}Y_{i}) , m=1,2,3. \end{array} $$(7.3) 
6.
Estimate the model coefficients for each sample by maximizing the likelihood function for the respective model (with or without measurement effects). We used the Newton Raphson algorithm (4.3) for estimating the coefficients of the model without measurement effects (hereafter model A), and the EM algorithm described in section (4.3) for the model with measurement effects (hereafter model B). We used parametric bootstrap for estimating the standard errors (S.E.) of the model coefficients and of the estimators of the true population mean, with 100 bootstrap samples for each parent sample. Estimating the S.E. of the model coefficients by use of the inverse information matrix turned out to be unstable, with occasional very extreme and even negative variance estimators. The computer code for running the simulation study (and for the applications with real data in Sections 8 and 9) has been written in MATLAB version 9.5.
Simulation Results
Table 1 contains the mean of the estimated coefficients (Mean Est.), along with their empirical S.E., for the models without (model A) and with (model B) measurement effects. The empirical S.E. are the standard deviations of the estimates over the 1,000 samples divided by \(\sqrt {1,000}\). We also computed the means of the estimated bootstrap S.E. but they are not shown since they are very close to the empirical S.E. for all the coefficients, under both the models A and B. We note in this respect that the measurement effects under Model B are not negligible. We computed for the first of the 1,000 samples the probabilities of measurement effects; \(a_{i}=\Pr (D_{i}=0Y_{i},\textit {X}_{2i}X_{3i})\) and found that Min(a_{i})= 0.01, Q1(a_{i})= 0.12, Mean (a_{i})= 0.36, Q3(a_{i})= 0.59, Max(a_{i})= 0.99. (Q1 and Q3 are the 1^{st} and 3^{rd} quarters.)
As seen in Table 1, the mean estimates under both models are very close to the corresponding true coefficients with very small standard errors, although it should be noted that in most cases the differences are significant when tested by use of the standard tstatistic. As expected, the empirical S.E. under Model B are somewhat higher than the corresponding S.E. under Model A, as results from the existence of measurement effects under Model B. As mentioned above, the means of the bootstrap S.E. estimators (not shown) are very close to the corresponding empirical S.E.
To save in space, in what follows we restrict to only the results obtained under Model B with the measurement effects. Similar (and somewhat better) results have been obtained under Model A for which the observed values are the same as the true values, with no errors.
Table 2 shows the results obtained when predicting the population means and sizes for the various modes, using the predictors defined by Eqs. 5.2–5.5. The predictor \(\bar {y}_{S}\) is the HT estimator when ignoring the mode effects, using all the measurements including for the nonrespondents. (Reduces to the simple sample mean based on all the sample units under simple random sampling).
Table 2 shows very good performance of the predictors, despite the existence of nonignorable measurement effects. The HT estimators have slightly larger S.E. than the modelbased predictors, which of course is expected since the latter predictors use the information about the population covariates, not used by the HT estimator. Notice how well the size and true mean of the nonrespondents (m = 4) are predicted, even with only the HT estimator. Finally, the use of the estimator \(\bar {y}_{S}\) (simple sample mean in our case), which ignores the mode effects is seen to be biased, illustrating that mode effects cannot be ignored when estimating population means. (When ignoring also the nonresponse and computing the mean of only the observed measurements, the corresponding estimator is \(\bar {y}_{S}=0.52\).)
Model Evaluation
Distribution of HosmerLemeshow Statistic Under Model B
To illustrate the distribution of the HosmerLemeshow (HL) we drew 1,000 new samples of size 5,000 without replacement from the same population and subjected the true outcomes of the responding units to measurement effects via the model (7.3). Figure 1 shows the smoothed empirical density of the test statistic for G = 10 nearly equalsize groups over the 1,000 samples. Recall that the HL statistic is supposed to have a \(\chi ^{2}{~}_{(8)}\) distribution under correct model specification (Model B in our case). As can be seen, the two densities are indeed very close, supporting the conjecture that the true distribution is indeed \(\chi ^{2}{~}_{(8)}\).
In order to study the power of HL statistic, we assume under the null hypothesis that the model for the mode selection probabilities is as defined by Eq. 7.2, where in fact we generated the modes using the model,
(Compare with 7.2).
Figure 2 shows the empirical rejection rates when using the HL test for values η in the range η ∈ [0,0.2], based on 1000 samples of size n = 5,000 for each value η, generated by use of Eq. 7.4. For η = 0, (correct model specification), the rejection rate is close to the nominal size of 0.05, as it should be. As η increases (the assumed model is further away from the correct model), the rejection rates increase monotonically, reaching powers ≥ 0.8, already for η ≥ 0.15.
Distribution of the NLR Statistic Under the Correct Model
The purpose of this section is to illustrate that we can approximate the distribution of LR_{nor}, the NLR test statistic as defined in Eq. 6.3 by parametric bootstrap. For this, we generated 1,000 samples of size 5,000 under Model A (Eqs. 7.1–7.2) with the same true parameters as before, and other 1,000 samples of size 5,000 under model B. The first set of samples allow us to assess the approximation of the distribution under the null hypothesis of no measurement effects, while the second set allows to assess the distribution of the test when there exist measurement effects. For this, we followed the following steps:

1)
Estimate the models A and B for each of the 2,000 samples and compute the LR_{nor} test statistic. At the end of this step we have 1,000 observations of the test statistic with data obeying the null hypothesis of no measurement effects, and 1,000 observations of the test statistic with data containing measurement effects as under the alternative hypothesis, allowing us to estimate the true distributions under the two hypotheses by the corresponding empirical distributions.

2)
Generate for each of the first 10 samples generated under Model A, 1000 parametric bootstrap samples of size 5,000 under the model A, using the corresponding estimated coefficients from the parent sample. Reestimate the two models for each bootstrap sample and compute the LR_{nor} statistic. This step generates 10,000 realizations form the bootstrap distribution of the test statistic when the null hypothesis of no measurement effects holds.

3)
Repeat Step 2 but this time by generating the bootstrap samples from Model B, as estimated for each of the first 10 parent samples generated under this model. This step generates 10,000 realizations form the bootstrap distribution of the test statistic under the alternative hypothesis that measurement effects exist.
Figures 3 and 4 compare the empirical distributions under the two hypotheses with the corresponding bootstrap approximations, showing sufficiently close fit in both cases. Notice that when the actual observed data conform with Model A, it is the first bootstrap distribution which will practically be used for testing the null hypothesis of no measurement effects. When the observed data conform with Model B, it is the second bootstrap distribution which will practically be used.
Empirical Results for a Real MixedMode Survey
Example of Measurement Effects
As explained in the introduction, the term measurement effect refers to the case where a sampled unit responds differently, depending on the mode of response. In the Agriculture Census of Israel carried out in 2018, 210 farmers (out of about 17,000) happened to respond both by Telephone and via the internet, after receiving mistakenly a reminder to respond via the Internet, even though they already responded by telephone. Table 3 summarizes the results obtained for two of the questions asked in the census: number (#) of workers in the farm, and total cultivated area. Out of the 210 farmers, 131 farmers responded the same way on Question 1 and 139 farmers responded the same way on Question 2. The notation T>I (T<I) defines the farmers with the higher (lower) responses on telephone than on the internet.
The figures in the table indicate big differences in the answers of the farmers that provided different answers by the two modes (about one third of the farmers responding by the two modes). In this example the measurement effects cancel out when computing the means, but there is no guarantee that this always happens, and proper models need to be used to account successfully for the measurement effects. See next section.
Accounting for Mode Effects in a Real MixedMode Survey
In this section we illustrate the performance of our proposed approach by using data collected as part of the annual crime victimization survey in 2017, administered by the Israel Central Bureau of Statistics (ICBS). The survey collects information on victimization with respect to a variety of crimes, as well as sociodemographic information. Similar surveys are carried out by national statistical offices throughout the world. The sample is drawn by probability sampling, and the sampled units can respond using either the internet, or by telephone, implying that we have 3 modes, with the third mode defined by nonrespondents. The total sample size is n = 7035, with 11% responding via the internet (I), 60% by telephone (T) and 29% not responding (NR).
Although not a primary variable of interest in this survey, we chose as the target outcome variable the binary variable Y_{i}, taking the value “1” if unit i has an academic degree (Bachelor or higher). The aim is to predict the true population proportion of persons with academic degree. The reason for this choice is that the ICBS has an extensive register of education, with population coverage of over than 95%, so that we can assess the performance of our method by comparing the predictors to the “truth”. The register is complete for only 2016, but we don’t expect significant differences between the two years. We confined our analysis to persons aged 20+ and based on the register, the true proportion is \(\bar {Y}_{(P)}=0.24\). On the other hand, 41.4% of the internet respondents and 23.5% of the telephone respondents in the sample have an academic degree, indicating the existence of a mode selection effect, and possibly also measurement effects.
In what follows we present and discuss the results of our study. We included among the covariates (X) the variables gender (gen; male= 1), age (in years), and country of birth (cob; Israel= 1, other countries= 0). In accordance with the general methodology outlined in Sections 3–5, and the models and notation in Section 7, we fitted the following models:
Table 4 shows the estimated coefficients and the parametric bootstrap S.E. of the models (8.1) and (8.2), as obtained when fitting the models with and without accounting for possible measurement effects (M.E.). Table 5 shows the estimated coefficients and S.E. of the model (8.3), which accounts for measurement effects. We also computed the means of the estimated coefficients over all the BS samples (not shown), and found that they are very close to the estimates obtained from the original sample, thus verifying the asymptotic unbiasedness of our ML estimators.
As can be seen, most of the coefficients are highly significant in both tables under the standard ttest. What we find a bit surprising is that the coefficient of Y (having academic degree) in the models for M_{i} = TX_{i},Y_{i};β and M_{i} = IX_{i},Y_{i};β is highly negative in all 4 models in Table 4, suggesting that with the other covariates held fixed, having an academic degree actually encourages nonresponse (the third mode). Also, for the model with measurement effects (Table 5), the coefficient of Y is positive and highly significant (but of much smaller magnitude), suggesting that having an academic degree increases the probability of misreporting. Including more covariates in the models could possibly resolve these, somehow unexpected outcomes.
Next, we study the performance of the model in predicting the true population proportion, which as noted in Section 8.1, is basically known for this application. We computed the following predictors:

1.
\(\hat {\bar {Y}}_{(Model)}=N^{1}{\sum }_{i=1}^{N}\hat {\rho }_{i}\); \(\hat {\rho }_{i}=\Pr (Y_{i}=1\mathrm {X}_{i};\hat {\alpha })\) (uses the covariate information for all the population units) (5.2)

2.
\(\hat {\bar {Y}}_{(\textit {HT,Model})}={\sum }_{i=1}^{n}\pi _{i}^{1}\hat {\rho }_{i}/{\sum }_{i=1}^{n}\pi _{i}^{1}\); \(\pi _{i}=\Pr (i\in S)\) (5.3)

3.
\(\hat {\bar {Y}}_{HT,True}=\hat {N}^{1}{\sum }_{i=1}^{n}\pi _{i}^{1}Y_{i}^{True}; \hat {N}={\sum }_{i=1}^{n}\pi _{i}^{1}\) (uses the true values Y_{i} known from the education register).

4.
\(\hat {\bar {Y}}_{HT,Adj}=\hat {N}_{Adj}^{1}{\sum }_{i=1}^{n}\tilde {\pi }_{i}^{1}y_{i}; \hat {N}_{Adj}={\sum }_{i=1}^{n}\tilde {\pi }_{i}^{1}\), the HT estimator but with the standard base weights \(\{a_{i}=\pi _{i}^{1}\}\) replaced by adjusted weights to account for the nonresponse.

5.
\(\hat {\bar {Y}}_{HT,imp}=\hat {N}^{1}{\sum }_{i=1}^{n}\pi _{i}^{1}\tilde {Y}_{i}\); \(\tilde {Y}_{i}=y_{i}\) if unit i responds, \(\tilde {Y}_{i}=Y_{i,imp}\) if unit i does not respond. The imputation was carried out using the monotone imputation method of Rubin (1987, p. 172), based on the observed sample values y_{i}.
The estimators obtained under the two models (with out and with the accounting for measurement errors), are shown in Table 6.
Computing the HosmerLemeshow (HL) test discussed in Section 6.1 under the two models, and the normalized likelihood ratio (NLR) test discussed in Section 6.2, yields (pvalues in parentheses): HL(A) = 11.6(pvalue = 0.17),HL(B) = 9.44(pvalue = 0.31),LR_{nor} = 0.21(pvalue = 0.34).
The results of this study show very clearly that our proposed modelbased predictors are much superior to the designbased estimators, which ignore the mode effects (\(\hat {\bar {Y}}_{HT,Adj},\hat {\bar {Y}}_{HT,imp}\)), despite the use of only three covariates for which the population values are known. (The estimator \(\hat {\bar {Y}}_{HT,True}=0.25\), which uses the correct values of the outcome variable indicates that the designbased estimator in the case of no measurement effects and nonresponse performs well.) Model B, which accounts for possible measurement effects seems to perform somewhat better than Model A, which assumes no measurement effects (note the relative high value of \(\hat {\bar {Y}}_{HT,Model}=0.28\) under Model A), although this is only partly reflected by the values of the two test statistics, suggesting that for the variable of having an academic degree in this survey, there are only small measurement effects, not detected by the two tests considered.
Dealing with Proxy Surveys as Mode Effects
As mentioned in the introduction, we propose dealing with the problem of proxy surveys via the methodology developed in the present article for dealing with mode effects. We illustrate our proposal using data collected as part of the Labor Force Survey (LFS), administered by ICBS. The LFS in Israel is a monthly survey with a 4 in, 8 out, 4 in, rotation pattern. The LFS is a proxy survey because every sampled person is asked to respond about himself (direct response), and about all other members of the household (proxy response). For the present illustration we use the data observed in all the months of 2018 for the first interview, which is carried out by a personal interview. To further reduce the overall sample size, we restrict to the Jewish population aged 2040, yielding a sample of n= 19,820 persons.
We again use the binary variable Y_{i} “having an academic degree”, as our target variable, thus allowing us to compare predictors of the population proportion of people with academic degree to the true proportion. Table 7 shows a few designbased estimates computed from the data, after modifying the base sampling weights to account for nonresponse. The estimates shown are:
\(\hat {\bar {Y}}_{NoM\text {odes}}\) Standard HT estimator when ignoring the mode effects, but with the sampling weights adjusted for nonresponse,
\(\hat {\bar {Y}}_{Direct}\) Designbased estimator using only the direct responses (with adjusted weights),
\(\hat {\bar {Y}}_{\text {Proxy}}\) Designbased estimator using only the proxy responses (with adjusted weights).
As expected, the designbased estimator \(\hat {\bar {Y}}_{No\text {Modes}}\) that ignores the mode effects performs poorly. The estimator based on only the direct responses performs even worse but quite surprising, the estimator \(\hat {\bar {Y}}_{\text {Proxy}}\), which uses only the proxy responses performs relatively well. It seems therefore that when asked about the possession of an academic degree, the proxy responses are generally more accurate than the responses of interviewees responding about themselves. The importance of this outcome is in illustrating that it is not necessarily true that interviewees responding about themselves provide correct answers, or in a more general mode effects set up, that one can decide on a mode with correct answers. Recall from the Introduction that several methods proposed in the literature to deal with mode effects assume the existence (and knowledge) of a mode which provides unbiased estimators for the true population mean.
We now illustrate the use of our mode effects methodology to handle proxy survey problems. We consider 5 different “modes” as follows: proxy response male (MP), proxy response female (FP), direct response male (MD), direct response female (FD), nonresponse (NR). By MP we mean that the unit for which a proxy response is provided is a male and similarly for the other modes. Out of the total sample size n = 19,820, 16.8% responses have been obtianed by FD, 15% by MD, 30.1% by FP, 31.8 by MP and 6.3% did not respond. Denoting by S_{m} the sample of units responding by mode m, we estimated the population proportion for each of the modes using the ratio HT estimator with weights adjusted for nonresponse, \(\hat {\bar {Y}}_{HT,Adj}^{(m)}=\hat {N}_{Adj,m}^{1}{\sum }_{i\in S_{m}}\tilde {\pi }_{i}^{1}y_{i}; \hat {N}_{Adj,m}={\sum }_{i\in S_{m}}\tilde {\pi }_{i}^{1}\) and found: \(\hat {\bar {Y}}_{HT,Adj}^{(MP)}=0.19\), \(\hat {\bar {Y}}_{HT,Adj}^{(FP)}=0.34,\hat {\bar {Y}}_{HT,Adj}^{(MD)}=0.37,\hat {\bar {Y}}_{HT,Adj}^{(FD)}=0.48\), suggesting the existence of mode effects. (Similar differences exist when using instead the true values of Y as known from the register.)
Next we consider our modelbased predictors. We use as covariates age and years of study. (Years of study is known from the register, the gender of the interviewee is accounted for in the definition of the modes.) To save in space, we do not present the coefficients of the models (8.1)–(8.3) obtained in this case.
Table 8 shows again the true value and the different predictors obtained in this case. The notation is the same as before.
The results in Table 8 show very good performance of the modelbased predictors when fitting Model B, which accounts for possible measurement effects, with the estimator \(\hat {\bar {Y}}_{(Model)}\) that uses the population covariates yielding an almost perfect predictor. On the other hand, the predictors obtained under Model A, which assumes no measurement effects are clearly biased, indicating the existence of measurement effects in this application. This result is reinforced by the HL test statistics, rejecting the null hypothesis of Model A, HL(A) = 26.359(p − value = 0.001) but not rejecting Model B, HL(B) = 7.24(p − value = 0.51). Also, the NLR test rejects Model A in favor of Model B, LR_{nor} = 10.31(p − value = 0.00).
The estimator \(\hat {\bar {Y}}_{HT,True}\), which uses the true Y values from the education register performs very well, validating the sampling design and corresponding estimator, but the estimator \(\hat {\bar {Y}}_{HT,imp}\) which imputes the missing data for the nonrespondents based on the observed data and thus ignores the measurement effects, and the estimator \(\hat {\bar {Y}}_{HT,Adj}\), which attempts to correct for the nonresponse by modification of the sampling weights (but does not account for the measurement effects) perform poorly, overestimating the true proportion by about 23%, the same as the modelbased estimators under Model A. We conclude that in this application, there are large measurement effects, captured well under Model B, but not under Model A and the modified designbased estimators considered.
Summary
In this article we propose a new comprehensive modelbased approach to deal with modeeffects, which is applied also to deal with proxy surveys; two major problems in survey sampling. Our approach addresses both selection and measurement effects, underlying the possible modeeffects. Furthermore, we allow for not missing at random (NMAR) nonresponse, by considering the nonresponse as another mode. Unlike other approaches proposed in the literature, we do not assume that one of the modes provides unbiased predictors. The existence of such a mode is not guaranteed, and even if it exists, it is not clear how to determine which one it is. The approach is modelbased but we cannot think of a proper designbased approach that can deal simultaneously with selection and measurement effects and NMAR nonresponse, without very strong and generally untestable assumptions. In this article we restricted to binary outcome variables (fitting logistic models in the empirical illustrations), but the proposed approach can be extended to continuous outcomes, with proper modifications.
We propose simple test procedures for testing our model, and in particular, for testing the existence of measurement effects, which are seen to work well in the empirical studies, although more powerful tests can, and should be developed. When applied to proxy surveys, an interesting open question is how to define the different modes. In our empirical study we defined them in an “adhoc” manner, but a more founded methodology should be established. One possible way is to start with as many as possible modes, estimate the means or other characteristics of interest for each mode, and then collapse modes based on proper statistical analysis, so as to stabilize the final results.
The empirical results with the simulated and real data sets are promising and we encourage other researchers to test the approach with their data. We mention again that the approach is applicable in principle also to nonprobability samples, which become more and more popular in recent years with the availability of new “big data” sets.
References
Biemer, P. P. (1988). Measuring Data Quality. In Telephone Surveys Methodology. Wiley, New York,.
Biemer, P. P. (2001). Nonresponse Bias and measurement bias in a comparison of face to face and telephone interviewing. J. Off. Stat. 17, 295–320.
de Leeuw, E. (2005). To mix or not to mix. Data collection modes in surveys. Journal of Official Statistics 21, 233–255.
De Leeuw, E. D. (2018). Mixedmode: past, present and future. Surv. Res. Methods 12, 75–89.
De Leeuw, E. D., SuzerGurtekin, Z. and Hox, J. (2018). The Design and Implementation of Mixed Mode Surveys. In Advances in Comparative Survey Methodology. Wiley, New York,.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B 39, 1–38.
Hox, J., de Leeuw, E. D. and Klausch, T. (2017). Mixed mode research: issues in design and analysis. In Total Survey Error in Practice. Wiley Series in Survey Methodology.
Dillman, D. A. and Christian, L. M. (2003). Survey mode as a source of instability in responsesacross surveys. Field Methods 15, 1–22.
Follmann, D. A. and Lambert, D. (1991). Identifiability of finite mixture of logistic regression models. J. Stat. Plan. Inference 27, 375–381.
Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E. and Tourangeau, R. (2004). Survey Methodology. Wiley, Hoboken.
Hosmer, D. W. and Lemeshow, S. (1980). A goodnessoffit test for multiple logistic regression model. Commun. Stat. A 10, 1043–1069.
Hosmer, D. W. and Lemeshow, S. (2000). Applied Logistic Regression. Wiley, New York.
Isaki, C. T. and Fuller, W. A. (1982). Survey design under a regression superpopulation model. J. Am. Stat. Assoc. 77, 89–96.
Kalsbeek, W. D. and Agans, R. P. (2007). Sampling and weighting in household telephone surveys. Wiley, New York,.
Kormendi, E. (1988). The quality of income information in telephone and face to face surveys. In Telephone Survey Methodology. Wiley, New York.
Moore, J. C. (1988). Self/Proxy response status and survey response quality: A review of literature. J. Off. Stat. 4, 155–172.
O’Muircheartaigh, C. (1991). Simple response variance: estimation and determinants. Measurement Errors in Surveys. Wiley, New York.
Park, S., Kim, J. K. and Park, S. (2016). An imputation approach for handling mixed mode surveys. Ann. Appl. Stat. 10, 1063–1085.
Patil, G. P. and Rao, C. R. (1978). Weighted distributions and size biased sampling with applications to wildlife populations and human families. Biometrics34, 179–189.
Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. Int. Stat. Rev. 61, 317–337.
Pfeffermann, D. (2017). Bayesbased nonBayesian inference on finite populations from nonrepresentative samples. A unified approach. Calcutta Statistical Association (CSA) Bulletin 69, 35–63.
Pfeffermann, D. and Landsman, A. (2011). Are private schools really better than public schools? Assessment by methods for observational studies. Ann. Appl. Stat. 5, 1726–1751.
Pfeffermann, D. and Sikov, A. (2011). Imputation and estimation under non ignorable nonresponse in household surveys with missing covariate information. J. Off. Stat. 27, 181–209.
Pfeffermann, D. and Sverchkov, M (1999). Parametric and semiparametric estimation of regression models fitted to survey data. Sankhya 61, 166–186.
Rothenberg, T. J. (1971). Identification in parametric models. Econometrica 39, 577–591.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley, New York.
Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.
Rosenbaum, P. R. and Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. J. Am. Stat. Assoc. 79, 516–524.
Tourangeau, R. and Yan, T. (2007). Sensitive questions in surveys. Psychlogical Bulletin 133, 859–883.
Turner, C. F., Ku, L., Rogers, S., Lindberg, L., Pleck, J. and Sonenstein, F. (1998). Adolescent sexual behavior, drug use and violence: increased reporting with computer survey technology. Science 280, 867–873.
Vannieuwenhuyze, J., Loosveldt, G. and Molenberghs, G. (2010). A method for evaluating mode effects in mixedmode surveys. Public Opin. Q. 74, 1027–1045.
Vannieuwenhuyze, J., Loosveldt, G. and Molenberghs, G. (2014). Evaluating mode effects in mixedmode survey data using covariate adjustment models. J. Off. Stat. 30, 1–24.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and nonnested hypotheses. Econometrica 57, 307–333.
White, H. (1994). Estimation, Inference and Specification Analysis. Cambridge University Press, Cambridge.
Wilson, P. (2015). The misuse of the Vuong test for nonnested models to test for zeroinflation. Econ. Lett. 127, 51–53.
Author information
Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix:
Appendix:
Let the variables Y,M,X,Z under Model A (y,Y,M,X,Z under Model B) be defined on the probability space (Ω,F,P). Condition A1(B1) implies the following condition:
(*) the sequences\(\{\ell _{i}(\delta )\}_{i\in S},\{\tilde {\ell }_{i}(\delta )\}_{i\in S}\) and their first and second derivatives are iid and bounded almost surely respectively.
Denote, \(\ell (\delta )=I(M<\stackrel {\leftrightarrow }{M})\textit {log}[f(\delta )^{Y}g(\delta )^{1Y}]+I(M=\stackrel {\leftrightarrow }{M})\textit {log}[f(\delta )+g(\delta )]\),
and ℓ_{0} = ℓ(δ_{0}).Also, let \(\tilde {\ell }(\theta ) = I(M\!<\stackrel {\leftrightarrow }{M})\left (y\log [p_{1}(\gamma )f(\delta ) + (1  p_{0}(\gamma ))g(\delta )]\right .\)
where \(p_{j}(\gamma ) =\Pr (D=1Y=\textit {j},W,M;\gamma ), D=I(Y=y)\) for \(M\ne \stackrel {\leftrightarrow }{M}, j=0,1\).
Proof Proof of Theorem 1.
(i) First, we show that the model is identifiable, that is, ℓ(δ) = ℓ(δ^{∗}) almost surely implies δ = δ^{∗}. Let \(F_{1}=\{\omega \in {\Omega } M(\omega )\ne \stackrel {\leftrightarrow }{M}\}\) and \(F_{2}=\{\omega \in {\Omega } M(\omega )=\stackrel {\leftrightarrow }{M}\}\). Note that under the condition A1 both sets are nonnull. Let ω ∈ F_{1}, and suppose thatδ≠δ^{∗} but ℓ(δ) = ℓ(δ^{∗}). Suppose first that α≠α^{∗}.
Under Condition A2, there exists a variable X_{ν} not included in Z. Let,\({H_{1}^{v}}(\alpha )=\partial H_{1}(\alpha )/\partial X_{v}\) and denote by X_{−v} the vector of covariates in X excluding X_{ν}. Taking the partial derivative of Eq. A.1 with respect to X_{v} yields,
By integrating Eq. A.2 with respect to X_{v} and using the notation in Eq. A.3,
where ψ(X_{−v}) is some differentiable function with respect X_{−v}. Taking the partial derivative of Eq. A.3 with respect toX_{v} implies ∂h_{1}(X)/∂X_{v} = 0, which contradicts Condition A2 and hence α = α^{∗}. It remains to show that β = β^{∗}. Substituting α = α^{∗} in Eq. A.1, it follows from A1 that β = β^{∗} and thus δ = δ^{∗}. □
A similar proof applies when considering β≠β^{∗} and when there exists a variable Z_{ν} in Z not included in X.
Consider now ω ∈ F_{2}. Using similar arguments to above, we can show that \(f_{\stackrel {\leftrightarrow }{M}}(\delta )\) and \(g_{\stackrel {\leftrightarrow }{M}}(\delta )\) are each identifiable and by Condition A3, they are linearly independent. (The functions f_{M}(δ) and g_{M}(δ) are defined in Section 4.1). Hence,\(f_{\stackrel {\leftrightarrow }{M}}(\delta )+g_{\stackrel {\leftrightarrow }{M}}(\delta )\)\(=f_{\stackrel {\leftrightarrow }{M}}(\delta ^{*})+g_{\stackrel {\leftrightarrow }{M}}(\delta ^{*})\Rightarrow \delta =\delta ^{*}\). This completes the proof of identifiability.
Second, the compactness of the parameter set and the identifiability property implies by the information inequality that \(E\ell (\delta _{0})\max \limits _{\delta \in {\Delta }}E\ell (\delta )\ge 0\).
Third, by (*) and Theorem A.2.2 in White (1994),
Given the results so far and using similar arguments to those used in the proof of Theorem 3.4 of White (1994), it follows that \(\hat {\delta }_{n}\to _{a.s.}\delta _{0}\), thus completing the first part of the theorem.
(ii) Note that \(E(\nabla _{\delta _{0}}\ell _{0})=\nabla _{\delta _{0}}(E\ell _{0})=0\). The left hand side equality follows by Condition (*). The right hand side equality follows from Condition A4. The identifiability of the model shown in part (i) and Theorem 1 of Rothenberg (1971) implies that for sufficiently large n, C_{0n} is positive definite. The last two results imply by the LindbergLevy Central Limit Theorem,
Further, by (*), Condition A4 and Theorem A.2.2 in White (1994),
Thus, by the strong consistency of \(\hat {\delta }_{n}\) shown in the first part,
By Eqs. A.5–A.7, and Theorem of 6.2 in White (1994),
\(C_{n}^{1/2}D_{n}\sqrt {n}(\hat {\delta }_{n}\delta _{0})\to _{D}N(0,I_{k})\). Q.E.D
Proof Proof of Theorem 2.
We again start by proving the likelihood identifiability (4.6). Denote as before, \(p_{j}={h_{3}^{j}}(\gamma ) =\Pr (D=1Y=\textit {j},W,M;\gamma )\) for \(M_{i}\ne \stackrel {\leftrightarrow }{M}\), j = 0,1 and \(p=h_{3}(\gamma )=({h_{3}^{0}}(\gamma ),{h_{3}^{1}}(\gamma ) )' \). Using the same steps as in the proof of theorem 1, it can be shown that under Condition B1, the probabilities f_{M}(δ) and g_{M}(δ) are both identifiable. Hence, by Condition B1, for the set \(\tilde {F}_{2}=\{\omega \in {\Omega } M(\omega )=\stackrel {\leftrightarrow }{M}\}\), \(\tilde {\ell }(\theta )=\tilde {\ell }(\theta ^{*})\Rightarrow \theta =\theta ^{*}\), similarly to the first part of Theorem 1. For \(\omega \in \tilde {F}_{1}=\{\omega \in {\Omega } M(\omega )\ne \stackrel {\leftrightarrow }{M}, y(\omega )=1\}\), the contribution to the likelihood is given by,
(compare with 3.6). Thus, under Condition B1, G(δ,p) is identifiable in the sense that,
and since h_{3}(γ) is identifiable, we have that if p = p^{∗}⇒ γ = γ^{∗} and 𝜃 = 𝜃^{∗}.
By repeating the same arguments as above, we establish the identifiability of the model also for the set \(\omega \in \tilde {F}_{3}=\{\omega \in {\Omega } M(\omega )\ne \stackrel {\leftrightarrow }{M}, y(\omega )=0\}\). The rest of the proof of strong consistency (part i) and asymptotic normality (part ii) of the MLE, is similar to the proof of Theorem 1. □
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Pfeffermann, D., Preminger, A. Estimation Under Mode Effects and Proxy Surveys, Accounting for Nonignorable Nonresponse. Sankhya A (2021). https://doi.org/10.1007/s1317102000229w
Received:
Accepted:
Published:
Keywords
 EM algorithm
 measurement effects
 NMAR nonresponse
 probability and nonprobability sampling
 selection effects.
AMS (2000) subject classification
 Primary 62D05; Secondary 62E20