Abstract
Interval-censored data may arise in questionnaire surveys when, instead of being asked to provide an exact value, respondents are free to answer with any interval, with no pre-specified ranges. In this context, the assumption of noninformative censoring is violated, and thus, the standard methods for interval-censored data are not appropriate. This paper explores two schemes for data collection and deals with the problem of estimation of the underlying distribution function, assuming that it belongs to a parametric family. The consistency and asymptotic normality of a proposed maximum likelihood estimator are proven. A bootstrap procedure that can be used for constructing confidence intervals is considered, and its asymptotic validity is shown. A simulation study investigates the performance of the suggested methods.
1 Introduction
In questionnaire surveys respondents are often allowed to give an answer in the form of an interval. For example, the respondent can be asked to select from several pre-specified intervals; this question format is known as a range card. Another approach is called unfolding brackets, where the respondent is asked a sequence of yes-no questions that narrow down the range in which the respondent’s true value lies. These formats are suitable when asking questions that are difficult to answer with an exact value (e.g., recall questions) or when asking sensitive questions (e.g., asking about income) because they allow partial information to be elicited from respondents who are unable or unwilling to provide exact amounts. However, studies have found that the pre-specified intervals given to the respondents in a range-card question are likely to influence their answers. Such bias is known as the bracketing effect (see, e.g., McFadden et al. 2005). Similarly, the unfolding-brackets format is prone to the so-called anchoring effect, i.e., answers can be biased toward the starting value in the sequence of yes-no questions (see, e.g., Furnham and Boo 2011; Van Exel et al. 2006).
A format that does not involve any pre-specified values is the respondent-generated intervals approach, suggested by Press and Tanur (2004a, b), where the respondent is asked to provide both a point value (a best guess for the true value) and an interval. They employed Bayesian methods for estimating the parameters of the underlying distribution. A similar format, in which the respondent is free to answer with any interval containing his/her true value, was considered by Belyaev and Kriström (2010). They used the term self-selected interval (SSI). Estimating the underlying distribution using SSI data, however, requires some generally untestable assumptions related to how the respondent chooses the interval. To avoid such assumptions, Belyaev and Kriström (2012, 2015) introduced a novel two-stage approach. The idea is to ask the respondent first to provide an SSI and then to select from several sub-intervals of the SSI the one that most likely contains his/her true value. Data collected in a pilot stage are used for generating the sub-intervals in the second question. Belyaev and Kriström (2012, 2015) proposed a nonparametric maximum likelihood estimator of the underlying distribution for two-stage SSI data. Angelov and Ekström (2017) extended their work by exploring a sampling scheme where the number of sub-intervals in the second question is limited to two or three, which is motivated by the fact that a question with a large number of sub-intervals might be difficult to implement in practice, e.g., in a telephone interview.
Data consisting of self-selected intervals are a special case of interval-censored data. Let X be a random variable of interest. An observation on X is interval-censored if, instead of observing X exactly, only an interval \((L,R\,]\) is observed, where \(L < X \le R\) (see, e.g., Zhang and Sun 2010). Interval-censored data arise most commonly when the observed variable is the time to some event (known as survival data, failure time data, lifetime data, duration data, or time-to-event data). The problem of estimating the underlying distribution for interval-censored data has been approached through nonparametric methods by Peto (1973), Turnbull (1976), and Gentleman and Geyer (1994), among others. These estimators rely on the assumption of noninformative censoring, i.e., the observation process that generates the censoring is independent of the variable of interest (see, e.g., Sun 2006, p. 244). In the sampling schemes considered by Belyaev and Kriström (2010, 2012, 2015) and Angelov and Ekström (2017) this is not a reasonable assumption as it is the respondent who chooses the interval; thus, the standard methods are not appropriate. The existing methods for data with informative interval censoring (see Finkelstein et al. 2002; Shardell et al. 2007) are specific for time-to-event data and are not directly applicable in the context that we are discussing.
In this paper, we focus on parametric estimation of the underlying distribution function, i.e., we assume a particular functional form of the distribution. Compared to nonparametric methods, this approach usually leads to more efficient estimators, provided that the distributional assumption is true (see, e.g., Collett 1994, p. 107). The problem of choosing the right parametric model can be sidestepped by using a wide parametric family like the generalized gamma distribution (see, e.g., Cox et al. 2007) that includes most of the commonly used distributions as special cases (exponential, gamma, Weibull, and log-normal).
We suggest two modifications of the sampling scheme for SSI data studied in Angelov and Ekström (2017) and propose a parametric maximum likelihood estimator. In Sect. 2, we introduce the sampling schemes. In Sect. 3, the statistical model is defined and the corresponding likelihood function is derived. Asymptotic properties of the maximum likelihood estimator are established in Sect. 4. The results of a simulation study are presented in Sect. 5, and the paper is concluded in Sect. 6. Proofs and auxiliary results are given in the Appendix.
2 Sampling schemes
2.1 Scheme A
The rationale behind this scheme is that we need to have more information than just the self-selected intervals in order to estimate the underlying distribution. Therefore, we ask the respondent to select a sub-interval of the interval that he/she stated. The problem of deciding where to split the stated interval into sub-intervals can be resolved using some previously collected data (in a pilot stage) or based on other knowledge about the quantity of interest.
We consider the following two-stage scheme for collecting data. In the pilot stage, a random sample of \(n_0\) individuals is selected and each individual is requested to give an answer in the form of an interval containing his/her value of the quantity of interest. It is assumed that the endpoints of the intervals are rounded, for example, to the nearest integer or to the nearest multiple of 10. Thus, instead of (50.2, 78.7] respondents will answer with (50, 79] or (50, 80].
Let \( d_0^{\star }< d_1^{\star }< \cdots< d_{k'-1}^{\star } < d_{k'}^{\star } \) be the endpoints of all observed intervals. The set \( \{ d_j^{\star } \} = \{ d_0^{\star }, \ldots , d_{k'}^{\star } \} \) can be seen as a set of typical endpoints. The data collected in the pilot stage are used only for constructing the set \( \{ d_j^{\star } \} \), which is needed for the main stage. The set \( \{ d_j^{\star } \} \) may also be constructed using data from a previous survey, or it can be determined by the researcher based on prior knowledge about the quantity of interest or other reasonable arguments. For instance, if it is known that the variable of interest ranges between 0 and 200 and that the respondents are rounding their endpoints to a multiple of 10, then a reasonable set of endpoints will be \(\{0,10,20,\ldots ,200\}\).
In the main stage, a new random sample of n individuals is selected and each individual is asked to state an interval containing his/her value of the quantity of interest. We refer to this question as Qu1. The stated interval is then split into two or three sub-intervals, and the respondent is asked to select one of these sub-intervals (the points of split are chosen in some random fashion among the points \(d_j^{\star }\) that are within the stated interval, e.g., equally likely or according to some other pre-specified probabilities). We refer to this question as Qu2. The respondent may refuse to answer Qu2, and this will be allowed for. If there are no points \(d_j^{\star }\) within the stated interval, the second question is not asked.
Let \( d_0< d_1< \cdots< d_{k-1} < d_k \) be the union of \( \{ d_j^{\star } \} \) and the endpoints of all intervals observed at the main stage. Note that k is unknown but, because of the rounding of endpoints, it cannot be arbitrarily large. Let us define a set of intervals \( {\mathcal {V}} = \{ \mathbf {v}_1, \ldots , \mathbf {v}_k \} \), where \( \mathbf {v}_j = (d_{j-1}, d_{j}], \; j=1, \ldots , k \), and let \( {\mathcal {U}} = \{ \mathbf {u}_1, \ldots , \mathbf {u}_m \} \) be the set of all intervals that can be expressed as a union of intervals from \( {\mathcal {V}} \), i.e., \( {\mathcal {U}} = \{ (d_l, d_r] : \,\, d_l < d_r, \,\, l,r=0,\ldots ,k \} \). For example, if \( {\mathcal {V}} = \{ (0,5], \, (5,10], \, (10,20] \}\), then \( {\mathcal {U}} = \{ (0,5], \, (5,10], \, (10,20], \, (0,10], \, (5,20], \, (0,20] \} \). Let \({\mathcal {J}}_{\scriptstyle h}\) denote the set of indices of the intervals from \({\mathcal {V}}\) contained in \(\mathbf {u}_h\):
$$\begin{aligned} {\mathcal {J}}_{\scriptstyle h} = \{ j : \mathbf {v}_j \subseteq \mathbf {u}_h \}. \end{aligned}$$
In the example with \( {\mathcal {V}} = \{ (0,5], \, (5,10], \, (10,20] \}\), \( \mathbf {u}_5 = (5,20] = \mathbf {v}_2 \cup \mathbf {v}_3 \), hence \( {\mathcal {J}}_5 = \{2,3\} \).
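For illustration, the construction of \({\mathcal {U}}\) and \({\mathcal {J}}_{\scriptstyle h}\) can be sketched in Python (the helper names are ours; an interval \((a,b]\) is represented as the pair `(a, b)`):

```python
# Build V (adjacent intervals) and U (all unions of adjacent intervals) from
# the cut points d_0 < d_1 < ... < d_k, matching the example in the text.

def build_intervals(d):
    """Return V and U for cut points d."""
    k = len(d) - 1
    V = [(d[j - 1], d[j]) for j in range(1, k + 1)]           # v_j = (d_{j-1}, d_j]
    U = [(d[l], d[r]) for l in range(k + 1) for r in range(l + 1, k + 1)]
    return V, U

def index_set(u, V):
    """J_h: 1-based indices of the intervals v_j contained in u = (d_l, d_r]."""
    return {j + 1 for j, v in enumerate(V) if u[0] <= v[0] and v[1] <= u[1]}

V, U = build_intervals([0, 5, 10, 20])
# U contains all six intervals (0,5], (5,10], (10,20], (0,10], (5,20], (0,20]
print(index_set((5, 20), V))   # {2, 3}, matching J_5 in the text
```

The containment test uses the fact that \((a,b] \subseteq (c,d]\) if and only if \(c \le a\) and \(b \le d\).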
Remark 1
The main difference between this scheme and the one explored in Angelov and Ekström (2017) is that with scheme A there is no exclusion of respondents, while with the former scheme respondents are excluded if they stated an interval with endpoints not belonging to \( \{ d_j^{\star } \} \).
2.2 Scheme B
This scheme is a modification of scheme A with two follow-up questions after Qu1 aiming to extract more refined information from the respondents. The pilot stage is the same as in scheme A. The sets \( \{ d_0, \ldots , d_k \} \), \( {\mathcal {V}} \), \( {\mathcal {U}} \), and \( {\mathcal {J}}_{\scriptstyle h} \) are also defined in the same way. In the main stage, a new random sample of n individuals is selected and each individual is asked to state an interval containing his/her value of the quantity of interest. We refer to this question as Qu1. The stated interval is then split into two sub-intervals, and the respondent is asked to select one of these sub-intervals. The point of split is the \(d_j^{\star }\) that is the closest to the middle of the interval; if there are two points that are equally close to the middle, one of them is taken at random. This way of splitting the interval yields two sub-intervals of similar length, which would be more natural for the respondent. We refer to this question as Qu2a. The interval selected at Qu2a is thereafter split similarly into two sub-intervals, and the respondent is asked to select one of them. We refer to this question as Qu2b. The respondent may refuse to answer the follow-up questions Qu2a and Qu2b. If there are no points \(d_j^{\star }\) within the interval stated at Qu1 or Qu2a, the respective follow-up question is not asked. We assume that if a respondent has answered Qu2a, he/she has chosen the interval containing his/her true value, independently of how the interval stated at Qu1 was split. An analogous assumption is made about the response to Qu2b.
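The splitting rule of scheme B can be sketched as follows (a hypothetical helper of ours; a `None` return corresponds to the case where no follow-up question is asked):

```python
import random

def split_point(interval, d_star, rng=None):
    """Pick the d_j* strictly inside `interval` that is closest to its middle;
    break a tie between two equally close points at random."""
    rng = rng or random.Random(0)
    a, b = interval
    inside = [d for d in d_star if a < d < b]
    if not inside:
        return None                      # no follow-up question is asked
    mid = (a + b) / 2
    best = min(abs(d - mid) for d in inside)
    candidates = [d for d in inside if abs(d - mid) == best]
    return rng.choice(candidates)

print(split_point((0, 20), [0, 5, 10, 15, 20]))  # 10: exactly the middle
```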
If we know the intervals stated at Qu1 and Qu2b, we can find out the answer to Qu2a. For this reason, if Qu2b is answered, the data from Qu2a can be omitted. Let \(\hbox {Qu2}\varDelta \) denote the last follow-up question that was answered by the respondent. If the respondent answered neither Qu2a nor Qu2b, we say that there is no answer at \(\hbox {Qu2}\varDelta \). We will distinguish three types of answers in the main stage:
- Type 1: \( ( \mathbf {u}_h; \text{ NA }) \), when the respondent stated interval \(\mathbf {u}_h\) at Qu1 and did not answer \(\hbox {Qu2}\varDelta \);
- Type 2: \( ( \mathbf {u}_h; \mathbf {v}_j ) \), when the respondent stated interval \(\mathbf {u}_h\) at Qu1 and \(\mathbf {v}_j\) at \(\hbox {Qu2}\varDelta \), where \( \mathbf {v}_j \subseteq \mathbf {u}_h \);
- Type 3: \( ( \mathbf {u}_h; \mathbf {u}_s ) \), when the respondent stated interval \(\mathbf {u}_h\) at Qu1 and \(\mathbf {u}_s\) at \(\hbox {Qu2}\varDelta \), where \(\mathbf {u}_s\) is a union of at least two intervals from \({\mathcal {V}}\) and \( \mathbf {u}_s \subset \mathbf {u}_h \).
Similar types of answers can be considered for scheme A, as well. In what follows we will use these three types for both schemes (for scheme A, \(\hbox {Qu2}\varDelta \) will denote Qu2).
3 Model and estimation
We consider the unobserved (interval-censored) values \( x_1, \ldots , x_n \) of the quantity of interest to be values of independent and identically distributed (i.i.d.) random variables \( X_1, \ldots , X_n \) with distribution function \( F(x) = \mathrm {P}\,(X_i \le x) \). Our goal is to estimate F(x) through a maximum likelihood approach. Let \(q_j\) be the probability mass placed on the interval \(\mathbf {v}_j = (d_{j-1}, d_j]\):
$$\begin{aligned} q_j = F(d_j) - F(d_{j-1}), \quad j = 1, \ldots , k. \end{aligned}$$
Because only intervals with endpoints from \( \{ d_0, \ldots , d_k \} \) are observed, the likelihood function will depend on F(x) through the probabilities \(q_j\). In order to avoid complicated notation, we assume that \( q_j > 0 \) for all \( j=1,\ldots ,k \). The case when \( q_j=0 \) for some j can be treated similarly (cf. Rao 1973, p. 356).
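As an illustration, the probabilities \(q_j\) are increments of the assumed distribution function over consecutive cut points. A minimal sketch, assuming the Weibull family that appears later in the simulation study (shape \(\nu = 1.5\), scale \(\sigma = 80\)) and a hypothetical endpoint set:

```python
import math

def weibull_cdf(x, shape=1.5, scale=80.0):
    # F(x) = 1 - exp(-(x/scale)^shape) for x > 0; the family used in Sect. 5
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def cell_probs(d, cdf):
    """q_j = F(d_j) - F(d_{j-1}), the mass placed on v_j = (d_{j-1}, d_j]."""
    return [cdf(d[j]) - cdf(d[j - 1]) for j in range(1, len(d))]

d = list(range(0, 401, 10))             # hypothetical cut points 0, 10, ..., 400
q = cell_probs(d, weibull_cdf)
assert all(qj > 0 for qj in q)          # matches the assumption q_j > 0
print(round(sum(q), 6))                 # the sum telescopes to F(400) - F(0)
```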
Let \( H_i, \; i=1,\ldots ,n \), be i.i.d. random variables such that \( H_i = h \) if the i-th respondent has stated interval \(\mathbf {u}_h\) at Qu1. The event \( \{H_i = h\} \) implies \( \{X_i \in \mathbf {u}_h\} \). Let us denote
$$\begin{aligned} w_{h|j} = \mathrm {P}\,( H_i = h \,|\, X_i \in \mathbf {v}_j ). \end{aligned}$$
If \(\mathbf {u}_h\) does not contain \(\mathbf {v}_j\), then \(w_{h|j} = 0\).
Hereafter we will need the following frequencies:
- \(n_{h,\mathrm {NA}}\): the number of respondents who stated \(\mathbf {u}_h\) at Qu1 and NA (no answer) at \(\hbox {Qu2}\varDelta \);
- \(n_{hj}\): the number of respondents who stated \(\mathbf {u}_h\) at Qu1 and \(\mathbf {v}_j\) at \(\hbox {Qu2}\varDelta \), where \( \mathbf {v}_j \subseteq \mathbf {u}_h \);
- \(n_{h*s}\): the number of respondents who stated \(\mathbf {u}_h\) at Qu1 and \(\mathbf {u}_s\) at \(\hbox {Qu2}\varDelta \), where \(\mathbf {u}_s\) is a union of at least two intervals from \({\mathcal {V}}\) and \( \mathbf {u}_s \subset \mathbf {u}_h \);
- \(n_{h \bullet }\): the number of respondents who stated \(\mathbf {u}_h\) at Qu1 and any sub-interval at \(\hbox {Qu2}\varDelta \).
Now we will derive the likelihood for scheme B. If respondent i has given an answer of type 1, i.e., \(\mathbf {u}_h\) at Qu1 and \(\text{ NA }\) at \(\hbox {Qu2}\varDelta \), then the contribution to the likelihood can be expressed using the law of total probability: \( \mathrm {P}\,( H_i = h ) = \sum _{j\in {\mathcal {J}}_{\scriptstyle h}} w_{h|j} \, q_j \). If an answer of type 2 is observed, i.e., \(\mathbf {u}_h\) at Qu1 and \(\mathbf {v}_j\) at \(\hbox {Qu2}\varDelta \), then the contribution to the likelihood is \( w_{h|j} \, q_j \). And if a respondent has given an answer of type 3, i.e., \(\mathbf {u}_h\) at Qu1 and \(\mathbf {u}_s\) at \(\hbox {Qu2}\varDelta \), then the contribution to the likelihood is \( \sum _{j\in {\mathcal {J}}_{\scriptstyle s}} w_{h|j} \, q_j \). Thus, the log-likelihood function corresponding to the main-stage data is
$$\begin{aligned} \log L(\mathbf {q}) = \sum _h n_{h,\mathrm {NA}} \log \Bigl ( \sum _{j\in {\mathcal {J}}_{\scriptstyle h}} w_{h|j} \, q_j \Bigr ) + \sum _h \sum _{j\in {\mathcal {J}}_{\scriptstyle h}} n_{hj} \log ( w_{h|j} \, q_j ) + \sum _h \sum _s n_{h*s} \log \Bigl ( \sum _{j\in {\mathcal {J}}_{\scriptstyle s}} w_{h|j} \, q_j \Bigr ) + c_1, \end{aligned}$$
where \(c_1\) does not depend on \( \mathbf {q}= (q_1, \ldots , q_k) \). By similar arguments it can be shown that the log-likelihood for scheme A has essentially the same form as the log-likelihood (1); it differs only by an additive constant (the pre-specified probabilities of choosing the points of split of the stated interval are incorporated in \(c_1\)).
If we want to estimate F(x) without making any distributional assumptions, we can maximize the log-likelihood (1) with respect to \(\mathbf {q}\) (for details see Angelov and Ekström 2017). Here we will assume that F(x) belongs to a parametric family, i.e., F(x) is a known function of some unknown parameter \({\varvec{\theta }}= (\theta _1, \ldots , \theta _d)\), and thus the probabilities \(q_j\) are functions of \({\varvec{\theta }}\). Therefore, the log-likelihood will be a function of \({\varvec{\theta }}\), i.e., \(\log L({\varvec{\theta }}) = \log L\bigl ( \mathbf {q}({\varvec{\theta }}) \bigr )\), and in order to estimate F(x) we need to estimate \({\varvec{\theta }}\). For emphasizing that F(x) depends on \({\varvec{\theta }}\), we will sometimes write \(F_{{\varvec{\theta }}}(x)\).
The conditional probabilities \(w_{h|j}\) are nuisance parameters. If \(w_{h|j}\) does not depend on j, the assumption of noninformative censoring will be satisfied. In our case, there are no grounds for making such assumptions about \(w_{h|j}\), and therefore, we need the data from \(\hbox {Qu2}\varDelta \) in order to estimate \(w_{h|j}\). For this task we employ the procedure suggested in Angelov and Ekström (2017), which we outline here. The idea is first to estimate the probabilities \( p_{j|h} = \mathrm {P}\,( X_i \in \mathbf {v}_j \,|\, H_i=h ), \;j \in {\mathcal {J}}_{\scriptstyle h} \). For a given h, a strongly consistent estimator \(\widetilde{p}_{j|h}\) of \( p_{j|h}, \;j \in {\mathcal {J}}_{\scriptstyle h} \) is obtained by maximizing the log-likelihood:
$$\begin{aligned} \sum _{j\in {\mathcal {J}}_{\scriptstyle h}} n_{hj} \log p_{j|h} + \sum _s n_{h*s} \log \Bigl ( \sum _{j\in {\mathcal {J}}_{\scriptstyle s}} p_{j|h} \Bigr ) + c_0, \end{aligned}$$
where \(c_0\) does not depend on \(p_{j|h}\). Then, an estimator of \(w_{h|j}\) is derived using the Bayes formula:
$$\begin{aligned} \widetilde{w}_{h|j} = \frac{\widetilde{p}_{j|h} \, \widehat{w}_h}{\sum _{g} \widetilde{p}_{j|g} \, \widehat{w}_g}, \end{aligned}$$
where \( \widehat{w}_h = (n_{h \bullet } + n_{h,\mathrm {NA}})/n \) is a strongly consistent estimator of \( w_h = \mathrm {P}\,( H_i = h )\).
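This two-step estimation can be sketched numerically. For brevity, the sketch below estimates \(p_{j|h}\) by the observed fractions \(n_{hj}/n_{h\bullet }\) (a simplification using type-2 answers only; the procedure above maximizes a likelihood that also uses type-3 answers) and then applies the Bayes formula with \(\widehat{w}_h\); the data layout is hypothetical:

```python
# Simplified sketch: p_{j|h} estimated by observed fractions, then inverted
# with the Bayes formula using the estimates w_h-hat = (n_h. + n_h,NA)/n.

def estimate_w(n_hj, n_dot, n_na, n):
    """n_hj[h][j]: type-2 counts; n_dot[h], n_na[h]: per-h totals; n: sample size."""
    w_h = {h: (n_dot[h] + n_na[h]) / n for h in n_dot}           # w_h-hat
    p = {h: {j: c / n_dot[h] for j, c in row.items()} for h, row in n_hj.items()}
    w_hj = {}
    for h, row in p.items():
        for j, pj in row.items():
            denom = sum(p[g].get(j, 0.0) * w_h[g] for g in p)    # Bayes denominator
            w_hj[(h, j)] = pj * w_h[h] / denom
    return w_hj

n_hj = {1: {1: 30, 2: 10}, 2: {2: 20, 3: 20}}                    # made-up counts
w = estimate_w(n_hj, n_dot={1: 40, 2: 40}, n_na={1: 10, 2: 10}, n=100)
# For each j, the estimates of w_{h|j} over all h that can produce v_j sum to one
print(round(w[(1, 2)] + w[(2, 2)], 6))  # 1.0
```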
To find the maximum likelihood estimate of the parameter \({\varvec{\theta }}\), we insert the estimates of the probabilities \(w_{h|j}\) into \(\log L({\varvec{\theta }})\) and maximize with respect to \({\varvec{\theta }}\). Alternatively, one may maximize the log-likelihood with respect to both \({\varvec{\theta }}\) and the nuisance parameters \(w_{h|j}\) using standard numerical optimization methods. This is, however, a high-dimensional and computationally time-consuming optimization problem, which we avoid by simply plugging in the estimated nuisance parameters \(\widetilde{w}_{h|j}\) into the log-likelihood.
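The plug-in step can be illustrated with a toy example: with \(\widetilde{w}_{h|j}\) held fixed, the log-likelihood is a function of \({\varvec{\theta }}\) alone. The sketch below assumes a one-parameter exponential family \(F_{\theta }(x) = 1 - e^{-x/\theta }\) (our choice for simplicity, not the paper's model) and a crude grid search in place of a numerical optimizer; counts and nuisance values are made up:

```python
import math

def q_theta(d, theta):
    """Cell probabilities q_j(theta) under F_theta(x) = 1 - exp(-x/theta)."""
    F = lambda x: 1.0 - math.exp(-x / theta)
    return {j: F(d[j]) - F(d[j - 1]) for j in range(1, len(d))}

def loglik(theta, d, w_hj, J, n_na, n_hj):
    """Main-stage log-likelihood with type-1 and type-2 answers only."""
    q = q_theta(d, theta)
    ll = 0.0
    for h, count in n_na.items():                     # type-1 answers
        ll += count * math.log(sum(w_hj[(h, j)] * q[j] for j in J[h]))
    for (h, j), count in n_hj.items():                # type-2 answers
        ll += count * math.log(w_hj[(h, j)] * q[j])
    return ll

d = [0, 10, 20, 40]
J = {1: [1, 2, 3]}                                    # only u_1 = (0, 40] stated
w_hj = {(1, 1): 1.0, (1, 2): 1.0, (1, 3): 1.0}        # toy nuisance values
n_na = {1: 5}
n_hj = {(1, 1): 20, (1, 2): 10, (1, 3): 5}
grid = [t / 10 for t in range(50, 500)]
theta_hat = max(grid, key=lambda t: loglik(t, d, w_hj, J, n_na, n_hj))
print(theta_hat)
```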
Remark 2
The proposed methodology for estimating \(F_{{\varvec{\theta }}}(x)\) assumes that the respondents are selected according to simple random sampling. If this is not the case, extrapolating the results to the target population may be incorrect. For surveys with a complex design, parameter estimates can be obtained, for example, by using the pseudo-likelihood approach, in which the individual contribution to the log-likelihood is weighted by the reciprocal of the corresponding sample inclusion probability (see, e.g., Chambers et al. 2012, p. 60).
4 Asymptotic results
Let us consider \(q_j\) as a function of \({\varvec{\theta }}= (\theta _1, \ldots , \theta _d)\), a multidimensional parameter belonging to a set \(\Theta \subseteq \mathbb {R}^d\), and let the true value \({\varvec{\theta }}^0\) be an interior point of \(\Theta \). In this section we prove the consistency and asymptotic normality of the proposed maximum likelihood estimator of \({\varvec{\theta }}\). We also show the asymptotic validity of a bootstrap procedure which can be used for constructing confidence intervals.
Let \({\varvec{\theta }}^1, {\varvec{\theta }}^2 \in \Theta \) and \(\Vert {\varvec{\theta }}^1 - {\varvec{\theta }}^2 \Vert \) denote the Euclidean distance between \({\varvec{\theta }}^1\) and \({\varvec{\theta }}^2\). Let the contribution of the i-th respondent to the log-likelihood be denoted \(\,\mathrm {llik}_i({\varvec{\theta }})\), whose precise definition is given by (13). We will consider the following assumptions:
- A1: If \({\varvec{\theta }}^1 \ne {\varvec{\theta }}^2\), then \(\mathbf {q}({\varvec{\theta }}^1) \ne \mathbf {q}({\varvec{\theta }}^2)\).
- A2: For every \(\delta >0\), there exists \(\varepsilon >0\) such that
$$\begin{aligned} \inf _{\Vert {\varvec{\theta }}- {\varvec{\theta }}^0 \Vert > \delta } \,\sum _{j} q_j({\varvec{\theta }}^0) \log \frac{q_j({\varvec{\theta }}^0)}{q_j({\varvec{\theta }})} \ge \varepsilon . \end{aligned}$$
- A3: The functions \(q_j({\varvec{\theta }})\) are continuous.
- A4: The functions \(q_j({\varvec{\theta }})\) have first-order partial derivatives that are continuous.
- A5: The set \(\Theta \) is compact, and the functions \(q_j({\varvec{\theta }})\) have first- and second-order partial derivatives that are continuous on \(\Theta \). Furthermore, \(q_j({\varvec{\theta }}) > 0\) on \(\Theta \).
- A6: For each \( {\varvec{\theta }}\in \Theta \), the Fisher information matrix \(\mathbf {I}({\varvec{\theta }})\) with elements \( I_{r\ell }({\varvec{\theta }}) = - \mathrm {E}\,_{{\varvec{\theta }}} \Bigl ( \frac{\partial ^2 \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r \partial \theta _{\ell }} \Bigr ) \), \( r,\ell = 1,\ldots ,d \), is nonsingular.
We say that \(\widetilde{{\varvec{\theta }}}\) is an approximate maximum likelihood estimator (cf. Rao 1973, p. 353) of \({\varvec{\theta }}\) if for some \(c \in (0,1)\),
$$\begin{aligned} L(\widetilde{{\varvec{\theta }}}) \ge c \, \sup _{{\varvec{\theta }}\in \Theta } L({\varvec{\theta }}). \end{aligned}$$
Let \(\gamma _t\) be the probability that a respondent gives an answer of type t, for \(t=1,2,3\).
Theorem 1
Let \(\widetilde{{\varvec{\theta }}}\) be an approximate maximum likelihood estimator of \({\varvec{\theta }}\).
- (i) If assumption A2 is satisfied, \( \gamma _2>0 \), and the conditional probabilities \(w_{h|j}\) are known, then \( \widetilde{{\varvec{\theta }}} \;\overset{{{\mathrm{a.s.}}}}{\longrightarrow }{\varvec{\theta }}^0 \) as \( n \longrightarrow \infty \).
- (ii) If assumption A2 is satisfied, \( \gamma _2>0 \), and a strongly consistent estimator of \(w_{h|j}\) is inserted into the log-likelihood, then \( \widetilde{{\varvec{\theta }}} \;\overset{{{\mathrm{a.s.}}}}{\longrightarrow }{\varvec{\theta }}^0 \) as \( n \longrightarrow \infty \).
Theorem 2
If assumptions A2 and A3 are satisfied, \( \gamma _2>0 \), and the conditional probabilities \(w_{h|j}\) are known (or strongly consistently estimated), then the maximum likelihood estimator of \({\varvec{\theta }}\) exists and is strongly consistent.
Theorem 3
If assumptions A1 and A4 are satisfied and the conditional probabilities \(w_{h|j}\) are known (or strongly consistently estimated), then there exists a root \({\varvec{{\bar{\theta }}}}\) of the system of likelihood equations
$$\begin{aligned} \frac{\partial \log L({\varvec{\theta }})}{\partial \theta _r} = 0, \quad r=1,\ldots ,d, \end{aligned}$$
such that \( {\varvec{{\bar{\theta }}}} \;\overset{{{\mathrm{a.s.}}}}{\longrightarrow }{\varvec{\theta }}^0 \) as \( n \longrightarrow \infty \).
In what follows, \(\widetilde{{\varvec{\theta }}}\) will denote the maximum likelihood estimator of \({\varvec{\theta }}\), unless we state that it denotes an approximate maximum likelihood estimator.
For obtaining asymptotic distributional results about \( \sqrt{n}(\widetilde{{\varvec{\theta }}}- {\varvec{\theta }}^0) \) we will use the notion of weakly approaching sequences of distributions (Belyaev and Sjöstedt-de Luna 2000), which is a generalization of the well-known concept of weak convergence of distributions but without the need to have a limiting distribution. Two sequences of random variables, \( \{X_n\}_{n \ge 1} \) and \( \{Y_n\}_{n \ge 1} \), are said to have weakly approaching distribution laws, \( \{{\mathcal {L}}(X_n)\}_{n \ge 1} \) and \( \{{\mathcal {L}}(Y_n)\}_{n \ge 1} \), if for every bounded continuous function \( \varphi (\cdot ) \), \( \mathrm {E}\,\varphi (X_n) - \mathrm {E}\,\varphi (Y_n) \longrightarrow 0 \) as \(n \longrightarrow \infty \). Further, we say that the sequence of conditional distribution laws \( \{{\mathcal {L}}(X_n \,|\, Z_n)\}_{n \ge 1} \) weakly approaches \( \{{\mathcal {L}}(Y_n)\}_{n \ge 1} \) in probability (along \(Z_n\)) if for every bounded continuous function \( \varphi (\cdot ) \), \( \mathrm {E}\,(\varphi (X_n) \,|\, Z_n) - \mathrm {E}\,\varphi (Y_n) \longrightarrow 0 \) in probability as \(n \longrightarrow \infty \).
Theorem 4
Let assumptions A2, A4, and A6 be true, \( \gamma _2>0 \), and the conditional probabilities \(w_{h|j}\) be known (or strongly consistently estimated). Then the maximum likelihood estimator \(\widetilde{{\varvec{\theta }}}\) exists and the distribution of \( \sqrt{n}(\widetilde{{\varvec{\theta }}}- {\varvec{\theta }}^0) \) weakly approaches \( {\mathcal {N}}(\mathbf {0}, \mathbf {I}^{-1}({\varvec{\theta }}^0)) \) as \( n \longrightarrow \infty \).
The claim of Theorem 4 implies weak convergence, i.e., the limiting distribution of \( \sqrt{n}(\widetilde{{\varvec{\theta }}}- {\varvec{\theta }}^0) \) is multivariate normal with zero mean vector and covariance matrix \(\mathbf {I}^{-1}({\varvec{\theta }}^0)\).
Let \( \mathbf {y}_1, \ldots , \mathbf {y}_n \) be the observed main-stage data. Each data point \(\mathbf {y}_i\) is a vector of size four, where the first two elements represent the endpoints of the interval stated at Qu1 and the last two elements represent the endpoints of the interval stated at \(\hbox {Qu2}\varDelta \). We consider \( \mathbf {y}_1, \ldots , \mathbf {y}_n \) to be values of i.i.d. random variables \( \mathbf {Y}_1, \ldots , \mathbf {Y}_n \). We denote \( \mathbf {Y}_{1:n}=(\mathbf {Y}_1, \ldots , \mathbf {Y}_n) \). Let \( \mathbf {Y}_1^{\star }, \ldots , \mathbf {Y}_n^{\star } \) be i.i.d. random variables taking on the values \( \mathbf {y}_1, \ldots , \mathbf {y}_n \) with probability \(1/n\), i.e., \( \mathbf {Y}_1^{\star }, \ldots , \mathbf {Y}_n^{\star } \) is a random sample with replacement from the original data set \( \{ \mathbf {y}_1, \ldots , \mathbf {y}_n \} \). We say that \( \mathbf {Y}_1^{\star }, \ldots , \mathbf {Y}_n^{\star } \) is a bootstrap sample. Let \(\widetilde{{\varvec{\theta }}}^{\star }\) be the maximum likelihood estimator of \({\varvec{\theta }}\) from the bootstrap sample \( \mathbf {Y}_1^{\star }, \ldots , \mathbf {Y}_n^{\star } \).
Theorem 5
Let assumptions A2, A5, and A6 be true, \( \gamma _2>0 \), and the conditional probabilities \(w_{h|j}\) be known (or strongly consistently estimated). Then the distribution of \( \sqrt{n}(\widetilde{{\varvec{\theta }}}^{\star } - \widetilde{{\varvec{\theta }}}) \,|\, \mathbf {Y}_{1:n} \) weakly approaches the distribution of \( \sqrt{n}(\widetilde{{\varvec{\theta }}}- {\varvec{\theta }}^0) \) in probability as \( n \longrightarrow \infty \).
This result can be applied to construct confidence intervals for \(\theta _r, \; r=1,\ldots ,d\). Let \( G_{\mathrm {boot}}(x) = \mathrm {P}\,\bigl ( n^{1/2}(\widetilde{\theta }^{\star }_r - \widetilde{\theta }_r) \le x \,|\, \mathbf {Y}_{1:n} \bigr ) \). The interval
$$\begin{aligned} \Bigl ( \widetilde{\theta }_r - n^{-1/2} \, G_{\mathrm {boot}}^{-1}(1-\alpha /2), \;\; \widetilde{\theta }_r - n^{-1/2} \, G_{\mathrm {boot}}^{-1}(\alpha /2) \Bigr ) \end{aligned}$$
is an approximate \(1-\alpha \) confidence interval for \(\theta _r\) (hybrid bootstrap confidence interval; see Shao and Tu 1995, p. 140).
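The mechanics of the hybrid interval can be sketched as follows. In this sketch the estimator is a simple sample mean so that the resampling steps are visible; in the setting of the paper, \(\widetilde{{\varvec{\theta }}}^{\star }\) would be the maximum likelihood estimator recomputed on each bootstrap sample:

```python
import math, random

def hybrid_ci(data, alpha=0.05, B=1000, rng=None):
    """Hybrid bootstrap CI based on the roots sqrt(n)(theta* - theta-tilde)."""
    rng = rng or random.Random(1)
    n = len(data)
    est = sum(data) / n                               # stand-in for theta-tilde
    roots = []
    for _ in range(B):
        boot = [rng.choice(data) for _ in range(n)]   # sample with replacement
        est_b = sum(boot) / n
        roots.append(math.sqrt(n) * (est_b - est))
    roots.sort()
    q_hi = roots[int((1 - alpha / 2) * B) - 1]        # G_boot^{-1}(1 - alpha/2)
    q_lo = roots[int((alpha / 2) * B)]                # G_boot^{-1}(alpha/2)
    return est - q_hi / math.sqrt(n), est - q_lo / math.sqrt(n)

rng_data = random.Random(2)
data = [rng_data.gauss(10, 2) for _ in range(200)]    # synthetic sample
low, high = hybrid_ci(data)
print(low < high)  # True
```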
5 Simulation study
We have conducted a simulation study to examine the performance of the proposed methods. The data for the pilot stage and for Qu1 at the main stage are generated in the same way. We describe it for Qu1 to avoid unnecessary notation. In all simulations, the random variables \(X_1, \ldots , X_n\) are independent and have a Weibull distribution:
$$\begin{aligned} F(x) = 1 - \exp \bigl ( -(x/\sigma )^{\nu } \bigr ), \quad x > 0, \end{aligned}$$
where \(\nu =1.5\) and \(\sigma =80\). The Weibull distribution has a flexible shape and is used in various contexts, for example, in contingent valuation studies where people are asked how much they would be willing to pay for a certain nonmarket good (see, e.g., Alberini et al. 2005). Contingent valuation is a natural application area for the sampling schemes considered here because they account for respondent uncertainty.
Let \(U_{1}^{\mathrm {L}}, \ldots , U_{n}^{\mathrm {L}}\) and \(U_{1}^{\mathrm {R}}, \ldots , U_{n}^{\mathrm {R}}\) be sequences of i.i.d. random variables defined below:
where \( M_i \sim \mathrm {Bernoulli}(1/2), \, U_{i}^{(1)} \sim \mathrm {Uniform}(0,20) \), and \( U_{i}^{(2)} \sim \mathrm {Uniform}(20,50) \). Let \( ( L_{1i}, R_{1i} ] \) be the interval stated by the i-th respondent at Qu1. The left endpoints are generated as \( L_{1i} = (X_i - U_{i}^{\mathrm {L}}) \,\mathbb {1}\{X_i - U_{i}^{\mathrm {L}} > 0\} \) rounded downwards to the nearest multiple of 10. The right endpoints are generated as \( R_{1i} = X_i + U_{i}^{\mathrm {R}} \) rounded upwards to the nearest multiple of 10. The data for the follow-up questions Qu2a and Qu2b are generated according to scheme B. The probability that a respondent gives no answer to \(\hbox {Qu2}\varDelta \) is 1/6. All computations were performed in R (R Core Team 2016). The R code can be obtained from the first author upon request.
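The generation of one Qu1 interval can be sketched as below. The offsets are drawn here as plain uniforms on (0, 50), standing in for the Bernoulli-mixture construction of \(U_i^{\mathrm {L}}, U_i^{\mathrm {R}}\); the rounding of endpoints to multiples of 10 follows the design above:

```python
import math, random

def qu1_interval(x, rng):
    """Generate a stated interval (L1, R1] containing the true value x."""
    UL = rng.uniform(0, 50)                 # simplified stand-in offsets
    UR = rng.uniform(0, 50)
    left = max(x - UL, 0.0)
    L1 = 10 * math.floor(left / 10)         # round down to a multiple of 10
    R1 = 10 * math.ceil((x + UR) / 10)      # round up to a multiple of 10
    return L1, R1

rng = random.Random(3)
nu, sigma = 1.5, 80.0
x = sigma * (-math.log(1 - rng.random())) ** (1 / nu)   # Weibull(1.5, 80) draw
L1, R1 = qu1_interval(x, rng)
print(L1 < x <= R1, L1 % 10 == 0, R1 % 10 == 0)  # True True True
```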
It is of interest to investigate to what extent the set of endpoints \( \{ d_j^{\star } \} \) influences the properties of the estimator of \( {\varvec{\theta }}= (\nu , \sigma ) \). For this purpose, we explore three different ways of obtaining the set \( \{ d_j^{\star } \} \), i.e., three variations of scheme B, specified below:
- (i) pilot stage with sample size \(n_0=20\);
- (ii) pilot stage with the same sample size as in the main stage, \(n_0=n\);
- (iii) skipping the pilot stage and instead using a predetermined set of endpoints \( \{ d_j^{\star } \} = \{ 0, 10, 20, \ldots , 300, 320, 340, \ldots , 400 \}\), which is a reasonable set given the rounding to a multiple of 10 and the likely values of \(X_i\).
Under the settings of our simulations, the set \( \{ d_j^{\star } \} \) will on average be smallest in scenario (i) and largest in scenario (iii).
First, we compare the suggested estimator of \( {\varvec{\theta }}\) under the three variations of scheme B and the maximum likelihood estimator when \(X_1, \ldots , X_n\) are observed without censoring (uncensored observation scheme). For each scheme, 40000 samples of different sizes are generated. Table 1 presents the relative bias and the root mean square error over the simulations. If \(\widetilde{\nu }\) is an estimator of \(\nu \), the relative bias of \(\widetilde{\nu }\) is defined as \(\text{ rb }(\widetilde{\nu }) = 100\,\text{ bias }(\widetilde{\nu })/\nu \). The root mean square error is of roughly the same magnitude in each of the three scenarios for obtaining the set of endpoints \( \{ d_j^{\star } \} \). However, for \(n=1000\), the bias is smallest when the set of endpoints is largest. This indicates that the set \( \{ d_j^{\star } \} \) should not be too small (ideally, the set would contain all endpoints that future respondents will give). As expected, the error with the uncensored scheme is lower; however, the difference is small. The bias is fairly close to zero with all schemes. Analogous simulations for scheme A displayed comparable results with a slightly higher root mean square error. We also conducted similar simulations with the scheme suggested in Angelov and Ekström (2017), which showed a larger bias, e.g., for \(n_0=n=100\), \(\text{ rb }(\widetilde{\nu }) = 6.7\), and for \(n_0=n=1000\), \(\text{ rb }(\widetilde{\nu }) = 1.7\), while with schemes A and B of the current paper, \(\text{ rb }(\widetilde{\nu }) < 1\) in each of the cases studied. This bias can be attributed to the exclusion of respondents in the former scheme. For the sake of brevity, the detailed simulation results for scheme A and the scheme of Angelov and Ekström (2017) are omitted.
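The summary measures reported in the tables can be computed as in the following sketch (with made-up estimates):

```python
import math

# Relative bias in percent, rb = 100 * bias / true_value, and root mean
# square error over a set of simulated estimates of a parameter.
def relative_bias(estimates, true_value):
    bias = sum(estimates) / len(estimates) - true_value
    return 100 * bias / true_value

def rmse(estimates, true_value):
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))

est = [1.4, 1.5, 1.6, 1.55]                 # hypothetical estimates of nu = 1.5
print(round(relative_bias(est, 1.5), 2))    # 0.83
print(round(rmse(est, 1.5), 4))             # 0.075
```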
In addition, we have performed simulations to examine potential bias due to wrongly assuming that \(w_{h|j}\) does not depend on j. This assumption implies noninformative censoring, and in this case the likelihood will be proportional to
$$\begin{aligned} \prod _{i=1}^{n} \bigl ( F(b_i) - F(a_i) \bigr ), \end{aligned}$$
where \((a_i, b_i]\) is the last interval stated by respondent i in the series of questions Qu1, \(\hbox {Qu2}\varDelta \) (cf. Sun 2006, p. 28). We compare the estimator suggested in this paper with an estimator assuming noninformative censoring, obtained by maximizing the likelihood (6). For generating data we use the model stated above with \( M_i \sim \mathrm {Bernoulli}(1/100) \) in (5). This model corresponds to a specific respondent behavior: at Qu1 respondents tend to choose an interval whose right half contains the true value. The estimator assuming noninformative censoring has been applied both to the full data (Qu1 and \(\hbox {Qu2}\varDelta \)) and to the data from Qu1 only. Table 2 displays the relative bias and the root mean square error of the estimators based on 40000 simulated samples of sizes \(n=100\) and \(n=1000\), with scheme variation B(ii). For \(n=1000\), when using the full data, the bias of the estimator assuming noninformative censoring is substantially greater than the bias of our estimator, and the same holds for the root mean square error. The results for \(n=100\) indicate that when using the full data, the bias of our estimator is slightly greater for \(\nu \) and smaller for \(\sigma \) than that of the other estimator. Still, with our estimator the estimated distribution function more closely resembles the true distribution. If only the data from Qu1 are used, the bias under the assumption of noninformative censoring is considerably larger for both sample sizes.
Finally, we compare the performance of the bootstrap confidence intervals (4) and the confidence intervals constructed using normal approximation (see Theorem 4). Table 3 shows results based on 1500 simulated samples of sizes \(n=100\) and \(n=1000\) using scheme variation B(iii); the confidence level is 0.95. Each bootstrap confidence interval is calculated from 1000 bootstrap samples. For both sample sizes, the bootstrap confidence intervals have coverage and length similar to those of the confidence intervals based on normal approximation.
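A generic percentile-type bootstrap confidence interval can be sketched as follows. This is our own minimal Python illustration for the sample mean of simulated data; the paper's interval (4) instead bootstraps the maximum likelihood estimator from the interval-censored answers:

```python
import numpy as np

def bootstrap_ci(sample, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a scalar parameter.

    Each replicate resamples the data with replacement and re-applies
    the point estimator; the interval endpoints are the empirical
    quantiles of the bootstrap estimates.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    boot = np.array([
        estimator(rng.choice(sample, size=sample.size, replace=True))
        for _ in range(n_boot)
    ])
    alpha = 1.0 - level
    return np.quantile(boot, alpha / 2), np.quantile(boot, 1 - alpha / 2)

# Toy usage: interval for the mean of a simulated sample
rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=100)
lo, hi = bootstrap_ci(data, np.mean)
print(lo < data.mean() < hi)
```

As in the paper's Table 3, 1000 bootstrap samples per interval is a typical choice.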
6 Conclusion
We considered two schemes (A and B) for collecting self-selected interval data that extend sampling schemes studied previously in the literature. Under general assumptions, we proved the existence, consistency, and asymptotic normality of a proposed parametric maximum likelihood estimator. In comparison with the scheme used in a previous paper (Angelov and Ekström 2017), the new schemes do not involve exclusion of respondents, which leads to a smaller bias of the estimator, as indicated by our simulation study. Furthermore, the simulations showed that the estimator performs well compared to the maximum likelihood estimator for uncensored observations. It should be noted that the censoring in this case is imposed by the design of the question. A design allowing uncensored observations might introduce bias in the estimation if respondents are asked a question that is difficult to answer with an exact amount (e.g., number of hours spent on the internet) and they give a rough best guess. We also demonstrated via simulations that ignoring informative censoring can lead to bias. We presented a bootstrap procedure for constructing confidence intervals that is easier to apply than confidence intervals based on asymptotic normality, where, e.g., the derivatives of the log-likelihood need to be calculated. According to our simulations, the two approaches yield similar results in terms of coverage and length of the confidence intervals. Finally, it would be of interest in future research to develop a test for assessing the goodness of fit of a parametric model.
References
Alberini, A., Rosato, P., Longo, A., Zanatta, V.: Information and willingness to pay in a contingent valuation study: the value of S. Erasmo in the Lagoon of Venice. J. Environ. Plan. Manag. 48(2), 155–175 (2005)
Angelov, A.G., Ekström, M.: Nonparametric estimation for self-selected interval data collected through a two-stage approach. Metrika 80(4), 377–399 (2017)
Athreya, K.B., Ghosh, M., Low, L.Y., Sen, P.K.: Laws of large numbers for bootstrapped U-statistics. J. Stat. Plan. Inference 9(2), 185–194 (1984)
Belyaev, Y., Kriström, B.: Approach to analysis of self-selected interval data. Working Paper 2010:2, CERE, Umeå University and the Swedish University of Agricultural Sciences (2010). https://doi.org/10.2139/ssrn.1582853
Belyaev, Y., Kriström, B.: Two-step approach to self-selected interval data in elicitation surveys. Working Paper 2012:10, CERE, Umeå University and the Swedish University of Agricultural Sciences (2012). https://doi.org/10.2139/ssrn.2071077
Belyaev, Y., Kriström, B.: Analysis of survey data containing rounded censoring intervals. Inf. Appl. 9(3), 2–16 (2015)
Belyaev, Y., Nilsson, L.: Parametric maximum likelihood estimators and resampling. Research Report 1997-15, Department of Mathematical Statistics, Umeå University (1997). http://www.diva-portal.org/smash/get/diva2:709550/FULLTEXT01.pdf
Belyaev, Y., Sjöstedt-de Luna, S.: Weakly approaching sequences of random distributions. J. Appl. Probab. 37(3), 807–822 (2000)
Chambers, R.L., Steel, D.G., Wang, S., Welsh, A.: Maximum Likelihood Estimation for Sample Surveys. CRC Press, Boca Raton (2012)
Collett, D.: Modelling Survival Data in Medical Research. Chapman & Hall, London (1994)
Cox, C., Chu, H., Schneider, M.F., Muñoz, A.: Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Stat. Med. 26(23), 4352–4374 (2007)
Finkelstein, D.M., Goggins, W.B., Schoenfeld, D.A.: Analysis of failure time data with dependent interval censoring. Biometrics 58(2), 298–304 (2002)
Furnham, A., Boo, H.C.: A literature review of the anchoring effect. J. Socio-Econ. 40(1), 35–42 (2011)
Gentleman, R., Geyer, C.J.: Maximum likelihood for interval censored data: consistency and computation. Biometrika 81(3), 618–623 (1994)
McFadden, D.L., Bemmaor, A.C., Caro, F.G., Dominitz, J., Jun, B.H., Lewbel, A., Matzkin, R.L., Molinari, F., Schwarz, N., Willis, R.J., Winter, J.K.: Statistical analysis of choice experiments and surveys. Mark. Lett. 16(3–4), 183–196 (2005)
Peto, R.: Experimental survival curves for interval-censored data. J. R. Stat. Soc. C Appl. Stat. 22(1), 86–91 (1973)
Press, S.J., Tanur, J.M.: An overview of the respondent-generated intervals (RGI) approach to sample surveys. J. Mod. Appl. Stat. Methods 3(2), 288–304 (2004a)
Press, S.J., Tanur, J.M.: Relating respondent-generated intervals questionnaire design to survey accuracy and response rate. J. Off. Stat. 20(2), 265–287 (2004b)
R Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (2016)
Rao, C.R.: Linear Statistical Inference and Its Applications, 2nd edn. Wiley, New York (1973)
Roussas, G.: An Introduction to Measure-Theoretic Probability, 2nd edn. Academic Press, Boston (2014)
Shao, J., Tu, D.: The Jackknife and Bootstrap. Springer, New York (1995)
Shardell, M., Scharfstein, D.O., Bozzette, S.A.: Survival curve estimation for informatively coarsened discrete event-time data. Stat. Med. 26(10), 2184–2202 (2007)
Sun, J.: The Statistical Analysis of Interval-Censored Failure Time Data. Springer, New York (2006)
Turnbull, B.W.: The empirical distribution function with arbitrarily grouped, censored and truncated data. J. R. Stat. Soc. B Stat. Methodol. 38(3), 290–295 (1976)
Van Exel, N., Brouwer, W., Van Den Berg, B., Koopmanschap, M.: With a little help from an anchor: discussion and evidence of anchoring effects in contingent valuation. J. Socio-Econ. 35(5), 836–853 (2006)
Zhang, Z., Sun, J.: Interval censoring. Stat. Methods Med. Res. 19(1), 53–70 (2010)
Acknowledgements
The authors would like to thank Maria Karlsson, Marie Wiberg, Philip Fowler, and two anonymous reviewers for their valuable comments which helped to improve this paper.
Appendix
We first introduce some notation and expressions for the log-likelihood that will be used henceforth. Let us denote by \(n', n''\), and \(n'''\) the number of respondents who gave an answer of type 1, 2, and 3, respectively, and let \(n_{\bullet j}\) be the number of respondents who stated \(\mathbf {v}_j\) at \(\hbox {Qu2}\varDelta \). The following are satisfied:
From (1) we have
where \( c_2 = (1/n)(c_1 + \sum _{h,j} n_{hj} \log w_{h|j}) \). Using the notations
we can write the log-likelihood (7) in a more compact way:
Taking into account that \( q_j = q_j({\varvec{\theta }}) \), the log-likelihood (8) may also be written as follows
where \( w_h({\varvec{\theta }}) = \sum _{j\in {\mathcal {J}}_{\scriptstyle h}} w_{h|j} \, q_j({\varvec{\theta }}) \) and \( w_{h*s}({\varvec{\theta }}) = \sum _{j\in {\mathcal {J}}_{\scriptstyle s}} w_{h|j} \, q_j({\varvec{\theta }}) \).
Lemma 1
(Information inequality) Let \(\sum _i a_i\) and \(\sum _i b_i\) be convergent series of positive numbers such that \( \sum _i a_i \ge \sum _i b_i \). Then
with strict inequality if \(a_i \ne b_i\) for at least one i.
A proof of Lemma 1 can be found in Rao (1973, p. 58).
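The display in Lemma 1 is omitted in our copy of the text; in Rao (1973, p. 58) the stated conclusion is the information inequality \( \sum _i a_i \log (a_i/b_i) \ge 0 \) (quoted here from Rao as an assumption about the missing display). A quick numerical check, with made-up series:

```python
import math

def info_ineq_lhs(a, b):
    """Sum of a_i * log(a_i / b_i), the left-hand side of the
    information inequality in Rao (1973, p. 58)."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

a = [0.5, 0.3, 0.2]      # positive, sums to 1
b = [0.25, 0.25, 0.5]    # positive, sums to 1, differs from a
print(info_ineq_lhs(a, b) > 0)           # strict inequality since a != b
print(info_ineq_lhs(a, a) == 0.0)        # equality when a_i = b_i for all i
```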
Proof of Theorem 1
Let us consider the case when the conditional probabilities \(w_{h|j}\) are known. By convention, we define \( 0 \log 0 = 0 \) and \( 0 \log (a/0) = 0 \) on the basis that \( \lim _{x \downarrow 0} \, x \log x = 0\) and \( \lim _{x \downarrow 0} \, x \log (a/x) = 0\) for \(a>0\). Using (2) and Lemma 1, we get
From the strong law of large numbers (SLLN) it follows that
as \( n \longrightarrow \infty \). Combining (10) and (11) yields
as \( n \longrightarrow \infty \). From this and Lemma 1 it follows that
Using (11) and the assumption \( \gamma _2>0 \), we get
This, together with assumption A2, implies that \( \widetilde{{\varvec{\theta }}} \;\overset{{{\mathrm{a.s.}}}}{\longrightarrow }{\varvec{\theta }}^0 \), which is what we had to prove.
In the case when a strongly consistent estimator of \(w_{h|j}\) is inserted into the log-likelihood, the proof follows the same lines. \(\square \)
Proof of Theorem 2
We give a proof for the case when \(w_{h|j}\) are known; the proof for the case when \(w_{h|j}\) are strongly consistently estimated is similar and thus omitted. From Lemma 1 we have
and from assumption A2 we deduce that for every \({\varvec{\theta }}\in \{ {\varvec{\theta }}: \Vert {\varvec{\theta }}- {\varvec{\theta }}^0 \Vert > \delta \}\),
Combining the above inequalities and using that \( \gamma _2>0 \), we obtain that for every \({\varvec{\theta }}\in \)\(\{ {\varvec{\theta }}: \Vert {\varvec{\theta }}- {\varvec{\theta }}^0 \Vert > \delta \}\),
It then follows, using (11), that for every \({\varvec{\theta }}\in \{ {\varvec{\theta }}: \Vert {\varvec{\theta }}- {\varvec{\theta }}^0 \Vert > \delta \}\) and for large enough n,
or equivalently, \(\log L({\varvec{\theta }}^0) > \log L({\varvec{\theta }})\). Therefore,
From this and assumption A3, it follows that \(L({\varvec{\theta }})\) is continuous and its supremum over \(\Theta \) is attained at some point \(\widetilde{{\varvec{\theta }}}\) within the set \(\{ {\varvec{\theta }}: \Vert {\varvec{\theta }}- {\varvec{\theta }}^0 \Vert \le \delta \}\). Because \(\delta \) is arbitrary, \( \widetilde{{\varvec{\theta }}}\;\overset{{{\mathrm{a.s.}}}}{\longrightarrow }{\varvec{\theta }}^0 \) as \( n \longrightarrow \infty \). \(\square \)
Proof of Theorem 3
We give a proof for the case when \(w_{h|j}\) are known; the proof for the case when \(w_{h|j}\) are strongly consistently estimated follows the same lines. Similarly to Rao (1973, p. 361), let us consider the function
on the set \(\{ {\varvec{\theta }}: \Vert {\varvec{\theta }}- {\varvec{\theta }}^0 \Vert \le \delta \}\), which is a neighborhood of \({\varvec{\theta }}^0\). Note that for \(\delta \) small enough, \(\{ {\varvec{\theta }}: \Vert {\varvec{\theta }}- {\varvec{\theta }}^0 \Vert \le \delta \} \subseteq \Theta \). By the continuity assumption A4, the infimum of (12) over the set \(\{ {\varvec{\theta }}: \Vert {\varvec{\theta }}- {\varvec{\theta }}^0 \Vert = \delta \}\) is attained, and then by assumption A1 and Lemma 1 there exists \(\varepsilon >0\) such that
It then follows, using (11), that for large enough n,
Hence, for every \({\varvec{\theta }}\in \{ {\varvec{\theta }}: \Vert {\varvec{\theta }}- {\varvec{\theta }}^0 \Vert = \delta \}\) and as \( n \longrightarrow \infty \), we have
or equivalently, \(\log L({\varvec{\theta }}^0) > \log L({\varvec{\theta }})\), which implies that \(\log L({\varvec{\theta }})\) has a local maximum at some point \({\varvec{{\bar{\theta }}}}\) within the set \(\{ {\varvec{\theta }}: \Vert {\varvec{\theta }}- {\varvec{\theta }}^0 \Vert < \delta \}\). Assumption A4 implies that \(\log L({\varvec{\theta }})\) has partial derivatives and therefore \({\varvec{{\bar{\theta }}}}\) is a root of the system of likelihood equations (3). Because \(\delta \) is arbitrary, \( {\varvec{{\bar{\theta }}}} \;\overset{{{\mathrm{a.s.}}}}{\longrightarrow }{\varvec{\theta }}^0 \) as \( n \longrightarrow \infty \). \(\square \)
Proof of Theorem 4
Let us consider the case when \(w_{h|j}\) are known; the case when \(w_{h|j}\) are strongly consistently estimated can be treated similarly. The existence of the maximum likelihood estimator \(\widetilde{{\varvec{\theta }}}\) follows from Theorem 2. The log-likelihood (9) can be expressed as follows
where \(\widehat{\pi }_a\) is a relative frequency, \( \pi _a({\varvec{\theta }}) \) is an expression of the form \( \sum _{j \in {\mathcal {J}}^{\scriptstyle (a)}} w_{h|j} \, q_j({\varvec{\theta }}) \), and \({\mathcal {J}}^{\scriptstyle (a)}\) is an index set. The continuity of the derivatives of \(q_j({\varvec{\theta }})\) implies the continuity of the derivatives of \(\pi _a({\varvec{\theta }})\). Thus, the proof of asymptotic normality follows the same lines as that of proposition (iv) in Rao (1973, p. 361). \(\square \)
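Theorem 4 is what justifies the normal-approximation confidence intervals compared in the simulation study. For a scalar illustration of how a Wald-type interval is formed from the inverse Fisher information, consider the Bernoulli model (our own toy example, not the paper's model):

```python
import math

def wald_ci_bernoulli(k, n, z=1.96):
    """Approximate 95% Wald interval for a Bernoulli success probability.

    The MLE is p_hat = k/n; the Fisher information for n observations is
    n / (p(1-p)), so the standard error is sqrt(p_hat*(1-p_hat)/n).
    """
    p_hat = k / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

lo, hi = wald_ci_bernoulli(60, 100)
print(round(lo, 3), round(hi, 3))  # 0.504 0.696
```

In the paper's setting, the information matrix is built from the derivatives of the log-likelihood (9), which is why the bootstrap alternative is easier to apply in practice.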
For proving Theorem 5 we will consider assumptions B1–B4 stated below. Recall that the contribution of the i-th respondent to the log-likelihood is denoted \(\,\mathrm {llik}_i({\varvec{\theta }})\), given by
- B1: The partial derivatives
$$\begin{aligned} \frac{\partial \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r}, \quad \frac{\partial ^2 \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r \partial \theta _{\ell }}, \quad r,\ell = 1,\ldots ,d , \end{aligned}$$
exist and are continuous functions of \({\varvec{\theta }}\in \Theta \).
- B2: For each \( {\varvec{\theta }}\in \Theta \), there exist \( K_1({\varvec{\theta }}), K_2({\varvec{\theta }}) \in \mathbb {R} \) such that
$$\begin{aligned} \mathrm {E}\,_{{\varvec{\theta }}} \biggl | \frac{\partial \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r} \biggr |^3&\le K_1({\varvec{\theta }}) , \quad r=1,\ldots ,d ,\\ \mathrm {E}\,_{{\varvec{\theta }}} \biggl | \frac{\partial ^2 \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r \partial \theta _{\ell }} \biggr |^3&\le K_2({\varvec{\theta }}) , \quad r,\ell = 1,\ldots ,d . \end{aligned}$$
- B3: For each \( {\varvec{\theta }}\in \Theta \) and for every \(\delta > 0\), there exists \(\varepsilon (\delta ,{\varvec{\theta }})\) such that for \(\varepsilon \le \varepsilon (\delta ,{\varvec{\theta }})\),
$$\begin{aligned} \mathrm {E}\,_{{\varvec{\theta }}} \Biggl ( \sup _{\Vert {\varvec{\theta }}-{\varvec{\theta }}' \Vert <\varepsilon } \biggl | \frac{\partial ^2 \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r \partial \theta _{\ell }} - \frac{\partial ^2 \,\mathrm {llik}_i({\varvec{\theta }}')}{\partial \theta _r \partial \theta _{\ell }} \biggr | \Biggr ) \le \delta . \end{aligned}$$
- B4: For each \( {\varvec{\theta }}\in \Theta \),
$$\begin{aligned} \mathrm {E}\,_{{\varvec{\theta }}} \biggl ( \frac{\partial \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r} \biggr )&= 0, \quad r=1,\ldots ,d , \\ \mathrm {E}\,_{{\varvec{\theta }}} \biggl ( \frac{\partial \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r} \,\frac{\partial \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _{\ell }} \biggr )&= - \mathrm {E}\,_{{\varvec{\theta }}} \biggl ( \frac{\partial ^2 \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r \partial \theta _{\ell }} \biggr ) , \quad r,\ell = 1,\ldots ,d . \end{aligned}$$
Lemma 2
If assumption A5 is satisfied and the conditional probabilities \(w_{h|j}\) are known (or strongly consistently estimated), then assumptions B1–B4 hold true.
Proof of Lemma 2
Assumption B1. From the continuity of the first- and second-order partial derivatives of \(q_j({\varvec{\theta }})\), it follows that \( \frac{\partial \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r} \) and \( \frac{\partial ^2 \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r \partial \theta _{\ell }} \) are also continuous.
Assumption B2. Because we have an experiment with a finite number of outcomes, the respective expectations can be expressed as finite sums and are therefore finite.
Assumption B3. From assumption B1, we have that \( \frac{\partial ^2 \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r \partial \theta _{\ell }} \) is continuous on the compact set \(\Theta \), implying that it is uniformly continuous on \(\Theta \). Therefore
Using Lebesgue’s dominated convergence theorem (see, e.g., Roussas 2014, p. 75), we get
which is what we had to prove.
Assumption B4. The identities in this assumption follow from the fact that we have an experiment with a finite number of outcomes and thus the respective expectations can be expressed as finite sums. Indeed, if Y is a random variable that can take a finite number of possible values and \( \mathrm {P}\,(Y=y) = p_{{\varvec{\theta }}}(y) \), then
The same argument leads to the second identity. \(\square \)
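The finite-sum argument for the first identity in B4 can be spelled out as follows. Writing \(p_{{\varvec{\theta }}}(y)\) for the probability of outcome \(y\) and taking \(\,\mathrm {llik}_i({\varvec{\theta }}) = \log p_{{\varvec{\theta }}}(Y)\) (our notation for this sketch),
$$\begin{aligned} \mathrm {E}\,_{{\varvec{\theta }}} \biggl ( \frac{\partial \,\mathrm {llik}_i({\varvec{\theta }})}{\partial \theta _r} \biggr ) = \sum _y \frac{1}{p_{{\varvec{\theta }}}(y)} \, \frac{\partial p_{{\varvec{\theta }}}(y)}{\partial \theta _r} \, p_{{\varvec{\theta }}}(y) = \frac{\partial }{\partial \theta _r} \sum _y p_{{\varvec{\theta }}}(y) = \frac{\partial }{\partial \theta _r} \, 1 = 0 , \end{aligned}$$
where exchanging the derivative and the sum is immediate because the sum is finite.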
Lemma 3
If assumptions A2 and A3 are satisfied, \( \gamma _2>0 \), and the conditional probabilities \(w_{h|j}\) are known (or strongly consistently estimated), then \(\widetilde{{\varvec{\theta }}}^{\star }\) exists and \( \widetilde{{\varvec{\theta }}}^{\star } \overset{{{\mathrm{a.s.}}}}{\longrightarrow }{\varvec{\theta }}^0 \) as \( n \longrightarrow \infty \).
Proof of Lemma 3
The proof follows the same arguments as that of Theorem 2 but, instead of the classical SLLN, the strong law of large numbers for the bootstrapped mean (see, e.g., Athreya et al. 1984) is used. \(\square \)
We now present a general result about bootstrapping maximum likelihood estimators that is used in the proof of Theorem 5. Let \( z_1, \ldots , z_n \) be observed values of i.i.d. random variables \( Z_1, \ldots , Z_n \) whose distribution depends on some unknown parameter \({\varvec{\theta }}= (\theta _1, \ldots , \theta _d) \in \Theta \subseteq \mathbb {R}^d\). The contribution of the i-th observation to the log-likelihood is denoted \(\,\mathrm {llik}_i({\varvec{\theta }})\). Let \( Z_{1:n}=(Z_1, \ldots , Z_n) \) and \(\widetilde{{\varvec{\theta }}}\) be the maximum likelihood estimator of \({\varvec{\theta }}\). We define \( Z_1^{\star }, \ldots , Z_n^{\star } \) to be i.i.d. random variables taking on the values \( z_1, \ldots , z_n \) each with probability \(1/n\), i.e., \( Z_1^{\star }, \ldots , Z_n^{\star } \) is a bootstrap sample. Let \(\widetilde{{\varvec{\theta }}}^{\star }\) be the maximum likelihood estimator of \({\varvec{\theta }}\) from the bootstrap sample \( Z_1^{\star }, \ldots , Z_n^{\star } \).
Lemma 4
Suppose that

- (i) assumptions A6 and B1–B4 are true;
- (ii) the estimator \(\widetilde{{\varvec{\theta }}}\) exists and \( \widetilde{{\varvec{\theta }}} \overset{{{\mathrm{a.s.}}}}{\longrightarrow }{\varvec{\theta }}^0 \) as \( n \longrightarrow \infty \);
- (iii) the estimator \(\widetilde{{\varvec{\theta }}}^{\star }\) exists and \( \widetilde{{\varvec{\theta }}}^{\star } \overset{{{\mathrm{a.s.}}}}{\longrightarrow }{\varvec{\theta }}^0 \) as \( n \longrightarrow \infty \).

Then the distribution of \( \sqrt{n}(\widetilde{{\varvec{\theta }}}^{\star } - \widetilde{{\varvec{\theta }}}) \,|\, Z_{1:n} \) weakly approaches the distribution of \( \sqrt{n}(\widetilde{{\varvec{\theta }}}- {\varvec{\theta }}^0) \) in probability as \( n \longrightarrow \infty \).
For a proof of Lemma 4, see Belyaev and Nilsson (1997, Corollary 3).
Proof of Theorem 5
The idea of the proof is to show that the conditions of Lemma 4 are fulfilled. Using the fact that assumption A5 implies assumption A3, and combining the results of Theorem 2, Lemma 2, and Lemma 3, we see that these conditions are satisfied. Thus, the assertion of Theorem 5 follows directly. \(\square \)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Angelov, A.G., Ekström, M. Maximum likelihood estimation for survey data with informative interval censoring. AStA Adv Stat Anal 103, 217–236 (2019). https://doi.org/10.1007/s10182-018-00329-x