Nonparametric estimation for self-selected interval data collected through a two-stage approach
Abstract
Self-selected interval data arise in questionnaire surveys when respondents are free to answer with any interval without having pre-specified ranges. This type of data is a special case of interval-censored data in which the assumption of noninformative censoring is violated, and thus the standard methods for interval-censored data (e.g. Turnbull’s estimator) are not appropriate because they can produce biased results. Based on a certain sampling scheme, this paper suggests a nonparametric maximum likelihood estimator of the underlying distribution function. The consistency of the estimator is proven under general assumptions, and an iterative procedure for finding the estimate is proposed. The performance of the method is investigated in a simulation study.
Keywords
Informative interval censoring · Self-selected intervals · Nonparametric maximum likelihood estimation · Two-stage data collection · Questionnaire surveys
1 Introduction
When asked about a quantity, people often answer with an interval if they are not certain. For example, when asked about the distance to a given town, one might say "it is about 60–70 km". This is one of the reasons why, in questionnaire surveys, respondents are often allowed to answer a quantitative question with an interval. One common question format is the so-called range card, where the respondent is asked to select from several pre-specified intervals (called "brackets"). Another approach is known as unfolding brackets. In this case the respondent is asked a sequence of yes-no questions that narrow down the range in which the respondent's true value lies. For example, the respondent is first asked "In the past year, did your household spend less than 500 EUR on electrical items?". If the answer is "yes", the next question asks whether they spent more than 400 EUR. If the response to the first question is "no", the next question asks whether they spent less than 600 EUR, and so on. Unfolding brackets can be designed so that they elicit the same information as a range-card question. These formats are often used for sensitive questions, e.g. about income, because they allow partial information to be obtained from respondents who are unwilling to provide exact amounts.
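The unfolding-brackets sequence can be viewed as a binary search over the bracket boundaries. The sketch below is illustrative only: the boundary values are hypothetical, and mapping a value that falls exactly on a boundary to the lower bracket is our assumption, chosen to match the half-open \((L,R\,]\) convention used later in the paper.

```python
def unfolding_bracket(true_value, boundaries):
    """Locate the bracket (boundaries[i], boundaries[i+1]] containing
    true_value via a sequence of yes/no threshold questions."""
    lo, hi = 0, len(boundaries) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Ask: "Is your value at most boundaries[mid]?"
        if true_value <= boundaries[mid]:
            hi = mid
        else:
            lo = mid
    return (boundaries[lo], boundaries[hi])

# Illustrative brackets around the 500 EUR example in the text
brackets = [0, 400, 500, 600, 1000]
print(unfolding_bracket(470, brackets))  # (400, 500)
```

With the boundaries above, at most two questions identify the same bracket a five-option range card would.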
However, there are some issues associated with these approaches. Studies have found that the choice of bracket values in range-card questions is likely to influence responses. This is known as the bracketing effect or range bias (see, e.g., McFadden et al. 2005; Whynes et al. 2004). In questions about usage frequency (e.g. “How many hours per day do you spend on the internet?”), respondents might assume that the range of response alternatives represents a range of “expected” behaviors. Thus, they seem reluctant to report behaviors that are “extreme”, i.e. the bottom and top brackets (see Schwarz et al. 1985). The unfolding brackets format is susceptible to the so-called anchoring effect (see, e.g., Furnham and Boo 2011; Van Exel et al. 2006), i.e. answers are biased toward the starting value (500 EUR in the example above). Respondents might perceive the initial value as representing a reasonable value of the quantity in question. It serves as an “anchor” or reference point, and respondents adjust their answer to be closer to the anchor than the estimate they had before seeing the question.
It is intuitively plausible that bracketing and anchoring effects would be avoided if the respondent is free to state any interval without having any hints like pre-specified values, in other words, if the question is open-ended. One such format is called respondent-generated intervals, proposed and investigated by Press and Tanur (see, e.g., Press and Tanur 2004a, b and the references therein). In this approach the respondent is asked to provide both a point value (a best guess for the true value) and an interval (a lower and an upper bound) to a question. They used hierarchical Bayesian methods to obtain point estimates and credibility intervals that are based on both the point values and the intervals.
Related to the respondent-generated intervals approach is the self-selected interval (SSI) approach suggested by Belyaev and Kriström (2010), where the respondent is free to provide any interval containing his/her true value. They proposed a maximum likelihood estimator of the underlying distribution based on SSI data. However, this estimator relies on certain restrictive assumptions on some nuisance parameters. To avoid such assumptions, Belyaev and Kriström (2012, 2015) introduced a novel two-stage approach. In the first stage of data collection (we will call it the pilot stage), respondents are asked to state single self-selected intervals. In the second stage (the main stage), each respondent from a new sample is asked two questions: (i) to provide a SSI and then (ii) to select from several sub-intervals of the SSI the one that most likely contains his/her true value. The sub-intervals in the second question of the main stage are generated from the SSIs collected in the pilot stage. Belyaev and Kriström (2012, 2015) developed a nonparametric maximum likelihood estimator of the underlying distribution for two-stage SSI data.
Data consisting of self-selected intervals or respondent-generated intervals (without the point values) are a special case of interval-censored data. Let X be a random variable of interest. An observation on X is interval-censored if, instead of observing X exactly, only an interval \((L,R\,]\) is observed, where \(L < X \le R\). Interval censoring also contains right censoring and left censoring as special cases, and if \(R=\infty \), the observation is right-censored, while if \(L=-\infty \) the observation is left-censored (see, e.g., Zhang and Sun 2010). Interval-censored data are encountered most commonly when the observed variable is the time to some event (known as time-to-event data, failure time data, survival data, or lifetime data). The problem of analyzing time-to-event data appears in many areas such as medicine, epidemiology, engineering, economics, and demography.
With regard to statistical analysis of interval-censored data, Peto (1973) considered nonparametric maximum likelihood estimation and employed a constrained Newton-Raphson algorithm. Turnbull (1976) extended the work of Peto to allow for truncation and suggested a self-consistency algorithm. Considering the case of no truncation, Gentleman and Geyer (1994) provided conditions under which Turnbull’s estimator is indeed a maximum likelihood estimator and is unique. All these methods rely on the assumption of noninformative censoring, which implies that the joint distribution of L and R contains no parameters that are involved in the distribution function of X and therefore does not contribute to the likelihood function (see, e.g., Sun 2006). In the sampling schemes considered by Belyaev and Kriström (2010, 2012, 2015) this is not a reasonable assumption, thus the standard methods are not appropriate. The existing methods for analysis of time-to-event data in the presence of informative interval censoring require modeling the censoring process and estimating nuisance parameters (see Finkelstein et al. 2002) or making additional assumptions about the censoring process (see Shardell et al. 2007). These estimators are specific for time-to-event data and are not directly applicable in the context that we are discussing.
In this paper, we extend the work of Belyaev and Kriström (2012, 2015) by considering a sampling scheme where the number of sub-intervals in the second question of the main stage is limited to two or three, which is motivated by the fact that a question with a large number of sub-intervals might be difficult to implement in practice (e.g., in a telephone interview). In Sect. 2, we describe the sampling scheme. Section 3 introduces the statistical model. In Sect. 4, a nonparametric maximum likelihood estimator of the underlying distribution is proposed, and some of its properties are established. In Sect. 5, the results of a simulation study are presented, and Sect. 6 concludes the paper. Proofs and auxiliary results are given in the Appendix.
2 Sampling scheme
We consider the following two-stage scheme for collecting data. In the pilot stage, a random sample of \(n_0\) individuals is selected and each individual is asked to state an interval containing his/her value of the quantity of interest. It is assumed that the endpoints of the intervals are rounded, for example, to the nearest integer or to the nearest multiple of 10. Thus, instead of (21.3, 47.8] respondents will answer with (21, 48] or (20, 50].
Let \( d_0< d_1< \ldots< d_{k-1} < d_k \) be the endpoints of all observed intervals. The set \( \{ d_0, \ldots , d_k \} \) can be seen as a set of typical endpoints. The data collected in the pilot stage are used only for constructing the set \( \{ d_0, \ldots , d_k \} \), which is then needed for the main stage. If a similar survey is conducted again, a new pilot stage is not necessary: the data from the previous survey can be used for constructing \( \{ d_0, \ldots , d_k \} \).
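As a small illustration of the pilot stage, the set of typical endpoints is simply the sorted set of distinct endpoints of the stated intervals. The intervals below are made up for illustration, with endpoints rounded as assumed above:

```python
# Hypothetical pilot-stage intervals (l, r] with rounded endpoints
pilot_intervals = [(20, 50), (21, 48), (0, 30), (30, 60), (50, 80)]

# The set of typical endpoints {d_0, ..., d_k}: sorted distinct endpoints
endpoints = sorted({e for interval in pilot_intervals for e in interval})
print(endpoints)  # [0, 20, 21, 30, 48, 50, 60, 80]
```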
In the main stage, a new random sample of individuals is selected and each individual is asked to state an interval containing his/her value of the quantity of interest. We refer to this first question as Qu1. If the interval has endpoints that do not belong to \( \{ d_0, \ldots , d_k \} \), we exclude the respondent from the collected data. If the endpoints of the stated interval belong to \( \{ d_0, \ldots , d_k \} \), then the interval is split into two or three sub-intervals with endpoints from \( \{ d_0, \ldots , d_k \} \) and the respondent is asked to select one of these sub-intervals (the points of split are chosen in some random fashion; for details see Sect. 3). We refer to this second question as Qu2. The respondent may refuse to answer Qu2, and this will be allowed for.
The answers collected in the main stage are thus of three types:
- Type 1: \( \; ( \mathbf {u}_h; \text{ NA }) \), when the respondent stated interval \(\mathbf {u}_h\) at Qu1 and refused to answer Qu2;
- Type 2: \( \; ( \mathbf {u}_h; \mathbf {v}_j ) \), when the respondent stated interval \(\mathbf {u}_h\) at Qu1 and \(\mathbf {v}_j\) at Qu2, where \( \mathbf {v}_j \subseteq \mathbf {u}_h \);
- Type 3: \( \; ( \mathbf {u}_h; \mathbf {u}_s ) \), when the respondent stated interval \(\mathbf {u}_h\) at Qu1 and \(\mathbf {u}_s\) at Qu2, where \(\mathbf {u}_s\) is a union of at least two intervals from \({\mathcal {V}}\) and \( \mathbf {u}_s \subset \mathbf {u}_h \).
In the case when \( \mathbf {u}_h \in {\mathcal {V}} \), Qu2 is not asked; instead, we input the answer from Qu1 and consider it an answer of type 2: \( ( \mathbf {u}_h; \mathbf {v}_j=\mathbf {u}_h ) \). The number of respondents in the main stage is denoted by n (not counting those who were excluded).
Remark 1
This sampling scheme has two essential differences from the one introduced by Belyaev and Kriström (2012, 2015), namely (i) they include in the data for the main stage only respondents who stated at Qu1 an interval that was observed at the pilot stage, while we allow any interval with endpoints from \( \{ d_0, \ldots , d_k \} \), and (ii) in their scheme the interval stated at Qu1 is split into all the sub-intervals \(\mathbf {v}_j\) that it contains, while in our scheme it is split into two or three sub-intervals with endpoints from \( \{ d_0, \ldots , d_k \} \).
Remark 2
A question that arises naturally is: How large should the sample in the pilot stage be so that the proportion of excluded respondents in the main stage is sufficiently small? As noticed by Belyaev and Kriström (2015), this question is related to the problem of estimating the number of species in a population, which dates back to a work by Good (1953) and has been extensively treated in the literature since then. Belyaev and Kriström (2015) suggested a rule for determining the sample size for the pilot stage (stopping the sampling process) based on results by Good (1953). A similar stopping rule can be utilized for our sampling scheme.
3 Statistical model
We are considering a sampling scheme where, for the purpose of asking Qu2, the interval stated at Qu1 is split into two or three sub-intervals (we refer to these as the 2-split design and the 3-split design, respectively). We now discuss how the points of split are determined. Let \({\mathcal {J}}_{h}^{\circ }\) be the set of indices of points from \( \{ d_0, \ldots , d_k \} \) that are in the interior of the interval \(\mathbf {u}_h\), i.e. \({\mathcal {J}}_{h}^{\circ } = \{ j: \,\, d_{l_h}< d_j < d_{r_h}, \, (d_{l_h}, d_{r_h}] = \mathbf {u}_h \}, \; h=1, \ldots , m . \) In the case of a 2-split design, the interval \(\mathbf {u}_h\) (stated at Qu1) is split into two sub-intervals, \((d_{l_h}, d_j]\) and \((d_j, d_{r_h}]\), and the respondent is asked to select one of them. The point \(d_j\) is chosen with probability \(\delta _{h,d_j}\), where \( \sum _{j \in {\mathcal {J}}_{h}^{\circ }} \delta _{h,d_j} = 1 \). In the case of a 3-split design, \(\mathbf {u}_h\) is split into three sub-intervals: \((d_{l_h}, d_i]\), \((d_i, d_j]\), and \((d_j, d_{r_h}]\). The points \(d_i\) and \(d_j\) are chosen with probability \(\delta _{h,d_i,d_j}\), where \( \sum _{i,j \in {\mathcal {J}}_{h}^{\circ }, \;i<j} \delta _{h,d_i,d_j} = 1 \).
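A minimal sketch of the 2-split design follows. Taking \(\delta _{h,d_j}\) uniform over the interior points in \({\mathcal {J}}_{h}^{\circ }\) is one admissible choice made here for illustration, not the paper's prescription; the endpoint grid is likewise hypothetical.

```python
import random

def two_split(interval, endpoints, rng=random):
    """Split the Qu1 interval (l, r] at one interior point d_j drawn
    from endpoints; returns None if there is no interior point."""
    l, r = interval
    interior = [d for d in endpoints if l < d < r]  # points d_j, j in J_h^o
    if not interior:
        return None  # u_h is already in V, so Qu2 is not asked
    d_j = rng.choice(interior)  # uniform: delta_{h,d_j} = 1/|J_h^o|
    return [(l, d_j), (d_j, r)]

d = [0, 10, 20, 30, 40, 50]
print(two_split((10, 40), d))  # e.g. [(10, 20), (20, 40)]
```

The two returned sub-intervals always share the split point and together cover the original interval; the 3-split design would draw an ordered pair of interior points instead.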
We denote by \(\gamma _t\) the probability that a respondent gives an answer of type t, for \(t=1,2,3\), and similarly \(\gamma _{ht}\) denotes the probability that a respondent, who stated \(\mathbf {u}_h\) at Qu1, gives an answer of type t for \(t=1,2,3\). Later on, we will need to assume that \(\gamma _2 > 0\) and \(\gamma _{h2} > 0\). Sufficient conditions for this are given by the following proposition.
Proposition 1
- (i)
If \( \delta _{h,d_j} > 0 \) for all \( j \in {\mathcal {J}}_{h}^{\circ } \), and \( p_{l_h+1|h} > 0 \) or \( p_{r_h|h}>0 \), then \( \gamma _2 > 0 \) and \( \gamma _{h2} > 0 \).
- (ii)
If \( \delta _{h,d_i,d_j} > 0 \) for all \( i,j \in {\mathcal {J}}_{h}^{\circ } \), and \( p_{l_h+1|h} > 0 \) or \(p_{r_h|h}>0 \), then \( \gamma _2 > 0 \) and \( \gamma _{h2} > 0 \).
Let \(\delta _{h,j}\) be the probability that \(\mathbf {u}_h\) is split so that one of the resulting sub-intervals is \(\mathbf {v}_j\), and let \(\delta _{h*s}\) be the probability that \(\mathbf {u}_h\) is split so that one of the resulting sub-intervals is \(\mathbf {u}_s\). It is easy to see that the probabilities \(\delta _{h,j}\) and \(\delta _{h*s}\) can be expressed in terms of \(\delta _{h,d_j}\) in case of a 2-split design, and in terms of \(\delta _{h,d_i,d_j}\) in case of a 3-split design.
4 Estimation
In this section we discuss the estimation of the distribution function F(x). We prove the consistency of a proposed nonparametric maximum likelihood estimator of the probabilities \(q_j\) given that the conditional probabilities \(w_{h|j}\) are known. We then show that if we plug in a consistent estimator of \(w_{h|j}\), the estimator of \(q_j\) is still consistent. Thereafter, we suggest an estimator of \(w_{h|j}\) and show its consistency. Iterative procedures are proposed for finding the estimates of \(q_j\) and \(w_{h|j}\).
4.1 Estimating the probabilities \(q_j\)
We use the following counts, computed from the answers in the main stage:
- \(n_{h,\mathrm {NA}}\) \(=\) number of respondents who stated \(\mathbf {u}_h\) at Qu1 and NA (no answer) at Qu2;
- \(n_{hj}\) \(=\) number of respondents who stated \(\mathbf {u}_h\) at Qu1 and \(\mathbf {v}_j\) at Qu2, where \( \mathbf {v}_j \subseteq \mathbf {u}_h \);
- \(n_{h*s}\) \(=\) number of respondents who stated \(\mathbf {u}_h\) at Qu1 and \(\mathbf {u}_s\) at Qu2, where \(\mathbf {u}_s\) is a union of at least two intervals from \({\mathcal {V}}\) and \( \mathbf {u}_s \subset \mathbf {u}_h \);
- \(n_{h \bullet }\) \(=\) number of respondents who stated \(\mathbf {u}_h\) at Qu1 and any sub-interval at Qu2;
- \(n_{\bullet j}\) \(=\) number of respondents who stated \(\mathbf {v}_j\) at Qu2.
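These counts can be tallied directly from the main-stage records, as sketched below. The record encoding (('NA',) for type 1, ('v', j) for type 2, ('u', s) for type 3) and the sample data are our illustrative assumptions, and \(n_{h \bullet }\) is taken here to include both type 2 and type 3 answers.

```python
from collections import Counter

# Hypothetical records (h, answer): h indexes the Qu1 interval u_h
records = [(1, ('NA',)), (1, ('v', 2)), (1, ('v', 3)),
           (2, ('v', 3)), (2, ('u', 5)), (1, ('v', 2))]

n_h_NA, n_hj, n_hs = Counter(), Counter(), Counter()
for h, ans in records:
    if ans[0] == 'NA':
        n_h_NA[h] += 1            # type 1: refused Qu2
    elif ans[0] == 'v':
        n_hj[(h, ans[1])] += 1    # type 2: sub-interval v_j from V
    else:
        n_hs[(h, ans[1])] += 1    # type 3: union u_s of intervals from V

n_h_dot = Counter()               # n_{h.}: any sub-interval at Qu2
for (h, _), c in list(n_hj.items()) + list(n_hs.items()):
    n_h_dot[h] += c
n_dot_j = Counter()               # n_{.j}: v_j stated at Qu2
for (_, j), c in n_hj.items():
    n_dot_j[j] += c

print(dict(n_h_dot), dict(n_dot_j))
```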
Remark 3
If \(n'''=0\), the log-likelihood (1) has essentially the same form as the one in Belyaev and Kriström (2012).
Theorem 1
Let \({\widetilde{\mathbf {q}}}\) be an approximate maximum likelihood estimator of \(\mathbf {q}\) and \(\mathbf {q}^0\) be the vector of true probabilities. If the conditional probabilities \(w_{h|j}\) are known and \( \gamma _2>0 \), then \( {\widetilde{\mathbf {q}}} \;\overset{{{\mathrm{a.s.}}}}{\longrightarrow }\mathbf {q}^0 \) as \( n \longrightarrow \infty \).
Corollary 1
If we insert a strongly consistent estimator of \(w_{h|j}\) into the log-likelihood (1) and \( \gamma _2>0 \), then the approximate maximum likelihood estimator \({\widetilde{\mathbf {q}}}\) is strongly consistent.
4.2 Estimating the conditional probabilities \(w_{h|j}\)
Theorem 2
Let \({\widetilde{p}}_{j|h} \) be an approximate maximum likelihood estimator of \(p_{j|h}\) and \(p^0_{j|h}\) be the true probability, \( j \in {\mathcal {J}}_{\scriptstyle h} . \) If \( \gamma _{h2}>0 , \) then \( {\widetilde{p}}_{j|h} \;\overset{{{\mathrm{a.s.}}}}{\longrightarrow }p^0_{j|h} \) as \( n \longrightarrow \infty . \)
Remark 4
From the strong law of large numbers, it follows that \(\widehat{w}_h\) is a strongly consistent estimator of \(w_h\). This, together with Theorem 2, implies that the estimator \({\widetilde{w}}_{h|j}\) is strongly consistent.
Remark 5
If \(n_{h \bullet } = 0\), i.e. if the interval \(\mathbf {u}_h\) has not been observed in type 2 or type 3 answers, we do not have any observations with which to estimate the probabilities \(p_{j|h}, \;j \in {\mathcal {J}}_{\scriptstyle h}\). In that presumably rare case, we need to make assumptions about those probabilities. In our simulation experiments, we have assumed that all sub-intervals \(\mathbf {v}_j, \;j \in {\mathcal {J}}_{\scriptstyle h}\), are equally likely, i.e. \(p_{j|h}=1/|\mathcal {J}_{\scriptstyle h}|\).
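As a one-line illustration of this fallback (the index set below is purely hypothetical):

```python
# J_h indexes the sub-intervals v_j of u_h; values are illustrative
J_h = [1, 2, 3, 4]
p_j_given_h = {j: 1 / len(J_h) for j in J_h}  # p_{j|h} = 1/|J_h|
print(p_j_given_h[1])  # 0.25
```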
5 Simulation study
Summary statistics about the length of the interval at Qu1 (sample size is 2000)
| Min. | 1st quart. | Median | Mean | 3rd quart. | Max. |
|---|---|---|---|---|---|
| 10.0 | 40.0 | 50.0 | 51.9 | 60.0 | 80.0 |
Fig. 1 True c.d.f. (the smooth curve), estimated c.d.f. \({\widetilde{F}} (x)\) using the 2-split design (the stepwise curve with jumps at \(10, 20, 30, \ldots \)), and empirical c.d.f. \(\widehat{F}_n (x)\) of the uncensored observations, for sample size \( n=400 \)
Fig. 2 True c.d.f. (the smooth curve), estimated c.d.f. \({\widetilde{F}} (x)\) using the 2-split design (the stepwise curve with jumps at \(10,20,30,\ldots \)), and empirical c.d.f. \(\widehat{F}_n (x)\) of the uncensored observations, for sample size \( n=2000 \)
Fig. 3 Root mean square error (top) and root relative mean square error (bottom) for different estimators of \(q_j = F(d_j) - F(d_{j-1}), \, j=1, \ldots , k\), for \( n=400 \). The vertical dashed lines correspond to the points \( d_0, \ldots , d_k \). The respective error for each estimator of \(q_j\) is plotted against x-coordinate \(d_j\)
Fig. 4 Root mean square error (top) and root relative mean square error (bottom) for different estimators of \(q_j = F(d_j) - F(d_{j-1}), \, j=1, \ldots , k\), for \( n=2000 \). The vertical dashed lines correspond to the points \( d_0, \ldots , d_k \). The respective error for each estimator of \(q_j\) is plotted against x-coordinate \(d_j\)
Figures 1 and 2 illustrate the results of simulations with the 2-split design for sample sizes \(n=400\) and \(n=2000\). The estimated distribution function \( {\widetilde{F}} (x) = \sum _{j: \; d_j \le x} \,{\widetilde{q}}_j \) is plotted together with the true distribution function F(x) and the empirical cumulative distribution function (e.c.d.f.) of the uncensored observations \(x_1, \ldots , x_n\), i.e. \( \widehat{F}_n (x) = (1/n) \sum _{i=1}^n \mathbbm {1}\{x_i \le x\} \). We can see that the estimate \( {\widetilde{F}}(d_j) \) is very close to the true probability \( F(d_j) \) for most j, and when \( {\widetilde{F}}(d_j) \) deviates from \( F(d_j) \), a similar deviation is observed for \( \widehat{F}_n (d_j)\).
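The two step functions being compared can be sketched as follows. The probabilities \({\widetilde{q}}_j\), the endpoint grid, and the uncensored values below are made-up stand-ins, not simulation output from the paper:

```python
d = [0, 10, 20, 30, 40]              # typical endpoints d_0, ..., d_k
q_tilde = [0.1, 0.3, 0.4, 0.2]       # q~_j for the cells (d_{j-1}, d_j]

def F_tilde(x):
    """F~(x) = sum of q~_j over all j with d_j <= x."""
    return sum(q for d_j, q in zip(d[1:], q_tilde) if d_j <= x)

x_obs = [5, 12, 14, 25, 33]          # hypothetical uncensored values

def F_hat(x):
    """Empirical c.d.f. of the uncensored observations."""
    return sum(x_i <= x for x_i in x_obs) / len(x_obs)

print(round(F_tilde(20), 3), F_hat(20))  # 0.4 0.6
```

Both functions are right-continuous step functions, but \({\widetilde{F}}\) can only jump at the typical endpoints \(d_j\).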
It is of interest to compare the mean square error of different estimators of the probabilities \(q_j, \; j=1,\ldots ,k\), based on different sampling schemes. We generated 5000 samples (only the main stage is repeated 5000 times) according to the three designs described above and calculated the root mean square error (RootMSE) and the root relative mean square error (RootRelMSE). These are compared with the corresponding errors when \(q_j\) is estimated from the empirical c.d.f. \(\widehat{F}_n (x)\) of the uncensored observations. Figure 3 shows the results for sample size \(n=400\), and Fig. 4 shows the results for \(n=2000\). The design corresponding to the sampling scheme in Belyaev and Kriström (2012) is denoted as “all-split”. The error when using the all-split design is fairly close to the error when \(q_j\) is estimated using the uncensored observations \(x_1, \ldots , x_n\). As expected, the errors are somewhat larger under the 2-split and 3-split designs. We observe similar patterns for \(n=400\) and \(n=2000\); the main difference is that the error decreases with increasing sample size.
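The two error measures can be sketched as follows, with simulated stand-in estimates in place of the paper's replications. Since the true \(q_j\) is constant across replications, the RootRelMSE of component j equals its RootMSE divided by \(q_j\):

```python
import math
import random

rng = random.Random(0)
q_true = [0.1, 0.3, 0.4, 0.2]        # illustrative true cell probabilities
B = 5000                             # number of replications
# Stand-in estimates: true values plus small Gaussian noise
reps = [[q + rng.gauss(0, 0.01) for q in q_true] for _ in range(B)]

root_mse = [math.sqrt(sum((rep[j] - q_true[j]) ** 2 for rep in reps) / B)
            for j in range(len(q_true))]
root_rel_mse = [math.sqrt(sum(((rep[j] - q_true[j]) / q_true[j]) ** 2
                              for rep in reps) / B)
                for j in range(len(q_true))]
print([round(v, 3) for v in root_mse])  # each close to the noise s.d. 0.01
```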
Average proportion of accepted respondents in the main stage (based on 3000 replications)
| \(n_0\) | \(n + n_{\text {rej}}\) | BK2012 scheme | Modified scheme |
|---|---|---|---|
| 200 | 400 | 0.8715 | 0.9852 |
| 200 | 1000 | 0.8721 | 0.9850 |
| 200 | 2000 | 0.8714 | 0.9855 |
| 500 | 1000 | 0.9486 | 0.9944 |
| 500 | 2500 | 0.9485 | 0.9945 |
| 500 | 5000 | 0.9485 | 0.9944 |
Bias and root mean square error for our estimator (solid curve) and Turnbull’s estimator (dashed curve), for \(n=2000\). The vertical dashed lines correspond to the points \( d_0, \ldots , d_k \). The respective bias and error for each estimator of \(q_j\) are plotted against x-coordinate \(d_j\)
Bias of Turnbull’s estimator applied to Qu1 data (short-dashed curve) and applied to 2-split data (long-dashed curve), \(n=2000\)
6 Concluding comments
In this paper, we considered a two-stage scheme for collecting self-selected interval data in which the number of sub-intervals in the second question of the main stage is limited to two or three. We suggested a nonparametric maximum likelihood estimator of the underlying distribution function and showed its strong consistency under easily verifiable conditions. Our simulations indicated good performance of the proposed estimator: its error is comparable with that of the empirical c.d.f. of the uncensored observations. It is important to note that the censoring in this context is imposed by the design of the question. A design allowing uncensored values might introduce bias in the estimation if respondents are forced to give an exact value of a quantity that is hard to evaluate exactly (e.g., number of hours spent on the internet), and consequently they give a rough “best guess”. We also showed via simulations that ignoring the informative censoring and thus applying a standard method (Turnbull’s estimator) can lead to serious bias.
It would be of interest to investigate the accuracy of the estimator theoretically; we leave that for future work.
Acknowledgements
The authors would like to thank Maria Karlsson and an anonymous referee for their valuable comments which helped to improve this paper.
References
- Belyaev Y, Kriström B (2010) Approach to analysis of self-selected interval data. Working Paper 2010:2, CERE, Umeå University and the Swedish University of Agricultural Sciences. http://ssrn.com/abstract=1582853
- Belyaev Y, Kriström B (2012) Two-step approach to self-selected interval data in elicitation surveys. Working Paper 2012:10, CERE, Umeå University and the Swedish University of Agricultural Sciences. http://ssrn.com/abstract=2071077
- Belyaev Y, Kriström B (2015) Analysis of survey data containing rounded censoring intervals. Inf Appl 9(3):2–16
- Finkelstein DM, Goggins WB, Schoenfeld DA (2002) Analysis of failure time data with dependent interval censoring. Biometrics 58(2):298–304
- Furnham A, Boo HC (2011) A literature review of the anchoring effect. J Socio Econ 40(1):35–42
- Gentleman R, Geyer CJ (1994) Maximum likelihood for interval censored data: consistency and computation. Biometrika 81(3):618–623
- Good IJ (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40(3–4):237–264
- McFadden DL, Bemmaor AC, Caro FG, Dominitz J, Jun BH, Lewbel A, Matzkin RL, Molinari F, Schwarz N, Willis RJ, Winter JK (2005) Statistical analysis of choice experiments and surveys. Mark Lett 16(3–4):183–196
- Peto R (1973) Experimental survival curves for interval-censored data. J R Stat Soc C Appl 22(1):86–91
- Press SJ, Tanur JM (2004a) An overview of the respondent-generated intervals (RGI) approach to sample surveys. J Mod Appl Stat Methods 3(2):288–304
- Press SJ, Tanur JM (2004b) Relating respondent-generated intervals questionnaire design to survey accuracy and response rate. J Off Stat 20(2):265–287
- R Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
- Rao CR (1973) Linear statistical inference and its applications, 2nd edn. Wiley, New York
- Schwarz N, Hippler HJ, Deutsch B, Strack F (1985) Response scales: effects of category range on reported behavior and comparative judgments. Public Opin Q 49(3):388–395
- Shardell M, Scharfstein DO, Bozzette SA (2007) Survival curve estimation for informatively coarsened discrete event-time data. Stat Med 26(10):2184–2202
- Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York
- Turnbull BW (1976) The empirical distribution function with arbitrarily grouped, censored and truncated data. J R Stat Soc B (Methodol) 38(3):290–295
- Van Exel N, Brouwer W, Van Den Berg B, Koopmanschap M (2006) With a little help from an anchor: discussion and evidence of anchoring effects in contingent valuation. J Socio Econ 35(5):836–853
- Whynes DK, Wolstenholme JL, Frew E (2004) Evidence of range bias in contingent valuation payment scales. Health Econ 13(2):183–190
- Zhang Z, Sun J (2010) Interval censoring. Stat Methods Med Res 19(1):53–70
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.