Efficient item count techniques with one or two lists

If direct questioning on sensitive variables leads to non-ignorable item-nonresponse and untruthful answering, a considerably biased estimator might be the consequence. In such cases, indirect questioning designs, which protect the respondents' privacy by masking the sensitive information, could pay off in terms of accuracy through an increased willingness to cooperate. To achieve this goal, such a design has to be simple to implement for the users and easy to understand for the respondents. In this article, it is shown for one of the indirect questioning designs, the item count technique, how the use of specific, often available prior information can substantially improve the estimation accuracy and at the same time reduce the respondents' task. This can make the method a stronger and more serious competitor of direct questioning on sensitive attributes, which is commonly used in empirical research.


Introduction
When questions on sensitive topics, such as cyber bullying, illegal work, sexual behavior, or compulsory vaccination, are asked directly in statistical surveys, the rates of item-nonresponse and of untruthful answering might rise far above the usual levels. Such questions can be seen as an invasion of privacy, and certain answers to them can be considered socially unacceptable (cf., for instance, Tourangeau and Yan [26]). Such behavior of the respondents might lead to strongly biased estimators of the parameters under study, such as the population proportion of people bearing a certain attribute. Methods of weighting adjustment or data imputation (cf., for instance, Särndal and Lundström [22]) try to compensate only for the nonresponse that has occurred, not for untruthful answering. Therefore, before having to apply them, everything should be done to increase the respondents' willingness to cooperate, so that the rates of these two sources of systematic non-sampling error are kept as small as possible.
Andreas Quatember, andreas.quatember@jku.at, Johannes Kepler University (JKU) Linz, Science Park 2, Altenberger Str. 66a, 4040 Linz, Austria

Indirect questioning (IQ) designs intend to ensure respondents' cooperation by protecting their privacy. To achieve this goal, these techniques "mask" the respondents' actual status with respect to a variable under study (for an overview of IQ designs, see, for instance, Chaudhuri and Christofides [4]). One of these methods is the item count technique (ICT), also known as the unmatched count technique or list experiment. Its original version was discussed in detail by Droitcour et al. [9]. Under the ICT, the questionnaire presents the sensitive question as a list of different statements, the "items," describing membership of different population subgroups. Respondents are asked to report only the number of items that apply to them, not which of them apply. Two independent samples are drawn from the population of interest. In the control sample, the item list consists only of a number of so-called non-key items. In the other sample, the treatment sample, the list additionally includes the "key item," which refers to the sensitive membership of the population subgroup under study.
The non-key items should be perceived by the respondents as meaningful information in the context of the questionnaire (Chaudhuri and Christofides [3], p. 592). The sole purpose of the control sample is to deliver the information on the non-key items that is needed in the estimation process.
Compared with other privacy-protecting IQ designs such as the "randomized response techniques," the main advantage of the ICT is that the respondents' task is easy to understand without complex instructions. It can therefore be implemented very simply, even in self-administered questionnaires. Moreover, the interviewees never have to answer the sensitive question directly. Various experiments have examined the effectiveness of the method (cf., for instance, Droitcour et al. [9], Tsuchiya et al. [27], Coutts and Jann [8], Comsa and Postelnicu [7], Kiewiet de Jonge and Nickerson [16], Wolter and Laier [28], Blair et al. [2], and in particular the meta-analysis by Ehler et al. [10]). Interesting application examples can be found, for instance, in Comsa and Postelnicu [7], Malesky et al. [17], Frye et al. [11], Gibson et al. [12], Rinken et al. [21], or Wolter et al. [29].
Clearly, to be recognized as a serious competitor of the common direct questioning approach in empirical research, an IQ technique has to be easy to understand and implement and as accurate as possible. Furthermore, it should be applicable under general probability sampling, because surveys that ask sensitive questions often use complex sampling methods including stratification and clustering. In Sect. 2 of this article, the statistical properties of the basic versions of the ICT with one or two lists of non-key items are discussed. In Sect. 3, modifications of these basic versions are proposed that make use of available relevant information about at least some of the non-key items used. These modifications aim to increase the accuracy of the survey results and at the same time reduce the respondents' burden in the questionnaire. The calculations in Sect. 4 give a numerical impression of the possible positive effects of the proposed ICT versions on the estimation accuracy. The article concludes with a summary and an outlook on further research questions.

The item count technique
In a generalization of the original version of the ICT, two independent without-replacement probability samples $s_1$ and $s_2$ of sizes $n_1$ and $n_2$, respectively, are drawn from the study population $U$ of size $N$. They are drawn by probability sampling methods $S_1$ and $S_2$ with first-order inclusion probabilities $\pi_k$ and $\rho_k$, respectively, and second-order inclusion probabilities $\pi_{kl}$ and $\rho_{kl}$. In the control sample $s_2$, the item list consists only of $G$ non-key items. An example with $G = 5$ non-key items is:
• I am an only child.
• I use an electric toothbrush.
• I had a reported traffic accident last year.
• I was hospitalized last year.
• I have been abroad in the last year.
Let the answer to be reported by a respondent $k$ from the control sample $s_2$ be $x_k$, the number of the $G$ non-key items that apply ($x_k = 0, 1, \ldots, G$). In the treatment sample $s_1$, the short item list consisting of the $G$ non-key items is complemented by the key item under study. This item describes the sensitive membership of a certain population subgroup $U_A \subset U$. An example is:
• I have been engaged in undeclared work in the last year.
Let variable $y$ indicate this membership of respondent $k$ from the treatment sample $s_1$:

$$y_k = \begin{cases} 1 & \text{if } k \in U_A, \\ 0 & \text{otherwise.} \end{cases}$$

The answer to be reported by such a respondent is then $z_k = x_k + y_k$, the number of applicable items among all $G + 1$ items of the long item list ($z_k = 0, 1, \ldots, G, G + 1$) (Table 1).
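To make the response mechanism concrete, the following minimal Python sketch derives the counts $x_k$ and $z_k$ from 0/1 indicators of the individual items. The helper name `reported_counts` and the data layout (one row of $G$ non-key indicators per respondent plus a separate key indicator $y_k$) are illustrative assumptions. In an actual survey only the count is observed, never the row.

```python
from typing import List, Sequence, Tuple


def reported_counts(
    nonkey: Sequence[Sequence[int]], key: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: the counts a respondent would report under the ICT.

    nonkey[k] holds the 0/1 indicators of the G non-key items of respondent k,
    key[k] is the 0/1 key-item indicator y_k.
    Returns (x, z) with x_k = number of applicable non-key items (short list)
    and z_k = x_k + y_k (long list including the key item).
    """
    x = [sum(row) for row in nonkey]
    z = [xk + yk for xk, yk in zip(x, key)]
    return x, z


# Three illustrative respondents with G = 5 non-key items.
items = [[1, 0, 0, 1, 0], [1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]
y = [1, 1, 0]
x, z = reported_counts(items, y)
print(x, z)  # [2, 5, 0] [3, 6, 0]
```

In the original ICT, a respondent from $s_2$ reports only $x_k$ and a respondent from $s_1$ only $z_k$. Computing both for the same rows is purely illustrative.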
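Let the proportion $p$ of interest be given by

$$p = \frac{1}{N}\sum\nolimits_U y_k \tag{1}$$

($\sum_U$ is an abbreviated notation for the sum over all units $k \in U$). Parameter $p$ from Eq. (1) can be expressed by the difference of the population means $\mu_z$ and $\mu_x$ of variables $z$ and $x$, respectively:

$$p = \mu_z - \mu_x. \tag{2}$$

Consequently, the difference

$$\hat p = \bar z_{HT} - \bar x_{HT} \tag{3}$$

of the Horvitz-Thompson-based estimators $\bar z_{HT} = \frac{1}{N}\sum_{s_1} z_k/\pi_k$ and $\bar x_{HT} = \frac{1}{N}\sum_{s_2} x_k/\rho_k$ of the two population means $\mu_z$ and $\mu_x$, calculated from the probability samples $s_1$ and $s_2$, provides an unbiased moment estimator of $p$ under the sampling designs $S_1$ and $S_2$ (cf. Särndal …). The theoretical variance $V(\hat p)$ of the estimator $\hat p$ from Eq. (3) is given by the sum of the usual theoretical variances $V(\bar z_{HT})$ and $V(\bar x_{HT})$ of $\bar z_{HT}$ and $\bar x_{HT}$, respectively:

$$V(\hat p) = \frac{1}{N^2}\sum\sum\nolimits_U (\pi_{kl} - \pi_k \pi_l)\,\frac{z_k}{\pi_k}\,\frac{z_l}{\pi_l} + \frac{1}{N^2}\sum\sum\nolimits_U (\rho_{kl} - \rho_k \rho_l)\,\frac{x_k}{\rho_k}\,\frac{x_l}{\rho_l} \tag{4}$$

($\sum\sum_U$ is an abbreviated notation for the double sum over all units $k, l \in U$, with $\pi_{kk} = \pi_k$ and $\rho_{kk} = \rho_k$). For the effect of the selection of the non-key items on $V(\hat p)$, see, for instance, Glynn [13]. For the variance-optimal allocation of the total sample size $n$ to the two samples, see, for instance, Tian et al. [25]. Perri et al. [19] discuss the idea of optimal allocation extensively for the Item Sum Technique.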
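As an illustration of Eq. (3) under a general design, the following Python sketch computes the Horvitz-Thompson means from the sampled answers and their inclusion probabilities. The function names are hypothetical, and the inclusion probabilities are assumed to be supplied by the sampling design.

```python
from typing import Sequence


def ht_mean(values: Sequence[float], incl_probs: Sequence[float], N: int) -> float:
    """Horvitz-Thompson estimator of a population mean: (1/N) * sum_k v_k / pi_k."""
    if len(values) != len(incl_probs):
        raise ValueError("one inclusion probability per sampled unit is required")
    return sum(v / pk for v, pk in zip(values, incl_probs)) / N


def ict_estimate_ht(z_s1, pi_s1, x_s2, rho_s2, N: int) -> float:
    """Eq. (3): zbar_HT from the treatment sample s1 (probabilities pi_k)
    minus xbar_HT from the control sample s2 (probabilities rho_k)."""
    return ht_mean(z_s1, pi_s1, N) - ht_mean(x_s2, rho_s2, N)


# Illustrative SRSWOR case with N = 100: pi_k = n1/N and rho_k = n2/N are constant,
# so both HT means reduce to simple sample means (2.5 and 2.0).
N = 100
z_s1 = [3, 2, 4, 1]
x_s2 = [2, 1, 2, 3, 2]
p_hat = ict_estimate_ht(z_s1, [4 / N] * 4, x_s2, [5 / N] * 5, N)
print(round(p_hat, 10))  # 0.5
```

With unequal probabilities, for example under stratification, only the `incl_probs` arguments change. The estimator itself stays the same.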
The variance from Eq. (4) can be unbiasedly estimated by

$$\hat V(\hat p) = \frac{1}{N^2}\sum\sum\nolimits_{s_1} \frac{\pi_{kl} - \pi_k \pi_l}{\pi_{kl}}\,\frac{z_k}{\pi_k}\,\frac{z_l}{\pi_l} + \frac{1}{N^2}\sum\sum\nolimits_{s_2} \frac{\rho_{kl} - \rho_k \rho_l}{\rho_{kl}}\,\frac{x_k}{\rho_k}\,\frac{x_l}{\rho_l}. \tag{5}$$

Consider, for instance, simple random sampling with replacement (SIR) in both samples (or, approximately, simple random sampling without replacement from large populations). Eq. (3) then results in

$$\hat p_{SIR} = \bar z - \bar x, \tag{6}$$

the difference of the simple sample means of $z$ and $x$ in the two SIR samples $s_1$ and $s_2$, respectively. For this sampling design, Eq. (4) results in

$$V(\hat p_{SIR}) = \frac{\sigma_z^2}{n_1} + \frac{\sigma_x^2}{n_2} \tag{7}$$

with $\sigma_z^2$ and $\sigma_x^2$ the population variances of $z$ and $x$. Eventually, Eq. (5) yields

$$\hat V(\hat p_{SIR}) = \frac{s_z^2}{n_1} + \frac{s_x^2}{n_2} \tag{8}$$

with $s_z^2$ and $s_x^2$ the sample variances of $z$ and $x$ in $s_1$ and $s_2$, respectively.

Besides the advantages of this procedure mentioned above, two weaknesses have also been discussed in the relevant literature. The first is a waste of estimation accuracy. The sensitive information under study is observed only in the treatment sample $s_1$, whereas the control sample $s_2$ serves merely as a reference for the calculation of the estimate $\bar x_{HT}$ needed in Eq. (3). The second is that in $s_1$ the answers $z_k = G + 1$ and $z_k = 0$ reveal a respondent's true status with respect to membership of $U_A$ (and at the same time with respect to all non-key items). The probabilities of these "ceiling" and "floor" effects, respectively, can be reduced by a proper choice of the non-key items (see, for instance, Glynn [13], p. 163). The floor effect is less problematic than the ceiling effect unless non-membership of $U_A$ is also sensitive. In $s_2$, the same applies to the answers $x_k = G$ and $x_k = 0$ with respect to the non-key items.
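Under SIR, Eqs. (6) and (8) only need sample means and sample variances. The following minimal Python sketch computes both and also counts the long-list answers affected by the ceiling and floor effects just described. The function names and the toy data are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def ict_sir(z_s1: Sequence[int], x_s2: Sequence[int]) -> Tuple[float, float]:
    """Original ICT under SIR: point estimate (Eq. 6) and variance estimate (Eq. 8)."""
    p_hat = mean(z_s1) - mean(x_s2)
    # statistics.variance uses the n - 1 divisor, i.e. the sample variances s_z^2, s_x^2.
    v_hat = variance(z_s1) / len(z_s1) + variance(x_s2) / len(x_s2)
    return p_hat, v_hat


def disclosing_answers(z_s1: Sequence[int], G: int) -> Tuple[int, int]:
    """Long-list answers that reveal the key status:
    ceiling (z = G + 1, hence y = 1) and floor (z = 0, hence y = 0)."""
    ceiling = sum(1 for z in z_s1 if z == G + 1)
    floor = sum(1 for z in z_s1 if z == 0)
    return ceiling, floor


# Toy data (G = 5): six treatment-sample answers, five control-sample answers.
z_s1 = [3, 2, 4, 1, 6, 0]
x_s2 = [2, 1, 2, 3, 2]
p_hat, v_hat = ict_sir(z_s1, x_s2)
print(round(p_hat, 4), round(v_hat, 4))  # 0.6667 0.8778
print(disclosing_answers(z_s1, G=5))      # (1, 1)
```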
As a consequence of these weaknesses, several modifications of the original ICT have been proposed. The double-list (DL) version by Droitcour et al. [9], for instance, addresses the waste of efficiency by adding a second item list to the questionnaires in both samples of the original ICT. In this second list, the non-key items of the first list are replaced by other non-key items. Without loss of generality, let us assume that the number of non-key items equals $G$ in both lists. The treatment sample $s_1$ of the original ICT with respect to the first list now serves at the same time as the control sample with respect to the second list, and vice versa. Therefore, a respondent $k$ from sample $s_1$ has to give two answers, $z$ and $u$:
• $z_k = x_k + y_k$, the number of applicable items from the long first item list including the key item (as in the original ICT);
• $u_k$, the number of applicable non-key items from the short second list without the key item ($u_k = 0, 1, \ldots, G$) (Table 2).
The answers $x$ and $w$ to be given by a respondent $k$ from sample $s_2$ are the number $x_k$ of applicable non-key items from the short first list (as in the original ICT) and the number $w_k = u_k + y_k$ of applicable items from the long second list ($w_k = 0, 1, \ldots, G, G + 1$). With this supplement to the questionnaires, information on the sensitive variable is observed in both samples. This increases the estimation accuracy compared with the original ICT. The price is only a marginal increase in the respondents' burden through a second item list, which should hardly affect their willingness to cooperate negatively.

Table 2 The DL version of the ICT

                     Sample $s_1$                                   Sample $s_2$
First item list:     $G$ non-key items ($x$) + key item ($y$)       $G$ non-key items ($x$)
First response:      $z_k = x_k + y_k$                              $x_k$
Second item list:    $G$ other non-key items ($u$)                  $G$ other non-key items ($u$) + key item ($y$)
Second response:     $u_k$                                          $w_k = u_k + y_k$
With two different lists of non-key items applied to two samples, two separate estimates $\hat p_1 = \bar z_{HT} - \bar x_{HT}$ and $\hat p_2 = \bar w_{HT} - \bar u_{HT}$ can be calculated from Eq. (3). Their mean value

$$\hat p_{DL} = \frac{\hat p_1 + \hat p_2}{2} \tag{9}$$

is taken as the procedure's unbiased estimate. In Eq. (9), $\bar z_{HT}$ and $\bar u_{HT}$ are calculated from the probability sample $s_1$, and $\bar x_{HT}$ and $\bar w_{HT}$ from the probability sample $s_2$. They are the Horvitz-Thompson-based estimators of the population means $\mu_z$, $\mu_u$, $\mu_x$, and $\mu_w$, respectively. The theoretical variance of $\hat p_{DL}$ is given by

$$V(\hat p_{DL}) = \frac{1}{4}\left[V(\hat p_1) + V(\hat p_2) + 2\,C(\hat p_1, \hat p_2)\right]. \tag{10}$$

Equation (10) includes the two variances $V(\hat p_1)$ and $V(\hat p_2)$, which are calculated by applying Eq. (4), and the covariance term $C(\hat p_1, \hat p_2)$. Because $s_1$ and $s_2$ are independent, $C(\hat p_1, \hat p_2) = -C(\bar z_{HT}, \bar u_{HT}) - C(\bar x_{HT}, \bar w_{HT})$. This term therefore addresses the dependence of $z$ and $u$ in $s_1$ and of $x$ and $w$ in $s_2$ (for the details, see Appendix 1). The estimator of this variance combines the variance estimators that follow straightforwardly from Eq. (5) with an estimator of the covariance term computed from the sample data. Under the SIR sampling design, the estimator becomes

$$\hat p_{DL,SIR} = \frac{(\bar z - \bar x) + (\bar w - \bar u)}{2}, \tag{11}$$

and the remaining SIR formulas can be derived accordingly.
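A possible SIR implementation of Eq. (11) is sketched below in Python. Rearranged, the estimate is the mean of two within-sample differences, $z_k - u_k$ in $s_1$ and $w_k - x_k$ in $s_2$. The plug-in variance estimate $\frac{1}{4}(s_{d_1}^2/n_1 + s_{d_2}^2/n_2)$ then picks up both covariance terms of Eq. (10) automatically. This rearrangement is our own derivation for illustration, not a formula stated in the text, and the function name is hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def dl_ict_sir(z_s1: Sequence[int], u_s1: Sequence[int],
               x_s2: Sequence[int], w_s2: Sequence[int]) -> Tuple[float, float]:
    """Double-list ICT under SIR (Eq. 11) with a plug-in variance estimate.

    s1 reports z (long first list) and u (short second list);
    s2 reports x (short first list) and w (long second list).
    ((zbar - xbar) + (wbar - ubar)) / 2 equals (mean(z - u) + mean(w - x)) / 2.
    """
    d1 = [z - u for z, u in zip(z_s1, u_s1)]  # within s1: z_k - u_k
    d2 = [w - x for w, x in zip(w_s2, x_s2)]  # within s2: w_k - x_k
    p_hat = (mean(d1) + mean(d2)) / 2
    # var(d1) = s_z^2 + s_u^2 - 2 s_zu, so the z-u covariance enters automatically.
    v_hat = (variance(d1) / len(d1) + variance(d2) / len(d2)) / 4
    return p_hat, v_hat


p_hat, v_hat = dl_ict_sir(z_s1=[3, 2, 4], u_s1=[1, 2, 2],
                          x_s2=[2, 1, 3], w_s2=[3, 1, 3])
print(round(p_hat, 4), round(v_hat, 4))  # 0.8333 0.1389
```

Both samples need at least two respondents for the sample variances to exist.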
Petróczi et al. [20] and Groenitz [14] proposed the "single sample count technique." For this ICT approach with only one list, the joint population distribution of the non-key items is assumed to be known so that a control sample is no longer needed. Its theory is developed for the SIR design under practically limiting assumptions about the independence of the individual items. Like the DL version, this technique addresses the efficiency problem of the original ICT. The modifications of the original ICT that will be proposed in Sect. 3 are based on these contributions.

The proposed modifications of the original item count technique
In this section, a modification of the original ICT is proposed that increases the accuracy of the estimation of the proportion $p$ and at the same time reduces the complexity of the questioning design. For this modification, we presume that the population mean of the number of applicable non-key items among at least some of these items is available, for instance, from administrative or register data. Suitable non-key items should not be too difficult to find: items for which this information is available and which do not appear completely meaningless (as an item asking for the last digit of the respondent's phone number would). For example, socio-demographic items with a known distribution in the target population might be used, such as marital status (e.g., the statement "I am unmarried.") or education (e.g., "I have a degree from a university of applied sciences."). Other examples of such items could be age, gender, place of residence, migration background, nationality, ethnicity, religion, household size, employment, income, or working hours. In the following, the effect of using such information on the estimator, its theoretical variance, and the variance estimator is presented for general probability sampling.
Let the population mean $\mu_{x^{(F_1)}}$ of the number $x^{(F_1)}$ of applicable items among $F_1$ of the $G$ non-key items be given ($0 \le F_1 \le G$). Hence, the parameter $p$ from Eq. (2) can be rewritten as

$$p = \mu_z - \mu_{x^{(F_1)}} - \mu_{x^{[E_1]}} \tag{12}$$

Here $\mu_{x^{[E_1]}}$ is the unknown population mean of the number $x^{[E_1]}$ of applicable items among the $E_1 = G - F_1$ non-key items that are not contained in $\mu_{x^{(F_1)}}$. For $F_1 = 0$ ($E_1 = G$), the item list corresponds to that of the original ICT. For $0 < F_1 < G$, however, the item list of the control sample $s_2$ consists only of the $E_1$ non-key items that are not included in $\mu_{x^{(F_1)}}$. This reduces the respondents' task in sample $s_2$ compared with the original ICT. The answer to be reported by a respondent $k$ from sample $s_2$ is $x_k^{[E_1]}$, the number of the $E_1$ non-key items that apply ($x_k^{[E_1]} = 0, 1, \ldots, E_1$). In the treatment sample $s_1$, the answer to be reported by a respondent is the same as in the original ICT (Table 3).
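Clearly, the parameter from Eq. (12) is unbiasedly estimated by

$$\hat p^{(F_1)} = \bar z_{HT} - \mu_{x^{(F_1)}} - \bar x^{[E_1]}_{HT} \tag{13}$$

with $\bar z_{HT}$ from Eq. (3) and the Horvitz-Thompson-based estimator $\bar x^{[E_1]}_{HT} = \frac{1}{N}\sum_{s_2} x_k^{[E_1]}/\rho_k$. The theoretical variance $V(\hat p^{(F_1)})$ of the estimator $\hat p^{(F_1)}$ from Eq. (13) is given by the sum of the two theoretical variances

$$V(\hat p^{(F_1)}) = V(\bar z_{HT}) + V(\bar x^{[E_1]}_{HT}). \tag{14}$$

The accuracy of the estimator $\hat p^{(F_1)}$ increases with $F_1 \to G$. This variance is unbiasedly estimated by

$$\hat V(\hat p^{(F_1)}) = \hat V(\bar z_{HT}) + \hat V(\bar x^{[E_1]}_{HT}), \tag{15}$$

with both terms estimated as in Eq. (5). Under the SIR design in both samples $s_1$ and $s_2$, for instance, Eq. (13) results in

$$\hat p^{(F_1)}_{SIR} = \bar z - \mu_{x^{(F_1)}} - \bar x^{[E_1]} \tag{16}$$

with the sample means $\bar z$ of $z$ in $s_1$ and $\bar x^{[E_1]}$ of $x^{[E_1]}$ in $s_2$. Eq. (14) results in

$$V(\hat p^{(F_1)}_{SIR}) = \frac{\sigma_z^2}{n_1} + \frac{\sigma^2_{x^{[E_1]}}}{n_2} \tag{17}$$

with $\sigma_z^2$ and $\sigma^2_{x^{[E_1]}}$ the population variances of $z$ and $x^{[E_1]}$, and Eq. (15) in

$$\hat V(\hat p^{(F_1)}_{SIR}) = \frac{s_z^2}{n_1} + \frac{s^2_{x^{[E_1]}}}{n_2}. \tag{18}$$

Following the idea of Petróczi et al. [20], in the special case $F_1 = G$ a control sample is no longer needed, because the mean $\mu_x$ of $x$ from Eq. (2) is known ($\mu_{x^{(G)}} \equiv \mu_x$, $x^{(G)} \equiv x$). This means that the total sample size $n = n_1 + n_2$ can be allocated to the treatment sample alone. With only one long list in the whole sample of size $n$, Eq. (13) reduces to

$$\hat p^{(G)} = \bar z_{HT} - \mu_x \tag{19}$$

and Eqs. (14) and (15), respectively, to

$$V(\hat p^{(G)}) = V(\bar z_{HT}), \tag{20}$$

$$\hat V(\hat p^{(G)}) = \hat V(\bar z_{HT}), \tag{21}$$

where $\bar z_{HT}$ is now calculated from the whole sample of size $n$. For the SIR method, these equations yield

$$\hat p^{(G)}_{SIR} = \bar z - \mu_x, \tag{22}$$

$$V(\hat p^{(G)}_{SIR}) = \frac{\sigma_z^2}{n}, \tag{23}$$

$$\hat V(\hat p^{(G)}_{SIR}) = \frac{s_z^2}{n}. \tag{24}$$

This modified ICT, which uses relevant prior information on the non-key items, can also be combined with the DL version of the ICT, which uses two different item lists. For such a combination, let two means be known:
• $\mu_{x^{(F_1)}}$, the mean of the number $x^{(F_1)}$ of applicable items among $F_1$ of the $G$ non-key items from the first list ($0 \le F_1 \le G$);
• $\mu_{u^{(F_2)}}$, the mean of the number $u^{(F_2)}$ of applicable items among $F_2$ of the $G$ non-key items from the second list ($0 \le F_2 \le G$).
In this case, the short first item list included in sample $s_2$ consists only of the $E_1 = G - F_1$ non-key items not included in the known mean $\mu_{x^{(F_1)}}$. The short second item list included in sample $s_1$ consists only of the $E_2 = G - F_2$ non-key items not included in the known mean $\mu_{u^{(F_2)}}$. Hence, for $F_1 > 0$ and/or $F_2 > 0$, the use of such prior information also reduces the respondents' burden compared with the original DL design.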
The numbers to be reported by a respondent $k$ from sample $s_1$ are $z_k$, the number of applicable items from the long first item list including the key item, and $u_k^{[E_2]}$, the number of applicable non-key items from the short second item list ($u_k^{[E_2]} = 0, 1, \ldots, E_2$). A respondent $k$ from sample $s_2$ has to report $x_k^{[E_1]}$, the number of applicable non-key items from the short first item list ($x_k^{[E_1]} = 0, 1, \ldots, E_1$), and $w_k = u_k + y_k$, the number of applicable items from the long second item list (Table 4). For $F_1 = F_2 = 0$, the item lists correspond to those of the original DL version.
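The corresponding estimator of $p$ is

$$\hat p^{(F_1/F_2)} = \frac{\hat p_1^{(F_1)} + \hat p_2^{(F_2)}}{2}, \tag{25}$$

the mean of the two separate estimates $\hat p_1^{(F_1)} = \bar z_{HT} - \mu_{x^{(F_1)}} - \bar x^{[E_1]}_{HT}$ and $\hat p_2^{(F_2)} = \bar w_{HT} - \mu_{u^{(F_2)}} - \bar u^{[E_2]}_{HT}$, both obtained from Eq. (13). Its theoretical variance is given by

$$V(\hat p^{(F_1/F_2)}) = \frac{1}{4}\left[V(\hat p_1^{(F_1)}) + V(\hat p_2^{(F_2)}) + 2\,C(\hat p_1^{(F_1)}, \hat p_2^{(F_2)})\right]. \tag{26}$$

The two variances $V(\hat p_1^{(F_1)})$ and $V(\hat p_2^{(F_2)})$ follow from Eq. (14). The covariance term $C(\hat p_1^{(F_1)}, \hat p_2^{(F_2)})$ addresses the dependence of $z$ and $u^{[E_2]}$ in $s_1$ and of $x^{[E_1]}$ and $w$ in $s_2$. The variance (26) can be estimated by applying Eq. (21) to the variance terms and estimating the covariance term from the sample data. For the SIR design, Eq. (25) yields

$$\hat p^{(F_1/F_2)}_{SIR} = \frac{1}{2}\left[\left(\bar z - \mu_{x^{(F_1)}} - \bar x^{[E_1]}\right) + \left(\bar w - \mu_{u^{(F_2)}} - \bar u^{[E_2]}\right)\right]. \tag{27}$$

If one or both of the means $\mu_x$ and $\mu_u$ are known ($F_1 = G$ and/or $F_2 = G$), a control sample would theoretically no longer be needed for the first, the second, or both item lists. However, applying one or even both long lists to the entire sample of size $n$ could be counterproductive. The members of at least one sample would then have to respond to two long lists containing the same key item. This might have a negative impact on their perceived privacy protection and consequently on their willingness to cooperate.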
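The SIR formulas of this section can be sketched in Python as follows. The one-list function covers Eqs. (16) and (18) for $0 \le F_1 < G$ and Eqs. (22) and (24) for $F_1 = G$. The two-list function covers Eq. (27). The function names are hypothetical. The variance estimate of the combined DL version is our own plug-in rearrangement of Eq. (26) under SIR, using the within-sample differences $z_k - u_k^{[E_2]}$ and $w_k - x_k^{[E_1]}$, since the known means are constants. It is not a formula given in the text.

```python
from statistics import mean, variance
from typing import Optional, Sequence, Tuple


def ict_known_mean_sir(z_s1: Sequence[int], mu_known: float,
                       x_rest_s2: Optional[Sequence[int]] = None) -> Tuple[float, float]:
    """One-list ICT with a known partial mean under SIR.

    0 <= F1 < G: Eqs. (16) and (18); x_rest_s2 holds the short-list answers x^[E1].
    F1 = G: pass x_rest_s2=None and mu_known = mu_x (Eqs. 22 and 24);
    z_s1 is then the long-list answers of the whole sample of size n.
    """
    p_hat = mean(z_s1) - mu_known
    v_hat = variance(z_s1) / len(z_s1)
    if x_rest_s2 is not None:
        p_hat -= mean(x_rest_s2)
        v_hat += variance(x_rest_s2) / len(x_rest_s2)
    return p_hat, v_hat


def dl_known_means_sir(z_s1: Sequence[int], u_rest_s1: Sequence[int],
                       x_rest_s2: Sequence[int], w_s2: Sequence[int],
                       mu_x_known: float, mu_u_known: float) -> Tuple[float, float]:
    """Combined DL version under SIR (Eq. 27) with a plug-in variance estimate.

    s1 reports z (long first list) and u^[E2] (short second list);
    s2 reports x^[E1] (short first list) and w (long second list).
    """
    d1 = [z - u for z, u in zip(z_s1, u_rest_s1)]  # z_k - u_k^[E2] within s1
    d2 = [w - x for w, x in zip(w_s2, x_rest_s2)]  # w_k - x_k^[E1] within s2
    p_hat = (mean(d1) + mean(d2) - mu_x_known - mu_u_known) / 2
    v_hat = (variance(d1) / len(d1) + variance(d2) / len(d2)) / 4
    return p_hat, v_hat


p1, v1 = ict_known_mean_sir([3, 2, 4, 1], mu_known=1.2, x_rest_s2=[1, 0, 1, 2])
p2, v2 = ict_known_mean_sir([3, 2, 4, 1], mu_known=2.2)  # F1 = G, no control sample
print(round(p1, 4), round(v1, 4))  # 0.3 0.5833
print(round(p2, 4), round(v2, 4))  # 0.3 0.4167
```

For $F_1 = F_2 = 0$, the known means are zero (empty item sets), and `dl_known_means_sir` reduces to the ordinary DL estimator of Eq. (11).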

A numerical comparison of the accuracy of different versions of the ICT
In this section, the different ICT versions described in Sects. 2 and 3 are numerically compared under the SIR design to give an impression of the effect of the proposed techniques on the accuracy of the estimation. For all these methods, the respondents' task is of similar simplicity, which should ensure a similar effect on the willingness to cooperate. For this purpose, the "imaginary population" provided by Shaw [24] is used. Specifically, the comparison uses its six-dimensional population distribution of the $G = 5$ non-key items (named "1" to "5" there) and the key item (named "A"), including their dependence structure (see Appendix 2).

On the one hand, the estimation accuracy of the estimators $\hat\theta_{SIR}$ of the different ICT methods is compared for sample sizes $n_1$ and $n_2$. These estimators are $\hat p^{(F_1)}_{SIR}$ and $\hat p^{(F_1/F_2)}_{SIR}$ from Eq. (27) with $0 \le F_1, F_2 \le G$, which include $\hat p_{SIR}$ of the original ICT ($F_1 = 0$) and $\hat p_{DL,SIR}$ ($F_1 = F_2 = 0$). The comparison measure is the relative variance reduction (RVR) in % with respect to $V(\hat p_{SIR})$ of the original ICT (Table 5):

$$RVR = \frac{V(\hat p_{SIR}) - V(\hat\theta_{SIR})}{V(\hat p_{SIR})} \cdot 100. \tag{28}$$

Consider, for example, $\hat\theta_{SIR} = \hat p^{(5)}_{SIR}$ from Eq. (22). For this estimator, not only is the population mean $\mu_x$ known, but the variable $z$ is also observed in the whole sample of size $n = 1{,}000$. With the variances $V(\hat p^{(5)}_{SIR})$ and $V(\hat p_{SIR})$ from Eqs. (20) and (7), the RVR in % is given by

$$RVR = \left(1 - \frac{\sigma_z^2/n}{\sigma_z^2/n_1 + \sigma_x^2/n_2}\right) \cdot 100.$$

With the parameters from Shaw's population (see Appendix 2), this results in an RVR of 72.7% compared with the original ICT approach (Table 5).
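The RVR of Eq. (28) and its special case for $\hat p^{(5)}_{SIR}$ can be computed with a few lines of Python. The variances in the usage line are hypothetical round numbers, not the values from Shaw's population, and the function names are illustrative.

```python
def rvr_percent(v_original: float, v_alternative: float) -> float:
    """Eq. (28): relative variance reduction (in %) with respect to the original ICT."""
    return (v_original - v_alternative) / v_original * 100


def rvr_without_control_sample(sigma2_z: float, sigma2_x: float, n1: int, n2: int) -> float:
    """RVR of p_hat^(G)_SIR (long list for all n = n1 + n2 units, V = sigma_z^2 / n)
    against the original ICT with V = sigma_z^2 / n1 + sigma_x^2 / n2 (Eq. 7)."""
    v_original = sigma2_z / n1 + sigma2_x / n2
    v_known_mean = sigma2_z / (n1 + n2)
    return rvr_percent(v_original, v_known_mean)


# Hypothetical variances (not Shaw's data): sigma_z^2 = sigma_x^2 = 1, n1 = n2 = 500.
print(round(rvr_without_control_sample(1.0, 1.0, 500, 500), 6))  # 75.0
```

With $n_1 = n_2 = n/2$, the ratio of the two variances is $\sigma_z^2 / [2(\sigma_z^2 + \sigma_x^2)]$. The RVR of $\hat p^{(G)}_{SIR}$ therefore lies between 50% (when $\sigma_x^2$ is negligible relative to $\sigma_z^2$) and 75% (when $\sigma_x^2 = \sigma_z^2$) for $\sigma_x^2 \le \sigma_z^2$.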
On the other hand, the estimation accuracy of the estimators $\hat\theta_{SIR}$ is compared with that of the "direct" estimator

$$\hat p^D_{SIR} = \frac{1}{n}\sum\nolimits_{s} r_k, \qquad r_k = \begin{cases} 1 & \text{if respondent } k \text{ answers yes}, \\ 0 & \text{if respondent } k \text{ answers no}, \end{cases} \tag{29}$$

the proportion of "yes" responses ($r = 1$) in the SIR sample $s$ of size $n = n_1 + n_2$. The comparison rests on two assumptions. First, when the sensitive question is asked directly, a member of $U_A$ (true status $y = 1$) reports $r = 0$ with probability $q$. Second, all other non-sampling error components are equally negligible for the considered questioning designs. The threshold value $q_0$ of the probability $q$ is then calculated; above it, the mean square error $MSE(\hat p^D_{SIR})$ exceeds the variance $V(\hat\theta_{SIR})$ of the unbiased estimator $\hat\theta_{SIR}$ (Table 6). Under these assumptions, $E(\hat p^D_{SIR}) = p(1-q)$, so that $q_0$ is the larger root of

$$MSE(\hat p^D_{SIR}) = \frac{p(1-q_0)\left[1 - p(1-q_0)\right]}{n} + p^2 q_0^2 = V(\hat\theta_{SIR}) \tag{30}$$

(for the theoretical development, see Appendix 3). For $q > q_0$, the bias of $\hat p^D_{SIR}$ yields $MSE(\hat p^D_{SIR}) > V(\hat\theta_{SIR})$, and the increased privacy protection of the specific ICT design pays off in terms of accuracy. For $\hat\theta_{SIR} = \hat p^{(5)}_{SIR}$ and the parameters from Shaw's population (see Appendix 2), this results in $q_0 = 0.070$ (see Table 6). Under the given assumptions, $V(\hat p^{(5)}_{SIR}) < MSE(\hat p^D_{SIR})$ holds for any probability $q$ larger than only 7.0%. In that case, the modification of the original ICT with $G = 5$ provides more accurate results than the direct questioning approach.
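Under the assumptions just stated, the MSE of the direct estimator is a quadratic function of $q$, so $q_0$ can be found in closed form. The Python sketch below is our own illustration of this defining condition, not necessarily the form of the development in Appendix 3. The function names are hypothetical. The ICT variance used in the usage line is constructed for a round trip, not taken from Table 6.

```python
import math


def mse_direct(p: float, q: float, n: int) -> float:
    """MSE of the direct SIR estimator when members of U_A answer 'no' with
    probability q and non-members answer truthfully:
    Bernoulli(p(1-q)) variance divided by n, plus the squared bias (p q)^2."""
    a = p * (1 - q)  # expected share of 'yes' answers
    return a * (1 - a) / n + (p * q) ** 2


def threshold_q0(p: float, n: int, v_ict: float) -> float:
    """Larger root of mse_direct(p, q, n) = v_ict.

    Expanding the MSE in q gives A q^2 + B q + C with
    A = p^2 (n - 1) / n, B = -p (1 - 2p) / n, C = p (1 - p) / n.
    Requires v_ict > C, i.e. an ICT variance above that of truthful direct questioning.
    """
    A = p * p * (n - 1) / n
    B = -p * (1 - 2 * p) / n
    C = p * (1 - p) / n
    if v_ict <= C:
        raise ValueError("v_ict must exceed p(1 - p)/n")
    disc = B * B - 4 * A * (C - v_ict)
    return (-B + math.sqrt(disc)) / (2 * A)


# Round trip with hypothetical inputs: the ICT variance at which q0 = 0.07
# for p = 0.479 and n = 1000.
v = mse_direct(0.479, 0.07, 1000)
print(round(threshold_q0(0.479, 1000, v), 6))  # 0.07
```

Because the MSE is convex in $q$ and lies below $V(\hat\theta_{SIR})$ at $q = 0$, the larger root is the relevant threshold. The absolute bias at that point is $p\,q_0$.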
In the given data set, the proportion $p$ under study equals 0.479. The comparison is carried out with $n_1 = n_2 = 500$. This uniform allocation of $n$ to the two samples is a reasonable choice when DL versions of the ICT are included in the investigation. The results presented in Tables 5 and 6 give a numerical impression of how knowing the means of at least some of the $G$ non-key items can affect the estimation accuracy of techniques with one or two item lists. For the one-list versions of the ICT, $n_1 > n_2$ would be a better choice than $n_1 = n_2$, since the long-list answers typically vary more than the short-list answers. Hence, for the one-list estimators $\hat p^{(F_1)}_{SIR}$ with $0 \le F_1 < G$, the results in Table 5 can be interpreted as lower limits of the achievable RVR in % and the results in Table 6 as upper limits of the threshold $q_0$.
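For the one-list SIR variance of Eq. (7), the allocation that minimizes $\sigma_z^2/n_1 + \sigma_x^2/(n - n_1)$ follows from setting the derivative to zero: $n_1/n_2 = \sigma_z/\sigma_x$. The Python sketch below illustrates this standard calculus result. It is not necessarily the allocation procedure of Tian et al. [25]. The variances are hypothetical, and for the modified ICT $\sigma_x^2$ would be replaced by $\sigma^2_{x^{[E_1]}}$ from Eq. (17).

```python
import math


def optimal_n1(n: int, sigma2_z: float, sigma2_x: float) -> float:
    """Minimise sigma2_z / n1 + sigma2_x / (n - n1) over n1 (continuous solution).

    Setting the derivative to zero gives n1 / n2 = sigma_z / sigma_x.
    """
    sz, sx = math.sqrt(sigma2_z), math.sqrt(sigma2_x)
    return n * sz / (sz + sx)


def sir_variance(n1: float, n2: float, sigma2_z: float, sigma2_x: float) -> float:
    """Eq. (7): V = sigma_z^2 / n1 + sigma_x^2 / n2."""
    return sigma2_z / n1 + sigma2_x / n2


# Hypothetical variances: long-list answers vary more than short-list answers.
n, s2z, s2x = 1000, 4.0, 1.0
n1 = optimal_n1(n, s2z, s2x)
print(round(n1, 2))                                  # 666.67
print(round(sir_variance(500, 500, s2z, s2x), 4))    # 0.01
print(round(sir_variance(n1, n - n1, s2z, s2x), 4))  # 0.009
```

In practice $n_1$ is rounded to an integer, and the population variances have to be replaced by guesses or pilot estimates.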
Clearly, in a one-list design (column $\hat p^{(F_1)}_{SIR}$ in Tables 5 and 6), the higher the number $F_1$ of the $G$ non-key items for which $\mu_{x^{(F_1)}}$ is known, the higher the RVR of $\hat p^{(F_1)}_{SIR}$ in comparison with $\hat p_{SIR}$ of the original ICT. For the given data (see Appendix 2), the DL approach ($\hat p^{(F_1/F_2)}_{SIR}$ with $F_1 = F_2 = 0$, i.e., $\hat p_{DL,SIR}$) roughly halves the variance of the original ICT ($\hat p_{SIR}$) by using two lists instead of one. Moreover, Table 5 shows how the DL version can also gain additional accuracy through knowledge of the means $\mu_{x^{(F_1)}}$ and $\mu_{u^{(F_2)}}$, respectively ($0 \le F_2 \le F_1 \le G$). This shows how the performance of the estimators benefits from both ideas, the use of prior knowledge and the use of two lists. Each additional piece of prior information increases the accuracy of the estimator. In the applied setting, only the estimator $\hat p^{(5/5)}_{SIR}$ of the DL version with $F_1 = F_2 = 5$ provides roughly the same result as $\hat p^{(5)}_{SIR}$, for which no control sample is needed anymore.
Consider next the comparison with direct questioning on the sensitive item by means of $q_0$ from Eq. (30). Table 6 shows that the original ICT would pay off in terms of accuracy if more than 14.4% of the members of $U_A$ lied when asked the sensitive question directly. The variance reductions achievable by the proposed modified versions of the ICT (Table 5) lower this $q_0$ accordingly. For the best case with respect to simplicity and accuracy of the questioning design (Table 5), the estimator $\hat p^{(5)}_{SIR}$ would already be more accurate than the direct estimator (29) (both with $n = 1{,}000$) if the absolute bias $pq$ of $\hat p^D_{SIR}$ from Eq. (29) exceeded only $0.070 \cdot 0.479 \approx 0.033$.

Summary and outlook
The original ICT is an easy-to-understand and simple-to-implement IQ design that masks sensitive information with the aim of increasing respondents' willingness to cooperate. The take-away message of this article is that using certain prior information about at least some of the non-key items involved can substantially decrease the estimators' additional variance. Besides contextual non-key items for which such information might be available, socio-demographic items with a known distribution in the target population could often be used for this purpose. In this way, the method has the potential to become an even stronger and more serious competitor of direct questioning about sensitive characteristics. For $1 \le F_1 \le G - 1$, this prior information could additionally be used for regression-type estimators. For this purpose, two separate lists with $E_1$ and $F_1$ non-key items would have to be used in the control sample. For large correlations of $x^{[E_1]}$ and $x^{(F_1)}$, such estimators could reduce the second component of the variance $V(\hat p^{(F_1)})$ from Eq. (14), while the unchanged first component remains responsible for the larger part of the sum. Building on the proposed modifications of the original ICT, future research could investigate the pros and cons of such estimators in more detail.
Moreover, combining the proposed modifications of the ICT with methods that account for the floor and ceiling effects could also address this weakness of the original ICT. Such a combination would also protect the privacy of the few respondents to whom these effects apply. However, such methods would, of course, have to maintain the simplicity of the original ICT to avoid increasing the respondents' task. If this cannot be ensured, it would be preferable to reduce the probability of these two effects by selecting the non-key items appropriately.

Appendix 1

… the covariance term on the right-hand side of the equation is decomposed into:

Appendix 2
To give a numerical impression of how the different modifications of the original ICT proposed in Sect. 3 affect the accuracy of the estimates, the small "imaginary population" from Shaw [24] is not used directly. Instead, the multi-dimensional population distribution of the five non-key items and the key item from this data set is used. From this distribution, SIR samples of sizes $n_1 = n_2 = 500$ are drawn. For the knowledge of the means $\mu_{x^{(1)}}, \mu_{x^{(2)}}, \ldots, \mu_{x^{(5)}}$ of the numbers $x^{(1)}, x^{(2)}, \ldots, x^{(5)}$ of applicable non-key items, the variables "1" to "5" from the data set are used in reverse order. Knowledge of $\mu_{x^{(1)}}$ means that the population mean of variable "5" is known, whereas the mean $\mu_{x^{[4]}}$ of the number of applicable non-key items "1" to "4" is not. Knowledge of $\mu_{x^{(2)}}$ means that the mean of the sum of variables "4" and "5" is known, whereas the mean $\mu_{x^{[3]}}$ of the number of applicable non-key items "1" to "3" is not, and so forth. To allow calculations for the different DL versions of the proposed modifications as well, the variable values of the five non-key items included in Shaw's population are additionally randomly ordered to generate a second item list. Furthermore, without loss of generality, $F_1 \ge F_2$ was assumed.
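A Monte Carlo comparison in the spirit of this appendix can be sketched in Python as below. It does not use Shaw's data. The population is hypothetical, with $G = 5$ independent non-key items that each apply with probability 0.5 and an independent key item with probability 0.479. The sketch compares the empirical variance of $\hat p_{SIR}$ ($n_1 = n_2 = 500$) with that of $\hat p^{(G)}_{SIR}$ ($n = 1{,}000$, $\mu_x$ assumed known from a register). All names and parameter values are illustrative assumptions.

```python
import random
from statistics import fmean, pvariance


def simulate(N=5000, G=5, p_item=0.5, p_key=0.479,
             n1=500, n2=500, reps=300, seed=1):
    """Empirical variances of p_hat_SIR (Eq. 6) and p_hat^(G)_SIR (Eq. 22)."""
    rng = random.Random(seed)
    # Hypothetical population: each unit is (x, y), x = count of G independent items.
    pop = [(sum(rng.random() < p_item for _ in range(G)), int(rng.random() < p_key))
           for _ in range(N)]
    mu_x = fmean(x for x, _ in pop)  # treated as known prior information
    orig, known = [], []
    for _ in range(reps):
        s1 = rng.choices(pop, k=n1)  # SIR treatment sample: long list
        s2 = rng.choices(pop, k=n2)  # SIR control sample: short list
        orig.append(fmean(x + y for x, y in s1) - fmean(x for x, _ in s2))
        s = rng.choices(pop, k=n1 + n2)  # F1 = G: the long list for everyone
        known.append(fmean(x + y for x, y in s) - mu_x)
    return pvariance(orig), pvariance(known)


v_orig, v_known = simulate()
print(v_known < v_orig)  # True
```

Under these assumptions, Eqs. (7) and (23) suggest variances of roughly $(\sigma_z^2 + \sigma_x^2)/500$ and $\sigma_z^2/1{,}000$, with $\sigma_x^2 \approx 1.25$ and $\sigma_z^2 \approx 1.5$. The simulated values should scatter around these.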

Appendix 3
The