Artificial Inflation or Deflation? Assessing the Item Count Technique in Comparative Surveys

Abstract

While the item count technique (ICT), also known as the list experiment, has grown increasingly popular among political scientists as a means of estimating attitudes and behaviors subject to social desirability bias, many of the technique's empirical properties remain untested. In this paper, we explore whether estimates are biased by the different list lengths presented to control and treatment groups rather than by the substance of the treatment items. Using face-to-face survey data from national probability samples of households in Uruguay and Honduras, we assess how effective the ICT is in the context of face-to-face surveys, where social desirability bias should be strongest, and in developing contexts, where literacy rates raise questions about respondents' capacity to engage in the cognitively taxing process the ICT requires. We find little evidence that the ICT overestimates the incidence of behaviors; instead, we find that it provides extremely conservative estimates of high-incidence behaviors. The ICT may therefore be most useful for detecting low-prevalence attitudes and behaviors, and it may overstate social desirability bias when applied to higher-frequency socially desirable attitudes and behaviors. However, we find little evidence that these deflationary effects vary across common demographic subgroups, suggesting that multivariate estimates using the ICT may not be biased.

Notes

  1.

    The ICT is known by a variety of names, including block total response, unmatched item count technique, randomized list technique, and the list experiment. The difference in terminology reflects the fact that the technique developed in different disciplines in relative isolation: political scientists generally refer to it as the list experiment, while researchers in sociology, business ethics, and public health use the item count terminology.

  2.

    The ICT has also been utilized frequently in sociology, psychology, and business ethics. For example, scholars have used the ICT to assess rates of risky behaviors (Anderson et al. 2007; Biemer and Brown 2005; LaBrie and Earleywine 2000; Miller 1984; Zimmerman and Langer 1995) and unethical behavior in the workplace (Dalton et al. 1994; Wimbush and Dalton 1997), among many other topics.

  3.

    Ceiling and floor effects can remove the anonymity of the responses by forcing the respondent to say “All” or “None” of the items. Thus, it is good practice to minimize the variance of responses to the control list and place the mean response at roughly \(\frac{X}{2}\).
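This design logic can be illustrated with a quick simulation (the item prevalences below are invented for illustration, not taken from the authors' instruments): a control list whose items have moderate, balanced prevalences keeps most counts away from 0 and X, while skewed items push many respondents to the revealing extremes.

```python
import numpy as np

rng = np.random.default_rng(42)
n, X = 10_000, 4   # simulated respondents and number of control items

def exposed_share(probs):
    """Share of simulated respondents whose control-list count is 0 or X,
    i.e. whose answers to every item are fully revealed."""
    counts = sum(rng.random(n) < p for p in probs)
    return float(np.mean((counts == 0) | (counts == X)))

balanced = exposed_share([0.5, 0.5, 0.5, 0.5])   # mean count near X/2
skewed = exposed_share([0.9, 0.9, 0.9, 0.9])     # mean count near 3.6

print(f"balanced list: {balanced:.3f} of respondents exposed")
print(f"skewed list:   {skewed:.3f} of respondents exposed")
```

With independent 50/50 items the exposed share is about 12.5 %, versus roughly two-thirds for the skewed list; negatively correlated control items would reduce the exposed share further still.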

  4.

    Researchers can further stratify the sample so that estimates for different subgroups can be derived in a similar manner. Other researchers have also begun developing multivariate estimators for the ICT (e.g. Glynn 2013; Blair and Imai 2012).
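The baseline estimator these extensions build on is simply the difference in mean item counts between the treatment list (which appends the sensitive item) and the control list. A minimal simulation, with invented prevalences rather than the survey data:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000                                # simulated respondents per group

p_sensitive = 0.30                       # assumed prevalence of the sensitive item
control_probs = [0.2, 0.5, 0.5, 0.8]     # assumed control-item prevalences

def item_counts(include_sensitive):
    counts = sum(rng.random(n) < p for p in control_probs)
    if include_sensitive:
        counts = counts + (rng.random(n) < p_sensitive)
    return counts

y_control = item_counts(False)           # X-item control list
y_treat = item_counts(True)              # (X + 1)-item treatment list

# ICT estimate: difference in mean item counts between the two groups
est = y_treat.mean() - y_control.mean()
se = float(np.sqrt(y_treat.var(ddof=1) / n + y_control.var(ddof=1) / n))
print(f"estimated prevalence: {est:.3f} (SE {se:.3f})")
```

Subgroup estimates follow by computing the same difference within each stratum, at the cost of larger standard errors.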

  5.

    Techniques that combat social desirability bias require additional assessment in the developing world setting. Although social desirability bias is a universal source of measurement error (Johnson and Van de Vijver 2003), personality-based social desirability scales show variance across different cultural and socioeconomic settings, with greater levels of socially desirable responding occurring in poorer, more collectivist countries as opposed to richer, individualistic countries (Johnson and Van de Vijver 2003, pp. 197–200). As survey research gains increasing importance in the developing world, scholars need to evaluate the usefulness of techniques developed in the industrialized world for countries at lower levels of development (cf. Harkness and Van de Vijver 2003).

  6.

    Note that neither of these assumptions stipulates that responses to the control items be truthful or accurate; they require only that the treatment item not affect control item counts. That is, the assumptions “together eliminate the possibility that the coexistence of the sensitive and control items in a single list influence responses in one way or another” (Imai 2011, p. 409).

  7.

    The success of the randomization procedure is discussed further in “Appendix 2”.

  8.

    Although the percentage claiming to have been a candidate is higher than expected, the elections in Uruguay included both presidential and parliamentary elections, somewhat increasing the likelihood that the sample would include a candidate.

  9.

    The wording in Spanish was the following: “Nos interesa saber cómo se involucran las personas en política. Voy a mostrarle una lista de actividades políticas y quisiera que me diga cuántas de estas actividades realizó usted durante la última campaña. No me diga cuáles, sólo CUÁNTAS”. The baseline response categories were: “Participé como voluntario para la campaña de uno de los partidos,” “Participé en una movilización,” “Intenté convencer a un amigo de que votara por mi candidato,” and “Tuve una pelea con alguien sobre un candidato.”

  10.

    In fact, Tsuchiya et al. (2007) find deflation to be more of a problem with non-sensitive items than with sensitive items.

  11.

    The success of the randomization procedure is discussed further in “Appendix 2”.

  12.

    The item counts produced in the Uruguayan survey were quite low, with just over half of respondents indicating zero items. As explained below, this feature of the Uruguayan experiment may partially account for some of the marginal heterogeneity in the results.

  13.

    In Spanish, the baseline items for Honduras were “Voté por algún candidato,” “Participé en una movilización,” “Discutí acerca de la elección con alguien,” “Vi o leí algo acerca de la elección en las noticias.” The treatment items were “Participé como candidato” and “Estaba al tanto de que las elecciones se iban a llevar a cabo,” respectively.

  14.

    Although the June 2009 coup in Honduras might have increased awareness about the elections, there is no reason to believe that it would have made either of the treatment items sensitive.

  15.

    As a reviewer helpfully pointed out, artificial deflation might also produce a negative estimate of the treat low condition and artificial inflation could potentially produce an estimate greater than 1 (i.e. greater than 100%) for the treat high condition.
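A toy calculation (numbers invented for illustration) shows how this can happen: with a very rare treatment item, even mild deflation of the counts on the longer list pushes the difference in means below zero, while symmetric inflation can push a near-universal item's estimate above one.

```python
# Invented group means for illustration only
mean_control = 2.00        # 4-item control list
mean_treat_low = 1.92      # 5-item list; true added prevalence ~0.003,
                           # but respondents slightly deflate their counts
mean_treat_high = 3.10     # 5-item list; true added prevalence ~0.96,
                           # with counts inflated by the longer list

est_low = round(mean_treat_low - mean_control, 2)
est_high = round(mean_treat_high - mean_control, 2)
print(est_low, est_high)   # -0.08 1.1 -- both logically impossible prevalences
```

Estimates outside the [0, 1] interval are thus a diagnostic signal that list length itself, rather than the treatment item, is moving responses.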

  16.

    Although the low number of items reported by members of both groups in the Uruguayan survey would generally lead to questions about potential floor effects, since the treatment items are non-sensitive, such concerns should be allayed. We thank a reviewer for pointing this out.

  17.

    Restricting the sample only to those respondents who claimed that they were not a candidate makes the estimated difference between treatment and control lists even smaller (0.056) and the evidence against the artificial inflation hypothesis even stronger (one-tailed p value = 0.26 for the difference of means).
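The one-tailed difference-of-means test reported here can be approximated nonparametrically. The sketch below runs a permutation test on simulated counts (not the actual survey data) to show the mechanics of a one-tailed p value for the alternative hypothesis that the treatment-list mean exceeds the control-list mean.

```python
import numpy as np

rng = np.random.default_rng(3)

def one_tailed_perm_test(treat, control, n_perm=5_000):
    """One-tailed permutation p value for H1: mean(treat) > mean(control)."""
    observed = treat.mean() - control.mean()
    pooled = np.concatenate([treat, control])
    n_t = len(treat)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if pooled[:n_t].mean() - pooled[n_t:].mean() >= observed:
            hits += 1
    return observed, hits / n_perm

# Invented item counts with a small true difference, mimicking a
# low-incidence comparison rather than reproducing the survey result
control = rng.binomial(4, 0.5, size=400)
treat = rng.binomial(4, 0.5, size=400) + rng.binomial(1, 0.05, size=400)

diff, p = one_tailed_perm_test(treat, control)
print(f"difference of means: {diff:.3f}, one-tailed p = {p:.3f}")
```

This is a permutation analogue of the t-test the authors report, not the test they actually used; with samples of this size the two agree closely.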

  18.

    In Uruguay, a later question on the survey directly asked respondents if they knew the election was taking place; 96 % of respondents answered affirmatively.

  19.

    The operationalization of all variables is outlined in Appendix 2.

  20.

    To ease the interpretation of the treatment variables and constants in these models, the explanatory variables were centered at their medians (except gender). Thus, in the models including other explanatory variables, the coefficients for the uninteracted treatment variables (i.e. Test Low and Test High) reflect the average treatment effect (or ICT estimate) for the median respondent. The constant can be interpreted as the average number of items indicated by the median respondent to the control list.

  21.

    Approximately 18 % of respondents did not answer the income item, either saying “don't know” or refusing. To account for this large proportion of missing data, the income scale runs from 0 to 3 with missing values coded as zero, and a dummy variable (with corresponding treatment interactions) is included to test for differences between those who responded and those who did not.

  22.

    Although the relationship does not reach conventional levels of statistical significance, the coefficient of the main test high treatment variable in conjunction with the coefficient of the income test high interaction suggests that those who did not answer the income question were more accurate than were those with the lowest incomes (p = 0.15, Wald test).

  23.

    The robustness of these results is further called into question when removing those respondents who said that they were candidates for office and those who said that they were not aware that the elections were taking place. The only interaction that retains marginal significance with this analysis is the income non-response indicator (p = 0.08).

  24.

    Technically, likelihood ratio tests are inappropriate with clustered survey data. A Wald test produced similarly non-significant results for Uruguay, while for the Honduras interactions a Wald test suggested somewhat better model fit, although none of the individual interactions proved significant in the analysis.

References

  1. Anderson, D. A., Simmons, A. M., Milnes, S. M., & Earleywine, M. (2007). Effect of response format on endorsement of eating disordered attitudes and behaviors. International Journal of Eating Disorders, 40(1), 90–93.

  2. Biemer, P., & Brown, G. (2005). Model-based estimation of drug use prevalence using item count data. Journal of Official Statistics, 21(2), 287–308.

  3. Biemer, P., Kathleen Jordan, B., Hubbard, M., & Wright, D. (2005). A test of the item count methodology for estimating cocaine use prevalence. In J. Kenneth & J. Gfroerer (Eds.), Evaluating and improving methods used in the national survey on drug use and health. Rockville: Substance Abuse and Mental Health Services Administration, Office of Applied Studies.

  4. Blair, G., & Imai, K. (2012). Statistical analysis of list experiments. Political Analysis, 20, 47–77.

  5. Bless, H., Bohner, G., Hild, T., & Schwarz, N. (1992). Asking difficult questions: Task complexity increases the impact of response alternatives. European Journal of Social Psychology, 22, 309–312.

  6. Corstange, D. (2009). Sensitive questions, truthful answers? Modeling the list experiment with LISTIT. Political Analysis, 17(1), 45–63.

  7. Coutts, E., & Jann, B. (2008). Sensitive questions in online surveys: Experimental results for the randomized response technique (RRT) and the unmatched count technique (UCT). ETH Zurich Sociology Working Paper No. 3. Zurich: ETH Zurich.

  8. Dalton, D. R., Wimbush, J. C., & Daily, C. M. (1994). Using the unmatched count technique (UCT) to estimate base rates for sensitive behavior. Personnel Psychology, 47(4), 817.

  9. Díaz Cayeros, A., Magaloni, B., Matanock, A., & Romero, V. (2011). Living in fear: Mapping the social embeddedness of drug gangs and violence in Mexico. doi:10.2139/ssrn.1963836.

  10. Droitcour, J., Caspar, R. A., Hubbard, M. L., Parsley, T. L., Visscher, W., & Ezzati, T. M. (1991). The item count technique as a method of indirect questioning: A review of its development and a case study application. In P. P. Biemer, R. M. Groves, L. E. Lyberg, N. A. Mathiowetz, & S. Sudman (Eds.), Measurement errors in surveys (pp. 185–210). New York: Wiley.

  11. Glynn, A. N. (2013). What can we learn with statistical truth serum? Design and analysis of the list experiment. Public Opinion Quarterly, 77(S1), 159–172. doi:10.1093/poq/nfs070.

  12. Harkness, J., & Van de Vijver, F. (2003). Cross-cultural survey methods. Hoboken: Wiley.

  13. Heerwig, J. A., & McCabe, B. J. (2009). Education and social desirability bias: The case of a black presidential Candidate. Social Science Quarterly, 90(3), 674–686.

  14. Holbrook, A. L., & Krosnick, J. A. (2010). Social desirability bias in voter turnout reports: Tests using the item count technique. Public Opinion Quarterly, 74(1), 37–67.

  15. Hubbard, M. L., Caspar, R. A., & Lessler, J. T. (1989). Respondent reactions to item count lists and randomized response. Proceedings of the Survey Research Section of the American Statistical Association (pp. 544–548).

  16. Imai, K. (2011). Multivariate regression analysis for the item count technique. Journal of the American Statistical Association, 106(494), 407–416.

  17. Jackman, S. (2007). The social desirability of belief in god. Presentation for the Boston area methods meeting, March 2007.

  18. Johnson, T., & Van de Vijver, F. (2003). Social desirability bias in cross-cultural research. In J. Harkness, F. Van de Vijver, & P. Mohler (Eds.), Cross-cultural survey methods (pp. 195–204). Hoboken: Wiley.

  19. Kane, J. G., Craig, S. C., & Wald, K. D. (2004). Religion and presidential politics in Florida: A list experiment. Social Science Quarterly, 85(2), 281–293.

  20. Karlan, D., & Zinman, J. (2012). List randomization for sensitive behavior: An application for measuring use of loan proceeds. Journal of Development Economics, 98, 71–75.

  21. Krosnick, J. A., & Alwin, D. F. (1987). An evaluation of a cognitive theory of response order effects in survey measurement. Public Opinion Quarterly, 51, 201–219.

  22. Kuklinski, J. H., Cobb, M. D., & Gilens, M. (1997a). Racial attitudes and the new South. The Journal of Politics, 59(2), 323–349.

  23. Kuklinski, J. H., Sniderman, P. M., Knight, K., Piazza, T., Tetlock, P. E., Lawrence, G. R., et al. (1997b). Racial prejudice and attitudes toward affirmative action. American Journal of Political Science, 41(2), 402–419.

  24. LaBrie, J. W., & Earleywine, M. (2000). Sexual risk behaviors and alcohol: Higher base rates revealed using the unmatched-count technique. The Journal of Sex Research, 37(4), 321–326.

  25. Malesky, E., Jensen, N., & Gueorguiev, D. (2011). Rent(s) asunder: Sectoral rent extraction possibilities and bribery by multinational corporations. Working Paper Series, Peterson Institute for International Economics.

  26. Menon, G., Raghubir, P., & Schwarz, N. (1995). Behavioral frequency judgments: An accessibility-diagnosticity framework. Journal of Consumer Research, 22(2), 212–228.

  27. Miller, J.D. (1984). A new survey technique for studying deviant behavior. Ph.D. thesis. Washington, DC: George Washington University.

  28. Miller, J. D., & Cisin, I. H. (1984). The item-count/paired lists technique: An indirect method of surveying deviant behavior. Washington, DC: George Washington University, Social Research Group.

  29. González Ocantos, E., Kiewiet de Jonge, C., Meléndez, C., Osorio, J., & Nickerson, D. W. (2012). Vote buying and social desirability bias: Experimental evidence from Nicaragua. American Journal of Political Science, 56(1), 202–217.

  30. Schwarz, N., & Bienias, J. (1990). What mediates the impact of response alternatives on frequency reports of mundane behaviors? Applied Cognitive Psychology, 4, 61–72.

  31. Schwarz, N., & Scheuring, B. (1988). Judgments of relationship satisfaction: Inter- and intra-individual comparison strategies as a function of questionnaire structure. European Journal of Social Psychology, 18, 485–496.

  32. Schwarz, N., Hippler, H. J., Deutsch, B., & Strack, F. (1985). Response categories: Effects on behavioral reports and comparative judgments. Public Opinion Quarterly, 49, 388–395.

  33. Streb, M. J., Burrell, B., Frederick, B., & Genovese, M. A. (2008). Social desirability effects and support for a female American President. Public Opinion Quarterly, 72(1), 76–89.

  34. Sudman, S. (1966). Probability sampling with quotas. Journal of the American Statistical Association, 61(315), 749–771.

  35. Sudman, S., Bradburn, N. M., & Schwarz, N. (1996). Thinking about answers: The application of cognitive processes to survey methodology. San Francisco: Jossey-Bass.

  36. Tourangeau, R., & Smith, T. (1996). Asking sensitive questions: The impact of data collection, question format, and question context. Public Opinion Quarterly, 60, 275–304.

  37. Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. Psychological Bulletin, 133(5), 859–883.

  38. Tsuchiya, T., & Hirai, Y. (2010). Elaborate item count questioning: Why do people underreport count responses? Survey Research Methods, 4(3), 139–149.

  39. Tsuchiya, T., Hirai, Y., & Ono, S. (2007). A study of the properties of the item count technique. Public Opinion Quarterly, 71(2), 253–272.

  40. Weghorst, K. (2010). Uncovering sensitive political attitudes with list experiments and the randomized response technique: A survey experiment assessing data quality in Tanzania. Presented at the 2010 Midwest Political Science Association National Conference.

  41. Wimbush, J. C., & Dalton, D. R. (1997). Base rate for employee theft: Convergence of multiple methods. Journal of Applied Psychology, 82(5), 756–763.

  42. Zimmerman, R. S., & Langer, L. M. (1995). Improving estimates of prevalence rates of sensitive behaviors: The randomized lists technique and consideration of self-reported honesty. The Journal of Sex Research, 32(2), 107–117.

Acknowledgments

Funding for the surveys was provided by the Kellogg Institute for International Studies and the Institute for Scholarship in the Liberal Arts at the University of Notre Dame. Nickerson is grateful to the Center for the Study of Democratic Politics at Princeton University for the time to work on this project. We thank Equipos Mori for fielding the Uruguayan survey and Borge y Asociados for conducting the Honduran survey. We would also like to thank Scott Desposato, Macartan Humphreys, Jim Kuklinski, Devra Moeller, and the anonymous reviewers for helpful comments. We are particularly indebted to the continuing collaboration of Ezequiel Gonzalez Ocantos, Carlos Melendez, and Javier Osorio.

Author information

Corresponding author

Correspondence to David W. Nickerson.

Appendices

Appendix 1: Survey Methodology

Uruguay

Survey Firm: Equipos Mori

Field Dates: December 15–18, 2009

Mode: Omnibus Face-to-Face

Sampling Universe: Nationally representative of adults (18+)

N: 900

Sample Design: The survey utilized a multistage probability sample of households, with quotas used within households for the final selection of respondents (Sudman 1966). There were 243 final sampling points, with an average of 4 respondents per sampling point. The sample was first stratified into two grand strata, Montevideo and the Interior. Within Montevideo, the sample was further stratified by municipal zones. Within the Interior, stratification occurred by population: cities with more than 30,000 inhabitants were automatically included, and smaller cities were selected randomly in proportion to population size. Within cities (Interior) and zones (Montevideo), final sampling points were randomly chosen in proportion to population, households were chosen via a systematic sampling procedure, and respondents within households were selected using sex and age quotas. For rural areas, departments were selected randomly according to population; within selected departments and segments, national highways were selected, and highway distances (km markers) were then randomly chosen as starting points for the selection of households, which proceeded according to predetermined random procedures. In total, 6 rural sampling points were chosen.

AAPOR Response Rate 1: 32 %, Refusal Rate: 33 %

Randomization Design: The survey battery included one other question that required randomization such that the combination of the different question versions resulted in 6 different questionnaires. Each questionnaire was applied according to a predetermined randomized list.

Honduras

Survey Firm: Borge y Asociados

Field Dates: January 16–25, 2010

Mode: Omnibus Face-to-Face

Sampling Universe: Nationally representative of adults (18+), excluding the sparsely populated department of Gracias a Dios and the Bay Islands.

N: 1,008

Sample Design: The survey utilized a multistage random sample with 84 final sampling points (segments), each including 12 respondents. The sampling frame consisted of the electoral registry, with primary sampling units (municipalities) chosen within departments in proportion to the size of their voting centers. Within municipalities, random selection proceeded by electoral centers, census tracts, and census blocks, with final sampling points (segments or blocks) containing 12 respondents. Households, and respondents within households, were chosen randomly in a manner that ensured gender balance.

AAPOR Response Rate 1: 50 %, Refusal rate: 9 %

Randomization Design: The survey battery included two other questions that required randomization such that the combination of the different questions resulted in 12 different questionnaires. Each of the 12 questionnaires was applied according to a predetermined randomized list within each sampling unit, each of which included 12 respondents.

Appendix 2: Descriptive Statistics and Randomization Balance

See Tables 2 and 3.

Table 2 Uruguay descriptive statistics and treatment balance
Table 3 Honduras descriptive Statistics and randomization balance

Appendix 3: Analysis of Heterogeneity

To get a sense of what types of respondents are more likely to make errors in responding to the ICT, and to the high-frequency treatment in particular, we examined whether the treatment effects varied by subgroup using a series of OLS regressions predicting the number of reported items. Variables indicating assignment to the two treatment groups, basic demographic variables (gender, age, education, income; see note 19), the degree of disengagement with the survey (number of missing values in the instrument), and interactions between the treatment variables and the explanatory variables were included in the analysis (see note 20). The results for Uruguay and Honduras are reported in Tables 4 and 5, respectively. Coefficients on the terms interacted with each of the treatments (i.e., the low-propensity behavior treatment list and the high-propensity behavior treatment list) provide evidence of heterogeneity in response to the treatment. Given the null finding for the low-incidence treatment and the lack of variance on this item, we expect substantially less heterogeneity in the response to the low-incidence treatment.
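The specification described above can be sketched in a few lines (simulated data with assumed coefficients, not the survey datasets): the covariate is centered at its median, so the uninteracted treatment coefficients recover the treatment effect for the median respondent.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 3_000

# Simulated respondents; coefficient values below are assumptions
educ = rng.integers(0, 7, n).astype(float)
group = rng.integers(0, 3, n)            # 0 = control, 1 = treat low, 2 = treat high
treat_low = (group == 1).astype(float)
treat_high = (group == 2).astype(float)

# Center education at its median so the uninteracted treatment
# coefficients equal the treatment effect for the median respondent
educ_c = educ - np.median(educ)

X = np.column_stack([
    np.ones(n), treat_low, treat_high, educ_c,
    treat_low * educ_c, treat_high * educ_c,
])
beta = np.array([2.0, 0.05, 0.6, 0.1, 0.0, -0.05])   # assumed true values
y = X @ beta + rng.normal(0, 1, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```

Nonzero coefficients on the interaction terms (the last two entries) are the signature of treatment heterogeneity that the appendix tests for.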

Table 4 Uruguay demographic interactions
Table 5 Honduras demographic interactions

For the Uruguayan sample, gender and age are unrelated to either artificial inflation (p = 0.74 and 0.48, respectively) or artificial deflation biases (p = 0.49 and 0.78, respectively). There is a positive effect of education in the low-incidence treatment list that reaches marginal statistical significance (p = 0.09). If replicated elsewhere, this finding would suggest that greater education increases the likelihood of artificial inflation, contrary to the expectation that those with less education would be more likely to inflate their responses. Since education is positively correlated with participating in politics (the substance of the list), it is possible that social norms in relatively educated social circles cause people to provide a higher number in response to a longer list. While the interactions with the three-point income scale are not significant (p = 0.745 and 0.12), the interaction with a dummy variable indicating missing data on the income item (due to refusals or don't knows; see note 21) is highly significant for the test high condition (p = 0.03), suggesting that those who did not answer the income item were much more likely to count the election awareness item than were those who provided a valid answer on the income scale (see note 22). Not responding to the income question could mark the respondent as the type of person unlikely to reveal sensitive information. That such respondents were substantially more accurate than those who provided income information may suggest that the ICT works as desired for this population, but it leaves open the question of why those who did provide income information were so inaccurate in their counts. It should be noted, however, that including all the control variables does not appreciably change the estimated magnitude of these detected biases for the educated and the income non-responders and only slightly alters their statistical significance (see Table 4, column “All”). Given the standard errors associated with these estimates, one should avoid drawing strong conclusions about the populations for which the ICT is valid based solely on these findings (see note 23).

To highlight this point, the Honduran sample exhibits no treatment heterogeneity across demographic variables. Although the coefficients on the income variables hint that low-income respondents were more likely to artificially inflate their responses and high-income non-respondents were less likely to deflate theirs, the coefficients are not close to significant (p = 0.16, 0.13). Gender, age, and education similarly demonstrate no significant effects. Intriguingly, the interactions between gender and the two treatment lists are very similar in both the Uruguay and Honduras samples (positive for the low-incidence behavior and negative for the high-incidence behavior). However, even averaging these estimates, the results never come close to approaching traditional levels of statistical significance (test low combined p = 0.51, test high combined p = 0.18). It is thus possible that the artificial inflation and deflation hypotheses hold more strongly for women than for men, but testing that proposition would require much larger datasets. In short, Honduras fails to replicate the intriguing education effect found in Uruguay. Thus, both surveys show strong evidence of artificial deflation, but evidence of heterogeneity in this bias is weak.

It is also interesting to note that the proxy for survey disengagement does not moderate either treatment in either country. If attentiveness to the survey were the cause of artificial deflation, one would expect the interaction between the “treat high” variable and “disengagement” to be strongly negative. Instead, the model estimates values that are substantively and statistically indistinguishable from zero. Thus, survey disengagement is not driving the results. Finally, at the suggestion of a reviewer, we also conducted likelihood ratio tests to determine whether including the treatment-interacted variables improved upon the models containing only the non-interacted variables. In neither case does the addition of the interacted variables improve model fit (Uruguay p = 0.37; Honduras p = 0.40; see note 24).
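The joint Wald test mentioned above has a compact matrix form, W = b_S' V_S^{-1} b_S, where S indexes the tested (interaction) coefficients and V is the estimated coefficient covariance matrix. A sketch on simulated data follows; it uses the classical OLS covariance for brevity, whereas the clustered designs in this paper call for a cluster-robust V.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 2_000, 4

# Simulated data (assumed values): intercept, treatment, covariate, interaction
xcov = rng.normal(size=n)
t = (rng.random(n) < 0.5).astype(float)
X = np.column_stack([np.ones(n), t, xcov, t * xcov])
y = X @ np.array([2.0, 0.4, 0.1, 0.0]) + rng.normal(0, 1, n)

# OLS fit and classical covariance matrix; with clustered survey data,
# V should be replaced by a cluster-robust estimator
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
sigma2 = resid @ resid / (n - k)
V = sigma2 * np.linalg.inv(X.T @ X)

# Joint Wald statistic for H0: all tested coefficients are zero
S = [3]                                   # index of the interaction term
bs = b[S]
W = float(bs @ np.linalg.inv(V[np.ix_(S, S)]) @ bs)
print(f"Wald statistic: {W:.2f} (compare to chi-square critical values, df = 1)")
```

Under the null, W is asymptotically chi-square with degrees of freedom equal to the number of tested coefficients; testing all interaction terms jointly simply enlarges S.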

As a robustness check, we also estimated the treatment heterogeneity models using the maximum likelihood technique developed by Blair and Imai (2012), which has the potential to increase the efficiency of the estimates (Table 6). Due to convergence difficulties, we had to exclude respondents who did not answer the income item as well as the survey disengagement variable in the Honduras analysis. The only difference for the Uruguay analysis is that income becomes marginally significant for the treat high item. For Honduras, poorer respondents are more likely to inflate responses (treat low) and females are somewhat more likely to report deflated responses (treat high), but both of these effects reach only marginal levels of significance. Given these small differences, the results of this analysis do not change the overall conclusions from the OLS analysis.

Table 6 Maximum likelihood estimates

Taken together, the analysis demonstrates that any heterogeneity in potential counting errors across demographic subgroups is not robust. If confirmed by other studies, these findings suggest that while ICT estimates are likely to be biased downward, multivariate estimates of treatment item predictors are not likely to be biased since the overall deflationary bias does not vary substantially across subgroups.

About this article

Cite this article

Kiewiet de Jonge, C.P., Nickerson, D.W. Artificial Inflation or Deflation? Assessing the Item Count Technique in Comparative Surveys. Polit Behav 36, 659–682 (2014). https://doi.org/10.1007/s11109-013-9249-x

Keywords

  • List experiment
  • Item count technique
  • Survey design
  • Social desirability bias
  • Uruguay
  • Honduras