A Property of the CHAID Partitioning Method for Dichotomous Randomized Response Data and Categorical Predictors

Journal of Classification

Abstract

In this paper, we present empirical and theoretical results on classification trees for randomized response data. We consider a dichotomous sensitive response variable whose true status is intentionally misclassified by the respondents according to rules prescribed by a randomized response method. We assume that classification trees are grown using the Pearson chi-square test as the splitting criterion, and that the randomized response data are analyzed as if they were not perturbed. We prove that classification trees built on the observed randomized response data and on the estimated true data have a one-to-one correspondence in terms of ranking the splitting variables. The result is illustrated using two real data sets.
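The ranking property stated in the abstract can be checked numerically with a minimal sketch. It is not the authors' proof or data. It assumes a Warner-type design, in which a respondent answers the sensitive statement with probability `p_truth` and its complement otherwise, and it ranks predictors by the raw Pearson chi-square statistic at a single node. The counts, the predictor names `A`, `B`, `C`, and the value of `p_truth` are all hypothetical.

```python
def pearson_chi2(yes_counts, totals):
    """Pearson chi-square statistic for a 2 x k table.

    yes_counts[j] is the number of 'yes' answers in category j and
    totals[j] the number of respondents; counts may be non-integer.
    """
    n = sum(totals)
    p = sum(yes_counts) / n
    spread = sum(t * (y / t - p) ** 2 for y, t in zip(yes_counts, totals))
    return spread / (p * (1 - p))


def estimate_true_yes(observed_yes, totals, p_truth):
    """Moment estimate of the true 'yes' counts (assumed Warner-type design).

    Under this assumption P(observed yes) = c + d * P(true yes),
    with c = 1 - p_truth and d = 2 * p_truth - 1.
    """
    c = 1 - p_truth
    d = 2 * p_truth - 1
    return [(y - c * t) / d for y, t in zip(observed_yes, totals)]


# Hypothetical node: the same 200 respondents (72 observed 'yes')
# cross-classified by three candidate categorical predictors.
predictors = {
    "A": ([18, 24, 30], [40, 60, 100]),
    "B": ([28, 25, 19], [70, 70, 60]),
    "C": ([19, 28, 25], [50, 80, 70]),
}
p_truth = 0.8  # hypothetical design parameter

chi2_observed = {}
chi2_true = {}
for name, (yes, totals) in predictors.items():
    chi2_observed[name] = pearson_chi2(yes, totals)
    chi2_true[name] = pearson_chi2(estimate_true_yes(yes, totals, p_truth), totals)

# Rank predictors from strongest to weakest association.
rank_observed = sorted(predictors, key=chi2_observed.get, reverse=True)
rank_true = sorted(predictors, key=chi2_true.get, reverse=True)

print(rank_observed, rank_true)
```

In this sketch the two statistics differ only by a factor that depends on the overall proportion of "yes" answers at the node and on the design parameter, so the factor is the same for every predictor and the ordering is unchanged. Ranking by p-value, category merging, and Bonferroni adjustment as used in CHAID are not covered here.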

Author information

Corresponding author

Correspondence to Pier Francesco Perri.

Additional information

Most of the research of Pier Francesco Perri was done during his stay at the Department of Methodology and Statistics, University of Utrecht (The Netherlands). His work was partly supported by the research voucher awarded by Regione Calabria, Italy.

About this article

Cite this article

Perri, P.F., van der Heijden, P.G. A Property of the CHAID Partitioning Method for Dichotomous Randomized Response Data and Categorical Predictors. J Classif 29, 76–90 (2012). https://doi.org/10.1007/s00357-011-9094-8

  • DOI: https://doi.org/10.1007/s00357-011-9094-8
