Abstract
In this paper, we present empirical and theoretical results on classification trees for randomized response data. We consider a dichotomous sensitive response variable whose true status is intentionally misclassified by the respondents according to the rules prescribed by a randomized response method. We assume that classification trees are grown using the Pearson chi-square test as the splitting criterion, and that the randomized response data are analyzed as if they were not perturbed. We prove that classification trees grown on the observed randomized response data and on the estimated true data have a one-to-one correspondence in terms of ranking the splitting variables. The result is illustrated using two real data sets.
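The ranking property can be checked numerically at a single node. The following is a minimal sketch, not the authors' proof: it assumes Warner's (1965) design, in which a respondent answers the sensitive question with probability p and its complement otherwise, so the observed "yes" probability is (2p - 1)π + (1 - p) for a true prevalence π. The counts, the predictor names A and B, the design value p = 0.8, and all function names are hypothetical.

```python
def pearson_chi2(yes, totals):
    """Pearson chi-square for a 2 x K table, given 'yes' counts and totals per category."""
    n = sum(totals)
    p_bar = sum(yes) / n  # overall 'yes' proportion at the node
    between = sum(t * (y / t - p_bar) ** 2 for y, t in zip(yes, totals))
    return between / (p_bar * (1 - p_bar))


def estimate_true_yes(yes, totals, p):
    """Moment estimate of the true 'yes' counts under Warner's design (assumed here)."""
    return [(y - (1 - p) * t) / (2 * p - 1) for y, t in zip(yes, totals)]


p = 0.8  # hypothetical design probability of answering the sensitive question

# Hypothetical observed (perturbed) 'yes' counts and totals for two candidate predictors.
# Both describe the same node, so the overall totals agree (80 'yes' out of 200).
observed = {
    "A": {"yes": [30, 50], "totals": [100, 100]},
    "B": {"yes": [28, 52], "totals": [80, 120]},
}

chi2_obs = {}
chi2_true = {}
for name, tab in observed.items():
    chi2_obs[name] = pearson_chi2(tab["yes"], tab["totals"])
    true_yes = estimate_true_yes(tab["yes"], tab["totals"], p)
    chi2_true[name] = pearson_chi2(true_yes, tab["totals"])

# The ratio of the two statistics depends only on p and the overall 'yes' proportion,
# so it is the same for every predictor and the ranking is unchanged.
ratios = {name: chi2_true[name] / chi2_obs[name] for name in observed}
rank_obs = sorted(observed, key=chi2_obs.get, reverse=True)
rank_true = sorted(observed, key=chi2_true.get, reverse=True)

print(ratios)
print(rank_obs, rank_true)
```

In this sketch the estimated true proportions are a linear function of the observed ones, so the between-category term is rescaled by 1/(2p - 1)^2 for every predictor, while the denominator depends only on the node's overall proportion. The two chi-square statistics therefore differ by a factor common to all predictors, which is why the two rankings coincide.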
Additional information
Most of the research of Pier Francesco Perri was done during his stay at the Department of Methodology and Statistics, University of Utrecht (The Netherlands). His work was partly supported by the research voucher awarded by Regione Calabria, Italy.
Cite this article
Perri, P.F., van der Heijden, P.G. A Property of the CHAID Partitioning Method for Dichotomous Randomized Response Data and Categorical Predictors. J Classif 29, 76–90 (2012). https://doi.org/10.1007/s00357-011-9094-8
DOI: https://doi.org/10.1007/s00357-011-9094-8