Abstract
Micro-aggregation is a frequently used strategy for anonymizing data before they are released to the scientific public. A sample of a continuous random variable is individually micro-aggregated by first sorting the data, partitioning the sorted values into groups of equal size, and then replacing each value by the mean of its group. Data with more than one variable can be anonymized in the same way by micro-aggregating each variable individually. Data distorted in this way may still be used for statistical analysis. We show that if probabilities and quantiles are estimated in the usual way, by computing relative frequencies and sample quantiles, respectively, then these estimates are consistent and asymptotically normal under mild conditions.
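The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`micro_aggregate`, `relative_frequency`, `sample_quantile`), the requirement that the sample size be an exact multiple of the group size, and the choice of the ceil(n*p)-th order statistic as the sample quantile are all assumptions made here for illustration; the abstract does not specify them.

```python
import math
from statistics import fmean


def micro_aggregate(values, group_size):
    """Individually micro-aggregate one variable.

    Sorts the values, splits the sorted values into consecutive groups of
    `group_size`, and replaces each value by its group mean. The result is
    returned in the original record order.
    Assumption: len(values) is a positive multiple of group_size.
    """
    n = len(values)
    if group_size < 1 or n == 0 or n % group_size != 0:
        raise ValueError("len(values) must be a positive multiple of group_size")
    # Record indices arranged in ascending order of their values.
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    for start in range(0, n, group_size):
        group = order[start:start + group_size]
        mean = fmean([values[i] for i in group])
        for i in group:
            result[i] = mean
    return result


def relative_frequency(values, threshold):
    """Estimate P(X <= threshold) by the share of values at or below it."""
    return sum(v <= threshold for v in values) / len(values)


def sample_quantile(values, p):
    """Sample quantile taken as the ceil(n*p)-th order statistic.

    Assumption: this is one common convention among several.
    """
    ordered = sorted(values)
    k = max(1, math.ceil(len(ordered) * p))
    return ordered[k - 1]


original = [5.0, 1.0, 3.0, 2.0, 4.0, 6.0]
aggregated = micro_aggregate(original, group_size=3)
prob_estimate = relative_frequency(aggregated, 3.5)
median_estimate = sample_quantile(aggregated, 0.5)
```

For data with several variables, the same function would be applied to each variable (column) separately, so each variable is sorted and grouped by its own ranking. The estimators are then computed on the aggregated values exactly as they would be on the original ones.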
Cite this article
Schneeweiss, H., Rost, D. & Schmid, M. Probability and quantile estimation from individually micro-aggregated data. Metrika 75, 721–742 (2012). https://doi.org/10.1007/s00184-011-0349-5