Probability and quantile estimation from individually micro-aggregated data

Metrika

Abstract

Micro-aggregation is a frequently used strategy to anonymize data before they are released to the scientific public. A sample of a continuous random variable is individually micro-aggregated by first sorting and grouping the data into groups of equal size and then replacing the values of the variable in each group by their group mean. In a similar way, data with more than one variable can be anonymized by individual micro-aggregation. Data thus distorted may still be used for statistical analysis. We show that if probabilities and quantiles are estimated in the usual way by computing relative frequencies and sample quantiles, respectively, these estimates are consistent and asymptotically normal under mild conditions.
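The procedure described in the abstract can be illustrated with a minimal sketch. It assumes a one-dimensional sample whose size is a multiple of the group size. The function names `micro_aggregate`, `relative_frequency`, and `sample_quantile` are hypothetical and not taken from the article, and the quantile rule used (the order statistic at rank ceil(n·p)) is only one common definition of a sample quantile, not necessarily the one the authors analyze.

```python
from math import ceil
from statistics import fmean


def micro_aggregate(values, group_size):
    """Sort the sample, split it into consecutive groups of equal size,
    and replace every value by the mean of its group.
    The result is returned in the original observation order."""
    n = len(values)
    if group_size < 1 or n % group_size != 0:
        raise ValueError("sample size must be a positive multiple of group_size")
    # Indices of the observations in ascending order of their values.
    order = sorted(range(n), key=lambda i: values[i])
    aggregated = [0.0] * n
    for start in range(0, n, group_size):
        group = order[start:start + group_size]
        group_mean = fmean([values[i] for i in group])
        for i in group:
            aggregated[i] = group_mean
    return aggregated


def relative_frequency(values, threshold):
    """Estimate P(X <= threshold) by the share of observations at or below it."""
    return sum(v <= threshold for v in values) / len(values)


def sample_quantile(values, p):
    """Sample p-quantile taken as the order statistic at rank ceil(n * p)
    (an assumed convention; other definitions interpolate)."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    ordered = sorted(values)
    return ordered[ceil(len(ordered) * p) - 1]


data = [5.0, 1.0, 4.0, 2.0, 3.0, 6.0]
aggregated = micro_aggregate(data, group_size=3)
print(aggregated)                           # [5.0, 2.0, 5.0, 2.0, 2.0, 5.0]
print(relative_frequency(aggregated, 3.5))  # 0.5
print(sample_quantile(aggregated, 0.5))     # 2.0
```

In this toy sample the relative frequency at 3.5 is the same for the original and the aggregated data, while the median estimate moves from 3.0 to 2.0. With only six observations this says nothing about the asymptotic behavior the article establishes; it only shows where the estimators are applied.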

Author information

Corresponding author

Correspondence to Matthias Schmid.

About this article

Cite this article

Schneeweiss, H., Rost, D. & Schmid, M. Probability and quantile estimation from individually micro-aggregated data. Metrika 75, 721–742 (2012). https://doi.org/10.1007/s00184-011-0349-5

  • DOI: https://doi.org/10.1007/s00184-011-0349-5
