Probability and quantile estimation from individually micro-aggregated data

Schneeweiss, Hans; Rost, Daniel; Schmid, Matthias

doi:10.1007/s00184-011-0349-5

Published: 16 February 2011

Metrika, Volume 75, pages 721–742 (2012)


Hans Schneeweiss¹,
Daniel Rost² &
Matthias Schmid³

Abstract

Micro-aggregation is a frequently used strategy to anonymize data before they are released to the scientific public. A sample of a continuous random variable is individually micro-aggregated by first sorting and grouping the data into groups of equal size and then replacing the values of the variable in each group by their group mean. In a similar way, data with more than one variable can be anonymized by individual micro-aggregation. Data thus distorted may still be used for statistical analysis. We show that if probabilities and quantiles are estimated in the usual way by computing relative frequencies and sample quantiles, respectively, these estimates are consistent and asymptotically normal under mild conditions.
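
The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `micro_aggregate` and `micro_aggregate_columns`, the group size `k = 3`, the simulated normal sample, and the requirement that the sample size be a multiple of `k` are all assumptions made for illustration. The multivariate helper reflects one reading of "individual" micro-aggregation, namely that each variable is sorted and averaged separately.

```python
import numpy as np


def micro_aggregate(x, k):
    """Sort a 1-D sample, split it into groups of size k, and replace
    every value by the mean of its group (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if k < 1 or n % k != 0:
        # Simplifying assumption of this sketch: equal-sized groups only.
        raise ValueError("sample size must be a positive multiple of k")
    order = np.argsort(x, kind="stable")              # indices that sort x
    group_means = x[order].reshape(n // k, k).mean(axis=1)
    out = np.empty(n)
    out[order] = np.repeat(group_means, k)            # back to original positions
    return out


def micro_aggregate_columns(X, k):
    """Apply micro_aggregate to each column of a 2-D array separately
    (assumed reading of the multivariate case)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack(
        [micro_aggregate(X[:, j], k) for j in range(X.shape[1])]
    )


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=3000)
x_agg = micro_aggregate(x, k=3)

# Estimates computed "in the usual way" from the aggregated data.
p_hat = np.mean(x_agg <= 0.5)       # relative frequency estimating P(X <= 0.5)
q_hat = np.quantile(x_agg, 0.9)     # sample 0.9-quantile
print(p_hat, q_hat)
```

Because every value is replaced by its group mean, the overall sample mean is unchanged, while each released value is shared by `k` records. The relative frequency and the sample quantile are then computed from `x_agg` exactly as they would be from the original sample.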

Author information

Authors and Affiliations

Department of Statistics, University of Munich, Akademiestrasse 1 I, 80799, Munich, Germany
Hans Schneeweiss
Department of Mathematics, University of Munich, Theresienstrasse 39, 80333, Munich, Germany
Daniel Rost
Department of Medical Informatics, Biometry and Epidemiology, University of Erlangen-Nuremberg, Waldstrasse 6, 91054, Erlangen, Germany
Matthias Schmid

Corresponding author

Correspondence to Matthias Schmid.

About this article

Cite this article

Schneeweiss, H., Rost, D. & Schmid, M. Probability and quantile estimation from individually micro-aggregated data. Metrika 75, 721–742 (2012). https://doi.org/10.1007/s00184-011-0349-5

Received: 23 July 2010
Published: 16 February 2011
Issue Date: August 2012
DOI: https://doi.org/10.1007/s00184-011-0349-5
