Conditional Masking to Numerical Data

  • Debolina Ghatak
  • Bimal K. Roy
Original Article


Protecting the privacy of datasets has become increasingly important. Many real-life datasets, such as income data and medical data, need to be secured before they are made public. However, security comes at the cost of losing some useful statistical information about the dataset. Data obfuscation addresses this problem: masking a dataset so that the utility of the data is maximized while the risk of disclosing sensitive information is minimized. Two popular approaches to data obfuscation for numerical data are (i) data swapping and (ii) adding noise to the data. The former masks well but sacrifices all correlation information. The latter gives estimates for most of the popular statistics, such as the mean, variance, quantiles and correlation, but fails to give an unbiased estimate of the distribution curve of the original data. In this paper, we propose a mixed method of obfuscation that combines these two approaches, and we discuss how the proposed method gives an unbiased estimate of the distribution curve while still giving reliable estimates of other well-known statistics such as moments and correlation.
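The trade-off between the two baseline approaches can be illustrated with a small simulation. The following is a minimal sketch only, not the authors' proposed mixed method: the synthetic `age` and `income` columns, the noise level `sigma`, and the helper names `swap_mask`, `noise_mask` and `pearson` are all assumptions made for illustration. Swapping is modelled in its simplest form, a random permutation of one column, and noise as independent Gaussian noise with mean zero.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def swap_mask(values, rng):
    """Data swapping (simplest form): release a random permutation of the column."""
    masked = list(values)
    rng.shuffle(masked)
    return masked


def noise_mask(values, sigma, rng):
    """Additive noise: release x + e, with e drawn from N(0, sigma^2)."""
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical data: income depends linearly on age, plus individual variation.
rng = random.Random(0)
n = 5000
age = [rng.uniform(20, 60) for _ in range(n)]
income = [1000 * a + rng.gauss(0, 5000) for a in age]

swapped = swap_mask(income, rng)
noisy = noise_mask(income, 4000.0, rng)

# Correlation with the unmasked column: original, swapped, noisy.
print(pearson(age, income), pearson(age, swapped), pearson(age, noisy))
# Spread of the released column: original, swapped, noisy.
print(statistics.pvariance(income), statistics.pvariance(swapped), statistics.pvariance(noisy))
```

Under these assumptions, the swapped column keeps exactly the same set of values, so its empirical distribution is unchanged, but its correlation with `age` drops to roughly zero. The noisy column retains most of the correlation and the mean, but its variance is inflated by about `sigma**2`, so its empirical distribution is a smeared version of the original one.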


Keywords: Data obfuscation, Quantile estimation, Privacy protection, Masking numerical data-sets




Copyright information

© Grace Scientific Publishing 2019

Authors and Affiliations

  1. Applied Statistics Unit, Indian Statistical Institute, Kolkata, India
