Computing Covariance and Correlation in Optimally Privacy-Protected Statistical Databases: Feasible Algorithms

Conference paper
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 312)

Abstract

In many real-life situations, e.g., in medicine, it is necessary to process data while preserving the patients’ confidentiality. One of the most efficient methods of preserving privacy is to replace the exact values with intervals that contain these values. For example, instead of an exact age, a privacy-protected database only contains the information that the age is, e.g., between 10 and 20, or between 20 and 30, etc. Based on this data, it is important to compute correlation and covariance between different quantities. For privacy-protected data, different values from the intervals lead, in general, to different estimates for the desired statistical characteristic. Our objective is then to compute the range of possible values of these estimates.
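To make the range-computation problem concrete, here is a minimal brute-force sketch in Python. It is our illustration, not the authors' feasible algorithm. It assumes a fixed-width grid for intervalization and covariance with 1/n normalization. The helpers `intervalize`, `covariance`, and `covariance_range` are hypothetical names. The sketch relies on one fact: the covariance is multilinear in x_1, ..., x_n, y_1, ..., y_n, because every term is a product x_i y_j. Its minimum and maximum over the box of intervals are therefore attained at interval endpoints, so enumerating endpoints gives the exact range.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def intervalize(value: float, width: float) -> Interval:
    """Hypothetical helper: replace an exact value by the fixed-width grid cell
    containing it, e.g. an age of 27 with width 10 becomes (20, 30)."""
    k = math.floor(value / width)
    return (k * width, (k + 1) * width)


def covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Covariance with 1/n normalization (an assumption; 1/(n-1) only rescales)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def covariance_range(x_ints: List[Interval], y_ints: List[Interval]) -> Tuple[float, float]:
    """Exact (min, max) of the covariance over all x_i in x_ints[i], y_i in y_ints[i].

    The covariance is multilinear in (x_1..x_n, y_1..y_n), so both extremes are
    attained at interval endpoints; we enumerate all 2^n * 2^n endpoint choices.
    Exponential in n: usable only as a reference check on tiny inputs.
    """
    lo, hi = math.inf, -math.inf
    for xs in product(*x_ints):
        for ys in product(*y_ints):
            c = covariance(xs, ys)
            lo, hi = min(lo, c), max(hi, c)
    return lo, hi


# Usage: two records whose ages were replaced by decade intervals.
ages = [intervalize(13, 10), intervalize(27, 10)]  # [(10, 20), (20, 30)]
scores = [(1.0, 1.0), (3.0, 3.0)]                   # exact values as degenerate intervals
print(covariance_range(ages, scores))               # → (0.0, 10.0)
```

For n records the sketch evaluates 4^n covariances, so it can only serve to check a faster method on very small inputs. The feasible algorithms developed in the paper are not reproduced here. The endpoint argument relies on multilinearity and does not carry over as stated to correlation, which is a ratio.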

Algorithms for efficiently computing such ranges have been developed for situations when intervals come from the original surveys, e.g., when a person indicates whether his or her age is between 10 and 20, between 20 and 30, etc. These intervals, however, do not always lead to optimal privacy protection: it turns out that a more complex, computer-generated "intervalization" can lead to better privacy under the same accuracy or, alternatively, to more accurate estimates of statistical characteristics under the same privacy constraints. In this paper, we extend the existing efficient algorithms for computing covariance and correlation based on privacy-protected data to this more general case of interval data.

Keywords

privacy protection, statistical database, computing covariance, computing correlation, interval uncertainty

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Joshua Day (1)
  • Ali Jalal-Kamali (2)
  • Vladik Kreinovich (2)
  1. Department of Computer Science, University of Wisconsin at Whitewater, Whitewater, USA
  2. Department of Computer Science, University of Texas at El Paso, El Paso, USA

Personalised recommendations