Microaggregation for Categorical Variables: A Median Based Approach

  • Vicenç Torra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3050)

Abstract

Microaggregation is a masking procedure used to protect confidential data prior to their public release. The technique, which relies on clustering and aggregation, is used solely for numerical data. In this work we introduce a microaggregation procedure for categorical variables. We describe the new masking method and analyse its results according to indices found in the literature. The method is compared with Top and Bottom Coding, Global Recoding, Rank Swapping and PRAM.
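
To make the idea of clustering followed by aggregation concrete, the following is a minimal sketch of median-based microaggregation for a single ordinal variable. It is not the procedure from the paper, whose details are not given in this abstract. It assumes the categories have a known order, forms groups of at least k records by sorting, and replaces each value with the lower median of its group. The names `median_category` and `microaggregate_ordinal` are hypothetical.

```python
from typing import List, Sequence


def median_category(values: Sequence[str], order: Sequence[str]) -> str:
    """Return the lower median of ordinal labels under the given order."""
    rank = {label: i for i, label in enumerate(order)}
    ranked = sorted(values, key=lambda v: rank[v])
    return ranked[(len(ranked) - 1) // 2]


def microaggregate_ordinal(values: Sequence[str],
                           order: Sequence[str],
                           k: int) -> List[str]:
    """Mask one ordinal variable (illustrative assumption, not the paper's method).

    Records are sorted by category rank, cut into consecutive groups of k,
    and every value is replaced by its group's lower median.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rank = {label: i for i, label in enumerate(order)}
    # Record indices sorted by category rank (the sort is stable).
    idx = sorted(range(len(values)), key=lambda i: rank[values[i]])
    groups = [idx[i:i + k] for i in range(0, len(idx), k)]
    # A short trailing group is merged into the previous one so that
    # every group holds at least k records.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    masked = list(values)
    for group in groups:
        prototype = median_category([values[i] for i in group], order)
        for i in group:
            masked[i] = prototype
    return masked


order = ["low", "medium", "high"]
values = ["high", "low", "medium", "low", "high", "medium", "low"]
masked = microaggregate_ordinal(values, order, k=3)
print(masked)
```

In this sketch every masked value is shared by at least k records, which is the property that makes microaggregation useful for protection. A nominal variable with no natural order would need a different prototype, such as the mode.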

Keywords

Privacy preserving data mining; Data protection; Masking methods; Clustering; Microaggregation; Categorical data

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Vicenç Torra, Institut d’Investigació en Intel·ligència Artificial, Campus de Bellaterra, Bellaterra, Catalonia, Spain
