Evaluating Fuzzy Clustering Algorithms for Microdata Protection

  • Vicenç Torra
  • Sadaaki Miyamoto
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3050)

Abstract

Microaggregation is a well-known technique for data protection. It is usually defined operationally as a two-step process: (i) a large number of small clusters are built from the data, and (ii) the data are replaced by cluster aggregates. In this work we study the use of fuzzy clustering in the first step. In particular, we consider standard fuzzy c-means and entropy-based fuzzy c-means. For both methods, our study includes variable-size and non-variable-size variants. The resulting masking methods are compared using standard scoring methods.
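
The two-step process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses only standard fuzzy c-means (not the entropy-based or variable-size variants), and the function names `memberships`, `fuzzy_c_means` and `microaggregate`, the deterministic initialisation, the fixed iteration count, and the assignment of each record to its highest-membership cluster are all assumptions made for illustration. It also does not enforce a minimum number of records per cluster, which microaggregation normally requires.

```python
def memberships(data, centers, m):
    """Fuzzy c-means membership update for fuzzifier m > 1."""
    u = []
    for x in data:
        # Squared Euclidean distance from the record to every center.
        d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
        if min(d2) == 0.0:
            # Record coincides with a center: share full membership among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            total = sum(hits)
            u.append([h / total for h in hits])
        else:
            w = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(w)
            u.append([wi / total for wi in w])
    return u


def fuzzy_c_means(data, c, m=2.0, iters=50):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    n, dim = len(data), len(data[0])
    # Assumed initialisation: c records taken at evenly spaced positions.
    centers = [list(data[(i * n) // c]) for i in range(c)]
    for _ in range(iters):
        u = memberships(data, centers, m)
        for i in range(c):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            if total > 0.0:
                centers[i] = [
                    sum(wk * x[j] for wk, x in zip(weights, data)) / total
                    for j in range(dim)
                ]
    return centers, memberships(data, centers, m)


def microaggregate(data, c, m=2.0, iters=50):
    """Step (i): cluster the records. Step (ii): replace each by its cluster mean."""
    _, u = fuzzy_c_means(data, c, m, iters)
    # Assumption: a record belongs to the cluster where its membership is highest.
    labels = [max(range(c), key=lambda i: row[i]) for row in u]
    means = {}
    for i in set(labels):
        members = [x for x, lab in zip(data, labels) if lab == i]
        means[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return [means[lab] for lab in labels]


if __name__ == "__main__":
    records = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
               (10.0, 10.0), (10.2, 9.8), (9.8, 10.2)]
    for original, masked in zip(records, microaggregate(records, c=2)):
        print(original, "->", masked)
```

In the masked output, every record in a cluster carries the same values, so the original record can no longer be told apart from the others in its cluster.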

Keywords

Privacy preserving data mining; Statistical Disclosure Control; Inference Control; Microdata Protection; Microaggregation; Fuzzy clustering

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Vicenç Torra (1)
  • Sadaaki Miyamoto (2)
  1. Institut d’Investigació en Intel·ligència Artificial (IIIA-CSIC), Bellaterra
  2. Institute of Engineering Mechanics and Systems, University of Tsukuba, Ibaraki, Japan
