Journal of Intelligent Information Systems, Volume 33, Issue 2, pp 209–234

(α, k)-anonymous data publishing

  • Raymond Wong
  • Jiuyong Li
  • Ada Fu
  • Ke Wang


Privacy preservation is an important issue in the release of data for mining purposes. The k-anonymity model was introduced to protect individuals from identification. Recent studies show that a more sophisticated model is needed to protect the association of individuals with sensitive information. In this paper, we propose an (α, k)-anonymity model to protect both identities and relationships to sensitive information in data. We discuss the properties of the (α, k)-anonymity model and prove that the optimal (α, k)-anonymity problem is NP-hard. We first present an optimal global-recoding method for the (α, k)-anonymity problem. We then propose two local-recoding algorithms that are both more scalable and result in less data distortion. Experiments demonstrate their effectiveness and efficiency. We also describe how the model can be extended to more general cases.
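
The abstract does not spell out the formal definition, so the following is only a minimal sketch under an assumed reading of the model: records are grouped by their quasi-identifier values, every group must contain at least k records (the k-anonymity part), and within each group the relative frequency of any single sensitive value must not exceed α (the association part). The function name, parameter names, and sample table are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict


def is_alpha_k_anonymous(rows, quasi_identifiers, sensitive, alpha, k):
    """Check the assumed (alpha, k) condition on a list of dict records.

    Assumption: a table qualifies when every group of records sharing the
    same quasi-identifier values has at least k records, and no sensitive
    value accounts for more than a fraction alpha of any group.
    """
    # Group the sensitive values by the tuple of quasi-identifier values.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[attr] for attr in quasi_identifiers)
        groups[key].append(row[sensitive])

    for values in groups.values():
        # k-anonymity part: the group must be large enough.
        if len(values) < k:
            return False
        # Association part: the most frequent sensitive value must stay
        # at or below the alpha threshold.
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) > alpha:
            return False
    return True


# Hypothetical generalized table with two quasi-identifier groups.
table = [
    {"age": "30-39", "zip": "42**", "illness": "flu"},
    {"age": "30-39", "zip": "42**", "illness": "HIV"},
    {"age": "40-49", "zip": "43**", "illness": "flu"},
    {"age": "40-49", "zip": "43**", "illness": "flu"},
]

strict = is_alpha_k_anonymous(table, ["age", "zip"], "illness", alpha=0.5, k=2)
loose = is_alpha_k_anonymous(table, ["age", "zip"], "illness", alpha=1.0, k=2)
```

In this sample, both groups have two records, so the table is 2-anonymous, but the second group carries a single sensitive value, so the check fails for α = 0.5 and passes only when α = 1.0. This illustrates why group size alone does not protect the link between an individual and a sensitive value.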


Privacy, Data mining, Anonymity, Privacy preservation, Data publishing



We are grateful to the anonymous reviewers for their constructive comments on this paper. This research was supported in part by HKSAR RGC Direct Allocation Grant DAG08/09.EG01 to Raymond Chi-Wing Wong, and by ARC Discovery Grant DP0774450 to Jiuyong Li.



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong
  2. School of Computer and Information Sciences, University of South Australia, Mawson Lakes, Australia
  3. Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, Hong Kong
  4. Department of Computer Science, Simon Fraser University, Burnaby, Canada
