Abstract
An important issue in releasing individual data is protecting sensitive information from being leaked and maliciously exploited. Well-known privacy-preserving principles that aim to ensure both data privacy and data integrity, such as k-anonymity and l-diversity, have been studied extensively, both theoretically and empirically. Nonetheless, these widely adopted principles are still insufficient to prevent attribute disclosure if the attacker has partial knowledge of the overall distribution of the sensitive data. The t-closeness principle was proposed to address this weakness, and it has the additional benefit of supporting numerical sensitive attributes. However, in contrast to k-anonymity and l-diversity, the theoretical aspects of t-closeness have not yet been well investigated.
We present the first systematic theoretical study of the t-closeness principle under the commonly used attribute suppression model. We prove that for every constant t with 0 ≤ t < 1, it is NP-hard to find an optimal t-closeness generalization of a given table. The proof consists of several reductions, each of which works for different values of t and which together cover the full range. To complement this negative result, we also provide exact and fixed-parameter algorithms. Finally, we answer some open questions in the literature regarding the complexity of k-anonymity and l-diversity.
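To make the problem statement concrete, the following is a minimal sketch of a t-closeness check under attribute suppression, together with a naive exhaustive search for a smallest set of columns to suppress. It is an illustration under stated assumptions, not the algorithms of the paper: the function names are hypothetical, the distance between distributions is assumed to be the variational distance (the Earth Mover's Distance with equal ground distances), and the objective is assumed to be the number of suppressed quasi-identifier columns.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import combinations


def distribution(values):
    """Empirical distribution of a list of sensitive values."""
    counts = Counter(values)
    return {v: Fraction(c, len(values)) for v, c in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two distributions (assumed metric)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys) / 2


def is_t_close(rows, suppressed, t):
    """rows: list of (quasi_identifier_tuple, sensitive_value).
    suppressed: set of quasi-identifier column indices replaced by '*'.
    True if every equivalence class is within distance t of the whole table."""
    overall = distribution([s for _, s in rows])
    classes = defaultdict(list)
    for qi, s in rows:
        key = tuple("*" if i in suppressed else v for i, v in enumerate(qi))
        classes[key].append(s)
    return all(
        variational_distance(distribution(vals), overall) <= t
        for vals in classes.values()
    )


def fewest_suppressed_columns(rows, t):
    """Naive search over column subsets, smallest first (exponential time).
    Suppressing every column always succeeds, so a set is always returned."""
    m = len(rows[0][0])
    for size in range(m + 1):
        for cols in combinations(range(m), size):
            if is_t_close(rows, set(cols), t):
                return set(cols)


# Hypothetical toy table: two quasi-identifier columns, one sensitive value.
rows = [
    (("A", "x"), "flu"),
    (("A", "y"), "cold"),
    (("B", "x"), "cold"),
    (("B", "y"), "flu"),
]
```

On this toy table every unsuppressed row forms its own class at distance 1/2 from the overall distribution, so no suppression is needed for t = 1/2, while for t = 0 one column must be suppressed. The search enumerates all subsets of columns, which is consistent with the hardness result above: no polynomial-time method is expected in general.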
This work was supported in part by the National Basic Research Program of China under Grants 2011CBA00300 and 2011CBA00301, and by the National Natural Science Foundation of China under Grants 61033001, 61061130540, and 61073174. The research of the second author was supported by the Research Grants Council of Hong Kong under Grant 9041688 (CityU 124411).
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liang, H., Yuan, H. (2013). On the Complexity of t-Closeness Anonymization and Related Problems. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds) Database Systems for Advanced Applications. DASFAA 2013. Lecture Notes in Computer Science, vol 7825. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37487-6_26
DOI: https://doi.org/10.1007/978-3-642-37487-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37486-9
Online ISBN: 978-3-642-37487-6
eBook Packages: Computer Science (R0)