Abstract
In recent years, there has been an alarming increase in online identity theft and in attacks that use personally identifiable information. The goal of privacy preservation is to de-associate individuals from sensitive information or microdata. Microaggregation techniques seek to protect microdata in such a way that it can be published and mined without revealing any private information that can be linked to specific individuals. Microaggregation works by partitioning the microdata into groups of at least k records and then replacing the records in each group with the centroid of the group. An optimal microaggregation method must minimize the information loss resulting from this replacement, and achieving this is the central challenge. This paper presents a new microaggregation technique for Statistical Disclosure Control (SDC). It consists of two stages. In the first stage, the algorithm sorts all the records in the data set in a particular way to ensure that very dissimilar observations are never placed in the same cluster during microaggregation. In the second stage, an optimal microaggregation method is used to create k-anonymous clusters while minimizing the information loss. It works by taking the sorted data and simultaneously creating two distant clusters, using the two extreme sorted values as seeds for the clusters. The performance of the proposed technique is compared against the most recent microaggregation methods. Experimental results on benchmark datasets show that the proposed algorithm achieves the lowest information loss among a range of techniques from the literature.
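The general microaggregation step described above (groups of at least k records, each replaced by its centroid) can be illustrated with a minimal Python sketch. It is not the paper's two-stage algorithm: the ordering used here (squared distance from the global centroid), the function names, and the SSE/SST ratio as the information-loss measure are all assumptions chosen for illustration.

```python
from statistics import fmean


def centroid(records):
    """Component-wise mean of a list of equal-length numeric tuples."""
    return tuple(fmean(column) for column in zip(*records))


def sq_dist(a, b):
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def microaggregate(records, k):
    """Group records into clusters of at least k and replace each by its centroid.

    Assumption: records are ordered by distance from the global centroid and
    cut into consecutive chunks. This is a stand-in ordering, not the paper's.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    center = centroid(records)
    order = sorted(range(len(records)), key=lambda i: sq_dist(records[i], center))
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # A short final chunk is merged into the previous group, so every group
    # ends up with between k and 2k - 1 records.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    masked = [None] * len(records)
    for group in groups:
        group_centroid = centroid([records[i] for i in group])
        for i in group:
            masked[i] = group_centroid
    return masked, groups


def information_loss(records, masked):
    """Within-group sum of squares divided by total sum of squares (assumed measure)."""
    center = centroid(records)
    sse = sum(sq_dist(r, m) for r, m in zip(records, masked))
    sst = sum(sq_dist(r, center) for r in records)
    return sse / sst if sst else 0.0


data = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (8.0, 8.0),
        (1.0, 0.6), (9.0, 11.0), (8.5, 9.0)]
masked, groups = microaggregate(data, k=3)
print(groups)
print(round(information_loss(data, masked), 4))
```

Every record in the masked output is indistinguishable from at least k - 1 others, which is what makes the result k-anonymous. A lower loss ratio means the groups are more homogeneous, so less information is lost in the replacement.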
Copyright information
© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Mahmood, A.N., Kabir, M.E., Mustafa, A.K. (2015). Novel Iterative Min-Max Clustering to Minimize Information Loss in Statistical Disclosure Control. In: Tian, J., Jing, J., Srivatsa, M. (eds) International Conference on Security and Privacy in Communication Networks. SecureComm 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 153. Springer, Cham. https://doi.org/10.1007/978-3-319-23802-9_14
DOI: https://doi.org/10.1007/978-3-319-23802-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23801-2
Online ISBN: 978-3-319-23802-9
eBook Packages: Computer Science (R0)