Novel Iterative Min-Max Clustering to Minimize Information Loss in Statistical Disclosure Control

Mahmood, Abdun Naser; Kabir, Md Enamul; Mustafa, Abdul K.

doi:10.1007/978-3-319-23802-9_14

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 153)

Included in the following conference series:

International Conference on Security and Privacy in Communication Networks

Abstract

In recent years, there has been an alarming increase in online identity theft and attacks using personally identifiable information. The goal of privacy preservation is to de-associate individuals from sensitive information or microdata. Microaggregation techniques seek to protect microdata so that it can be published and mined without revealing any private information that can be linked to specific individuals. Microaggregation works by partitioning the microdata into groups of at least k records and then replacing the records in each group with the group's centroid. An optimal microaggregation method must minimize the information loss caused by this replacement, and the challenge is how to achieve that during the microaggregation process. This paper presents a new microaggregation technique for Statistical Disclosure Control (SDC) that consists of two stages. In the first stage, the algorithm sorts all the records in the data set in a particular way to ensure that very dissimilar observations are never placed in the same cluster during microaggregation. In the second stage, an optimal microaggregation method is used to create k-anonymous clusters while minimizing information loss. It works by taking the sorted data and simultaneously creating two distant clusters, using the two extreme sorted values as the clusters' seeds. The performance of the proposed technique is compared against the most recent microaggregation methods. Experimental results on benchmark datasets show that the proposed algorithm achieves the lowest information loss among a basket of techniques from the literature.
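The abstract only outlines the two stages, so the Python sketch below is an illustration under stated assumptions, not the authors' algorithm. It assumes numeric attributes (ideally standardized beforehand) and Euclidean distance. The Stage 1 sort key is borrowed from MDAV-style heuristics (distance from the record farthest from the global centroid), and each seed's cluster is assumed to be its k nearest remaining records. The function names `minmax_microaggregate`, `information_loss` and `anonymize` are hypothetical. Information loss is computed as IL = 100 · SSE / SST, the within-group sum of squares divided by the total sum of squares, a measure commonly used in microaggregation studies.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(rows: Sequence[Sequence[float]]) -> Record:
    """Attribute-wise mean of a non-empty group of records."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def minmax_microaggregate(data: List[Record], k: int) -> List[List[int]]:
    """Split record indices into groups of at least k (illustrative sketch)."""
    n = len(data)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")

    # Stage 1 (assumed sort key): order records by distance from the record
    # farthest from the global centroid, so both ends of the order lie far apart.
    mean = _centroid(data)
    anchor = max(range(n), key=lambda i: _sqdist(data[i], mean))
    remaining = sorted(range(n), key=lambda i: _sqdist(data[i], data[anchor]))

    def take_group(seed: int) -> List[int]:
        # The k remaining records closest to the seed (the seed is at distance 0);
        # they are removed from `remaining` so no record joins two groups.
        group = sorted(remaining, key=lambda i: _sqdist(data[i], data[seed]))[:k]
        for i in group:
            remaining.remove(i)
        return group

    groups: List[List[int]] = []
    # Stage 2: while at least 2k records remain, seed one group at each
    # extreme of the sorted order in the same iteration.
    while len(remaining) >= 2 * k:
        groups.append(take_group(remaining[0]))
        groups.append(take_group(remaining[-1]))

    if len(remaining) >= k:
        # Between k and 2k - 1 records left: they form the final group.
        groups.append(list(remaining))
    else:
        # Fewer than k left: attach each one to the group with the nearest
        # centroid so that every group still holds at least k records.
        for i in remaining:
            best = min(groups, key=lambda g: _sqdist(data[i], _centroid([data[j] for j in g])))
            best.append(i)
    return groups


def information_loss(data: List[Record], groups: List[List[int]]) -> float:
    """IL = 100 * SSE / SST (within-group over total sum of squares)."""
    mean = _centroid(data)
    sst = sum(_sqdist(x, mean) for x in data)
    sse = 0.0
    for g in groups:
        c = _centroid([data[i] for i in g])
        sse += sum(_sqdist(data[i], c) for i in g)
    return 100.0 * sse / sst if sst > 0 else 0.0


def anonymize(data: List[Record], groups: List[List[int]]) -> List[Record]:
    """Replace every record with its group's centroid (the released microdata)."""
    released = list(data)
    for g in groups:
        c = _centroid([data[i] for i in g])
        for i in g:
            released[i] = c
    return released


if __name__ == "__main__":
    pts = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (8.0, 9.0), (8.1, 8.8), (7.9, 9.2), (4.0, 5.0)]
    groups = minmax_microaggregate(pts, k=3)
    print(groups, round(information_loss(pts, groups), 2))
```

In this sketch, leftover records that cannot form a group of their own are merged into the nearest existing group rather than left unprotected, so every released centroid is shared by at least k records. Lower IL values mean the released centroids stay closer to the original records. Setting k = 1 gives IL = 0 (no protection), and a single group of all records gives IL = 100.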

Author information

Authors and Affiliations

School of Engineering and Information Technology, University of New South Wales Australian Defence Force Academy, Canberra, 2600, Australia
Abdun Naser Mahmood
School of Human Movement Studies, University of Queensland, St Lucia, 4072, Australia
Md Enamul Kabir
School of Applied Technology, Humber College, North Campus, Toronto, Canada
Abdul K. Mustafa

Corresponding author

Correspondence to Abdun Naser Mahmood.

Editor information

Editors and Affiliations

CAS, Institute of Information Engineering, Beijing, China
Jin Tian
Institute of Information Engineering, Beijing, China
Jiwu Jing
IBM Thomas J. Watson Research Center, New York, New York, USA
Mudhakar Srivatsa

About this paper

Cite this paper

Mahmood, A.N., Kabir, M.E., Mustafa, A.K. (2015). Novel Iterative Min-Max Clustering to Minimize Information Loss in Statistical Disclosure Control. In: Tian, J., Jing, J., Srivatsa, M. (eds) International Conference on Security and Privacy in Communication Networks. SecureComm 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 153. Springer, Cham. https://doi.org/10.1007/978-3-319-23802-9_14

DOI: https://doi.org/10.1007/978-3-319-23802-9_14
Published: 19 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23801-2
Online ISBN: 978-3-319-23802-9
eBook Packages: Computer Science, Computer Science (R0)
