Abstract
Individuals and data controllers manage and publicly share huge amounts of information. Publicly shared data contains information that can reveal the identity of users, thus affecting the privacy of individuals. To mitigate these disclosure risks, Statistical Disclosure Control (SDC) methods are applied to the data before it is released. Microaggregation is an SDC method that aggregates similar records into clusters and then transforms them into m indistinguishable records. K-means is a well-known data mining clustering algorithm for continuous data, which iteratively assigns similar elements to k clusters until the assignment converges. However, adapting the k-means algorithm to multivariate categorical data is a challenging task due to the high dimensionality of the attributes. In this paper, we extend the k-means clustering algorithm to achieve microaggregation of structured data. Moreover, to preserve data utility, we extend the fixed-size clustering of this algorithm to adaptive-size clusters. For this purpose, we introduce the n-means clustering approach, which constructs clusters based on the semantics of the datasets. In experiments, we demonstrate the significance of the proposed system by measuring the cohesion of the clusters and the information loss as an indicator of utility.
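To make the microaggregation idea concrete, the following is a minimal Python sketch. It is not the n-means algorithm proposed in the paper: it assumes plain Hamming distance between categorical records (no semantic similarity), a greedy fixed-size grouping, and the attribute-wise mode as the cluster representative. The names `hamming`, `mode_record`, `microaggregate`, and `group_size`, as well as the sample records, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_record(records):
    """Attribute-wise most frequent value; ties go to the value seen first."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def microaggregate(records, group_size):
    """Greedy fixed-size grouping (an assumption, not the paper's method).

    Every record is replaced by the mode of its group, and every group
    holds at least `group_size` records.
    """
    if len(records) < group_size:
        raise ValueError("need at least group_size records")
    remaining = list(range(len(records)))
    result = [None] * len(records)
    while remaining:
        if len(remaining) < 2 * group_size:
            # The last group absorbs the leftovers so no group is too small.
            group = list(remaining)
        else:
            seed = remaining[0]
            # Seed plus its nearest neighbours among the unassigned records.
            group = sorted(
                remaining, key=lambda i: hamming(records[seed], records[i])
            )[:group_size]
        centroid = mode_record([records[i] for i in group])
        for i in group:
            result[i] = centroid
        remaining = [i for i in remaining if i not in group]
    return result


# Hypothetical records: (diagnosis, age band).
records = [
    ("flu", "adult"),
    ("asthma", "child"),
    ("flu", "adult"),
    ("asthma", "teen"),
    ("flu", "senior"),
    ("asthma", "child"),
]
masked = microaggregate(records, group_size=3)
```

In this sketch each masked record is shared by at least `group_size` rows, which is the indistinguishability property the abstract refers to. The fixed group size is exactly the restriction that an adaptive-size approach would relax.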
Acknowledgment
We thank the Higher Education Commission Pakistan and Foundation University Islamabad for their support in publishing this research work.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Imran-Daud, M. (2019). n-means: Adaptive Clustering Microaggregation of Categorical Medical Data. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Intelligent Computing. CompCom 2019. Advances in Intelligent Systems and Computing, vol 998. Springer, Cham. https://doi.org/10.1007/978-3-030-22868-2_2
DOI: https://doi.org/10.1007/978-3-030-22868-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22867-5
Online ISBN: 978-3-030-22868-2
eBook Packages: Intelligent Technologies and Robotics (R0)