n-means: Adaptive Clustering Microaggregation of Categorical Medical Data

Imran-Daud, Malik

doi:10.1007/978-3-030-22868-2_2

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 998)

Included in the following conference series:

Intelligent Computing - Proceedings of the Computing Conference

Abstract

A huge amount of information is managed and publicly shared by individuals and data controllers. Publicly shared data contains information that can reveal the identity of users, thus compromising the privacy of individuals. To mitigate these disclosure risks, Statistical Disclosure Control (SDC) methods are applied to the data before it is released. Microaggregation is an SDC method that aggregates similar records into clusters and then transforms them into m indistinguishable records. K-means is a well-known data mining clustering algorithm for continuous data that iteratively assigns similar elements to k clusters until the clusters converge. However, adapting the k-means algorithm to categorical multivariate data is challenging because of the high dimensionality of the attributes. In this paper, we extend the k-means clustering algorithm to achieve microaggregation of structured data. Moreover, to preserve data utility, we extend the fixed clustering of this algorithm to clusters of adaptive size. For this purpose, we introduce the n-means clustering approach, which constructs clusters based on the semantics of the dataset. In experiments, we demonstrate the significance of the proposed system by measuring the cohesion of the clusters and, to assess utility, the information loss.
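The general idea of clustering-based microaggregation for categorical records can be illustrated with a minimal Python sketch. It is not the n-means method of the paper: it assumes a plain k-modes-style loop with a fixed k, a simple matching (Hamming) distance in place of a semantic distance, and attribute-wise modes as centroids. The function names, the toy records, and the information-loss measure (the fraction of attribute values changed by masking) are all hypothetical choices made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching distance: number of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_record(members):
    """Centroid of a categorical cluster: the most frequent value per attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))


def k_modes(records, k, max_iter=20):
    """Assign records to k clusters; stop when the assignment no longer changes."""
    centroids = list(records[:k])  # assumption: naive seeding with the first k records
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: hamming(record, centroids[c]))
            for record in records
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in range(k):
            members = [r for r, a in zip(records, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mode_record(members)
    return assignment, centroids


def microaggregate(records, k):
    """Replace every record by the centroid of its cluster."""
    assignment, centroids = k_modes(records, k)
    return [centroids[a] for a in assignment], assignment


def information_loss(original, masked):
    """Hypothetical utility measure: share of attribute values changed by masking."""
    total = sum(len(record) for record in original)
    changed = sum(hamming(o, m) for o, m in zip(original, masked))
    return changed / total


# Toy (diagnosis, ward) records, invented for this example.
records = [
    ("flu", "respiratory"),
    ("fracture", "orthopedic"),
    ("cold", "respiratory"),
    ("sprain", "orthopedic"),
    ("flu", "respiratory"),
    ("fracture", "orthopedic"),
]

masked, assignment = microaggregate(records, 2)
print(assignment)
print(masked)
print(information_loss(records, masked))
```

On this toy input the two clusters contain three records each, and after masking the records inside a cluster are indistinguishable. Two of the twelve attribute values change ("cold" and "sprain" are replaced by their cluster modes), which is the price paid in information loss. A semantics-aware method would replace the matching distance with one derived from a knowledge source, and an adaptive scheme would let the cluster sizes vary rather than fixing k in advance.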

Acknowledgment

We acknowledge the Higher Education Commission, Pakistan, and Foundation University Islamabad for their support in publishing this research work.

Author information

Authors and Affiliations

Department of Software Engineering, Foundation University Rawalpindi Campus, Islamabad, Pakistan
Malik Imran-Daud

Corresponding author

Correspondence to Malik Imran-Daud.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor

About this paper

Cite this paper

Imran-Daud, M. (2019). n-means: Adaptive Clustering Microaggregation of Categorical Medical Data. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Intelligent Computing. CompCom 2019. Advances in Intelligent Systems and Computing, vol 998. Springer, Cham. https://doi.org/10.1007/978-3-030-22868-2_2

DOI: https://doi.org/10.1007/978-3-030-22868-2_2
Published: 09 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22867-5
Online ISBN: 978-3-030-22868-2
eBook Packages: Intelligent Technologies and Robotics (R0)