Anonymizing Categorical Data with a Recoding Method Based on Semantic Similarity

Martínez, Sergio; Valls, Aida; Sánchez, David

doi:10.1007/978-3-642-14058-7_62

Part of the book series: Communications in Computer and Information Science (CCIS, volume 81)

Included in the following conference series:

International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems

Abstract

With the enormous growth of the Information Society and the need to access and exploit large amounts of data, preserving data confidentiality has become a crucial issue. Many methods have been developed to ensure the privacy of numerical data, but very few deal with textual (categorical) information. This paper proposes a new method for protecting individuals’ privacy in categorical attributes. It is a masking method that recodes words that can be linked to fewer than k individuals. This ensures that the k-anonymity property is fulfilled, preventing the re-identification of individuals. In contrast to related work, which lacks a proper semantic interpretation of text, the recoding exploits an input ontology to estimate the semantic similarity between words and minimize information loss.
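The recoding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the similarity measure or the recoding procedure, so this example assumes a single categorical attribute, a hypothetical toy taxonomy (`PARENT`) standing in for the input ontology, and the Wu-Palmer measure as one well-known taxonomy-based similarity. The function names `path_to_root`, `wu_palmer`, and `recode` are illustrative only.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent), a stand-in for the input ontology.
PARENT = {
    "activity": None,
    "sport": "activity",
    "culture": "activity",
    "hiking": "sport",
    "trekking": "sport",
    "museum": "culture",
    "theatre": "culture",
}


def path_to_root(term):
    """Return the list [term, parent, ..., root]."""
    path = [term]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b)), root depth = 1."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    ancestors_b = set(path_b)
    # The first ancestor of a that is also an ancestor of b is the least common subsumer.
    lcs = next(t for t in path_a if t in ancestors_b)
    return 2 * len(path_to_root(lcs)) / (len(path_a) + len(path_b))


def recode(values, k):
    """Replace every value seen fewer than k times with the most similar frequent value."""
    counts = Counter(values)
    frequent = [v for v, c in counts.items() if c >= k]
    if not frequent:
        raise ValueError("no value occurs at least k times; cannot recode")
    mapping = {
        v: max(frequent, key=lambda f: wu_palmer(v, f))
        for v, c in counts.items()
        if c < k
    }
    return [mapping.get(v, v) for v in values]


values = ["hiking"] * 3 + ["museum"] * 3 + ["trekking", "theatre"]
print(recode(values, k=3))
```

In this toy run, "trekking" is recoded to "hiking" and "theatre" to "museum", so every remaining value is shared by at least k = 3 records. Choosing the semantically closest replacement is what keeps the information loss low in this sketch; a full method would also have to handle combinations of several attributes.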

Author information

Authors and Affiliations

Departament d’Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, Avda. Països Catalans, 26, 43007, Tarragona, Spain
Sergio Martínez, Aida Valls & David Sánchez

Editor information

Editors and Affiliations

Fachbereich Mathematik und Informatik, Philipps-Universität Marburg, Marburg, Germany
Eyke Hüllermeier
Department of Knowledge Processing and Language Engineering, Otto-von-Guericke University of Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Rudolf Kruse
Fakultät für Elektrotechnik und Informationstechnik, Technische Universität Dortmund, 44221, Dortmund, Germany
Frank Hoffmann

About this paper

Cite this paper

Martínez, S., Valls, A., Sánchez, D. (2010). Anonymizing Categorical Data with a Recoding Method Based on Semantic Similarity. In: Hüllermeier, E., Kruse, R., Hoffmann, F. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. Applications. IPMU 2010. Communications in Computer and Information Science, vol 81. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14058-7_62

DOI: https://doi.org/10.1007/978-3-642-14058-7_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14057-0
Online ISBN: 978-3-642-14058-7
eBook Packages: Computer Science, Computer Science (R0)
