Abstract
With the enormous growth of the Information Society and the need to enable access to and exploitation of large amounts of data, preserving data confidentiality has become a crucial issue. Many methods have been developed to ensure the privacy of numerical data, but very few deal with textual (categorical) information. This paper proposes a new method for protecting individuals' privacy for categorical attributes. It is a masking method based on recoding words that can be linked to fewer than k individuals. This ensures that the k-anonymity property is fulfilled, which prevents the re-identification of individuals. Unlike related work, which lacks a proper semantic interpretation of text, the recoding exploits an input ontology to estimate the semantic similarity between words and minimize information loss.
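The recoding idea can be illustrated with a minimal sketch. It is not the authors' algorithm, only one plausible reading of the abstract under stated assumptions: a single categorical attribute, a tiny hand-written taxonomy (`PARENT`) standing in for the input ontology, and the Wu-Palmer measure chosen here as the semantic similarity. The names `PARENT`, `wu_palmer`, and `recode` are hypothetical.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent), a stand-in for the input ontology.
PARENT = {
    "hiking": "sport",
    "climbing": "sport",
    "swimming": "sport",
    "painting": "art",
    "sculpture": "art",
    "sport": "activity",
    "art": "activity",
}


def path_to_root(term):
    """Return the list [term, parent, ..., root]."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(term):
    """Depth in the taxonomy, counting the root as depth 1."""
    return len(path_to_root(term))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # First ancestor of a (starting at a itself) that also subsumes b.
    lcs = next(t for t in path_to_root(a) if t in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def recode(values, k):
    """Recode values held by fewer than k records into their most similar value.

    Assumption: all values share one taxonomy root, so a common
    ancestor always exists.
    """
    values = list(values)
    while True:
        counts = Counter(values)
        rare = [v for v, c in counts.items() if c < k]
        if not rare:
            return values
        if len(counts) == 1:
            raise ValueError("fewer than k records in total")
        # Handle the rarest value first; ties are broken alphabetically.
        source = min(rare, key=lambda v: (counts[v], v))
        # Pick the semantically closest remaining value, preferring the
        # more frequent one on ties to limit further recoding.
        target = max(
            (v for v in counts if v != source),
            key=lambda v: (wu_palmer(source, v), counts[v], v),
        )
        values = [target if v == source else v for v in values]


hobbies = ["hiking", "hiking", "climbing", "swimming",
           "swimming", "painting", "sculpture"]
masked = recode(hobbies, k=2)
print(masked)
```

With k = 2, every value left in `masked` is shared by at least two records, and each rare value has been replaced by a value from the same branch of the taxonomy rather than an arbitrary one, which is how semantic similarity limits information loss in this sketch.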
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Martínez, S., Valls, A., Sánchez, D. (2010). Anonymizing Categorical Data with a Recoding Method Based on Semantic Similarity. In: Hüllermeier, E., Kruse, R., Hoffmann, F. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. Applications. IPMU 2010. Communications in Computer and Information Science, vol 81. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14058-7_62
DOI: https://doi.org/10.1007/978-3-642-14058-7_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14057-0
Online ISBN: 978-3-642-14058-7
eBook Packages: Computer Science (R0)