Abstract

With the enormous growth of the Information Society and the need to enable access to and exploitation of large amounts of data, preserving data confidentiality has become a crucial issue. Many methods have been developed to ensure the privacy of numerical data, but very few of them deal with textual (categorical) information. This paper proposes a new method for protecting individuals' privacy for categorical attributes. It is a masking method based on recoding words that can be linked to fewer than k individuals. This ensures that the k-anonymity property is fulfilled, preventing the re-identification of individuals. In contrast to related works, which lack a proper semantic interpretation of text, the recoding exploits an input ontology to estimate the semantic similarity between words and minimize information loss.
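The recoding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given here. It assumes a single categorical attribute, a hypothetical toy taxonomy (`PARENT`) standing in for the input ontology, and a simple edge-counting distance as the similarity estimate. The names `path_distance` and `recode` are illustrative only. Values that occur fewer than k times are merged, rarest first, into the semantically closest remaining value until every value occurs at least k times.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); an assumption standing in
# for the input ontology mentioned in the abstract.
PARENT = {
    "hiking": "outdoor_sport",
    "climbing": "outdoor_sport",
    "outdoor_sport": "sport",
    "chess": "board_game",
    "checkers": "board_game",
    "board_game": "game",
    "sport": "activity",
    "game": "activity",
}


def ancestors(term):
    """Return the path from term up to the taxonomy root, term included."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_distance(a, b):
    """Count taxonomy edges between a and b via their closest common ancestor."""
    path_a, path_b = ancestors(a), ancestors(b)
    depth_in_b = {node: i for i, node in enumerate(path_b)}
    for i, node in enumerate(path_a):
        if node in depth_in_b:
            return i + depth_in_b[node]
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def recode(values, k):
    """Merge values seen fewer than k times into their closest remaining value.

    Illustrative greedy strategy (an assumption, not the paper's procedure):
    the rarest value is recoded first, and ties are broken alphabetically.
    """
    values = list(values)
    if len(values) < k:
        raise ValueError("fewer than k records: k-anonymity is unreachable")
    while True:
        counts = Counter(values)
        rare = [v for v, c in counts.items() if c < k]
        if not rare:
            return values
        source = min(rare, key=lambda v: (counts[v], v))
        target = min(
            (v for v in counts if v != source),
            key=lambda v: (path_distance(source, v), v),
        )
        values = [target if v == source else v for v in values]


if __name__ == "__main__":
    records = ["hiking", "hiking", "climbing", "chess", "chess", "checkers"]
    print(recode(records, k=2))
```

Each pass removes one distinct value, so the loop terminates. Choosing the closest value in the taxonomy is one simple way to keep the recoded word semantically near the original, which is the intuition behind minimizing information loss.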

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Martínez, S., Valls, A., Sánchez, D. (2010). Anonymizing Categorical Data with a Recoding Method Based on Semantic Similarity. In: Hüllermeier, E., Kruse, R., Hoffmann, F. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. Applications. IPMU 2010. Communications in Computer and Information Science, vol 81. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14058-7_62

  • DOI: https://doi.org/10.1007/978-3-642-14058-7_62

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14057-0

  • Online ISBN: 978-3-642-14058-7

  • eBook Packages: Computer Science, Computer Science (R0)