Contributions on Semantic Similarity and Its Applications to Data Privacy

  • Chapter
Advanced Research in Data Privacy

Part of the book series: Studies in Computational Intelligence (SCI, volume 567)

Abstract

Semantic similarity aims at quantifying the resemblance between the meanings of textual terms. It is therefore the cornerstone of textual understanding. Given the increasing availability and importance of textual sources in today's Information Societies, considerable attention has been devoted in recent years to developing mechanisms that automatically measure semantic similarity, and to applying them to tasks dealing with textual inputs (e.g. document classification, information retrieval, question answering, privacy protection, etc.). This chapter describes and discusses recent findings and proposals on semantic similarity published by the authors. Moreover, it details recent works that apply semantic similarity to the privacy protection of textual data.
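To make "quantifying the resemblance between meanings" concrete, the sketch below computes one classic ontology-based measure, the Wu & Palmer path-based measure in its common depth-based formulation, over a small taxonomy. It is not the authors' own proposal. The taxonomy, its concept names and the `generalize` helper are hypothetical assumptions chosen for illustration; `generalize` is only a simplified picture of how a taxonomy can support privacy protection (replacing a group of values by their least common subsumer), not the masking methods discussed in the chapter.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (assumption): each concept maps to its single
# is-a parent; the root maps to None.
TAXONOMY: Dict[str, Optional[str]] = {
    "entity": None,
    "disease": "entity",
    "infectious_disease": "disease",
    "chronic_disease": "disease",
    "flu": "infectious_disease",
    "measles": "infectious_disease",
    "diabetes": "chronic_disease",
}


def ancestors(concept: str) -> List[str]:
    """Path from a concept up to the root, starting with the concept itself."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = TAXONOMY[node]
    return path


def depth(concept: str) -> int:
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def lcs(a: str, b: str) -> str:
    """Least common subsumer: the deepest ancestor shared by a and b."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):  # walks upward from a, so the first match is the deepest
        if node in b_ancestors:
            return node
    raise ValueError("concepts do not share a common ancestor")


def wu_palmer(a: str, b: str) -> float:
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


def generalize(values: List[str]) -> str:
    """Hypothetical helper: replace a group of values by their common subsumer."""
    result = values[0]
    for value in values[1:]:
        result = lcs(result, value)  # pairwise LCS is valid in a tree taxonomy
    return result


print(wu_palmer("flu", "measles"))                 # 0.75
print(wu_palmer("flu", "diabetes"))                # 0.5
print(generalize(["flu", "measles", "diabetes"]))  # disease
```

Concepts that share a deeper subsumer score higher: "flu" and "measles" meet at "infectious_disease", while "flu" and "diabetes" only meet at "disease". The same subsumer gives a semantically coherent generalization, which is why taxonomy-aware measures are useful when masking textual values.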

Acknowledgments

The authors are solely responsible for the views expressed in this chapter, which do not necessarily reflect the position of UNESCO and do not commit that organisation. This work was partly supported by the European Commission under FP7 project Inter-Trust, by the Spanish Ministry of Science and Innovation (through projects eAEGIS TSI2007-65406-C03-01, ICWT TIN2012-32757, ARES-CONSOLIDER INGENIO 2010 CSD2007-00004, CO-PRIVACY TIN2011-27076-C03-01 and BallotNext IPT-2012-0603-430000) and by the Government of Catalonia (under grant 2009 SGR 1135).

Author information

Corresponding author

Correspondence to Montserrat Batet.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Batet, M., Sánchez, D. (2015). Contributions on Semantic Similarity and Its Applications to Data Privacy. In: Navarro-Arribas, G., Torra, V. (eds) Advanced Research in Data Privacy. Studies in Computational Intelligence, vol 567. Springer, Cham. https://doi.org/10.1007/978-3-319-09885-2_8

  • DOI: https://doi.org/10.1007/978-3-319-09885-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09884-5

  • Online ISBN: 978-3-319-09885-2

  • eBook Packages: Engineering, Engineering (R0)