Automatic Construction of Generalization Hierarchies for Publishing Anonymized Data

  • Vanessa Ayala-Rivera
  • Liam Murphy
  • Christina Thorpe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9983)


Concept hierarchies are widely used in multiple fields to carry out data analysis. In data privacy, they are known as Value Generalization Hierarchies (VGHs) and are used by generalization algorithms to guide how data is anonymized. Thus, their proper specification is critical to obtaining anonymized data of good quality. Creating and evaluating VGHs requires expert knowledge and a significant amount of manual effort, making these tasks highly error-prone and time-consuming. In this paper, we present AIKA, a knowledge-based framework to automatically construct and evaluate VGHs for the anonymization of categorical data. AIKA integrates ontologies to objectively create and evaluate VGHs. It also implements a multi-dimensional reward function to tailor the VGH evaluation to different use cases. Our experiments show that AIKA improved the creation of VGHs, generating VGHs of good quality in less time than manual construction. The results also show that the reward function properly captures the desired VGH properties.
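To make the notion of a VGH concrete, the following is a minimal Python sketch, assuming a toy hand-written ontology (the hypothetical `IS_A` map) in place of a real one. It derives a VGH for a categorical attribute by following is-a links upward from each value in the attribute's domain, then generalizes a value by climbing the hierarchy, as a generalization algorithm would. The functions `path_to_root`, `build_vgh`, `generalize`, and `weighted_reward` are illustrative assumptions. In particular, `weighted_reward` is a generic weighted mean of per-dimension quality scores, not AIKA's actual reward function, and its dimension names are placeholders.

```python
# Hypothetical toy ontology: each concept maps to its parent via an is-a link.
# A real system would draw these links from a large ontology instead.
IS_A = {
    "Nurse": "Health professional",
    "Surgeon": "Health professional",
    "Pharmacist": "Health professional",
    "Lawyer": "Legal professional",
    "Judge": "Legal professional",
    "Engineer": "Technical professional",
    "Health professional": "Professional",
    "Legal professional": "Professional",
    "Technical professional": "Professional",
}


def path_to_root(concept: str) -> list[str]:
    """Return [concept, parent, grandparent, ..., root] by following is-a links."""
    path = [concept]
    while path[-1] in IS_A:
        path.append(IS_A[path[-1]])
    return path


def build_vgh(domain: list[str]) -> dict[str, str]:
    """Build a VGH as a child -> parent map, keeping only ancestors of the domain values."""
    vgh: dict[str, str] = {}
    for value in domain:
        path = path_to_root(value)
        for child, parent in zip(path, path[1:]):
            vgh[child] = parent
    return vgh


def generalize(value: str, vgh: dict[str, str], levels: int) -> str:
    """Replace a value by its ancestor `levels` steps up the VGH, stopping at the root."""
    for _ in range(levels):
        if value not in vgh:
            break
        value = vgh[value]
    return value


def weighted_reward(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Hypothetical reward: weighted mean of per-dimension quality scores."""
    total_weight = sum(weights.values())
    return sum(weights[d] * scores[d] for d in weights) / total_weight


if __name__ == "__main__":
    vgh = build_vgh(["Nurse", "Surgeon", "Lawyer", "Judge"])
    print(generalize("Surgeon", vgh, 1))  # Health professional
    print(generalize("Judge", vgh, 2))    # Professional
    print("Pharmacist" in vgh)            # False: not in the attribute's domain
    # Placeholder dimensions; the weights express a use case's priorities.
    print(weighted_reward({"semantic": 0.9, "size": 0.5}, {"semantic": 2.0, "size": 1.0}))
```

Ontology concepts that no domain value reaches (here `Pharmacist` and `Technical professional`) are left out of the VGH. Changing the weights is one simple way such a reward could be tailored to different use cases.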


Keywords: Semantic Similarity · Data Anonymization · Word Sense Disambiguation · Concept Hierarchy · Disclosure Risk
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work was supported by Science Foundation Ireland grants 10/CE/I1855 and 13/RC/2094.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Vanessa Ayala-Rivera (1)
  • Liam Murphy (1)
  • Christina Thorpe (1)
  1. Lero@UCD, School of Computer Science, University College Dublin, Dublin, Ireland
