De-anonymising Set-Generalised Transactions Based on Semantic Relationships

  • Hoang Ong
  • Jianhua Shao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8860)

Abstract

Transaction data are important to applications such as marketing analysis and medical studies. However, such data can contain personal information and must therefore be sanitised before use. One popular approach to protecting transaction data is set-based generalisation, where an item in a transaction is replaced by a set of items. In this paper, we study how well transaction data can be protected by this approach. More specifically, we propose de-anonymisation methods that aim to reconstruct the original transaction data from its set-generalised version by analysing the semantic relationships that exist among the items. Our experiments on both real and synthetic data show that set-based generalisation may not provide adequate protection for transaction data: about 50% of the items added to the transactions during generalisation can be detected by our method with a precision greater than 80%.
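
To make the idea concrete, the following is a minimal Python sketch of set-based generalisation and of one naive way to look for the items it adds. It is not the method proposed in the paper: the item groups, the pairwise relatedness scores, the averaging rule, and the threshold are all assumptions chosen purely for illustration.

```python
def generalise(transaction, groups):
    """Set-based generalisation: replace each item by the group that contains it.

    `groups` is a hypothetical list of item sets; an item found in no group
    is kept unchanged.
    """
    out = set()
    for item in transaction:
        out |= next((g for g in groups if item in g), {item})
    return out


def relatedness(a, b, table):
    """Look up a symmetric relatedness score; unknown pairs default to 0.0."""
    return table.get(frozenset((a, b)), 0.0)


def flag_added_items(generalised, table, threshold):
    """Flag items whose average relatedness to the rest of the transaction is low.

    This averaging heuristic is an assumption, not the authors' algorithm.
    """
    flagged = set()
    for item in generalised:
        others = generalised - {item}
        if not others:
            continue
        avg = sum(relatedness(item, o, table) for o in others) / len(others)
        if avg < threshold:
            flagged.add(item)
    return flagged


def precision_recall(flagged, original, generalised):
    """Score the flagged items against the items truly added by generalisation."""
    added = generalised - original
    hits = flagged & added
    precision = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(added) if added else 0.0
    return precision, recall


# Toy data (entirely made up).
original = {"bread", "butter", "milk"}
groups = [{"bread", "hammer"}, {"milk", "nails"}]
table = {
    frozenset(("bread", "butter")): 0.9,
    frozenset(("bread", "milk")): 0.8,
    frozenset(("butter", "milk")): 0.9,
    frozenset(("hammer", "nails")): 0.9,
}

published = generalise(original, groups)
flagged = flag_added_items(published, table, threshold=0.3)
precision, recall = precision_recall(flagged, original, published)
```

In this toy case the published transaction contains five items, and the two that were introduced by generalisation have a noticeably lower average relatedness to the rest, so they fall below the assumed threshold. Real data would need relatedness scores from an external source and would rarely separate this cleanly.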

Keywords

Semantic Relatedness, Semantic Relationship, Generalise Item, Transaction Data, Weight Table
These keywords were added by machine and not by the authors.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Hoang Ong (1)
  • Jianhua Shao (1)
  1. School of Computer Science & Informatics, Cardiff University, UK
