FDSE 2014: Future Data and Security Engineering pp 107-121
De-anonymising Set-Generalised Transactions Based on Semantic Relationships
Abstract
Transaction data are important to applications such as marketing analysis and medical studies. However, such data can contain personal information and must therefore be sanitised before being used. One popular approach to protecting transaction data is set-based generalisation, where an item in a transaction is replaced by a set of items. In this paper, we study how well transaction data can be protected by this approach. More specifically, we propose de-anonymisation methods that aim to reconstruct the original transaction data from their set-generalised version by analysing the semantic relationships that exist among the items. Our experiments on both real and synthetic data show that set-based generalisation may not provide adequate protection for transaction data: about 50% of the items added to the transactions during generalisation can be detected by our method with a precision greater than 80%.
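To make the idea concrete, the following is a minimal sketch of set-based generalisation and of a relatedness-based guess at which items were added. It is an illustration only and not the method proposed in the paper: the item names, the `GENERALISATION` map, the `RELATEDNESS` scores, the 0.5 threshold and the function names are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical generalisation map: each original item is replaced by a set of items.
GENERALISATION: Dict[str, FrozenSet[str]] = {
    "insulin": frozenset({"insulin", "shampoo"}),
    "shampoo": frozenset({"insulin", "shampoo"}),
    "glucose_meter": frozenset({"glucose_meter", "novel"}),
    "novel": frozenset({"glucose_meter", "novel"}),
}

# Hypothetical symmetric relatedness scores in [0, 1]; unlisted pairs score 0.
RELATEDNESS: Dict[FrozenSet[str], float] = {
    frozenset({"insulin", "glucose_meter"}): 0.9,
    frozenset({"shampoo", "novel"}): 0.1,
}


def generalise(transaction: Set[str]) -> List[FrozenSet[str]]:
    """Replace every item by its generalised set, dropping duplicate sets."""
    return sorted({GENERALISATION[item] for item in transaction}, key=sorted)


def relatedness(a: str, b: str) -> float:
    """Look up the assumed relatedness score of two items."""
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def flag_added_items(generalised: List[FrozenSet[str]],
                     threshold: float = 0.5) -> Set[str]:
    """Flag candidates weakly related to every candidate in the other sets."""
    flagged: Set[str] = set()
    for i, gset in enumerate(generalised):
        others = [c for j, g in enumerate(generalised) if j != i for c in g]
        if not others:
            continue  # a single generalised set gives no context to compare against
        for candidate in gset:
            best = max(relatedness(candidate, other) for other in others)
            if best < threshold:
                flagged.add(candidate)
    return flagged


published = generalise({"insulin", "glucose_meter"})
print(sorted(flag_added_items(published)))  # ['novel', 'shampoo']
```

In this toy case the two original items are strongly related to each other, so the unrelated items introduced by generalisation stand out. Real data would need a relatedness measure derived from an external source, and the threshold would have to be tuned.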
Keywords
Semantic Relatedness, Semantic Relationship, Generalise Item, Transaction Data, Weight Table