Detecting Traitors in Re-publishing Updated Datasets

  • Anh-Tu Hoang
  • Hoang-Quoc Nguyen-Son
  • Minh-Triet Tran
  • Isao Echizen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8389)


The application of fingerprinting techniques to relational data cannot protect personal information against a collusion attack, in which the attacker has access to a set of published data. General fingerprinting techniques, such as those of Li et al., Guo et al., and Schrittwieser et al., focus on detecting the traitor who leaked the data. Among them, Schrittwieser et al.’s fingerprinting technique combines \(k\)-anonymity and full-domain generalization in order to not only detect traitors but also protect personal records. However, the technique has two main limitations. First, it does not allow the data provider to insert or delete records from the original data. Second, it does not create enough fingerprints for data recipients. To overcome these limitations, we propose an (\(\alpha , k\))-privacy protection model, an extension of \(m\)-invariance and (\(\alpha , k\))-anonymity, and a new top-down (\(\alpha , k\))-privacy fingerprinting algorithm based on that model. The model not only protects sensitive personal information against collusion attacks but also allows data providers to republish their updated original data without degrading the privacy protection. The algorithm embeds fingerprints in the generalized data and extracts them from leaked data to detect the traitors. We extensively evaluate the proposed algorithm using software that we built ourselves. The evaluation results show that our algorithm creates more fingerprints than Schrittwieser et al.’s algorithm (64,000 vs. 1,536) while achieving the same generalized data quality. Moreover, our (\(\alpha , k\))-privacy algorithm creates generalized data even when the original data contain only a small number of distinct sensitive values, without adding fake records as \(m\)-invariance does.
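As background for the (\(\alpha , k\))-anonymity notion that the proposed model extends, the following is a minimal Python sketch of how such a condition might be checked on an already generalized table. It assumes the commonly used reading of (\(\alpha , k\))-anonymity: every group of records sharing the same quasi-identifier values has at least \(k\) records, and no sensitive value accounts for more than a fraction \(\alpha\) of a group. The function name, column names, and sample table are hypothetical, and the sketch does not implement the (\(\alpha , k\))-privacy model or the fingerprinting algorithm proposed in the paper.

```python
from collections import Counter, defaultdict


def is_alpha_k_anonymous(rows, quasi_ids, sensitive, alpha, k):
    """Check an (alpha, k)-anonymity-style condition on generalized rows.

    Assumption (not taken from the paper): a table passes if every
    equivalence class has at least k rows and the most frequent
    sensitive value covers at most a fraction alpha of the class.
    """
    # Group sensitive values by their quasi-identifier combination.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_ids)
        groups[key].append(row[sensitive])

    for values in groups.values():
        if len(values) < k:
            return False  # the class is too small (k condition fails)
        top_count = Counter(values).most_common(1)[0][1]
        if top_count > alpha * len(values):
            return False  # one sensitive value dominates (alpha condition fails)
    return True


# Hypothetical generalized table with two equivalence classes.
table = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]

strict = is_alpha_k_anonymous(table, ["age", "zip"], "disease", alpha=0.5, k=2)
loose = is_alpha_k_anonymous(table, ["age", "zip"], "disease", alpha=1.0, k=2)
```

In this sample, the second class contains only one distinct sensitive value, so the check with `alpha=0.5` fails while the check with `alpha=1.0` passes. This is the kind of situation, few distinct sensitive values, that the abstract refers to.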


Keywords: Fingerprint, Generalization, Privacy


  1. Agrawal, R., Kiernan, J.: Watermarking relational databases. In: Proceedings of the 28th International Conference on Very Large Data Bases, pp. 155–166. VLDB Endowment (2002)
  2. Anjum, A., Raschia, G.: Anonymizing sequential releases under arbitrary updates. In: Proceedings of the Joint EDBT/ICDT 2013 Workshops, EDBT ’13, pp. 145–154 (2013)
  3. Bache, K., Lichman, M.: UCI machine learning repository (2013)
  4. Bayardo, R.J., Agrawal, R.: Data privacy through optimal k-anonymization. In: Proceedings of the 21st International Conference on Data Engineering, ICDE 2005, pp. 217–228. IEEE (2005)
  5. Byun, J.-W., Sohn, Y., Bertino, E., Li, N.: Secure anonymization for incremental datasets. In: Jonker, W., Petković, M. (eds.) SDM 2006. LNCS, vol. 4165, pp. 48–63. Springer, Heidelberg (2006)
  6. Fung, B., Wang, K., Chen, R., Yu, P.S.: Privacy-preserving data publishing: a survey of recent developments. ACM Comput. Surv. (CSUR) 42(4), 14 (2010)
  7. Fung, B.C., Wang, K., Yu, P.S.: Anonymizing classification data for privacy preservation. IEEE Trans. Knowl. Data Eng. 19(5), 711–725 (2007)
  8. Guo, F., Wang, J., Li, D.: Fingerprinting relational databases. In: Proceedings of the 2006 ACM Symposium on Applied Computing, pp. 487–492. ACM (2006)
  9. Hoang, A.T., Tran, M.T., Duong, A.D., Echizen, I.: An indexed bottom-up approach for publishing anonymized data. In: 2012 Eighth International Conference on Computational Intelligence and Security (CIS), pp. 641–645. IEEE (2012)
  10. LeFevre, K., DeWitt, D.J., Ramakrishnan, R.: Incognito: efficient full-domain k-anonymity. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 49–60. ACM (2005)
  11. Li, Y., Swarup, V., Jajodia, S.: Constructing a virtual primary key for fingerprinting relational data. In: Proceedings of the 3rd ACM Workshop on Digital Rights Management, pp. 133–141. ACM (2003)
  12. Li, Y., Swarup, V., Jajodia, S.: Fingerprinting relational databases: schemes and specialties. IEEE Trans. Dependable Secure Comput. 2(1), 34–45 (2005)
  13. Liu, S., Wang, S., Deng, R.H., Shao, W.-Z.: A block oriented fingerprinting scheme in relational database. In: Park, C., Chee, S. (eds.) ICISC 2004. LNCS, vol. 3506, pp. 455–466. Springer, Heidelberg (2005)
  14. Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data (TKDD) 1(1), 3 (2007)
  15. Meyerson, A., Williams, R.: On the complexity of optimal k-anonymity. In: Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 223–228. ACM (2004)
  16. Mohammed, N., Fung, B., Wang, K., Hung, P.C.: Privacy-preserving data mashup. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 228–239. ACM (2009)
  17. Pournaghshband, V.: A new watermarking approach for relational data. In: Proceedings of the 46th Annual Southeast Regional Conference on XX, pp. 127–131. ACM, New York (2008)
  18. Schrittwieser, S., Kieseberg, P., Echizen, I., Wohlgemuth, S., Sonehara, N., Weippl, E.: An algorithm for k-anonymity-based fingerprinting. In: Shi, Y.Q., Kim, H.-J., Perez-Gonzalez, F. (eds.) IWDW 2011. LNCS, vol. 7128, pp. 439–452. Springer, Heidelberg (2012)
  19. Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertainty, Fuzziness Knowl.-Based Syst. 10(5), 557–570 (2002)
  20. Wong, R.C.W., Li, J., Fu, A.W.C., Wang, K.: (\(\alpha \), k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 754–759. ACM (2006)
  21. Xiao, X., Tao, Y.: M-invariance: towards privacy preserving re-publication of dynamic datasets. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 689–700. ACM (2007)
  22. Zhang, Q., Koudas, N., Srivastava, D., Yu, T.: Aggregate query answering on anonymized tables. In: IEEE 23rd International Conference on Data Engineering, ICDE 2007, pp. 116–125. IEEE (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Anh-Tu Hoang (1, corresponding author)
  • Hoang-Quoc Nguyen-Son (2)
  • Minh-Triet Tran (1)
  • Isao Echizen (3)

  1. University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
  2. The Graduate University for Advanced Studies (Sokendai), Hayama, Japan
  3. National Institute of Informatics, Tokyo, Japan
