Knowledge and Information Systems, Volume 47, Issue 3, pp 625–645

A transversal hypergraph approach for the frequent itemset hiding problem

  • Elias C. Stavropoulos
  • Vassilios S. Verykios
  • Vasileios Kagklis
Regular Paper

Abstract

We propose a methodology for hiding all sensitive frequent itemsets in a transaction database. Our methodology relies on a novel technique that enumerates the minimal transversals of a hypergraph in order to induce the ideal border between the frequent itemsets to be preserved and the sensitive itemsets to be hidden. The ideal border is then used to formulate an integer linear program (ILP) that determines whether a feasible sanitized database attaining the ideal border exists. The solution of the program identifies the set of transactions that need to be modified (sanitized) so that hiding is achieved with maximum accuracy. If no solution exists, we modify the ILP by relaxing the constraints that must be satisfied, so that the sanitized database still guarantees privacy while having minimum impact on data quality. Experimental evaluation of the proposed approach on a number of real datasets has shown that the produced sanitized databases exhibit higher accuracy than the solutions of other well-known approaches.
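
The first two steps described in the abstract can be illustrated on a toy example. The Python code below is a minimal, brute-force sketch under stated assumptions, not the authors' implementation. It relies on the observation that an itemset contains no sensitive itemset exactly when it is disjoint from some minimal transversal of the hypergraph whose edges are the sensitive itemsets; the ideal positive border is therefore the set of maximal itemsets of the form M - T, where M is an original maximal frequent itemset and T is a minimal transversal. It also assumes that sanitization only deletes items from transactions and that the threshold sigma is an absolute support count. Under that assumption, a sanitized database attains the ideal border exactly when every border itemset keeps support >= sigma and every sensitive itemset drops below sigma, which is the kind of condition an ILP of this type has to encode as constraints. All function names and the toy data are hypothetical, and the exhaustive enumeration is exponential in the number of items; real datasets call for an output-efficient transversal generation algorithm.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List

Itemset = FrozenSet[str]


def support(db: List[Itemset], itemset: Itemset) -> int:
    """Absolute support: number of transactions containing every item of `itemset`."""
    return sum(1 for txn in db if itemset <= txn)


def maximal(sets: Iterable[Itemset]) -> List[Itemset]:
    """Keep only the sets that are not proper subsets of another set in the collection."""
    uniq = set(sets)
    return [s for s in uniq if not any(s < other for other in uniq)]


def maximal_frequent(db: List[Itemset], sigma: int) -> List[Itemset]:
    """Brute-force positive border Bd+(F): maximal itemsets with support >= sigma."""
    items = sorted(set().union(*db))
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(db, frozenset(c)) >= sigma
    ]
    return maximal(frequent)


def minimal_transversals(edges: List[Itemset]) -> List[Itemset]:
    """Brute-force minimal transversals (minimal hitting sets) of a hypergraph.

    Subsets are generated by increasing size, so a transversal is minimal
    exactly when it does not contain a transversal found earlier.
    """
    vertices = sorted(set().union(*edges))
    found: List[Itemset] = []
    for k in range(len(vertices) + 1):
        for c in combinations(vertices, k):
            t = frozenset(c)
            if any(f <= t for f in found):
                continue  # contains a smaller transversal, so it is not minimal
            if all(t & e for e in edges):
                found.append(t)
    return found


def ideal_border(max_frequent: List[Itemset], sensitive: List[Itemset]) -> List[Itemset]:
    """Maximal itemsets that are frequent and contain no sensitive itemset.

    X contains no sensitive itemset iff its complement hits every sensitive
    itemset, i.e. iff X is disjoint from some minimal transversal T; hence the
    border consists of the maximal sets among M - T.
    """
    transversals = minimal_transversals(sensitive)
    return maximal(m - t for m in max_frequent for t in transversals)


def attains_ideal_border(db: List[Itemset], border: List[Itemset],
                         sensitive: List[Itemset], sigma: int) -> bool:
    """Assumes deletion-only sanitization: border itemsets must stay frequent
    and every sensitive itemset must become infrequent."""
    return all(support(db, b) >= sigma for b in border) and all(
        support(db, s) < sigma for s in sensitive
    )


# Hypothetical toy data: six transactions, absolute threshold sigma = 2.
db = [frozenset(t) for t in ("abc", "abc", "bd", "bd", "ac", "bc")]
sensitive = [frozenset("ab")]
border = ideal_border(maximal_frequent(db, 2), sensitive)
print(sorted("".join(sorted(b)) for b in border))  # ['ac', 'bc', 'bd']

sanitized = [frozenset("bc")] + db[1:]  # delete item 'a' from the first transaction
print(attains_ideal_border(db, border, sensitive, 2))         # False
print(attains_ideal_border(sanitized, border, sensitive, 2))  # True
```

On this toy database the original maximal frequent itemsets are {a, b, c} and {b, d}; deleting item a from the first transaction hides {a, b} while every itemset on the ideal border stays frequent, so the ideal border is attained with a single item deletion.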

Keywords

Privacy-preserving data mining · Hiding frequent itemsets · Transversal hypergraph generation

Notes

Acknowledgments

The authors wish to thank the anonymous referees for their valuable comments that improved the final presentation of the paper.

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. Educational Content, Methodology and Technology Laboratory, Hellenic Open University, Patras, Greece
  2. Business Administration Department, Technological Educational Institute of Western Greece, Patras, Greece
  3. School of Science and Technology, Hellenic Open University, Patras, Greece
  4. Computer Engineering and Informatics Department, University of Patras, Patras, Greece
