Abstract
Sensitive knowledge hiding in large transactional databases is one of the major goals of privacy preserving data mining. However, it is only recently that researchers were able to identify exact solutions for the hiding of knowledge, depicted in the form of sensitive frequent itemsets and their related association rules. Exact solutions allow for the hiding of vulnerable knowledge without any critical compromises, such as the hiding of nonsensitive patterns or the accidental uncovering of infrequent itemsets, amongst the frequent ones, in the sanitized outcome. In this paper, we highlight the process of border revision, which plays a significant role towards the identification of exact hiding solutions, and we provide efficient algorithms for the computation of the revised borders. Furthermore, we review two algorithms that identify exact hiding solutions, and we extend the functionality of one of them to effectively identify exact solutions for a wider range of problems (than its original counterpart). Following that, we introduce a novel framework for decomposition and parallel solving of hiding problems, which are handled by each of these approaches. This framework improves to a substantial degree the size of the problems that both algorithms can handle and significantly decreases their runtime. Through experimentation, we demonstrate the effectiveness of these approaches toward providing high quality knowledge hiding solutions.
Similar content being viewed by others
References
Agrawal R, Shafer JC (1996) Parallel mining of association rules. IEEE Trans Knowl Data Eng (TKDE) 8(1): 962–969
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Databases (VLDB), pp 487–499
Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp 439–450
Atallah M, Bertino E, Elmagarmid A, Ibrahim M, Verykios VS (1999) Disclosure limitation of sensitive rules. In: Proceedings of the 1999 IEEE Knowledge and Data Engineering Exchange Workshop (KDEX), pp 45–52
Bayardo R (1998) Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data
Bertino E, Fovino IN, Povenza LP (2005) A framework for evaluating privacy preserving data mining algorithms. Data Mining Knowl Discov (DMKD) 11(2): 121–154
Cheung D, Xiao Y (1998) Effect of data skewness in parallel mining of association rules. In: Proceedings of the 2nd Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining (PAKDD), pp 48–60
Clifton C, Kantarciog̈lu M, Vaidya J (2002) Defining privacy for data mining. National Science Foundation Workshop on Next Generation Data Mining (WNGDM), pp 126–133
Clifton C, Marks D (1996) Security and privacy implications of data mining. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pp 15–19
Dasseni E, Verykios VS, Elmagarmid AK, Bertino E (2001) Hiding association rules by using confidence and support. In: Proceedings of the 4th International Workshop on Information Hiding, pp 369–383
Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 343–364
Farkas C, Jajodia S (2002) The inference problem: a survey. ACM SIGKDD Exploration Newsl 4(2): 6–11
Fienberg S, Slavkovic A (2005) Preserving the confidentiality of categorical statistical data bases when releasing information for association rules. Data Mining Knowl Discov (DMKD) 11(2): 155–180
Gkoulalas-Divanis A, Verykios VS (2006) An integer programming approach for frequent itemset hiding. In: Proceedings of the 2006 ACM Conference on Information and Knowledge Management (CIKM)
Gkoulalas-Divanis A, Verykios VS (2007) A hybrid approach to frequent itemset hiding. In: Proceedings of the 2007 IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp 297–304
Han E-H, Karypis G, Kumar V (2007) Scalable parallel data mining for association rules. In: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, pp 277–288
ILOG CPLEX 9.0 User’s Manual (2003) ILOG Inc, Gentilly, France
Kantarciog̈lu M, Clifton C (2004) Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Trans Knowl Data Eng (TKDE) 16(9): 1026–1037
Kargupta H, Datta S, Wang Q, Sivakumar K (2005) Random-data perturbation techniques and privacy-preserving data mining. Knowl Inform Syst (KAIS) 7(4): 387–414
Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1): 359–392
Kohavi R, Brodley C, Frasca B, Mason L, Zheng Z (2000) KDD-Cup 2000 organizers’ report: Peeling the onion. SIGKDD Explorations 2(2): 86–98. http://www.ecn.purdue.edu/KDDCUP
Lee G, Lee K, Chen A (2001) Efficient graph-based algorithms for discovering and maintaining association rules in large databases. Knowl Inform Syst (KAIS) 3(3): 338–355
Menon S, Sarkar S, Mukherjee S (2005) Maximizing accuracy of shared databases when concealing sensitive patterns. Inform Syst Res 16(3): 256–270
Morgenstern M (1988) Controlling logical inference in multilevel database and knowledge-base systems. In: Proceedings of the 1988 IEEE Symposium on Security and Privacy, pp 245–255
Moustakides G, Verykios VS (2006) A max-min approach for hiding frequent itemsets. In: Proceedings of the 6th IEEE International Conference on Data Mining (ICDM), pp 502–506
Oliveira SRM, Zaïane OR (2002) Privacy preserving frequent itemset mining. In: Proceedings of the 2002 IEEE International Conference on Privacy, Security and Data Mining (CRPITS), pp 43–54
Oliveira SRM, Zaïane OR (2003) Protecting sensitive knowledge by data sanitization. In: Proceedings of the Third IEEE International Conference on Data Mining (ICDM), pp 211–218
Parthasarathy S, Zaki M, Ogihara M, Li W (2001) Parallel data mining for association rules on shared-memory systems. Knowl Inform Syst (KAIS) 3(1): 1–29
Pontikakis E, Theodoridis Y, Tsitsonis A, Chang L, Verykios VS (2004) A quantitative and qualitative analysis of blocking in association rule hiding. In: Proceedings of the 2004 ACM Workshop on Privacy in the Electronic Society (WPES), pp 29–30
Rizvi S, Haritsa JR (2002) Maintaining data privacy in association rule mining. In: Proceedings of the 28th International Conference on Very Large Databases (VLDB)
Saygin Y, Verykios VS, Clifton C (2001) Using unknowns to prevent discovery of association rules. ACM SIGMOD Record 30(4): 45–54
Sun X, Yu PS (2005) A border-based approach for hiding sensitive frequent itemsets. In: Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM), pp 426–433
Vaidya J, Clifton C (2002) Privacy preserving association rule mining in vertically partitioned data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 639–644
Verykios VS, Bertino E, Fovino IN, Provenza LP, Saygin Y, Theodoridis Y (2004a) State-of-the-art in privacy preserving data mining. ACM SIGMOD Record 33(1): 50–57
Verykios VS, Emagarmid AK, Bertino E, Saygin Y, Dasseni E (2004b) Association rule hiding. IEEE Trans Knowl Data Eng (TKDE) 16(4): 434–447
Xu S, Zhang J, Han D, Wang J (2006) Singular value decomposition based data distortion strategy for privacy protection. Knowl Inform Syst (KAIS) 10(3): 383–397
Yokoo M, Durfee E, Ishida T, Kuwabara K (1998) The distributed constraint satisfaction problem: formalization and algorithms. IEEE Trans Knowl Data Eng (TKDE) 10(5): 673–685
Zaïane OR, El-Hajj M, Lu P (2001) Fast parallel association rule mining without candidacy generation. In: Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM), pp 665–668
Zou Q, Chu W, Johnson D, Chiu H (2002) A pattern decomposition algorithm for data mining of frequent patterns. Knowl Inform Syst (KAIS) 4(4): 466–482
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Gkoulalas-Divanis, A., Verykios, V.S. Hiding sensitive knowledge without side effects. Knowl Inf Syst 20, 263–299 (2009). https://doi.org/10.1007/s10115-008-0178-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-008-0178-7