
A Parallelization Framework for Exact Knowledge Hiding in Transactional Databases

  • Aris Gkoulalas-Divanis
  • Vassilios S. Verykios
Part of the IFIP – The International Federation for Information Processing book series (IFIPAICT, volume 278)

Abstract

The hiding of sensitive knowledge mined from transactional databases is one of the primary goals of privacy preserving data mining. The increased storage capabilities of modern databases and the need for hiding solutions of superior quality have paved the way for parallelization of the hiding process. In this paper, we introduce a novel framework for the decomposition and parallel solving of a category of hiding algorithms known as exact. Exact algorithms hide the sensitive knowledge without critical compromises, such as the blocking of non-sensitive patterns or the appearance of infrequent itemsets among the frequent ones in the sanitized outcome. The proposed framework substantially increases the size of the problems that exact algorithms can handle efficiently by significantly reducing their runtime. Furthermore, the generality of the framework makes it appropriate for any hiding algorithm that leads to a constraint satisfaction problem involving linear constraints over binary variables. Through experiments, we demonstrate the effectiveness of our solution in handling a large variety of hiding problem instances.
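As a rough illustration of the general idea of decomposing a constraint satisfaction problem with linear constraints over binary variables and solving the pieces in parallel, the following minimal sketch may help. It is not the framework from the paper: the splitting rule (constraints that share no variables go to separate blocks), the objective (maximize the number of variables set to 1), the brute-force block solver, and all names such as `components`, `solve_block` and the toy `constraints` instance are assumptions made for this example only. Threads are used just to show the structure; a real setup would more likely dispatch each block to a separate process or machine running an integer programming solver.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# A constraint is (coeffs, op, bound), meaning sum(coeffs[v] * x[v]) op bound,
# where every x[v] is binary. The instance below is a hypothetical toy example.
constraints = [
    ({"u1": 1, "u2": 1}, "<=", 1),
    ({"u2": 1, "u3": 1}, ">=", 1),  # shares u2 with the first constraint
    ({"u4": 1, "u5": 1}, "<=", 1),  # shares no variable with the others
]


def satisfied(constraint, x):
    """Check one linear constraint against a 0/1 assignment x."""
    coeffs, op, bound = constraint
    lhs = sum(a * x[v] for v, a in coeffs.items())
    return lhs <= bound if op == "<=" else lhs >= bound


def components(constraints):
    """Group constraints into blocks that share no variables (union-find)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for coeffs, _, _ in constraints:
        vs = list(coeffs)
        root = find(vs[0])
        for v in vs[1:]:
            parent[find(v)] = root

    groups = defaultdict(list)
    for c in constraints:
        groups[find(next(iter(c[0])))].append(c)
    return list(groups.values())


def solve_block(block):
    """Brute-force one block; assumed objective: maximize the number of ones."""
    names = sorted({v for coeffs, _, _ in block for v in coeffs})
    best = None
    for bits in product((0, 1), repeat=len(names)):
        x = dict(zip(names, bits))
        if all(satisfied(c, x) for c in block):
            if best is None or sum(bits) > sum(best.values()):
                best = x
    return best  # None if the block is infeasible


def solve_parallel(constraints):
    """Solve independent blocks concurrently and merge their assignments."""
    blocks = components(constraints)
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(solve_block, blocks))
    if any(p is None for p in parts):
        return None
    merged = {}
    for p in parts:
        merged.update(p)
    return merged
```

Calling `solve_parallel(constraints)` on the toy instance splits it into two blocks, one over `u1`, `u2`, `u3` and one over `u4`, `u5`. Because the blocks share no variables, their optimal assignments can be merged without conflict, which is what makes this kind of decomposition safe for an exact method.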

Key words

  • Exact knowledge hiding
  • Parallelization
  • Constraints satisfaction problems
  • Binary integer programming

Copyright information

© IFIP International Federation for Information Processing 2008

Authors and Affiliations

  • Aris Gkoulalas-Divanis (1)
  • Vassilios S. Verykios (1)
  1. Department of Computer and Communication Engineering, University of Thessaly, Volos, Greece
