Secure Top-k Subgroup Discovery

  • Henrik Grosskreutz
  • Benedikt Lemmen
  • Stefan Rüping
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6549)

Abstract

Supervised descriptive rule discovery techniques like subgroup discovery are quite popular in applications like fraud detection or clinical studies. Compared with other descriptive techniques, like classical support/confidence association rules, subgroup discovery has the advantage that it returns only the top-k patterns, and that it uses a quality function that avoids patterns uncorrelated with the target. If these techniques are to be applied in privacy-sensitive scenarios involving distributed data, precise guarantees are needed regarding the amount of information leaked during the execution of the data mining. Unfortunately, adapting secure multi-party protocols for classical support/confidence association rule mining to the task of subgroup discovery is impossible for fundamental reasons. The obstacles are the different quality function and the restriction to a fixed number of patterns, i.e., exactly the desired features of subgroup discovery. In this paper, we present a new protocol which allows distributed subgroup discovery while avoiding the disclosure of the individual databases. We analyze the properties of the protocol, describe a prototypical implementation and present experiments that demonstrate the feasibility of the approach.
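To make the notions of a quality function and a top-k result concrete, the following is a minimal, non-secure, single-database sketch. It is not the protocol presented in the paper. The abstract does not name a quality function, so the Piatetsky-Shapiro measure n · (p − p0) is used here as an assumption; the function names, the record layout and the toy data are likewise hypothetical.

```python
from itertools import combinations


def ps_quality(n_cover, n_cover_pos, n_total, n_total_pos):
    """Piatetsky-Shapiro quality n * (p - p0); an assumed choice of measure."""
    if n_cover == 0:
        return 0.0
    p = n_cover_pos / n_cover      # target share inside the subgroup
    p0 = n_total_pos / n_total     # target share in the whole database
    return n_cover * (p - p0)


def top_k_subgroups(records, target, k, max_len=2):
    """Exhaustively score conjunctions of attribute=value conditions.

    records: list of dicts with hashable, sortable values (hypothetical layout)
    target:  key of the boolean target attribute
    """
    n_total = len(records)
    n_total_pos = sum(1 for r in records if r[target])
    conditions = sorted(
        {(a, v) for r in records for a, v in r.items() if a != target}
    )
    scored = []
    for length in range(1, max_len + 1):
        for desc in combinations(conditions, length):
            # Skip descriptions that constrain the same attribute twice.
            if len({a for a, _ in desc}) < length:
                continue
            cover = [r for r in records if all(r.get(a) == v for a, v in desc)]
            n_pos = sum(1 for r in cover if r[target])
            q = ps_quality(len(cover), n_pos, n_total, n_total_pos)
            scored.append((q, desc))
    # Highest quality first; ties broken by the description itself.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:k]


# Toy data, invented purely for illustration.
records = [
    {"age": "young", "smoker": "yes", "sick": True},
    {"age": "young", "smoker": "no", "sick": False},
    {"age": "old", "smoker": "yes", "sick": True},
    {"age": "old", "smoker": "no", "sick": False},
    {"age": "old", "smoker": "yes", "sick": True},
    {"age": "young", "smoker": "no", "sick": False},
]
top = top_k_subgroups(records, "sick", k=2)
```

Note that the ranking depends on the global target share p0 and on a comparison across all candidate patterns, not on a per-pattern threshold on support and confidence alone.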

Keywords

Association Rule; Quality Function; Privacy Preserve; Subgroup Discovery; Privacy Preserve Data Mining
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Henrik Grosskreutz (1)
  • Benedikt Lemmen (1)
  • Stefan Rüping (1)
  1. Fraunhofer IAIS, Schloss Birlinghoven, Sankt Augustin, Germany
