Distributed Subgroup Mining

  • Michael Wurst
  • Martin Scholz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4213)


Subgroup discovery is a popular form of supervised rule learning, applicable to both descriptive and predictive tasks. In this work we study two natural extensions of classical subgroup discovery to distributed settings. In the first variant, the goal is to efficiently identify global subgroups, i.e. the rules that an analysis would yield if all the data were collected in a single central database. The second variant, in contrast, takes the locality of the data explicitly into account: the aim is to find patterns that point out major differences between individual databases with respect to a specific property of interest (the target attribute). We point out substantial differences between these novel learning problems and other kinds of distributed data mining tasks. These differences motivate new search and communication strategies that aim to minimize computation time and communication costs. We present and empirically evaluate new algorithms for both variants.
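To make the two settings concrete, the following is a minimal sketch and not the algorithms of the paper. It assumes weighted relative accuracy (WRAcc) as the rule utility, a measure commonly used in subgroup discovery but not named in the abstract, and a hypothetical `LocalCounts` message that each site sends for one candidate rule. Under these assumptions the global utility can be computed from summed counts alone, while the local utilities can point in opposite directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalCounts:
    """Counts one site reports for a single candidate rule (hypothetical format)."""
    n: int            # examples stored at the site
    n_pos: int        # examples for which the target attribute holds
    covered: int      # examples covered by the rule body
    covered_pos: int  # covered examples for which the target holds


def wracc(n, n_pos, covered, covered_pos):
    """Weighted relative accuracy: coverage * (precision - base rate)."""
    if n == 0 or covered == 0:
        return 0.0
    return (covered / n) * (covered_pos / covered - n_pos / n)


def local_wracc(site):
    """Utility of the rule on one site's data only."""
    return wracc(site.n, site.n_pos, site.covered, site.covered_pos)


def global_wracc(sites):
    """Utility on the union of all sites, using only the summed counts."""
    return wracc(
        sum(s.n for s in sites),
        sum(s.n_pos for s in sites),
        sum(s.covered for s in sites),
        sum(s.covered_pos for s in sites),
    )


# Illustrative numbers only: the rule is strongly positive at site A,
# strongly negative at site B, and uninformative on the pooled data.
site_a = LocalCounts(n=100, n_pos=50, covered=20, covered_pos=18)
site_b = LocalCounts(n=100, n_pos=50, covered=20, covered_pos=2)

print(local_wracc(site_a), local_wracc(site_b), global_wracc([site_a, site_b]))
```

In this toy case a search for global subgroups would discard the rule, whereas a search for differences between the individual databases would flag it. This is one way to see why the two variants call for different search and communication strategies.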


Keywords (added by machine, not by the authors): Association Rule; Communication Cost; Frequent Itemset; Count Polling; Pruning Strategy



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Michael Wurst (1)
  • Martin Scholz (1)
  1. Artificial Intelligence Group, University of Dortmund, Germany
