Interesting Subset Discovery and Its Application on Service Processes

Part of the Studies in Big Data book series (SBD, volume 3)


Many real-life datasets can be viewed as a set of records, each consisting of attributes that describe the record and measures that evaluate it. We address the problem of automatically discovering interesting subsets in such a dataset, that is, subsets whose performance characteristics differ significantly from those of the rest of the dataset. We present an algorithm to discover such subsets. The algorithm uses a generic, domain-independent definition of interestingness and applies various heuristics to prune the search space so that the solution scales to large datasets. We apply the algorithm to four real-world case studies and demonstrate its effectiveness in extracting insights that identify problem areas and provide improvement recommendations for a wide variety of systems.
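To make the problem statement concrete, the following is a minimal sketch of subset discovery over attribute-value descriptors. It is not the chapter's algorithm: the abstract does not specify the interestingness measure or the pruning heuristics, so this sketch assumes Welch's t statistic (subset versus the rest of the dataset) as the score and uses only a minimum-size filter in place of real pruning. The function names, the `threshold` and `max_depth` parameters, and the ticket data are all hypothetical.

```python
from itertools import combinations, product
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(inside, outside):
    """Welch's t statistic: an ASSUMED stand-in for the interestingness score."""
    diff = mean(inside) - mean(outside)
    denom = sqrt(variance(inside) / len(inside) + variance(outside) / len(outside))
    if denom == 0:
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / denom


def interesting_subsets(records, attributes, measure,
                        max_depth=2, min_size=2, threshold=2.0):
    """Return (descriptor, t) pairs whose measure differs from the rest.

    A descriptor is a dict of attribute=value conditions. The search is
    brute force up to max_depth conditions; min_size must be at least 2.
    """
    values = {a: sorted({r[a] for r in records}) for a in attributes}
    found = []
    for depth in range(1, max_depth + 1):
        for attrs in combinations(attributes, depth):
            for combo in product(*(values[a] for a in attrs)):
                descriptor = dict(zip(attrs, combo))
                inside, outside = [], []
                for r in records:
                    match = all(r[a] == v for a, v in descriptor.items())
                    (inside if match else outside).append(r[measure])
                # Size filter: variance() needs two or more points per side.
                if len(inside) < min_size or len(outside) < min_size:
                    continue
                t = welch_t(inside, outside)
                if abs(t) >= threshold:
                    found.append((descriptor, t))
    # Strongest deviation first, regardless of direction.
    found.sort(key=lambda item: -abs(item[1]))
    return found


# Hypothetical service tickets: two attributes and one measure (hours to resolve).
tickets = [
    {"priority": "high", "region": "EU", "hours": 10},
    {"priority": "high", "region": "US", "hours": 12},
    {"priority": "high", "region": "EU", "hours": 11},
    {"priority": "low", "region": "EU", "hours": 2},
    {"priority": "low", "region": "US", "hours": 3},
    {"priority": "low", "region": "US", "hours": 2},
    {"priority": "low", "region": "EU", "hours": 3},
]
results = interesting_subsets(tickets, ["priority", "region"], "hours")
for descriptor, t in results:
    print(descriptor, round(t, 2))
```

On this toy data the priority-based subsets rank highest, while region alone does not pass the threshold. The exhaustive enumeration grows exponentially with `max_depth`, which is why a scalable method needs pruning heuristics of the kind the abstract mentions.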



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Tata Research Development and Design Centre, Tata Consultancy Services Limited, Pune, India
