Abstract
Many real-life datasets can be viewed as a set of records, each consisting of attributes that describe the record and measures that evaluate it. We address the problem of automatically discovering interesting subsets of such a dataset, that is, subsets whose performance characteristics differ significantly from those of the rest of the dataset. We present an algorithm that discovers such subsets. The algorithm uses a generic, domain-independent definition of interestingness and applies several heuristics to prune the search space so that it scales to large datasets. We apply the algorithm to four real-world case studies and demonstrate its effectiveness in extracting insights that identify problem areas and yield improvement recommendations for a wide variety of systems.
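The idea of comparing a subset against the rest of the dataset can be illustrated with a small sketch. The abstract does not specify the interestingness definition or the pruning heuristics, so everything below is an assumption made for illustration: subsets are described by conjunctions of attribute-value pairs, interestingness is approximated by Welch's t statistic between the measure values inside and outside the subset, and the only pruning is a minimum subset size. The names `interesting_subsets`, `welch_t`, `max_attrs`, `min_size`, and `t_threshold`, as well as the toy ticket data, are hypothetical.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least 2 values)."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def interesting_subsets(records, attributes, measure,
                        max_attrs=2, min_size=2, t_threshold=2.0):
    """Enumerate attribute-value descriptors and keep those whose measure
    differs strongly from the rest of the dataset (assumed criterion)."""
    domains = {a: sorted({r[a] for r in records}) for a in attributes}
    found = []
    for k in range(1, max_attrs + 1):
        for attrs in combinations(attributes, k):
            for values in product(*(domains[a] for a in attrs)):
                inside, outside = [], []
                for r in records:
                    match = all(r[a] == v for a, v in zip(attrs, values))
                    (inside if match else outside).append(r[measure])
                # Assumed pruning rule: skip subsets too small to compare.
                if len(inside) < min_size or len(outside) < min_size:
                    continue
                t = welch_t(inside, outside)
                if abs(t) >= t_threshold:
                    found.append((dict(zip(attrs, values)), t))
    # Strongest deviations first.
    return sorted(found, key=lambda item: -abs(item[1]))


# Toy service tickets: "hours" is the measure, the other fields are attributes.
records = [
    {"region": "east", "priority": "high", "hours": 10},
    {"region": "east", "priority": "low", "hours": 11},
    {"region": "east", "priority": "high", "hours": 9},
    {"region": "east", "priority": "low", "hours": 10},
    {"region": "west", "priority": "high", "hours": 30},
    {"region": "west", "priority": "low", "hours": 31},
    {"region": "west", "priority": "high", "hours": 29},
    {"region": "west", "priority": "low", "hours": 30},
]
results = interesting_subsets(records, ["region", "priority"], "hours")
for descriptor, t in results:
    print(descriptor, round(t, 2))
```

In this toy data the region-only descriptors rank first, while priority alone does not pass the threshold. The exhaustive enumeration grows quickly with the number of attributes, which is the cost that the chapter's pruning heuristics are meant to control.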
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Natu, M., Palshikar, G.K. (2014). Interesting Subset Discovery and Its Application on Service Processes. In: Yada, K. (eds) Data Mining for Service. Studies in Big Data, vol 3. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45252-9_14
DOI: https://doi.org/10.1007/978-3-642-45252-9_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45251-2
Online ISBN: 978-3-642-45252-9
eBook Packages: Engineering, Engineering (R0)