Evaluating Learning Algorithms to Support Human Rule Evaluation with Predicting Interestingness Based on Objective Rule Evaluation Indices
Summary
In this paper, we present an evaluation of learning algorithms for a rule evaluation support method that uses rule evaluation models based on objective indices, intended for data mining post-processing. Post-processing of mined results is one of the key processes in data mining. However, it is difficult for human experts to evaluate several thousand rules mined from a large, noisy dataset in order to find the few valuable rules that are rarely included among them. To reduce the cost of this rule evaluation task, we have developed a rule evaluation support method whose rule evaluation models are learned from a dataset. This dataset comprises the values of objective indices for each mined classification rule together with a human expert's evaluation of that rule. To evaluate the performance of learning algorithms for constructing the rule evaluation models, we conducted a case study on meningitis data mining as an actual problem. Furthermore, we also evaluated our method with twelve rule sets obtained from twelve UCI datasets. Based on these results, we show the usefulness of our rule evaluation support method for human experts.
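As a rough illustration of the idea summarized above (a rule evaluation model learned from objective indices paired with an expert's evaluations), the following Python sketch computes a few common objective indices from a rule's contingency counts and predicts an evaluation for a new rule. It is a sketch under stated assumptions, not the authors' implementation: the `RuleCounts` fields, the choice of five indices, the labels `"interesting"` and `"not_interesting"`, and the toy counts are all hypothetical, and a 1-nearest-neighbour classifier merely stands in for the learning algorithms the paper actually compares.

```python
import math
from dataclasses import dataclass


@dataclass
class RuleCounts:
    """Contingency counts for a classification rule A -> B (field names are hypothetical)."""
    n: int     # total number of records in the dataset
    n_a: int   # records matching the rule's antecedent A
    n_b: int   # records belonging to the consequent class B
    n_ab: int  # records matching both A and B


def objective_indices(r: RuleCounts) -> list[float]:
    """Compute a small, illustrative subset of objective rule evaluation indices."""
    coverage = r.n_a / r.n
    support = r.n_ab / r.n
    precision = r.n_ab / r.n_a if r.n_a else 0.0  # also called confidence
    recall = r.n_ab / r.n_b if r.n_b else 0.0
    lift = precision / (r.n_b / r.n) if r.n_b else 0.0
    return [coverage, support, precision, recall, lift]


def train_1nn(rules: list[RuleCounts], labels: list[str]) -> list[tuple[list[float], str]]:
    """'Training' a 1-NN model simply stores each rule's index vector with the expert's label."""
    return [(objective_indices(r), y) for r, y in zip(rules, labels)]


def predict(model: list[tuple[list[float], str]], rule: RuleCounts) -> str:
    """Return the expert label of the training rule closest in index space (Euclidean distance)."""
    x = objective_indices(rule)
    _, label = min(model, key=lambda item: math.dist(item[0], x))
    return label


# Hypothetical toy data: the counts and labels are invented for illustration only.
train_rules = [RuleCounts(100, 20, 30, 18), RuleCounts(100, 50, 60, 31)]
train_labels = ["interesting", "not_interesting"]
model = train_1nn(train_rules, train_labels)
print(predict(model, RuleCounts(100, 22, 30, 19)))  # prints "interesting"
```

In practice one would likely scale the indices before measuring distance, since lift is unbounded while the other indices here lie in [0, 1]; the sketch omits this for brevity.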
Keywords
Data Mining Post-processing, Rule Evaluation Support, Objective Rule Evaluation Index