Abstract
Selecting appropriate classification algorithms for a given data set is of great practical importance. One of the most challenging issues in algorithm selection is how to characterize different data sets. Recently, we extracted the structural information of a data set to characterize it. Although these kinds of characteristics work well in identifying similar data sets and recommending appropriate classification algorithms, the extraction method can only be applied to binary data sets and its performance is limited. Thus, in this paper, an improved data set characterization method is proposed to address these problems. To evaluate the effectiveness of the improved method for algorithm recommendation, the unsupervised learning method EM is employed to build the algorithm recommendation model. Extensive experiments with 17 different types of classification algorithms are conducted on 84 public UCI data sets; the results demonstrate the effectiveness of the proposed method.
Notes
In this paper and [46], the feature vector of a data set is made of the characteristics of the data set.
As the information we extracted from a data set is quite different from that of the existing work, we use “feature” instead of characteristics hereafter.
Count(Win/Loss, A1:Ai) denotes the total number of data sets on which algorithm A1 won/lost against Ai.
The probability of mistakenly rejecting the null hypothesis, i.e., of concluding that a significant difference exists when in fact it does not.
References
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM SIGMOD Record, vol 22. ACM, pp 207–216
Aha DW (1992) Generalizing from case studies: A case study. In: Proceedings of the Ninth International Conference on Machine Learning. Citeseer, pp 1–10
Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66
Ali S, Smith KA (2006) On learning algorithm selection for classification. Appl Soft Comput 6(2):119–138
Asuncion A, Newman DJ (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html
Bensusan H (1998) God doesn’t always shave with occam’s razor - learning when and how to prune. In: Proceedings of the 10th European Conference on Machine Learning. Springer, pp 119–124
Bensusan H, Giraud-Carrier C (2000) Casa batlo is in passeig de gracia or landmarking the expertise space. In: Proceedings of the ECML’2000 workshop on Meta-Learning: Building Automatic Advice Strategies for Model Selection and Method Combination, pp 29–47
Brazdil P, Gama J, Henery B (1994) Characterizing the applicability of classification algorithms using meta-level learning. In: Proceedings of the European conference on Machine Learning. Springer, pp 83–102
Brazdil P, Soares C (2000) A comparison of ranking methods for classification algorithm selection. In: 11th European Conference on Machine Learning. Springer, pp 63–75
Brazdil PB, Soares C, Da Costa JP (2003) Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. Mach Learn 50(3):251–277
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Breiman L (2001) Random Forests. Mach Learn 45:5–32
Brodley CE (1993) Addressing the selective superiority problem: Automatic algorithm/model class selection. In: Proceedings of the Tenth International Conference on Machine Learning. Citeseer, pp 17–24
Castiello C, Castellano G, Fanelli A (2005) Meta-data: Characterization of input features for meta-learning. Modeling Decisions for Artificial Intelligence pp. 457–468
Cleary JG, Trigg LE (1995) K*: An instance-based learner using an entropic distance measure. In: International Conference on Machine Learning, pp 108–114
Cohen WW (1995) Fast effective rule induction. In: Proceedings of the International Conference on Machine Learning, pp 115–123
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc:1–38
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Duin RPW, Pekalska E, Tax DMJ (2004) The characterization of classification problems by classifier disagreements. In: Proceedings of the 17th International Conference on Pattern Recognition, vol 1. IEEE, pp 140–143
Fisher D, Xu L, Zard N (1992) Ordering effects in clustering. In: Proceedings of the Ninth International Conference on Machine Learning
Frank E, Witten IH (1998) Generating accurate rule sets without global optimization. In: Proceedings of the 15th International Conference on Machine Learning. Citeseer
Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: Proceeding of the Thirteenth International Conference on Machine Learning. Citeseer, pp 148–156
Gama J, Brazdil P (1995) Characterization of classification algorithms. Progress in Artificial Intelligence pp. 189–200
Henery RJ (1994) Methods for comparison. Ellis Horwood, Upper Saddle River, NJ, USA, pp 107–124. http://dl.acm.org/citation.cfm?id=212782.212789
Hilario M, Kalousis A (2001) Fusion of meta-knowledge and meta-data for case-based model selection. Principles of Data Mining and Knowledge Discovery pp. 180–191
Ho TK (2000) Complexity of classification problems and comparative advantages of combined classifiers. Multiple Classifier Systems pp. 97–106
Ho TK, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):289–300
John GH, Langley P (1995) Estimating continuous distributions in bayesian classifiers. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence, vol 1. Citeseer , pp 338–345
Kalousis A (2002) Algorithm selection via meta-learning. Ph.D. thesis, University of Geneve
Kalousis A, Gama J, Hilario M (2004) On data and algorithms: Understanding inductive performance. Mach Learn 54(3):275–312
Kalousis A, Hilario M (2000) Model selection via meta-learning: a comparative study. In: Proceedings of the 12th IEEE International Conference on Tools with Artificial Intelligence. IEEE, pp 406–413
Kalousis A, Theoharis T (1999) NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection. Intell Data Anal 3(5):319–337
King RD, Feng C, Sutherland A (1995) Statlog: comparison of classification algorithms on large real-world problems. Appl Artif Intell Int J 9(3):289–333
Lindner G, Studer R (1999) AST: Support for algorithm selection with a CBR approach. Principles of Data Mining and Knowledge Discovery pp. 418–423
Michie D, Spiegelhalter DJ, Taylor CC (1994) Machine learning, neural and statistical classification. Citeseer
Peng Y, Flach P, Soares C, Brazdil P (2002) Improved dataset characterisation for meta-learning. In: Discovery Science. Springer, pp 193–208
Pfahringer B, Bensusan H, Giraud-Carrier C (2000) Meta-learning by landmarking various learning algorithms. Morgan Kaufmann, pp 743–750
Pizarro J, Guerrero E, Galindo PL (2002) Multiple comparison procedures applied to model selection. Neurocomputing 48: 155–173
Platt J (1998) Fast training of support vector machines using sequential minimal optimization
Quinlan JR (1994) Comparing connectionist and symbolic learning methods. In: Computational Learning Theory and Natural Learning Systems: Constraints and Prospects. Citeseer
Smith KA, Woo F, Ciesielski V, Ibrahim R (2001) Modelling the relationship between problem characteristics and data mining algorithm performance using neural networks. Smart Engineering System Design: Neural Networks, Fuzzy Logic, Evolutionary Programming. Data Mining, and Complex Systems pp. 357–362
Smith KA, Woo F, Ciesielski V, Ibrahim R (2002) Matching data mining algorithm suitability to data characteristics using a self-organising map. Hybrid Information Systems pp. 169–180
Smith-Miles KA (2008) Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput Surv 41(1):1–25
Sohn SY (1999) Meta analysis of classification algorithms for pattern recognition. IEEE Trans Pattern Anal Mach Intell 21(11):1137–1144
Song Q, Wang G, Wang C (2012) Automatic recommendation of classification algorithms based on data set characteristics. Pattern Recognition
Tatti N (2007) Distances between data sets based on summary statistics. J Mach Learn Res 8:131–154
Webb GI (2000) Multiboosting: A technique for combining boosting and wagging. Mach Learn 40(2):159–196
Wolpert DH (2001) The supervised learning no-free-lunch theorems. In: Proceedings of 6th Online World Conference on Soft Computing in Industrial Applications. Citeseer, pp 25–42
Acknowledgments
This work is supported by the China Postdoctoral Science Foundation under grant No. 2014M562417 and the National Natural Science Foundation of China under grant No. 61402355.
Author information
Authors and Affiliations
Corresponding author
Appendix A: The five different kinds of meta-features
1. Statistical and Information-Theory Based Measures (see Table 5)
2. Model Structure Based Measures (see Table 6)
3. Landmarking Based Measures. Following the suggestions in [7, 38], the following six classifiers are selected as the landmark learners: i) Naive Bayes, ii) 1-NN (Nearest Neighbor), iii) Elite 1-NN, iv) a decision node learner, v) a randomly chosen node learner, and vi) the worst node learner, where the last three learners are built upon the well-known learning algorithm C4.5.
4. Problem Complexity Based Measures (see Table 7)
5. Structural Information Based Measures. First, two feature vectors, the one-item feature vector and the two-item feature vector, are extracted from the given problem; these consist of the frequencies of one-item sets and two-item sets, respectively. Then the minimum, the 1/8 through 7/8 quantiles, and the maximum of each vector are computed, and together they form the final set of data set characteristics.
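The structural-information measures of item 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary (0/1) data matrix, takes attribute supports as the one-item frequencies and pairwise joint supports as the two-item frequencies, and summarizes each vector by its minimum, 1/8 through 7/8 quantiles, and maximum (9 statistics per vector, 18 in total). The function name and input layout are illustrative.

```python
from itertools import combinations
import numpy as np

def structural_features(data):
    """Summarize the one-item and two-item frequency vectors of a
    binary data set by 9 quantile-based statistics each (18 total)."""
    data = np.asarray(data)
    # One-item frequencies: support (relative frequency) of each attribute.
    one_item = data.mean(axis=0)
    # Two-item frequencies: joint support of each attribute pair.
    two_item = np.array([
        np.mean(data[:, i] & data[:, j])
        for i, j in combinations(range(data.shape[1]), 2)
    ])
    # Minimum, 1/8 ... 7/8 quantiles, and maximum of each vector.
    quantiles = np.arange(9) / 8.0
    return np.concatenate([
        np.quantile(one_item, quantiles),
        np.quantile(two_item, quantiles),
    ])

# Toy binary data set with 4 instances and 3 attributes.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 1]])
feats = structural_features(X)
print(len(feats))  # 18 summary statistics
```

The improved method in the paper generalizes beyond binary data; a practical implementation would first discretize or encode non-binary attributes before counting item-set frequencies.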
Rights and permissions
About this article
Cite this article
Wang, G., Song, Q. & Zhu, X. An improved data characterization method and its application in classification algorithm recommendation. Appl Intell 43, 892–912 (2015). https://doi.org/10.1007/s10489-015-0689-3
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-015-0689-3