An improved data characterization method and its application in classification algorithm recommendation

Abstract

Selecting appropriate classification algorithms for a given data set is important and useful in practice, and one of the most challenging issues in algorithm selection is how to characterize different data sets. In previous work, we extracted the structural information of a data set to characterize it. Although these characteristics work well in identifying similar data sets and recommending appropriate classification algorithms, the extraction method applies only to binary data sets and its performance is limited. In this paper, we therefore propose an improved data set characterization method to address these problems. To evaluate the effectiveness of the improved method for algorithm recommendation, the unsupervised learning method EM is employed to build the algorithm recommendation model. Extensive experiments with 17 different types of classification algorithms on 84 public UCI data sets demonstrate the effectiveness of the proposed method.
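As a rough illustration of how an EM-fitted clustering model can drive algorithm recommendation, the minimal sketch below clusters data sets by their meta-feature vectors and recommends the algorithms that perform best within the matching cluster. It is not the authors' actual pipeline: the meta-features and performance scores are random placeholders, and scikit-learn's GaussianMixture stands in as the EM implementation.

```python
# Minimal sketch (assumptions: random placeholder data, GaussianMixture
# as the EM clusterer). Cluster data sets by meta-features, then rank
# algorithms by mean performance inside the matching cluster.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
meta_features = rng.normal(size=(84, 10))    # 84 data sets x 10 meta-features
perf = rng.uniform(0.5, 1.0, size=(84, 17))  # accuracy of 17 algorithms per data set

em = GaussianMixture(n_components=5, random_state=0).fit(meta_features)
labels = em.predict(meta_features)

def recommend(new_features, top_k=3):
    """Assign the new data set to a cluster and rank algorithms by
    their mean performance on that cluster's member data sets."""
    cluster = em.predict(new_features.reshape(1, -1))[0]
    mean_perf = perf[labels == cluster].mean(axis=0)
    return np.argsort(mean_perf)[::-1][:top_k]

print(recommend(rng.normal(size=10)))  # indices of the top-3 algorithms
```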

Notes

  1. In this paper and in [46], the feature vector of a data set consists of the characteristics of that data set.

  2. Since the information we extract from a data set is quite different from that used in existing work, we use "feature" instead of "characteristic" hereafter.

  3. Count(Win/Loss, A₁:Aᵢ) denotes the total number of data sets on which algorithm A₁ won/lost against algorithm Aᵢ.

  4. The probability of mistakenly rejecting the null hypothesis, i.e., of concluding that a significant difference exists when in fact it does not.
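To make the notation of Note 3 concrete, here is a minimal sketch of how such win/loss counts might be tallied; the accuracy table `acc` is hypothetical.

```python
# Illustrative reading of Note 3: Count(Win, A1:Ai) tallies the data
# sets on which algorithm A1 beat algorithm Ai (symmetrically for
# losses). acc maps an algorithm to its per-data-set accuracies
# (hypothetical values).
acc = {
    "A1": [0.91, 0.84, 0.77, 0.90],
    "A2": [0.88, 0.86, 0.75, 0.92],
}

def count_win_loss(a, b, accuracies):
    wins = sum(x > y for x, y in zip(accuracies[a], accuracies[b]))
    losses = sum(x < y for x, y in zip(accuracies[a], accuracies[b]))
    return wins, losses

print(count_win_loss("A1", "A2", acc))  # -> (2, 2)
```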

References

  1. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM SIGMOD Record, vol 22. ACM, pp 207–216

  2. Aha DW (1992) Generalizing from case studies: A case study. In: Proceedings of the Ninth International Conference on Machine Learning. Citeseer, pp 1–10

  3. Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66

  4. Ali S, Smith KA (2006) On learning algorithm selection for classification. Appl Soft Comput 6(2):119–138

  5. Asuncion A, Newman DJ (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html

  6. Bensusan H (1998) God doesn't always shave with Occam's razor - learning when and how to prune. In: Proceedings of the 10th European Conference on Machine Learning. Springer, pp 119–124

  7. Bensusan H, Giraud-Carrier C (2000) Casa Batló is in Passeig de Gràcia or landmarking the expertise space. In: Proceedings of the ECML'2000 workshop on Meta-Learning: Building Automatic Advice Strategies for Model Selection and Method Combination, pp 29–47

  8. Brazdil P, Gama J, Henery B (1994) Characterizing the applicability of classification algorithms using meta-level learning. In: Proceedings of the European conference on Machine Learning. Springer, pp 83–102

  9. Brazdil P, Soares C (2000) A comparison of ranking methods for classification algorithm selection. In: 11th European Conference on Machine Learning. Springer, pp 63–75

  10. Brazdil PB, Soares C, Da Costa JP (2003) Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. Mach Learn 50(3):251–277

  11. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  12. Breiman L (2001) Random Forests. Mach Learn 45:5–32

  13. Brodley CE (1993) Addressing the selective superiority problem: Automatic algorithm/model class selection. In: Proceedings of the Tenth International Conference on Machine Learning. Citeseer, pp 17–24

  14. Castiello C, Castellano G, Fanelli A (2005) Meta-data: Characterization of input features for meta-learning. In: Modeling Decisions for Artificial Intelligence, pp 457–468

  15. Cleary JG, Trigg LE (1995) K*: An instance-based learner using an entropic distance measure. In: Proceedings of the International Conference on Machine Learning, pp 108–114

  16. Cohen WW (1995) Fast effective rule induction. In: Proceedings of the International Conference on Machine Learning, pp 115–123

  17. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38

  18. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  19. Duin RPW, Pekalska E, Tax DMJ (2004) The characterization of classification problems by classifier disagreements. In: Proceedings of the 17th International Conference on Pattern Recognition, vol 1. IEEE, pp 140–143

  20. Fisher D, Xu L, Zard N (1992) Ordering effects in clustering. In: Proceedings of the Ninth International Conference on Machine Learning

  21. Frank E, Witten IH (1998) Generating accurate rule sets without global optimization. In: Proceedings of the 15th International Conference on Machine Learning. Citeseer

  22. Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning. Citeseer, pp 148–156

  23. Gama J, Brazdil P (1995) Characterization of classification algorithms. In: Progress in Artificial Intelligence, pp 189–200

  24. Henery RJ (1994) Methods for comparison. Ellis Horwood, Upper Saddle River, NJ, USA, pp 107–124. http://dl.acm.org/citation.cfm?id=212782.212789

  25. Hilario M, Kalousis A (2001) Fusion of meta-knowledge and meta-data for case-based model selection. In: Principles of Data Mining and Knowledge Discovery, pp 180–191

  26. Ho TK (2000) Complexity of classification problems and comparative advantages of combined classifiers. In: Multiple Classifier Systems, pp 97–106

  27. Ho TK, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):289–300

  28. John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence, vol 1. Citeseer, pp 338–345

  29. Kalousis A (2002) Algorithm selection via meta-learning. Ph.D. thesis, University of Geneva

  30. Kalousis A, Gama J, Hilario M (2004) On data and algorithms: Understanding inductive performance. Mach Learn 54(3):275–312

  31. Kalousis A, Hilario M (2000) Model selection via meta-learning: a comparative study. In: Proceedings of the 12th IEEE International Conference on Tools with Artificial Intelligence. IEEE, pp 406–413

  32. Kalousis A, Theoharis T (1999) NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection. Intell Data Anal 3(5):319–337

  33. King RD, Feng C, Sutherland A (1995) Statlog: comparison of classification algorithms on large real-world problems. Appl Artif Intell Int J 9(3):289–333

  34. Lindner G, Studer R (1999) AST: Support for algorithm selection with a CBR approach. In: Principles of Data Mining and Knowledge Discovery, pp 418–423

  35. Michie D, Spiegelhalter DJ, Taylor CC (1994) Machine learning, neural and statistical classification. Ellis Horwood

  36. Michie D, Spiegelhalter DJ, Taylor CC (1994) Machine learning, neural and statistical classification

  37. Peng Y, Flach P, Soares C, Brazdil P (2002) Improved dataset characterisation for meta-learning. In: Discovery Science. Springer, pp 193–208

  38. Pfahringer B, Bensusan H, Giraud-Carrier C (2000) Meta-learning by landmarking various learning algorithms. In: Proceedings of the Seventeenth International Conference on Machine Learning. Morgan Kaufmann, pp 743–750

  39. Pizarro J, Guerrero E, Galindo PL (2002) Multiple comparison procedures applied to model selection. Neurocomputing 48:155–173

  40. Platt J (1998) Fast training of support vector machines using sequential minimal optimization

  41. Quinlan JR (1994) Comparing connectionist and symbolic learning methods. In: Computational Learning Theory and Natural Learning Systems: Constraints and Prospects. Citeseer

  42. Smith KA, Woo F, Ciesielski V, Ibrahim R (2001) Modelling the relationship between problem characteristics and data mining algorithm performance using neural networks. In: Smart Engineering System Design: Neural Networks, Fuzzy Logic, Evolutionary Programming, Data Mining, and Complex Systems, pp 357–362

  43. Smith KA, Woo F, Ciesielski V, Ibrahim R (2002) Matching data mining algorithm suitability to data characteristics using a self-organising map. In: Hybrid Information Systems, pp 169–180

  44. Smith-Miles KA (2008) Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput Surv 41(1):1–25

  45. Sohn SY (1999) Meta analysis of classification algorithms for pattern recognition. IEEE Trans Pattern Anal Mach Intell 21(11):1137–1144

  46. Song Q, Wang G, Wang C (2012) Automatic recommendation of classification algorithms based on data set characteristics. Pattern Recogn 45(7):2672–2689

  47. Tatti N (2007) Distances between data sets based on summary statistics. J Mach Learn Res 8:131–154

  48. Webb GI (2000) Multiboosting: A technique for combining boosting and wagging. Mach Learn 40(2):159–196

  49. Wolpert DH (2001) The supervised learning no-free-lunch theorems. In: Proceedings of 6th Online World Conference on Soft Computing in Industrial Applications. Citeseer, pp 25–42

Acknowledgments

This work is supported by the China Postdoctoral Science Foundation under grant No. 2014M562417 and the National Natural Science Foundation of China under grant No. 61402355.

Author information

Corresponding author

Correspondence to Guangtao Wang.

Appendix A: The five different kinds of meta-features

  1. Statistical and Information-Theory Based Measures (See Table 5)

  2. Model Structure Based Measures (See Table 6)

  3. Landmarking Based Measures. Following the suggestions in [7, 38], the following six classifiers are selected as the landmark learners: i) Naive Bayes, ii) 1-NN (Nearest Neighbor), iii) Elite 1-NN, iv) a decision node learner, v) a randomly chosen node learner, and vi) the worst node learner. The last three learners are derived from the well-known decision tree algorithm C4.5. (A rough sketch of these landmarkers is given after the tables below.)

  4. Problem Complexity Based Measures (See Table 7)

  5. Structural Information Based Measures. First, two feature vectors, the one-item feature vector and the two-item feature vector, are extracted from the given problem. These two vectors consist of the frequencies of one-item sets and two-item sets, respectively. Afterward, the minimum, the 1/8 through 7/8 quantiles, and the maximum are computed for each of these two vectors; together these values form the final set of data set characteristics. (A sketch of this computation follows the list.)
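The following sketch illustrates the structural-information computation of item 5 under some assumptions: the toy binary matrix `X` stands in for a real (discretized) data set, and itemset frequencies are taken as plain co-occurrence supports.

```python
# Hedged sketch of the structural-information measures: frequencies of
# one-item and two-item sets from a binary data set, summarized by the
# minimum, the 1/8 ... 7/8 quantiles, and the maximum. X is a toy
# stand-in, not a real data set.
from itertools import combinations
import numpy as np

X = np.array([[1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0],
              [1, 1, 1, 1]])

# one-item frequencies: support of each single item (column)
one_item = X.mean(axis=0)

# two-item frequencies: support of each pair of items co-occurring
two_item = np.array([(X[:, i] * X[:, j]).mean()
                     for i, j in combinations(range(X.shape[1]), 2)])

def summarize(freqs):
    """min, 1/8..7/8 quantiles, max -> nine summary characteristics."""
    return np.quantile(freqs, [i / 8 for i in range(9)])

meta = np.concatenate([summarize(one_item), summarize(two_item)])
print(meta)  # 18 structural meta-features for this data set
```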

Table 5 Statistical and Information-Theory Based Measures
Table 6 Model Structure Based Measures
Table 7 Problem Complexity Based Measures
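As a rough approximation of the landmarking measures in item 3, the sketch below scores a few cheap learners by cross-validated accuracy and uses those scores as meta-features. Scikit-learn models stand in for the learners listed above; the elite 1-NN and node learners are loose approximations (the worst node learner is omitted), not the authors' exact C4.5-based constructions.

```python
# Hedged landmarking sketch: the cross-validated accuracy of simple
# learners characterizes a data set. Models are scikit-learn stand-ins
# for the six landmarkers named in Appendix A, item 3.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# elite 1-NN: 1-NN restricted to the single most informative attribute
elite_idx = SelectKBest(f_classif, k=1).fit(X, y).get_support()

landmarkers = {
    "naive_bayes":   (GaussianNB(), X),
    "1nn":           (KNeighborsClassifier(n_neighbors=1), X),
    "elite_1nn":     (KNeighborsClassifier(n_neighbors=1), X[:, elite_idx]),
    "decision_node": (DecisionTreeClassifier(max_depth=1), X),  # best stump
    "random_node":   (DecisionTreeClassifier(max_depth=1, max_features=1,
                                             random_state=0), X),
}

meta = {name: cross_val_score(model, data, y, cv=10).mean()
        for name, (model, data) in landmarkers.items()}
print(meta)  # landmarking meta-features for this data set
```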

About this article

Cite this article

Wang, G., Song, Q. & Zhu, X. An improved data characterization method and its application in classification algorithm recommendation. Appl Intell 43, 892–912 (2015). https://doi.org/10.1007/s10489-015-0689-3
