An improved data characterization method and its application in classification algorithm recommendation

Abstract

Selecting appropriate classification algorithms for a given data set is important and useful in practice, and one of the most challenging issues in algorithm selection is how to characterize different data sets. In previous work, we extracted the structural information of a data set to characterize it. Although these characteristics work well in identifying similar data sets and recommending appropriate classification algorithms, the extraction method applies only to binary data sets and its performance is limited. In this paper, we therefore propose an improved data set characterization method to address these problems. To evaluate the effectiveness of the improved method for algorithm recommendation, the unsupervised learning method EM is employed to build the algorithm recommendation model. Extensive experiments with 17 different types of classification algorithms on 84 public UCI data sets demonstrate the effectiveness of the proposed method.
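As a rough illustration of how an EM-fitted clustering model can drive algorithm recommendation, the minimal sketch below clusters data sets by their meta-feature vectors and recommends the algorithms that perform best within the matching cluster. It is not the authors' actual pipeline: the meta-features and performance scores are random placeholders, and scikit-learn's GaussianMixture stands in as the EM implementation.

```python
# Minimal sketch (assumptions: random placeholder data, GaussianMixture
# as the EM clusterer). Cluster data sets by meta-features, then rank
# algorithms by mean performance inside the matching cluster.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
meta_features = rng.normal(size=(84, 10))    # 84 data sets x 10 meta-features
perf = rng.uniform(0.5, 1.0, size=(84, 17))  # accuracy of 17 algorithms per data set

em = GaussianMixture(n_components=5, random_state=0).fit(meta_features)
labels = em.predict(meta_features)

def recommend(new_features, top_k=3):
    """Assign the new data set to a cluster and rank algorithms by
    their mean performance on that cluster's member data sets."""
    cluster = em.predict(new_features.reshape(1, -1))[0]
    mean_perf = perf[labels == cluster].mean(axis=0)
    return np.argsort(mean_perf)[::-1][:top_k]

print(recommend(rng.normal(size=10)))  # indices of the top-3 algorithms
```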

Notes

  1. In this paper and in [46], the feature vector of a data set consists of the characteristics of that data set.

  2. Since the information we extract from a data set is quite different from that used in existing work, we use "feature" instead of "characteristic" hereafter.

  3. Count(Win/Loss, A₁:Aᵢ) denotes the total number of data sets on which algorithm A₁ won/lost against algorithm Aᵢ.

  4. The probability of mistakenly rejecting the null hypothesis, i.e., of concluding that a significant difference exists when in fact it does not.
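To make the notation of Note 3 concrete, here is a minimal sketch of how such win/loss counts might be tallied; the accuracy table `acc` is hypothetical.

```python
# Illustrative reading of Note 3: Count(Win, A1:Ai) tallies the data
# sets on which algorithm A1 beat algorithm Ai (symmetrically for
# losses). acc maps an algorithm to its per-data-set accuracies
# (hypothetical values).
acc = {
    "A1": [0.91, 0.84, 0.77, 0.90],
    "A2": [0.88, 0.86, 0.75, 0.92],
}

def count_win_loss(a, b, accuracies):
    wins = sum(x > y for x, y in zip(accuracies[a], accuracies[b]))
    losses = sum(x < y for x, y in zip(accuracies[a], accuracies[b]))
    return wins, losses

print(count_win_loss("A1", "A2", acc))  # -> (2, 2)
```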

References

  1. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM SIGMOD Record, vol 22. ACM, pp 207–216

  2. Aha DW (1992) Generalizing from case studies: A case study. In: Proceedings of the Ninth International Conference on Machine Learning. Citeseer, pp 1–10

  3. Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66

  4. Ali S, Smith KA (2006) On learning algorithm selection for classification. Appl Soft Comput 6(2):119–138

  5. Asuncion A, Newman DJ (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html

  6. Bensusan H (1998) God doesn't always shave with Occam's razor - learning when and how to prune. In: Proceedings of the 10th European Conference on Machine Learning. Springer, pp 119–124

  7. Bensusan H, Giraud-Carrier C (2000) Casa Batló is in Passeig de Gràcia or landmarking the expertise space. In: Proceedings of the ECML'2000 workshop on Meta-Learning: Building Automatic Advice Strategies for Model Selection and Method Combination, pp 29–47

  8. Brazdil P, Gama J, Henery B (1994) Characterizing the applicability of classification algorithms using meta-level learning. In: Proceedings of the European conference on Machine Learning. Springer, pp 83–102

  9. Brazdil P, Soares C (2000) A comparison of ranking methods for classification algorithm selection. In: 11th European Conference on Machine Learning. Springer, pp 63–75

  10. Brazdil PB, Soares C, Da Costa JP (2003) Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. Mach Learn 50(3):251–277

  11. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  12. Breiman L (2001) Random Forests. Mach Learn 45:5–32

  13. Brodley CE (1993) Addressing the selective superiority problem: Automatic algorithm/model class selection. In: Proceedings of the Tenth International Conference on Machine Learning. Citeseer, pp 17–24

  14. Castiello C, Castellano G, Fanelli A (2005) Meta-data: Characterization of input features for meta-learning. In: Modeling Decisions for Artificial Intelligence, pp 457–468

  15. Cleary JG, Trigg LE (1995) K*: An instance-based learner using an entropic distance measure. In: Proceedings of the International Conference on Machine Learning, pp 108–114

  16. Cohen WW (1995) Fast effective rule induction. In: Proceedings of the International Conference on Machine Learning, pp 115–123

  17. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38

  18. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  19. Duin RPW, Pekalska E, Tax DMJ (2004) The characterization of classification problems by classifier disagreements. In: Proceedings of the 17th International Conference on Pattern Recognition, vol 1. IEEE, pp 140–143

  20. Fisher D, Xu L, Zard N (1992) Ordering effects in clustering. In: Proceedings of the Ninth International Conference on Machine Learning

  21. Frank E, Witten IH (1998) Generating accurate rule sets without global optimization. In: Proceedings of the 15th International Conference on Machine Learning. Citeseer

  22. Freund Y, Schapire R (1996) Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning. Citeseer, pp 148–156

  23. Gama J, Brazdil P (1995) Characterization of classification algorithms. In: Progress in Artificial Intelligence, pp 189–200

  24. Henery RJ (1994) Methods for comparison. Ellis Horwood, Upper Saddle River, NJ, USA, pp 107–124. http://dl.acm.org/citation.cfm?id=212782.212789

  25. Hilario M, Kalousis A (2001) Fusion of meta-knowledge and meta-data for case-based model selection. In: Principles of Data Mining and Knowledge Discovery, pp 180–191

  26. Ho TK (2000) Complexity of classification problems and comparative advantages of combined classifiers. In: Multiple Classifier Systems, pp 97–106

  27. Ho TK, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):289–300

  28. John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence, vol 1. Citeseer, pp 338–345

  29. Kalousis A (2002) Algorithm selection via meta-learning. Ph.D. thesis, University of Geneva

  30. Kalousis A, Gama J, Hilario M (2004) On data and algorithms: Understanding inductive performance. Mach Learn 54(3):275–312

  31. Kalousis A, Hilario M (2000) Model selection via meta-learning: a comparative study. In: Proceedings of the 12th IEEE International Conference on Tools with Artificial Intelligence. IEEE, pp 406–413

  32. Kalousis A, Theoharis T (1999) NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection. Intell Data Anal 3(5):319–337

  33. King RD, Feng C, Sutherland A (1995) Statlog: comparison of classification algorithms on large real-world problems. Appl Artif Intell Int J 9(3):289–333

  34. Lindner G, Studer R (1999) AST: Support for algorithm selection with a CBR approach. In: Principles of Data Mining and Knowledge Discovery, pp 418–423

  35. Michie D, Spiegelhalter DJ, Taylor CC (1994) Machine learning, neural and statistical classification. Ellis Horwood

  36. Michie D, Spiegelhalter DJ, Taylor CC (1994) Machine learning, neural and statistical classification

  37. Peng Y, Flach P, Soares C, Brazdil P (2002) Improved dataset characterisation for meta-learning. In: Discovery Science. Springer, pp 193–208

  38. Pfahringer B, Bensusan H, Giraud-Carrier C (2000) Meta-learning by landmarking various learning algorithms. In: Proceedings of the Seventeenth International Conference on Machine Learning. Morgan Kaufmann, pp 743–750

  39. Pizarro J, Guerrero E, Galindo PL (2002) Multiple comparison procedures applied to model selection. Neurocomputing 48:155–173

  40. Platt J (1998) Fast training of support vector machines using sequential minimal optimization

  41. Quinlan JR (1994) Comparing connectionist and symbolic learning methods. In: Computational Learning Theory and Natural Learning Systems: Constraints and Prospects. Citeseer

  42. Smith KA, Woo F, Ciesielski V, Ibrahim R (2001) Modelling the relationship between problem characteristics and data mining algorithm performance using neural networks. In: Smart Engineering System Design: Neural Networks, Fuzzy Logic, Evolutionary Programming, Data Mining, and Complex Systems, pp 357–362

  43. Smith KA, Woo F, Ciesielski V, Ibrahim R (2002) Matching data mining algorithm suitability to data characteristics using a self-organising map. In: Hybrid Information Systems, pp 169–180

  44. Smith-Miles KA (2008) Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput Surv 41(1):1–25

  45. Sohn SY (1999) Meta analysis of classification algorithms for pattern recognition. IEEE Trans Pattern Anal Mach Intell 21(11):1137–1144

  46. Song Q, Wang G, Wang C (2012) Automatic recommendation of classification algorithms based on data set characteristics. Pattern Recogn 45(7):2672–2689

  47. Tatti N (2007) Distances between data sets based on summary statistics. J Mach Learn Res 8:131–154

  48. Webb GI (2000) Multiboosting: A technique for combining boosting and wagging. Mach Learn 40(2):159–196

  49. Wolpert DH (2001) The supervised learning no-free-lunch theorems. In: Proceedings of 6th Online World Conference on Soft Computing in Industrial Applications. Citeseer, pp 25–42

Acknowledgments

This work is supported by the China Postdoctoral Science Foundation under grant No. 2014M562417 and the National Natural Science Foundation of China under grant No. 61402355.

Author information

Corresponding author

Correspondence to Guangtao Wang.

Appendix A: The five different kinds of meta-features

  1. Statistical and Information-Theory Based Measures (See Table 5)

  2. Model Structure Based Measures (See Table 6)

  3. Landmarking Based Measures. Following the suggestions in [7, 38], the following six classifiers are selected as the landmark learners: i) Naive Bayes, ii) 1-NN (Nearest Neighbor), iii) Elite 1-NN, iv) a decision node learner, v) a randomly chosen node learner, and vi) the worst node learner. The last three learners are derived from the well-known decision tree algorithm C4.5. (A rough sketch of these landmarkers is given after the tables below.)

  4. Problem Complexity Based Measures (See Table 7)

  5. Structural Information Based Measures. First, two feature vectors, the one-item feature vector and the two-item feature vector, are extracted from the given problem. These two vectors consist of the frequencies of one-item sets and two-item sets, respectively. Afterward, the minimum, the 1/8 through 7/8 quantiles, and the maximum are computed for each of these two vectors; together these values form the final set of data set characteristics. (A sketch of this computation follows the list.)
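The following sketch illustrates the structural-information computation of item 5 under some assumptions: the toy binary matrix `X` stands in for a real (discretized) data set, and itemset frequencies are taken as plain co-occurrence supports.

```python
# Hedged sketch of the structural-information measures: frequencies of
# one-item and two-item sets from a binary data set, summarized by the
# minimum, the 1/8 ... 7/8 quantiles, and the maximum. X is a toy
# stand-in, not a real data set.
from itertools import combinations
import numpy as np

X = np.array([[1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0],
              [1, 1, 1, 1]])

# one-item frequencies: support of each single item (column)
one_item = X.mean(axis=0)

# two-item frequencies: support of each pair of items co-occurring
two_item = np.array([(X[:, i] * X[:, j]).mean()
                     for i, j in combinations(range(X.shape[1]), 2)])

def summarize(freqs):
    """min, 1/8..7/8 quantiles, max -> nine summary characteristics."""
    return np.quantile(freqs, [i / 8 for i in range(9)])

meta = np.concatenate([summarize(one_item), summarize(two_item)])
print(meta)  # 18 structural meta-features for this data set
```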

Table 5 Statistical and Information-Theory Based Measures
Table 6 Model Structure Based Measures
Table 7 Problem Complexity Based Measures
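As a rough approximation of the landmarking measures in item 3, the sketch below scores a few cheap learners by cross-validated accuracy and uses those scores as meta-features. Scikit-learn models stand in for the learners listed above; the elite 1-NN and node learners are loose approximations (the worst node learner is omitted), not the authors' exact C4.5-based constructions.

```python
# Hedged landmarking sketch: the cross-validated accuracy of simple
# learners characterizes a data set. Models are scikit-learn stand-ins
# for the six landmarkers named in Appendix A, item 3.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# elite 1-NN: 1-NN restricted to the single most informative attribute
elite_idx = SelectKBest(f_classif, k=1).fit(X, y).get_support()

landmarkers = {
    "naive_bayes":   (GaussianNB(), X),
    "1nn":           (KNeighborsClassifier(n_neighbors=1), X),
    "elite_1nn":     (KNeighborsClassifier(n_neighbors=1), X[:, elite_idx]),
    "decision_node": (DecisionTreeClassifier(max_depth=1), X),  # best stump
    "random_node":   (DecisionTreeClassifier(max_depth=1, max_features=1,
                                             random_state=0), X),
}

meta = {name: cross_val_score(model, data, y, cv=10).mean()
        for name, (model, data) in landmarkers.items()}
print(meta)  # landmarking meta-features for this data set
```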

About this article

Cite this article

Wang, G., Song, Q. & Zhu, X. An improved data characterization method and its application in classification algorithm recommendation. Appl Intell 43, 892–912 (2015). https://doi.org/10.1007/s10489-015-0689-3
