Abstract
Feature selection methods are used to tackle the curse of dimensionality in the data to be mined. This also applies to animal breeding, where datasets record a remarkably large number of animal features. In this paper, we conduct a comprehensive study of 12 classification methods and 12 GA-based wrapper feature selection methods for the classification of Silesian horse data. To assess the performance of the wrappers and the classification methods on the animal dataset, we used two metrics: a rank metric, Area Under the ROC Curve (AUC), and a probability metric, Root Mean Square Error (RMSE). All of the classifiers and wrappers were taken from the Weka machine learning software. We find that most of the GA-based wrappers achieved results no worse than those obtained on the full high-dimensional dataset. The statistical results obtained make three classifiers, a decision tree (ADT), a logistic regression (Log), and a bagging method (Bag), competitive methods to be considered in the field of animal breeding data mining.
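As an illustration of the two evaluation metrics named above, the following minimal sketch computes AUC and RMSE for a binary problem from predicted positive-class probabilities. It is not Weka's implementation and not the authors' evaluation code; the function names and the toy labels and scores are assumptions made for the example.

```python
from math import sqrt

def auc(labels, scores):
    """AUC via pairwise ranking (Mann-Whitney formulation), binary labels 0/1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Fraction of (positive, negative) pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rmse(labels, scores):
    """Root mean square error between 0/1 labels and predicted probabilities."""
    return sqrt(sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels))

# Toy data (hypothetical): true classes and predicted P(class = 1).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]

auc_value = auc(labels, scores)
rmse_value = rmse(labels, scores)
print(f"AUC = {auc_value:.3f}, RMSE = {rmse_value:.3f}")
```

A higher AUC and a lower RMSE both indicate better performance, so the two metrics have to be read in opposite directions when comparing classifiers or feature subsets.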
This work was conducted as part of research project no. N516 415138 financed by the Ministry of Science and Higher Education.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Unold, O., Dobrowolski, M., Maciejewski, H., Skrobanek, P., Walkowicz, E. (2012). A GA-Based Wrapper Feature Selection for Animal Breeding Data Mining. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, SB. (eds) Hybrid Artificial Intelligent Systems. HAIS 2012. Lecture Notes in Computer Science, vol 7209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28931-6_19
DOI: https://doi.org/10.1007/978-3-642-28931-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28930-9
Online ISBN: 978-3-642-28931-6
eBook Packages: Computer Science, Computer Science (R0)