Abstract
Single Nucleotide Polymorphisms (SNPs) are now considered one of the most important classes of genetic markers, with a wide range of applications of both scientific and economic interest. Although advances in biotechnology have made the production of genome-wide SNP datasets feasible, the cost of production is still high. Reducing the initial dataset to a smaller one that retains the same genetic information is therefore a crucial task, and it is performed through feature selection. Biologists evaluate features using methods originating from the field of population genetics. Although several studies have compared the existing biological methods, comparisons between methods originating from biology and methods originating from machine learning are lacking. In this study we present some early results suggesting that biological methods perform slightly better than machine learning methods.
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kavakiotis, I., Triantafyllidis, A., Tsoumakas, G., Vlahavas, I. (2014). Feature Evaluation Metrics for Population Genomic Data. In: Likas, A., Blekas, K., Kalles, D. (eds) Artificial Intelligence: Methods and Applications. SETN 2014. Lecture Notes in Computer Science, vol 8445. Springer, Cham. https://doi.org/10.1007/978-3-319-07064-3_36
DOI: https://doi.org/10.1007/978-3-319-07064-3_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07063-6
Online ISBN: 978-3-319-07064-3
eBook Packages: Computer Science, Computer Science (R0)