Introducing Machine Learning Concepts with WEKA

Smith, Tony C.; Frank, Eibe

doi:10.1007/978-1-4939-3578-9_17

Tony C. Smith & Eibe Frank

Protocol
First Online: 24 March 2016

Part of the book series: Methods in Molecular Biology (MIMB, volume 1418)

Abstract

This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

Author information

Authors and Affiliations

Department of Computer Science, University of Waikato, Hamilton, New Zealand
Tony C. Smith & Eibe Frank

Corresponding author

Correspondence to Tony C. Smith.

Editor information

Editors and Affiliations

Ohio State University, Biomed Informatics, College of Medicine, Columbus, Ohio, USA
Ewy Mathé
National Cancer Institute, National Institutes of Health, Columbia, Maryland, USA
Sean Davis

Cite this protocol

Smith, T.C., Frank, E. (2016). Introducing Machine Learning Concepts with WEKA. In: Mathé, E., Davis, S. (eds) Statistical Genomics. Methods in Molecular Biology, vol 1418. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3578-9_17

DOI: https://doi.org/10.1007/978-1-4939-3578-9_17
Published: 24 March 2016
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3576-5
Online ISBN: 978-1-4939-3578-9
eBook Packages: Springer Protocols
