Introducing Machine Learning Concepts with WEKA

Part of the Methods in Molecular Biology book series (MIMB, volume 1418)

Abstract

This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

Key words

  • Machine learning
  • Data mining
  • WEKA
  • Bioinformatics
  • Tutorial

Author information

Corresponding author

Correspondence to Tony C. Smith.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Smith, T.C., Frank, E. (2016). Introducing Machine Learning Concepts with WEKA. In: Mathé, E., Davis, S. (eds) Statistical Genomics. Methods in Molecular Biology, vol 1418. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3578-9_17

  • DOI: https://doi.org/10.1007/978-1-4939-3578-9_17

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3576-5

  • Online ISBN: 978-1-4939-3578-9

  • eBook Packages: Springer Protocols