Introduction to Machine Learning

  • Yalin Baştanlar
  • Mustafa Özuysal
Part of the Methods in Molecular Biology book series (MIMB, volume 1107)


Machine learning, which can be briefly defined as enabling computers to make successful predictions based on past experience, has developed rapidly in recent years, aided by the steady growth in the storage capacity and processing power of computers. As in many other disciplines, machine learning methods have been widely employed in bioinformatics, where the difficulty and cost of biological analyses have driven the development of sophisticated approaches for this application area. In this chapter, we first review fundamental concepts of machine learning, such as feature assessment, unsupervised versus supervised learning, and types of classification. We then discuss the main issues in designing machine learning experiments and evaluating their performance. Finally, we introduce some supervised learning methods.
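To make the supervised-learning and evaluation terminology above concrete, the following is a minimal, self-contained sketch (toy data and a nearest-centroid classifier chosen purely for illustration; they are not methods taken from this chapter): a model is fit on labeled training examples, then scored on held-out test examples with accuracy, a basic performance metric.

```python
def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def nearest_centroid_fit(X, y):
    """'Training': compute one centroid per class label."""
    by_label = {}
    for xi, yi in zip(X, y):
        by_label.setdefault(yi, []).append(xi)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(model, key=lambda lbl: (x[0] - model[lbl][0]) ** 2
                                      + (x[1] - model[lbl][1]) ** 2)

# Toy training set: two well-separated classes (hypothetical data).
X_train = [(0.0, 0.1), (0.2, 0.0), (1.0, 1.1), (0.9, 1.0)]
y_train = ["A", "A", "B", "B"]
model = nearest_centroid_fit(X_train, y_train)

# Held-out test set; accuracy = fraction of correct predictions.
X_test = [(0.1, 0.0), (1.1, 0.9)]
y_test = ["A", "B"]
preds = [predict(model, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(accuracy)  # 1.0 on this easily separable toy data
```

The split between training and test data is the essential point: measuring accuracy on the same examples used for fitting would overstate performance, one of the evaluation pitfalls the chapter discusses.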


Keywords: Machine learning, Supervised learning, Unsupervised learning, Clustering, Classification, Regression, Model complexity, Model evaluation, Performance metrics, Dimensionality reduction



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Yalin Baştanlar, Department of Computer Engineering, Izmir Institute of Technology, Izmir, Turkey
  • Mustafa Özuysal, Department of Computer Engineering, Izmir Institute of Technology, Izmir, Turkey
