Machine Learning and Data Mining

Introduction to Artificial Intelligence

Part of the book series: Undergraduate Topics in Computer Science (UTICS)


One of the major AI applications is the development of intelligent autonomous robots. Since flexibility and adaptivity are important features of really intelligent agents, research into learning mechanisms and the development of machine learning algorithms is one of the most important branches of AI. After motivating and introducing basic concepts of machine learning like classification and approximation, this chapter presents basic supervised learning algorithms such as the perceptron, nearest neighbor methods and decision tree induction. Unsupervised clustering methods and data mining software tools complete the picture of this fascinating field.


  1.

    Python is a modern scripting language with very readable syntax, powerful data types, and extensive standard libraries, which can be used to this end.

  2.

    Caution! This is not a proof of convergence for the perceptron learning rule. It only shows that the perceptron converges when the training dataset consists of a single example.

  3.

    The functionals argmin and argmax determine, similarly to min and max, the minimum or maximum of a set or function. However, rather than returning the value of the extremum, they return its position, that is, the argument at which the extremum occurs.

  4.

    The Hamming distance between two bit vectors is the number of different bits of the two vectors.

  5.

    To keep the example simple and readable, the feature vector x was deliberately kept one-dimensional.

  6.

    The three-day total of snowfall is in fact an important feature for determining the hazard level. In practice, however, additional attributes are used [Bra01]. The example used here is simplified.

  7.

    In (7.9) on page 138, the natural logarithm rather than log₂ is used in the definition of entropy. The two versions differ only by the constant factor ln 2. Because here, as in the MaxEnt method, entropies are only compared with one another, this difference does not matter (see Exercise 8.12 on page 240).

  8.

    It would be better to use the error on the test data directly, at least when the amount of training data is sufficient to justify a separate test set.

  9.

    Feature scaling is necessary or advantageous for many machine learning algorithms.

  10.

    The nearest neighbor algorithm is not to be confused with the nearest neighbor method for classification from Sect. 8.3.

  11.

    A minimum spanning tree is an acyclic, undirected graph that connects all nodes with the minimum sum of edge lengths.
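
Several of the notes above define small computational notions (argmin and argmax, the Hamming distance, feature scaling, and the choice of logarithm in the entropy). The following Python sketch illustrates them under stated assumptions: all function names are hypothetical and do not come from the chapter, min-max normalization is only one common form of feature scaling and is chosen here for illustration, and the entropy function is a generic Shannon entropy rather than a transcription of (7.9).

```python
import math


def argmax(values):
    """Return the position (index) of the largest element, not its value."""
    return max(range(len(values)), key=lambda i: values[i])


def argmin(values):
    """Return the position (index) of the smallest element, not its value."""
    return min(range(len(values)), key=lambda i: values[i])


def hamming_distance(a, b):
    """Count the positions in which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_max_scale(column):
    """Rescale one feature column linearly to [0, 1].

    Assumption: min-max normalization stands in for "feature scaling";
    the chapter note does not prescribe a particular method.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def entropy(probs, base=math.e):
    """Shannon entropy of a discrete distribution for a given log base."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


if __name__ == "__main__":
    scores = [3, 9, 4]
    print(argmax(scores), argmin(scores))        # positions, not values
    print(hamming_distance([1, 0, 1, 1], [1, 1, 0, 1]))
    print(min_max_scale([2, 4, 6]))

    # Natural-log and log2 entropies differ by the constant factor ln 2,
    # so comparing two distributions gives the same ordering in both bases.
    p, q = [0.5, 0.5], [0.9, 0.1]
    print(entropy(p) > entropy(q), entropy(p, 2) > entropy(q, 2))
```

Because the change of logarithm base only multiplies every entropy by the same positive constant, any comparison between two entropies comes out the same in either base, which is the point made in note 7.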

Author information

Corresponding author

Correspondence to Wolfgang Ertel.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Ertel, W. (2017). Machine Learning and Data Mining. In: Introduction to Artificial Intelligence. Undergraduate Topics in Computer Science. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58486-7

  • Online ISBN: 978-3-319-58487-4

  • eBook Packages: Computer Science, Computer Science (R0)
