Encyclopedia of Machine Learning and Data Mining

2017 Edition
Editors: Claude Sammut, Geoffrey I. Webb


Reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7687-1_957
The training data for a learning algorithm is said to be noisy if the data contain errors. Errors can be of two types:
  • A measurement error occurs when some attribute values are incorrect or inaccurate. Note that measurement of physical properties by continuous values is always subject to some error.

  • A classification error occurs, in supervised learning, when a training example has an incorrect class label.
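
The two error types can be illustrated by corrupting a small clean data set. The following is a minimal sketch, not a standard procedure: the function names, the Gaussian noise model for measurement error, and the toy data are all assumptions made for illustration.

```python
import random

def add_measurement_noise(rows, sigma, rng):
    """Return a copy of rows with Gaussian noise (std. dev. sigma) added to each attribute value."""
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in rows]

def add_classification_noise(labels, rate, classes, rng):
    """Return a copy of labels in which each label is, with probability rate, replaced by a different class."""
    noisy = []
    for label in labels:
        if rng.random() < rate:
            # Pick uniformly among the classes other than the correct one.
            noisy.append(rng.choice([c for c in classes if c != label]))
        else:
            noisy.append(label)
    return noisy

# Hypothetical clean training data: two continuous attributes and a class label.
rng = random.Random(0)
clean_rows = [[5.1, 3.5], [6.2, 2.9], [4.7, 3.2], [5.9, 3.0]]
clean_labels = ["a", "b", "a", "b"]

noisy_rows = add_measurement_noise(clean_rows, sigma=0.1, rng=rng)
noisy_labels = add_classification_noise(clean_labels, rate=0.25, classes=["a", "b"], rng=rng)
```

Setting `sigma` or `rate` to zero leaves the data unchanged, which makes it easy to compare a learner on clean and noisy versions of the same training set.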

In addition to errors, training examples may have missing attribute values. That is, the values of some attributes are not recorded.
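
As a small illustration, missing attribute values can be represented explicitly and counted before learning. This sketch assumes that an unrecorded value is stored as `None`; the attribute names and the helper `count_missing` are hypothetical.

```python
def count_missing(rows, attributes):
    """Count, per attribute, how many examples have no recorded value (None)."""
    return {name: sum(1 for row in rows if row.get(name) is None) for name in attributes}

# Hypothetical training examples; None marks a value that was not recorded.
examples = [
    {"temperature": 36.6, "pulse": 72},
    {"temperature": None, "pulse": 80},
    {"temperature": 38.1, "pulse": None},
    {"temperature": None, "pulse": 65},
]
missing = count_missing(examples, ["temperature", "pulse"])
```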

Noisy data can cause learning algorithms to fail to converge to a concept description or to build a concept description that has poor classification accuracy on unseen examples. The latter is often due to overfitting, in which the learner fits the errors in the training data rather than the underlying concept.
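
The effect on unseen examples can be seen in a constructed example. The sketch below is only an illustration under stated assumptions: the target concept (class 1 when x >= 0.5), the deterministic 20% label-flipping pattern, and the two toy learners (a one-nearest-neighbour rule that memorizes every training label, and a single-threshold rule) are all chosen for this example.

```python
def one_nn_predict(train_x, train_y, x):
    """Predict the label of the training example closest to x (memorizes every label)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def fit_threshold(train_x, train_y):
    """Pick the training value t whose rule 'x >= t -> class 1' makes the fewest training errors."""
    def errors(t):
        return sum((1 if x >= t else 0) != y for x, y in zip(train_x, train_y))
    return min(train_x, key=errors)

def accuracy(predictions, labels):
    """Fraction of predictions that match the given labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical target concept: class 1 if and only if x >= 0.5.
train_x = [i / 100 for i in range(100)]
clean_y = [1 if i >= 50 else 0 for i in range(100)]
# Classification noise: flip every fifth label (20% of the training set).
noisy_y = [1 - y if i % 5 == 2 else y for i, y in enumerate(clean_y)]

# Unseen, noise-free test points lying just above each training point.
test_x = [x + 0.0025 for x in train_x]
test_y = clean_y  # the shift does not cross 0.5, so the true labels are the same

nn_train_acc = accuracy([one_nn_predict(train_x, noisy_y, x) for x in train_x], noisy_y)
nn_test_acc = accuracy([one_nn_predict(train_x, noisy_y, x) for x in test_x], test_y)

t = fit_threshold(train_x, noisy_y)
threshold_test_acc = accuracy([1 if x >= t else 0 for x in test_x], test_y)
```

On this data the memorizing learner reproduces all of the noisy training labels (training accuracy 1.0) but carries each flipped label over to the neighbouring test point (test accuracy 0.8), whereas the threshold rule recovers t = 0.5 and classifies every test point correctly.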

For methods to minimize the effects of noise, see Overfitting.

Copyright information

© Springer Science+Business Media New York 2017