
Unsupervised Learning

Chapter in: Introduction to Data Science

Part of the book series: Undergraduate Topics in Computer Science ((UTICS))

Abstract

In this chapter, we address the problem of analyzing a set of inputs/data without labels, with the goal of finding “interesting patterns” or structures in the data. This type of problem is sometimes called a knowledge discovery problem. Compared to other machine learning problems such as supervised learning, it is much more open, since in general there is no well-defined metric to use, nor is there any specific kind of pattern we wish to look for. Within unsupervised machine learning, the most common type of problem is clustering, though other problems such as novelty detection, dimensionality reduction, and outlier detection also belong to this area. Here we discuss different clustering methods, compare their advantages and disadvantages, and discuss measures for evaluating their quality. The chapter finishes with a case study using a real data set that analyzes the expenditure of different countries on education.
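To make the clustering problem concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in pure Python; the function, the toy data, and the deterministic initialization are our own illustration, not code from the chapter:

```python
def kmeans(points, k, iters=10):
    """Lloyd's algorithm: alternate nearest-centroid assignment
    and centroid (mean) update."""
    # Deterministic initialization for this sketch; real
    # implementations use random restarts or k-means++.
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign p to the centroid at smallest squared distance.
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, clusters

# Two well-separated toy blobs in the plane.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
        (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
centroids, clusters = kmeans(data, k=2)
```

On this toy data the algorithm recovers the two blobs and their means; note that k-means requires choosing k in advance, one of the trade-offs among clustering methods that the chapter compares.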


Notes

  1. https://en.wikipedia.org/wiki/Rand_index.

  2. The intracluster distance of sample i is the distance from the sample to the nearest sample of the same class, and the nearest-cluster distance is the distance to the closest sample in the cluster nearest to that of sample i.

  3. http://ec.europa.eu/eurostat.
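The distance-based score described in note 2 can be sketched as follows. This is our own illustration using the note's nearest-sample distances (the standard silhouette coefficient uses mean distances instead); the function and data names are assumptions, not the chapter's code:

```python
import math

def silhouette(sample_idx, points, labels):
    """Score s = (b - a) / max(a, b), following note 2:
    a = distance to the nearest sample in the same cluster,
    b = distance to the closest sample in any other cluster."""
    p, own = points[sample_idx], labels[sample_idx]
    a = min(math.dist(p, q) for i, q in enumerate(points)
            if labels[i] == own and i != sample_idx)
    b = min(math.dist(p, q) for i, q in enumerate(points)
            if labels[i] != own)
    return (b - a) / max(a, b)

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels = [0, 0, 1, 1]
score = silhouette(0, points, labels)
```

For a well-separated sample the score approaches +1, while values near 0 indicate a sample on a cluster boundary; averaging over all samples gives one way to evaluate clustering quality without labels.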


Acknowledgements

This chapter was co-written by Petia Radeva and Oriol Pujol.

Author information

Correspondence to Laura Igual.


Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Igual, L., Seguí, S. (2017). Unsupervised Learning. In: Introduction to Data Science. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-50017-1_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-50017-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50016-4

  • Online ISBN: 978-3-319-50017-1

  • eBook Packages: Computer Science (R0)
