
Unsupervised Learning

Chapter in: Introduction to Data Science

Part of the book series: Undergraduate Topics in Computer Science ((UTICS))

Abstract

In this chapter, we address the problem of analyzing a set of inputs/data without labels, with the goal of finding “interesting patterns” or structures in the data. This type of problem is sometimes called a knowledge discovery problem. Compared to other machine learning problems such as supervised learning, it is much more open, since in general there is no well-defined metric to use, nor is there any specific kind of pattern we wish to look for. Within unsupervised machine learning, the most common type of problem is clustering, though other problems such as novelty detection, dimensionality reduction, and outlier detection also belong to this area. Here we discuss different clustering methods, compare their advantages and disadvantages, and discuss measures for evaluating their quality. The chapter finishes with a case study using a real data set that analyzes the expenditure of different countries on education.
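To make the clustering problem concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in pure Python; the function, the toy data, and the deterministic initialization are our own illustration, not code from the chapter:

```python
def kmeans(points, k, iters=10):
    """Lloyd's algorithm: alternate nearest-centroid assignment
    and centroid (mean) update."""
    # Deterministic initialization for this sketch; real
    # implementations use random restarts or k-means++.
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign p to the centroid at smallest squared distance.
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, clusters

# Two well-separated toy blobs in the plane.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
        (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
centroids, clusters = kmeans(data, k=2)
```

On this toy data the algorithm recovers the two blobs and their means; note that k-means requires choosing k in advance, one of the trade-offs among clustering methods that the chapter compares.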


Notes

  1. https://en.wikipedia.org/wiki/Rand_index.

  2. The intracluster distance of sample i is the distance from the sample to the nearest sample of the same class, and the nearest-cluster distance is the distance to the closest sample in the cluster nearest to that of sample i.

  3. http://ec.europa.eu/eurostat.
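The distance-based score described in note 2 can be sketched as follows. This is our own illustration using the note's nearest-sample distances (the standard silhouette coefficient uses mean distances instead); the function and data names are assumptions, not the chapter's code:

```python
import math

def silhouette(sample_idx, points, labels):
    """Score s = (b - a) / max(a, b), following note 2:
    a = distance to the nearest sample in the same cluster,
    b = distance to the closest sample in any other cluster."""
    p, own = points[sample_idx], labels[sample_idx]
    a = min(math.dist(p, q) for i, q in enumerate(points)
            if labels[i] == own and i != sample_idx)
    b = min(math.dist(p, q) for i, q in enumerate(points)
            if labels[i] != own)
    return (b - a) / max(a, b)

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels = [0, 0, 1, 1]
score = silhouette(0, points, labels)
```

For a well-separated sample the score approaches +1, while values near 0 indicate a sample on a cluster boundary; averaging over all samples gives one way to evaluate clustering quality without labels.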


Acknowledgements

This chapter was co-written by Petia Radeva and Oriol Pujol.

Author information

Correspondence to Laura Igual.


Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Igual, L., Seguí, S. (2017). Unsupervised Learning. In: Introduction to Data Science. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-50017-1_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-50017-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50016-4

  • Online ISBN: 978-3-319-50017-1

  • eBook Packages: Computer Science (R0)
