Thus far, we have considered supervised learning from $N$ observations $(x_1, y_1), \ldots, (x_N, y_N)$, where $y_1, \ldots, y_N$ take either real values (regression) or a finite number of values (classification). In this chapter, we consider unsupervised learning, in which no such teacher exists, and the relations among the $N$ samples and among the $p$ variables are learned only from the covariates $x_1, \ldots, x_N$. There are various types of unsupervised learning; in this chapter, we focus on clustering and principal component analysis. Clustering means dividing the samples $x_1, \ldots, x_N$ into several groups (clusters). We consider K-means clustering, which requires us to specify the number of clusters $K$ in advance, and hierarchical clustering, which does not need such information. We also consider principal component analysis (PCA), a data analysis method that is often used in machine learning and multivariate analysis. For PCA, we consider another equivalent definition along with its mathematical meaning.
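To make the setting concrete, the following is a minimal sketch of clustering without a teacher: only the covariates are given, and the number of clusters K is fixed in advance. It uses the common alternating procedure for K-means (assign each sample to its nearest center, then recompute each center as the mean of its samples). The function name `k_means`, its parameters `n_iter` and `seed`, and the toy data are our own assumptions for illustration, not the implementation developed in this chapter.

```python
import numpy as np

def k_means(X, K, n_iter=100, seed=0):
    """K-means sketch: X is an (N, p) array of covariates, K is given in advance."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Assumption: initialize the centers with K distinct samples chosen at random.
    centers = X[rng.choice(N, size=K, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # Assign each sample to its nearest center (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        # Move each center to the mean of the samples assigned to it.
        for k in range(K):
            if np.any(new_labels == k):
                centers[k] = X[new_labels == k].mean(axis=0)
        # Stop once the assignment no longer changes.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centers

# Toy data: N = 4 samples, p = 2 variables, no responses y.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = k_means(X, K=2)
print(labels)   # cluster index of each sample (which group is called 0 depends on the initialization)
print(centers)  # one row per cluster center
```

On this toy data the first two samples and the last two samples end up in separate clusters. In general, the result of this procedure can depend on the initial centers, which is one reason the choice of K and of the starting point deserves care.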