Agreement Analysis of Quality Measures for Dimensionality Reduction
High-dimensional data sets occur in many application domains. They are often analysed using dimensionality reduction methods, such as principal component analysis or multidimensional scaling. To determine how reliable a particular embedding of a data set is, users need to analyse its quality. Numerous quality measures have been proposed in the literature for this purpose. Most of these measures concentrate on a single aspect, such as the preservation of relative distances, while others aim to balance multiple aspects, such as intrusions and extrusions in k-neighbourhoods. Because these quality measures have different ranges and different value distributions, it is challenging to decide which aspects of a data set are preserved best by an embedding. We propose an algorithm based on persistent homology that permits the comparative analysis of different quality measures on a given embedding, regardless of their ranges. Our method ranks quality measures and provides local feedback about which aspects of a data set are preserved by an embedding in certain areas. We demonstrate the use of our technique by analysing quality measures on different embeddings of synthetic and real-world data sets.
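To make the difficulty with differing ranges concrete, the following minimal sketch computes two illustrative per-point quality measures for an embedding: the fraction of preserved k-nearest neighbours, which lies in [0, 1], and the mean absolute distance distortion, which is non-negative but unbounded. Both measures and all function names are assumptions chosen for illustration; they are not the measures evaluated in the paper, and the sketch does not implement the persistent homology algorithm.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def nearest_neighbours(dist, k):
    """Index sets of the k nearest neighbours of every point."""
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)  # a point is not its own neighbour
    order = np.argsort(masked, axis=1, kind="stable")[:, :k]
    return [set(row.tolist()) for row in order]


def neighbourhood_overlap(d_high, d_low, k):
    """Per-point fraction of k-nearest neighbours kept by the embedding.

    Values lie in [0, 1]; 1 means the neighbourhood is fully preserved.
    """
    nn_high = nearest_neighbours(d_high, k)
    nn_low = nearest_neighbours(d_low, k)
    return np.array([len(h & l) / k for h, l in zip(nn_high, nn_low)])


def distance_distortion(d_high, d_low):
    """Per-point mean absolute difference between original and embedded distances.

    Values are >= 0 and unbounded; 0 means distances are fully preserved.
    """
    return np.abs(d_high - d_low).mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    high = rng.normal(size=(50, 5))   # synthetic high-dimensional data
    low = high[:, :2]                 # naive embedding: keep two coordinates

    d_high = pairwise_distances(high)
    d_low = pairwise_distances(low)

    overlap = neighbourhood_overlap(d_high, d_low, k=5)
    distortion = distance_distortion(d_high, d_low)
    print("overlap range:   ", overlap.min(), overlap.max())
    print("distortion range:", distortion.min(), distortion.max())
```

The two arrays live on different scales and point in opposite directions (high overlap is good, high distortion is bad), so comparing them value by value is not meaningful. This is the kind of mismatch that a range-independent comparison has to handle.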