Abstract
High-dimensional data sets occur in many application domains. They are often analysed using dimensionality reduction methods, such as principal component analysis or multidimensional scaling. To determine how reliable a particular embedding of a data set is, users need to assess its quality. Numerous quality measures have been proposed in the literature for this purpose. Most of these measures concentrate on a single aspect, such as the preservation of relative distances, while others aim to balance multiple aspects, such as intrusions and extrusions in k-neighbourhoods. Because these measures have different ranges and different value distributions, it is challenging to decide which aspects of a data set an embedding preserves best. We propose an algorithm based on persistent homology that permits the comparative analysis of different quality measures on a given embedding, regardless of their ranges. Our method ranks quality measures and provides local feedback about which aspects of a data set are preserved by an embedding in certain areas. We demonstrate the use of our technique by analysing quality measures on different embeddings of synthetic and real-world data sets.
Keywords
- Scalar Field
- Quality Measure
- Jaccard Index
- Dimensionality Reduction Method
- Handwritten Digit
These keywords were added by machine and not by the authors.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Rieck, B., Leitte, H. (2017). Agreement Analysis of Quality Measures for Dimensionality Reduction. In: Carr, H., Garth, C., Weinkauf, T. (eds) Topological Methods in Data Analysis and Visualization IV. TopoInVis 2015. Mathematics and Visualization. Springer, Cham. https://doi.org/10.1007/978-3-319-44684-4_6
DOI: https://doi.org/10.1007/978-3-319-44684-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44682-0
Online ISBN: 978-3-319-44684-4
eBook Packages: Mathematics and Statistics (R0)