Abstract
This chapter proposes a method for visualizing differences among labeled multidimensional datasets. The method places multiple given datasets in the same screen space by applying the same dimensionality reduction scheme to each of them, and then draws each group of samples semi-transparently in the color assigned to its label. This representation makes it easy to observe which labels are most similar or most different across the datasets, and which labels tend to produce outliers. The chapter presents an application example using MNIST, USPS, and CIFAR10, which are representative sample datasets for machine learning, and discusses the effectiveness and issues of the proposed method.
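The core idea of the abstract, one shared projection plus semi-transparent label colors, can be sketched as follows. This is a minimal illustration and not the chapter's implementation: the abstract does not name the dimensionality reduction scheme, so PCA is swapped in here as a stand-in, and the datasets are assumed to have already been brought to the same feature dimensionality. The function names, the synthetic data, and the palette are all hypothetical.

```python
import numpy as np


def fit_shared_pca(datasets, n_components=2):
    """Fit one PCA on the union of all datasets so they share a projection.

    Assumption: every dataset has the same number of feature columns.
    """
    stacked = np.vstack(datasets)
    mean = stacked.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(stacked - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(data, mean, components):
    """Map samples into the shared low-dimensional screen space."""
    return (data - mean) @ components.T


def label_colors(labels, palette, alpha=0.3):
    """Return one RGBA row per sample: the label's color plus a shared alpha."""
    return np.array([palette[int(label)] + (alpha,) for label in labels])


def plot_comparison(projected, labels_per_dataset, palette, alpha=0.3):
    """Overlay all datasets on one axes; marker shape tells datasets apart."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    markers = ["o", "^", "s", "D"]
    fig, ax = plt.subplots()
    for i, (xy, labels) in enumerate(zip(projected, labels_per_dataset)):
        ax.scatter(
            xy[:, 0],
            xy[:, 1],
            c=label_colors(labels, palette, alpha),
            marker=markers[i % len(markers)],
            s=8,
        )
    return fig


# Hypothetical stand-ins for two labeled datasets with 16 features each.
rng = np.random.default_rng(0)
dataset_a = rng.normal(size=(200, 16))
dataset_b = rng.normal(loc=0.5, size=(150, 16))
labels_a = rng.integers(0, 3, size=200)
labels_b = rng.integers(0, 3, size=150)
palette = {0: (0.9, 0.1, 0.1), 1: (0.1, 0.6, 0.1), 2: (0.1, 0.2, 0.9)}

mean, components = fit_shared_pca([dataset_a, dataset_b])
xy_a = project(dataset_a, mean, components)
xy_b = project(dataset_b, mean, components)
colors_a = label_colors(labels_a, palette)
```

Calling `plot_comparison([xy_a, xy_b], [labels_a, labels_b], palette)` would draw both datasets on one axes. Because the projection is fitted once on the union, a label whose samples overlap across datasets appears as a dense, saturated region, while a label that differs between datasets appears as separated, fainter clouds.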
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kosaka, K., Itoh, T. (2024). A Method for Comparative Visualization of Labeled Multidimensional Data and Its Application to Machine Learning Data. In: Kovalerchuk, B., Nazemi, K., Andonie, R., Datia, N., Bannissi, E. (eds) Artificial Intelligence and Visualization: Advancing Visual Knowledge Discovery. Studies in Computational Intelligence, vol 1126. Springer, Cham. https://doi.org/10.1007/978-3-031-46549-9_9
DOI: https://doi.org/10.1007/978-3-031-46549-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46548-2
Online ISBN: 978-3-031-46549-9
eBook Packages: Intelligent Technologies and Robotics (R0)